JP2017168030A

JP2017168030A - History analyzer, history analysis method, history analysis system, and program

Info

Publication number: JP2017168030A
Application number: JP2016055018A
Authority: JP
Inventors: 室井　泰幸; Yasuyuki Muroi; 泰幸室井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2016-03-18
Filing date: 2016-03-18
Publication date: 2017-09-21
Anticipated expiration: 2036-03-18
Also published as: JP6720607B2

Abstract

PROBLEM TO BE SOLVED: To specify a derivation relation pertaining to data arranged in different devices on the basis of logs pertaining to these data.
SOLUTION: A history analyzer of the present invention comprises: a log acquisition unit for acquiring an access log to a file provided in a file provision device and a process operation log that includes information pertaining to an operation executed by a process in a file manipulation device on a file arranged in the file provision device or the file manipulation device; and an analysis unit for specifying from the access log and the process operation log an operation process having manipulated a file to be manipulated that is provided in the file provision device, extracting from the process operation log a log that satisfies a specific condition among logs pertaining to a derivation source operation that could be executed on an original file of the file to be manipulated, in accordance with the type of operation executed on the file to be manipulated, and specifying a file recorded in the extracted log as an original file for the file to be manipulated.
SELECTED DRAWING: Figure 22A

Description

The present invention relates to a history analysis apparatus and the like capable of analyzing derivation relationships between files.

In recent years, as the storage capacity (storage size) of file servers has grown, the amount of data handled by file servers has increased. Opportunities for multiple people to view and edit files registered on a file server have also increased.

Because files stored on a file server are accessed by multiple people, modified files (sometimes referred to as derivation destination files) are often created. In some cases, such as when a file on the file server is modified on the client side, it is difficult to identify the derivation relationship between files using only the information available on the file server. A technique that makes it possible to confirm derivation relationships between files is therefore needed. The following patent documents are known in this connection.

Patent Document 1 discloses a technique that uses a file operation log on a client terminal and an access log for a server device to extract the access log entries corresponding to the operation log for a given file.

Patent Document 2 discloses a technique for determining the usage status of a specified file by using operation log information of a client terminal and file relationship information indicating relationships between files.

Patent Document 3 discloses a technique for monitoring operations on a reference file and, based on operation events for that reference file, detecting derivation destination files generated from it.

Patent Document 4 discloses a technique for managing the operation history of files using a database. The technique searches the database for operation information matching a file operation tracking request and displays the results as a list.

Patent Document 1: JP 2014-153742 A
Patent Document 2: JP 2009-176119 A
Patent Document 3: JP 2009-015659 A
Patent Document 4: JP 2008-052570 A

When tracking data derived from data (such as files) placed on a server capable of providing data, such as a file server (a data providing server), one conceivable approach is to use the access log of the data providing server. Such derived data is hereinafter sometimes referred to as "derivation destination data". However, an access log sometimes records little more than data reads and writes. In that case, it is difficult to identify derivation destination data created by operations that are not completed within the data providing server (for example, editing or copying outside the server).

As a specific example, assume that data is edited and placed on both the data providing server and a client device. In this case, the access log on the data providing server side or the client device side may not retain information that directly associates the derivation source data (hereinafter sometimes referred to as "derivation source data" or "original data") with the derivation destination data, which can make it difficult to track derivation relationships between data.

In contrast, the techniques disclosed in Patent Documents 1 to 4 presuppose that a derivation source file and the derivation destination file derived from it can be identified directly from an application's operation log, operation events, or the like. However, as described above, in a real system it is not always possible to obtain a log that directly associates a derivation source file with a derivation destination file. For example, when such an operation log cannot be obtained from the application that operates on the file, the techniques described in these patent documents are unlikely to track derivation relationships between files properly.

The present invention has been made in view of the above circumstances. That is, one of the main objects of the present invention is to provide a history analysis apparatus and the like that can identify derivation relationships among data placed on different devices, based on logs concerning those data.

To achieve the above object, a history analysis apparatus according to one aspect of the present invention includes a log acquisition unit and an analysis unit. The log acquisition unit acquires an access log, which includes information on access from a file operation device to a file provided by a file providing device, and a process operation log, which includes information on operations executed by a process in the file operation device on files placed in the file providing device or the file operation device. The analysis unit identifies, from the access log and the process operation log, an operation process, that is, the process that operated on an operation target file provided by the file providing device. Depending on the type of operation executed on the operation target file, the analysis unit extracts from the process operation log those logs that satisfy a specific condition among the logs concerning derivation source operations, which are operations that may have been executed on the original file of the operation target file. The analysis unit then identifies the file recorded in the extracted logs concerning the derivation source operations as the original file of the operation target file.

A history analysis method according to one aspect of the present invention acquires an access log, which includes information on access from a file operation device to a file provided by a file providing device, and a process operation log, which includes information on operations executed by a process in the file operation device on files placed in the file providing device or the file operation device. The method identifies, from the access log and the process operation log, an operation process, that is, the process that operated on an operation target file provided by the file providing device. Depending on the type of operation executed on the operation target file, the method extracts from the process operation log those logs that satisfy a specific condition among the logs concerning derivation source operations, which are operations that may have been executed on the original file of the operation target file. The method then identifies the file on which the derivation source operation recorded in the extracted logs was executed as the original file of the operation target file.

A history analysis system according to one aspect of the present invention includes a file providing device, a file operation device, and a history analysis apparatus. The file providing device can provide, via a communication network, an operation target file and an access log that includes information on access to the operation target file. The file operation device can, by executing a process on itself, perform operations on the operation target file and on files it holds, and can provide, via the communication network, a process operation log that includes information on those operations. The history analysis apparatus has a log acquisition unit and an analysis unit. The log acquisition unit acquires, via the communication network, the access log from the file providing device and the process operation log from the file operation device. The analysis unit identifies, from the access log and the process operation log, an operation process, that is, the process that operated on the operation target file provided by the file providing device. Depending on the type of operation executed on the operation target file, the analysis unit extracts from the process operation log those logs that satisfy a specific condition among the logs concerning derivation source operations, which are operations that may have been executed on the original file of the operation target file, and identifies the file recorded in the extracted logs as the original file of the operation target file.

The same object is also achieved by a computer program that causes a computer to implement the history analysis apparatus and the history analysis method configured as described above, and by a computer-readable recording medium storing that computer program.

According to the present invention, derivation relationships among data placed on different devices can be identified based on logs concerning those data.

FIG. 1 is a block diagram illustrating the functional configuration of a system including a history analysis apparatus according to the first embodiment of the present disclosure.
FIG. 2 is an explanatory diagram showing a specific example of a server access log.
FIG. 3 is an explanatory diagram showing a specific example of a file access log.
FIG. 4 is an explanatory diagram showing a specific example of a disk operation log.
FIG. 5 is an explanatory diagram showing a specific example of a network operation log.
FIG. 6A is an explanatory diagram showing a specific example of process state information.
FIG. 6B is an explanatory diagram showing another specific example of process state information.
FIG. 7A is an explanatory diagram showing a specific example of network state information.
FIG. 7B is an explanatory diagram showing another specific example of network state information.
FIG. 8 is an explanatory diagram showing a specific example of operation history information.
FIG. 9 is an explanatory diagram showing a specific example of application information.
FIG. 10 is a flowchart illustrating a processing procedure for extracting information about the application that operated on a file.
FIG. 11 is a flowchart illustrating a processing procedure for identifying the operation history of an application.
FIG. 12 is a flowchart illustrating a processing procedure for identifying the operation history (edit, new creation) of an application and the original file.
FIG. 13 is a flowchart illustrating a processing procedure for identifying the operation history (copy, move) of an application and the original file.
FIG. 14 is a flowchart illustrating a processing procedure for identifying the operation history (download) of an application and the original file.
FIG. 15 is a block diagram illustrating the functional configuration of a system including a history analysis apparatus according to a second modification of the first embodiment of the present disclosure.
FIG. 16 is a flowchart illustrating another processing procedure for extracting information about the application that operated on a file.
FIG. 17 is a block diagram illustrating the functional configuration of a system including a history analysis apparatus according to a third modification of the first embodiment of the present disclosure.
FIG. 18 is an explanatory diagram showing a specific example of a Read operation log.
FIG. 19 is an explanatory diagram showing a specific example of a Write operation log.
FIG. 20 is an explanatory diagram showing a specific example of a file list.
FIG. 21A is a flowchart (part 1) illustrating processing for identifying the derivation relationship between files using similarity.
FIG. 21B is a flowchart (part 2) illustrating processing for identifying the derivation relationship between files using similarity.
FIG. 22A is a block diagram illustrating the functional configuration of a history analysis apparatus according to the second embodiment of the present disclosure.
FIG. 22B is a block diagram illustrating another functional configuration of the history analysis apparatus according to the second embodiment of the present disclosure.
FIG. 23 is a diagram illustrating the configuration of a hardware device capable of implementing the components of the history analysis apparatus in each embodiment of the present disclosure.

Before describing the embodiments of the present invention, the technical considerations behind the present invention are described in more detail.

In recent years, companies often place various data such as documents (for example, files) on a data providing server (for example, a file server) for purposes such as information sharing. For convenience of explanation, the following assumes that such data is provided as "files". The present disclosure is not limited to this, and the data may be provided in any appropriate form other than files.

A derivation destination file may be generated from original data (an original file), for example, when a user edits and updates a file and then saves it under another name. When tracing the original data from a derivation destination file placed on the file server, the trace can become difficult if an operation that is not completed within the file server has been executed.

When files are shared on a file server, a user may, for example, avoid editing a file on the file server directly and instead transfer it temporarily to the user's client device (for example, a computer), edit it there, and place it back on the file server as a separate file. Assume, then, that operations such as editing and storing a file span both the file server and the client device. In this case, comparing the access log information on the file server side with that on the client device side can confirm some of the operations executed on those files (for example, copying). However, as described above, operations that leave no information in the access logs (for example, file editing or downloading from a network) can be difficult to track.

In addition, depending on the application that operates on a file, it may be difficult to obtain an operation log for that application. It may also be impractical to modify an application that operates on files so that it provides a detailed operation log, or to replace the application with another one.

In view of the above, the technology according to the present disclosure estimates the derivation relationship between a derivation source file and a derivation destination file in situations where the two may be placed and operated on different devices (for example, a file server and a client terminal), using logs and other information obtainable from each device. For example, the derivation source file of a given derivation destination file can be estimated by tracking the history of file operations executed on that derivation destination file in the past. Conversely, the derivation destination files of a given derivation source file can be estimated by tracking the history of file operations executed on that derivation source file.

The technology according to the present disclosure, described below through the embodiments, focuses on the fact that file operations can be classified into copy (move), delete, new creation, modification (updating the same file or saving under another name), and so on. The technology estimates the file operations performed by an application using, for example, information acquired from each client device that uses the file server: information about applications, file access logs, input/output information of storage devices or networks, and information about process execution states. By using the file server's access log, the client terminal's access log, and the estimated file operations, the technology can create a change history for a file. For example, by repeating this operation recursively, it can identify the original file (derivation source file) of a given derivation destination file and trace the operation history of that file. Likewise, by following the update information for a file in time series, it can identify derivation destination files generated by operations on that file. Even when a direct relationship between the derivation source file and the derivation destination file cannot be obtained from, for example, an application's operation log, the technology can estimate the derivation relationship between the files. This allows the change history of files to be represented hierarchically.
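The tracing in both directions can be pictured with a small sketch. It assumes, purely for illustration, that the per-file estimation has already produced a mapping from each file to its estimated original file (`parent_of`); the function names, the mapping, and the example paths are hypothetical and are not taken from the disclosure. The backward trace is written as a loop rather than literal recursion.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def trace_origins(target: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow estimated "derived from" links from target back to the oldest known original."""
    chain = [target]
    seen = {target}
    current = target
    while True:
        parent = parent_of.get(current)
        # Stop at a file with no known original, or if the links form a cycle.
        if parent is None or parent in seen:
            break
        chain.append(parent)
        seen.add(parent)
        current = parent
    return chain


def find_derivatives(source: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Collect every file that was derived, directly or indirectly, from source."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)
    found: List[str] = []
    seen = {source}
    stack = [source]
    while stack:
        node = stack.pop()
        for child in sorted(children[node]):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


# Hypothetical estimated links: plan_v3 <- plan_v2 <- plan_v1, and memo <- plan_v1.
links = {"plan_v3.doc": "plan_v2.doc", "plan_v2.doc": "plan_v1.doc",
         "plan_v1.doc": None, "memo.doc": "plan_v1.doc"}
origins = trace_origins("plan_v3.doc", links)
derived = find_derivatives("plan_v1.doc", links)
```

Representing the estimated links as a child-to-parent mapping keeps the backward trace trivial, and the forward trace only needs the inverted mapping, which corresponds to the hierarchical view of the change history.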

A history analysis apparatus and the like capable of implementing the technology according to the present disclosure are described below through the embodiments. The history analysis apparatus described below may be configured as a single device (physical or virtual) or may be implemented using multiple separate devices (physical or virtual). When the history analysis apparatus consists of multiple devices, the devices may be communicably connected by a communication network (communication line) that is wired, wireless, or an appropriate combination of the two. The communication network may be physical or virtual. Hardware configurations capable of implementing the history analysis apparatus described below, or its components, are described later.

<First Embodiment>
The configuration of the first embodiment of the present disclosure is described in detail with reference to the drawings.

[Configuration]
FIG. 1 is an explanatory diagram showing the configuration of a system that includes the history analysis apparatus 300 according to this embodiment.

The system illustrated in FIG. 1 consists of a file server 100, a client 200, a history analysis apparatus 300, and a network 400.

The file server 100 is a data providing device that can provide various data (files) via the network 400. The file server 100 has at least a shared storage device 110 and a server access log 120. The file server 100 may be implemented by an information processing device such as a computer. It may be implemented as a physical device or as a virtual machine on a well-known virtualization platform.

The file server 100 can provide files placed in the shared storage device 110 to one or more clients 200. The file server 100 may implement file sharing using any appropriate technique, including well-known ones. As one example, it may implement file sharing using NFS (Network File System) or SMB (Server Message Block). The client 200 described later (specifically, an application executed on the client 200) can use such a technique to access files provided by the file server 100.

The shared storage device 110 is a storage device that holds files and can process file operations on them, such as reads, writes, and deletions, from one or more clients 200. The shared storage device 110 may be implemented using a physical storage device such as a hard disk or a semiconductor storage device. It may also be implemented as virtual storage on a well-known virtualization platform.

The server access log 120 is a record (log) of operations (file operations) on files held in the shared storage device 110. File operations executed by the client 200 may be recorded in the server access log 120. As illustrated in FIG. 2, for example, the server access log 120 may record, in chronological order, multiple entries each consisting of the elements "access time (120a), access source (120b), file name (120c), file operation (120d), size (120e)" for a file. The server access log 120 may hold records of file operations in the format illustrated in FIG. 2, but is not limited to that format.

The access time 120a represents the time at which an operation was executed on a file held in the shared storage device 110. This may be the time at which the file was accessed by the operation.

The access source 120b represents information that can identify the client 200 that accessed the file. The access source 120b may be, for example, information representing a name that identifies the client 200 (such as a computer name) or information representing a network address.

The file name 120c represents information that can identify the file that was operated on. The file name 120c may be information representing a name that identifies the file in which the data is stored, for example, the path (shared path) of a file provided by the file server.

The file name 120c recorded in the server access log 120 and the file name 230b recorded in the file access log 230 (described later) are written, for example, as the same path name (such as the path name of a shared path on the file server), so that their identity can be confirmed. For example, when the client 200 accesses a specific file on the file server 100, the same path name representing that file may be recorded in both the server access log 120 and the file access log 230. If the file server 100 and the client 200 use Windows (registered trademark) as their OS, for example, the path name may be expressed as a full path that includes the computer name, the share name, and so on.
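Because both logs record the same path name for the same file, entries can be paired by that path. The following is a minimal sketch of such a pairing; the tuple layout, the function name `join_by_path`, and the sample paths are assumptions made for illustration and do not reproduce the log formats of FIG. 2 or FIG. 3.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical simplified entry: (access time, path name, file operation).
Entry = Tuple[str, str, str]


def join_by_path(server_log: List[Entry], client_log: List[Entry]) -> List[Tuple[Entry, Entry]]:
    """Pair each server-side entry with the client-side entries that record the same path name."""
    by_path = defaultdict(list)
    for entry in client_log:
        by_path[entry[1]].append(entry)
    pairs = []
    for server_entry in server_log:
        for client_entry in by_path.get(server_entry[1], []):
            pairs.append((server_entry, client_entry))
    return pairs


server_log = [("2016-03-18 10:15:00", r"\\fs01\share\plan.doc", "Write")]
client_log = [
    ("2016-03-18 10:15:00", r"\\fs01\share\plan.doc", "Write"),
    ("2016-03-18 10:14:30", r"C:\work\draft.doc", "Read"),
]
pairs = join_by_path(server_log, client_log)
```

This sketch matches on the path name only. A practical matcher would probably also compare the operation type and the access times, which is left out here because the criteria are not stated at this point in the text.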

The file operation 120d represents the operation executed on the file. The file operation 120d may be, for example, one of "Read, Write, Delete, Rename". Operations other than these may also be recorded in the server access log 120.

The size 120e represents the size of the file on which some operation was executed.
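The five elements 120a to 120e can be modeled as a simple record type. The sketch below is illustrative only: the comma-separated text layout, the timestamp format, the class name `ServerAccessRecord`, and the function `parse_record` are assumptions, since the actual layout of FIG. 2 is not reproduced in the text.

```python
from dataclasses import dataclass
from datetime import datetime

# Operations named in the text; the log may also contain others.
KNOWN_OPERATIONS = {"Read", "Write", "Delete", "Rename"}


@dataclass(frozen=True)
class ServerAccessRecord:
    access_time: datetime   # 120a: when the operation was executed
    access_source: str      # 120b: computer name or network address of the client
    file_name: str          # 120c: path (shared path) of the file operated on
    file_operation: str     # 120d: e.g. Read, Write, Delete, Rename
    size: int               # 120e: size of the file operated on


def parse_record(line: str) -> ServerAccessRecord:
    """Parse one line of an assumed comma-separated log (paths containing commas are not handled)."""
    time_s, source, name, operation, size_s = [field.strip() for field in line.split(",")]
    return ServerAccessRecord(datetime.fromisoformat(time_s), source, name, operation, int(size_s))


record = parse_record(r"2016-03-18 10:15:00, client-01, \\fs01\share\plan.doc, Write, 2048")
is_known = record.file_operation in KNOWN_OPERATIONS
```

Keeping the operation as a plain string rather than a closed enumeration reflects the statement that operations other than the four listed may be recorded.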

The client 200 is a device that can execute applications that operate on files. The client 200 includes a storage device 210 and a CPU (Central Processing Unit) 220. The client 200 may be, for example, a computer such as a PC (Personal Computer) or another information processing device with a communication function. The client 200 may be implemented as a physical device or as a virtual machine on a well-known virtualization platform. The client 200 is communicably connected to the file server 100 and the history analysis apparatus 300 via a communication network. Multiple clients 200 may be connected to the file server 100.

クライアント２００は、ファイルアクセスログ２３０、ディスク操作ログ２４０、ネットワーク操作ログ２５０、プロセス状態情報２６０、ネットワーク状態情報２７０を有する。なお、これらのログ等のデータは、記憶装置２１０に保持されてもよい。 The client 200 includes a file access log 230, a disk operation log 240, a network operation log 250, process status information 260, and network status information 270. Note that data such as logs may be stored in the storage device 210.

The client 200 (specifically, an application executed on the client 200) can access files provided by the file server 100. The client 200 can also access files placed in the storage device 210 (described later) that the client itself holds.

The storage device 210 can process file operations such as storing, reading, writing, and deleting files. Specifically, the storage device 210 can process file operations issued by an application executed by the CPU 220. The storage device 210 may be realized using a physical storage device such as a hard disk or a semiconductor storage device. When the client 200 is realized as a virtual machine, the storage device 210 may be realized as virtual storage.

ＣＰＵ２２０は、クライアント２００におけるアプリケーション実行処理を行う処理デバイスである。ＣＰＵ２２０は、汎用のマイクロプロセッサ等により実現されてもよい。あるアプリケーションがクライアント２００において実行された場合、ＣＰＵ２２０は、当該アプリケーションに関するプロセスを実行する。 The CPU 220 is a processing device that performs application execution processing in the client 200. The CPU 220 may be realized by a general-purpose microprocessor or the like. When an application is executed in the client 200, the CPU 220 executes a process related to the application.

The file access log 230 holds, for example, records (data) of file operations executed by the client 200 on the shared storage device 110 of the file server. In the file access log 230, a plurality of entries, each consisting of the elements "access time (230a), file name (230b), file operation (230c), size (230d)", may be recorded in chronological order, for example in the format illustrated in FIG. 3.

アクセス時間２３０ａは、上記アクセス時間１２０ａと同様、共有記憶装置１１０に保持されたファイルに対する操作が実行された時間を表す。ファイルに対する操作が実行された時間は、当該操作によりファイルがアクセスされた時間であってもよい。 Similar to the access time 120a, the access time 230a represents the time when an operation is performed on a file held in the shared storage device 110. The time when the operation on the file is executed may be the time when the file is accessed by the operation.

ファイル名２３０ｂは、上記ファイル名１２０ｃと同様、ファイルを特定可能な情報を保持してよい。ファイル名２３０ｂは、データが保存されたファイルを特定可能な名称を表す情報であってもよい。 Similarly to the file name 120c, the file name 230b may hold information that can identify the file. The file name 230b may be information representing a name that can specify a file in which data is stored.

ファイル操作２３０ｃは、ファイルに対して実行された操作を表す。ファイル操作２３０ｃには、上記ファイル操作１２０ｄと同様、「Ｒｅａｄ（読み込み）、Ｗｒｉｔｅ（書き込み）、Ｄｅｌｅｔｅ（削除）、Ｒｅｎａｍｅ（名称変更）」のいずれかが記録されてよい。ファイルアクセスログ２３０のファイル操作には、上記以外の操作が記録されてもよい。 The file operation 230c represents an operation performed on the file. Similarly to the file operation 120d, any one of “Read (read), Write (write), Delete (delete), and Rename (rename)” may be recorded in the file operation 230c. In the file operation of the file access log 230, operations other than those described above may be recorded.

サイズ２３０ｄは、操作されたファイルのサイズを表す。 The size 230d represents the size of the operated file.

The disk operation log 240 holds data recording the disk access operations that the client 200 performed on the storage device 210. More specifically, the disk operation log 240 may record, for example, file access operations that an application executed on the storage device 210. In the disk operation log 240, a plurality of entries, each consisting of the elements "access time 240a, process ID 240b, file name 240c, file operation 240d, size 240e", may be recorded in chronological order, for example in the format illustrated in FIG. 4.

アクセス時間２４０ａは、ディスク操作が発生した時間を表す。ディスク操作が発生した時間は、当該ディスク操作によりファイルがアクセスされた時間であってもよい。 The access time 240a represents the time when the disk operation has occurred. The time when the disk operation occurs may be the time when the file is accessed by the disk operation.

プロセスＩＤ（Ｉｄｅｎｔｉｆｉｅｒ）２４０ｂは、ディスクアクセスを実行したプロセスを識別可能な情報（識別子）を表す。 The process ID (Identifier) 240b represents information (identifier) that can identify the process that executed the disk access.

ファイル名２４０ｃは、上記ファイル名１２０ｃ、ファイル名２３０ｂ等と同様、ファイルを特定可能な情報を保持してよい。即ち、ファイル名２４０ｃは、データが保存されたファイルを特定可能なファイル名（パス名）等を表す情報であってもよい。 The file name 240c may hold information that can specify a file, like the file name 120c and the file name 230b. That is, the file name 240c may be information indicating a file name (path name) that can specify a file in which data is stored.

ファイル操作２４０ｄには、上記ファイル操作１２０ｄ、２３０ｃと同様の情報が記録されてもよい。 Information similar to the file operations 120d and 230c may be recorded in the file operation 240d.

サイズ２４０ｅには、上記サイズ１２０ｅ、サイズ２３０ｄと同様、ファイルのサイズが記録されてよい。 In the size 240e, the file size may be recorded in the same manner as the size 120e and the size 230d.

The network operation log 250 holds data recording the network operations performed when the client 200 transmits data to or receives data from the other network 410 via the network 400. In the network operation log 250, a plurality of entries, each consisting of the elements "access time 250a, access source port number 250b, transmission/reception direction 250c, access destination host 250d, access destination port number 250e, size 250f", may be recorded in chronological order, for example in the format illustrated in FIG. 5.

アクセス時間２５０ａは、ネットワーク操作が実行された時間を表す。 The access time 250a represents the time when the network operation is executed.

アクセス元ポート番号２５０ｂは、ネットワーク操作を行った際のクライアント２００におけるポート番号を表す。 The access source port number 250b represents a port number in the client 200 when a network operation is performed.

送受信方向２５０ｃは、ネットワーク操作の種類を表す。具体的には、送受信方向２５０ｃは、ネットワーク操作が送信（Ｕｐｌｏａｄ）であるか、受信（Ｄｏｗｎｌｏａｄ）であるかを表してもよい。 The transmission / reception direction 250c represents the type of network operation. Specifically, the transmission / reception direction 250c may represent whether the network operation is transmission (Upload) or reception (Download).

The access destination host 250d represents information that can identify the access destination of a network operation executed on the client 200. The access destination host 250d may be, for example, a name that can identify the access destination host (a host name or the like) or the network address of the access destination host.

アクセス先ポート番号２５０ｅは、クライアント２００において実行されたネットワーク操作により接続される、アクセス先ホストのポート番号を表す。 The access destination port number 250e represents the port number of the access destination host connected by the network operation executed in the client 200.

The size 250f represents the size of the data transmitted or received by a network operation executed on the client 200.

The process state information 260 holds data (process states) in which the states of one or more processes handled by the CPU 220 are recorded in time series. For example, the process states may be recorded at each prescribed interval (for example, at fixed intervals). In the process state information 260, the elements "process ID (260a), process name (260b), execution user (260c)" are recorded, for example in the format illustrated in FIG. 6A. The process state information 260 may also record time information 260d representing the time at which a given process state was recorded, for example in the format illustrated in FIG. 6B.

プロセスＩＤ２６０ａは、クライアント２００において実行されたプロセスを識別可能な情報（識別子）を表す。 The process ID 260a represents information (identifier) that can identify a process executed in the client 200.

プロセス名２６０ｂは、プロセスＩＤ２６０ａにより識別されるプロセスの名称を表す。プロセス名２６０ｂは、例えば、クライアント２００で実行されたアプリケーションの名称を表してもよい。 The process name 260b represents the name of the process identified by the process ID 260a. The process name 260b may represent, for example, the name of an application executed on the client 200.

The execution user 260c represents the user who executed the process on the client 200.

The network status information 270 holds data in which the usage status of the network port numbers in use on the client 200 is recorded in time series. For example, the network port numbers in use may be recorded at each prescribed interval (for example, at fixed intervals). In the network status information 270, the elements port number (270a) and process ID (270b) are recorded, for example in the format illustrated in FIG. 7A. The network status information 270 may also record time information (270c) representing the time at which the usage status of a given port number was recorded, for example in the format illustrated in FIG. 7B.

ポート番号２７０ａは、クライアント２００において使用されているネットワークのポート番号を表す。 The port number 270a represents a network port number used in the client 200.

プロセスＩＤ２７０ｂは、ポート番号２７０ａのポートを使用しているプロセスのプロセスＩＤを表す。 The process ID 270b represents the process ID of the process using the port with the port number 270a.

履歴解析装置３００は、ログ取得部３１０と、解析部３４０とを備える。履歴解析装置３００は、操作履歴情報３２０と、アプリケーション情報３３０とを有してもよい。履歴解析装置３００は、コンピュータ等の情報処理装置により実現され得る。履歴解析装置３００は、例えば、物理的な装置により実現されてもよく、周知の仮想化基盤を用いた仮想マシンにより実現されてもよい。係る履歴解析装置３００は、ネットワーク４００を介してファイルサーバ１００、クライアント２００と通信可能に接続される。 The history analysis apparatus 300 includes a log acquisition unit 310 and an analysis unit 340. The history analysis device 300 may include operation history information 320 and application information 330. The history analysis apparatus 300 can be realized by an information processing apparatus such as a computer. The history analysis device 300 may be realized by a physical device, for example, or may be realized by a virtual machine using a well-known virtualization platform. The history analysis apparatus 300 is connected to the file server 100 and the client 200 via the network 400 so that they can communicate with each other.

ログ取得部３１０は、ファイルサーバ１００及びクライアント２００から、ネットワーク４００を介して各種ログ情報を取得し、操作履歴情報３２０を生成する機能を有する。係る各種ログ情報には、例えば、サーバアクセスログ１２０、ファイルアクセスログ２３０、ディスク操作ログ２４０、ネットワーク操作ログ２５０、プロセス状態情報２６０、ネットワーク状態情報２７０等が含まれてよい。 The log acquisition unit 310 has a function of acquiring various log information from the file server 100 and the client 200 via the network 400 and generating operation history information 320. Such various log information may include, for example, a server access log 120, a file access log 230, a disk operation log 240, a network operation log 250, process state information 260, network state information 270, and the like.

The operation history information 320 holds, as data, the history of operations executed on a given file on the file server 100 or the client 200. Specifically, the operation history information 320 records the result of the analysis unit 340 analyzing the operation history of a given file. In the operation history information 320, the elements "operation time (320a), file name (320b), user operation (320c), operation source host (320d), operation source file (320e), operation user (320f)" may be recorded, for example in the format illustrated in FIG. 8.

操作時間３２０ａは、あるファイルに対する操作が実行された時間を表す。係る操作時間３２０ａは、当該ファイルに対してアクセスされた時間であってもよい。 The operation time 320a represents a time when an operation on a certain file is executed. The operation time 320a may be a time when the file is accessed.

ファイル名３２０ｂは、操作が実行されたファイルを特定可能な情報（例えば、ファイルの名、パス名等）を表す。 The file name 320b represents information (for example, a file name, a path name, etc.) that can identify the file on which the operation has been executed.

The user operation 320c represents the operation executed on a file. For example, any one of "copy, new creation, edit, derivation, deletion, move, rename" may be recorded in the user operation 320c. Operations other than these may also be recorded in the user operation 320c.

The operation source host 320d represents information that can identify the client 200 on which the file operation was executed.

操作元ファイル３２０ｅは、あるファイルの原本ファイル（派生元ファイル）を特定可能な情報（例えば、ファイル名、パス名等）を表す。 The operation source file 320e represents information (for example, a file name, a path name, etc.) that can specify an original file (derivation source file) of a certain file.

操作ユーザ３２０ｆは、例えば、ファイルを操作したユーザを特定可能な情報（例えば、ユーザ名等）を表す。 The operation user 320f represents, for example, information (for example, a user name) that can specify a user who has operated the file.

The application information 330 holds, as data, information about applications (for example, the file extensions an application can use and the operations it can perform). In the application information 330, information representing "application name (330a), extension (330b), possible operation (330c), port number (330d)" for each application may be recorded, for example as illustrated in FIG. 9. The application information 330 may be registered in advance by a user or the like.

アプリケーション名３３０ａは、アプリケーションの名称を表す。 The application name 330a represents the name of the application.

拡張子３３０ｂは、例えば、アプリケーションが扱えるファイルの拡張子（アプリケーションに関連付けされたファイルの拡張子）を表す。 The extension 330b represents, for example, an extension of a file that can be handled by the application (an extension of a file associated with the application).

The possible operation 330c represents the types of file operation that the application can execute. For example, any of "Edit", "Copy" (duplication), "Download", and "Upload" may be recorded in the possible operation 330c. Operations other than these may also be recorded in the possible operation 330c.

ポート番号３３０ｄは、アプリケーションがネットワーク通信の際に使用するポート番号を表す。 The port number 330d represents a port number used by the application during network communication.

From the various log information acquired by the log acquisition unit 310, the operation history information 320, and the application information 330, the analysis unit 340 estimates the file from which a given file placed on the file server (sometimes called the "target file") was derived (sometimes called the "original file"), together with the operation history of the target file. The specific processing of the analysis unit 340 is described later.

ネットワーク４００は、ファイルサーバ１００と、１以上のクライアント２００と、履歴解析装置３００とを通信可能に接続し、これらの間においてファイルを転送可能とする通信ネットワークである。ネットワーク４００は、他のネットワーク４１０に接続可能であり、他のネットワーク４１０との間で、例えば、Ｗｅｂ（ＷｏｒｌｄＷｉｄｅＷｅｂ）や電子メールなど用いて、情報（データ）を送受信することができる。係るネットワーク４００は、有線通信、無線通信、あるいはそれらの適切な組合せにより実現されてもよい。係るネットワーク４００は、周知の仮想化基盤を用いた仮想ネットワークとして実現されてもよい。 The network 400 is a communication network that connects the file server 100, one or more clients 200, and the history analysis apparatus 300 so that they can communicate with each other, and that can transfer files between them. The network 400 can be connected to another network 410, and can transmit / receive information (data) to / from the other network 410 using, for example, the Web (World Wide Web) or electronic mail. Such a network 400 may be realized by wired communication, wireless communication, or an appropriate combination thereof. The network 400 may be realized as a virtual network using a well-known virtualization platform.

ネットワーク４００は、他のネットワーク４１０（通信ネットワーク）と通信可能に接続されていてもよい。他のネットワーク４１０は、有線通信、無線通信、あるいはそれらの適切な組合せにより実現されてもよく、周知の仮想化基盤を用いた仮想ネットワークとして実現されてもよい。 The network 400 may be communicably connected to another network 410 (communication network). The other network 410 may be realized by wired communication, wireless communication, or an appropriate combination thereof, or may be realized as a virtual network using a well-known virtualization infrastructure.

It is assumed that common time information is recorded in the server access log 120 and the file access log 230 described above. For this reason, the time information of the file server 100 and the time information of each client 200 may be synchronized. When strict synchronization is difficult, it is assumed that the correspondence of access times and file names between the server access log 120 and the file access log 230 can still be established, for example by comparing access times and sizes. For example, the history analysis apparatus 300 (in particular, the analysis unit 340) may determine whether a specific log recorded in the server access log 120 and a specific log recorded in the file access log 230 relate to operations executed on the same file as follows: when at least one of the difference between the access times and the difference between the sizes recorded in the two logs is smaller than a prescribed value, the history analysis apparatus 300 may determine that the two logs relate to operations executed on the same file.
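The comparison described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the record type `AccessRecord`, the function name `same_file_operation`, and the default thresholds (2 seconds, 1 byte) are all assumptions introduced here, since the description only refers to "a prescribed value".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessRecord:
    """Hypothetical record shared by server access log 120 and file access log 230."""
    access_time: datetime
    file_name: str
    operation: str
    size: int


def same_file_operation(server_rec, client_rec,
                        max_time_diff=timedelta(seconds=2),
                        max_size_diff=1):
    """Return True when at least one of the two differences is below its threshold.

    The threshold defaults are illustrative assumptions.
    """
    time_close = abs(server_rec.access_time - client_rec.access_time) < max_time_diff
    size_close = abs(server_rec.size - client_rec.size) < max_size_diff
    return time_close or size_close
```

With these defaults, two records written one second apart are treated as the same operation even if their sizes differ, which mirrors the "at least one of" wording; a stricter deployment might require both conditions.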

[Operation]
The operation of the history analysis apparatus 300 in this embodiment is described below with reference to the drawings. Specifically, the following describes the processing that estimates the original file 541 of the target file 501 held in the shared storage device 110 of the file server 100, together with the operation history of the target file 501. The target file 501 is the file subject to history analysis. The original file 541 is the original (derivation source) of the target file 501.

In general, a derived file is considered to be created when data read from an original file (derivation source) is edited, copied, saved under another name, and so on. In other words, a read (Read) operation on the original file 541 is considered to have been executed before a write (Write) operation on the target file 501 is executed. The history analysis apparatus 300 (analysis unit 340) identifies a Read operation that satisfies a specific condition and was executed before the Write operation on the target file 501, and estimates that the file on which that Read operation was executed is the original file 541 of the target file 501. Hereinafter, an operation executed on a derivation source file may be referred to as a derivation source operation.

The history analysis apparatus 300 uses the log acquisition unit 310 to acquire the server access log 120 from the file server 100. The history analysis apparatus 300 also acquires the file access log 230, the disk operation log 240, the network operation log 250, the process state information 260, and the network status information 270 from one or more clients 200 (step S1001 in FIG. 10).

解析部３４０は、サーバアクセスログ１２０から、対象ファイル５０１の操作が記載されているログを抽出して時系列順に並べる。解析部３４０は並べたログからＷｒｉｔｅ操作に関するログを抽出することにより、対象ファイル５０１に対するＷｒｉｔｅ操作に該当する第１アクセスログリスト５０２を抽出する（ステップＳ１００２）。 The analysis unit 340 extracts from the server access log 120 a log that describes the operation of the target file 501 and arranges the logs in time series. The analysis unit 340 extracts the first access log list 502 corresponding to the write operation for the target file 501 by extracting the log related to the write operation from the arranged logs (step S1002).

解析部３４０は、第１アクセスログリスト５０２の中で最も古いアクセス時間の第１アクセスログ５０３を抽出する（ステップＳ１００３）。解析部３４０は、上記に限定されず、例えば、最も古いアクセス時間からある基準時間の範囲内のログを、第１アクセスログ５０３として抽出してもよい。 The analysis unit 340 extracts the first access log 503 having the oldest access time from the first access log list 502 (step S1003). The analysis unit 340 is not limited to the above, and may extract, for example, a log within a certain reference time range from the oldest access time as the first access log 503.
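Steps S1002 and S1003 can be sketched as a filter followed by a minimum over access time. The `ServerAccessRecord` type and the function `first_write_record` are hypothetical names for illustration; the field comments map to reference numerals 120a to 120e.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServerAccessRecord:
    access_time: datetime   # 120a
    access_source: str      # 120b
    file_name: str          # 120c
    operation: str          # 120d
    size: int               # 120e


def first_write_record(server_log: Iterable[ServerAccessRecord],
                       target_file: str) -> Optional[ServerAccessRecord]:
    """Return the oldest Write record for target_file, or None if there is none."""
    # S1002: keep only Write operations on the target file, in chronological order.
    writes = sorted(
        (r for r in server_log
         if r.file_name == target_file and r.operation == "Write"),
        key=lambda r: r.access_time,
    )
    # S1003: the first element is the record with the oldest access time.
    return writes[0] if writes else None
```

The access source of the returned record then identifies which client's file access log to fetch in step S1004.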

解析部３４０は、第１アクセスログ５０３に記録された「アクセス元（図２の１２０ｂ）」を参照する。解析部３４０は、当該アクセス元１２０ｂに該当するクライアント２００のファイルアクセスログ２３０を取得する（ステップＳ１００４）。 The analysis unit 340 refers to “access source (120b in FIG. 2)” recorded in the first access log 503. The analysis unit 340 acquires the file access log 230 of the client 200 corresponding to the access source 120b (step S1004).

解析部３４０は、取得したファイルアクセスログ２３０から、対象ファイル５０１と同一のファイル名を含むログのリストを選択して時系列順に並べる。これにより、解析部３４０は、ファイルアクセスログリスト５１１を作成する（ステップＳ１００５）。 The analysis unit 340 selects a log list including the same file name as the target file 501 from the acquired file access log 230 and arranges the list in chronological order. As a result, the analysis unit 340 creates the file access log list 511 (step S1005).

From the file access log list 511, the analysis unit 340 acquires the log whose access time 230a and file operation 230c match the access time 120a and file operation 120d recorded in the server access log 120, respectively (step S1006). Hereinafter, the acquired log is referred to as the write log 512.

When the file access log 230 and the disk operation log 240 record the same kind of data, the disk operation log 240 may be used instead of the file access log 230 in steps S1004 to S1005.

解析部３４０は、ディスク操作ログ２４０から、書き込みログ５１２とアクセス時刻が一致するログを選択する。解析部３４０は、選択したログの内、プロセスＩＤ２４０ｂに記録されたデータを取得する（ステップＳ１００７）。以下、プロセスＩＤ２４０ｂのデータを、プロセスＩＤ５２１（操作プロセスＩＤ）と記載する。係るプロセスＩＤ５２１に該当するプロセスが、対象ファイルに対するＷｒｉｔｅ操作を実行したアプリケーションのプロセスである。 The analysis unit 340 selects, from the disk operation log 240, a log whose access time coincides with the write log 512. The analysis unit 340 acquires the data recorded in the process ID 240b in the selected log (step S1007). Hereinafter, the data of the process ID 240b is described as a process ID 521 (operation process ID). The process corresponding to the process ID 521 is the process of the application that executed the write operation for the target file.
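Step S1007 amounts to a lookup in the disk operation log keyed by access time. The sketch below assumes exact timestamp equality and returns the first match; the type `DiskOpRecord` and the function `operating_process_id` are hypothetical names, and a real log with several entries at the same instant would likely need additional keys such as the file name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class DiskOpRecord:
    access_time: datetime   # 240a
    process_id: int         # 240b
    file_name: str          # 240c
    operation: str          # 240d
    size: int               # 240e


def operating_process_id(disk_log: Iterable[DiskOpRecord],
                         write_time: datetime) -> Optional[int]:
    """Return the process ID of the disk log entry whose time equals write_time."""
    for rec in disk_log:
        if rec.access_time == write_time:
            return rec.process_id
    return None  # no entry at that time
```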

解析部３４０は、クライアント２００から取得したプロセス状態情報２６０の中から、プロセスＩＤ５２１と同一のプロセスＩＤを含むデータを検索し、当該データに含まれるプロセス名２６０ｂを取得する（ステップＳ１００８）。以下、取得したプロセス名２６０ｂを、プロセス名５２２と記載する。 The analysis unit 340 searches the process state information 260 acquired from the client 200 for data including the same process ID as the process ID 521, and acquires the process name 260b included in the data (step S1008). Hereinafter, the acquired process name 260b is referred to as a process name 522.

From the application information 330, the analysis unit 340 extracts the data whose application name (330a in FIG. 9) matches the process name 522, and identifies the possible operations (330c in FIG. 9) of the application corresponding to the process name 522 (step S1009). Hereinafter, the identified possible operations 330c are referred to as the possible operations 523. When no data matching the process name 522 can be extracted, the analysis unit 340 treats the application as an unknown application. In this case, the analysis unit 340 estimates that the application can execute at least one of arbitrary file operations and arbitrary network operations.
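Steps S1008 and S1009, including the fallback for unknown applications, can be sketched with two dictionary lookups. Representing the process state information as a `pid -> name` mapping and the application information as a `name -> operations` mapping is an assumption made for brevity, as is treating "unknown" as "all four operations listed for 330c".

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Assumption: an unknown application is treated as capable of every listed operation.
ALL_OPERATIONS: FrozenSet[str] = frozenset({"Edit", "Copy", "Download", "Upload"})


def possible_operations(process_id: int,
                        process_names: Dict[int, str],
                        app_info: Dict[str, Iterable[str]]
                        ) -> Tuple[Optional[str], FrozenSet[str]]:
    """Return (process name 522, possible operations 523) for process_id."""
    # S1008: look up the process name recorded for this process ID.
    name = process_names.get(process_id)
    # S1009: look up the operations registered for that application name.
    ops = app_info.get(name) if name is not None else None
    if ops is None:
        # Not registered: fall back to "anything is possible".
        return name, ALL_OPERATIONS
    return name, frozenset(ops)
```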

以下、解析部３４０が、書き込みが行われた対象ファイル５０１の更新元又はコピー元になったファイル（派生元ファイル（原本ファイル））を検出する処理について説明する。解析部３４０は、例えば、上記説明したステップＳ１００９における処理の後、以下に説明する原本ファイルを推定する処理を実行してもよい。 Hereinafter, a process in which the analysis unit 340 detects a file (a derivation source file (original file)) that is an update source or a copy source of the target file 501 that has been written will be described. For example, the analysis unit 340 may execute a process for estimating an original file described below after the process in step S1009 described above.

Assume that the possible operations 523 for the process name 522 in the application information 330 include the element "Edit". In this case, a file may have been newly created, modified by editing, or derived by saving under another name on the client 200. The processing that the analysis unit 340 executes in this case is described below with reference to the flowcharts illustrated in FIGS. 11 and 12.

解析部３４０は、プロセス状態情報２６０から、書き込みログ５１２のアクセス時間よりも前（過去）に、プロセスＩＤ５２１を含むデータが記録された最も古い時間を検出する（ステップＳ１２０１）。換言すると、解析部３４０は、書き込みログ５１２のアクセス時間よりも過去に、プロセスＩＤ５２１を含むデータが初めてプロセス状態情報２６０に記録された時間を検出する。以下、プロセスＩＤ５２１を含むデータが記録された最も古い時間を「プロセス起動時間」と記載する場合がある。 The analysis unit 340 detects, from the process state information 260, the oldest time when data including the process ID 521 is recorded before (in the past) the access time of the write log 512 (step S1201). In other words, the analysis unit 340 detects the time when the data including the process ID 521 is recorded in the process state information 260 for the first time before the access time of the write log 512. Hereinafter, the oldest time in which data including the process ID 521 is recorded may be referred to as “process activation time”.

Specifically, the analysis unit 340 may, for example, go back through the records of the process state information 260 starting from the access time of the write log 512 and check the data in which a process state containing the process ID 521 is recorded. The analysis unit 340 extracts, for example, the data with the oldest recording time among the data containing the process ID 521, and may treat the time at which that data was recorded as the process activation time. Hereinafter, this process activation time may be referred to as the process activation time 531. The analysis unit 340 may estimate that the application corresponding to the process ID 521 was alive (running) at least from the process activation time 531 to the access time of the write log 512.
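The search for the process activation time (step S1201) can be sketched as taking the earliest snapshot that contains the process ID and was recorded before the write. The `ProcessSnapshot` type and the function `process_activation_time` are hypothetical names. The sketch does not guard against process ID reuse by an earlier, unrelated process, which a practical implementation would probably need to consider.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProcessSnapshot:
    recorded_at: datetime   # 260d
    process_id: int         # 260a
    process_name: str       # 260b
    user: str               # 260c


def process_activation_time(snapshots: Iterable[ProcessSnapshot],
                            process_id: int,
                            write_time: datetime) -> Optional[datetime]:
    """Return the oldest recording time of process_id before write_time."""
    times = [s.recorded_at for s in snapshots
             if s.process_id == process_id and s.recorded_at < write_time]
    return min(times) if times else None
```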

From the file access log 230, the analysis unit 340 checks the logs recorded between the process activation time 531 and the access time of the write log 512. The analysis unit 340 narrows these logs down to those related to the possible operations 523. In this way, the analysis unit 340 extracts the access log list 532 (second access log list), which contains the candidate logs recording operations by the application corresponding to the process ID 521 (step S1202). For example, when the possible operation 523 is "Edit", the analysis unit 340 may extract from the file access log 230 the logs whose file operation (230c in FIG. 3) is "Read", "Write", or "Rename".

解析部３４０は、抽出したアクセスログリスト５３２のうち、ディスク操作ログ２４０に含まれるプロセスＩＤが、プロセスＩＤ５２１と合致するログを抽出する。解析部３４０は、抽出したログを用いて、第３アクセスログリストを生成する（ステップＳ１２０３）。以下、第３アクセスログリストを、アクセスログリスト５３３と記載する場合がある。 The analysis unit 340 extracts a log in which the process ID included in the disk operation log 240 matches the process ID 521 from the extracted access log list 532. The analysis unit 340 generates a third access log list using the extracted log (step S1203). Hereinafter, the third access log list may be referred to as an access log list 533.
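Steps S1202 and S1203 can be sketched as a time-window filter followed by a join against the disk operation log. Because the file access log carries no process ID, the join key used here, `(access_time, file_name, operation)`, is an assumption, as are the type and function names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class FileAccessRecord:
    access_time: datetime   # 230a
    file_name: str          # 230b
    operation: str          # 230c
    size: int               # 230d


@dataclass(frozen=True)
class DiskOpRecord:
    access_time: datetime   # 240a
    process_id: int         # 240b
    file_name: str          # 240c
    operation: str          # 240d
    size: int               # 240e


def third_access_log_list(file_access_log: Iterable[FileAccessRecord],
                          disk_log: Iterable[DiskOpRecord],
                          process_id: int,
                          activation_time: datetime,
                          write_time: datetime,
                          allowed_ops: Sequence[str] = ("Read", "Write", "Rename"),
                          ) -> List[FileAccessRecord]:
    """Return file access records in the window that the given process performed."""
    # S1202: second access log list (time window plus operations implied by "Edit").
    second = [r for r in file_access_log
              if activation_time <= r.access_time <= write_time
              and r.operation in allowed_ops]
    # S1203: keep only entries whose matching disk log entry has the process ID.
    keys = {(d.access_time, d.file_name, d.operation)
            for d in disk_log if d.process_id == process_id}
    return [r for r in second
            if (r.access_time, r.file_name, r.operation) in keys]
```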

The analysis unit 340 checks whether the access log list 533 contains a log of a read (Read) operation (step S1204). For example, in accordance with the operation executed on the target file (a Write operation), the analysis unit 340 may identify the Read operation as the operation that could have been executed on the derivation source file (the derivation source operation), and then check whether the access log list 533 contains a log of a Read operation.

以下、アクセスログリスト５３３にＲｅａｄ操作ログが含まれる場合（ステップＳ１２０４においてＹＥＳ）について説明する。 Hereinafter, a case where the read operation log is included in the access log list 533 (YES in step S1204) will be described.

あるプロセスが単一の文書しか編集できない場合、古い読み込みアクセスによる更新は破棄されたものと考えられる。例えば、読み込みアクセス（Ｒｅａｄ）が複数回発生した場合、新しい読み込みアクセスにより、旧い読み込みアクセスの内容が破棄され得る。よって、解析部３４０は、アクセスログリスト５３３（第３アクセスログリスト）の中で最も新しい（アクセス時間が新しい）Ｒｅａｄ操作が含まれる候補ログを取得する（ステップＳ１２０５）。解析部３４０は、例えば、アクセスログリスト５３３の中で、書き込みログ５１２のアクセス時間に最も近い時間に実行されたＲｅａｄ操作に関するログを、候補ログとして取得してもよい。複数のＲｅａｄ操作が同じ時間に記録されている場合、解析部３４０は、最後に記録されたＲｅａｄ操作に関するログを、候補ログとして取得してもよい。以下、係るログを、候補ログ５３４と記載する。 If a process can only edit a single document, the old read access update is considered discarded. For example, when a read access (Read) occurs a plurality of times, the contents of the old read access can be discarded by a new read access. Therefore, the analysis unit 340 acquires a candidate log including the latest (newest access time) Read operation in the access log list 533 (third access log list) (step S1205). For example, the analysis unit 340 may acquire, as a candidate log, a log related to a Read operation executed at a time closest to the access time of the write log 512 in the access log list 533. When a plurality of Read operations are recorded at the same time, the analysis unit 340 may acquire a log regarding the Read operation recorded last as a candidate log. Hereinafter, this log is referred to as a candidate log 534.

この場合、解析部３４０は、対象ファイル５０１の更新元となった原本ファイル５４１は、候補ログ５３４のファイル名により特定されるファイルであると推定してよい（ステップＳ１２０６）。換言すると、解析部３４０は、候補ログ５３４のファイル名を、原本ファイル５４１のファイル名であると推定する。 In this case, the analysis unit 340 may estimate that the original file 541 that is the update source of the target file 501 is a file specified by the file name of the candidate log 534 (step S1206). In other words, the analysis unit 340 estimates that the file name of the candidate log 534 is the file name of the original file 541.

原本ファイル５４１のファイル名（即ち、候補ログ５３４に記録されたファイル名）と、対象ファイル５０１のファイル名とが合致する（同一である）場合は、解析部３４０は、当該ファイルに対して実行されたユーザ操作が「更新」であると推定する。以下、推定したユーザ操作を、ユーザ操作５３５と記載する場合がある。 If the file name of the original file 541 (that is, the file name recorded in the candidate log 534) matches (is identical to) the file name of the target file 501, the analysis unit 340 estimates that the user operation performed on the file is “update”. Hereinafter, the estimated user operation may be referred to as a user operation 535.

原本ファイル５４１のファイル名と、対象ファイル５０１のファイル名とが異なる場合、解析部３４０は、対象ファイル５０１は、原本ファイル５４１とは異なるファイル名で保存された派生先ファイルであると推定する。この場合、解析部３４０は、ユーザ操作が「派生」であると推定してよい。 When the file name of the original file 541 and the file name of the target file 501 are different, the analysis unit 340 estimates that the target file 501 is a derivation destination file stored with a file name different from that of the original file 541. In this case, the analysis unit 340 may estimate that the user operation is “derivation”.

以下、Ｒｅａｄ操作に関するログが、アクセスログリスト５３３（第３アクセスログリスト）に含まれない場合（ステップＳ１２０４においてＮＯ）について説明する。 Hereinafter, a case where the log related to the Read operation is not included in the access log list 533 (third access log list) (NO in step S1204) will be described.

この場合、解析部３４０は、対象ファイル５０１に関する原本ファイル５４１が検出されないと判定してよい（ステップＳ１２０７）。 In this case, the analysis unit 340 may determine that no original file 541 is detected for the target file 501 (step S1207).

解析部３４０は、他に可能操作５２３に他の要素（Ｅｄｉｔ以外の要素）が存在するか確認する（ステップＳ１２０８）。 The analysis unit 340 checks whether the possible operation 523 contains any other element (an element other than Edit) (step S1208).

ステップＳ１２０８においてＮＯ（他の可能操作がない）場合、解析部３４０は、対象ファイル５０１は新たに作成されたファイルであると推定する（ステップＳ１２０９）。これにより、解析部３４０は、ユーザ操作５３５が「新規作成」であると推定してよい。 If NO in step S1208 (there is no other possible operation), the analysis unit 340 estimates that the target file 501 is a newly created file (step S1209). Thereby, the analysis unit 340 may estimate that the user operation 535 is “new creation”.

ステップＳ１２０８においてＹＥＳ（他の可能操作がある）場合、解析部３４０は、他の操作について確認する（ステップＳ１２１０）。 If YES in step S1208 (there is another possible operation), the analysis unit 340 confirms the other operation (step S1210).
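The decision flow of steps S1204 to S1210 can be sketched as follows. This is a minimal illustration under stated assumptions: the record type `AccessLog`, the function `classify_edit`, and the flag `other_possible_ops` are hypothetical names, and the input is assumed to be the third access log list in recorded order.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AccessLog:
    time: float
    file_name: str
    op: str


def classify_edit(third_list: Sequence[AccessLog], target_name: str,
                  other_possible_ops: bool = False
                  ) -> Tuple[Optional[str], Optional[str]]:
    """Return (estimated user operation, estimated original file name)."""
    reads = [l for l in third_list if l.op == "Read"]
    if reads:
        # S1205: latest Read; for equal times the log recorded last wins.
        best = max(range(len(reads)), key=lambda i: (reads[i].time, i))
        candidate = reads[best]
        # S1206: same name means "update", a different name means "derivation".
        if candidate.file_name == target_name:
            return "update", candidate.file_name
        return "derivation", candidate.file_name
    # S1207: no original file was detected.
    if other_possible_ops:
        return None, None  # S1210: the caller checks the other operations
    return "new creation", None  # S1209
```

Returning `(None, None)` when other possible operations remain leaves the final decision to the caller, which matches the branch to step S1210.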

以下、アプリケーション情報３３０におけるプロセス名５２２を含むログの可能操作５２３に「コピー（Ｃｏｐｙ）」の要素が含まれる場合について、図１３のフローチャートを参照して説明する。図１２のフローチャートと同様の処理については同様の参照符号を付すことで重複する説明を省略する。 Hereinafter, a case where the possible operation 523 of the entry in the application information 330 that includes the process name 522 contains a “Copy” element will be described with reference to the flowchart of FIG. 13. Processes that are the same as those in the flowchart of FIG. 12 are denoted by the same reference numerals, and redundant description is omitted.

あるファイルについてコピーの操作が実行された場合、アプリケーションは、当該ファイルの内容を編集せずに、当該ファイルを別の配置場所に複製、移動したと考えられる。具体的な一例として、Ｗｉｎｄｏｗｓ（登録商標）におけるエクスプローラを用いたファイル操作を想定すると、ファイルを別フォルダにコピーもしくは移動する操作が行われたと考えられる。 When a copy operation is performed on a file, the application is considered to have duplicated or moved the file to another location without editing its contents. As a specific example, assuming a file operation using Explorer in Windows (registered trademark), an operation of copying or moving a file to another folder is considered to have been performed.

あるファイルがコピーされる場合、コピー先のファイルに関する書き込み操作が発生する前に、コピー元のファイルに対する読み込み操作が、コピー先のファイルと同程度のサイズで実行されていると考えられる。 When a certain file is copied, it is considered that the read operation for the copy source file is executed with the same size as the copy destination file before the write operation for the copy destination file occurs.

そこで、解析部３４０は、コピー元のファイルを追跡すべく、例えば、上記説明した可能操作に「Ｅｄｉｔ（編集）」の要素が含まれている場合と同様の処理を実行してよい（ステップＳ１２０１乃至ステップＳ１２０５）。これにより、解析部３４０は、アクセスログリスト５３３（第３アクセスログリスト）を生成し、最も新しいＲｅａｄ情報が含まれる候補ログ５３４を取得する。 Therefore, in order to track the copy source file, the analysis unit 340 may execute, for example, the same processing as in the case described above where the possible operation includes an “Edit” element (steps S1201 to S1205). In this way, the analysis unit 340 generates the access log list 533 (third access log list) and acquires the candidate log 534 that contains the most recent Read information.

解析部３４０は、抽出した候補ログ５３４のうち、ログに記録されたサイズ（２３０ｄ）が、書き込みログ５１２のサイズに一致するデータを抽出する（ステップＳ１３０１）。即ち、解析部３４０は、ディスクに対する書き込み操作が発生する前に実行された読み込み操作に関するログのうち、書き込まれたサイズに合致するサイズの読み込み操作のログを抽出する。以下、抽出した読み込み操作に関するログを、複製元候補ログと記載する場合がある。 From the extracted candidate logs 534, the analysis unit 340 extracts the data whose recorded size (230d) matches the size of the write log 512 (step S1301). That is, from among the logs of read operations executed before the write operation to the disk occurred, the analysis unit 340 extracts the logs of read operations whose size matches the written size. Hereinafter, an extracted log of a read operation may be referred to as a replication source candidate log.

これにより、解析部３４０は、対象ファイル５０１の更新元となった原本ファイル５４１のファイル名が、複製元候補ログに記載されたファイル名であると推定できる。 Accordingly, the analysis unit 340 can estimate that the file name of the original file 541 that is the update source of the target file 501 is the file name described in the replication source candidate log.

解析部３４０は、ファイルアクセスログ２３０を参照し、複製元候補ログのアクセス時間より後に、当該複製元候補ログに記載されたファイル名のファイルに関するＤｅｌｅｔｅ操作のログが存在するか否かを判定する（ステップＳ１３０２）。具体的には、解析部３４０は、複製元候補ログのアクセス時間から、所定の削除基準時間以内に、当該複製元候補ログに記載されたファイル名のファイルに関するＤｅｌｅｔｅ操作のログが存在するか否かを判定してもよい。係る削除基準時間は、適宜定められてよい。係る削除基準時間は、例えば、ある基準に基づいて定められた、十分に短い一定の時間であってもよい。 The analysis unit 340 refers to the file access log 230 and determines whether there is a delete operation log related to the file with the file name described in the replication source candidate log after the access time of the replication source candidate log. (Step S1302). Specifically, the analysis unit 340 determines whether there is a delete operation log related to the file having the file name described in the replication source candidate log within a predetermined deletion reference time from the access time of the replication source candidate log. It may be determined. Such deletion reference time may be determined as appropriate. The deletion reference time may be a sufficiently short fixed time determined based on a certain reference, for example.

ステップＳ１３０２においてＹＥＳの場合、解析部３４０は、対象ファイル５０１に関する原本ファイル５４１が削除されたと推定し、ユーザ操作５３５について「移動」操作であると判定する（ステップＳ１３０３）。また、ステップＳ１３０２においてＮＯの場合、解析部３４０は、対象ファイル５０１は、原本ファイル５４１からコピーされたと推定し、ユーザ操作５３５について「コピー」操作であると判定する（ステップＳ１３０４）。 If YES in step S1302, the analysis unit 340 estimates that the original file 541 related to the target file 501 has been deleted, and determines that the user operation 535 is a “move” operation (step S1303). If NO in step S1302, the analysis unit 340 estimates that the target file 501 has been copied from the original file 541 and determines that the user operation 535 is a “copy” operation (step S1304).
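The copy/move decision of steps S1301 to S1304 can be sketched as follows. This is a minimal illustration under stated assumptions: `AccessLog`, `classify_copy`, and `delete_window` (standing in for the deletion reference time) are hypothetical names, and all logs are assumed to be available in one list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AccessLog:
    time: float
    file_name: str
    op: str      # "Read", "Write", "Delete", ...
    size: int    # transferred size in bytes (unit is an assumption)


def classify_copy(logs: Sequence[AccessLog], write_time: float,
                  write_size: int, delete_window: float
                  ) -> Tuple[Optional[str], Optional[str]]:
    """Return (estimated user operation, estimated source file name)."""
    # S1301: Read logs before the write whose size equals the written size.
    matches = [l for l in logs
               if l.op == "Read" and l.time <= write_time
               and l.size == write_size]
    if not matches:
        return None, None
    source = max(matches, key=lambda l: l.time)
    # S1302: was the source deleted shortly after it was read?
    deleted = any(l.op == "Delete" and l.file_name == source.file_name
                  and source.time < l.time <= source.time + delete_window
                  for l in logs)
    # S1303 / S1304
    return ("move" if deleted else "copy"), source.file_name
```

The value chosen for `delete_window` decides how a late deletion of the source is interpreted, so in practice it would need to be tuned to the environment.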

以上、解析部３４０が、ユーザ操作５３５について、「コピー」、「更新」、「新規作成」、「派生」と判定する方法について説明した。解析部３４０は、上記以外の「リネーム（Ｒｅｎａｍｅ）」、「削除（Ｄｅｌｅｔｅ）」の操作については、ディスク操作ログ２４０を参照して、直接的にこれらの操作を特定可能である。この場合、解析部３４０は、ディスク操作ログ２４０の内容から、ユーザ操作５３５を特定可能である。解析部３４０は、ユーザ操作が「リネーム」である場合の原本ファイル５４１を、ディスク操作ログ２４０の内容から特定可能である。 As described above, the method in which the analysis unit 340 determines that the user operation 535 is “copy”, “update”, “new creation”, and “derivation” has been described. The analysis unit 340 can directly identify these operations for “rename” and “delete” operations other than those described above with reference to the disk operation log 240. In this case, the analysis unit 340 can specify the user operation 535 from the contents of the disk operation log 240. The analysis unit 340 can identify the original file 541 when the user operation is “rename” from the contents of the disk operation log 240.

解析部３４０は、対象ファイル５０１に関する原本ファイル５４１を特定できた場合、操作履歴情報３２０にファイルの更新情報を記載する。 When the original file 541 related to the target file 501 can be identified, the analysis unit 340 describes the file update information in the operation history information 320.

解析部３４０は、上記推定結果と、収集した各種ログを用いて、操作履歴情報３２０に以下のデータを記録することができる。即ち、操作履歴情報３２０の操作時間３２０ａには、対象ファイル５０１の操作時間（対象ファイル５０１のアクセス時間）が設定されてよい。ファイル名３２０ｂには、対象ファイル５０１のファイル名が設定されてよい。ユーザ操作３２０ｃは、上記推定したユーザ操作５３５が設定されてよい。操作元ホスト３２０ｄには、対象ファイル５０１に関する操作を実行したクライアント２００のホスト名等が設定されてよい。操作元ファイル３２０ｅには、上記処理において推定された候補ログ５３４のファイル名が設定されてよい。操作ユーザ３２０ｆには、プロセス状態情報２６０から抽出した実行ユーザ２６０ｃが設定されてよい。解析部３４０は、例えば、対象ファイル５０１の操作時間の近傍で、プロセスＩＤ５２１に関連付けて登録された実行ユーザ２６０ｃのデータを、操作ユーザ３２０ｆに登録してよい。 The analysis unit 340 can record the following data in the operation history information 320 using the estimation result and the collected various logs. That is, the operation time of the target file 501 (access time of the target file 501) may be set as the operation time 320a of the operation history information 320. As the file name 320b, the file name of the target file 501 may be set. As the user operation 320c, the estimated user operation 535 may be set. In the operation source host 320d, a host name or the like of the client 200 that has performed an operation related to the target file 501 may be set. The file name of the candidate log 534 estimated in the above process may be set in the operation source file 320e. The execution user 260c extracted from the process state information 260 may be set as the operation user 320f. For example, the analysis unit 340 may register the data of the execution user 260c registered in association with the process ID 521 in the vicinity of the operation time of the target file 501 with the operation user 320f.
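One record of the operation history information 320 can be pictured as a small data structure. The sketch below is illustrative only: the class name, the field names, and the helper `make_entry` are hypothetical, and only the correspondence to items 320a through 320f is taken from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationHistoryEntry:
    operation_time: float        # 320a: access time of the target file
    file_name: str               # 320b: file name of the target file
    user_operation: str          # 320c: estimated user operation
    source_host: str             # 320d: host name of the operating client
    source_file: Optional[str]   # 320e: file name from the candidate log
    user: Optional[str]          # 320f: executing user from the process state


def make_entry(target_name: str, access_time: float, user_operation: str,
               host: str, candidate_name: Optional[str],
               exec_user: Optional[str]) -> OperationHistoryEntry:
    """Assemble one history record from the estimation results."""
    return OperationHistoryEntry(access_time, target_name, user_operation,
                                 host, candidate_name, exec_user)
```

For example, `make_entry("report_v2.doc", 100.0, "derivation", "client-01", "report.doc", "alice")` would describe a file saved under a new name from `report.doc`; all of these values are made up for illustration.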

解析部３４０は、例えば、特定した原本ファイル５４１を対象ファイル５０１におきかえて（即ち、特定した原本ファイルを新たな対象ファイルとして）、上記処理を再帰的に実行してよい。これにより、操作履歴情報３２０に情報が追記され、これを順に参照することで、ファイルの変更元履歴を追跡することができる。 The analysis unit 340 may, for example, substitute the identified original file 541 for the target file 501 (that is, treat the identified original file as a new target file) and execute the above processing recursively. As a result, information is appended to the operation history information 320, and the change source history of the file can be tracked by referring to this information in order.
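The recursive tracing can be sketched as a loop that repeatedly substitutes the identified original file for the target file. This is a minimal illustration: `trace_origin` and the callback `find_original` are hypothetical names, and the `seen` set and `max_depth` limit are safeguards added here (an “update” yields an original with the same file name), not something stated in the description.

```python
from typing import Callable, List, Optional, Tuple

# find_original(name) returns (user_operation, original_name), or None
# when no original file is detected for the given target file.
Finder = Callable[[str], Optional[Tuple[str, str]]]


def trace_origin(target: str, find_original: Finder,
                 max_depth: int = 100) -> List[Tuple[str, str, str]]:
    """Return (target, operation, original) triples, newest first."""
    history: List[Tuple[str, str, str]] = []
    seen = {target}
    current = target
    for _ in range(max_depth):
        result = find_original(current)
        if result is None:
            break                      # reached the oldest known file
        operation, original = result
        history.append((current, operation, original))
        if original in seen:
            break                      # guard against cycles (assumption)
        seen.add(original)
        current = original
    return history
```

An iterative loop is used instead of literal recursion so that long histories do not run into Python's recursion limit; the result is the same chain of entries.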

上記のように構成された本実施形態における履歴解析装置３００は、例えば、アプリケーションによる操作ログが取得できない環境であっても、ファイルの更新履歴の追跡が可能となる。なぜならば、履歴解析装置３００（特に解析部３４０）は、クライアント２００に記録された各種ログを用いて、対象ファイル５０１を操作したアプリケーションと、その操作とを特定できるからである。具体的には、履歴解析装置３００は、例えば、対象ファイル５０１のアクセス時間を参照して、ファイルアクセスログ２３０、ディスク操作ログ２４０、プロセス状態情報２６０等から、対象ファイル５０１に関する操作を実行したアプリケーションを抽出する。そして、当該アプリケーションが実行可能な操作、及び、当該アプリケーションが実行したファイル操作のログ等から、対象ファイル５０１に関する原本ファイル５４１と、アプリケーションにより実行されたユーザ操作と、を特定することができる。 The history analysis apparatus 300 of the present embodiment, configured as described above, can track the update history of a file even in, for example, an environment where operation logs from applications cannot be acquired. This is because the history analysis apparatus 300 (in particular, the analysis unit 340) can identify the application that operated on the target file 501, and the operation it performed, using the various logs recorded in the client 200. Specifically, the history analysis apparatus 300 refers to, for example, the access time of the target file 501 and extracts, from the file access log 230, the disk operation log 240, the process state information 260, and the like, the application that executed the operation on the target file 501. Then, from the operations that the application can execute, the logs of the file operations that the application executed, and the like, it can identify the original file 541 for the target file 501 and the user operation executed by the application.

履歴解析装置３００は、対象ファイル５０１の更新履歴を遡ることにより、当該対象ファイル５０１の元になったオリジナルデータを検出可能である。その理由は、履歴解析装置３００は、上記したように、ある対象ファイル５０１に関する原本ファイル５４１を特定可能であるからである。即ち、履歴解析装置３００は、係る処理を繰り返し（再帰的に）実行することにより、ある対象ファイル５０１に関する原本ファイル５４１を順次追跡することができる。 The history analysis apparatus 300 can detect the original data that is the source of the target file 501 by tracing the update history of the target file 501. This is because the history analysis apparatus 300 can specify the original file 541 related to a certain target file 501 as described above. That is, the history analysis apparatus 300 can sequentially track the original file 541 related to a certain target file 501 by repeatedly (recursively) executing such processing.

履歴解析装置３００は、対象ファイル５０１に関する原本ファイル５４１を特定することで、原本ファイル５４１を操作（例えば、新規作成、編集等）したユーザを特定可能である。 The history analysis apparatus 300 can specify a user who has operated (for example, newly created or edited) the original file 541 by specifying the original file 541 related to the target file 501.

以上より、本実施形態における履歴解析装置３００によれば、異なる装置（例えば、ファイルサーバ１００と、クライアント２００）にそれぞれ配置されたファイル（例えば、対象ファイル５０１、原本ファイル５４１）に関するログ（サーバアクセスログ１２０、ファイルアクセスログ２３０、ディスク操作ログ２４０等）に基づいて、それらのファイルに関する派生関係を特定できる。 As described above, according to the history analysis apparatus 300 in the present embodiment, logs (server access) regarding files (for example, the target file 501 and the original file 541) respectively arranged in different apparatuses (for example, the file server 100 and the client 200). Log 120, file access log 230, disk operation log 240, etc.), the derivation relationship for these files can be specified.

＜第１の変形例＞ <First Modification>
以下、上記説明した第１の実施形態に係る第１の変形例（以下、変形例１と記載する場合がある）について説明する。本変形例は、通信ネットワークからのダウンロード（例えば、Ｗｅｂサーバからのダウンロード）等の操作が含まれる場合に対応する。 Hereinafter, the first modified example (hereinafter sometimes referred to as modified example 1) according to the first embodiment described above will be described. This modification corresponds to a case where an operation such as downloading from a communication network (for example, downloading from a Web server) is included.

本変形例における履歴解析装置３００を実現可能な装置構成は、上記第１の実施形態と同様としてよい。本変形例における履歴解析装置３００は、以下に説明する通り、上記第１の実施形態における履歴解析装置３００とは動作が異なる。本変形例における履歴解析装置３００は、ネットワーク操作ログ２５０及びネットワーク状態情報２７０を参照する。 The apparatus configuration capable of realizing the history analysis apparatus 300 in the present modification may be the same as that in the first embodiment. As will be described below, the history analysis apparatus 300 in the present modification is different in operation from the history analysis apparatus 300 in the first embodiment. The history analysis apparatus 300 in this modification refers to the network operation log 250 and the network state information 270.

本変形例においても、図１０に例示した、対象ファイル５０１を操作するアプリケーションに関する可能操作（可能操作５２３）を取得する処理は、上記第１の実施形態と同様としてよい。 Also in the present modification, the process of acquiring the possible operation (the possible operation 523) related to the application that operates the target file 501 illustrated in FIG. 10 may be the same as that in the first embodiment.

以下、アプリケーション情報３３０のプロセス名５２２における可能操作にＤｏｗｎｌｏａｄの要素が含まれる場合について説明する。この場合、クライアント２００においては、例えば、ネットワーク４００経由で、他のネットワーク４１０からデータがダウンロードされている可能性がある。係るダウンロードは、例えば、Ｗｅｂブラウザを用いたダウンロード等、現在は一般的な方法により実行され得る。ダウンロード操作により、ネットワーク４００からデータが受信されることから、一般的に、ダウンロード処理を実行した後（例えば直後）に、ダウンロードしたデータのサイズと同程度のサイズで、ファイルに対する書き込みが発生していると考えられる。そこで、履歴解析装置３００（解析部３４０）は、対象ファイル５０１に関する書き込み（Ｗｒｉｔｅ）操作の前に実行された派生元操作として、ある特定の条件を満たすダウンロード（Ｒｅａｄ）操作を特定する。そして、履歴解析装置３００は、当該ダウンロード操作により受信したデータを、対象ファイル５０１に関する原本ファイル５４１と推定する。 Hereinafter, a case where the possible operations for the process name 522 in the application information 330 include a Download element will be described. In this case, data may have been downloaded to the client 200 from another network 410 via the network 400, for example. Such a download can be performed by a currently common method, for example, a download using a Web browser. Because the download operation receives data from the network 400, a write to a file with a size comparable to the size of the downloaded data is generally considered to occur after (for example, immediately after) the download process is executed. Therefore, the history analysis apparatus 300 (analysis unit 340) identifies a download (Read) operation that satisfies a specific condition as the derivation source operation executed before the write (Write) operation on the target file 501. Then, the history analysis apparatus 300 estimates that the data received by the download operation is the original file 541 for the target file 501.

解析部３４０は、ダウンロードしたファイルを追跡すべく、上記第１の実施形態において、可能操作５２３に「編集」の要素が含まれている場合と似た処理を実行する。以下、係る解析部３４０の処理について、図１４に例示するフローチャートを参照して説明する。 In order to track the downloaded file, the analysis unit 340 executes processing similar to that performed in the first embodiment when the possible operation 523 includes an “Edit” element. Hereinafter, this processing of the analysis unit 340 will be described with reference to the flowchart illustrated in FIG. 14.

解析部３４０は、上記説明したステップＳ１２０１と同様の処理により、対象ファイル５０１と同じファイル名のファイルについて書き込み処理を実行したプロセス（プロセスＩＤ５２１）の起動時間（プロセス起動時間）を抽出する（ステップＳ１４０１）。 By the same processing as in step S1201 described above, the analysis unit 340 extracts the activation time (process activation time) of the process (process ID 521) that executed the write processing on the file having the same file name as the target file 501 (step S1401).

解析部３４０は、プロセス起動時間から書き込みログ５１２のアクセス時間までの間に取得されたネットワーク状態情報２７０を参照し、プロセスＩＤ（２７０ｂ）にプロセスＩＤ５２１（操作プロセスＩＤ）が記録されたデータを抽出する。プロセスＩＤ５２１は、上記したように、クライアント２００において、対象ファイル５０１に関する操作を実行したプロセスに関するプロセスＩＤである。 The analysis unit 340 refers to the network state information 270 acquired between the process activation time and the access time of the write log 512, and extracts data in which the process ID 521 (operation process ID) is recorded in the process ID (270b). To do. As described above, the process ID 521 is a process ID related to a process in which an operation related to the target file 501 is executed in the client 200.

解析部３４０は、例えば、ネットワーク操作ログ２５０に記録されたログのうち、プロセス起動時間５３１から書き込みログ５１２のアクセス時間の範囲で以下のログを抽出する。即ち、解析部３４０は、例えば、ネットワーク操作ログ２５０から、プロセスＩＤ５２１のプロセスが使用するポート番号と、アクセス先ポート番号２５０ｅが一致するログを抽出してよい。解析部３４０は、例えば、アプリケーション名（３３０ａ）がプロセス名５２２に一致するデータがアプリケーション情報３３０に登録されている場合、そのポート番号（３３０ｄ）と、アクセス先ポート番号２５０ｅとが一致するログを抽出してよい。解析部３４０は、抽出したログを含むネットワークログリスト５６０を生成する（ステップＳ１４０２）。ネットワークログリスト５６０は、ネットワーク操作ログ２５０と同じ構成としてよい。以下、ネットワークログリスト５６０のうちの一行（一つログ）を、ネットワークログ５６１と記載する。 For example, the analysis unit 340 extracts the following logs from the process activation time 531 to the access time of the write log 512 from the logs recorded in the network operation log 250. That is, for example, the analysis unit 340 may extract a log in which the port number used by the process with the process ID 521 matches the access destination port number 250e from the network operation log 250. For example, when data whose application name (330a) matches the process name 522 is registered in the application information 330, the analysis unit 340 records a log where the port number (330d) matches the access destination port number 250e. May be extracted. The analysis unit 340 generates a network log list 560 including the extracted logs (step S1402). The network log list 560 may have the same configuration as the network operation log 250. Hereinafter, one line (one log) in the network log list 560 is referred to as a network log 561.
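The construction of the network log list 560 (step S1402) can be sketched as a filter on time window and port number. This is a minimal illustration: `NetworkLog`, its field names, and `network_log_list` are hypothetical, and the set `ports` is assumed to have been collected beforehand from the network state information 270 or from the port number (330d) in the application information 330.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class NetworkLog:
    time: float       # access time
    direction: str    # 250c: e.g. "Download" or "Upload"
    dest_host: str    # 250d: access destination host
    dest_port: int    # 250e: access destination port number
    size: int         # 250f: transferred size


def network_log_list(logs: List[NetworkLog], start_time: float,
                     write_time: float, ports: Set[int]) -> List[NetworkLog]:
    """Step S1402: keep logs in the window whose port belongs to the process."""
    return [l for l in logs
            if start_time <= l.time <= write_time and l.dest_port in ports]
```

Because the result has the same record layout as the input, it matches the statement that the network log list 560 may have the same configuration as the network operation log 250.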

解析部３４０は、ネットワークログリスト５６０から、対象ファイル５０１のアクセス時間に近い順に、ネットワークログリスト５６０内の各ネットワークログ５６１を選択する（ステップＳ１４０３）。 The analysis unit 340 selects each network log 561 in the network log list 560 from the network log list 560 in the order close to the access time of the target file 501 (step S1403).

解析部３４０は、選択したネットワークログが、Ｄｏｗｎｌｏａｄ操作に関するログであるか否かを判定する（ステップＳ１４０４）。解析部３４０は、例えば、操作対象ファイルに実行された操作（Ｗｒｉｔｅ操作）に応じて、派生元ファイルに対して実行され得る操作（派生元操作）として、Ｄｏｗｎｌｏａｄ（ダウンロード）操作を特定し、ネットワークログリスト５６０から選択したネットワークログ５６１が、Ｄｏｗｎｌｏａｄ操作のログであるか否かを確認する。解析部３４０は、ネットワークログにおける送受信方向（２５０ｃ）を参照して、当該ログがＤｏｗｎｌｏａｄ操作に関するログであるか否かを判定可能である。 The analysis unit 340 determines whether the selected network log is a log of a Download operation (step S1404). For example, in accordance with the operation (Write operation) performed on the operation target file, the analysis unit 340 identifies the Download operation as an operation that could have been performed on the derivation source file (derivation source operation), and checks whether the network log 561 selected from the network log list 560 is a log of a Download operation. The analysis unit 340 can determine whether the log is a log of a Download operation by referring to the transmission/reception direction (250c) in the network log.

以下、ステップＳ１４０４においてＹＥＳの場合について説明する。 Hereinafter, the case of YES in step S1404 will be described.

この場合、解析部３４０は、Ｄｏｗｎｌｏａｄ操作によりデータを受信した際の通信プロトコルに応じて、ネットワークログに設定されたサイズ（２５０ｆ）を補正してよい（ステップＳ１４０５）。通信プロトコルによっては、ヘッダ情報やエンコーディングなどにより、ダウンロードしたデータサイズと、ファイルに書き込まれるデータサイズとが異なる場合がありえるからである。解析部３４０は、例えば、通信プロトコルに関連付けされた適切な数式等を用いて、データサイズの補正を行ってもよい。以下、補正後のサイズを補正サイズと記載する場合がある。 In this case, the analysis unit 340 may correct the size (250f) set in the network log according to the communication protocol used when the data was received by the Download operation (step S1405). This is because, depending on the communication protocol, the downloaded data size and the data size written to the file may differ owing to header information, encoding, and the like. The analysis unit 340 may correct the data size using, for example, an appropriate mathematical expression associated with the communication protocol. Hereinafter, the size after correction may be referred to as a corrected size.

解析部３４０は、ネットワークログ５６１に記録されたサイズ（又は補正サイズ）が、書き込みログ５１２に記録されたサイズと適合するか否かを判定する（ステップＳ１４０６）。解析部３４０は、例えば、ネットワークログ５６１に記録されたサイズ（又は、当該サイズ）と、書き込みログ５１２に記録されたサイズとが、一致する場合に、これらが適合すると判定してもよい。解析部３４０は、例えば、ネットワークログ５６１に記録されたサイズ（又は、当該サイズ）と、書き込みログ５１２に記録されたサイズとの差分が、ある所定の基準値（閾値）よりも小さい場合に、これらが適合すると判定してもよい。 The analysis unit 340 determines whether the size (or corrected size) recorded in the network log 561 is compatible with the size recorded in the write log 512 (step S1406). For example, the analysis unit 340 may determine that the two are compatible when the size recorded in the network log 561 (or the corrected size) is equal to the size recorded in the write log 512. Alternatively, the analysis unit 340 may determine that the two are compatible when the difference between the size recorded in the network log 561 (or the corrected size) and the size recorded in the write log 512 is smaller than a predetermined reference value (threshold).

ステップＳ１４０３乃至ステップＳ１４０６において、解析部３４０は、例えば、以下のような処理を実行してもよい。即ち、解析部３４０は、ネットワークログリスト５６０の内、送受信方向２５０ｃが「Ｄｏｗｎｌｏａｄ」であり、サイズ（または補正サイズ）２５０ｆが対象ファイル５０１のサイズに適合し、対象ファイル５０１のアクセス時間に最も近いネットワークログ５６１を選択してよい。 In steps S1403 to S1406, the analysis unit 340 may execute, for example, the following processing. That is, from the network log list 560, the analysis unit 340 may select the network log 561 whose transmission/reception direction 250c is “Download”, whose size (or corrected size) 250f is compatible with the size of the target file 501, and whose time is closest to the access time of the target file 501.

ステップＳ１４０６においてＹＥＳの場合、解析部３４０は、Ｄｏｗｎｌｏａｄ操作が実行されたと推定する（ステップＳ１４０７）。この場合、解析部３４０は、対象ファイル５０１に関する操作を実行したプロセス（プロセスＩＤ５２１に該当するプロセス）が、対象ファイル５０１に対する書き込みを実行する以前（書き込みログ５１２のアクセス時間以前）に、ネットワークからファイルをダウンロードしたと推定する。また、解析部３４０は、対象ファイル５０１の更新元となった原本ファイル５４１は、ネットワークログ５６１のアクセス先ホスト（２５０ｄ）から取得（受信）したデータであると推定してもよい。 If YES in step S1406, the analysis unit 340 estimates that a Download operation was executed (step S1407). In this case, the analysis unit 340 estimates that the process that executed the operation on the target file 501 (the process corresponding to the process ID 521) downloaded a file from the network before it wrote to the target file 501 (before the access time of the write log 512). The analysis unit 340 may also estimate that the original file 541 that is the update source of the target file 501 is data acquired (received) from the access destination host (250d) of the network log 561.

ステップＳ１４０４又はステップＳ１４０６においてＮＯの場合（サイズ不適合）、解析部３４０は、ネットワークログリストに含まれる他のログについて同様の処理を実行する（ステップＳ１４０８乃至ステップＳ１４０４）。 If NO in step S1404 or step S1406 (size mismatch), the analysis unit 340 executes the same processing for other logs included in the network log list (steps S1408 to S1404).

ネットワークログリストに、適合するＤｏｗｎｌｏａｄのログがない場合（ステップＳ１４０８においてＮＯ）、解析部３４０は、Ｄｏｗｎｌｏａｄ操作は実行されていないと判定してもよい。その場合、解析部３４０は、他に可能操作があれば、その可能操作に関する処理を続行してもよい（ステップＳ１４０９乃至ステップＳ１４１１）。 If there is no compatible download log in the network log list (NO in step S1408), the analysis unit 340 may determine that the download operation has not been executed. In that case, if there is another possible operation, the analysis unit 340 may continue the processing related to the possible operation (steps S1409 to S1411).
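The loop over steps S1403 to S1408 can be sketched as follows. This is a minimal illustration under stated assumptions: `NetworkLog`, `find_download`, the `correct` callback (standing in for the protocol-dependent size correction of step S1405, whose formula the description does not specify), and `tolerance` (standing in for the threshold) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class NetworkLog:
    time: float
    direction: str    # 250c
    dest_host: str    # 250d
    size: int         # 250f


def find_download(network_logs: Sequence[NetworkLog], write_time: float,
                  write_size: int,
                  correct: Callable[[NetworkLog], int] = lambda l: l.size,
                  tolerance: int = 0) -> Optional[NetworkLog]:
    """Return the compatible Download log closest to the write, or None."""
    # S1403: visit logs in order of closeness to the write access time.
    for log in sorted(network_logs, key=lambda l: abs(write_time - l.time)):
        if log.direction != "Download":          # S1404
            continue
        corrected = correct(log)                 # S1405
        if abs(corrected - write_size) <= tolerance:   # S1406
            return log                           # S1407
    return None                                  # S1408: no compatible log
```

With the default `tolerance=0` only an exact size match is accepted; a positive value allows for small differences between the received size and the written size. The returned log's `dest_host` is the estimated download source.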

なお、本変形例における履歴解析装置３００が実行する上記以外の処理は、例えば、上記第１の実施形態と同様としてよい。 Note that the processing executed by the history analysis apparatus 300 in the present modification, other than that described above, may be the same as in the first embodiment, for example.

上記のように構成された本変形例における履歴解析装置３００は、対象ファイル５０１に関する原本ファイル５４１が、通信ネットワークを介してダウンロードされたデータであるか否かを推定可能である。また、原本ファイル５４１が通信ネットワークを介してダウンロードされたデータである場合、そのダウンロード元を特定可能である。なぜならば、履歴解析装置３００は、原本ファイル５４１を操作したアプリケーション（プロセスＩＤ５２１に相当するアプリケーション）により、原本ファイル５４１のサイズ（書き込みログ５１２のサイズ）に合致するサイズのデータがダウンロードされたことを特定可能だからである。 The history analysis apparatus 300 according to this modification configured as described above can estimate whether or not the original file 541 related to the target file 501 is data downloaded via a communication network. When the original file 541 is data downloaded via a communication network, the download source can be specified. This is because the history analysis apparatus 300 downloads data having a size that matches the size of the original file 541 (the size of the write log 512) by the application that operated the original file 541 (the application corresponding to the process ID 521). This is because it can be identified.

上記のような本変形例における履歴解析装置３００を用いることにより、例えば、原本ファイル５４１のダウンロード元が外部サイトなどであった場合に発生し得る問題を確認することができる。 By using the history analysis apparatus 300 in this modification as described above, for example, a problem that may occur when the original file 541 is downloaded from an external site or the like can be confirmed.

また、本変形例における履歴解析装置３００は、上記第１の実施形態と同様、原本ファイル５４１を特定可能である。また、本変形例における履歴解析装置３００は、原本ファイル５４１を操作したユーザを特定することが可能である。 Further, the history analysis apparatus 300 in this modification can identify the original file 541, as in the first embodiment. In addition, the history analysis apparatus 300 in this modification can identify the user who operated on the original file 541.

＜第２の変形例＞ <Second Modification>
以下、上記第１の実施形態に関する第２の変形例（以下、変形例２と記載する場合がある）について説明する。本変形例における履歴解析装置３００は、あるファイルから派生した派生先ファイルを検出可能である。派生先ファイルは、例えば、あるファイルに基づいて適切な方法（例えば、編集、コピー、別名保存等）により生成されたファイルであってよい。 Hereinafter, a second modification (hereinafter sometimes referred to as modification 2) of the first embodiment will be described. The history analysis apparatus 300 in this modification can detect a derivation destination file derived from a certain file. The derivation destination file may be, for example, a file generated from a certain file by an appropriate method (for example, editing, copying, or saving under another name).
本変形例における履歴解析装置３００を実現可能な装置構成は、図１５に例示するように、上記第１の実施形態及び変形例１と同様としてよい。本変形例における履歴解析装置３００は、以下に説明する通り、上記第１の実施形態とは動作が異なる。なお、図１５において、図１と同様の構成については同様の参照符号を付することにより、詳細な説明を省略する。 An apparatus configuration capable of realizing the history analysis apparatus 300 in the present modification may be the same as that in the first embodiment and Modification 1, as illustrated in FIG. 15. As will be described below, the history analysis apparatus 300 in the present modification differs in operation from the first embodiment. In FIG. 15, the same components as those in FIG. 1 are denoted by the same reference numerals, and detailed description thereof is omitted.

以下、本変形例において、履歴解析装置３００が、ファイルサーバ１００の共有記憶装置１１０に格納された対象ファイル７０１に関する派生先ファイル７０２と、ファイル７０１に関する操作履歴とを推定する処理について説明する。なお、図１５においては、説明の便宜上、派生先ファイル７０２はクライアント２００に配置されているが、派生先ファイル７０２はファイルサーバ１００に配置されてもよい。 Hereinafter, in the present modification, a process in which the history analysis apparatus 300 estimates the derivation destination file 702 related to the target file 701 stored in the shared storage device 110 of the file server 100 and the operation history related to the file 701 will be described. In FIG. 15, for convenience of explanation, the derivation destination file 702 is arranged in the client 200, but the derivation destination file 702 may be arranged in the file server 100.

図１６は、本変形例における履歴解析装置３００の動作の一例を示すフローチャートである。なお、図１６において、図１０と同様の処理（ステップ）については同様の参照符号を付すことで、詳細な説明を省略する。履歴解析装置３００におけるログ取得部３１０の動作は上記第１の実施形態及び第１の変形例と同様としてよいので、詳細な説明を省略する。 FIG. 16 is a flowchart showing an example of the operation of the history analysis apparatus 300 in the present modification. In FIG. 16, processes (steps) similar to those in FIG. 10 are denoted by the same reference numerals, and detailed description thereof is omitted. Since the operation of the log acquisition unit 310 in the history analysis apparatus 300 may be the same as that in the first embodiment and the first modification example, detailed description thereof is omitted.

解析部３４０は、上記第１の実施形態と同様、図１６におけるステップＳ１００１乃至Ｓ１００５を実行する。即ち、解析部３４０は、第１アクセスログ５０３のアクセス元（１２０ｂ）を参照し、該当するクライアント２００のファイルアクセスログ２３０を取得する。解析部３４０は、対象ファイル７０１と同一のファイル名を含むログを抽出して時系列順に並べたファイルアクセスログリスト５１１を作成する。 The analysis unit 340 executes steps S1001 to S1005 in FIG. 16 as in the first embodiment. That is, the analysis unit 340 refers to the access source (120b) of the first access log 503 and acquires the file access log 230 of the corresponding client 200. The analysis unit 340 creates a file access log list 511 in which logs including the same file name as the target file 701 are extracted and arranged in time series.

解析部３４０は、ファイルアクセスログリスト５１１から、Ｒｅａｄアクセスのみを抽出し、読み込みリスト５１３を作成する（ステップＳ１６０１）。係る読み込みリスト５１３は、クライアント２００において実行された、対象ファイル７０１からの読み込み操作のログを表す。以下、読み込みリスト５１３に含まれる一つのログを、読み込みログ５１４と記載する。 The analysis unit 340 extracts only Read access from the file access log list 511 and creates a read list 513 (step S1601). The read list 513 represents a log of a read operation from the target file 701 executed by the client 200. Hereinafter, one log included in the reading list 513 is referred to as a reading log 514.

解析部３４０は、例えば、読み込みログ５１４に記録された操作により読み込まれた内容（即ち、対象ファイル７０１から読み込まれた内容）が、別名保存やコピー処理等により、複製されているか否かを時系列順に確認する。 For example, the analysis unit 340 checks, in chronological order, whether the content read by the operation recorded in the read log 514 (that is, the content read from the target file 701) has been duplicated by saving under another name, a copy process, or the like.

解析部３４０は、例えば、上記第１の実施形態と同様に、読み込みログ５１４とプロセス状態情報２６０とから、読み込み操作を実行したアプリケーションのプロセスＩＤ５２１（操作プロセスＩＤ）、可能操作５２３の取得を行う（図１６のステップＳ１６０２乃至Ｓ１００９）。なお、ステップＳ１６０２において、解析部３４０は、例えば、ディスク操作ログ２４０から、読み込みログ５１４とアクセス時刻が一致するログを選択し、選択したログの内のプロセスＩＤ２４０ｂを取得してもよい。 For example, as in the first embodiment, the analysis unit 340 acquires the process ID 521 (operation process ID) and the possible operation 523 of the application that executed the read operation from the read log 514 and the process state information 260. (Steps S1602 to S1009 in FIG. 16). In step S1602, the analysis unit 340 may select, for example, a log whose access time matches the read log 514 from the disk operation log 240, and acquire the process ID 240b in the selected log.

解析部３４０は、上記第１の実施形態における処理と同様に、ファイル（対象ファイル７０１）を読み込んだプロセス（プロセスＩＤ５３１に該当するプロセス）による書き込み操作の有無を確認する。解析部３４０は、例えば、ディスク操作ログ２４０に、対象ファイル７０１のアクセス時間よりも後に、プロセスＩＤ５３１に該当するプロセスによる書き込み操作のログが含まれるか否かを確認してもよい。これにより、解析部３４０は、例えば、対象ファイル７０１が、他のファイル等に書き込まれているか否かを確認することができる。 As in the processing of the first embodiment, the analysis unit 340 checks whether a write operation was performed by the process that read the file (target file 701), that is, the process corresponding to the process ID 531. For example, the analysis unit 340 may check whether the disk operation log 240 contains a log of a write operation by the process corresponding to the process ID 531 after the access time of the target file 701. In this way, the analysis unit 340 can check, for example, whether the target file 701 has been written to another file or the like.

ステップＳ１００９において特定した可能操作５２３に「Ｅｄｉｔ（編集）」の要素が含まれる場合、クライアント２００において、ファイルの更新又は別名保存等により、派生先ファイルが生成されている可能性がある。 If the possible operations 523 specified in step S1009 include an "Edit" element, a derivation destination file may have been generated in the client 200 by a file update, saving under another name, or the like.

解析部３４０は、プロセスＩＤ５３１に該当するプロセスによる書き込み操作が確認された場合、書き込み先ファイルのファイル名を確認する。書き込み先ファイルのファイル名が対象ファイル７０１と同じである場合、解析部３４０は、係る操作を「更新」操作として操作履歴情報３２０に記録してよい。書き込み先ファイルのファイル名が対象ファイル７０１と異なる場合、解析部３４０は、係る操作を「派生」操作として操作履歴情報３２０に記録してよい。 When the write operation by the process corresponding to the process ID 531 is confirmed, the analysis unit 340 confirms the file name of the write destination file. When the file name of the write destination file is the same as that of the target file 701, the analysis unit 340 may record the operation as the “update” operation in the operation history information 320. When the file name of the write destination file is different from that of the target file 701, the analysis unit 340 may record the operation as the “derivation” operation in the operation history information 320.
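
The update/derivation decision described above can be sketched as follows. This is a minimal illustration, not the implementation of the analysis unit 340: the `DiskOp` record and its field names are assumptions made for the example, loosely modeled on the disk operation log 240.

```python
from typing import NamedTuple

class DiskOp(NamedTuple):
    """Hypothetical disk operation record (fields are assumptions)."""
    time: float      # access time
    pid: int         # process ID that performed the operation
    op: str          # "Read", "Write" or "Delete"
    file_name: str   # file that was accessed

def classify_writes(disk_log, pid, target_name, read_time):
    """Label each write made by `pid` after `read_time`.

    A write to the same file name is recorded as an "update";
    a write to a different file name is recorded as a "derivation".
    """
    history = []
    for entry in disk_log:
        if entry.pid == pid and entry.op == "Write" and entry.time > read_time:
            kind = "update" if entry.file_name == target_name else "derivation"
            history.append((entry.file_name, kind))
    return history
```

The returned pairs correspond to what would be recorded in the operation history information 320.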

ステップＳ１００９において特定した可能操作５２３に「Ｃｏｐｙ」の要素が含まれる場合、対象ファイル７０１が、コピー又は移動された可能性がある。具体例として、Ｗｉｎｄｏｗｓ（登録商標）におけるエクスプローラのファイル操作を想定する。この場合、対象ファイル７０１は、ファイルの中身を編集せずに、別フォルダにコピーもしくは移動された可能性がある。 When the “Copy” element is included in the possible operation 523 specified in step S1009, the target file 701 may be copied or moved. As a specific example, an Explorer file operation in Windows (registered trademark) is assumed. In this case, the target file 701 may have been copied or moved to another folder without editing the contents of the file.

解析部３４０は、例えば、上記第１の実施形態と同様処理により、対象ファイル７０１のコピー又は移動を推定可能である。即ち、解析部３４０は、対象ファイル７０１に関する読み込み操作の後、当該ファイルについて削除（Ｄｅｌｅｔｅ）操作が実行されたか否かに基づいて、対象ファイル７０１のコピー又は移動を推定可能である。 The analysis unit 340 can estimate the copy or movement of the target file 701, for example, by the same processing as in the first embodiment. That is, the analysis unit 340 can estimate the copy or movement of the target file 701 based on whether or not a delete operation has been performed on the file after the read operation on the target file 701.
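
A minimal sketch of the copy/move estimation, assuming the log is available as `(time, operation, file_name)` tuples (a hypothetical layout chosen for the example): a Delete on the target file after it was read suggests a move, otherwise a copy.

```python
def estimate_copy_or_move(ops, target_name, read_time):
    """Return "move" if the target was deleted after being read, else "copy".

    `ops` is an iterable of (time, operation, file_name) tuples; this layout
    is an assumption for illustration only.
    """
    deleted_after_read = any(
        op == "Delete" and name == target_name and t > read_time
        for t, op, name in ops
    )
    return "move" if deleted_after_read else "copy"
```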

解析部３４０は、ファイル操作の推定結果を、操作履歴情報３２０に記録してよい。 The analysis unit 340 may record the estimation result of the file operation in the operation history information 320.

解析部３４０は、読み込みリスト５１３に含まれる全てのログについて上記処理を再帰的に行うことによって、対象ファイル７０１に関する派生先のファイル７０２を特定可能である。 By performing the above processing recursively for all the logs included in the read list 513, the analysis unit 340 can specify the derivation destination file 702 of the target file 701.

上記のように構成された本変形例における履歴解析装置３００によれば、派生元ファイル（対象ファイル７０１）から派生された派生先ファイル（派生先ファイル７０２）を特定可能である。これにより、以下のような効果が得られる。 According to the history analysis apparatus 300 in the present modification configured as described above, a derivation destination file (derivation destination file 702) derived from a derivation source file (target file 701) can be specified. Thereby, the following effects are obtained.

例えば、不適切な情報が含まれるファイルを見つけた場合、派生先ファイルを特定することで、不適切な情報が含まれるファイルが影響する範囲を特定可能である。一つの具体例として、本変形例における履歴解析装置３００を用いることで、対象ファイル７０１に誤った情報が含まれる場合に、修正が必要される派生先ファイル７０２の範囲を特定することができる。また、他の具体例として、ある対象ファイル７０１から派生先ファイル７０２に対して情報漏洩等が発生した場合、その漏洩範囲を特定可能である。 For example, when a file including inappropriate information is found, it is possible to specify a range affected by a file including inappropriate information by specifying a derivation destination file. As one specific example, by using the history analysis device 300 according to this modification, it is possible to specify the range of the derivation destination file 702 that needs to be corrected when the target file 701 includes incorrect information. As another specific example, when information leakage or the like occurs from a certain target file 701 to a derivation destination file 702, the leakage range can be specified.

また、本変形例における履歴解析装置３００によれば、対象ファイル７０１の派生先ファイル７０２を追跡することで、ある対象ファイル７０１に関する最新版を検索することができる。本変形例における履歴解析装置３００によれば、対象ファイル７０１の利用状況（例えば、派生先ファイル７０２の生成状況等）等を取得することができ、これをフォルダツリーの形などで可視化することが可能となる。本変形例における履歴解析装置３００によれば、セキュリティの確保を求められる領域からのデータ持ち出し状況を確認できる。例えば、セキュリティが確保された領域に配置された対象ファイル７０１から生成された派生先ファイル７０２の書き込み先を特定することにより、データの持ち出しを確認可能である。 Further, according to the history analysis apparatus 300 in this modification, the latest version of a certain target file 701 can be found by tracking the derivation destination files 702 of the target file 701. The history analysis apparatus 300 in this modification can also acquire the usage status of the target file 701 (for example, the generation status of the derivation destination files 702), which can then be visualized in the form of a folder tree or the like. In addition, the history analysis apparatus 300 in this modification makes it possible to check whether data has been taken out of an area where security must be ensured. For example, data take-out can be confirmed by specifying the write destination of a derivation destination file 702 generated from a target file 701 placed in a secured area.

＜第３の変形例＞ <Third Modification>
以下、上記第１の実施形態に関する第３の変形例（以下、変形例３と記載する）について説明する。本変形例における履歴解析装置３００は、あるプロセス（アプリケーション）が複数のファイルを処理できる場合に、派生元ファイルから作成された派生先ファイルを検出可能である。 Hereinafter, a third modification of the first embodiment (hereinafter referred to as modification 3) will be described. The history analysis apparatus 300 in this modification can detect a derivation destination file created from a derivation source file when a certain process (application) can process a plurality of files.

本変形例における履歴解析装置３００は、図１７に例示するように、上記第１の実施形態における履歴解析装置３００に対して、プロセスディスク操作ログ３５０、Ｒｅａｄ操作ログ３６０、Ｗｒｉｔｅ操作ログ３７０を更に有してよい。また、本変形例における履歴解析装置３００は、ファイルリスト３８０を作成してよい。履歴解析装置３００の他の装置構成は、上記第１の実施形態と同様としてよい。本変形例における履歴解析装置３００は、以下に説明する通り、上記第１の実施形態とは動作が異なる。 As illustrated in FIG. 17, the history analysis apparatus 300 in this modification may further include a process disk operation log 350, a Read operation log 360, and a Write operation log 370 in addition to the configuration of the history analysis apparatus 300 in the first embodiment. The history analysis apparatus 300 in this modification may also create a file list 380. The rest of the configuration of the history analysis apparatus 300 may be the same as in the first embodiment. As described below, the history analysis apparatus 300 in this modification operates differently from the first embodiment.

プロセスディスク操作ログ３５０は、ディスク操作ログ２４０（図４）と同様の構成としてよい。 The process disk operation log 350 may have the same configuration as the disk operation log 240 (FIG. 4).

Ｒｅａｄ操作ログ３６０は、Ｒｅａｄ操作に関する情報を記録したログを表す。Ｒｅａｄ操作ログ３６０には、例えば、図１８に示すように、「操作ＩＤ（３６０ａ）、アクセス時間（３６０ｂ）、ファイル名（３６０ｃ）、サイズ（３６０ｄ）」の各要素が、時系列に記録される。Ｒｅａｄ操作ログ３６０を、「Ｒｅａｄ操作ログリスト」と記載する場合がある。 The Read operation log 360 is a log in which information on Read operations is recorded. In the Read operation log 360, for example, as shown in FIG. 18, the elements "operation ID (360a), access time (360b), file name (360c), size (360d)" are recorded in time series. The Read operation log 360 may be referred to as a "Read operation log list".

操作ＩＤ３６０ａは、各Ｒｅａｄ操作を識別可能な識別情報（識別子）である。 The operation ID 360a is identification information (identifier) that can identify each Read operation.

アクセス時間３６０ｂは、読み込み（Ｒｅａｄ）操作が実行されたファイルへのアクセスが発生した時間を表す。アクセス時間３６０ｂは、Ｒｅａｄ操作が実行された時間であってもよい。 The access time 360b represents the time when access to a file for which a read operation has been executed occurs. The access time 360b may be a time when the Read operation is executed.

ファイル名３６０ｃは、Ｒｅａｄ操作が実行されたファイルの名称を表す。 The file name 360c represents the name of the file on which the Read operation is executed.

サイズ３６０ｄは、Ｒｅａｄ操作が実行されたファイルのサイズを表す。 The size 360d represents the size of the file on which the Read operation is executed.

Ｗｒｉｔｅ操作ログ３７０は、Ｗｒｉｔｅ操作に関する情報を記録したログを表す。Ｗｒｉｔｅ操作ログ３７０には、例えば、図１９に例示するように、「アクセス時間（３７０ａ）、ファイル名（３７０ｂ）、サイズ（３７０ｃ）、推定ＲｅａｄＩＤリスト（３７０ｄ）」の各要素が時系列に記録される。Ｗｒｉｔｅ操作ログ３７０を、「Ｗｒｉｔｅ操作ログリスト」と記載する場合がある。 The Write operation log 370 is a log in which information on Write operations is recorded. In the Write operation log 370, for example, as illustrated in FIG. 19, the elements "access time (370a), file name (370b), size (370c), estimated ReadID list (370d)" are recorded in time series. The Write operation log 370 may be referred to as a "Write operation log list".

アクセス時間３７０ａは、書き込み（Ｗｒｉｔｅ）操作が実行されたファイルへのアクセスが発生した時間を表す。アクセス時間３７０ａは、Ｗｒｉｔｅ操作が実行された時間であってもよい。 The access time 370a represents a time when access to a file for which a write operation has been executed occurs. The access time 370a may be a time when the write operation is executed.

ファイル名３７０ｂは、Ｗｒｉｔｅ操作が実行されたファイルの名称を表す。 The file name 370b represents the name of the file for which the Write operation has been executed.

サイズ３７０ｃは、Ｗｒｉｔｅ操作が実行されたファイルのサイズを表す。 The size 370c represents the size of the file for which the Write operation has been executed.

推定ＲｅａｄＩＤリスト３７０ｄは、Ｗｒｉｔｅ操作が実行される前に実行されたＲｅａｄ操作を特定可能な情報を表す。推定ＲｅａｄＩＤリスト３７０ｄには、Ｒｅａｄ操作ログ３６０の操作ＩＤ３６０ａが１以上記録される。推定ＲｅａｄＩＤリスト３７０ｄを、「推定読み込み候補リスト」と記載する場合がある。 The estimated ReadID list 370d represents information that can identify the Read operation that was executed before the Write operation was executed. One or more operation IDs 360a of the Read operation log 360 are recorded in the estimated ReadID list 370d. The estimated ReadID list 370d may be described as an “estimated reading candidate list”.

ファイルリスト３８０は、ある派生元ファイルと、派生先ファイルとの間の類似度に関する情報を保持する。ファイルリスト３８０には、例えば、ある派生元ファイルに関する派生先ファイルの候補のリストが記録されてもよい。ファイルリスト３８０を、「候補データリスト」と記載する場合がある。ファイルリスト３８０は、例えば、図２０に示すように、「ファイル名（３８０ａ）、スコア（３８０ｂ）」の要素が、複数記載される。以下、ファイルリスト３８０を「候補ファイルリスト」と記載する場合がある。 The file list 380 holds information regarding the degree of similarity between a certain derivation source file and a derivation destination file. In the file list 380, for example, a list of derivation destination file candidates regarding a derivation source file may be recorded. The file list 380 may be described as a “candidate data list”. In the file list 380, for example, as shown in FIG. 20, a plurality of elements “file name (380a), score (380b)” are described. Hereinafter, the file list 380 may be referred to as a “candidate file list”.

ファイル名３８０ａは、ファイルを特定可能なファイル名（パス名等）を表す。ファイル名３８０ａには、例えば、ある派生元ファイルに関する派生先ファイルのファイル名が記録される。ファイル名３８０ａに記録されるデータは、具体的なファイル名に限定されない。ファイル名３８０ａには、例えば、「＜新規作成＞」を表すデータ等、派生元ファイルに関するスコアを設定可能な要素が適宜設定されてよい。 The file name 380a represents a file name (such as a path name) that can identify the file. In the file name 380a, for example, a file name of a derivation destination file relating to a certain derivation source file is recorded. The data recorded in the file name 380a is not limited to a specific file name. In the file name 380a, for example, an element capable of setting a score relating to the derivation source file such as data representing “<new creation>” may be set as appropriate.

スコア３８０ｂは、ファイル名３８０ａに記録されたファイルと、派生元ファイルとの間の類似度を表す。スコア３８０ｂには、類似度を表す数値（例えば”０”から”１”の範囲の数値等）が設定されてもよい。 The score 380b represents the similarity between the file recorded in the file name 380a and the derivation source file. In the score 380b, a numerical value indicating the degree of similarity (for example, a numerical value in the range of “0” to “1”) may be set.
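
The three record layouts described above (FIG. 18 to FIG. 20) could be modeled roughly as follows. This is only a sketch: the class and field names are assumptions chosen to mirror the reference numerals, and the source does not prescribe concrete types.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReadOp:
    """One entry of the Read operation log 360 (names are assumptions)."""
    op_id: int          # operation ID (360a)
    access_time: float  # access time (360b)
    file_name: str      # file name (360c)
    size: int           # size (360d)

@dataclass
class WriteOp:
    """One entry of the Write operation log 370 (names are assumptions)."""
    access_time: float  # access time (370a)
    file_name: str      # file name (370b)
    size: int           # size (370c)
    estimated_read_ids: List[int] = field(default_factory=list)  # 370d

@dataclass
class Candidate:
    """One entry of the file list 380 (names are assumptions)."""
    file_name: str      # file name (380a); may also hold a "<new>" marker
    score: float        # similarity score (380b), e.g. in the range 0 to 1
```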

以下、本変形例における履歴解析装置３００の動作について説明する。 Hereinafter, the operation of the history analysis apparatus 300 according to this modification will be described.

本変形例における履歴解析装置３００は、上記第１の実施形態と同様、図１０に例示する各処理を実行することで、対象ファイル５０１に関して、プロセスＩＤ５２１（操作プロセスＩＤ）及びアプリケーションの可能操作５２３を取得する。 As in the first embodiment, the history analysis apparatus 300 in this modification executes the processes illustrated in FIG. 10 and thereby acquires, for the target file 501, the process ID 521 (operation process ID) and the possible operations 523 of the application.

以下、本変形例における履歴解析装置３００が派生先ファイルを特定する処理について、図２１Ａ、図２１Ｂに例示するフローチャートを参照して説明する。 Hereinafter, a process in which the history analysis apparatus 300 according to the present modification specifies a derivation destination file will be described with reference to flowcharts illustrated in FIGS. 21A and 21B.

アプリケーションの可能操作５２３に「Ｅｄｉｔ（編集）」が含まれる場合、ファイル改変を伴う読み書き操作（新規作成を含む）が実行されている可能性がある。 When “Edit (edit)” is included in the possible operation 523 of the application, there is a possibility that a read / write operation (including new creation) involving file modification is being executed.

履歴解析装置３００（特には解析部３４０）は、プロセスＩＤ５２１に該当するアプリケーションの生存期間（実行期間）を確認する。解析部３４０は、プロセス状態情報２６０を参照し、対象ファイル５０１に対する操作が行われた時間（アクセス時間）の前後で、プロセスＩＤ５２１が記録された最も古い（過去の）時間と、最も新しい時間と、を取得する（ステップＳ２１０１）。以下、プロセスＩＤ５２１が記録された最も古い時間をプロセス開始時間６０１、最新の時間をプロセス終了時間６０２と記載する。即ち、プロセス開始時間６０１と、プロセス終了時間６０２とは、プロセスＩＤ５２１の生存期間（実行期間）を表す。 The history analysis apparatus 300 (in particular, the analysis unit 340) checks the lifetime (execution period) of the application corresponding to the process ID 521. The analysis unit 340 refers to the process state information 260 and acquires the oldest time and the newest time at which the process ID 521 was recorded before and after the time (access time) at which the operation on the target file 501 was performed (step S2101). Hereinafter, the oldest time at which the process ID 521 was recorded is referred to as the process start time 601, and the newest time as the process end time 602. That is, the process start time 601 and the process end time 602 represent the lifetime (execution period) of the process with the process ID 521.

解析部３４０は、ディスク操作ログ２４０から、アクセス時間（２４０ａ）がプロセス開始時間６０１からプロセス終了時間６０２までの間に含まれるとともに、プロセスＩＤ（２４０ｂ）がプロセスＩＤ５２１に該当する全操作のログ（プロセスディスク操作ログ３５０）を抽出する（ステップＳ２１０２）。以下、抽出したプロセスディスク操作ログ３５０を、「プロセス全操作ログリスト」と記載する場合がある。 The analysis unit 340 extracts, from the disk operation log 240, the logs of all operations (process disk operation log 350) whose access time (240a) falls between the process start time 601 and the process end time 602 and whose process ID (240b) corresponds to the process ID 521 (step S2102). Hereinafter, the extracted process disk operation log 350 may be referred to as a "process all-operation log list".

解析部３４０は、プロセスディスク操作ログ３５０のうち、ファイル操作が「Ｒｅａｄ操作」であるアクセスに関するログを抽出する。解析部３４０は、抽出したログを時系列順に並べて時系列順に各ログに操作ＩＤを付与し、Ｒｅａｄ操作ログ３６０として記録する（ステップＳ２１０３）。 The analysis unit 340 extracts, from the process disk operation log 350, a log related to an access whose file operation is “Read operation”. The analysis unit 340 arranges the extracted logs in chronological order, assigns an operation ID to each log in chronological order, and records it as a Read operation log 360 (step S2103).

解析部３４０は、プロセスディスク操作ログ３５０のうち、アクセス時間が対象ファイル５０１の操作以降の時間であり、ファイル操作が「Ｗｒｉｔｅ操作」であるアクセスに関するログを抽出し、Ｗｒｉｔｅ操作ログ３７０として記録する（ステップＳ２１０４）。 The analysis unit 340 extracts, from the process disk operation log 350, a log related to access in which the access time is the time after the operation of the target file 501, and the file operation is “Write operation”, and records it as a Write operation log 370. (Step S2104).
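
Steps S2101 to S2104 can be sketched as below. The record layout (`DiskOp`), the `(time, pid)` form of the process state records, and the simplification that a process ID is not reused within the records are all assumptions made for this illustration.

```python
from typing import NamedTuple

class DiskOp(NamedTuple):
    """Hypothetical disk operation record (fields are assumptions)."""
    time: float
    pid: int
    op: str          # "Read" or "Write"
    file_name: str

def process_lifetime(state_records, pid):
    """S2101: oldest and newest time at which `pid` appears.

    `state_records` is a list of (time, pid) samples; PID reuse is ignored.
    """
    times = [t for t, p in state_records if p == pid]
    return min(times), max(times)

def split_process_log(disk_log, state_records, pid, target_time):
    start, end = process_lifetime(state_records, pid)
    # S2102: all operations of the process within its lifetime, oldest first.
    proc_log = sorted(e for e in disk_log
                      if e.pid == pid and start <= e.time <= end)
    # S2103: Read operations, numbered with operation IDs in time order.
    read_entries = (e for e in proc_log if e.op == "Read")
    reads = [(op_id, e.time, e.file_name)
             for op_id, e in enumerate(read_entries, start=1)]
    # S2104: Write operations at or after the operation on the target file.
    writes = [(e.time, e.file_name) for e in proc_log
              if e.op == "Write" and e.time >= target_time]
    return proc_log, reads, writes
```

Here `reads` plays the role of the Read operation log 360 and `writes` that of the Write operation log 370 before the estimated ReadID list is filled in.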

例えば、あるファイル（派生元ファイル）について別名保存が実行される場合を想定する。この場合、派生元ファイルに対する読み込みアクセスが行われた後に、派生先ファイルに対する書き込みアクセスが行われると考えられる。 For example, assume that alias saving is executed for a certain file (derivation source file). In this case, it is considered that after the read access to the derivation source file is performed, the write access to the derivation destination file is performed.

解析部３４０は、Ｗｒｉｔｅ操作ログ３７０に記録されたＷｒｉｔｅ操作に対して、Ｒｅａｄ操作ログ３６０から、当該Ｗｒｉｔｅ操作のアクセス時間（３７０ａ）よりも前に実行されたＲｅａｄ操作を列挙する。解析部３４０は、あるＷｒｉｔｅ操作に関して列挙されたＲｅａｄ操作に関する操作ＩＤ（３６０ａ）のリストを、当該Ｗｒｉｔｅ操作の推定ＲｅａｄＩＤリスト３７０ｄに転記する（ステップＳ２１０５）。解析部３４０は、上記処理を、Ｗｒｉｔｅ操作ログ３７０に記録された全てのＷｒｉｔｅ操作について実行してもよい。解析部３４０は、Ｗｒｉｔｅ操作ログ３７０のアクセス時間が古い順に上記処理を行い、抽出されたＲｅａｄ操作のファイル名が重複する場合は、最新の操作ＩＤに関するＲｅａｄ操作のログのみを残す。これにより、同じファイルが重複してＲｅａｄＩＤリストに登録されることを防ぐ。 For each Write operation recorded in the Write operation log 370, the analysis unit 340 enumerates, from the Read operation log 360, the Read operations executed before the access time (370a) of that Write operation. The analysis unit 340 copies the list of operation IDs (360a) of the Read operations enumerated for a given Write operation into the estimated ReadID list 370d of that Write operation (step S2105). The analysis unit 340 may execute this processing for all Write operations recorded in the Write operation log 370. The analysis unit 340 processes the entries of the Write operation log 370 in order from the oldest access time, and when the extracted Read operations contain duplicate file names, it keeps only the Read operation log with the latest operation ID. This prevents the same file from being registered more than once in the ReadID list.

解析部３４０は、書き込みファイルの重複を防ぐため、Ｗｒｉｔｅ操作ログ３７０に、ファイル名が共通するＷｒｉｔｅ操作が重複して記録されている場合、最も新しいＷｒｉｔｅ操作に関するログのみを残す（ステップＳ２１０６）。 In order to prevent duplication of the write file, the analysis unit 340 leaves only the log related to the newest write operation when the write operation having the same file name is recorded in the write operation log 370 (step S2106).
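
Steps S2105 and S2106 can be sketched as follows, under the assumption that Read entries are `(op_id, time, file_name)` tuples and Write entries are `(time, file_name)` tuples (a hypothetical layout for this example).

```python
def estimate_read_ids(reads, writes):
    """Build the estimated ReadID list for each written file.

    Returns {write_file_name: (write_time, [read op IDs])}.
    """
    result = {}
    # Process writes from the oldest access time (S2105).
    for w_time, w_name in sorted(writes):
        latest_id_by_file = {}
        for op_id, r_time, r_name in reads:
            if r_time < w_time:
                # Keep only the latest operation ID per read file name.
                prev = latest_id_by_file.get(r_name)
                if prev is None or op_id > prev:
                    latest_id_by_file[r_name] = op_id
        # A later write to the same file name overwrites the earlier one,
        # so only the newest Write per file name remains (S2106).
        result[w_name] = (w_time, sorted(latest_id_by_file.values()))
    return result
```

Keying the result by file name is a design choice of this sketch: it makes the S2106 deduplication fall out of the iteration order rather than requiring a separate pass.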

上記処理により、Ｗｒｉｔｅ操作ログ３７０に記録されたＷｒｉｔｅ操作について、派生元ファイルに対するＲｅａｄ操作の候補が、推定ＲｅａｄＩＤリストとして取得される。 As a result of the above processing, candidates for the Read operation for the derivation source file for the Write operation recorded in the Write operation log 370 are acquired as an estimated ReadID list.

対象ファイル５０１に関する操作が読み込み操作であった場合、解析部３４０は、Ｗｒｉｔｅ操作ログ３７０のうち、推定ＲｅａｄＩＤリストが記録されたログを列挙する（ステップＳ２１０８）。この場合、解析部３４０は、具体例として、以下のような処理を実行してもよい。即ち、解析部３４０は、対象ファイル５０１に関する読み込み（Ｒｅａｄ）操作に付与された操作ＩＤが推定ＲｅａｄＩＤリストに含まれるＷｒｉｔｅ操作のログを、Ｗｒｉｔｅ操作ログ３７０から抽出してもよい。解析部３４０は、抽出されたＷｒｉｔｅ操作のログに記録されたファイル名（３７０ｂ）を取得することで、書き込みが行われた可能性のあるファイルリスト３８０を作成する（ステップＳ２１０９）。換言すると、この場合、ファイルリスト３８０に登録されたファイル名は、プロセスＩＤ５２１に該当するプロセスにより、１以上のファイルに対するＲｅａｄ操作が実行された後、Ｗｒｉｔｅ操作が実行されたファイルを表す。 If the operation related to the target file 501 is a read operation, the analysis unit 340 lists the logs in which the estimated ReadID list is recorded in the write operation log 370 (step S2108). In this case, the analysis unit 340 may execute the following process as a specific example. That is, the analysis unit 340 may extract a write operation log in which the operation ID given to the read operation regarding the target file 501 is included in the estimated ReadID list from the write operation log 370. The analysis unit 340 acquires the file name (370b) recorded in the extracted write operation log, thereby creating a file list 380 that may have been written (step S2109). In other words, in this case, the file name registered in the file list 380 represents a file in which a Write operation is executed after a Read operation for one or more files is executed by a process corresponding to the process ID 521.

対象ファイル５０１が書き込み操作であった場合、解析部３４０は、Ｗｒｉｔｅ操作ログ３７０のうち、対象ファイル５０１とファイル名が一致するログを抽出し、当該ログに記録された推定ＲｅａｄＩＤリストを取得する（ステップＳ２１１０）。 When the operation on the target file 501 is a write operation, the analysis unit 340 extracts, from the Write operation log 370, the log whose file name matches that of the target file 501 and acquires the estimated ReadID list recorded in that log (step S2110).

解析部３４０は、推定ＲｅａｄＩＤリストに含まれる操作ＩＤと、Ｒｅａｄ操作ログ３６０とを照らし合わせ、当該操作ＩＤに関するファイル名を取得することで、ファイルリスト３８０を作成する（ステップＳ２１１１）。換言すると、この場合、ファイルリスト３８０に登録されたファイル名は、プロセスＩＤ５２１に該当するプロセスにより、対象ファイル５０１に書き込み操作が実行される前に、読み込み操作が実行されたファイルを表す。 The analysis unit 340 creates the file list 380 by comparing the operation ID included in the estimated ReadID list with the Read operation log 360 and acquiring the file name related to the operation ID (step S2111). In other words, in this case, the file name registered in the file list 380 represents a file in which the read operation is executed before the write operation is executed on the target file 501 by the process corresponding to the process ID 521.
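
The two ways of building the file list 380 (steps S2108 to S2111) can be sketched as below. The input shapes are assumptions for illustration: `write_log` maps a written file name to its estimated ReadID list, and `read_log` is a list of `(op_id, file_name)` pairs.

```python
def candidates_after_read(target_read_id, write_log):
    """Read case (S2108, S2109): files written after the target was read.

    Returns the names of writes whose estimated ReadID list contains the
    operation ID of the Read on the target file.
    """
    return [name for name, read_ids in write_log.items()
            if target_read_id in read_ids]

def candidates_before_write(target_name, write_log, read_log):
    """Write case (S2110, S2111): files read before the target was written."""
    read_ids = write_log.get(target_name)
    if read_ids is None:
        return []  # no matching Write entry; no file list is created
    name_by_id = dict(read_log)
    return [name_by_id[i] for i in read_ids if i in name_by_id]
```

An empty result corresponds to the case where no file list is created, in which the target file may be treated as newly created.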

上記処理において、ファイルリスト３８０が作成されない場合は、解析部３４０は、対象ファイル５０１について、新規作成されたファイルとして扱ってもよい。 In the above process, when the file list 380 is not created, the analysis unit 340 may treat the target file 501 as a newly created file.

解析部３４０は、対象ファイル５０１と、ファイルリスト３８０に記録されたファイルとの間の類似度を判定する（ステップＳ２１１２）。解析部３４０は、例えば、コサイン類似度計算など、既存の類似判定技術を用いて、これらのファイルの類似度を判定し、ファイルリスト３８０に記録された各ファイルに対するスコアを算出する。 The analysis unit 340 determines the similarity between the target file 501 and the files recorded in the file list 380 (step S2112). The analysis unit 340 determines the similarity of these files using an existing similarity determination technique such as cosine similarity calculation, and calculates a score for each file recorded in the file list 380.

なお、対象ファイル５０１は、新規作成されたファイルである可能性がある。よって、新規作成の場合を判別にするために、新規作成に該当する項目と、そのスコアとが、ファイルリスト３８０に設定されてもよい。新規作成に設定されるスコアは、一定の値又はファイルサイズに応じて変化する値等であってもよい。 Note that the target file 501 may be a newly created file. Therefore, in order to determine the case of new creation, the item corresponding to the new creation and its score may be set in the file list 380. The score set for new creation may be a fixed value or a value that changes according to the file size.

解析部３４０は、ファイルリスト３８０に記録されたファイルを、スコア順に並べることで、対象ファイル５０１から更新が行われた可能性があるファイルを列挙することができる。解析部３４０は、ファイルリスト３８０に記録されたファイルの内、ある基準値（基準スコア）以上のスコアが算出されたファイルについて、対象ファイル５０１に関する派生先ファイルであると推定できる。ある基準スコア以上のスコアを新規作成のスコアとすることで、ファイルが新規作成された場合についても推定することができる。 The analysis unit 340 can list the files that may have been updated from the target file 501 by arranging the files recorded in the file list 380 in the order of the scores. The analysis unit 340 can estimate that among the files recorded in the file list 380, a file having a score equal to or higher than a certain reference value (reference score) is a derivation destination file for the target file 501. By setting a score above a certain reference score as a newly created score, it is possible to estimate the case where a file is newly created.
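
As an illustration of the scoring step (S2112), the sketch below uses a bag-of-words cosine similarity over whitespace-separated tokens. The text only says that an existing similarity technique such as cosine similarity may be used; the tokenization, the `"<new>"` marker, and the fixed score of 0.3 for new creation are assumptions chosen for the example.

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two texts using whitespace token counts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def rank_candidates(target_text, candidate_texts, new_file_score=0.3):
    """Score every candidate against the target and sort by score.

    `candidate_texts` maps file name -> file content. A "<new>" entry with
    a fixed score stands for the "newly created" case.
    """
    scored = [(name, cosine_similarity(target_text, text))
              for name, text in candidate_texts.items()]
    scored.append(("<new>", new_file_score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Candidates whose score is at or above a chosen reference score would be treated as related by derivation; if `"<new>"` ranks first, the target file can be treated as newly created.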

なお、読み込みファイルと書き込みファイルとが１対１に対応している場合には、派生先ファイルが発生していないと考えられる。この場合、解析部３４０は、そのようなファイルをスコア計算から除外することで、誤検出の割合を減らすことができる。 If the read file and the write file have a one-to-one correspondence, it is considered that no derivation destination file has occurred. In this case, the analysis unit 340 can reduce the false detection rate by excluding such a file from the score calculation.

上記のように構成された本変形例における履歴解析装置３００によれば、一つのプロセスで複数のファイルを扱うアプリケーションなどによりファイルが操作された際に、ファイルの派生関係を推定することができる。 According to the history analysis apparatus 300 in the present modification configured as described above, when a file is operated by an application or the like that handles a plurality of files in one process, the derivation relationship of the files can be estimated.

なお、上記第１の実施形態及びその変形例（第１乃至第３の変形例）においては、ファイルサーバ１００と、クライアント２００と、履歴解析装置３００とが、それぞれ独立した物理的又は論理的な装置として実現される態様について説明した。しかしながら、本開示は上記には限定されない。例えば、いずれかのクライアント２００が、履歴解析装置３００の構成要素を備えることで、履歴解析装置３００の機能を備えてもよい。即ち、クライアント２００と、履歴解析装置３００とが統合された態様も、本開示に含まれる。 In the first embodiment and its modifications (first to third modifications), the description has covered an aspect in which the file server 100, the client 200, and the history analysis apparatus 300 are each realized as independent physical or logical apparatuses. However, the present disclosure is not limited to this. For example, any of the clients 200 may provide the function of the history analysis apparatus 300 by including the components of the history analysis apparatus 300. That is, an aspect in which the client 200 and the history analysis apparatus 300 are integrated is also included in the present disclosure.

＜第２の実施形態＞ <Second Embodiment>
以下、上記説明した本開示の第１の実施形態の基礎となる、本開示に係る第２の実施形態について説明する。 Hereinafter, the second embodiment according to the present disclosure, which is the basis of the first embodiment of the present disclosure described above, will be described.

図２２Ａは、本実施形態における履歴解析装置２２００の機能的な構成を例示するブロック図である。図２２Ａに例示するように、履歴解析装置２２００は、ログ取得部２２０１と、解析部２２０２と、を備える。履歴解析装置２２００を構成するこれらの構成要素の間は、適切な通信方法を用いて通信可能に接続されていてもよい。 FIG. 22A is a block diagram illustrating a functional configuration of the history analysis apparatus 2200 according to this embodiment. As illustrated in FIG. 22A, the history analysis device 2200 includes a log acquisition unit 2201 and an analysis unit 2202. These components constituting the history analysis device 2200 may be communicably connected using an appropriate communication method.

履歴解析装置２２００は、図２２Ｂに例示するように、ファイル提供装置２２１０、ファイル操作装置２２２０と通信ネットワークを介して通信可能に接続されていてもよい。ファイル提供装置２２１０は、例えば、上記第１の実施形態及びその変形例におけるファイルサーバ１００と同様に構成された装置であってもよい。ファイル操作装置２２２０は、上記第１の実施形態及びその変形例におけるクライアント２００と同様に構成された装置であってもよい。 The history analysis apparatus 2200 may be communicably connected to the file providing apparatus 2210 and the file operation apparatus 2220 via a communication network, as illustrated in FIG. 22B. The file providing device 2210 may be, for example, a device configured in the same manner as the file server 100 in the first embodiment and the modifications thereof. The file operation device 2220 may be a device configured in the same manner as the client 200 in the first embodiment and the modifications thereof.

履歴解析装置２２００におけるログ取得部２２０１は、ファイル提供装置２２１０において提供されるファイルに対するファイル操作装置２２２０からのアクセスに関する情報を含むアクセスログを取得する。また、ログ取得部２２０１は、ファイル提供２２１０装置又はファイル操作装置２２２０に配置されたファイルに対して、ファイル操作装置２２２０におけるプロセスにより実行された操作に関する情報を含むプロセス操作ログを取得する The log acquisition unit 2201 in the history analysis apparatus 2200 acquires an access log including information on accesses from the file operation device 2220 to files provided in the file providing device 2210. The log acquisition unit 2201 also acquires a process operation log including information on operations executed by a process in the file operation device 2220 on files arranged in the file providing device 2210 or the file operation device 2220.

アクセスログには、例えば、以下のような情報が含まれてもよい。アクセスログには、例えば、ファイル提供装置２２１０において提供されるファイルに対する操作が実行された時間を表す情報が含まれてもよい。アクセスログには、例えば、当該ファイルにアクセスしたファイル操作装置２２２０を特定可能な情報が含まれてもよい。アクセスログには、例えば、当該ファイルそのものを特定可能な情報（例えば、パス名、ファイル名等）が含まれてもよい。アクセスログには、例えば、当該ファイルに対して実行された操作の内容（例えば、読み込み、書き込み、削除、リネーム等）表す情報が含まれてもよい。アクセスログには、例えば、当該ファイルのサイズを表す情報が含まれてもよい。アクセスログは、例えば、上記第１の上記第１の実施形態及びその変形例におけるサーバアクセスログ１２０と同様のデータを保持してもよい。 The access log may include, for example, the following information. The access log may include information indicating the time at which an operation on a file provided in the file providing device 2210 was executed. The access log may include information that can identify the file operation device 2220 that accessed the file. The access log may include information that can identify the file itself (for example, a path name or a file name). The access log may include information indicating the content of the operation executed on the file (for example, read, write, delete, or rename). The access log may include information indicating the size of the file. The access log may, for example, hold the same data as the server access log 120 in the first embodiment and its modifications.

プロセス操作ログには、例えば、以下のような情報が含まれてもよい。プロセス操作ログには、例えば、ファイル操作装置２２２０において上記プロセスが実行された時間を表す情報が含まれてもよい。プロセス操作ログには、例えば、当該プロセスを識別可能な情報（識別子等）が含まれてもよい。プロセス操作ログには、当該プロセスにより操作（アクセスされた）ファイルを特定可能な情報（パス名、ファイル名）等が含まれてもよい。プロセス操作ログには、当該プロセスにより実行された操作の内容を表す情報が含まれてもよい。プロセス操作ログには、例えば、当該ファイルのサイズを表す情報が含まれてもよい。プロセス操作ログは、例えば、上記第１の上記第１の実施形態及びその変形例におけるファイルアクセスログ２３０及びディスク操作ログ２４０少なくとも一方と同様のデータを保持してもよい。 The process operation log may include the following information, for example. The process operation log may include, for example, information indicating the time when the process is executed in the file operation device 2220. The process operation log may include, for example, information (identifier or the like) that can identify the process. The process operation log may include information (path name, file name) and the like that can identify the file operated (accessed) by the process. The process operation log may include information indicating the content of the operation executed by the process. The process operation log may include information indicating the size of the file, for example. For example, the process operation log may hold data similar to at least one of the file access log 230 and the disk operation log 240 in the first embodiment and the modifications thereof.

係るログ取得部２２０１は、例えば、上記第１の実施形態及びその変形例におけるログ取得部３１０と同様に構成されてもよく、同様の機能を提供してもよい。 For example, the log acquisition unit 2201 may be configured in the same manner as the log acquisition unit 310 in the first embodiment and the modifications thereof, and may provide the same function.

解析部２２０２は、上記アクセスログ及び上記プロセス操作ログから、上記ファイル提供装置において提供された操作対象ファイルを操作した上記プロセスである操作プロセスを特定する。具体的には、解析部２２０２は、例えば、アクセスログとプロセス操作ログとから、操作対象ファイルに関するログを抽出してもよい。そして、解析部２２０２は、抽出した操作対象ファイルに関するログのうち、アクセスログとプロセス操作ログとにおいて操作が実行された時間と、操作の内容とが共通するログを更に抽出し、当該ログに関する操作を実行したプロセスを特定してもよい。 The analysis unit 2202 identifies, from the access log and the process operation log, the operation process, that is, the process that operated on the operation target file provided in the file providing device. Specifically, the analysis unit 2202 may, for example, extract the logs relating to the operation target file from the access log and the process operation log. The analysis unit 2202 may then further extract, from those logs, the logs for which the time at which the operation was executed and the content of the operation are common to the access log and the process operation log, and identify the process that executed the operation recorded in those logs.
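
The matching described above can be sketched as follows. The tuple layouts are assumptions for illustration: access log entries are `(time, device, file_name, operation)` and process operation log entries are `(time, pid, file_name, operation)`. Exact time equality is used here for simplicity; a real log comparison would likely need a tolerance.

```python
def find_operation_processes(access_log, process_log, target_name):
    """Return the process IDs whose logged operation on the target file
    matches an access log entry in both time and operation content."""
    # (time, operation) pairs seen for the target in the access log.
    server_side = {(t, op) for t, _device, name, op in access_log
                   if name == target_name}
    # Processes with an entry for the same file, time and operation.
    return sorted({pid for t, pid, name, op in process_log
                   if name == target_name and (t, op) in server_side})
```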

解析部２２０２は、上記操作対象ファイルに対して実行された操作の種類に応じて、上記操作対象ファイルの派生元ファイルに対して実行され得る操作である派生元操作に関するログのうち、派生元操作に関する条件を満たすログを上記プロセス操作ログから抽出する。 In accordance with the type of operation executed on the operation target file, the analysis unit 2202 extracts from the process operation log, among the logs relating to a derivation source operation (an operation that can be executed on the derivation source file of the operation target file), the logs that satisfy a condition relating to the derivation source operation.

以下、具体例として、上記操作対象ファイルに対して実行された操作の種類が書き込み操作（Ｗｒｉｔｅ操作）である場合について説明する。この場合、解析部２２０２は、原本ファイルから読み込み操作により読み込まれたデータが、派生先ファイルである操作対象ファイルに書き込まれると推定する。よって、解析部２２０２は、例えば、操作対象ファイルの原本ファイルに対して実行され得る操作（派生元操作）として、読み込み操作（Ｒｅａｄ操作）を特定する。そして、解析部２２０２は、プロセス操作ログから、読み込み操作に関するログのうち、ある特定の条件を満たすログを抽出する。具体的には、解析部２２０２は、例えば、プロセス操作ログに含まれる読み込み操作の内、操作対象ファイルに関する書き込み操作が実行された時間と最も近い時間に実行された読み込み操作に関するログを抽出してもよい。なお、上記特定の条件は、操作対象ファイルに実行された操作と、派生元操作との間の関係等に応じて適宜定められてよい。 Hereinafter, as a specific example, a case where the type of operation executed on the operation target file is a write operation (Write operation) will be described. In this case, the analysis unit 2202 estimates that data read from the original file by a read operation is written to the operation target file, which is the derivation destination file. Accordingly, the analysis unit 2202 specifies, for example, a read operation (Read operation) as the operation (derivation source operation) that can be executed on the original file of the operation target file. The analysis unit 2202 then extracts, from the logs relating to read operations in the process operation log, the logs that satisfy a specific condition. Specifically, the analysis unit 2202 may, for example, extract the log of the read operation that, among the read operations included in the process operation log, was executed at the time closest to the time at which the write operation on the operation target file was executed. The specific condition may be determined as appropriate according to, for example, the relationship between the operation executed on the operation target file and the derivation source operation.
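
For the write case described above, the "closest read" condition can be sketched as follows. Process operation log entries are assumed to be `(time, pid, file_name, operation)` tuples, and the sketch additionally assumes that only reads at or before the write time are considered; the text itself only requires the read closest in time.

```python
def find_original_file(process_log, pid, write_time):
    """Return the file read by `pid` closest in time before the write,
    or None if the process performed no earlier read."""
    reads = [(t, name) for t, p, name, op in process_log
             if p == pid and op == "Read" and t <= write_time]
    if not reads:
        return None
    # The latest read not after the write is the closest one.
    return max(reads)[1]
```

The returned file would be treated as the original file of the operation target file; `None` leaves the origin undetermined.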

The analysis unit 2202 identifies the file recorded in the extracted log of the derivation source operation as the original file for the operation target file. The original file may be held in the file operation device 2220.

The analysis unit 2202 may, for example, be configured in the same manner as the analysis unit 340 in the first embodiment and its modifications, and may provide the same functions.

The history analysis device 2200 of the present embodiment, configured as described above, can identify a derivation relationship between pieces of data placed in different devices (for example, the file providing device 2210 and the file operation device 2220) on the basis of logs relating to that data (for example, the access log and the process operation log). The reason is as follows. First, the history analysis device 2200 can identify the operation process that executed an operation on the operation target file placed in the file providing device 2210. Second, from among the logs of the operations that the operation process executed in the file operation device 2220, the history analysis device 2200 can identify operations that may have been executed on the original file of the operation target file. In other words, it can extract from the logs the candidate operations that may have been executed on the original file. By selecting, from those candidate logs, a log that satisfies a specific condition, the original file of the operation target file can be identified. Because the operation process can operate on files placed in the file operation device 2220, the original file may be placed in the file operation device 2220. The history analysis device 2200 can therefore identify, for example, a derivation relationship between the operation target file placed in the file providing device 2210 and the original file placed in the file operation device 2220.
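The two-step reasoning above (first identify the operation process, then pick a candidate log for the original file) can be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the embodiment: the record types, the field names, the matching of access log entries to process log entries by path, operation type, and a time tolerance, and the nearest-read selection rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class AccessLogEntry:
    """Hypothetical record from the file providing device's access log."""
    path: str
    operation: str      # "Read" or "Write"
    timestamp: datetime


@dataclass(frozen=True)
class ProcessLogEntry:
    """Hypothetical record from the file operation device's process operation log."""
    process_id: int
    path: str
    operation: str
    timestamp: datetime


def find_operation_process(access: AccessLogEntry,
                           process_log: List[ProcessLogEntry],
                           tolerance: timedelta = timedelta(seconds=2)) -> Optional[int]:
    """Step 1: find the process whose logged operation matches the access log entry.

    Matching on path, operation type, and timestamp proximity is an assumption.
    """
    for e in process_log:
        if (e.path == access.path and e.operation == access.operation
                and abs(e.timestamp - access.timestamp) <= tolerance):
            return e.process_id
    return None


def identify_original_file(access: AccessLogEntry,
                           process_log: List[ProcessLogEntry]) -> Optional[str]:
    """Step 2: for a write, pick the read by the same process closest in time."""
    pid = find_operation_process(access, process_log)
    if pid is None or access.operation != "Write":
        return None
    reads = [e for e in process_log if e.process_id == pid and e.operation == "Read"]
    if not reads:
        return None
    nearest = min(reads, key=lambda e: abs(e.timestamp - access.timestamp))
    return nearest.path


# Usage example with made-up entries: a local file is read, then written to a share.
t0 = datetime(2016, 3, 18, 10, 0, 0)
server_access = AccessLogEntry("//server/share/report.doc", "Write",
                               t0 + timedelta(seconds=30))
client_log = [
    ProcessLogEntry(42, "C:/local/draft.doc", "Read", t0),
    ProcessLogEntry(42, "//server/share/report.doc", "Write",
                    t0 + timedelta(seconds=31)),
]
derived_from = identify_original_file(server_access, client_log)
```

Here the operation target file lives on the file providing device while the candidate original file lives on the file operation device, which is the cross-device case the paragraph describes.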

The embodiments above describe examples in which the technology of the present disclosure is applied to a history analysis device (300, 2200). In each embodiment, the history analysis method of the present disclosure can be carried out, for example, by operating the history analysis device (300, 2200). The way the history analysis method is carried out is not limited to this; it can also be carried out by any suitable device (for example, an information processing device such as a computer) capable of executing the same operations or processing as the history analysis device (300, 2200). The technology of the present disclosure may also be implemented as a system including the history analysis device 300, the file server 100, and the client 200, or as a system including the history analysis device 2200, the file providing device 2210, and the file operation device 2220.

<Configuration of hardware and software program (computer program)>
A hardware configuration capable of realizing each of the embodiments described above is described below.

In the following description, the history analysis devices (300, 2200) described in the above embodiments are collectively referred to simply as the "history analysis device". The components of these history analysis devices may likewise be referred to simply as "components of the history analysis device".

The history analysis device described in the above embodiments may be configured with one or more dedicated hardware devices. In that case, the components shown in the drawings (FIGS. 1, 10, 15, 17, 22A, and 22B) may be realized using hardware in which some or all of them are integrated (for example, an integrated circuit implementing the processing logic, or a storage device).

When the history analysis device according to the present disclosure is realized with dedicated hardware, its components may be realized, for example, by circuitry capable of providing the respective functions. Such circuitry includes, for example, an integrated circuit such as an SoC (System on a Chip) and a chipset built from that integrated circuit. In this case, the data held by the components of the history analysis device may be stored, for example, in a RAM (Random Access Memory) area or flash memory area integrated into the SoC, or in a storage device (such as a semiconductor storage device) connected to the SoC. Also in this case, a well-known communication network may be used as the communication line connecting the components of the history analysis device. Alternatively, the communication line may connect the components to one another peer-to-peer.

The history analysis device described above may also be configured with general-purpose hardware as illustrated in FIG. 23 and various software programs (computer programs) executed by that hardware. In this case, the history analysis device may be configured as a combination of an appropriate number of general-purpose hardware devices 2300 and software programs.

The arithmetic device 2301 in FIG. 23 is an arithmetic processing device such as a general-purpose CPU (Central Processing Unit) or a microprocessor. The arithmetic device 2301 may, for example, read various software programs stored in the nonvolatile storage device 2303 (described later) into the storage device 2302 and execute processing according to those software programs. For example, the functions of the components of the history analysis device in each of the above embodiments may be realized using software programs executed by the arithmetic device 2301.

The arithmetic device 2301 may be, for example, a multi-core CPU having a plurality of CPU cores. The arithmetic device 2301 may also be a multi-threaded CPU in which a single CPU core can execute a plurality of threads. The hardware device 2300 may include a plurality of arithmetic devices 2301.

The storage device 2302 is a memory device, such as a RAM, that can be referenced by the arithmetic device 2301, and stores software programs, various data, and the like. The storage device 2302 may be a volatile memory device.

The nonvolatile storage device 2303 is a nonvolatile storage device such as a magnetic disk drive or a flash-memory-based semiconductor storage device. The nonvolatile storage device 2303 can store various software programs, data, and the like.

The drive device 2304 is, for example, a device that handles reading data from and writing data to the recording medium 2305 described later.

The recording medium 2305 is any recording medium capable of recording data, such as an optical disk, a magneto-optical disk, or a semiconductor flash memory.

The network interface 2306 is a device that controls the transmission and reception of data to and from a communication network. The history analysis device may acquire various logs from the file server 100 (file providing device 2210), the client 200 (file operation device 2220), and the like via the network interface 2306.

The history analysis device of the present invention described through the above embodiments, or its components, may be realized, for example, by supplying the hardware device 2300 illustrated in FIG. 23 with a software program capable of realizing the functions described in those embodiments.

More specifically, the present invention may be realized, for example, by the arithmetic device 2301 executing the software program supplied to the hardware device 2300. In this case, part of each process may be executed by the operating system running on the hardware device 2300 or by middleware such as database management software, network software, or a virtual environment platform.

In each of the embodiments described above, each unit shown in the drawings can be realized as a software module, that is, a functional (processing) unit of the software program executed by the hardware described above. The division into software modules shown in the drawings is, however, only a convenience for explanation, and various configurations are possible in an actual implementation.

When the components of the history analysis device illustrated in the above embodiments and their modifications are realized as software modules, these software modules may, for example, be stored in the nonvolatile storage device 2303. The arithmetic device 2301 may then read these software modules into the storage device 2302 when it executes the respective processes.

These software modules may also be configured so that they can pass various data to one another by an appropriate method such as shared memory or inter-process communication. With such a configuration, the software modules are communicably connected to one another.

Furthermore, the software program may be recorded on the recording medium 2305. In this case, the software program may be configured to be stored in the nonvolatile storage device 2303 through the drive device 2304 as appropriate, for example at the shipping stage or the operation stage of the history analysis device.

In the above case, the various software programs may be supplied to the hardware device 2300 by installing them in the device with an appropriate jig at the manufacturing stage before shipment or at the maintenance stage after shipment. Alternatively, the software programs may be supplied by a procedure that is now common, such as downloading them from outside over a communication line such as the Internet.

In such a case, the present invention can be regarded as being constituted by the code making up the software program, or by a computer-readable recording medium on which that code is recorded. The recording medium here is not limited to a medium independent of the hardware device 2300; it also includes a recording medium on which a software program transmitted over a LAN, the Internet, or the like has been downloaded and stored or temporarily stored.

The components of the history analysis device described above may also be configured with a virtualized environment in which the hardware device 2300 illustrated in FIG. 23 is virtualized, and with various software programs (computer programs) executed in that virtualized environment. In this case, the components of the hardware device 2300 illustrated in FIG. 23 are provided as virtual devices in the virtualized environment. In this case too, the present invention can be realized with the same configuration as when the hardware device 2300 illustrated in FIG. 23 is configured as a physical device.

The present invention has been described above as examples applied to the exemplary embodiments. The technical scope of the present invention is not, however, limited to the scope described in those embodiments. It will be apparent to those skilled in the art that various changes or improvements can be made to the embodiments. New embodiments incorporating such changes or improvements can also be included in the technical scope of the present invention. Embodiments that combine the embodiments described above are also included in the technical scope of the present invention. Furthermore, embodiments that combine the embodiments described above with new embodiments obtained by changing or improving them can also be included in the technical scope of the present invention. This is clear from the matters recited in the claims.

100 File server
110 Shared storage device
120 Server access log
200 Client
210 Storage device
220 CPU
300 History analysis device
310 Log acquisition unit
340 Analysis unit
400 Network
2200 History analysis device
2201 Log acquisition unit
2202 Analysis unit
2210 File providing device
2220 File operation device
2301 Arithmetic device
2302 Storage device
2303 Nonvolatile storage device
2304 Drive device
2305 Recording medium
2306 Network interface

Claims

A history analysis device comprising:
a log acquisition unit that acquires an access log including information on access from a file operation device to a file provided by a file providing device, and a process operation log including information on operations executed by a process in the file operation device on files placed in the file providing device or the file operation device; and
an analysis unit that identifies, from the access log and the process operation log, an operation process that is the process that operated on an operation target file provided by the file providing device, extracts from the process operation log, according to the type of operation executed on the operation target file, a log that satisfies a specific condition from among logs of a derivation source operation that is an operation that can be executed on an original file of the operation target file, and identifies the file recorded in the extracted log of the derivation source operation as the original file for the operation target file.

The history analysis device according to claim 1, wherein, when the operation executed on the operation target file by the operation process is a write operation, and the process operation log includes a log of a read operation that is the derivation source operation executed on a file by the operation process after a process start time at which the operation process was started and before the time at which the write operation on the operation target file was executed, the read operation being the one executed at the time closest to the time at which the write operation was executed, the analysis unit identifies the file on which that read operation was executed as the original file for the operation target file.

The history analysis device according to claim 2, wherein the analysis unit
determines that the original file was edited by the operation process when the name of the identified original file matches the name of the operation target file, and
determines that the original file was saved under a different name by the operation process when the name of the identified original file does not match the name of the operation target file.

The history analysis device according to claim 2, wherein, when the size of the identified original file matches the size of the operation target file, the analysis unit determines whether the original file was moved as the operation target file by the operation process or copied as the operation target file by the operation process, based on whether information indicating deletion of the original file is recorded in the process operation log within a reference time after the time at which the write operation on the operation target file was executed.

The history analysis device according to claim 2, wherein
the log acquisition unit further acquires a network operation log, which is a log of data transmission and reception operations executed over a communication network in the file operation device, and
when the operation executed on the operation target file by the operation process is a write operation, and the network operation log includes a log of a download operation that is the derivation source operation on a file executed by the operation process after the process start time, in which the size of the data received by the download operation, or a corrected size calculated from the size of the received data, matches the size of the operation target file, the analysis unit identifies the received data as the original file for the operation target file.

The history analysis device according to claim 2, wherein the analysis unit
extracts, from the process operation log, logs of read operations executed on the operation target file and, for each extracted log, identifies the operation process that executed the read operation on the operation target file, and
identifies, from the process operation log, a log of a write operation executed by the operation process after the time at which the operation process executed the read operation on the operation target file, and identifies the file on which the write operation recorded in the identified log was executed as a derivation destination file for the operation target file.

The analysis unit
A process all operation log list that is a log related to an operation executed by the operation process is extracted from the process operation log between the process start time and a process end time when the operation process is finished.
Create a read operation log list obtained by extracting the log related to the read operation from the process all operation log list,
Create a write operation log list obtained by extracting a log related to a read operation executed after the time when an operation related to the operation target file is executed from the process all operation log list,
With reference to the read operation log list, for every log included in the write operation log list, if there is a read operation executed before the execution time of the write operation recorded in that log, associate one or more logs of such read operations with that log as an estimated read candidate list,
When the operation related to the operation target file is a read operation, the file recorded in the log of the write operation log list associated with the estimated read candidate list is recorded in the candidate file list,
When the operation related to the operation target file is a write operation, the estimated read candidate list associated with the log related to the write operation executed on the file having the same name as the operation target file in the write operation log list. Record the file in the candidate file list;
Calculating a score representing the degree of similarity between the file recorded in the candidate file list and the operation target file;
Among the files recorded in the candidate file list, the file whose score is equal to or higher than a reference value is specified as a derivation destination file for the operation target file.
The history analysis apparatus according to claim 2.

A history analysis method comprising:
acquiring an access log including information on access from a file operation device to a file provided by a file providing device, and a process operation log including information on operations executed by a process in the file operation device on files placed in the file providing device or the file operation device;
identifying, from the access log and the process operation log, an operation process that is the process that operated on an operation target file provided by the file providing device;
extracting from the process operation log, according to the type of operation executed on the operation target file, a log that satisfies a specific condition from among logs of a derivation source operation that is an operation that can be executed on an original file of the operation target file; and
identifying the file on which the derivation source operation recorded in the extracted log was executed as the original file for the operation target file.

A program causing a computer to execute:
a process of acquiring an access log including information on access from a file operation device to a file provided by a file providing device, and a process operation log including information on operations executed by a process in the file operation device on files placed in the file providing device or the file operation device;
a process of identifying, from the access log and the process operation log, an operation process that is the process that operated on an operation target file provided by the file providing device;
a process of extracting from the process operation log, according to the type of operation executed on the operation target file, a log that satisfies a specific condition from among logs of a derivation source operation that is an operation that can be executed on an original file of the operation target file; and
a process of identifying the file on which the derivation source operation recorded in the extracted log was executed as the original file for the operation target file.

A history analysis system comprising:
a file providing device capable of providing, via a communication network, an operation target file and an access log including information on access to the operation target file;
a file operation device capable of executing, by running a process in the device itself, operations on the operation target file and on files held in the device itself, and capable of providing, via the communication network, a process operation log including information on those operations; and
a history analysis device including
a log acquisition unit that acquires, via the communication network, the access log from the file providing device and the process operation log from the file operation device, and
an analysis unit that identifies, from the access log and the process operation log, an operation process that is the process that operated on the operation target file provided by the file providing device, extracts from the process operation log, according to the type of operation executed on the operation target file, a log that satisfies a specific condition from among logs of a derivation source operation that is an operation that can be executed on an original file of the operation target file, and identifies the file recorded in the extracted log of the derivation source operation as the original file for the operation target file.