JP2019028788A

JP2019028788A - Secret word specifying apparatus, secret word specifying method, and secret word specifying program

Info

Publication number: JP2019028788A
Application number: JP2017148345A
Authority: JP
Inventors: Satomi Saito (齊藤聡美); Satoru Torii (鳥居悟); Hiroshi Tsuda (津田宏)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2019-02-21

Abstract

To improve the accuracy of extracting the storage locations of confidential information. SOLUTION: A secret word specifying apparatus has a counting unit and a specifying unit. Among the file path character strings of the files stored in directories held in a storage apparatus, the counting unit counts the number of occurrences of each word in every file path character string that contains any of one or more pre-registered character strings. When the numbers of occurrences of those words satisfy a predetermined condition, the specifying unit specifies each of those words as a word related to confidential information. SELECTED DRAWING: Figure 3

Description

The present invention relates to a confidential word identification device, a confidential word identification method, and a confidential word identification program.

An in-house shared file server may manage not only files that many users can freely operate on but also files that store confidential information to which access is strictly restricted (hereinafter, "confidential files"). For example, a directory (hereinafter, "confidential directory") is created for each type of confidential information, and each confidential file is saved in the directory corresponding to the type of confidential information it stores.

Patent Document 1: JP 2009-116680 A
Patent Document 2: JP 2002-334061 A
Patent Document 3: JP 2013-137740 A
Patent Document 4: JP 2007-148946 A
Patent Document 5: JP 2013-191188 A
Patent Document 6: JP 2010-250502 A
Patent Document 7: JP 2012-068833 A
Patent Document 8: JP 2013-191188 A

Access to a confidential file may be an unauthorized file access. If access to confidential files could be detected automatically from the access logs recorded for each file access, the work of detecting unauthorized file access could be made more efficient.

However, because the confidential information handled within a company may differ from department to department, it is difficult to determine which of the many directories on a file server are confidential directories.

Patent Document 7 proposes determining a confidentiality label based on the content contained in document information. However, because confidentiality labels take many different forms within files, the label may not be determinable.

Accordingly, in one aspect, an object of the present invention is to improve the accuracy of extracting the storage locations of confidential information.

In one aspect, the confidential word identification device includes: a counting unit that, among the file path character strings of the files stored in directories held in a storage device, counts the number of occurrences of each word constituting a file path character string that contains any of one or more pre-registered character strings; and a specifying unit that specifies each such word as a word related to confidential information when the numbers of occurrences of the words satisfy a predetermined condition.

In one aspect, the accuracy of extracting the storage locations of confidential information can be improved.

FIG. 1 is a diagram illustrating an example of the system configuration in the first embodiment.
FIG. 2 is a diagram illustrating an example of the hardware configuration of the analysis apparatus 10 in the first embodiment.
FIG. 3 is a diagram illustrating an example of the functional configuration of the analysis apparatus 10 in the first embodiment.
FIG. 4 is a diagram for explaining an overview of the functions of the co-occurrence analysis unit 11.
FIG. 5 is a flowchart illustrating an example of a procedure for generating learning lists and determining confidential directories.
FIG. 6 is a diagram illustrating an example of a directory list.
FIG. 7 is a diagram illustrating an example of an initial list.
FIG. 8 is a diagram for explaining an example of a matrix of co-occurrence analysis results.
FIG. 9 is a diagram illustrating an example of a learning list.
FIG. 10 is a flowchart illustrating an example of a procedure for extracting unauthorized access.
FIG. 11 is a diagram illustrating an example of an access log.
FIG. 12 is a diagram illustrating an example of the functional configuration of the analysis apparatus 10 in the second embodiment.
FIG. 13 is a diagram for explaining the generation of a cross tabulation table.
FIG. 14 is a diagram for explaining the identification of continuation events.
FIG. 15 is a flowchart illustrating an example of a procedure for generating a cross tabulation table.
FIG. 16 is a diagram illustrating an example configuration of a cross tabulation table.
FIG. 17 is a flowchart illustrating an example of a procedure for identifying continuation events.
FIG. 18 is a diagram illustrating an example configuration of the continuation event storage unit 19.
FIG. 19 is a flowchart illustrating an example of a procedure for classifying access logs.

A first embodiment is described below with reference to the drawings. FIG. 1 is a diagram illustrating an example of the system configuration in the first embodiment. In FIG. 1, one or more user terminals 30, such as user terminals 30a, 30b, and 30c (hereinafter simply "user terminals 30" when they need not be distinguished), are connected to a file server 20 via a network such as a LAN (Local Area Network) or intranet within a certain company. An analysis apparatus 10 is also connected to the file server 20 via the network. In the present embodiment, this company is referred to as "company A".

The file server 20 is one or more computers (storage devices) that store files in company A, for example files shared by multiple employees and files managed by organizations within company A. Here, a file means a file that stores electronic data.

Each user terminal 30 is a terminal from which files stored in the file server 20 are accessed. For example, a PC (Personal Computer), smartphone, or tablet terminal may be used as a user terminal 30. Access from a user terminal 30 includes access that occurs at arbitrary times through operations by the terminal's user (hereinafter, "user access") and mechanical access that occurs periodically or repeatedly through a program installed on the terminal (hereinafter, "machine access"). Examples of machine access include access for periodic file backups and access by a file search function. For example, access with the following characteristics is assumed to be machine access: it occurs at regular times (several times a day, once every half day, etc.) and within a short period (1 minute, 10 minutes, etc.); each file is accessed only once; and accesses occurring within that short period share the same source IP address. However, the source IP address may differ from one occurrence to the next, for example when the user terminal 30 is assigned a different IP address every day.
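The per-burst characteristics listed above can be checked mechanically. The following is a minimal sketch, assuming the accesses have already been grouped into one candidate "burst"; the `Access` record, the function name, and the 10-minute default span are hypothetical choices for illustration, and the regular-timing criterion (recurrence several times a day) is not checked here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Access:
    src_ip: str      # access-source IP address
    time: datetime   # access time
    path: str        # path of the accessed file


def looks_like_machine_burst(burst, max_span=timedelta(minutes=10)):
    """Hypothetical check of one burst against the per-burst traits above.

    Returns True when the burst (1) fits within max_span, (2) touches each
    file exactly once, and (3) comes from a single source IP address.
    """
    if not burst:
        return False
    times = [a.time for a in burst]
    within_short_period = max(times) - min(times) <= max_span
    each_file_once = all(n == 1 for n in Counter(a.path for a in burst).values())
    single_source_ip = len({a.src_ip for a in burst}) == 1
    return within_short_period and each_file_once and single_source_ip
```

Because the source IP may change between occurrences, the single-IP test is applied only within a burst, never across bursts.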

User access, on the other hand, includes access for viewing and editing files. Examples of unauthorized user access include viewing files through impersonation and access for tampering with files.

The analysis apparatus 10 is one or more computers that support the detection of unauthorized access to files based on the logs recorded by the file server 20 for each file access (hereinafter, "access logs"). In the present embodiment, the analysis apparatus 10 extracts, from the path character strings of files that store confidential information (hereinafter, "confidential files"), keywords for identifying the directories that store confidential files (hereinafter, "confidential directories"). Based on those keywords, the analysis apparatus 10 then identifies the confidential directories and, from the access logs of past file accesses, extracts the access logs of file accesses under those confidential directories.

FIG. 2 is a diagram illustrating an example of the hardware configuration of the analysis apparatus 10 in the first embodiment. The analysis apparatus 10 of FIG. 2 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, and so on, which are interconnected by a bus B.

A program that implements the processing of the analysis apparatus 10 is provided on a recording medium 101. When the recording medium 101 on which the program is recorded is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. The program need not be installed from the recording medium 101, however; it may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program as well as required files, data, and the like.

When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 executes the functions of the analysis apparatus 10 according to the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to the network.

Examples of the recording medium 101 include portable recording media such as a CD-ROM, a DVD disc, or a USB memory. Examples of the auxiliary storage device 102 include an HDD (Hard Disk Drive) and flash memory. Both the recording medium 101 and the auxiliary storage device 102 are computer-readable recording media.

FIG. 3 is a diagram illustrating an example of the functional configuration of the analysis apparatus 10 in the first embodiment. In FIG. 3, the analysis apparatus 10 includes a co-occurrence analysis unit 11, an unauthorized access extraction unit 12, and so on. These units are implemented by processing that one or more programs installed on the analysis apparatus 10 cause the CPU 104 to execute. The analysis apparatus 10 also uses a confidential word storage unit 13, a confidential directory list storage unit 14, an unauthorized access candidate storage unit 15, and so on. Each of these storage units can be implemented using, for example, the auxiliary storage device 102 or a storage device connectable to the analysis apparatus 10 via the network.

In its initial state, the confidential word storage unit 13 stores an initial list. The initial list is a list of preset confidential words, where a confidential word (hereinafter, "confidential word") is a character string, such as a word, that is highly likely to be related to confidential information. The confidential word storage unit 13 also stores learning lists. A learning list is a list of words that have been learned (estimated) to be confidential words from the file path character strings of the files stored in the file server 20.

For each directory of the file server 20, the co-occurrence analysis unit 11 extracts, from the file path character strings of the files stored in that directory, the words that appear together with confidential words in the initial list, and counts the number of occurrences of each such word. When the numbers of occurrences satisfy a predetermined condition (for example, when there are many such words and each of them appears many times), the words are added to the learning list as learned confidential words.

The co-occurrence analysis unit 11 determines (extracts) each directory that satisfies the predetermined condition as a "confidential directory".

FIG. 4 is a diagram for explaining an overview of the functions of the co-occurrence analysis unit 11. In FIG. 4, the rectangle representing the file server 20 shows an example directory structure on the file server 20. In this structure, the "Y project" directory contains an "Other-company information (secret)" directory, an "External materials" directory, and an "Other-company product introductions" directory. The files shown in FIG. 4 are stored in each of these directories.

FIG. 4 also shows that the confidential words constituting the initial list are "secret" and "other company".

In this situation, for the words constituting the file path character strings of the three files stored in the "Y project/Other-company information (secret)" directory, the co-occurrence analysis unit 11 counts the number of co-occurrences with each confidential word, as shown in matrix M1. For example, in matrix M1, the co-occurrence count of "secret" and "information" is 4. This is because the pair co-occurs twice in "Y project/Other-company information (secret)/Company A transaction information.xls" (where "information" appears twice), once in "Y project/Other-company information (secret)/Company B order budget.xls", and once in "Y project/Other-company information (secret)/Company C IP materials.xls", for a total of four.
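The count of 4 can be reproduced with a short sketch. It assumes the three paths have already been split into words (English glosses of the Japanese words are used here, and the splitting itself is hypothetical); every word other than the confidential word is counted, repeats included, which is why "information" contributes twice from the first path.

```python
from collections import Counter, defaultdict

# Hypothetical word splits of the three file paths in the
# "Y project/Other-company information (secret)" directory.
PATHS = [
    ["Y-project", "other-company", "information", "secret",
     "company-A", "transaction", "information", "xls"],
    ["Y-project", "other-company", "information", "secret",
     "company-B", "order", "budget", "xls"],
    ["Y-project", "other-company", "information", "secret",
     "company-C", "IP", "materials", "xls"],
]
INITIAL_LIST = ["secret", "other-company"]


def count_cooccurrences(tokenized_paths, confidential_words):
    """Return {confidential word: Counter(co-occurring word -> count)}."""
    matrix = defaultdict(Counter)
    for words in tokenized_paths:
        for cw in confidential_words:
            if cw in words:
                # Count every other word in the same path, repeats included.
                matrix[cw].update(w for w in words if w != cw)
    return matrix


m1 = count_cooccurrences(PATHS, INITIAL_LIST)
print(m1["secret"]["information"])  # 4 (2 + 1 + 1)
```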

For the "Y project/External materials" directory, the co-occurrence counts are tallied as shown in matrix M2, and for the "Y project/Other-company product introductions" directory, as shown in matrix M3.

The co-occurrence analysis unit 11 determines the confidential directories based on the counts in matrices M1, M2, and M3. The example of FIG. 4 shows the "Y project/Other-company information (secret)" directory being determined as a confidential directory. In this case, the list of words forming the columns of matrix M1 is stored in the confidential word storage unit 13 as the learning list for the "Y project/Other-company information (secret)" directory, in association with that directory. In other words, a learning list is generated for each directory determined to be a confidential directory.

From the access logs recorded for each access to the files stored in the file server 20, the unauthorized access extraction unit 12 extracts, as unauthorized access candidates, the access logs corresponding to accesses to files belonging to confidential directories, and stores the extracted access logs in the unauthorized access candidate storage unit 15.

The processing procedures executed by the analysis apparatus 10 are described below. FIG. 5 is a flowchart illustrating an example of a procedure for generating learning lists and determining confidential directories.

In step S101, the co-occurrence analysis unit 11 acquires from the file server 20 a list of the path character strings (path names) of the directories stored in (formed on) the file server 20 (hereinafter, the "directory list").

FIG. 6 is a diagram illustrating an example of a directory list. As shown in (1) of FIG. 6, the directory list is a list of directory path character strings. (1) of FIG. 6 is the directory list corresponding to the directory structure shown in (2) of FIG. 6. The directory list does not include the path character strings of files; this is why files are drawn with broken lines in (2). In (1), the "#" column holds identifiers assigned to the path character strings for convenience. The directory list may include the path character strings of all directories on the file server 20 or, for example, only those of some directories selected for analysis by an analyst or the like.

Next, the co-occurrence analysis unit 11 acquires one path character string from the directory list as the processing target (S102). The directory identified by this path character string is hereinafter called the "target directory". Path character strings that have already been processed in step S104 and the subsequent steps are excluded from acquisition.

If no path character string can be acquired (that is, all path character strings in the directory list have been processed) (No in S103), the procedure of FIG. 5 ends. If a path character string is acquired (Yes in S103), the co-occurrence analysis unit 11 acquires from the file server 20 a list of the file path character strings (file path names) of the files stored under the target directory (S104). If no such list is acquired (that is, no files are stored under the target directory) (No in S105), the process returns to step S102.
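Steps S101 through S105 amount to walking the directory tree and skipping directories that hold no files. The sketch below assumes, hypothetically, that the file server's tree is reachable as a local directory (for example, a mounted share), which the source does not specify; it also assumes that "files stored under the target directory" means files directly inside it, matching the per-directory counts of FIG. 4. The function names are illustrative.

```python
import os


def list_directories(root):
    """S101 sketch: path strings of every directory under root (root included)."""
    return [dirpath for dirpath, _dirnames, _filenames in os.walk(root)]


def list_files(directory):
    """S104 sketch: path strings of the files stored directly in directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def directories_with_files(root):
    """Yield (directory, file paths) pairs, skipping empty ones (No in S105)."""
    for directory in list_directories(root):   # S102/S103 loop
        files = list_files(directory)
        if files:
            yield directory, files
```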

If the list is acquired (Yes in S105), the co-occurrence analysis unit 11 determines whether a learning list corresponding to the target directory (hereinafter, the "target learning list") has already been generated, that is, whether a learning list for the target directory is stored in the confidential word storage unit 13 (S106). As described above, a learning list is generated for each directory and stored in the confidential word storage unit 13 in association with identification information of the directory (for example, its path character string).

If the target learning list is not stored in the confidential word storage unit 13 (No in S106), the co-occurrence analysis unit 11 reads the list of confidential words from the initial list (S107). Thus, the first time this procedure is executed for the target directory, the confidential words are read from the initial list.

FIG. 7 is a diagram illustrating an example of an initial list. As shown in FIG. 7, the initial list is a list of character strings (words) preset as being highly likely to be related to confidential information.

If, on the other hand, the target learning list is stored in the confidential word storage unit 13 (Yes in S106), the co-occurrence analysis unit 11 reads the list of confidential words from the target learning list (S108).

Hereinafter, the list of confidential words read in step S107 or S108 is called the "target confidential word list".
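The branch in steps S106 through S108 is a simple fallback. In this sketch, the dictionary `learning_lists` (keyed by directory path) is a hypothetical stand-in for the confidential word storage unit 13; a copy is returned so that later updates do not silently alter the stored lists.

```python
def select_word_list(target_dir, learning_lists, initial_list):
    """S106-S108 sketch: return the target confidential word list.

    learning_lists is a hypothetical dict {directory path: [words]}
    standing in for the confidential word storage unit 13.
    """
    if target_dir in learning_lists:        # Yes in S106 -> S108
        return list(learning_lists[target_dir])
    return list(initial_list)               # No in S106 -> S107
```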

Next, the co-occurrence analysis unit 11 acquires, as the processing target, one unprocessed file path character string from the list of file path character strings acquired in step S104 (S109). "Unprocessed" means not yet processed in step S110 and the subsequent steps. The acquired file path character string is hereinafter called the "target file path character string".

Next, the co-occurrence analysis unit 11 splits the target file path character string into words using morphological analysis or the like (S110). Then, if any confidential word in the target confidential word list matches one of the words obtained from the target file path character string (that is, if the target file path character string contains that confidential word), the co-occurrence analysis unit 11 counts (specifies), for each such confidential word, the number of occurrences (co-occurrence count) of each word in the target file path character string other than the word matching that confidential word (that is, each word co-occurring with the confidential word), and stores the counts in, for example, the memory device 103 (S111).

Next, the co-occurrence analysis unit 11 determines whether all file path character strings in the list have been processed (S112). If any unprocessed file path character string remains (No in S112), step S109 and the subsequent steps are repeated.

When all file path character strings have been processed (Yes in S112), the co-occurrence analysis unit 11 generates a matrix (hereinafter, the "target matrix") whose rows are the confidential words that matched a word in step S111 (the confidential words contained in at least one file path character string), whose columns are the words co-occurring with any of those confidential words, and whose elements are the total co-occurrence counts of the row's confidential word with the column's co-occurring word (that is, the sums of the co-occurrence counts tallied for each file path character string) (S113).
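Step S113 sums the per-path counts into one matrix per directory. In the sketch below, the input format (one dict of `Counter` objects per file path) and the sorted row and column order are assumptions chosen to make the output deterministic; a cell is 0 when a confidential word never co-occurred with a column word.

```python
from collections import Counter


def build_target_matrix(per_path_counts):
    """S113 sketch: sum per-path co-occurrence counts into a matrix.

    per_path_counts: list with one dict {confidential word: Counter} per
    file path. Returns (rows, cols, cells), where cells[i][j] is the total
    co-occurrence count of rows[i] with cols[j].
    """
    totals = {}
    for counts in per_path_counts:
        for cw, counter in counts.items():
            totals.setdefault(cw, Counter()).update(counter)
    rows = sorted(totals)                                   # confidential words
    cols = sorted({w for c in totals.values() for w in c})  # co-occurring words
    cells = [[totals[r][c] for c in cols] for r in rows]    # missing -> 0
    return rows, cols, cells
```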

FIG. 8 is a diagram for explaining an example of matrices of co-occurrence analysis results. The topmost matrix in FIG. 8 corresponds to matrix M1 in FIG. 4. FIG. 8 illustrates that a matrix of co-occurrence analysis results is generated for each directory; it does not mean that one execution of step S113 generates all the matrices shown in FIG. 8.

Next, the co-occurrence analysis unit 11 calculates the density of the target matrix (S114). The density may be calculated, for example, by the following formula, or based on another formula.

Next, the co-occurrence analysis unit 11 determines whether the density is greater than or equal to a preset threshold (S115). In other words, in the present embodiment, whether the density is at least the threshold is one example of whether the numbers of occurrences of the words satisfy the predetermined condition.

If the density is below the threshold (No in S115), the process returns to step S102. If the density is greater than or equal to the threshold (Yes in S115), the co-occurrence analysis unit 11 stores the path character string of the target directory in the confidential directory list storage unit 14 (S116). That is, the target directory is determined (estimated) to be a confidential directory.
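The density formula referred to in step S114 is not given in this text, so the sketch below assumes, purely for illustration, a common definition of matrix density: the fraction of non-zero cells. This is a swapped-in definition and may differ from the one intended in the original; the threshold value and function names are likewise hypothetical.

```python
def matrix_density(cells):
    """ASSUMED density: share of non-zero cells (not the source's formula)."""
    total = sum(len(row) for row in cells)
    if total == 0:
        return 0.0
    nonzero = sum(1 for row in cells for v in row if v != 0)
    return nonzero / total


def judge_directory(directory, cells, threshold, confidential_dirs):
    """S115-S116 sketch: record the directory as confidential when the
    density is at least the threshold; returns the decision."""
    if matrix_density(cells) >= threshold:   # Yes in S115
        confidential_dirs.append(directory)  # S116
        return True
    return False                             # No in S115 -> back to S102
```

With this assumed definition, the matrix `[[1, 0], [3, 1]]` has density 0.75, so it passes a hypothetical threshold of 0.5.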

Next, the co-occurrence analysis unit 11 adds the words that were co-occurring words of any confidential word (that is, the words forming the columns of the target matrix generated in step S113) to the learning list for the target directory stored in the confidential word storage unit 13 (S117), and repeats step S102 and the subsequent steps. If no learning list has yet been generated for the target directory, however, the co-occurrence analysis unit 11 generates a learning list containing those co-occurring words and each confidential word in the initial list, and stores that learning list in the confidential word storage unit 13 in association with the identification information of the target directory.
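Step S117, including the case where the learning list does not exist yet, can be sketched as below. The dictionary `learning_lists` is a hypothetical stand-in for the confidential word storage unit 13, and skipping words already on the list is an assumption; the source does not say how duplicates are handled.

```python
def update_learning_list(directory, cooccurring_words, learning_lists, initial_list):
    """S117 sketch: add co-occurring words to the directory's learning list,
    first creating it from the initial list if it does not exist yet."""
    current = learning_lists.setdefault(directory, list(initial_list))
    for word in cooccurring_words:
        if word not in current:   # assumption: keep each word only once
            current.append(word)
    return current
```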

FIG. 9 is a diagram illustrating an example of a learning list. In the learning list shown in FIG. 9, the words forming the columns of the matrix in FIG. 8 have been added as confidential words to the initial list shown in FIG. 7.

The procedure of FIG. 5 may be executed at multiple times, for example periodically. This allows each learning list to be updated as new files or directories are added.

FIG. 10 is a flowchart illustrating an example of the procedure for extracting unauthorized access.

In step S201, the unauthorized access extraction unit 12 acquires one of the confidential directory path character strings stored in the confidential directory list storage unit 14. The acquired path character string is hereinafter called the "target path character string".

Next, the unauthorized access extraction unit 12 acquires from the file server 20 one line (one entry) of the access logs recorded within, for example, a certain period (S202).

FIG. 11 is a diagram illustrating an example of an access log. As shown in FIG. 11, the access log contains, for each access to a file, values of items such as the access source IP, the access time, the access destination path, and the operation. The access source IP is the IP address of the accessing device, such as a user terminal 30. The access time indicates when the access occurred, for example its date and time. The access destination path is the path character string of the accessed file. The operation indicates the type of operation performed on the accessed file; for example, Read, Write, Delete, Read-failure, Write-failure, and Delete-failure may be recorded as operations.
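For illustration, one access log entry with the fields listed above could be modelled as follows. The comma-separated line format parsed here (`ip,time,path,operation` with an ISO 8601 time) is a hypothetical example, since FIG. 11 does not specify a storage format; the names `Operation`, `AccessLog`, and `parse_access_log` are likewise illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Operation(Enum):
    """Operation types named in the description (the list is not exhaustive)."""
    READ = "Read"
    WRITE = "Write"
    DELETE = "Delete"
    READ_FAILURE = "Read-failure"
    WRITE_FAILURE = "Write-failure"
    DELETE_FAILURE = "Delete-failure"


@dataclass(frozen=True)
class AccessLog:
    source_ip: str         # IP address of the accessing device
    access_time: datetime  # date and time of the access
    path: str              # path character string of the accessed file
    operation: Operation   # type of operation performed on the file


def parse_access_log(line: str) -> AccessLog:
    """Parse one hypothetical 'ip,ISO-time,path,operation' line."""
    ip, time_text, rest = line.strip().split(",", 2)
    path, op = rest.rsplit(",", 1)  # split from the right: the path may contain commas
    return AccessLog(ip, datetime.fromisoformat(time_text), path, Operation(op))
```

Splitting the path off from both ends keeps file names that contain commas intact.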

In step S202, one line of an access log group such as that shown in FIG. 11 is acquired. The acquired access log is hereinafter referred to as the “target log”.

Next, the unauthorized access extraction unit 12 determines whether the target path character string is included in the access destination path of the target log (S203). If it is (Yes in S203), the unauthorized access extraction unit 12 stores the target log in the unauthorized access candidate storage unit 15 (S204). That is, the access recorded in the target log is judged to be a candidate for unauthorized access.

If the target path character string is not included in the access destination path of the target log (No in S203), the unauthorized access extraction unit 12 determines whether all access logs within the certain period have been acquired (S205). If any access log has not yet been acquired (No in S205), step S202 and the subsequent steps are repeated. If all access logs have been acquired (Yes in S205), the unauthorized access extraction unit 12 determines whether all path character strings stored in the confidential directory list storage unit 14 have been acquired (S206). If any path character string has not yet been acquired (No in S206), step S201 and the subsequent steps are repeated. If all path character strings have been acquired (Yes in S206), the processing procedure of FIG. 10 ends.

In step S203, it may instead be determined whether the target path character string is a prefix of (forward-matches) the access destination path of the target log.
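The loop of steps S201 to S206, including the optional prefix test, might look like the minimal sketch below. `LogEntry` and `extract_candidates` are hypothetical names; only the access source IP and access destination path are modelled, and a log that matches several confidential paths is assumed to be stored only once.

```python
from typing import List, NamedTuple, Sequence


class LogEntry(NamedTuple):
    source_ip: str
    path: str  # access destination path


def extract_candidates(
    confidential_dirs: Sequence[str],
    logs: Sequence[LogEntry],
    prefix_match: bool = False,
) -> List[LogEntry]:
    """Return logs whose access path contains (or starts with) a confidential path."""
    seen = set()
    candidates: List[LogEntry] = []
    for target in confidential_dirs:          # S201 / S206: each confidential path
        for index, log in enumerate(logs):    # S202 / S205: each log line
            if prefix_match:
                hit = log.path.startswith(target)  # prefix (forward) match variant
            else:
                hit = target in log.path           # S203: substring test
            if hit and index not in seen:     # S204: store each log once
                seen.add(index)
                candidates.append(log)
    return candidates
```

A plain substring or prefix test also matches sibling paths such as /share/hr2 when the confidential path is /share/hr; appending a path separator to each confidential path before matching is one way to avoid this, although the description does not address the point.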

As described above, according to the first embodiment, new confidential words are learned (extracted) from the file path character strings of the files on the file server 20, starting from the confidential words set as initial values, and confidential directories can be identified or inferred on the basis of these confidential words. Therefore, according to the first embodiment, the accuracy of extracting the storage locations of confidential information can be improved.

Confidential words are also learned per directory. Therefore, even when confidential words differ from directory to directory, for example because the confidential information handled by company A differs between departments, the confidential words can be identified appropriately.

Furthermore, because the access logs indicating access to files stored in confidential directories are extracted, an analyst or the like can narrow down the access logs to be analyzed, which makes the analysis work more efficient.

Next, a second embodiment will be described. The description focuses on the differences from the first embodiment; points not specifically mentioned may be the same as in the first embodiment.

Accesses to files stored on the file server 20 include machine accesses in addition to user accesses.

If a confidential directory is not a target of legitimate machine access, then machine access to that directory is likely to be unauthorized access, such as a file scan by malware. In this case, limiting the access logs processed in the extraction of unauthorized access by the unauthorized access extraction unit 12 to those relating to machine access can be expected to improve the efficiency of that extraction. This requires the ability to detect machine access to files under confidential directories automatically.

The second embodiment therefore describes an example in which the analysis apparatus 10 can detect machine access automatically.

FIG. 12 is a diagram illustrating an example functional configuration of the analysis apparatus 10 according to the second embodiment. In FIG. 12, parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted. To classify (sort) the access log group into access logs likely to relate to machine access (hereinafter “machine access candidate logs”) and access logs likely to relate to user access (hereinafter “user access candidate logs”), the analysis apparatus 10 in FIG. 12 further includes a cross tabulation table generation unit 16, a continuation event specifying unit 17, an access log classification unit 18, and the like. Each of these units is realized by processing that one or more programs installed in the analysis apparatus 10 cause the CPU 104 to execute. The analysis apparatus 10 also uses a continuation event storage unit 19, which can be realized by, for example, the auxiliary storage device 102 or a storage device connectable to the analysis apparatus 10 via a network.

For access logs recorded during a fixed period such as the past week, the cross tabulation table generation unit 16 generates one cross tabulation table per period of a fixed cycle (for example, one day). A cross tabulation table is a table in which the ordered sets (time-series sets) of the file path character strings accessed in each unit time (for example, 30 minutes) are classified by access source.

FIG. 13 is a diagram for explaining the generation of cross tabulation tables. FIG. 13 shows an example in which one week of access logs is divided by day and a cross tabulation table is generated from each day's access logs. The lower tables in FIG. 13 are the cross tabulation tables. Their rows correspond to access sources and their columns to unit times. An ellipse in a cell indicates an ordered set extracted for the access source and unit time of that cell. In other words, FIG. 13 shows an example in which ordered sets s1 and s2 are extracted from each day's access logs from the first to the seventh day, and an ordered set s3 is additionally extracted from the access logs of the third day onward. Ordered sets with the same content are given the same reference sign.

The continuation event specifying unit 17 compares the ordered sets of the cross tabulation tables (one per day) to identify the ordered sets that appear continuously (repeatedly) every day, that is, with a one-day cycle. Such a group of ordered sets recurring every day is hereinafter referred to as a “continuation event”. The identified continuation events are stored in the continuation event storage unit 19.

FIG. 14 is a diagram for explaining the identification of continuation events. In FIG. 14, ordered sets with identical content are connected by lines. Ordered sets s1 and s2 appear on every day of the week, so they are identified as continuation events.

The access log classification unit 18 classifies each access log as a machine access candidate log or a user access candidate log based on the continuation events stored in the continuation event storage unit 19.

The processing procedures executed by the analysis apparatus 10 are described below. FIG. 15 is a flowchart illustrating an example of the procedure for generating cross tabulation tables.

In step S301, the cross tabulation table generation unit 16 acquires the setting of the analysis target period. The analysis target period determines which access logs are analyzed; it is specified by, for example, its start date and end date. The week in FIG. 13 corresponds to the analysis target period. The analysis target period may be set in advance or input by the user at the time of step S301.

Next, the cross tabulation table generation unit 16 acquires the setting of the period used as the unit for generating each cross tabulation table (hereinafter the “generation period”) (S302). One day in FIG. 13 corresponds to the generation period. The generation period may be set in advance or input by the user at the time of step S302. The generation period may also be set based on the cycle at which machine access occurs; for example, if machine access is scheduled every two days, the generation period may be set to two days.

Next, the cross tabulation table generation unit 16 assigns 1 to a variable i (S303). In FIG. 15, the variable i holds the ordinal number of the generation period being processed.

Next, the cross tabulation table generation unit 16 reads the access log group (FIG. 11) for the i-th generation period from the file server 20 (S304).

In step S304, the access logs whose access times fall within the i-th generation period (hereinafter the “target access log group”) are read.
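Steps S301 to S304, together with the loop back through S309 and S310, amount to cutting the analysis target period into consecutive generation periods. The sketch below assumes the analysis target period is the half-open interval [start, end) and that any remainder forms a final, shorter generation period; both points are assumptions, and the function name is hypothetical. Python lists are 0-based, so index 0 corresponds to the first (i = 1) generation period.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def split_into_generation_periods(
    access_times: Sequence[datetime],
    start: datetime,
    end: datetime,
    generation_period: timedelta = timedelta(days=1),
) -> List[List[int]]:
    """For each generation period in [start, end), list the indices of the logs in it."""
    count, remainder = divmod(end - start, generation_period)
    if remainder:
        count += 1  # keep a trailing partial period as its own generation period
    periods: List[List[int]] = [[] for _ in range(count)]
    for index, t in enumerate(access_times):
        if start <= t < end:  # logs outside the analysis target period are skipped
            periods[(t - start) // generation_period].append(index)
    return periods
```

Passing `timedelta(days=2)` reproduces the two-day generation period mentioned for machine accesses scheduled every two days.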

Next, the cross tabulation table generation unit 16 identifies the distinct access source IPs and the distinct unit times of the access logs in the target access log group, and generates an empty cross tabulation table (S305). The generated table is hereinafter referred to as the “target cross tabulation table”. Identifying the unit times means identifying, among the unit times into which the i-th generation period is divided, those that contain the access time of at least one access log. The generated cross tabulation table has one row per identified access source IP address and one column per identified unit time.

Next, for the target access log group, the cross tabulation table generation unit 16 generates, for each access source IP, an ordered set of access destination paths for each unit time (S306). When an ordered set contains multiple access destination paths, they are arranged in ascending order of access time (that is, chronologically).

Next, the cross tabulation table generation unit 16 registers each generated ordered set in the cell of the target cross tabulation table that corresponds to the ordered set's access source IP and unit time (S307). The cross tabulation table generation unit 16 then stores the target cross tabulation table in the memory device 103, the auxiliary storage device 102, or the like (S308).
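A minimal sketch of steps S305 to S307 for a single generation period follows. It stores the table sparsely as a dictionary keyed by (access source IP, unit-time slot index), so only rows and columns containing at least one log exist, as in step S305. The 30-minute unit time is the example value from the text; the tuple layout of the input logs and the function name are hypothetical, and keeping repeated accesses to the same path within a cell is an assumption, since the description does not say whether duplicates are removed.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[str, int]  # (access source IP, unit-time slot index)


def build_cross_tab(
    logs: Iterable[Tuple[str, datetime, str]],
    period_start: datetime,
    unit: timedelta = timedelta(minutes=30),
) -> Dict[Cell, Tuple[str, ...]]:
    """Build one cross tabulation table from (source_ip, access_time, path) tuples."""
    cells: Dict[Cell, List[Tuple[datetime, str]]] = defaultdict(list)
    for ip, accessed_at, path in logs:
        slot = (accessed_at - period_start) // unit  # column: unit time of the access
        cells[(ip, slot)].append((accessed_at, path))
    # S306-S307: each cell holds its paths in ascending order of access time
    return {
        cell: tuple(path for _, path in sorted(entries, key=lambda e: e[0]))
        for cell, entries in cells.items()
    }
```

Because `sorted` is stable, accesses with identical timestamps keep their original log order.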

FIG. 16 is a diagram illustrating an example configuration of a cross tabulation table. As shown in FIG. 16, the ordered set of access destinations for each unit time is registered in the table for each access source IP. The cross tabulation table in FIG. 16 is based on the access logs of a one-day generation period, April 1.

Next, the cross tabulation table generation unit 16 determines whether all access logs in the analysis target period have been read (S309). If any access log has not yet been read (No in S309), the cross tabulation table generation unit 16 adds 1 to i (S310) and repeats step S304 and the subsequent steps. If all access logs have been read (Yes in S309), the cross tabulation table generation unit 16 ends the processing procedure of FIG. 15.

By executing the processing procedure of FIG. 15, a cross tabulation table is generated for each generation period in the analysis target period.

FIG. 17 is a flowchart illustrating an example of the procedure for identifying continuation events.

In step S401, the continuation event specifying unit 17 assigns 1 to a variable i. In FIG. 17, the variable i holds the ordinal number of the cross tabulation table being processed (of interest). The cross tabulation tables are ordered by their generation periods; that is, a table with an earlier generation period comes earlier in this order.

Next, the continuation event specifying unit 17 determines whether i is less than or equal to the number N of cross tabulation tables generated for the analysis target period (S402). If i is N or less (Yes in S402), the continuation event specifying unit 17 takes one unprocessed ordered set from the i-th cross tabulation table (S403). Here, “unprocessed” means not yet processed by steps S404 to S407. The ordered set taken in step S403 is hereinafter referred to as the “target ordered set”.

Next, the continuation event specifying unit 17 compares the target ordered set with the ordered sets of each group (S404). A group here is formed in step S406 or S407, described below, by classifying ordered sets according to their identity (the order of their access destination paths); no groups exist at the start of the processing of FIG. 17. Therefore, at least when i = 1 and the target ordered set is the first one taken, there is no group containing an ordered set that matches the target ordered set.

If no group contains an ordered set matching the target ordered set (No in S405), the continuation event specifying unit 17 creates a new group and adds the target ordered set to it (S406). If such a group exists (Yes in S405), the continuation event specifying unit 17 adds the target ordered set to that group (S407).

When steps S403 to S408 have been executed for all ordered sets in the i-th cross tabulation table (Yes in S408), the continuation event specifying unit 17 adds 1 to i (S409) and repeats step S402 and the subsequent steps. When i exceeds the number N of cross tabulation tables (No in S402), the process proceeds to step S410. At this point, every ordered set of every cross tabulation table has been classified into one of several groups.

In step S410, the continuation event specifying unit 17 determines whether there is any group containing ordered sets taken from every cross tabulation table, that is, any group of ordered sets that appears in every day (every generation period).

If there are one or more such groups (Yes in S410), the continuation event specifying unit 17 stores the ordered sets belonging to each of those groups in the continuation event storage unit 19 (S411). That is, the ordered sets of each such group are identified as a continuation event, because they appear continuously at a regular cycle.
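Under the exact-match rule of step S404, the grouping in steps S401 to S411 can be sketched as below: because only identical ordered sets share a group, the ordered set itself can serve as the group key. Each table is assumed, hypothetically, to be a dictionary mapping a (source IP, unit-time slot) cell to its ordered set; the function name is illustrative, and the returned occurrence list loosely mirrors the corresponding-location items of FIG. 18.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[str, int]        # (access source IP, unit-time slot)
OrderedSet = Tuple[str, ...]  # access destination paths in time order


def find_continuation_events(
    tables: Sequence[Dict[Cell, OrderedSet]],
) -> Dict[OrderedSet, List[Tuple[int, Cell]]]:
    """Group identical ordered sets across tables; keep groups present in every table."""
    groups: Dict[OrderedSet, List[Tuple[int, Cell]]] = {}
    for i, table in enumerate(tables):          # S401-S402, S409: tables in period order
        for cell, ordered in table.items():     # S403, S408: every ordered set
            # S404-S407: exact match, so the ordered set is its own group key
            groups.setdefault(ordered, []).append((i, cell))
    n = len(tables)
    return {
        ordered: places
        for ordered, places in groups.items()
        if len({i for i, _ in places}) == n     # S410: appears in all N tables
    }
```

Note that, as in the description, the grouping does not require the access source IP or unit time to be the same from day to day; only the content of the ordered set must match.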

FIG. 18 is a diagram illustrating an example configuration of the continuation event storage unit 19. As shown in FIG. 18, the continuation event storage unit 19 stores, for each continuation event, a continuation event ID, the ordered set, the corresponding locations in the cross tabulation tables, and so on.

The continuation event ID is identification information assigned to each continuation event. The ordered set is the ordered set identified as the continuation event. The corresponding location in the cross tabulation table indicates where the ordered set appeared in a cross tabulation table; in FIG. 18 it includes items such as the generation period, the access source IP, and the unit time. The generation period is that of the cross tabulation table in which the ordered set appeared. The access source IP identifies the row of that table containing the ordered set, and the unit time identifies the column.

In the comparison of ordered sets in step S404, a “match” result may be output not only for an exact match but also when the ordered sets are similar. That is, not only identical ordered sets but also similar ones may be classified into the same group. For example, two ordered sets may be judged similar when the order of at least a predetermined proportion of the access destination paths they contain agrees. In this case, the target ordered set may be required to be similar to all ordered sets in the group, or to any M or more of them.
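The description leaves the similarity measure open. One possible reading, sketched below as an assumption rather than the patented rule, takes the length of the longest common subsequence (the largest number of paths appearing in the same relative order in both ordered sets), divides it by the length of the longer set, and compares the result with a threshold. The `min_matches` parameter covers the “any M or more” variant; the function names and the 0.8 default threshold are hypothetical.

```python
from typing import Optional, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of paths that appear in the same relative order in both sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def is_similar(a: Sequence[str], b: Sequence[str], ratio: float = 0.8) -> bool:
    """Similar if the in-order common part covers at least `ratio` of the longer set."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return lcs_length(a, b) / longest >= ratio


def matches_group(
    candidate: Sequence[str],
    members: Sequence[Sequence[str]],
    ratio: float = 0.8,
    min_matches: Optional[int] = None,
) -> bool:
    """Require similarity to every member (default) or to at least `min_matches` members."""
    hits = sum(1 for member in members if is_similar(candidate, member, ratio))
    if min_matches is None:
        return hits == len(members)
    return hits >= min_matches
```

With the default threshold, two five-path sets that differ only in their last path are judged similar (4/5 = 0.8), while two paths accessed in reversed order are not (1/2).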

In step S410, it may instead be determined, for example, whether there is a group containing ordered sets taken from at least a predetermined proportion of all the cross tabulation tables. Alternatively, it may be determined whether there is a group of ordered sets that appear in the cross tabulation tables at roughly regular intervals. For example, an ordered set appearing in the first, third, and fifth tables, that is, at an interval of 2, is one that appears at roughly regular intervals.
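The two relaxed criteria for step S410 could be checked with helpers like the following. The function names, the `tolerance` on interval variation, and the minimum number of occurrences are hypothetical parameters introduced for illustration; table indices are 0-based, so the first, third, and fifth tables are 0, 2, and 4.

```python
from typing import Iterable


def appears_in_enough_tables(table_indices: Iterable[int], n_tables: int, ratio: float) -> bool:
    """True if the group occurs in at least `ratio` of all N cross tabulation tables."""
    distinct = set(table_indices)
    return n_tables > 0 and len(distinct) / n_tables >= ratio


def appears_at_regular_interval(
    table_indices: Iterable[int], tolerance: int = 0, min_occurrences: int = 3
) -> bool:
    """True if the gaps between consecutive occurrences are (roughly) constant."""
    order = sorted(set(table_indices))
    if len(order) < min_occurrences:
        return False  # too few occurrences to speak of a regular interval
    gaps = [b - a for a, b in zip(order, order[1:])]
    return max(gaps) - min(gaps) <= tolerance
```

Raising `tolerance` above zero is one way to express “roughly” regular; the description does not define how much variation is allowed.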

FIG. 19 is a flowchart illustrating an example of the procedure for classifying access logs.

In step S501, the access log classification unit 18 reads all continuation events stored in the continuation event storage unit 19. Next, the access log classification unit 18 reads from the file server 20 one access log whose access time falls within the analysis target period (S502). The read access log is hereinafter referred to as the “target log”.

Next, the access log classification unit 18 determines whether there is a continuation event whose generation period and unit time include the access time of the target log and whose access source IP matches that of the target log (S503). That is, it determines whether the target log is one of the access logs constituting a continuation event.

If such a continuation event exists (Yes in S503), the access log classification unit 18 classifies the target log as a machine access candidate log (S504). Otherwise (No in S503), it classifies the target log as a user access candidate log (S505). Access logs classified as machine access candidate logs and those classified as user access candidate logs may be saved, for example, in different folders of the auxiliary storage device 102 of the analysis apparatus 10.
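Step S503 can be sketched as a set lookup, assuming that the corresponding locations of all continuation events (FIG. 18) have been flattened into a set of (generation period start, access source IP, unit-time slot) triples. The one-day generation period, the 30-minute unit time, and the rule for deriving a log's period and slot from the start of the analysis target period are assumptions chosen to match the examples in the text; `Log` and `classify_logs` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Sequence, Set, Tuple


class Log(NamedTuple):
    source_ip: str
    access_time: datetime
    path: str


def classify_logs(
    logs: Sequence[Log],
    event_cells: Set[Tuple[datetime, str, int]],
    analysis_start: datetime,
    generation_period: timedelta = timedelta(days=1),
    unit: timedelta = timedelta(minutes=30),
) -> Tuple[List[Log], List[Log]]:
    """Split logs into (machine access candidates, user access candidates)."""
    machine: List[Log] = []
    user: List[Log] = []
    for log in logs:  # S502 / S506: every log in the analysis target period
        offset = log.access_time - analysis_start
        period_start = analysis_start + (offset // generation_period) * generation_period
        slot = (log.access_time - period_start) // unit
        # S503: same generation period, unit time, and source IP as a continuation event?
        if (period_start, log.source_ip, slot) in event_cells:
            machine.append(log)  # S504
        else:
            user.append(log)     # S505
    return machine, user
```

Using a set makes each S503 check a constant-time lookup instead of a scan over all stored continuation events.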

Steps S501 to S505 are executed for all access logs within the analysis target period (S506).

As described above, according to the second embodiment, the access log group can be classified into user access candidate logs and machine access candidate logs. Therefore, when unauthorized machine access is to be detected, for example, processing efficiency can be improved by limiting the access logs acquired in step S202 of FIG. 10 to machine access candidate logs.

In addition, because the second embodiment judges whether an access is a machine access by whether similar ordered sets keep appearing, the access logs of accesses that do not occur regularly (that is, accesses without periodicity or repetition) can be excluded from the machine access candidate logs.

Furthermore, because the second embodiment focuses on the order in which accesses occur, machine accesses whose access times are not at strictly constant intervals can still be judged to be machine accesses.

In each of the above embodiments, the analysis apparatus 10 is an example of the confidential word specifying device. The co-occurrence analysis unit 11 is an example of the counting unit and the specifying unit. The unauthorized access extraction unit 12 is an example of the extraction unit.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

With regard to the above description, the following appendices are further disclosed.
(Appendix 1)
A counting unit that counts the number of occurrences of each word constituting a file path character string that contains any one or more pre-registered character strings, among the file path character strings of the files stored in a directory stored in a storage device;
A specifying unit that specifies each word as a word related to confidential information when the number of occurrences of each word satisfies a predetermined condition;
A confidential word specifying device comprising the above units.
(Appendix 2)
The specifying unit specifies the directory as a directory storing files that contain confidential information when the number of occurrences of each word satisfies a predetermined condition;
The confidential word specifying device according to Appendix 1, characterized by the above.
(Appendix 3)
The confidential word specifying device according to Appendix 2, further comprising:
an extraction unit that extracts, from the logs recorded for each access to a file stored in the storage device, the logs indicating access to files stored in the directory specified by the specifying unit.
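The extraction unit of Appendix 3 amounts to filtering access logs by the flagged directories. The sketch below assumes a hypothetical `AccessLog` record with `user`, `path`, and `operation` fields, plus '/'-separated paths. The source does not specify a log format.

```python
from dataclasses import dataclass


@dataclass
class AccessLog:
    """Hypothetical access log record; the source does not give a log format."""
    user: str
    path: str
    operation: str


def is_under(path, directory):
    """True if path lies inside directory (prefix test on '/'-separated paths)."""
    return path.startswith(directory.rstrip("/") + "/")


def extract_logs(logs, confidential_dirs):
    """Keep only the logs that access a file stored in a flagged directory."""
    return [log for log in logs if any(is_under(log.path, d) for d in confidential_dirs)]


logs = [
    AccessLog("alice", "/share/hr/secret_salary.xlsx", "read"),
    AccessLog("bob", "/share/public/menu.txt", "read"),
    AccessLog("carol", "/share/hrx/notes.txt", "write"),
]
for log in extract_logs(logs, ["/share/hr"]):
    print(log.user, log.path)  # alice /share/hr/secret_salary.xlsx
```

Appending `/` before the prefix test keeps a sibling such as `/share/hrx` from matching `/share/hr`.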
(Appendix 4)
The confidential word specifying device according to any one of Appendices 1 to 3, wherein
the counting unit counts, for each directory stored in the storage device, the number of occurrences of each word constituting a file path character string that includes any of one or more character strings registered in advance as keywords related to confidential information, among the file path character strings of the files stored in that directory, and
the specifying unit specifies, for each directory, each word as a word related to confidential information when the number of occurrences of each word for that directory satisfies the predetermined condition.
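Appendix 4 keeps a separate count for each directory. This sketch groups files by their immediate parent directory via `posixpath.dirname`, then applies the same hypothetical threshold per group. The grouping rule, the splitter, and the threshold are assumptions, not details from the source.

```python
import posixpath
import re
from collections import Counter, defaultdict

# Hypothetical word splitter (the source does not define word segmentation).
_SPLIT = re.compile(r"[\\/._\-\s]+")


def per_directory_secret_words(file_paths, keywords, min_count=2):
    """Return {directory: set of words}, counting each directory separately.

    Assumptions: files are grouped by their immediate parent directory, and
    the 'predetermined condition' is a count of at least min_count.
    """
    lowered = [k.lower() for k in keywords]
    counts = defaultdict(Counter)
    for p in file_paths:
        low = p.lower()
        if any(k in low for k in lowered):
            counts[posixpath.dirname(p)].update(w for w in _SPLIT.split(low) if w)
    return {d: {w for w, n in c.items() if n >= min_count} for d, c in counts.items()}


paths = [
    "/share/hr/secret_salary.xlsx",
    "/share/hr/secret_review.docx",
    "/share/dev/secret_key.pem",
    "/share/dev/readme.txt",
]
for directory, words in per_directory_secret_words(paths, ["secret"]).items():
    print(directory, sorted(words))
# /share/hr ['hr', 'secret', 'share']
# /share/dev []
```

Counting per directory stops a keyword that is frequent in one directory from promoting words in an unrelated one.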
(Appendix 5)
A confidential word specifying method in which a computer executes:
a process of counting the number of occurrences of each word constituting a file path character string that includes any of one or more pre-registered character strings, among the file path character strings of the files stored in a directory stored in a storage device; and
a process of specifying each word as a word related to confidential information when the number of occurrences of each word satisfies a predetermined condition.
(Appendix 6)
The confidential word specifying method according to Appendix 5, wherein
the specifying process specifies the directory as a directory that stores files containing confidential information when the number of occurrences of each word satisfies the predetermined condition.
(Appendix 7)
The confidential word specifying method according to Appendix 6, wherein the computer further executes:
a process of extracting, from the logs recorded for each access to a file stored in the storage device, the logs indicating access to files stored in the directory specified in the specifying process.
(Appendix 8)
The confidential word specifying method according to any one of Appendices 5 to 7, wherein
the counting process counts, for each directory stored in the storage device, the number of occurrences of each word constituting a file path character string that includes any of one or more character strings registered in advance as keywords related to confidential information, among the file path character strings of the files stored in that directory, and
the specifying process specifies, for each directory, each word as a word related to confidential information when the number of occurrences of each word for that directory satisfies the predetermined condition.
(Appendix 9)
A confidential word specifying program that causes a computer to execute:
a process of counting the number of occurrences of each word constituting a file path character string that includes any of one or more pre-registered character strings, among the file path character strings of the files stored in a directory stored in a storage device; and
a process of specifying each word as a word related to confidential information when the number of occurrences of each word satisfies a predetermined condition.
(Appendix 10)
The confidential word specifying program according to Appendix 9, wherein
the specifying process specifies the directory as a directory that stores files containing confidential information when the number of occurrences of each word satisfies the predetermined condition.
(Appendix 11)
The confidential word specifying program according to Appendix 10, further causing the computer to execute:
a process of extracting, from the logs recorded for each access to a file stored in the storage device, the logs indicating access to files stored in the directory specified in the specifying process.
(Appendix 12)
The confidential word specifying program according to any one of Appendices 9 to 11, wherein
the counting process counts, for each directory stored in the storage device, the number of occurrences of each word constituting a file path character string that includes any of one or more character strings registered in advance as keywords related to confidential information, among the file path character strings of the files stored in that directory, and
the specifying process specifies, for each directory, each word as a word related to confidential information when the number of occurrences of each word for that directory satisfies the predetermined condition.

DESCRIPTION OF SYMBOLS
10 Analysis apparatus
11 Co-occurrence analysis unit
12 Unauthorized access extraction unit
13 Confidential word storage unit
14 Confidential directory list storage unit
15 Unauthorized access candidate storage unit
16 Cross tabulation table generation unit
17 Continuing event specifying unit
18 Access log classification unit
19 Continuing event storage unit
20 File server
30 User terminal
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 CPU
105 Interface device
B Bus

Claims

A confidential word specifying device, comprising:
a counting unit that counts the number of occurrences of each word constituting a file path character string that includes any of one or more pre-registered character strings, among the file path character strings of the files stored in a directory stored in a storage device; and
a specifying unit that specifies each word as a word related to confidential information when the number of occurrences of each word satisfies a predetermined condition.

The confidential word specifying device according to claim 1, wherein
the specifying unit specifies the directory as a directory that stores files containing confidential information when the number of occurrences of each word satisfies the predetermined condition.

The confidential word specifying device according to claim 2, further comprising:
an extraction unit that extracts, from the logs recorded for each access to a file stored in the storage device, the logs indicating access to files stored in the directory specified by the specifying unit.

The confidential word specifying device according to any one of claims 1 to 3, wherein
the counting unit counts, for each directory stored in the storage device, the number of occurrences of each word constituting a file path character string that includes any of one or more character strings registered in advance as keywords related to confidential information, among the file path character strings of the files stored in that directory, and
the specifying unit specifies, for each directory, each word as a word related to confidential information when the number of occurrences of each word for that directory satisfies the predetermined condition.

A confidential word specifying method in which a computer executes:
a process of counting the number of occurrences of each word constituting a file path character string that includes any of one or more pre-registered character strings, among the file path character strings of the files stored in a directory stored in a storage device; and
a process of specifying each word as a word related to confidential information when the number of occurrences of each word satisfies a predetermined condition.

A confidential word specifying program that causes a computer to execute:
a process of counting the number of occurrences of each word constituting a file path character string that includes any of one or more pre-registered character strings, among the file path character strings of the files stored in a directory stored in a storage device; and
a process of specifying each word as a word related to confidential information when the number of occurrences of each word satisfies a predetermined condition.