WO2016031034A1

WO2016031034A1 - Apparatus and method for detecting unauthorized access

Info

Publication number: WO2016031034A1
Application number: PCT/JP2014/072670
Authority: WO
Inventors: 進芹田; 哲郎鬼頭
Original assignee: 株式会社日立製作所
Priority date: 2014-08-29
Filing date: 2014-08-29
Publication date: 2016-03-03

Abstract

In the prior art, it is not possible to detect unauthorized access by extracting an access feature from a network log storing records of access to a plurality of servers on the Internet. In this invention, a log analysis apparatus acquires a network log containing attributes of access destinations, classifies the access destinations on the basis of the attributes of the access destinations and pre-stored rules, extracts normal-time access feature information on the basis of the network log and the classified access destinations, compares accesses contained in the network log with the access feature information, and determines that an access is a normal access if the comparison result indicates a match and that the access is an abnormal access if the comparison result indicates a mismatch.

Description

[Name of invention determined by ISA based on Rule 37.2] Unauthorized access detection method and device

The present invention relates to technology for detecting unauthorized network access performed by malware-infected computers.

One method of detecting unauthorized access from a network log uses a blacklist. A blacklist is a list of known unauthorized access destinations. By comparing the access destinations included in the network log with the unauthorized access destinations included in the blacklist, a machine that accessed an unauthorized access destination can be identified.

However, the method using the blacklist cannot detect unauthorized access to a destination that is not included in the blacklist. On the other hand, Patent Document 1 discloses a method that extracts the characteristics of normal users' access from accesses to a Web server and presents accesses that differ greatly from those characteristics to the administrator as abnormal accesses.

Patent Document 1: US Patent Publication No. 2011/0185421

However, in the method of Patent Document 1, the network access to be analyzed is limited to access to a single Web server. In general, a network log output by a proxy server or the like records accesses to a plurality of servers by a plurality of users. The access destination is generally identified by a URL (Uniform Resource Locator). When the access destinations included in the network log of a proxy server or the like are identified by URL and the method of Patent Document 1 is applied, the following problems arise.

Since users access many pages on the Internet every day, the number of access destinations included in the network log is enormous compared to the number of pages in a single Web server. Therefore, few users access with the same access pattern. In the method of Patent Document 1, a certain number of samples of the same access pattern are required in order to extract the feature quantity of a typical access pattern. When the access destination is identified by URL and access patterns are extracted, almost all accesses are detected as unusual, that is, as abnormal access, which is not practical.

Also, multiple processes are running on the client, and each process accesses the network independently. For example, Internet access using a browser and access for OS updater are performed in parallel. In a network log such as a proxy, access of each machine is recorded in chronological order, and there is no record indicating which process is used for each access. To extract the characteristics of access patterns, it is necessary to identify a series of accesses of the same process, but it is difficult to identify a series of accesses from the network log.

In consideration of the above problems, an object of the present invention is to extract the characteristics of normal access from a network log even when the access destinations are not limited to a Web server designated in advance, to present accesses that deviate from those characteristics as abnormal accesses, and thereby to detect unauthorized access.

The disclosed log analysis apparatus acquires a network log to which access destination attributes are assigned, classifies the access destinations based on the access destination attributes and rules stored in advance, and extracts normal-time access feature information based on the network log and the classified access destinations. It then compares each access included in the network log with the access feature information, and detects the access as a normal access when they match and as an abnormal access when they do not match.

The present invention can detect unauthorized access by extracting the characteristics of normal access from the network log and presenting access that deviates from the characteristics as abnormal access.

FIG. 1 is a diagram showing an example of the system configuration.
FIG. 2 is a diagram showing an example of the access destination classification function.
FIG. 3 is a diagram showing an example of the access destination URL information.
FIG. 4 is a diagram showing an example of the automatic access list.
FIG. 5 is a diagram showing an example of the access destination cluster.
FIG. 6 is a diagram showing an example of the access feature quantity extraction function.
FIG. 7 is a diagram showing an example of the user base access feature quantity.
FIG. 8 is a diagram showing an example of the group base access feature quantity.
FIG. 9 is a diagram showing an example of the abnormal access extraction function.
FIG. 10 is a diagram showing an example of the abnormal access report.
FIG. 11 is a diagram showing an example of the processing flow of access feature quantity comparison.
FIG. 12 is a diagram showing an example of the processing flow of the session determination function.
FIG. 13 is a diagram showing an example of the data used for access destination clustering.
FIG. 14 is a diagram showing an example of the access tree generated by the session determination module.

Hereinafter, modes for carrying out the present invention (hereinafter referred to as “embodiments”) will be described with reference to the drawings as appropriate.
<Embodiment>
Hereinafter, embodiments will be described.

FIG. 1 is a diagram showing an example of a system configuration. As shown in FIG. 1, this system includes an external server 120, a firewall 140, a proxy server 150, an administrator terminal 160, a client 170, a log analysis server 180, and the like, and these devices are connected to each other via a network 101.

The external server 120 is a server arranged on the Internet 110 and is accessed from the client 170 through the network. Generally, the external server 120 is used for providing various services such as information retrieval. However, malicious attackers may use it for illegal activities such as distributing malware. In addition, a malicious attacker may intrude into a legitimate server and modify the site to use it for an attack. Here, regular servers and servers used for illegal purposes are collectively referred to as the external server 120.

The firewall 140 has a function of discarding (blocking) or permitting (passing) a packet that meets a specific condition from among packets traveling between the local area network 130 and the Internet 110. By setting a condition that the access that does not pass through the proxy server 150 is blocked, all access between the external server 120 and the client 170 can be performed through the proxy server 150.

The proxy server 150 has a function of relaying packet exchange between the client 170 and the external server 120 and recording the access as a network log 185. By setting the conditions of the firewall 140 as described above, all accesses between the external server 120 and the client 170 can be recorded in the network log 185.

The log analysis server 180 analyzes the network log 185 and presents abnormal accesses. The log analysis server 180 internally includes a network log selection function 184, an access destination classification function 181, an access feature quantity extraction function 182, an abnormal access extraction function 183, the network log 185, an access destination classification 186, an access feature quantity 187, and an abnormal access report 188. These functions may be arranged in one device or may be distributed over a plurality of devices.

The network log 185 is a record of network access output by a network device such as the proxy server 150. The network log 185 is composed of a plurality of records. Each record corresponds to one access and includes the access date and time, the IP address of the access source client 170, the URL that identifies the access-destination external server 120 and the resource in the external server 120, the User-Agent set in the client 170, the referrer, which indicates the URL of the page from which the access destination was linked, the HTTP status code that indicates the access result, and authentication information that identifies the accessing user. In addition to the log of the proxy server 150, the logs of the firewall 140 or the client 170 may be included in the network log 185 if they are available. The network log 185 output by each machine is transmitted to the log analysis server 180 and stored there.

The access destination classification function 181 classifies the access destinations included in the network log 185 according to a plurality of rules. In general, when access destinations are classified by URL, the types of access destinations become enormous, and it is not possible to obtain sufficient past samples to extract access characteristics. Therefore, the access destination classification function 181 classifies the access destination using a domain, a category based on the contents of the content provided by the external server 120, a group of accessing users, or the like. As a result, the access destination can be classified with a coarser granularity classification than the URL, and a sufficient number of samples can be obtained to extract the features of the access destination. Details of the access destination classification procedure will be described with reference to FIG. The result of classifying the access destination is stored as an access destination classification 186.

The access feature quantity extraction function 182 uses the network log 185 and the access destination classification 186 to extract the features of normal access. Specifically, the number of accesses to each access destination, the number of transitions from one access destination to another, and the like are used as features, both for each user and for all users. A detailed procedure for extracting the feature quantity will be described with reference to FIG. The extracted feature quantity is stored as the access feature quantity 187. Details of the access feature quantity will be described with reference to FIGS.

The abnormal access extraction function 183 extracts abnormal accesses based on the network log 185, the access destination classification 186, and the access feature quantity 187. Each access included in the network log 185 is compared with the access feature quantity, and the number of past appearances of the access and the frequency of the access transition are calculated. The detailed procedure of abnormal access extraction will be described with reference to FIG. The extracted abnormal accesses are stored as an abnormal access report 188. The administrator can confirm the abnormal accesses by browsing the abnormal access report 188 using the administrator terminal 160. Details of the abnormal access report 188 will be described with reference to FIG.

The network log selection function 184 selects, according to rules set in advance, the part of the network log 185 used by each function. The network log 185 includes a record of accesses for a certain period (for example, one year). The administrator designates, by the access date and time of the network log 185, which part of the network log 185 is used as the input to each function. For example, the accesses of a certain month are used as the input to the access destination classification function 181 and the access feature quantity extraction function 182, and the accesses of the following week are used as the input to the abnormal access extraction function 183. Since the access feature quantity 187 is intended to characterize normal access, the accesses used to compute the access feature quantity 187 are selected from a period during which it is guaranteed that there was no unauthorized access.
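As a rough illustration of this selection by access date and time, the following sketch filters records into a feature-extraction period and a detection period. The record layout (a dict with a `time` key), the function name `select_records`, and the sample dates are assumptions made for illustration and are not part of the disclosure.

```python
from datetime import datetime, timedelta

def select_records(records, start, end):
    """Return the records whose access time falls in [start, end).

    Assumption: each record is a dict with a datetime under the key "time".
    """
    return [r for r in records if start <= r["time"] < end]

# Hypothetical log: one access per day for 40 days.
log = [{"time": datetime(2014, 7, 1) + timedelta(days=d), "url": f"http://example.com/{d}"}
       for d in range(40)]

# One month for classification and feature extraction, the following week for detection.
training = select_records(log, datetime(2014, 7, 1), datetime(2014, 8, 1))
detection = select_records(log, datetime(2014, 8, 1), datetime(2014, 8, 8))
```

Using a half-open interval keeps the two periods from overlapping at the boundary.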

The client 170 has a function of accessing the external server 120 via a network. The client 170 may be infected with malware by executing an executable file attached to a forged mail. Or there is a possibility that a malicious person who has stolen the login password will gain unauthorized access.

The administrator terminal 160 has a function of logging in to the proxy server 150 and the log analysis server 180 and performing various operations. The administrator uses the administrator terminal 160 to set parameters for the processing performed by the log analysis server 180, to view the abnormal access report 188 output from the log analysis server 180, and the like.

Each of these devices connected to the network includes at least a CPU (Central Processing Unit), an auxiliary storage device such as a hard disk drive, a main storage device such as a ROM (Read Only Memory), an I/O (Input/Output) interface for input devices such as a keyboard and a mouse, a network interface for connecting to the local area network 130 and the Internet 110, and the like.

Details of the access destination classification function 181 will be described with reference to FIG.

The access destination extraction module 201 receives the network log 185 selected by the network log selection function 184. The access destination extraction module 201 extracts an access destination external server 120 and a URL for identifying the location of the resource in the external server 120 (hereinafter referred to as an access destination URL) from the records included in the network log 185. A set of access URLs can be acquired by eliminating duplicate access URLs. The access destination extraction module 201 transmits the extracted access destination URL set to the domain extraction module 202.

The domain extraction module 202 extracts a domain name set from the access destination URL set. Here, a domain name is part of a name that identifies a computer on an IP network. For example, in the case of the URL http://www.example.com/page.html, www.example.com is the domain name. However, domain names have a hierarchical structure, and a domain can be defined at each hierarchy (level). Therefore, the domain name is extracted for each level. In the above example, domain names are extracted at three levels: top-level domain: com; second-level domain: example.com; third-level domain: www.example.com. Each extracted domain is stored in association with the original URL. If the host part of the URL is written as an IP address, the domain name is obtained by querying DNS. If it cannot be obtained, the URL is saved as having no domain name. The domain extraction module 202 transmits the access destination URL set and the extracted domain set to the category determination module 203.
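The per-level extraction described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `domain_levels` is hypothetical, and the DNS query for IP-address hosts is left out (such hosts simply yield an empty list here).

```python
import ipaddress
from urllib.parse import urlsplit

def domain_levels(url):
    """Return the domain name at each hierarchy level, top level first."""
    host = urlsplit(url).hostname
    if host is None:
        return []
    try:
        ipaddress.ip_address(host)
        # Assumption: a DNS lookup would be attempted here; this sketch
        # treats IP-address hosts as having no domain name.
        return []
    except ValueError:
        pass  # not an IP address, so treat it as a domain name
    labels = host.rstrip(".").split(".")
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]

levels = domain_levels("http://www.example.com/page.html")
# levels == ["com", "example.com", "www.example.com"]
```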

The category determination module 203 determines the category of each access destination URL included in the access destination URL set. Here, a category is a classification based on the content of the access destination; examples include news, SNS (social networking service), and video distribution. These category types are specified in advance. Security vendors investigate which category a URL belongs to and publish the results. By using such a service, it is possible to determine to which category each access destination URL in the set belongs. It is also possible to actually access the destination specified by the access destination URL and determine the category based on the acquired content. This method can be further divided into manual determination and rule-based determination. In manual determination, a human checks the content and decides which of the preset category candidates it belongs to. In rule-based determination, the acquired content is classified by mechanical rules. One such rule is a correspondence table between content file formats and categories. Another is to determine the category by a statistical method such as clustering, using the frequency of words included in the content as the feature quantity. An access destination may belong to a plurality of categories. After determining the category of each access destination URL included in the access destination URL set, the category determination module 203 outputs the result as access destination URL information 208. Details of the access destination URL information 208 will be described with reference to FIG.

The network log shaping module 204 shapes the network log 185 into a form suitable for subsequent processing. When the network log 185 includes logs from a plurality of devices, the clocks of the devices may be out of step with each other. In this case, the access order based on the access times recorded in the network log 185 differs from the actual access order. The network log shaping module 204 therefore corrects the access times included in the network log 185 to the correct access times, taking the clock deviation of each device into account. The log format may also differ from device to device, so the network log shaping module 204 converts the network log 185 into a unified format. The network log shaping module 204 transmits the shaped network log 185 to the user identification module 205.

The user identification module 205 identifies the user who made each access included in the network log 185 and divides the network log 185 by user. The user identification module 205 identifies a user by the authentication information included in the record. If the authentication information cannot be acquired for some reason, the user is identified by the IP address of the access source client 170. The user identification module 205 transmits the network log 185 divided by user to the automatic access extraction module 208.
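The fallback from authentication information to the client IP address can be sketched as below. The field names `auth_user` and `client_ip` and the function name `split_by_user` are assumptions for illustration; the actual record format is not specified here.

```python
from collections import defaultdict

def split_by_user(records):
    """Divide log records by user.

    Assumption: each record is a dict with an optional "auth_user" field
    and a mandatory "client_ip" field. The IP address is used only when
    no authentication information is present.
    """
    per_user = defaultdict(list)
    for rec in records:
        key = rec.get("auth_user") or rec["client_ip"]
        per_user[key].append(rec)
    return dict(per_user)

records = [
    {"auth_user": "alice", "client_ip": "10.0.0.5", "url": "http://example.com/a"},
    {"auth_user": None, "client_ip": "10.0.0.7", "url": "http://example.com/b"},
    {"auth_user": "alice", "client_ip": "10.0.0.9", "url": "http://example.com/c"},
]
by_user = split_by_user(records)
```

In this sketch the two accesses by `alice` end up in the same partition even though they came from different IP addresses, while the unauthenticated access is keyed by its IP address.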

The automatic access extraction module 208 extracts automatic accesses from the accesses included in the network log 185. Here, an automatic access is an access that is triggered automatically after a certain access destination has been accessed. For example, when a page of a news site is accessed with a browser, the images and style sheets constituting the page are downloaded automatically without any click or other operation by the user. Even if the user only accesses the first page, the network log 185 also records the accesses for the subsequent downloads. The access destination that caused the automatic access is called the main access destination, and an access destination reached by the automatic access is called a sub-access destination. The automatic access extraction module 208 associates each main access destination with its sub-access destinations (generally more than one) and stores them as an automatic access destination list 209. The feature quantity extraction function generates an access tree representing a user's access transitions when extracting the access feature quantity.

The following describes how the automatic access extraction module 208 extracts main accesses and sub-accesses. The automatic access extraction module 208 extracts main accesses and sub-accesses based on information such as the referrer, the access interval, and repetition patterns. First, an automatic access extraction method based on the referrer and the access interval will be described.

First, the automatic access extraction module 208 sorts the network log 185 divided by user in order of access time. Then, the following processing is performed for each divided network log 185. The automatic access extraction module 208 reads the accesses included in the network log 185 in order from the top and acquires the referrer of each access. It then searches for an access, earlier than the current access, whose access destination is the one represented by the acquired referrer. If such an access exists and the time interval between it and the current access is smaller than a preset threshold, the earlier access is determined to be the main access and the current access is determined to be the sub-access. The time interval condition reflects the fact that a sub-access occurs immediately after its main access. A value such as 5 seconds is used as the threshold. The above is the automatic access extraction method using the referrer and the access interval.
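The referrer-based procedure above can be sketched as follows for a single user's log. The record fields (`time` in seconds, `url`, `referrer`) and the function name are assumptions for illustration. Where the referrer's destination was accessed more than once, this sketch compares against its most recent access, which is one possible reading of the search step.

```python
from collections import defaultdict

def extract_auto_access_by_referrer(accesses, threshold=5.0):
    """Map each main access destination to its set of sub-access destinations.

    Assumption: `accesses` is one user's log, each entry a dict with
    "time" (seconds), "url", and "referrer" (URL or None).
    """
    last_seen = {}                # url -> time of its most recent access
    auto_list = defaultdict(set)  # main access destination -> sub-access destinations
    for acc in sorted(accesses, key=lambda a: a["time"]):
        ref = acc.get("referrer")
        # The referrer must have been accessed earlier and recently enough.
        if ref in last_seen and acc["time"] - last_seen[ref] < threshold:
            auto_list[ref].add(acc["url"])
        last_seen[acc["url"]] = acc["time"]
    return dict(auto_list)

log = [
    {"time": 0.0, "url": "http://news.example/", "referrer": None},
    {"time": 1.0, "url": "http://news.example/a.css", "referrer": "http://news.example/"},
    {"time": 60.0, "url": "http://news.example/story", "referrer": "http://news.example/"},
]
auto_list = extract_auto_access_by_referrer(log)
```

Here the style sheet fetched one second after the page is registered as a sub-access, while the story opened a minute later is not, because it exceeds the 5-second threshold.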

Next, an automatic access extraction method based on access intervals and repetition patterns will be described. First, the automatic access extraction module 208 sorts the network log 185 divided by user in order of access time. Next, each divided network log 185 is split into groups of accesses with similar access times. Specifically, the network log 185 is read from the head, and whenever the interval between two consecutive accesses is larger than a preset threshold (for example, 5 seconds), the point between the two accesses is set as a separation position. This process is performed on the network logs 185 of all users. At this point, a plurality of groups, each composed of consecutive accesses, have been generated. Next, the automatic access extraction module 208 finds sets of accesses that appear repeatedly across these groups. For example, given the three groups (A, B, C), (A, D, B), and (E, A, B), the set (A, B) appears repeatedly. Several techniques are known for finding such repeated sets of accesses. See for example literature (added later). Next, the time interval of each discovered access pair is examined, and if it is smaller than a preset threshold (for example, 5 seconds), the access destination with the earlier access time is registered as the main access destination and the access destination with the later access time is registered as the sub-access destination.
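A simple way to picture the grouping and the search for repeated access sets is sketched below. It counts ordered pairs of destinations that co-occur in at least `min_support` groups, which is a deliberately naive stand-in for the unspecified repeated-set technique. The function names, the record fields, and the `min_support` parameter are assumptions, and the final per-pair interval check is omitted.

```python
from collections import Counter
from itertools import combinations

def split_into_groups(accesses, gap=5.0):
    """Split a time-ordered log wherever consecutive accesses are more than `gap` seconds apart."""
    groups, current, prev_time = [], [], None
    for acc in sorted(accesses, key=lambda a: a["time"]):
        if prev_time is not None and acc["time"] - prev_time > gap:
            groups.append(current)
            current = []
        current.append(acc)
        prev_time = acc["time"]
    if current:
        groups.append(current)
    return groups

def repeated_pairs(groups, min_support=2):
    """Return (earlier, later) destination pairs appearing in at least `min_support` groups."""
    counts = Counter()
    for group in groups:
        urls = []
        for acc in group:            # keep first-occurrence order, drop duplicates
            if acc["url"] not in urls:
                urls.append(acc["url"])
        counts.update(combinations(urls, 2))
    return {pair for pair, n in counts.items() if n >= min_support}

# The example from the text: groups (A, B, C), (A, D, B), (E, A, B).
times = [("A", 0), ("B", 1), ("C", 2), ("A", 20), ("D", 21), ("B", 22),
         ("E", 40), ("A", 41), ("B", 42)]
log = [{"url": u, "time": float(t)} for u, t in times]
groups = split_into_groups(log)
pairs = repeated_pairs(groups)
```

With the three groups from the text, only the pair (A, B) occurs in more than one group, so it is the only candidate for a main and sub-access pair.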

The automatic access destination list is used when the feature extraction function generates an access tree. Details of the automatic access list 209 will be described with reference to FIG.

The access destination clustering module 207 divides the access destinations based on the characteristics of the users who access them. The access destination clustering module 207 generates a correspondence table of access destinations and accessing users from the network log 185 divided by user. Access destinations are classified according to multiple criteria such as URL and domain, and a correspondence table is generated for each criterion. An example of the correspondence table is shown in the figure. Clustering is performed using each record of these correspondence tables as an element. To perform clustering, a similarity or distance function between records must be defined. The Jaccard index, obtained by dividing the number of elements common to two access user ID sets by the number of elements in their union, can be used as the similarity, and one minus that value (the Jaccard distance) can be used as the distance function. For the clustering itself, general methods such as hierarchical clustering can be used. As a result of clustering, the cluster to which each access destination belongs is obtained for each classification criterion. The access destination clustering module 207 stores the obtained clusters as an access destination cluster 210. Details of the access destination cluster will be described with reference to FIG.
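The distance function and a small hierarchical clustering can be sketched as follows. Single-linkage merging and the `max_distance` stopping threshold are choices made for this illustration only, since the text leaves the clustering method open. All names are hypothetical.

```python
def jaccard_distance(users_a, users_b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of accessing user IDs."""
    union = users_a | users_b
    if not union:
        return 0.0
    return 1.0 - len(users_a & users_b) / len(union)

def cluster_destinations(dest_users, max_distance=0.5):
    """Single-linkage agglomerative clustering of access destinations.

    Assumption: `dest_users` maps an access destination to the set of user
    IDs that accessed it. Merging stops when the closest pair of clusters
    is farther apart than `max_distance`.
    """
    clusters = [[d] for d in sorted(dest_users)]

    def linkage(c1, c2):
        return min(jaccard_distance(dest_users[a], dest_users[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        dist, i, j = min((linkage(c1, c2), i, j)
                         for i, c1 in enumerate(clusters)
                         for j, c2 in enumerate(clusters) if i < j)
        if dist > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

table = {
    "a.example": {"u1", "u2"},
    "b.example": {"u1", "u2", "u3"},
    "c.example": {"u9"},
}
clusters = cluster_destinations(table)
```

In this sample, a.example and b.example share two of three users (distance 1/3) and are merged, while c.example has no users in common with either and stays in its own cluster.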

An example of the access destination URL information 208 will be described with reference to FIG. The access destination URL information 208 includes an access destination URL 301, a domain 302, a category 303, and the like.

The access destination URL 301 is an access destination URL recorded in the network log 185. The domain 302 is the domain name, extracted by the domain extraction module 202, to which the access destination URL belongs. A domain name is recorded for each domain hierarchy level. The category 303 is the category determined by the category determination module 203.

An example of the automatic access list 209 will be described with reference to FIG. The automatic access destination list 209 includes a main access destination 401, a sub access destination set 402, and the like.

The automatic access extraction module 208 extracts the main access destination and the sub access destination from the network log 185. In general, since a plurality of sub access destinations correspond to one main access destination, the sub access destinations are stored as a set as shown in FIG.

An example of the access destination cluster 210 will be described with reference to FIG. A cluster is generated for each access destination classification criterion. FIG. 5(a) shows an example of clusters generated using the URL. FIG. 5(b) shows an example of clusters generated using the domain (three levels).

The access destination cluster 210 includes a cluster ID 501, an access destination URL 502, and the like. The cluster ID 501 is an identifier that uniquely identifies the cluster. The access destination URL 502 is a list of the access destinations belonging to the cluster.

Details of the access feature amount extraction function 182 will be described with reference to FIG.

The network log shaping module 204 receives the network log 185 selected by the network log selection function 184 and shapes it into a format suitable for subsequent data processing. The processing of the network log shaping module 204 is the same as that described in FIG. The network log shaping module 204 transmits the shaped network log 185 to the user identification module 205.

The user identification module 205 identifies the user who made each access included in the network log 185 and divides the network log 185 by user. The processing of the user identification module 205 is the same as that described in FIG. The user identification module 205 transmits the network log 185 divided by user to the session identification module 603.

The session identification module 603 divides the network log 185 of the same user into units called sessions. A session is a series of accesses performed by a user or a program for the same purpose. A common method of determining sessions from a Web server log uses the access time: if the interval between accesses exceeds a certain threshold (for example, 30 minutes), the later access is judged to belong to another session. However, a network log 185 output by the proxy server 150 or the like contains accesses made for different purposes by different processes on the same machine, and these cannot be separated by the time interval alone. Therefore, the session identification module 603 identifies sessions using the automatic access destination list, the referrer information included in the network log 185, and the like, in addition to the access time interval. Detailed processing of the session identification module 603 will be described with reference to the drawings. The session identification module 603 stores the accesses included in each session as an access tree as shown in FIG. The session identification module 603 transmits the network log 185 divided into sessions to the access destination attribute assignment module 604.

The access destination attribute assignment module 604 assigns access destination attributes to the network log 185 divided into sessions. The access destination attributes are the attribute information stored in the access destination URL information 208 and the access destination cluster 210. The access destination attribute assignment module 604 reads each record in the network log 185 and acquires the access destination URL. It then searches the access destination URL information 208 and the access destination cluster 210 using the access destination URL as a key and acquires the corresponding attribute information. Finally, the acquired attribute information is associated with the access destination and stored in a storage area inside the module. This series of processing is executed for all access destinations included in the network log 185. The access destination attribute assignment module 604 transmits the network log 185 to which the access destination attributes have been assigned to the frequency calculation module 605.

The frequency calculation module 605 calculates the appearance frequency and transition frequency of access destination classes based on the network log 185 to which the access destination attributes are assigned. The frequency calculation module 605 identifies an access destination by various classification criteria such as the access destination URL, domain, category, and cluster ID. An access destination identified by a given classification criterion is called an access destination class. The frequency calculation module 605 calculates, from the network log 185 to which the access destination attributes are assigned, the number of times each access destination class is accessed. First, the frequency calculation module 605 calculates the count for each user. Next, it calculates the count over all users. The abnormal access extraction function 183 uses these counts to detect accesses that are abnormal for a specific user and accesses that are abnormal across all users. The frequency calculation module 605 also calculates the number of transitions from one access destination class to another. A transition between access destination classes is determined as follows. In the access tree generated by the session identification module 603, when node A and node B are connected by an edge, it is determined that a transition has occurred from the access destination class to which A belongs to the access destination class to which B belongs. The frequency calculation module 605 traverses all access trees and calculates the number of transitions between access destination classes. As with the appearance frequency, the frequency calculation module 605 calculates the number of transitions both for each user and for all users. The frequency calculation module 605 stores the calculated frequencies as the access feature quantity 187. Details of the access feature quantity will be described with reference to FIGS.
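The counting described above can be sketched as follows, using the domain as the classification criterion. The access tree representation (a dict with `nodes` and `edges`), the function names, and the sample URLs are assumptions for illustration and do not reflect the actual data structures of the module.

```python
from collections import Counter
from urllib.parse import urlsplit

def domain_class(url):
    """Hypothetical classification criterion: the host name of the URL."""
    return urlsplit(url).hostname

def access_frequencies(sessions, class_of):
    """Count class appearances and class transitions per user and for all users.

    Assumption: `sessions` maps a user to a list of access trees, each a dict
    with "nodes" (accessed URLs) and "edges" ((parent URL, child URL) pairs).
    """
    user_visits, user_transitions = {}, {}
    all_visits, all_transitions = Counter(), Counter()
    for user, trees in sessions.items():
        visits, transitions = Counter(), Counter()
        for tree in trees:
            visits.update(class_of(url) for url in tree["nodes"])
            transitions.update((class_of(a), class_of(b)) for a, b in tree["edges"])
        user_visits[user], user_transitions[user] = visits, transitions
        all_visits.update(visits)          # adds the per-user counts to the totals
        all_transitions.update(transitions)
    return user_visits, user_transitions, all_visits, all_transitions

sessions = {
    "alice": [{"nodes": ["http://news.example/", "http://cdn.example/a.css"],
               "edges": [("http://news.example/", "http://cdn.example/a.css")]}],
    "bob": [{"nodes": ["http://news.example/"], "edges": []}],
}
user_visits, user_transitions, all_visits, all_transitions = access_frequencies(sessions, domain_class)
```

Passing a different `class_of` function (for example one that returns a category or cluster ID) would yield the counts for another classification criterion without changing the counting logic.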

An example of the access destination feature quantity 187 will be described with reference to FIGS. The access destination feature quantity 187 includes a feature quantity defined for each user (user base access feature quantity) and a feature quantity defined for all users (group base access feature quantity).

FIG. 7 shows an example of the user base access feature quantity. There are two kinds of user base access feature quantity: one based on the frequency of a single access destination and one based on the transition frequency between access destinations. FIG. 7(a) shows an example of the user base access feature quantity based on the frequency of a single access destination.

The user base access feature amount includes a classification standard 701, an access destination class identifier 702, an access frequency 703, and the like.

The classification standard indicates from which viewpoint the access destination is classified. Access destination URL, domain, category, cluster ID, etc. can be used as classification criteria.

The access destination class identifier is information that uniquely identifies the access destination based on the classification criteria. For example, when the classification standard is URL, the access destination URL can be used as an identifier.

The access frequency represents the number of times the user has accessed the access destination class represented by the access destination identifier.

Fig. 7 (b) shows an example of the user base access feature quantity based on the transition frequency of the access destination. This user base access feature amount includes a classification standard, a source class identifier, a destination class identifier, an access frequency, and the like.

The source class identifier is information that uniquely identifies an access destination based on a classification criterion. For example, when the classification standard is URL, the access destination URL can be used as an identifier.
The destination class identifier is information that uniquely identifies an access destination based on a classification criterion. For example, when the classification standard is URL, the access destination URL can be used as an identifier.

The access frequency represents the number of times the user has accessed the access destination represented by the destination class identifier after accessing the access destination represented by the source class identifier.

Fig. 8 shows an example of group-based access features. As with the user base access feature quantity, there are two kinds of group base access feature quantity: one based on the frequency of a single access destination, and one based on the transition frequency between access destinations.

Fig. 8 (a) shows an example of group-based access features based on the frequency of a single access destination. The group-based access feature amount includes classification criteria, access destination class identifier, access frequency, number of access users, and the like. Among these, the roles of the classification standard and the access destination class identifier are the same as those in FIG. The access frequency is the total number of accesses of all users who have accessed the access class specified by the access destination class identifier. The number of access users is the number of users who have accessed the access class specified by the access destination class identifier.

Fig. 8 (b) shows an example of group-based access feature quantity based on the transition frequency between access destinations. This group-based access feature amount includes classification criteria, a source class identifier, a destination class identifier, a transition frequency, the number of access users, and the like. Among these, the roles of the classification standard, the source class identifier, and the destination class identifier are the same as those in FIG. 7B. The transition frequency is the total number of times, over all users, that the access class specified by the destination class identifier was accessed after the access class specified by the source class identifier. The number of access users is the number of users who have accessed the access class specified by the destination class identifier after accessing the access class specified by the source class identifier.
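As a rough illustration of how the group-based quantities relate to the per-user ones, the sketch below aggregates per-user counts into rows holding a total frequency and a number of access users. The function name `group_features`, the row field names, and the sample data are assumptions made for this example. Because the keys can be either single class identifiers or (source, destination) pairs, the same sketch covers both kinds of feature quantity.

```python
from collections import Counter

def group_features(per_user_counts, criterion):
    """Aggregate per-user counts into group-based feature rows.

    per_user_counts maps a user to a Counter keyed by an access destination
    class identifier, or by a (source, destination) pair for transitions.
    """
    rows = {}
    for user, counts in per_user_counts.items():
        for key, n in counts.items():
            if n <= 0:
                continue  # a user with no accesses to this class is not counted
            row = rows.setdefault(
                key,
                {"criterion": criterion, "class": key, "frequency": 0, "users": 0},
            )
            row["frequency"] += n  # total accesses over all users
            row["users"] += 1      # number of distinct users with at least one access
    return rows

# Hypothetical per-user appearance counts, classified by domain.
per_user_counts = {
    "alice": Counter({"news.example": 2, "ads.example": 1}),
    "bob": Counter({"news.example": 1}),
}
rows = group_features(per_user_counts, "domain")
```

A class with a high total frequency but a user count of one is accessed often by a single user only, which is the kind of distinction the number of access users makes visible.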

Details of the abnormal access extraction function 183 will be described with reference to FIG. The abnormal access extraction function 183 includes a network log shaping module 204, a user identification module 205, a session identification module 603, an access destination attribute assignment module 604, an access feature amount comparison module, a report creation module, and the like. Among these, the processing performed by the network log shaping module 204, the user identification module 205, the session identification module 603, and the access destination attribute assignment module 604 is the same as that of the modules described in FIG.

When the processing of the access destination attribute assignment module 604 is completed, the abnormal access extraction function 183 holds the access trees divided by session and the access destination attributes assigned to each access. The access feature amount comparison module calculates the feature amount of each access based on this information. Specifically, it calculates the appearance frequency and the transition probability of each access destination. Details of the access feature amount comparison module will be described with reference to the drawings.

The report creation module generates a report for the administrator to browse, based on the feature quantities calculated by the access feature amount comparison module. The report creation module generates a screen that displays the access tree generated by the session identification module 603, the access destination attributes assigned by the access destination attribute assignment module 604, the access feature amounts calculated by the access feature amount comparison module, and the like. Several display methods are provided. In one method, all of the above information is displayed for each user. In another method, only the accesses for which the result calculated by the access feature amount comparison module satisfies a preset condition are displayed. For example, conditions can be set so that only accesses with an appearance count of 10 or less are displayed, or only accesses with a transition probability of 5% or less are displayed. Alternatively, an indication such as "warning" is shown in the report when these conditions are met. The report creation module stores the generated report as an abnormal access report 188. Details of the abnormal access report 188 will be described with reference to FIG.
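The condition-based display can be illustrated with a small filter. This is a sketch under assumptions: the function name `flag_accesses`, the record fields (`url`, `count`, `transition_prob`), and the choice to flag an access when either condition holds are all introduced here for illustration. The thresholds of 10 occurrences and 5% are the example values given in the text.

```python
def flag_accesses(accesses, max_count=10, max_transition_prob=0.05):
    """Return the accesses that satisfy at least one preset condition.

    Each access is a hypothetical dict with "count" (appearance count of its
    access destination class) and "transition_prob" (None for a root access
    that has no parent).
    """
    flagged = []
    for access in accesses:
        reasons = []
        if access["count"] <= max_count:
            reasons.append("rare destination")
        prob = access["transition_prob"]
        if prob is not None and prob <= max_transition_prob:
            reasons.append("rare transition")
        if reasons:
            # Copy the record and attach the reasons as a warning for the report.
            flagged.append({**access, "warning": reasons})
    return flagged

# Hypothetical comparison results for three accesses.
accesses = [
    {"url": "http://news.example/", "count": 250, "transition_prob": 0.40},
    {"url": "http://rare.example/x", "count": 3, "transition_prob": 0.01},
    {"url": "http://root.example/", "count": 8, "transition_prob": None},
]
flagged = flag_accesses(accesses)
```

Requiring both conditions instead of either one would reduce the number of warnings at the cost of missing accesses that are unusual in only one respect.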

An example of the abnormal access report 188 will be described with reference to FIG. The abnormal access report 188 includes an access tree screen, user information, access information, and the like.

The access tree screen shows the access trees generated by the session identification module 603. When the administrator designates a period or a user, the corresponding access tree is displayed. In addition, when the conditions set in the report creation module are met, a warning is displayed.

User information is information on the user who performed the access. In addition to information obtained from the network log 185, information obtained from an external database may be displayed.

Access information is information on access performed by the corresponding user. The result calculated by the access feature amount comparison module is displayed.

An example of the processing flow of the access feature amount comparison module will be described with reference to FIG.

The access feature amount comparison flow starts when the access feature amount comparison module receives, from the access destination attribute assignment module 604, the network log 185 to which the access destination attributes have been assigned. The received network log 185 is divided by session, and each access in it has access destination attributes assigned. Each session has an access tree structure as shown in FIG. The access feature amount comparison module compares the access feature amounts shown in FIG. 11 for each session.

In step 1101, the access feature amount comparison module reads one session from the network log 185 transmitted by the access destination attribute assignment module 604. Each access in the network log 185 has an access destination attribute assigned. After reading the session, the process proceeds to step 1102.

In step 1102, the access feature amount comparison module selects one access from the accesses included in the session. The first time this step is performed, the access corresponding to the root node of the access tree is selected. From the second time onward, an access corresponding to a child node of the previously selected access is selected. If there are multiple child nodes, one of them is selected. The selected access is stored internally. As described above, a depth-first search algorithm, a breadth-first search algorithm, and the like are known as methods for scanning tree-structured data.

In step 1103, the access feature amount comparison module looks up the appearance frequency of the access destination selected in step 1102. An access destination belongs to a plurality of access destination classes, one for each classification criterion. The access feature amount comparison module searches the user base access feature amount and the group base access feature amount for each access destination class and acquires the access frequency. After the acquisition, the process proceeds to step 1104.

In step 1104, the access feature value comparison module searches for an access destination corresponding to the child node of the currently selected access destination. After the search, go to step 1105.

In step 1105, the access feature amount comparison module determines whether a child node exists for the currently selected access. If there is a child node, the process proceeds to step 1106; otherwise, the process proceeds to step 1107.

In step 1106, the access feature amount comparison module calculates a feature amount based on the transition frequency. A conditional probability or the like can be used as such a feature amount. A conditional probability is the probability that an event B occurs given that another event A has occurred. The conditional probability Pr(B | A) is defined as Pr(B | A) = Pr(A, B) / Pr(A). Here, Pr(A, B) is the probability that A and B both occur, and Pr(A) is the probability that A occurs. With A as the parent node and B as the child node, the module calculates Pr(B | A) and Pr(A | B). Pr(B | A) represents the probability that the user selects B as the next access destination after accessing the access destination A. Pr(A | B) represents the probability that the preceding access was to A when the user accesses the access destination B. Both can be used as indices of how unusual an access transition is. The access feature amount comparison module calculates these probabilities for each user and for all users, and stores the results internally. Thereafter, the process proceeds to step 1107.
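One way to estimate these two conditional probabilities from the stored counts is sketched below. It assumes that Pr(A, B) and Pr(A) are estimated from the transition count and the appearance count over the same set of accesses, so that the normalizing constant cancels and Pr(B | A) reduces to count(A to B) / count(A). The text does not specify the estimator, so this choice, the function name `transition_probabilities`, and the sample counts are assumptions.

```python
def transition_probabilities(appearance, transition, parent, child):
    """Estimate Pr(child | parent) and Pr(parent | child) from counts.

    appearance maps a class to its appearance count; transition maps a
    (parent_class, child_class) pair to its transition count. Returns None
    for a probability whose denominator count is zero.
    """
    joint = transition.get((parent, child), 0)
    n_parent = appearance.get(parent, 0)
    n_child = appearance.get(child, 0)
    p_child_given_parent = joint / n_parent if n_parent else None
    p_parent_given_child = joint / n_child if n_child else None
    return p_child_given_parent, p_parent_given_child

# Hypothetical counts: A was accessed 4 times, B twice, and A led to B once.
appearance = {"A": 4, "B": 2}
transition = {("A", "B"): 1}
p_b_given_a, p_a_given_b = transition_probabilities(appearance, transition, "A", "B")
# p_b_given_a is 0.25 and p_a_given_b is 0.5
```

Calling the function once with a user's own counts and once with the counts for all users gives the per-user and group-wide values mentioned in the text.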

In step 1107, the access feature amount comparison module determines whether all nodes in the access tree have been selected. When all nodes have been selected, the process ends. If a node that has not yet been selected remains, the process returns to step 1102.
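The loop formed by steps 1102 to 1107 can be pictured as a traversal of one access tree. The sketch below uses a depth-first traversal with an explicit stack; the function name `compare_session`, the `children` and `node_class` dictionaries, and the sample tree are assumptions, and raw counts are recorded in place of the probabilities of step 1106 to keep the example short.

```python
def compare_session(root, children, node_class, appearance, transition):
    """Visit every access in one access tree and look up its feature counts.

    children maps a node id to the list of its child node ids, and
    node_class maps a node id to its access destination class.
    """
    results = []
    stack = [root]                        # start from the root access
    while stack:                          # loop until every node has been selected
        node = stack.pop()
        cls = node_class[node]
        entry = {
            "access": node,
            "frequency": appearance.get(cls, 0),   # appearance frequency lookup
            "transitions": {},
        }
        kids = children.get(node, [])
        for child in kids:                # transition counts, only when children exist
            pair = (cls, node_class[child])
            entry["transitions"][child] = transition.get(pair, 0)
        stack.extend(reversed(kids))      # reversed so the first child is visited next
        results.append(entry)
    return results

# Hypothetical session: access 1 is the root, 2 and 3 are its children, 4 follows 2.
children = {1: [2, 3], 2: [4]}
node_class = {1: "A", 2: "B", 3: "C", 4: "A"}
appearance = {"A": 5, "B": 2, "C": 1}
transition = {("A", "B"): 2, ("B", "A"): 1}
results = compare_session(1, children, node_class, appearance, transition)
```

Replacing the stack with a queue would turn this into a breadth-first traversal; the set of visited accesses is the same either way.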

An example of the session determination flow will be described with reference to FIG. 12.

In step 1201, the session determination function reads the network log 185. The network log 185 has been divided for each user by the user identification module 205. The flow shown in FIG. 12 processes the network log 185 of one user. After reading the network log 185, the process proceeds to step 1202.

In step 1202, the session determination function sorts the records of the network log 185 read in step 1201 in ascending order of access date and time. After the sorting, the process proceeds to step 1203.

In step 1203, the session determination function selects one access from the network log 185 sorted in step 1202. The first time this step is executed, the first access is selected. From the second time onward, the next access in chronological order is selected. Here, the set of accesses selected in this step so far is defined as the past access set, the currently selected access is defined as the current access, and the set of accesses not yet selected is defined as the future access set. After selecting one access, the process proceeds to step 1204.

In step 1204, the session determination function checks whether the access selected in step 1203 (the current access) is a sub-access of an access included in the past access set. To do so, the session determination function searches the automatic access list 209 for records whose sub-access destination set includes the access destination of the current access. If such a record is found, the function then checks whether the main access destination of that record is included in the past access set. If it is, the corresponding past access is stored as a parent access candidate. Thereafter, the process proceeds to step 1205.

In step 1205, the session determination function determines whether there is a main access corresponding to the current access. If the main access is found as a result of the determination, the process proceeds to step 1208.

In step 1206, the session determination function searches for an access having the same access destination URL as the referrer URL of the current access in the past access set. The corresponding past access is called a parent access candidate. Thereafter, the process proceeds to step 1207.

In step 1207, the session determination function determines whether the referrer of the current access exists in the past access set. As a result of the determination, if it exists, the process proceeds to step 1; otherwise, the process proceeds to step 1208.

In step 1208, the session determination function identifies the parent access for the current access. The access interval between the parent access candidate and the current access is calculated, and if the interval is smaller than a preset threshold value, the parent access candidate is registered as the parent access of the current access. For example, 30 minutes is selected as the threshold value. If the access interval is not smaller than the threshold, the current access is treated as having no parent access. After identifying the parent access, the process proceeds to step 1209.
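The referrer-based part of this procedure (the candidate search of step 1206 and the interval check of step 1208) can be sketched as follows. The sketch leaves out the automatic access list 209. The function name `find_parent`, the record fields (`url`, `referrer`, `time`), the sample URLs, and the choice of the most recent matching access when several match are assumptions; the 30-minute threshold is the example value from the text.

```python
from datetime import datetime, timedelta

def find_parent(current, past_accesses, threshold=timedelta(minutes=30)):
    """Return the parent access of `current`, or None if there is none.

    A past access is a candidate when its URL equals the referrer URL of the
    current access. It is accepted only if the access interval is smaller
    than the threshold.
    """
    referrer = current.get("referrer")
    candidates = [p for p in past_accesses if p["url"] == referrer]
    if not candidates:
        return None
    # Assumption: when several past accesses match, take the most recent one.
    candidate = max(candidates, key=lambda p: p["time"])
    if current["time"] - candidate["time"] < threshold:
        return candidate
    return None

# Hypothetical records for one user.
past = [{"url": "http://a.example/", "referrer": None,
         "time": datetime(2014, 8, 29, 9, 0)}]
soon = {"url": "http://a.example/page", "referrer": "http://a.example/",
        "time": datetime(2014, 8, 29, 9, 10)}
late = {"url": "http://a.example/page", "referrer": "http://a.example/",
        "time": datetime(2014, 8, 29, 9, 45)}
parent_of_soon = find_parent(soon, past)   # the 09:00 access (10-minute interval)
parent_of_late = find_parent(late, past)   # None (45-minute interval)
```

Accesses for which this returns None become roots of new access trees unless a later step relates them to another access.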

In step 1209, the session determination function checks whether any access remains in the future access set. If one remains, the process returns to step 1203. If none remains, the process proceeds to step 1210.

In step 1210, for each access whose parent access could not be identified in the preceding steps, the session determination function tries to identify the parent access by using information other than the automatic access list 209 and the referrer information. By this point, the session determination function has generated several access trees from the network log 185. FIG. 14 shows an example of an access tree. Each access is represented as a node in the graph, and a directed edge connects each identified parent access to its child access. The session determination function performs this processing on accesses that are not included in any access tree, as shown in FIG. The periodicity of the access times can be used as information other than the automatic access list 209 and the referrer information. When a plurality of accesses to the same URL or domain occur at a constant interval, these accesses are determined to be related to each other and are formed into a tree. A Fourier transform or the like can be used as a technique for extracting periodicity from access times.
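The periodicity check can be illustrated with a small discrete Fourier transform written with the standard library only. This is a sketch under assumptions: the function name `dominant_period`, the one-minute bin width, and the rule of picking the frequency with the largest power are choices made for this example and are not taken from the source.

```python
import cmath

def dominant_period(timestamps, bin_seconds=60):
    """Estimate the dominant access period in seconds, or None if unavailable.

    timestamps are access times in seconds. Accesses are counted into
    fixed-width bins, the mean is removed, and the frequency with the
    largest DFT power is converted back to a period.
    """
    if len(timestamps) < 2:
        return None
    start = min(timestamps)
    n_bins = int((max(timestamps) - start) // bin_seconds) + 1
    signal = [0.0] * n_bins
    for t in timestamps:
        signal[int((t - start) // bin_seconds)] += 1.0
    mean = sum(signal) / n_bins
    centered = [x - mean for x in signal]
    best_k, best_power = None, 0.0
    for k in range(1, n_bins // 2 + 1):   # skip k = 0, the constant component
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n_bins)
            for i, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None
    return n_bins * bin_seconds / best_k

# Hypothetical accesses to one domain every 10 minutes for 90 minutes.
beacon_times = [600.0 * i for i in range(10)]
period = dominant_period(beacon_times)    # close to 600 seconds
```

The estimate is limited by the bin width and the length of the observation window, so it is better read as "roughly every ten minutes" than as an exact interval. A practical version would also compare the peak power against the rest of the spectrum before treating the accesses as periodic.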

101: Network, 110: Internet, 120: External server, 130: Local area network, 140: Firewall, 150: Proxy server, 160: Administrator terminal, 170: Client, 180: Log analysis server, 181: Access destination classification function, 182: Access feature amount extraction function, 183: Abnormal access extraction function, 184: Network log selection function, 185: Network log, 186: Access destination classification, 187: Access feature amount, 188: Abnormal access report

Claims

1. A method for detecting unauthorized access in a log analysis device for detecting unauthorized access from a network log on a computer or a network,
The log analysis device
Obtaining the network log to which the access destination attribute is assigned;
Classifying access destinations based on the access destination attributes and pre-stored rules;
Extracting characteristic information of normal access based on the network log and the classified access destination;
Comparing the access included in the network log with the access characteristic information, detecting the access as a normal access if they match, and detecting the access as an abnormal access if they do not match,
An unauthorized access detection method comprising:

2. The unauthorized access detection method according to claim 1,
The characteristic information of the normal access is a tendency of access to the classified access destination,
An unauthorized access detection method, comprising: comparing an access included in the network log with a tendency of the access at a normal time.

3. The unauthorized access detection method according to claim 2,
An unauthorized access detection method, wherein the access destination is classified by further acquiring a domain in the network log, a category based on content provided by an external device, or group information of the accessing user.

4. A log analysis device that detects unauthorized access from network logs on a computer or network,
The log analysis device
A receiving unit for acquiring the network log to which the access destination attribute is assigned;
An access destination classification unit for classifying an access destination based on the access destination attribute and a rule stored in advance;
Based on the network log and the classified access destination, an access feature amount extraction unit that extracts feature information of a normal access;
An abnormal access extraction unit that compares an access included in the network log with the access feature information, detects the access as a normal access if they match, and detects the access as an abnormal access if they do not match,
An unauthorized access detection device comprising:

5. The unauthorized access detection device according to claim 4,
The characteristic information of the normal access is a tendency of access to the classified access destination,
The abnormal access extraction unit compares the access included in the network log with the access tendency in a normal state.

6. The unauthorized access detection device according to claim 5,
An unauthorized access detection device, wherein the access destination classification unit classifies the access destination by further acquiring a domain in the network log, a category based on content provided by an external device, or group information of the accessing user.