WO2016031034A1 - Apparatus and method for detecting unauthorized access - Google Patents

Apparatus and method for detecting unauthorized access

Publication number
WO2016031034A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
destination
network log
network
access destination
Prior art date
Application number
PCT/JP2014/072670
Other languages
French (fr)
Japanese (ja)
Inventor
進 芹田
哲郎 鬼頭
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2014/072670
Publication of WO2016031034A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures

Definitions

  • the present invention relates to technology for detecting unauthorized network access performed by malware-infected computers.
  • A blacklist is a list of known unauthorized access destinations. By comparing the access destinations included in the network log with the unauthorized access destinations included in the blacklist, the machines that accessed an unauthorized access destination can be identified.
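
The blacklist comparison described above can be sketched as follows. This is a minimal illustration, not the method of any cited document: it assumes the log has already been reduced to (client IP, URL) pairs and that the blacklist is a set of host names. The function name and the sample hosts are hypothetical.

```python
from urllib.parse import urlsplit


def find_blacklisted_clients(log_records, blacklist_hosts):
    """Return {client_ip: [urls]} for accesses whose host is on the blacklist."""
    hits = {}
    for client_ip, url in log_records:
        host = urlsplit(url).hostname
        if host in blacklist_hosts:
            hits.setdefault(client_ip, []).append(url)
    return hits


# Hypothetical sample data.
records = [
    ("10.0.0.5", "http://www.example.com/page.html"),
    ("10.0.0.7", "http://bad.example.net/payload.exe"),
]
flagged = find_blacklisted_clients(records, {"bad.example.net"})
```

A lookup of this kind only finds destinations that are already known, which is why the description goes on to characterize normal access instead.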
  • Document 1 discloses a method that extracts the characteristics of normal users' access from the accesses to a Web server and presents accesses that differ greatly from those characteristics to the administrator as abnormal accesses.
  • In that method, the network access to be analyzed is limited to access to a single Web server.
  • access to a plurality of servers by a plurality of users is recorded in a network log output by a proxy server or the like.
  • the access destination is generally identified by a URL (Uniform Resource Locator).
  • multiple processes are running on the client, and each process accesses the network independently. For example, Internet access using a browser and access for OS updater are performed in parallel.
  • In a network log such as that of a proxy, the accesses of each machine are recorded in chronological order, and there is no record indicating which process made each access.
  • To extract the characteristics of access patterns it is necessary to identify a series of accesses of the same process, but it is difficult to identify a series of accesses from the network log.
  • An object of the present invention is to extract the characteristics of normal access from the network log even when the access destination is not limited to a Web server designated in advance, to present accesses that deviate from those characteristics as abnormal accesses, and thereby to detect unauthorized access.
  • The disclosed log analysis apparatus acquires a network log to which access destination attributes are assigned, classifies the access destinations based on the access destination attributes and rules stored in advance, and extracts feature information of normal access based on the network log and the classified access destinations. It then compares each access included in the network log with the access feature information, detecting the access as normal when it matches and as abnormal when it does not.
  • the present invention can detect unauthorized access by extracting the characteristics of normal access from the network log and presenting access that deviates from the characteristics as abnormal access.
  • FIG. 1 is a diagram showing an example of a system configuration. As shown in FIG. 1, this system includes an external server 120, a firewall 140, a proxy server 150, an administrator terminal 160, a client 170, a log analysis server 180, and the like, and these devices are connected to each other via a network 101.
  • the external server 120 is a server arranged on the Internet 110 and is accessed from the client 170 through the network. Generally, the external server 120 is used for providing various services such as information retrieval. However, malicious attackers may use it for illegal activities such as distributing malware. In addition, a malicious attacker may intrude into a legitimate server and modify the site to use it for an attack.
  • regular servers and servers used for illegal purposes are collectively referred to as the external server 120.
  • The firewall 140 has a function of discarding (blocking) or permitting (passing) packets that meet specific conditions from among the packets traveling between the local area network 130 and the Internet 110. By setting a condition that blocks any access that does not pass through the proxy server 150, all access between the external server 120 and the client 170 is forced to go through the proxy server 150.
  • the proxy server 150 has a function of relaying packet exchange between the client 170 and the external server 120 and recording the access as a network log 185. By setting the conditions of the firewall 140 as described above, all accesses between the external server 120 and the client 170 can be recorded in the network log 185.
  • the log analysis server 180 analyzes the network log 185 and presents an abnormal access.
  • The log analysis server 180 includes a network log selection function 184, an access destination classification function 181, an access feature quantity extraction function 182, an abnormal access extraction function 183, the network log 185, an access destination classification 186, an access feature quantity 187, and an abnormal access report 188. These functions may be arranged in one device or may be distributed across a plurality of devices.
  • the network log 185 is a record of network access output by a network device such as the proxy server 150.
  • The network log 185 is composed of a plurality of records. Each record corresponds to one access and includes the access date and time, the IP address of the access source client 170, the URL that identifies the access-destination external server 120 and the resource in the external server 120, the User-Agent set in the client 170, the referrer that indicates the URL of the page that linked to the access destination, the HTTP status code that indicates the access result, and authentication information that identifies the accessing user.
  • In addition to the log of the proxy server 150, if the log of the firewall 140 or the client 170 can be used, it may be included in the network log 185.
  • the network log 185 output by each machine is transmitted to the log analysis server 180 and stored in the log analysis server 180.
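
One way to hold the record fields listed above is sketched below. The field names, the tab-separated input format, and the use of "-" for a missing value are assumptions made for illustration; the description does not specify a log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class NetworkLogRecord:
    """One access in the network log (field names are hypothetical)."""
    timestamp: datetime
    client_ip: str
    url: str
    user_agent: str
    referrer: Optional[str]
    status: int
    auth_user: Optional[str]


def parse_record(line: str) -> NetworkLogRecord:
    """Parse one tab-separated line; '-' marks a missing referrer or user."""
    ts, ip, url, ua, ref, status, user = line.rstrip("\n").split("\t")
    return NetworkLogRecord(
        timestamp=datetime.fromisoformat(ts),
        client_ip=ip,
        url=url,
        user_agent=ua,
        referrer=None if ref == "-" else ref,
        status=int(status),
        auth_user=None if user == "-" else user,
    )


sample = "\t".join([
    "2014-08-29T10:15:00", "10.0.0.5", "http://www.example.com/page.html",
    "Mozilla/5.0", "-", "200", "alice",
])
record = parse_record(sample)
```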
  • the access destination classification function 181 classifies the access destinations included in the network log 185 according to a plurality of rules.
  • the access destination classification function 181 classifies the access destination using a domain, a category based on the contents of the content provided by the external server 120, a group of accessing users, or the like.
  • In this way, the access destinations can be classified at a coarser granularity than the URL, so that a sufficient number of samples can be obtained to extract the features of each access destination. Details of the access destination classification procedure will be described with reference to FIG.
  • the result of classifying the access destination is stored as an access destination classification 186.
  • the access feature quantity extraction function 182 uses the network log 185 and the access destination classification 186 to extract the features of normal access. Specifically, the number of accesses for each access destination, the number of transitions from one access destination to another access destination, and the like are used as features for each user and for all users. A detailed procedure for extracting the feature amount will be described with reference to FIG. The extracted feature quantity is stored as an access feature quantity. Details of the access feature amount will be described with reference to FIGS.
  • the abnormal access extraction function 183 extracts an abnormal access based on the network log 185, the access destination classification 186, and the access feature amount.
  • the access included in the network log 185 is compared with the access feature amount, and the number of past appearances of each access and the frequency of access transition are calculated.
  • the detailed procedure of abnormal access extraction will be described with reference to FIG.
  • The extracted abnormal access is stored as an abnormal access report 188.
  • The administrator can confirm the abnormal access by browsing the abnormal access report 188 using the administrator terminal 160. Details of the abnormal access report 188 will be described with reference to FIG.
  • the network log selection function 184 has a function of selecting a part used by each function according to a rule set in advance for the network log 185.
  • the network log 185 includes a record of accesses for a certain period (for example, one year).
  • The administrator designates, based on the access date and time, which part of the network log 185 is used as input to each function. For example, the accesses of a certain month are used as input to the access destination classification function 181 and the access feature quantity extraction function 182, and the accesses of the following week are used as input to the abnormal access extraction function 183. Since the access feature quantity 187 is intended to characterize normal access, the accesses used to compute it are selected from a period during which it is guaranteed that there was no unauthorized access.
  • the client 170 has a function of accessing the external server 120 via a network.
  • the client 170 may be infected with malware by executing an executable file attached to a forged mail. Or there is a possibility that a malicious person who has stolen the login password will gain unauthorized access.
  • the administrator terminal 160 has a function of logging in to the proxy server 150 and the log analysis server 180 and performing various operations.
  • The administrator uses the administrator terminal 160 to set parameters for the processing performed by the log analysis server 180, to view the abnormal access report 188 output from the log analysis server 180, and the like.
  • Each device has a hardware configuration that includes a CPU (Central Processing Unit), an auxiliary storage device such as a hard disk drive, a main storage device such as a ROM (Read Only Memory), input devices such as a keyboard and a mouse, an I/O (Input/Output) interface, a network interface for connecting to the local area network 130 and the Internet 110, and the like.
  • the access destination extraction module 201 receives the network log 185 selected by the network log selection function 184.
  • the access destination extraction module 201 extracts an access destination external server 120 and a URL for identifying the location of the resource in the external server 120 (hereinafter referred to as an access destination URL) from the records included in the network log 185.
  • A set of access destination URLs is obtained by eliminating duplicate access destination URLs.
  • the access destination extraction module 201 transmits the extracted access destination URL set to the domain extraction module 202.
  • the domain extraction module 202 extracts a domain name set from the access destination URL set.
  • the domain name is a part of a name for identifying a computer on the IP network.
  • For example, in a URL represented as http://www.example.com/page.html, www.example.com is the domain name.
  • domain names have a hierarchical structure, and domains can be defined in each hierarchy (level). Therefore, the domain name is extracted for each hierarchy.
  • In this example, domain names are extracted at three levels: top-level domain: com; second-level domain: example.com; third-level domain: www.example.com.
  • Each extracted domain is stored in association with the original URL. If part of the URL is described with an IP address, the domain name is obtained by querying the DNS. If it cannot be obtained, it is saved as no domain name.
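
The per-level domain extraction can be sketched as below. The function name is hypothetical, and the DNS query for URLs written with an IP address is not implemented: such hosts simply yield an empty list here, corresponding to "no domain name".

```python
import ipaddress
from urllib.parse import urlsplit


def domain_levels(url, max_levels=3):
    """Return the domain names of a URL from the top level down to max_levels."""
    host = urlsplit(url).hostname
    if host is None:
        return []
    try:
        ipaddress.ip_address(host)
        return []  # IP-address host: a DNS query would be needed (not done here)
    except ValueError:
        pass
    labels = host.split(".")
    depth = min(max_levels, len(labels))
    return [".".join(labels[-n:]) for n in range(1, depth + 1)]


levels = domain_levels("http://www.example.com/page.html")
# levels == ['com', 'example.com', 'www.example.com']
```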
  • The domain extraction module 202 transmits the access destination URL set and the extracted domain set to the category determination module 203.
  • the category determination module 203 determines the category of each access destination URL included in the access destination URL set.
  • the category is a classification based on the content of the access destination, and examples of the category include news, SNS (social network service), and video distribution. These category types are specified in advance.
  • Security vendors investigate which category a URL belongs to and publish the results. By using such a service, it is possible to determine to which category the access destination URL included in the access destination set belongs. It is also possible to actually access the access destination specified by the access destination URL and determine the category based on the acquired content.
  • This method can be further divided into manual determination and rule-based determination. In manual determination, a human checks the content and determines which of the preset category candidates it belongs to.
  • In rule-based determination, the category of the acquired content is determined by a mechanical rule.
  • One example of a mechanical rule is to prepare a correspondence table between content file formats and categories. Another is to determine the category by a statistical method such as clustering, using the frequency of the words included in the content as a feature quantity.
  • the access destination may belong to a plurality of categories.
  • The category determination module 203 determines the category of each access destination URL included in the access destination URL set and then outputs the result as access destination URL information 208. Details of the access destination URL information 208 will be described with reference to FIG.
  • the network log shaping module 204 shapes the network log 185 into a form suitable for subsequent processing.
  • When the network log 185 includes the logs of a plurality of devices, the clocks of the devices may be out of sync. In this case, the access order based on the access times recorded in the network log 185 differs from the actual access order.
  • the network log shaping module 204 corrects the access time included in the network log 185 to the correct access time in consideration of the clock deviation of the device. Also, the log format may be different for each device.
  • the network log shaping module 204 formats the network log 185 into a unified format.
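
A minimal sketch of the clock correction step follows. It assumes that the offset of each device clock from a reference clock is already known; the description does not say how the deviation is measured, so the `clock_offsets` table and all names here are hypothetical.

```python
from datetime import datetime, timedelta


def correct_and_merge(logs_by_device, clock_offsets):
    """Subtract each device's known clock offset, then sort all accesses by time.

    logs_by_device: {device: [(timestamp, url), ...]}
    clock_offsets:  {device: timedelta by which the device clock runs ahead}
    """
    merged = []
    for device, records in logs_by_device.items():
        offset = clock_offsets.get(device, timedelta(0))
        for ts, url in records:
            merged.append((ts - offset, device, url))
    merged.sort(key=lambda r: r[0])
    return merged


logs = {
    "proxy": [(datetime(2014, 8, 29, 10, 0, 5), "http://www.example.com/b")],
    "firewall": [(datetime(2014, 8, 29, 10, 0, 8), "http://www.example.com/a")],
}
# Assumed: the firewall clock runs 5 seconds fast.
merged = correct_and_merge(logs, {"firewall": timedelta(seconds=5)})
```

After correction the firewall access (10:00:03) sorts before the proxy access (10:00:05), which reverses the order suggested by the raw timestamps.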
  • The network log shaping module 204 transmits the shaped network log 185 to the user identification module 205.
  • The user identification module 205 identifies the user who made each access included in the network log 185 and divides the network log 185 by user.
  • The user identification module 205 identifies a user by using the authentication information included in the record. If the authentication information cannot be acquired for some reason, the user is identified using the IP address of the access source client 170.
  • the user identification module 205 transmits the network log 185 divided for each user to the automatic access extraction module 208.
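
Dividing the log by user, with the IP address as a fallback when no authentication information is present, might look like the sketch below. The record layout (dictionaries with `auth_user` and `client_ip` keys) is an assumption for illustration.

```python
from collections import defaultdict


def split_by_user(records):
    """Group records by authenticated user, falling back to the client IP."""
    per_user = defaultdict(list)
    for rec in records:
        user = rec.get("auth_user")
        # Tag the key so that a user name can never collide with an IP string.
        key = ("user", user) if user else ("ip", rec["client_ip"])
        per_user[key].append(rec)
    return dict(per_user)


log = [
    {"auth_user": "alice", "client_ip": "10.0.0.5", "url": "http://www.example.com/"},
    {"auth_user": None, "client_ip": "10.0.0.7", "url": "http://www.example.com/a"},
    {"auth_user": "alice", "client_ip": "10.0.0.9", "url": "http://www.example.com/b"},
]
by_user = split_by_user(log)
```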
  • the automatic access extraction module 208 extracts automatic access from the access destinations included in the network log 185.
  • Automatic access refers to an access that is triggered automatically after a certain access destination is accessed. For example, when a page of a news site is accessed with a browser, the images and style sheets that make up the page are downloaded automatically without any click or other operation by the user. Even if the user accesses only the first page, the network log 185 also records the accesses for these subsequent downloads.
  • the access destination that caused the automatic access is called the main access destination, and the access destination accessed by the automatic access is called the sub-access destination.
  • the automatic access extraction module 208 associates the main access destination with the sub access destinations (generally a plurality of access destinations) and stores them as an automatic access destination list.
  • the feature quantity extraction function generates an access tree representing a user access transition when extracting an access feature quantity.
  • the automatic access extraction module 208 extracts main access and sub access.
  • the automatic access extraction module 208 extracts main access and sub access based on information such as referrer, access interval, and repeat pattern.
  • the automatic access extraction module 208 arranges the network logs 185 divided for each user in order of access time. Then, the following processing is performed for each divided network log 185.
  • The automatic access extraction module 208 reads the accesses included in the network log 185 in order from the top and acquires the referrer of each. It then searches for whether the access destination represented by the acquired referrer appears before the current access. If the corresponding access destination exists and the time interval between the corresponding access and the current access is smaller than a preset threshold, the corresponding access is determined to be the main access and the current access to be the sub-access.
  • the time interval condition is used to reflect the fact that sub-access occurs immediately after main access. A value such as 5 seconds is used as the threshold value.
  • the above is the automatic access extraction method using the referrer and the access interval.
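
The referrer-and-interval procedure can be sketched as follows for the log of one user. Comparing against the most recent earlier access to the referrer URL is an assumption, since the description does not say which earlier occurrence is used; the function and variable names are hypothetical.

```python
from datetime import datetime, timedelta


def extract_automatic_access(accesses, threshold=timedelta(seconds=5)):
    """Map each main access destination to its set of sub access destinations.

    accesses: list of (timestamp, url, referrer) for one user, sorted by time.
    """
    last_seen = {}   # url -> time of its most recent access so far
    auto_list = {}   # main url -> set of sub urls
    for ts, url, referrer in accesses:
        if referrer is not None and referrer in last_seen:
            if ts - last_seen[referrer] < threshold:
                auto_list.setdefault(referrer, set()).add(url)
        last_seen[url] = ts
    return auto_list


t0 = datetime(2014, 8, 29, 10, 0, 0)
page = "http://news.example.com/"
user_log = [
    (t0, page, None),
    (t0 + timedelta(seconds=1), "http://news.example.com/style.css", page),
    (t0 + timedelta(seconds=2), "http://img.example.com/logo.png", page),
    (t0 + timedelta(seconds=60), "http://news.example.com/article1", page),
]
auto = extract_automatic_access(user_log)
```

In this sample the style sheet and the image are recorded as sub access destinations of the page, while the article requested a minute later is not, because it exceeds the 5-second threshold.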
  • the automatic access extraction module 208 arranges the network logs 185 divided for each user in the order of access times.
  • Next, each divided network log 185 is split into groups of accesses whose access times are close to each other.
  • the network log 185 is read from the head, and when the access interval between two consecutive accesses becomes larger than a preset threshold value (for example, 5 seconds), the interval between the two accesses is set as a separation position. This process is performed on the network log 185 of all users. At this point, a plurality of groups composed of consecutive accesses are generated.
  • the automatic access extraction module 208 finds a set of accesses that repeatedly appear in this group set.
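
The interval-and-repeat approach can be sketched in two steps: split the time-ordered accesses wherever the gap exceeds the threshold, then look for groups that recur. Treating a group as an unordered set of URLs and requiring an exact match between groups are simplifying assumptions; the description does not fix how repeated sets are found.

```python
from collections import Counter


def split_by_gap(accesses, threshold_seconds=5.0):
    """Split (time_in_seconds, url) pairs, sorted by time, at gaps above the threshold."""
    groups, current, prev_time = [], [], None
    for t, url in accesses:
        if prev_time is not None and t - prev_time > threshold_seconds:
            groups.append(current)
            current = []
        current.append(url)
        prev_time = t
    if current:
        groups.append(current)
    return groups


def repeated_groups(groups, min_count=2):
    """Return the URL sets that appear as a whole group at least min_count times."""
    counts = Counter(frozenset(g) for g in groups)
    return [set(s) for s, c in counts.items() if c >= min_count]


sample = [(0, "a"), (1, "b"), (100, "a"), (101, "b"), (200, "c")]
groups = split_by_gap(sample)
repeats = repeated_groups(groups)
```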
  • The automatic access destination list 209 is used when the feature quantity extraction function generates an access tree. Details of the automatic access destination list 209 will be described with reference to FIG.
  • the access destination clustering module 207 divides the access destination based on the characteristics of the user who is accessing the access destination.
  • The access destination clustering module 207 generates a correspondence table of access destinations and accessing users from the network log 185 divided for each user. Access destinations are classified according to multiple criteria such as URL and domain. An example of the correspondence table is shown in the figure. Clustering is performed using each record of these correspondence tables as an element. To perform clustering, a similarity or distance function between records must be defined. The Jaccard index, obtained by dividing the number of elements common to two access user ID sets by the number of elements in their union, can be used as the similarity, and one minus this value (the Jaccard distance) can be used as the distance function. For clustering, general methods such as hierarchical clustering can be used. As a result of clustering, the cluster to which each access destination belongs is obtained for each classification criterion. The access destination clustering module 207 stores the obtained clusters as the access destination cluster 210. Details of the access destination cluster will be described with reference to FIG.
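
A small sketch of clustering access destinations by their accessing users is given below. It uses the Jaccard distance (one minus the ratio of common elements to union elements) and a plain single-linkage agglomerative loop with a distance cutoff. The linkage rule, the cutoff of 0.5, and all names are assumptions chosen for illustration; any general hierarchical clustering method could be substituted.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of user IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage(users_by_dest, max_distance=0.5):
    """Merge the closest clusters until the closest pair is farther than max_distance."""
    clusters = [[d] for d in sorted(users_by_dest)]

    def dist(c1, c2):
        return min(jaccard_distance(users_by_dest[x], users_by_dest[y])
                   for x in c1 for y in c2)

    while len(clusters) > 1:
        d, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical correspondence table: access destination -> set of user IDs.
table = {
    "news.example.com": {"u1", "u2", "u3"},
    "sns.example.com": {"u1", "u2"},
    "update.example.org": {"u4"},
}
clusters = single_linkage(table)
```

Here the two destinations that share users u1 and u2 fall into one cluster, and the destination accessed only by u4 stays on its own.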
  • the access destination URL information 208 includes an access destination URL 301, a domain 302, a category 303, and the like.
  • the access destination URL 301 is an access destination URL recorded in the network log 185.
  • the domain 302 is a domain name to which the access destination URL extracted by the domain extraction module 202 belongs.
  • a domain name is recorded for each domain hierarchy.
  • The category 303 is the category determined by the category determination module 203.
  • the automatic access destination list 209 includes a main access destination 401, a sub access destination set 402, and the like.
  • the automatic access extraction module 208 extracts the main access destination and the sub access destination from the network log 185.
  • the sub access destinations are stored as a set as shown in FIG.
  • FIG. 5A shows an example of a cluster generated using a URL.
  • FIG. 5B shows an example of a cluster generated using a domain (3 levels).
  • the access destination cluster includes a cluster ID 501, an access destination URL 502, and the like.
  • the cluster ID is an identifier that uniquely represents the cluster.
  • The access destination URL 502 is a list of the access destinations belonging to the cluster.
  • the network log shaping module 204 receives the network log 185 selected by the network log selection function 184. Then, the network log 185 is shaped into a format suitable for subsequent data processing. The processing of the network log shaping module 204 is the same as that described in FIG. The network log shaping module 204 transmits the shaped network log 185 to the user specifying module 205.
  • the user identification module 205 identifies the user who has made each access included in the network log 185, and divides the network log 185 for each user.
  • the processing of the user specifying module 205 is the same as that described in FIG.
  • the user identification module 205 transmits the network log 185 divided for each user to the session identification module 603.
  • the session identification module 603 divides the network log 185 of the same user into units called sessions.
  • a session refers to a series of accesses performed by a user or program for the same purpose.
  • a general method for determining a session using a web server log is to use an access time. If the access interval exceeds a certain threshold (for example, 30 minutes), it is determined as another session.
  • However, the network log 185 of the proxy server 150 or the like contains accesses made by the same machine for different purposes, and these cannot be separated by the time interval alone. Therefore, the session identification module 603 identifies sessions using the automatic access destination list 209, the referrer information included in the network log 185, and the like, in addition to the access time interval. Detailed processing of the session identification module 603 will be described with reference to the drawings.
  • the session specifying module 603 stores the access included in each session as an access tree as shown in FIG.
  • the session identification module 603 transmits the network log 185 divided by the session to the access destination attribute assignment module 604.
  • the access destination attribute assignment module 604 assigns an access destination attribute to the network log 185 divided by the session.
  • the access destination attribute is attribute information stored in the access destination URL information 208 and the access destination cluster 210.
  • The access destination attribute assignment module 604 reads each record in the network log 185 and acquires the access destination URL. It then searches the access destination URL information 208 and the access destination cluster 210 using the access destination URL as a key and acquires the corresponding attribute information. Finally, it associates the acquired attribute information with the access destination and stores it in a storage area inside the module. This series of processing is executed for all access destinations included in the network log 185.
  • The access destination attribute assignment module 604 transmits the network log 185 to which the access destination attributes have been assigned to the frequency calculation module 605.
  • the frequency calculation module 605 calculates the appearance frequency and transition frequency of the access destination class based on the network log 185 to which the access destination attribute is assigned.
  • the frequency calculation module 605 specifies an access destination by various classification criteria such as an access destination URL, a domain, a category, and a cluster ID. An access destination specified by a certain classification standard is called an access destination class.
  • The frequency calculation module 605 calculates the number of times each access destination class is accessed from the network log 185 to which the access destination attribute is assigned. First, the frequency calculation module 605 calculates the number of times for each user. Next, it calculates the number of times for all users. The abnormal access extraction function 183 uses this information to detect both accesses that are abnormal for a specific user and accesses that are abnormal across all users.
  • The frequency calculation module 605 also calculates the number of transitions from one access destination class to another.
  • An access destination class transition is determined as follows. In the access tree generated by the session identification module 603, when node A and node B are connected by an edge, a transition is determined to have occurred from the access destination class to which A belongs to the access destination class to which B belongs.
  • The frequency calculation module 605 traverses all access trees and counts the transitions between access destination classes. As with the appearance frequency, it calculates the number of transitions for each user and for all users.
  • the frequency calculation module 605 stores the calculated frequency as an access feature amount. Details of the access feature amount will be described with reference to FIGS.
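
Counting appearance and transition frequencies for one user can be sketched as below. An access tree is represented here as a pair of a node list and a parent-to-child edge list, and the host name is used as the classification criterion; both the representation and the names are assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_class(url):
    """Example classification criterion: the host name of the URL."""
    return urlsplit(url).hostname


def count_for_user(access_trees, class_of=host_class):
    """Count class appearances (nodes) and class transitions (edges) over all trees."""
    access_counts = Counter()
    transition_counts = Counter()
    for nodes, edges in access_trees:
        for url in nodes:
            access_counts[class_of(url)] += 1
        for parent, child in edges:
            transition_counts[(class_of(parent), class_of(child))] += 1
    return access_counts, transition_counts


tree = (
    ["http://news.example.com/", "http://news.example.com/a1",
     "http://sns.example.org/share"],
    [("http://news.example.com/", "http://news.example.com/a1"),
     ("http://news.example.com/a1", "http://sns.example.org/share")],
)
access_counts, transition_counts = count_for_user([tree])
```

Passing a different `class_of` function (for example one that maps a URL to its category or cluster ID) yields the counts for another classification criterion.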
  • the access destination feature quantity 187 includes a feature quantity defined for each user (user base access feature quantity) and a feature quantity defined for all users (group base access feature quantity).
  • FIG. 7 shows an example of user base access feature quantities.
  • There are two kinds of user base access feature quantity: one based on the frequency of a single access destination and one based on the transition frequency between access destinations.
  • FIG. 7A shows an example of a user base access feature quantity based on the frequency of a single access destination.
  • the user base access feature amount includes a classification standard 701, an access destination class identifier 702, an access frequency 703, and the like.
  • the classification standard indicates from which viewpoint the access destination is classified.
  • Access destination URL, domain, category, cluster ID, etc. can be used as classification criteria.
  • the access destination class identifier is information that uniquely identifies the access destination based on the classification criteria. For example, when the classification standard is URL, the access destination URL can be used as an identifier.
  • the access frequency represents the number of times the user has accessed the access destination class represented by the access destination identifier.
  • FIG. 7B shows an example of the user base access feature quantity based on the transition frequency between access destinations.
  • This user base access feature quantity includes a classification standard, a source class identifier, a destination class identifier, an access frequency, and the like.
  • the classification standard indicates from which viewpoint the access destination is classified.
  • Access destination URL, domain, category, cluster ID, etc. can be used as classification criteria.
  • the source class identifier is information that uniquely identifies an access destination based on a classification criterion. For example, when the classification standard is URL, the access destination URL can be used as an identifier.
  • the destination class identifier is information that uniquely identifies an access destination based on a classification criterion. For example, when the classification standard is URL, the access destination URL can be used as an identifier.
  • the access frequency represents the number of times the user has accessed the access destination represented by the destination class identifier after accessing the access destination represented by the source class identifier.
  • FIG. 8 shows an example of group-based access feature quantities. Like the user base access feature quantity, the group-based access feature quantity comes in two kinds: one based on the frequency of a single access destination and one based on the transition frequency between access destinations.
  • FIG. 8A shows an example of the group-based access feature quantity based on the frequency of a single access destination.
  • the group-based access feature amount includes classification criteria, access destination class identifier, access frequency, number of access users, and the like. Among these, the roles of the classification standard and the access destination class identifier are the same as those in FIG.
  • the access frequency is the total number of accesses of all users who have accessed the access class specified by the access destination class identifier.
  • the number of access users is the number of users who have accessed the access class specified by the access destination class identifier.
  • FIG. 8B shows an example of the group-based access feature quantity based on the transition frequency between access destinations.
  • This group-based access feature quantity includes a classification standard, a source class identifier, a destination class identifier, a transition frequency, the number of access users, and the like. Among these, the roles of the classification standard, the source class identifier, and the destination class identifier are the same as those in FIG. 7B.
  • the transition frequency is the total access count of all users who have accessed the access class specified by the destination class identifier after accessing the access class specified by the source class identifier.
  • the number of access users is the number of users who have accessed the access class specified by the destination class identifier after accessing the access class specified by the source class identifier.
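
The group-based quantities can be derived from the per-user counts as sketched below. The input layout (a dictionary of per-user counts keyed by a class identifier, or by a source and destination pair for transitions) is an assumption, and stored counts are assumed to be positive.

```python
def group_features(per_user_counts):
    """Aggregate {user: {key: count}} into {key: (total_frequency, number_of_users)}."""
    totals = {}
    for counts in per_user_counts.values():
        for key, n in counts.items():
            freq, users = totals.get(key, (0, 0))
            totals[key] = (freq + n, users + 1)
    return totals


# Hypothetical per-user access frequencies by domain.
per_user_access = {
    "alice": {"news.example.com": 12, "sns.example.org": 3},
    "bob": {"news.example.com": 5},
}
group_access = group_features(per_user_access)

# The same function works for transitions keyed by (source, destination).
per_user_transition = {
    "alice": {("news.example.com", "sns.example.org"): 2},
    "bob": {("news.example.com", "sns.example.org"): 1},
}
group_transition = group_features(per_user_transition)
```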
  • The abnormal access extraction function 183 includes a network log shaping module 204, a user identification module 205, a session identification module 603, an access destination attribute assignment module 604, an access feature amount comparison module, a report generation module, and the like.
  • The processing performed by the network log shaping module 204, the user identification module 205, the session identification module 603, and the access destination attribute assignment module 604 is the same as that of the modules described in FIG.
  • the abnormal access extraction function 183 holds the access tree divided for each session and the access destination information given to each access.
  • the access feature amount comparison module calculates the feature amount of each access based on this information. Specifically, the appearance frequency and transition probability of each access destination are calculated. Details of the access feature amount comparison module will be described with reference to the drawings.
  • the report generation module generates a report for the administrator to browse, based on the feature amounts calculated by the access feature amount comparison module.
  • the report generation module generates a screen that displays the access tree generated by the session identification module 603, the access destination attributes assigned by the access destination attribute assignment module 604, the access feature amounts calculated by the access feature amount comparison module, and the like.
  • a plurality of display methods are provided. As one method, all the above information is displayed for each user.
  • only the access for which the result calculated by the access feature amount comparison module satisfies a preset condition is displayed. For example, it is possible to set conditions such as displaying only accesses where the number of occurrences of access is 10 or less and displaying only accesses where the access transition probability is 5% or less. Alternatively, information such as “warning” is shown in the report when these conditions are met.
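The display conditions described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `AccessFeature` record and its field names are hypothetical, the two conditions are combined with a logical OR (the text does not say how they combine), and the thresholds are the example values given above (10 occurrences, 5%).

```python
from dataclasses import dataclass


@dataclass
class AccessFeature:
    # Hypothetical record: one access plus the feature amounts computed for it.
    destination_class: str
    occurrence_count: int          # past appearances of this access destination class
    transition_probability: float  # probability of the transition leading here (0.0 to 1.0)


def needs_warning(feature, max_count=10, max_prob=0.05):
    """Return True when either example condition from the text is satisfied."""
    return (feature.occurrence_count <= max_count
            or feature.transition_probability <= max_prob)


def select_for_report(features, **thresholds):
    """Keep only the accesses that satisfy the preset display condition."""
    return [f for f in features if needs_warning(f, **thresholds)]
```

A report generator could either show only the selected accesses or show every access and attach a "warning" label to the selected ones.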
  • the report generation module stores the generated report as an abnormal access report 188. Details of the abnormal access report 188 will be described with reference to FIG.
  • the abnormal access report 188 includes an access tree screen, user information, access information, and the like.
  • the access tree is the one generated by the session identification module 603; when the administrator specifies a period or a user, the corresponding access tree is displayed. In addition, when the conditions set in the report generation module are met, a warning screen is displayed.
  • User information is information on the user who performed the access.
  • information obtained from an external database may be displayed.
  • Access information is information on access performed by the corresponding user.
  • the result calculated by the access feature amount comparison module is displayed.
  • the flow for access feature amount comparison starts processing at the timing when the access destination attribute is given from the access destination attribute assignment module 604 and the network log 185 is received.
  • the network log 185 received by the access feature amount comparison module is assigned an access destination attribute divided for each session. These have an access tree structure as shown in FIG.
  • the access feature amount comparison module compares the access feature amounts shown in FIG. 11 for each session.
  • In step 1101, the access feature amount comparison flow reads one session from the network log 185 transmitted by the access destination attribute assignment module 604. Each access in the network log 185 has been given an access destination attribute. After reading the session, the process proceeds to step 1102.
  • In step 1102, the access feature amount comparison module selects one access from those included in the session. The first time this step is performed, the access corresponding to the root node of the access tree is selected. On later passes, an access corresponding to a child node of the access selected in the previous pass is selected; if there are multiple child nodes, one of them is selected. The selected access is stored internally. Depth-first search and breadth-first search are well-known algorithms for scanning tree-structured data in this way.
  • the access feature amount comparison flow searches the appearance frequency of the access destination selected in step.
  • the access destination has a plurality of access destination classes according to the classification criteria.
  • the access feature amount comparison module searches the user base access feature amount and the group base feature amount for each access destination class, and acquires the access frequency. After the acquisition, go to step 1104.
  • In step 1104, the access feature amount comparison module searches for an access destination corresponding to a child node of the currently selected access destination. After the search, the process proceeds to step 1105.
  • In step 1105, the access feature amount comparison flow determines whether the currently selected access has a child node. If there is a child node, the process proceeds to step 1;
  • the access feature amount comparison module calculates a feature amount based on the transition frequency.
  • a conditional probability or the like can be used as a feature quantity based on the transition frequency.
  • the conditional probability is a probability that when an event A occurs, another event B occurs.
  • the conditional probability Pr(B|A) is defined as Pr(B|A) = Pr(A, B) / Pr(A).
  • Pr(A, B) is the probability that A and B will occur simultaneously, and Pr(A) is the probability that A will occur.
  • Pr(B|A) represents the probability of selecting B as the next access destination when the user accesses the access destination A.
  • Pr(A|B) represents the probability that the access before the access is A when the user accesses the access destination B. Both can be used as an index indicating the unusualness of access transition.
  • the access feature amount comparison module calculates these probabilities for each user and for all users, and stores the results internally. Thereafter, the process proceeds to step 1107.
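The two conditional probabilities can be estimated from observed transition counts. The sketch below is a minimal illustration, not the module's actual implementation: it assumes transitions are given as (source class, destination class) pairs, estimates Pr(B|A) as the count of A-to-B transitions divided by the count of transitions leaving A, and estimates Pr(A|B) as the same count divided by the count of transitions arriving at B. The function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(transitions):
    """Estimate Pr(B|A) and Pr(A|B) for every observed transition (A, B).

    transitions: list of (source_class, destination_class) pairs.
    Returns two dicts keyed by (A, B): forward = Pr(B|A), backward = Pr(A|B).
    """
    pair_counts = Counter(transitions)
    source_counts = Counter(a for a, _ in transitions)
    destination_counts = Counter(b for _, b in transitions)

    forward = {(a, b): n / source_counts[a] for (a, b), n in pair_counts.items()}
    backward = {(a, b): n / destination_counts[b] for (a, b), n in pair_counts.items()}
    return forward, backward
```

Running this once on a single user's transitions and once on the transitions of all users gives the per-user and all-user values mentioned above.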
  • In step 1107, the access feature amount comparison module determines whether every node in the access tree has been selected. If all nodes have been selected, the process ends. If any node remains unselected, the process returns to step 1102.
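The node-by-node scan of an access tree can be sketched with an explicit stack. This is an illustrative depth-first variant only; the text allows breadth-first scanning as well. The `AccessNode` type, the `access_frequency` lookup table, and the function name are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessNode:
    # Hypothetical node of an access tree: one access, labelled by its destination class.
    destination_class: str
    children: list = field(default_factory=list)


def scan_access_tree(root, access_frequency):
    """Visit every node depth-first.

    Returns (visited, transitions): visited holds (class, stored frequency) per node,
    transitions holds one (parent class, child class) pair per edge.
    """
    visited, transitions = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        # Look up the stored access frequency; unseen classes count as 0.
        visited.append((node.destination_class,
                        access_frequency.get(node.destination_class, 0)))
        for child in node.children:
            transitions.append((node.destination_class, child.destination_class))
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return visited, transitions
```

The collected parent-to-child pairs are the inputs a transition-based feature amount would be computed from.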
  • In step 1201, the session determination function reads the network log 185.
  • the network log 185 is divided for each user by the user specifying module 205.
  • the flow shown in FIG. 12 is processing for the network log 185 of one user. After reading the network log 185, the process proceeds to step 1202.
  • In step 1202, the session determination function sorts the records of the network log 185 read in the preceding step in ascending order of access date and time. After sorting, the process proceeds to step 1203.
  • In step 1203, the session determination function selects one access from the sorted network log 185. The first time this step is executed, the earliest access is selected; on later executions, the next access in chronological order is selected.
  • the set of accesses selected in earlier executions of this step is defined as the past access set
  • the currently selected access is defined as the current access
  • the set of accesses not yet selected is defined as the future access set. After selecting one access, the process proceeds to step 1204.
  • In step 1204, the session determination function checks whether the access selected in the preceding step (the current access) is a sub-access of an access in the past access set. To do so, it searches the automatic access list 209 for records whose sub-access destination set contains the access destination of the current access. If such a record is found, it then checks whether that record's main access destination appears in the past access set. If it does, the matching past access is stored as a parent access candidate. Thereafter, the process proceeds to step 1205.
  • In step 1205, the session determination function determines whether a main access corresponding to the current access was found. If a main access was found, the process proceeds to step 1208.
  • In step 1206, the session determination function searches the past access set for an access whose access destination URL is the same as the referrer URL of the current access.
  • the corresponding past access is called a parent access candidate. Thereafter, the process proceeds to step 1207.
  • In step 1207, the session determination function determines whether the referrer of the current access exists in the past access set. As a result of the determination, if it exists, the process proceeds to step 1; otherwise, the process proceeds to step 1208.
  • the session determination function identifies the parent access for the current access.
  • the access interval between the parent access candidate and the current access is calculated, and if the value is smaller than a preset threshold value, the parent access candidate is registered as the parent access for the current access. For example, 30 minutes is selected as the threshold value. If the access interval exceeds the threshold, there is no parent access. After identifying the parent access, go to step 1209.
  • step 12 the session determination function confirms whether access exists in the future access set. If it exists, the process returns to step 1203. If it does not exist, the process proceeds to step 1210.
  • the session determination function tries to identify the parent access by using information other than the automatic access list 209 and the referrer information for the access in which the parent access could not be identified from step to step.
  • the session determination function generates several access trees from the network log 185.
  • FIG. 14 shows an example of an access tree. Each access is represented as a node in the graph, and there is a directional edge between the parent access and the identified access.
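The referrer-based part of parent identification can be sketched as follows. This is a simplified illustration under stated assumptions: it handles one user's accesses, omits the automatic access list lookup, picks the most recent past access whose URL equals the referrer (the text does not say which match is preferred), and uses the 30-minute example threshold. The `Access` type and `link_parents` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Access:
    # Hypothetical subset of one network log record.
    time: datetime
    url: str
    referrer: Optional[str] = None


def link_parents(accesses, threshold=timedelta(minutes=30)):
    """Return (ordered, parents) for one user's accesses.

    ordered is the list sorted by access time; parents maps the index of a
    child access to the index of its parent access in ordered.
    """
    ordered = sorted(accesses, key=lambda a: a.time)
    parents = {}
    for i, current in enumerate(ordered):
        if current.referrer is None:
            continue
        # Scan the past access set, most recent first, for the referrer URL.
        for j in range(i - 1, -1, -1):
            candidate = ordered[j]
            if candidate.url == current.referrer:
                # Register the candidate only if the access interval is below the threshold.
                if current.time - candidate.time < threshold:
                    parents[i] = j
                break
    return ordered, parents
```

Each entry of `parents` corresponds to one directed edge of an access tree; accesses with no entry are either roots or accesses whose parent could not be identified.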
  • the session determination function performs processing for accesses not included in the access tree as shown in FIG.
  • the periodicity of the access time can be used.
  • 101 Network
  • 110 Internet
  • 120 External server
  • 130 Local area network
  • 140 Firewall
  • 150 Proxy server
  • 160 Administrator terminal
  • 170 Client
  • 180 Log analysis server
  • 181 Access destination classification function
  • 182 Access feature amount extraction function
  • 183 Abnormal access extraction function
  • 184 Network log selection function
  • 185 Network log
  • 186 Access destination classification
  • 187 Access feature amount
  • 188 Abnormal access report

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

In the prior art, it is not possible to detect unauthorized access by extracting an access feature from a network log storing records of access to a plurality of servers on the Internet. In this invention, a log analysis apparatus acquires a network log containing attributes of access destinations, classifies the access destinations on the basis of the attributes of the access destinations and pre-stored rules, extracts normal-time access feature information on the basis of the network log and the classified access destinations, compares accesses contained in the network log with the access feature information, and determines that an access is a normal access if the comparison result indicates a match and that the access is an abnormal access if the comparison result indicates a mismatch.

Description

[Title of the invention established by the ISA under Rule 37.2] Method and apparatus for detecting unauthorized access
The present invention relates to technology for detecting unauthorized network access performed by, for example, malware-infected computers.
One way to detect unauthorized access from a network log is to use a blacklist. A blacklist is a list of known unauthorized access destinations. By comparing the access destinations in the network log against the unauthorized access destinations in the blacklist, the machines that accessed an unauthorized destination can be identified.
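Blacklist matching of this kind amounts to a set-membership test. The following is a minimal sketch for illustration; the record layout (pairs of client IP address and access destination) and the function name are assumptions, not part of the disclosure.

```python
def machines_hitting_blacklist(log_records, blacklist):
    """Map each client to the blacklisted destinations it accessed.

    log_records: iterable of (client_ip, destination) pairs.
    blacklist: set of known unauthorized access destinations.
    """
    hits = {}
    for client_ip, destination in log_records:
        if destination in blacklist:
            hits.setdefault(client_ip, set()).add(destination)
    return hits
```

A destination that is not on the list never produces a hit, which is the limitation discussed next.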
However, a blacklist-based method cannot detect unauthorized access that does not use a destination on the blacklist. In contrast, Patent Document 1 discloses a technique that extracts the features of users' normal access from accesses to a Web server and presents accesses that differ greatly from those features to the administrator as abnormal accesses.
US Patent Publication No. 2011/0185421
However, the technique of Patent Document 1 limits the network access under analysis to access to a single Web server. In general, a network log output by a proxy server or similar device records accesses by multiple users to multiple servers. An access destination is generally identified by a URL (Uniform Resource Locator). Identifying the access destinations in such a network log by URL and applying the technique of Patent Document 1 raises the following problems.
Because users access many pages on the Internet every day, the number of access destinations in a network log is enormous compared with the number of pages on a single Web server. As a result, almost no users access with exactly the same access pattern. The method of Patent Document 1 needs a certain number of samples of the same access pattern in order to extract the features of a typical access pattern. If access patterns are extracted with access destinations identified by URL, almost every access differs from the norm and is detected as abnormal, which is not practical.
In addition, multiple processes run on a client, and each process accesses the network independently. For example, Internet access through a browser and access by the OS updater take place in parallel. A network log from a proxy or similar device records each machine's accesses in chronological order, with no record of which process made each access. Extracting the features of an access pattern requires identifying the series of accesses made by the same process, but doing so from the network log is difficult.
In view of the above problems, an object of the present invention is to detect unauthorized access, even when access destinations are not limited to a Web server specified in advance, by extracting the features of normal access from a network log and presenting accesses that deviate from those features as abnormal accesses.
The disclosed log analysis apparatus acquires a network log to which access destination attributes have been assigned, classifies the access destinations based on the access destination attributes and rules stored in advance, extracts feature information of normal access based on the network log and the classified access destinations, and compares each access in the network log with the access feature information, detecting the access as normal if they match and as abnormal if they do not.
The present invention can detect unauthorized access by extracting the features of normal access from a network log and presenting accesses that deviate from those features as abnormal accesses.
FIG. 1 is a diagram showing an example of the system configuration.
FIG. 2 is a diagram showing an example of the access destination classification function.
FIG. 3 is a diagram showing an example of access destination URL information.
FIG. 4 is a diagram showing an example of the automatic access list.
FIG. 5 is a diagram showing an example of access destination clusters.
FIG. 6 is a diagram showing an example of the access feature amount extraction function.
FIG. 7 is a diagram showing an example of user-based access feature amounts.
FIG. 8 is a diagram showing an example of group-based access feature amounts.
FIG. 9 is a diagram showing an example of the abnormal access extraction function.
FIG. 10 is a diagram showing an example of the abnormal access report.
FIG. 11 is a diagram showing an example of the processing flow of access feature amount comparison.
FIG. 12 is a diagram showing an example of the processing flow of the session determination function.
FIG. 13 is a diagram showing an example of data used for access destination clustering.
FIG. 14 is a diagram showing an example of an access tree generated by the session determination module.
Hereinafter, modes for carrying out the present invention (hereinafter referred to as "embodiments") will be described with reference to the drawings as appropriate.
<Embodiment>
An embodiment is described below.
FIG. 1 is a diagram showing an example of the system configuration. As shown in FIG. 1, the system includes an external server 120, a firewall 140, a proxy server 150, an administrator terminal 160, a client 170, a log analysis server 180, and the like, and these devices are connected to one another via a network 101.
The external server 120 is a server located on the Internet 110 and is accessed by the client 170 over the network. In general, the external server 120 is used to provide services such as information retrieval. However, a malicious attacker may use such a server for illicit activities such as distributing malware, or may break into a legitimate server and tamper with its site to use it in an attack. Here, legitimate servers and servers used for illicit purposes are collectively referred to as the external server 120.
The firewall 140 sits between the local area network 130 and the Internet 110 and discards (blocks) or permits (passes) packets traveling between the two networks according to specified conditions. By setting a condition that blocks any access that does not go through the proxy server 150, all access between the external server 120 and the client 170 can be forced through the proxy server 150.
The proxy server 150 relays packets exchanged between the client 170 and the external server 120 and records those accesses as the network log 185. With the firewall 140 configured as described above, every access between the external server 120 and the client 170 can be recorded in the network log 185.
The log analysis server 180 analyzes the network log 185 and presents abnormal accesses. The log analysis server 180 contains a network log selection function 184, an access destination classification function 181, an access feature amount extraction function 182, an abnormal access extraction function 183, the network log 185, an access destination classification 186, an access feature amount 187, and an abnormal access report 188. These functions may be placed on a single device or distributed across multiple devices.
The network log 185 is a record of network accesses output by network devices such as the proxy server 150. The network log 185 consists of multiple records. Each record corresponds to one access and includes the access date and time, the IP address of the accessing client 170, the URL identifying the destination external server 120 and the location of the resource within it, the User-Agent set on the client 170, the referrer indicating the URL of the page that linked to the access destination, the HTTP status code indicating the result of the access, authentication information identifying the user who made the access, and the like. If logs from sources other than the proxy server 150 are available, such as the firewall 140 or the client 170, they may be included in the network log 185. The network log 185 output by each machine is sent to the log analysis server 180 and stored there.
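One record of such a network log could be represented as follows. The field names and the on-disk layout are assumptions made for illustration: the description lists the fields but does not specify a file format, so the sketch uses a hypothetical layout of seven tab-separated fields with "-" marking an absent value.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogRecord:
    accessed_at: datetime    # access date and time
    client_ip: str           # IP address of the accessing client
    url: str                 # access destination URL
    user_agent: str          # User-Agent set on the client
    referrer: Optional[str]  # URL of the page that linked to the destination
    status: int              # HTTP status code
    user: Optional[str]      # authentication information identifying the user


def parse_record(line):
    """Parse one line of the hypothetical tab-separated layout into a LogRecord."""
    ts, ip, url, ua, ref, status, user = line.rstrip("\n").split("\t")
    return LogRecord(
        accessed_at=datetime.fromisoformat(ts),
        client_ip=ip,
        url=url,
        user_agent=ua,
        referrer=None if ref == "-" else ref,
        status=int(status),
        user=None if user == "-" else user,
    )
```

A real proxy log would need a parser matched to that product's own format.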
The access destination classification function 181 classifies the access destinations in the network log 185 according to multiple rules. In general, when access destinations are distinguished by URL, the number of distinct destinations becomes enormous, and there are not enough past samples to extract access features. The access destination classification function 181 therefore classifies access destinations using the domain, a category based on the content provided by the external server 120, the group of users who access the destination, and so on. This classifies access destinations at a coarser granularity than URLs and yields enough samples to extract features. The classification procedure is described in detail with reference to FIG. 2. The classification result is stored as the access destination classification 186.
The access feature amount extraction function 182 uses the network log 185 and the access destination classification 186 to extract the features of normal access. Specifically, it computes values such as the number of accesses to each access destination and the number of transitions from one access destination to another, both per user and across all users, and uses them as features. The extraction procedure is described in detail with reference to FIG. 6. The extracted feature amounts are stored as the access feature amount 187. Details of the access feature amount 187 are described with reference to FIGS. 7 and 8.
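Counting accesses per user and across all users can be sketched as below. This is an illustrative simplification that covers access counts only, not transition counts; the input layout (pairs of user and access destination class) and the function name are assumptions.

```python
from collections import Counter, defaultdict


def extract_access_frequencies(accesses):
    """Count accesses per destination class, per user and over all users.

    accesses: iterable of (user, destination_class) pairs.
    Returns (per_user, total, user_count):
      per_user[user][cls]  number of accesses by that user to that class
      total[cls]           number of accesses by all users to that class
      user_count[cls]      number of distinct users who accessed that class
    """
    per_user = defaultdict(Counter)
    for user, cls in accesses:
        per_user[user][cls] += 1

    total = Counter()
    user_count = Counter()
    for counts in per_user.values():
        total.update(counts)              # add this user's access counts
        user_count.update(counts.keys())  # each user counts once per class
    return per_user, total, user_count
```

Transition counts could be gathered the same way by feeding (source class, destination class) pairs in place of single classes.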
The abnormal access extraction function 183 extracts abnormal accesses based on the network log 185, the access destination classification 186, and the access feature amount 187. It compares each access in the network log 185 with the access feature amount 187 and calculates values such as how many times the access has appeared in the past and how frequent the access transition is. The abnormal access extraction procedure is described in detail with reference to FIG. 9. The extracted abnormal accesses are stored as the abnormal access report 188. The administrator can check abnormal accesses by viewing the abnormal access report 188 from the administrator terminal 160. Details of the abnormal access report 188 are described with reference to FIG. 10.
The network log selection function 184 selects, according to preset rules, the portion of the network log 185 that each function uses. The network log 185 contains access records covering a certain period (for example, one year). The administrator specifies which portion of the network log 185 is used as input to each function, for example by access date and time. For instance, one month of accesses is used as input to the access destination classification function 181 and the access feature amount extraction function 182, and the following week of accesses is used as input to the abnormal access extraction function 183. Because the access feature amount 187 is meant to characterize normal access, the accesses used to compute it are chosen from a period in which it can be guaranteed that no unauthorized access occurred.
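Selecting a portion of the log by access date and time can be sketched as a half-open interval filter. The record layout (a tuple whose first element is the access time) and the function name are assumptions made for this example.

```python
from datetime import datetime


def select_period(records, start, end):
    """Keep the records whose access time falls in the interval [start, end).

    records: iterable of tuples whose first element is a datetime.
    """
    return [r for r in records if start <= r[0] < end]


# Example split: one month for learning normal access, the following week for detection.
log = [
    (datetime(2014, 7, 15), "learning-period access"),
    (datetime(2014, 8, 3), "detection-period access"),
]
learning = select_period(log, datetime(2014, 7, 1), datetime(2014, 8, 1))
detection = select_period(log, datetime(2014, 8, 1), datetime(2014, 8, 8))
```

Using a half-open interval keeps an access that falls exactly on the boundary from being counted in both periods.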
The client 170 has a function of accessing the external server 120 via the network. The client 170 may become infected with malware, for example by running an executable file attached to a forged e-mail. Alternatively, a malicious person who has stolen a login password may perform unauthorized access.
The administrator terminal 160 has a function of logging in to the proxy server 150 and the log analysis server 180 and performing various operations. Using the administrator terminal 160, the administrator sets parameters for the processing performed by the log analysis server 180, views the abnormal access report 188 output by the log analysis server 180, and so on.
Each of these network-connected devices includes at least a CPU (Central Processing Unit), an auxiliary storage device such as a hard disk drive, a main storage device such as a ROM (Read Only Memory), an I/O (Input/Output) interface connected to input devices such as a keyboard and mouse, and a network interface for connecting to the local area network 130 and the Internet 110.
The access destination classification function 181 is described in detail with reference to FIG. 2.
The access destination extraction module 201 receives the portion of the network log 185 selected by the network log selection function 184. From the records in the network log 185, it extracts the URL that identifies the destination external server 120 and the location of the resource within it (hereinafter, the access destination URL). Removing duplicate access destination URLs yields the set of access destination URLs. The access destination extraction module 201 sends the extracted access destination URL set to the domain extraction module 202.
The domain extraction module 202 extracts a set of domain names from the access destination URL set. Here, a domain name is part of the name that identifies a computer on an IP network. For example, for the URL http://www.example.com/page.html, the domain name is www.example.com. Domain names are hierarchical, however, and a domain can be defined at each level of the hierarchy, so a domain name is extracted for each level. In the example above, domain names are extracted at three levels: top-level domain com, second-level domain example.com, and third-level domain www.example.com. Each extracted domain is stored in association with the original URL. If the host part of a URL is written as an IP address, the domain name is obtained by querying DNS; if it cannot be obtained, the URL is stored as having no domain name. The domain extraction module 202 sends the access destination URL set and the extracted domain set to the category determination module 203.
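Per-level domain extraction can be sketched with the Python standard library. This is a minimal illustration, not the module's actual implementation: `domain_levels` is a hypothetical name, and hosts written as IP addresses simply yield an empty list here instead of being resolved through DNS.

```python
import ipaddress
from urllib.parse import urlsplit


def domain_levels(url):
    """Return the domain name of a URL at every level, top-level domain first."""
    host = urlsplit(url).hostname
    if host is None:
        return []
    try:
        ipaddress.ip_address(host)
        return []  # IP-address host: the DNS lookup described in the text is omitted
    except ValueError:
        pass  # not an IP address, so treat the host as a domain name
    labels = host.split(".")
    # Join the last n labels for n = 1, 2, ...: "com", "example.com", "www.example.com".
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]
```

Applying this to every URL in the access destination URL set, and keeping each result next to its URL, gives the URL-to-domain association described above.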
The category determination module 203 determines the category of each access destination URL in the access destination URL set. A category is a classification based on the content of the access destination; examples include news, SNS (social networking service), and video distribution. The set of categories is specified in advance. Security vendors and similar organizations investigate which category a given URL belongs to and publish the results, and such a service can be used to determine the category of each access destination URL. Alternatively, the module can actually access the destination specified by the URL and determine the category from the retrieved content. This approach is further divided into manual determination and rule-based determination. In manual determination, a person inspects the content and decides which of the predefined categories it belongs to. In rule-based determination, the retrieved content is classified by mechanical rules. One such rule is a correspondence table between content file formats and categories. Another uses features such as the frequency of words in the content and determines the category with a statistical method such as clustering. An access destination may belong to more than one category. After determining the category of each access destination URL, the category determination module 203 outputs the result as access destination URL information 208. Details of the access destination URL information 208 are described with reference to FIG. 3.
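As an illustration of the rule-based variant that uses a correspondence table, the sketch below maps a file format to a set of categories. The table entries, the category names, and the use of the URL's file extension as a stand-in for the content file format are all assumptions made for this example.

```python
from urllib.parse import urlsplit
import posixpath

# Hypothetical correspondence table: file format -> categories.
FORMAT_TO_CATEGORIES = {
    ".mp4": {"video distribution"},
    ".jpg": {"image"},
    ".css": {"page resource"},
}


def categories_for(url):
    """Return the set of categories for a URL, or {'uncategorized'}."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    # A destination may belong to several categories, hence a set.
    return set(FORMAT_TO_CATEGORIES.get(extension, {"uncategorized"}))
```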
The network log shaping module 204 shapes the network log 185 into a form suitable for subsequent processing. When the network log 185 contains logs from multiple devices, the clocks of those devices may be out of sync. In that case, the order of accesses implied by the recorded access times differs from the order in which they actually occurred. The network log shaping module 204 therefore corrects the access times in the network log 185, taking each device's clock offset into account. Log formats may also differ between devices, so the module converts the network log 185 into a unified format. The network log shaping module 204 sends the shaped network log 185 to the user identification module 205.
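A minimal sketch of the clock correction, assuming each device's offset is already known (how the offsets are measured is not stated in the text). The record layout and the name `correct_times` are hypothetical.

```python
from datetime import datetime, timedelta


def correct_times(records, clock_offsets):
    """Subtract each device's clock offset and re-sort by corrected time.

    records: list of dicts with at least 'device' and 'time' (datetime).
    clock_offsets: device name -> timedelta by which its clock runs ahead.
    """
    fixed = [
        dict(r, time=r["time"] - clock_offsets.get(r["device"], timedelta(0)))
        for r in records
    ]
    return sorted(fixed, key=lambda r: r["time"])
```

Sorting after the correction restores the actual access order across devices, which the later interval-based steps depend on.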
The user identification module 205 identifies the user who made each access in the network log 185 and splits the network log 185 by user. It identifies the user from the authentication information contained in each record. If the authentication information cannot be obtained for some reason, it identifies the user by the IP address of the access source client 170. The user identification module 205 sends the per-user network logs 185 to the automatic access extraction module 208.
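The split with its IP-address fallback might look like the following sketch. The field names `auth_user` and `client_ip` are assumptions about the record layout.

```python
from collections import defaultdict


def split_by_user(records):
    """Group log records by user, falling back to the client IP address."""
    per_user = defaultdict(list)
    for record in records:
        # Prefer authentication information; use the IP when it is absent.
        user = record.get("auth_user") or "ip:" + record["client_ip"]
        per_user[user].append(record)
    return dict(per_user)
```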
The automatic access extraction module 208 extracts automatic accesses from the accesses recorded in the network log 185. Here, an automatic access is an access that is triggered automatically after some access destination has been accessed. For example, when a browser accesses a page of a news site, the images, style sheets, and other resources that make up the page are downloaded automatically, without any click or other operation by the user. Even though the user accessed only the first page, the network log 185 also records the accesses for these subsequent downloads. The access destination that triggered the automatic accesses is called the main access destination, and a destination reached by an automatic access is called a sub access destination. The automatic access extraction module 208 associates each main access destination with its sub access destinations (generally more than one) and stores them as the automatic access destination list. When extracting access feature quantities, the feature quantity extraction function generates an access tree that represents the transitions of a user's accesses.
The following describes how the automatic access extraction module 208 extracts main accesses and sub accesses. The module extracts them using information such as the referrer, the access interval, and repetition patterns. First, the extraction method based on the referrer and the access interval is described.
First, the automatic access extraction module 208 sorts each per-user network log 185 by access time and then processes each of these logs as follows. It reads the accesses in the network log 185 in order from the beginning and obtains the referrer of each one. It then searches for an earlier access whose destination is the one indicated by that referrer. If such an access exists and the time interval between it and the current access is smaller than a preset threshold, the earlier access is judged to be a main access and the current access a sub access. The time interval condition reflects the fact that sub accesses occur immediately after the main access. A value such as 5 seconds is used as the threshold. This concludes the automatic access extraction method based on the referrer and the access interval.
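The referrer-and-interval method above can be sketched as below. The tuple layout and the name `extract_auto_access` are hypothetical, and comparing against the most recent earlier access to the referrer URL is an assumption; the text does not say which earlier access is used when there are several.

```python
from collections import defaultdict


def extract_auto_access(accesses, threshold=5.0):
    """Map each main access destination to its set of sub destinations.

    accesses: one user's log as (time_in_seconds, url, referrer) tuples,
              already sorted by time. referrer may be None.
    """
    last_seen = {}            # url -> time of its most recent access
    auto = defaultdict(set)   # main url -> sub urls
    for time, url, referrer in accesses:
        earlier = last_seen.get(referrer)
        # Sub access: the referrer was accessed shortly before this one.
        if earlier is not None and time - earlier < threshold:
            auto[referrer].add(url)
        last_seen[url] = time
    return dict(auto)
```

An access whose referrer was visited a minute earlier is not treated as automatic, which matches the intent of the interval condition: a click by the user typically comes later than the page's own resource loads.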
Next, the automatic access extraction method based on the access interval and repetition patterns is described. First, the automatic access extraction module 208 sorts each per-user network log 185 by access time. It then splits each log into groups of accesses that are close in time: reading the log from the beginning, it places a boundary between two consecutive accesses whenever the interval between them exceeds a preset threshold (for example, 5 seconds). This is done for the network logs 185 of all users, producing multiple groups of consecutive accesses. Next, the module finds sets of accesses that appear repeatedly across these groups. For example, given the three groups (A, B, C), (A, D, B), and (E, A, B), the set (A, B) appears repeatedly. Several techniques for finding such repeated sets are known; see, for example, the literature (to be added later). Finally, the module examines the time interval between the accesses in each discovered set, and if it is smaller than a preset threshold (for example, 5 seconds), registers the destination accessed earlier as the main access destination and the destination accessed later as the sub access destination.
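The grouping and repeated-pair search can be sketched as follows. Brute-force pair counting stands in here for the pattern-mining techniques the text refers to, and the interval check is applied to each occurrence of a pair rather than after discovery; both are simplifications. The names `split_into_groups`, `repeated_pairs`, and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_into_groups(accesses, gap=5.0):
    """Split time-sorted (time, url) accesses wherever the gap exceeds `gap`."""
    groups, current, previous_time = [], [], None
    for time, url in accesses:
        if previous_time is not None and time - previous_time > gap:
            groups.append(current)
            current = []
        current.append((time, url))
        previous_time = time
    if current:
        groups.append(current)
    return groups


def repeated_pairs(groups, min_support=2, pair_gap=5.0):
    """Return (main, sub) pairs that occur in at least `min_support` groups."""
    counts = Counter()
    for group in groups:
        seen = set()
        # combinations() keeps log order, so `a` is accessed before `b`.
        for (t1, a), (t2, b) in combinations(group, 2):
            if a != b and t2 - t1 < pair_gap:
                seen.add((a, b))
        counts.update(seen)  # count each pair once per group
    return {pair for pair, n in counts.items() if n >= min_support}
```

With the three groups from the text, only the pair (A, B) is returned, with A as the main access destination and B as the sub access destination.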
The automatic access destination list is used when the feature quantity extraction function generates access trees. Details of the automatic access destination list 209 are described with reference to FIG. 4.
The access destination clustering module 207 partitions the access destinations according to the characteristics of the users who access them. From the per-user network logs 185, it generates a correspondence table between access destinations and the users who accessed them. Access destinations are classified by several criteria, such as URL and domain. An example of the correspondence table is shown in the figure. Clustering is performed with each record of these correspondence tables as an element. To run the clustering, a similarity or distance function between records must be defined. One option is a Jaccard measure over the two sets of accessing user IDs: the number of common elements divided by the number of elements in the union gives the Jaccard similarity, and one minus that value gives the Jaccard distance. General methods such as hierarchical clustering can be used for the clustering itself. The result is, for each classification criterion, the cluster to which each access destination belongs. The access destination clustering module 207 stores the obtained clusters as access destination clusters. Details of the access destination clusters are described with reference to FIG. 5.
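A minimal sketch of clustering destinations by their accessing users. It uses the Jaccard distance and single-linkage hierarchical clustering cut at a fixed distance, which is equivalent to taking connected components over pairs closer than the cut. The choice of single linkage, the cut value `max_distance`, and the function names are assumptions.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of user IDs."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_destinations(users_by_destination, max_distance=0.5):
    """Group destinations whose user sets are within `max_distance`.

    users_by_destination: destination -> set of user IDs (the correspondence table).
    Returns a sorted list of clusters, each a sorted list of destinations.
    """
    destinations = sorted(users_by_destination)
    parent = {d: d for d in destinations}

    def find(d):
        # Union-find root lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for i, a in enumerate(destinations):
        for b in destinations[i + 1:]:
            distance = jaccard_distance(users_by_destination[a], users_by_destination[b])
            if distance <= max_distance:
                parent[find(a)] = find(b)

    clusters = {}
    for d in destinations:
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())
```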
An example of the access destination URL information 208 is described with reference to FIG. 3. The access destination URL information 208 includes an access destination URL 301, a domain 302, a category 303, and so on.
The access destination URL 301 is an access destination URL recorded in the network log 185. The domain 302 is the domain name, extracted by the domain extraction module 202, to which the access destination URL belongs; one domain name is recorded for each level of the domain hierarchy. The category 303 is the category determined by the category determination module 203.
An example of the automatic access destination list 209 is described with reference to FIG. 4. The automatic access destination list 209 includes a main access destination 401, a sub access destination set 402, and so on.
The automatic access extraction module 208 extracts main access destinations and sub access destinations from the network log 185. Because one main access destination generally corresponds to several sub access destinations, the sub access destinations are stored as a set, as shown in FIG. 4.
An example of the access destination cluster 210 is described with reference to FIG. 5. Clusters are generated separately for each classification criterion of access destinations. FIG. 5(a) shows an example of clusters generated using URLs. FIG. 5(b) shows an example of clusters generated using third-level domains.
The access destination cluster includes a cluster ID 501, an access destination URL 502, and so on. The cluster ID 501 is an identifier that uniquely identifies the cluster. The access destination URL 502 lists the access destinations that belong to that cluster.
Details of the access feature quantity extraction function 182 are described with reference to FIG. 6.
The network log shaping module 204 receives the network log 185 selected by the network log selection function 184 and shapes it into a format suitable for subsequent data processing. The processing of the network log shaping module 204 is the same as that described with reference to FIG. 2. The module sends the shaped network log 185 to the user identification module 205.
The user identification module 205 identifies the user who made each access in the network log 185 and splits the network log 185 by user. The processing of the user identification module 205 is the same as that described with reference to FIG. 2. The module sends the per-user network logs 185 to the session identification module 603.
The session identification module 603 splits the network log 185 of a single user into units called sessions. A session is a series of accesses made by a user or a program for a single purpose. A common way to determine sessions from Web server logs is to use access times: when the interval between accesses exceeds a threshold (for example, 30 minutes), the later access is assigned to a new session. However, a network log 185 from a device such as the proxy server 150 contains accesses made by the same machine for different purposes, and these cannot be separated by time interval alone. The session identification module 603 therefore identifies sessions using the automatic access destination list, the referrer information in the network log 185, and similar data in addition to the access interval. The detailed processing of the session identification module 603 is described with reference to the figure. The session identification module 603 stores the accesses in each session as an access tree such as the one shown in FIG. 14. It sends the network log 185, split into sessions, to the access destination attribute assignment module 604.
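One way to combine the referrer with the time interval is sketched below: an access becomes a child of the most recent access to its referrer URL if that access is recent enough, and otherwise starts a new session as a tree root. This is only an illustration under those assumptions; the detailed procedure of the session identification module 603 is given elsewhere in the document, and the use of the automatic access destination list is omitted here.

```python
def build_access_trees(accesses, timeout=1800.0):
    """Assign each access a parent index and a session ID.

    accesses: one user's log as (time_in_seconds, url, referrer) tuples,
              sorted by time. Returns (parents, session_ids); a parent of
              None marks the root of a session's access tree.
    """
    parents, session_ids = [], []
    latest = {}        # url -> index of its most recent access
    next_session = 0
    for index, (time, url, referrer) in enumerate(accesses):
        parent = latest.get(referrer)
        if parent is not None and time - accesses[parent][0] <= timeout:
            # Continue the session of the access that referred to this one.
            parents.append(parent)
            session_ids.append(session_ids[parent])
        else:
            # No usable referrer: this access starts a new session.
            parents.append(None)
            session_ids.append(next_session)
            next_session += 1
        latest[url] = index
    return parents, session_ids
```

Interleaved browsing of two sites ends up in two separate sessions even though the accesses are close in time, which a time-only rule would merge.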
The access destination attribute assignment module 604 assigns access destination attributes to the network log 185 that has been split into sessions. The access destination attributes are the attribute information stored in the access destination URL information 208 and the access destination cluster 210. The module reads each record of the network log 185 and obtains its access destination URL. Using that URL as a key, it searches the access destination URL information 208 and the access destination cluster 210 and retrieves the corresponding attribute information. Finally, it stores the retrieved attribute information, associated with the access destination, in a storage area inside the module. This series of steps is executed for every access destination in the network log 185. The access destination attribute assignment module 604 sends the network log 185 with the attributes assigned to the frequency calculation module 605.
The frequency calculation module 605 calculates the appearance frequency, transition frequency, and similar statistics of access destination classes from the network log 185 with access destination attributes assigned. The module designates access destinations under various classification criteria, such as access destination URL, domain, category, and cluster ID. An access destination designated under a given classification criterion is called an access destination class. From the attributed network log 185, the frequency calculation module 605 counts the number of times each access destination class was accessed. It first counts per user and then counts over all users. The abnormal access extraction function 183 uses this information to detect abnormal accesses with respect to a specific user and abnormal accesses with respect to the users as a whole. The frequency calculation module 605 also counts the number of transitions from one access destination class to another. A transition is determined as follows: in an access tree generated by the session identification module 603, when node A and node B are connected by an edge, a transition is judged to have occurred from the access destination class of A to the access destination class of B. The frequency calculation module 605 traverses all access trees and counts the transitions between each pair of access destination classes. As with the appearance frequency, it counts transitions both per user and over all users. The frequency calculation module 605 stores the calculated frequencies as access feature quantities. Details of the access feature quantities are described with reference to FIGS. 7 and 8.
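The counting described above can be sketched as follows, assuming each access tree is given as a list of node URLs plus a list of (parent, child) edges, and that `class_of` maps a URL to its access destination class under one classification criterion. These data shapes and names are hypothetical.

```python
from collections import Counter


def count_frequencies(trees_by_user, class_of):
    """Count class appearances and class-to-class transitions.

    trees_by_user: user -> list of (nodes, edges) access trees, where nodes
                   is a list of URLs and edges a list of (parent, child) URLs.
    class_of: function mapping a URL to its access destination class.
    """
    user_visits, user_transitions = {}, {}
    all_visits, all_transitions = Counter(), Counter()
    for user, trees in trees_by_user.items():
        visits, transitions = Counter(), Counter()
        for nodes, edges in trees:
            visits.update(class_of(url) for url in nodes)
            # Each tree edge A -> B is one transition between classes.
            transitions.update((class_of(a), class_of(b)) for a, b in edges)
        user_visits[user] = visits
        user_transitions[user] = transitions
        all_visits += visits
        all_transitions += transitions
    return user_visits, user_transitions, all_visits, all_transitions
```

Running the function once per classification criterion (URL, domain, category, cluster ID) gives the per-criterion tables that the text describes.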
An example of the access destination feature quantity 187 is described with reference to FIGS. 7 and 8. The access destination feature quantity 187 includes feature quantities defined per user (user-based access feature quantities) and feature quantities defined in common for all users (group-based access feature quantities).
FIG. 7 shows examples of user-based access feature quantities. These include feature quantities based on the frequency of individual access destinations and feature quantities based on the frequency of transitions between access destinations. FIG. 7(a) shows an example of a user-based access feature quantity based on the frequency of individual access destinations.
The user-based access feature quantity includes a classification criterion 701, an access destination class identifier 702, an access frequency 703, and so on.
The classification criterion indicates the viewpoint from which access destinations are classified. The access destination URL, domain, category, cluster ID, and so on can be used as classification criteria.
The access destination class identifier is information that uniquely identifies an access destination class under the classification criterion. For example, when the classification criterion is the URL, the access destination URL can be used as the identifier.
The access frequency is the number of times the user accessed the access destination class indicated by the access destination class identifier.
FIG. 7(b) shows an example of a user-based access feature quantity based on the frequency of transitions between access destinations. This feature quantity includes a classification criterion, a source class identifier, a destination class identifier, an access frequency, and so on.
The classification criterion indicates the viewpoint from which access destinations are classified. The access destination URL, domain, category, cluster ID, and so on can be used as classification criteria.
The source class identifier is information that uniquely identifies, under the classification criterion, the access destination class from which a transition starts. For example, when the classification criterion is the URL, the access destination URL can be used as the identifier.
The destination class identifier is information that uniquely identifies, under the classification criterion, the access destination class at which a transition ends. For example, when the classification criterion is the URL, the access destination URL can be used as the identifier.
The access frequency is the number of times the user accessed the access destination indicated by the destination class identifier after accessing the access destination indicated by the source class identifier.
FIG. 8 shows examples of group-based access feature quantities. Like the user-based access feature quantities, these include feature quantities based on the frequency of individual access destinations and feature quantities based on the frequency of transitions between access destinations.
FIG. 8(a) shows an example of a group-based access feature quantity based on the frequency of individual access destinations. It includes a classification criterion, an access destination class identifier, an access frequency, a number of accessing users, and so on. Of these, the classification criterion and the access destination class identifier play the same roles as in FIG. 7(a). The access frequency is the total number of accesses, over all users, to the access destination class indicated by the access destination class identifier. The number of accessing users is the number of users who accessed that access destination class.
FIG. 8(b) shows an example of a group-based access feature quantity based on the frequency of transitions between access destinations. It includes a classification criterion, a source class identifier, a destination class identifier, a transition frequency, a number of accessing users, and so on. Of these, the classification criterion, the source class identifier, and the destination class identifier play the same roles as in FIG. 7(b). The transition frequency is the total number of times, over all users, that the access destination class indicated by the destination class identifier was accessed after the access destination class indicated by the source class identifier. The number of accessing users is the number of users who made that transition.
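The group-based quantities can be derived from the per-user counts, as in this sketch. The input shape (user -> key -> count) and the name `group_features` are hypothetical; the key may be a class identifier for FIG. 8(a) or a (source, destination) pair for FIG. 8(b).

```python
def group_features(counts_by_user):
    """Aggregate per-user counts into (total frequency, number of users).

    counts_by_user: user -> {key: count}.
    Returns {key: (total_count, user_count)}.
    """
    features = {}
    for user, counts in counts_by_user.items():
        for key, count in counts.items():
            if count <= 0:
                continue  # a user with no accesses is not an accessing user
            total, users = features.get(key, (0, 0))
            features[key] = (total + count, users + 1)
    return features
```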
Details of the abnormal access extraction function 183 are described with reference to FIG. 9. The abnormal access extraction function 183 consists of the network log shaping module 204, the user identification module 205, the session identification module 603, the access destination attribute assignment module 604, an access feature quantity comparison module, a report creation module, and so on. Of these, the processing performed by the network log shaping module 204, the user identification module 205, the session identification module 603, and the access destination attribute assignment module 604 is the same as that of the modules described with reference to FIG. 6.
When the access destination attribute assignment module 604 has finished processing, the abnormal access extraction function 183 holds the access trees, one per session, and the access destination information assigned to each access. Based on this information, the access feature quantity comparison module calculates the feature quantities of each access. Specifically, it calculates the appearance frequency, the transition probability, and similar values for each access destination. Details of the access feature quantity comparison module are described with reference to the figure.
The report creation module generates a report for the administrator to view, based on the feature quantities calculated by the access feature quantity comparison module. It generates a screen that displays the access trees generated by the session identification module 603, the access destination attributes assigned by the access destination attribute assignment module 604, the access feature quantities calculated by the access feature quantity comparison module, and so on. Several display methods are provided. One method displays all of this information for each user. Another displays only the accesses for which the result calculated by the access feature quantity comparison module satisfies preset conditions. For example, conditions can be set so that only accesses with an appearance count of 10 or less are displayed, or only accesses with a transition probability of 5% or less. Alternatively, when these conditions are satisfied, information such as "warning" is shown in the report. The report creation module stores the generated report as the abnormal access report 188. Details of the abnormal access report 188 are described with reference to FIG. 10.
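The condition-based display can be sketched as a simple filter. The thresholds follow the examples in the text (appearance count of 10 or less, transition probability of 5% or less); the record fields `count` and `transition_prob`, the warning labels, and the name `flag_accesses` are assumptions.

```python
def flag_accesses(accesses, max_count=10, max_transition_prob=0.05):
    """Return only the accesses that meet a display condition, with warnings.

    accesses: list of dicts with 'url', 'count' (appearance count) and
              'transition_prob' (probability of the transition that led here).
    """
    flagged = []
    for access in accesses:
        reasons = []
        if access["count"] <= max_count:
            reasons.append("rare destination")
        if access["transition_prob"] <= max_transition_prob:
            reasons.append("rare transition")
        if reasons:
            flagged.append({**access, "warning": reasons})
    return flagged
```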
An example of the abnormal access report 188 is described with reference to FIG. 10. The abnormal access report 188 includes an access tree screen, user information, access information, and so on.
The access tree is the access tree generated by the session identification module 603; when the administrator specifies a period, a user, or similar criteria, the corresponding access tree is displayed. In addition, when an access meets the conditions set in the report creation module, a warning or similar indication is displayed on the screen.
The user information is information about the user who made the accesses. In addition to information obtained from the network log 185, information obtained from an external database may be displayed.
The access information is information about the accesses made by that user. It shows, among other things, the results calculated by the access feature quantity comparison module.
An example of the processing flow of the access feature quantity comparison module is described with reference to FIG. 11.
The access feature quantity comparison flow starts when the module receives, from the access destination attribute assignment module 604, the network log 185 with access destination attributes assigned. The received network log 185 is divided into sessions and carries the access destination attributes. Each session has an access tree structure such as the one shown in FIG. 14. For each session, the access feature quantity comparison module performs the comparison of access feature quantities shown in FIG. 11.
 In step 1101, the access feature amount comparison module reads one session from the network log 185 sent by the access destination attribute assignment module 604. Each access in the network log 185 has an access destination attribute assigned. After reading the session, the process proceeds to step 1102.
 In step 1102, the access feature amount comparison module selects one access from those included in the session. The first time this step is executed, it selects the access corresponding to the root node of the access tree. On subsequent executions, it selects an access corresponding to a child node of the access selected previously; if there are multiple child nodes, it selects one of them. Accesses that have already been selected are recorded internally. Depth-first search and breadth-first search are well-known algorithms for traversing tree-structured data in this way.
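 The node selection of step 1102 can be illustrated with a short depth-first traversal. This is only a sketch: the `AccessNode` class, its fields, and the example URLs are hypothetical and do not come from the specification, which leaves the traversal order open (depth-first or breadth-first).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class AccessNode:
    """One access in an access tree (hypothetical structure for illustration)."""
    url: str
    children: List["AccessNode"] = field(default_factory=list)


def walk_depth_first(root: AccessNode) -> Iterator[AccessNode]:
    """Yield every access exactly once, starting at the root node."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(node.children))


# Hypothetical session: a page that loads an image, which in turn hits a CDN.
root = AccessNode("a.example/", [
    AccessNode("a.example/img", [AccessNode("cdn.example/x")]),
    AccessNode("b.example/"),
])
visited = [node.url for node in walk_depth_first(root)]
```

 Because each node is pushed onto the stack once and popped once, no separate record of already selected accesses is needed in this sketch; an implementation that selects nodes one at a time across separate invocations of the step would have to keep such a record, as the text describes.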
 In step 1103, the access feature amount comparison module looks up the appearance frequency of the access destination selected in step 1102. An access destination belongs to multiple access destination classes, one per classification criterion. For each access destination class, the module searches the user-based access feature amounts and the group-based access feature amounts and obtains the access frequency. The process then proceeds to step 1104.
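 The lookup in step 1103 might be sketched as follows. The table layout, the class labels such as `domain:example.com`, and the user and group names are assumptions made for illustration; the specification does not define how the feature amounts are stored.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical feature tables: (owner, access destination class) -> access count.
user_features: Dict[Tuple[str, str], int] = {
    ("alice", "domain:example.com"): 40,
    ("alice", "category:news"): 55,
}
group_features: Dict[Tuple[str, str], int] = {
    ("sales", "domain:example.com"): 300,
    ("sales", "category:news"): 900,
    ("sales", "category:file-sharing"): 2,
}


def lookup_frequencies(user: str, group: str,
                       classes: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {class: (user_count, group_count)}; unseen classes count as 0."""
    return {
        c: (user_features.get((user, c), 0), group_features.get((group, c), 0))
        for c in classes
    }


freqs = lookup_frequencies(
    "alice", "sales", ["domain:example.com", "category:file-sharing"]
)
```

 A class that the user has never accessed but the group has, such as the file-sharing category here, yields a zero user count next to a small group count, which is the kind of contrast the comparison is meant to expose.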
 In step 1104, the access feature amount comparison module searches for the access destinations that are child nodes of the currently selected access destination. The process then proceeds to step 1105.
 In step 1105, the access feature amount comparison module determines whether the currently selected access has a child node. If a child node exists, the process proceeds to step [number missing in the source]; if not, it proceeds to step 1106.
 In step 1106, the access feature amount comparison module calculates feature amounts based on transition frequency. Conditional probabilities, for example, can be used for this purpose. A conditional probability is the probability that an event B occurs given that another event A has occurred, and is defined as Pr(B|A) = Pr(A, B) / Pr(A), where Pr(A, B) is the probability that A and B both occur and Pr(A) is the probability that A occurs. With A as the parent node and B as the child node, the module calculates Pr(B|A) and Pr(A|B). Pr(B|A) is the probability that a user who has accessed destination A chooses B as the next access destination. Pr(A|B) is the probability that, when a user accesses destination B, the preceding access was to A. Both can serve as indicators of how unusual an access transition is. The module calculates these probabilities for each user and for all users as a whole, and stores the results internally. The process then proceeds to step 1107.
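 As a rough illustration of step 1106, the two conditional probabilities can be estimated by relative frequency from observed parent-to-child transitions. The function name and the edge-list input are hypothetical; counting A "as a parent" and B "as a child" is one possible reading of Pr(A) and Pr(B), not something the specification fixes.

```python
from collections import Counter
from typing import Iterable, Tuple


def transition_probabilities(edges: Iterable[Tuple[str, str]],
                             parent: str, child: str) -> Tuple[float, float]:
    """Estimate Pr(child | parent) and Pr(parent | child) from observed
    (parent, child) transitions by relative frequency."""
    edges = list(edges)
    pair_counts = Counter(edges)
    parent_counts = Counter(p for p, _ in edges)  # how often each node is a parent
    child_counts = Counter(c for _, c in edges)   # how often each node is a child
    joint = pair_counts[(parent, child)]
    p_child_given_parent = joint / parent_counts[parent] if parent_counts[parent] else 0.0
    p_parent_given_child = joint / child_counts[child] if child_counts[child] else 0.0
    return p_child_given_parent, p_parent_given_child


# Hypothetical transitions: A -> B once, A -> C three times, D -> B once.
edges = [("A", "B"), ("A", "C"), ("A", "C"), ("A", "C"), ("D", "B")]
p_b_given_a, p_a_given_b = transition_probabilities(edges, "A", "B")
```

 In this toy data A leads to B in one of four transitions, while B is reached from A in one of two, so the two measures differ. Running the same function once on a single user's transitions and once on all users' transitions gives the per-user and overall values the text mentions.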
 In step 1107, the access feature amount comparison module determines whether every node in the access tree has been selected. If all nodes have been selected, the process ends. If unselected nodes remain, the process returns to step 1102.
 An example of the session determination flow is described with reference to FIG. 12.
 In step 1201, the session determination function reads the network log 185. The network log 185 has been divided by user by the user specifying module 205, and the flow shown in FIG. 12 processes the network log 185 of a single user. After reading the network log 185, the process proceeds to step 1202.
 In step 1202, the session determination function sorts the records of the network log 185 read in step 1201 by access date and time, earliest first. After sorting, the process proceeds to step 1203.
 In step 1203, the session determination function selects one access from the network log 185 sorted in step 1202. The first time this step is executed, it selects the first access. On subsequent executions, it selects the next access in chronological order. Here, the set of accesses already selected in this step is defined as the past access set, the currently selected access as the current access, and the set of accesses not yet selected as the future access set. After selecting one access, the process proceeds to step 1204.
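 Steps 1202 and 1203 amount to a chronological sort followed by a three-way split around the selected record. The sketch below assumes each log record is a dictionary with hypothetical `time` and `url` keys; the actual record layout of the network log 185 is not specified here.

```python
from datetime import datetime

# Hypothetical single-user log, deliberately out of order.
log = [
    {"time": datetime(2014, 8, 29, 9, 5), "url": "b.example/"},
    {"time": datetime(2014, 8, 29, 9, 0), "url": "a.example/"},
    {"time": datetime(2014, 8, 29, 9, 7), "url": "c.example/"},
]

# Step 1202: order the records from earliest to latest access time.
ordered = sorted(log, key=lambda record: record["time"])


def partition(records, index):
    """Step 1203: split the ordered log around the currently selected access,
    returning (past access set, current access, future access set)."""
    return records[:index], records[index], records[index + 1:]


past, current, future = partition(ordered, 1)
```

 Advancing `index` by one on each pass moves the current access into the past access set, and the loop ends when the future access set is empty.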
 In step 1204, the session determination function checks whether the access selected in step 1203 (the current access) is a sub-access of an access in the past access set. To do so, it searches the automatic access list 209 for records whose sub-access destination set contains the access destination of the current access. If such a record is found, it checks whether the record's main access destination appears in the past access set. If it does, the corresponding past access record is stored as a parent access candidate. The process then proceeds to step 1205.
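 The sub-access check of step 1204 could look roughly like this. Representing the automatic access list 209 as a mapping from a main access destination to a set of sub-access destinations is an assumption, as are the function name and the example URLs.

```python
# Hypothetical automatic access list:
# main access destination -> set of sub-access destinations it triggers.
automatic_access_list = {
    "news.example/": {"cdn.example/style.css", "ads.example/banner"},
    "mail.example/": {"cdn.example/style.css"},
}


def find_parent_candidates_by_sub_access(current_url, past_accesses):
    """Return the past accesses whose destination is a main access destination
    that lists current_url among its sub-access destinations."""
    mains = {
        main for main, subs in automatic_access_list.items()
        if current_url in subs
    }
    return [access for access in past_accesses if access["url"] in mains]


past = [{"url": "news.example/"}, {"url": "other.example/"}]
candidates = find_parent_candidates_by_sub_access("cdn.example/style.css", past)
```

 Here two main destinations list the stylesheet as a sub-access, but only `news.example/` was actually visited earlier, so it is the only parent access candidate.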
 In step 1205, the session determination function determines whether there is a main access corresponding to the current access. If there is, the process proceeds to step 1208; if not, it proceeds to step 1206.
 In step 1206, the session determination function searches the past access set for an access whose access destination URL is identical to the referrer URL of the current access. A matching past access is called a parent access candidate. The process then proceeds to step 1207.
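 The referrer match of step 1206 is a simple equality search over the past access set. In this sketch the `referrer` and `url` keys are hypothetical field names, and URLs are compared as exact strings without normalization.

```python
def find_parent_candidates_by_referrer(current, past_accesses):
    """Return the past accesses whose destination URL equals the referrer URL
    of the current access; an access without a referrer has no candidates."""
    referrer = current.get("referrer")
    if not referrer:
        return []
    return [access for access in past_accesses if access["url"] == referrer]


past = [{"url": "a.example/"}, {"url": "b.example/"}]
with_referrer = find_parent_candidates_by_referrer(
    {"url": "a.example/page", "referrer": "a.example/"}, past
)
without_referrer = find_parent_candidates_by_referrer({"url": "c.example/"}, past)
```

 An empty result corresponds to the "does not exist" branch of the decision in step 1207.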
 In step 1207, the session determination function determines whether the referrer of the current access exists in the past access set. If it exists, the process proceeds to step [number missing in the source]; if not, it proceeds to step 1208.
 In step 1208, the session determination function identifies the parent access of the current access. It calculates the access interval between the parent access candidate and the current access, and if the interval is smaller than a preset threshold, it registers the parent access candidate as the parent access of the current access. The threshold is set to, for example, 30 minutes. If the access interval exceeds the threshold, the current access is treated as having no parent access. After identifying the parent access, the process proceeds to step 1209.
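 A minimal sketch of the interval test in step 1208 follows. The 30-minute threshold is the example value from the text; picking the most recent candidate when several qualify is an assumption of this sketch, since the text does not say how ties among candidates are resolved.

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=30)  # example value given in the text


def identify_parent(current, candidates, threshold=THRESHOLD):
    """Return the most recent candidate whose interval to the current access
    is smaller than the threshold, or None if no candidate qualifies.
    (Choosing the most recent one is an assumption of this sketch.)"""
    close = [
        c for c in candidates
        if timedelta(0) <= current["time"] - c["time"] < threshold
    ]
    return max(close, key=lambda c: c["time"], default=None)


candidates = [
    {"url": "a.example/", "time": datetime(2014, 8, 29, 8, 0)},
    {"url": "a.example/", "time": datetime(2014, 8, 29, 9, 0)},
]
near = identify_parent({"url": "x.example/", "time": datetime(2014, 8, 29, 9, 10)}, candidates)
far = identify_parent({"url": "x.example/", "time": datetime(2014, 8, 29, 10, 0)}, candidates)
```

 The access at 9:10 is ten minutes after the 9:00 candidate and gets it as its parent; the access at 10:00 is at least an hour after every candidate and is left without a parent, which would start a new tree.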
 In step 1209, the session determination function checks whether any access remains in the future access set. If so, the process returns to step 1203; if not, it proceeds to step 1210.
 In step 1210, for accesses whose parent access could not be identified in the preceding steps, the session determination function attempts to identify the parent access using information other than the automatic access list 209 and the referrer. By this point, the session determination function has generated several access trees from the network log 185. FIG. 14 shows an example of an access tree. Each access is represented as a node of a graph, and a directed edge connects an access to the access identified as its parent. The session determination function processes the accesses that are not included in any access tree such as the one in FIG. 14. Information other than the automatic access list 209 and the referrer includes, for example, the periodicity of access times. When multiple accesses to the same URL or domain occur at a constant interval, these accesses are judged to be related to one another and are assembled into a tree. Techniques such as the Fourier transform can be used to extract periodicity from access times.
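 To illustrate how a Fourier transform might expose periodic access times, the sketch below bins timestamps into a count series and applies a naive discrete Fourier transform using only the standard library. The function name, the binning scheme, and the 0.9 peak ratio are assumptions of this sketch and are not taken from the specification.

```python
import cmath
from typing import List, Optional


def dominant_period(timestamps: List[float], bin_size: float) -> Optional[float]:
    """Estimate the dominant access period (same unit as the timestamps).

    Timestamps are binned into a count series, a naive DFT is computed, and
    the lowest frequency whose magnitude is close to the spectral peak is
    converted back to a period. Returns None if no estimate is possible.
    """
    if len(timestamps) < 3:
        return None
    start = min(timestamps)
    n = int((max(timestamps) - start) // bin_size) + 1
    series = [0.0] * n
    for t in timestamps:
        series[int((t - start) // bin_size)] += 1.0
    mean = sum(series) / n
    series = [x - mean for x in series]  # remove the constant (DC) component

    def magnitude(k: int) -> float:
        return abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                       for i, x in enumerate(series)))

    mags = {k: magnitude(k) for k in range(1, n // 2 + 1)}
    if not mags:
        return None
    peak = max(mags.values())
    if peak == 0.0:
        return None
    # A spike train has harmonics of similar strength, so take the lowest
    # frequency near the peak rather than the raw maximum.
    k_best = min(k for k, m in mags.items() if m >= 0.9 * peak)
    return n * bin_size / k_best


# Hypothetical beacon: one access every 600 seconds for two hours.
beacon = [600.0 * i for i in range(13)]
period = dominant_period(beacon, bin_size=60.0)
```

 The estimate is quantized by the bin size and the window length, so it lands near the true 600-second interval rather than exactly on it. Accesses to the same URL or domain that share such a period could then be grouped into one tree, as the text describes; a production implementation would more likely use an FFT library than this quadratic-time loop.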
101: Network, 110: Internet, 120: External server, 130: Local area network, 140: Firewall, 150: Proxy server, 160: Administrator terminal, 170: Client, 180: Log analysis server, 181: Access destination classification function, 182: Access feature amount extraction function, 183: Abnormal access extraction function, 184: Network log selection function, 185: Network log, 186: Access destination classification, 187: Access feature amount, 188: Abnormal access report

Claims (6)

  1.  An unauthorized access detection method for a log analysis device that detects unauthorized access from a network log of a computer or a network, the method comprising the following steps performed by the log analysis device:
     acquiring the network log to which an access destination attribute has been assigned;
     classifying access destinations based on the access destination attribute and pre-stored rules;
     extracting feature information of normal access based on the network log and the classified access destinations; and
     comparing an access included in the network log with the access feature information, detecting the access as a normal access if they match, and detecting the access as an abnormal access if they do not match.
  2.  The unauthorized access detection method according to claim 1, wherein
     the feature information of normal access is a tendency of access to the classified access destinations, and
     the access included in the network log is compared with the tendency of access at normal times.
  3.  The unauthorized access detection method according to claim 2, wherein
     the access destinations are classified by further acquiring a domain in the network log, a category based on the content provided by an external device, or group information of the accessing user.
  4.  An unauthorized access detection device, being a log analysis device that detects unauthorized access from a network log of a computer or a network, the device comprising:
     a receiving unit that acquires the network log to which an access destination attribute has been assigned;
     an access destination classification unit that classifies access destinations based on the access destination attribute and pre-stored rules;
     an access feature amount extraction unit that extracts feature information of normal access based on the network log and the classified access destinations; and
     an abnormal access extraction unit that compares an access included in the network log with the access feature information, detects the access as a normal access if they match, and detects the access as an abnormal access if they do not match.
  5.  The unauthorized access detection device according to claim 4, wherein
     the feature information of normal access is a tendency of access to the classified access destinations, and
     the abnormal access extraction unit compares the access included in the network log with the tendency of access at normal times.
  6.  The unauthorized access detection device according to claim 5, wherein
     the access destination classification unit classifies the access destinations by further acquiring a domain in the network log, a category based on the content provided by an external device, or group information of the accessing user.
PCT/JP2014/072670 2014-08-29 2014-08-29 Apparatus and method for detecting unauthorized access WO2016031034A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/072670 WO2016031034A1 (en) 2014-08-29 2014-08-29 Apparatus and method for detecting unauthorized access

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/072670 WO2016031034A1 (en) 2014-08-29 2014-08-29 Apparatus and method for detecting unauthorized access

Publications (1)

Publication Number Publication Date
WO2016031034A1 true WO2016031034A1 (en) 2016-03-03

Family

ID=55398970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/072670 WO2016031034A1 (en) 2014-08-29 2014-08-29 Apparatus and method for detecting unauthorized access

Country Status (1)

Country Link
WO (1) WO2016031034A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005236863A (en) * 2004-02-23 2005-09-02 Kddi Corp Log analyzing device and program, and recording medium
JP2007013343A (en) * 2005-06-28 2007-01-18 Fujitsu Ltd Worm detection parameter setting program and worm detection parameter setting device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107465648A (en) * 2016-06-06 2017-12-12 腾讯科技(深圳)有限公司 The recognition methods of warping apparatus and device
CN107465648B (en) * 2016-06-06 2020-09-04 腾讯科技(深圳)有限公司 Abnormal equipment identification method and device
CN114050922A (en) * 2021-11-05 2022-02-15 国网江苏省电力有限公司常州供电分公司 Network flow abnormity detection method based on space-time IP address image
CN114050922B (en) * 2021-11-05 2023-07-21 国网江苏省电力有限公司常州供电分公司 Network flow anomaly detection method based on space-time IP address image

Similar Documents

Publication Publication Date Title
US10817603B2 (en) Computer security system with malicious script document identification
KR101010302B1 (en) Security management system and method of irc and http botnet
Perdisci et al. Early detection of malicious flux networks via large-scale passive DNS traffic analysis
US20180351983A1 (en) Security Threat Detection based on Patterns in Machine Data Events
Pouget et al. Honeypot-based forensics
US8732472B2 (en) System and method for verification of digital certificates
EP3731166A1 (en) Data clustering
US20160344758A1 (en) External malware data item clustering and analysis
US20050060643A1 (en) Document similarity detection and classification system
Caruccio et al. Fake account identification in social networks
CN107547490B (en) Scanner identification method, device and system
JP2005339545A (en) Detection of search engine spam using external data
US10574658B2 (en) Information security apparatus and methods for credential dump authenticity verification
JP2014502753A (en) Web page information detection method and system
CN108023868B (en) Malicious resource address detection method and device
CN110830496B (en) Using method and operation method of system for preventing scanning authority file
Skopik et al. Smart Log Data Analytics
RU2659482C1 (en) Protection of web applications with intelligent network screen with automatic application modeling
CN111147490A (en) Directional fishing attack event discovery method and device
Platzer et al. A synopsis of critical aspects for darknet research
Morichetta et al. Clue: Clustering for mining web urls
WO2016031034A1 (en) Apparatus and method for detecting unauthorized access
Mowar et al. Fishing out the Phishing Websites
Xie et al. Scanner hunter: Understanding http scanning traffic
Lampesberger et al. An on-line learning statistical model to detect malicious web requests

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14900988

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14900988

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP