WO2015186662A1 - Log analysis device, attack detection device, attack detection method and program

Info

Publication number: WO2015186662A1
Application number: PCT/JP2015/065772
Authority: WO
Inventors: 揚鐘; 浩志朝倉; 慎吾折原; 一史青木
Original assignee: 日本電信電話株式会社
Priority date: 2014-06-06
Filing date: 2015-06-01
Publication date: 2015-12-10
Also published as: US10243982B2; CN106415507A; US20170126724A1; EP3136249B1; CN106415507B; EP3136249A4; JP6106340B2; JPWO2015186662A1; EP3136249A1

Abstract

This log analysis device is provided with: a storage unit (12) for storing a profile that will be the standard for determining whether or not there is an attack on an information processing device; a parameter extraction unit (31) which extracts parameters from an access request; a character string class conversion unit (32) which, for each parameter, compares each part of the parameter value with predefined character string classes, replaces said part with the character string class for which the matching length is greatest, and thus converts the parameters into a class list arranged in the substitution order; a profile storage unit (43) which, from the set of class lists for data access requests that are normal as learning data, stores in a storage unit (12) those class lists that occur at a frequency greater than or equal to a prescribed value as a profile; and an abnormality detection unit (53) which determines whether or not the access request being analyzed is an attack in accordance with the similarity between the aforementioned class list and the profile.

Description

Log analysis device, attack detection device, attack detection method and program

The present invention relates to network security technology, and more particularly to a technology for analyzing accesses and detecting those that attack a Web server or a Web application.

Systems using the Web are used in various places in society including EC (Electronic Commerce). However, since such a system is a base used by general users, Web servers are always exposed to the risk of attacks. Various methods for detecting an access for attacking a Web server have been studied.
As a method of detecting an attack, a method of analyzing access contents by WAF (Web Application Firewall) and a method of analyzing a log remaining in a Web server or application server are generally used. There are two known attack detection methods, signature type and anomaly type detection methods.
FIG. 17 is a diagram for explaining a conventional attack detection method. FIG. 17A is a diagram showing a signature type attack detection method, and FIG. 17B is a diagram showing an anomaly type attack detection method.
As shown in FIG. 17A, the signature type extracts a portion that can determine an attack from an attack code, and detects a request that matches the pattern as an attack. Since vulnerabilities that exist in WebAP (Web Application) have increased, it has become difficult to prevent attacks by signature-type detection that takes countermeasures for each vulnerability. For this reason, research has been conducted on anomaly detection in which a profile is created from a normal request for WebAP and an abnormality is detected.
As shown in FIG. 17B, the anomaly type creates a profile from normal requests, calculates the similarity of each request to the profile, and detects requests that differ from it as abnormal (see Non-Patent Documents 1 and 2). Hereinafter, a process for creating a profile is referred to as a learning process, and a process for determining whether an analysis target request is an attack using a profile is referred to as a detection process.

In the methods disclosed in Non-Patent Documents 1 and 2, a profile having several feature amounts is created for the parameters of each path part of the WebAP. The method of creating a profile is explained below.
Here, only the character string structure and the character string class, the feature amounts considered to have a large influence on the detection result, are considered. FIG. 18 is a diagram for explaining profile feature amounts.
The technique that uses the character string structure as the feature amount is referred to as prior art 1, and the technique that uses the character string class as the feature amount is referred to as prior art 2. Each is briefly described below.

First, a profile creation method using the character string structure as a feature amount according to the related art 1 will be described. FIG. 19 is a diagram for explaining a method of creating a state transition model according to prior art 1.
The procedure of the learning process is as follows.
(Procedure 1) Create a state transition model in which all the parameter values are listed with the appearing characters as states.
(Procedure 2) The same state is combined from the initial state (s) and repeated until it cannot be combined, and the completed state transition model is used as a profile (see Non-Patent Document 3 for how to create a state transition model).
It should be noted that the state transition probability must be taken into consideration when creating the model. However, since the prior art 1 does not consider the probability at the time of detection, it is considered equivalent to creating a model that does not consider the transition probability.
In the detection process, if the character string cannot be output from the profile (state transition model), it is determined as abnormal.

Next, a profile creation method using the character string class as a feature amount according to prior art 2 will be described. FIG. 20 is a diagram for explaining the abnormality determination method of the prior art 2.
The procedure of the learning process is as follows.
(Procedure 1) A character string class is defined in advance (see Non-Patent Document 4 for an example of a definition method).
(Procedure 2) It is determined whether the class applies to the entire parameter value, and the class name is held as a profile for the parameter.
In the detection process, the entire parameter value is converted to a class, and if it does not match the profile class, it is determined as abnormal.
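The whole-value matching of prior art 2 can be sketched as follows. This is only an illustration: the class names and regular expressions in `WHOLE_VALUE_CLASSES`, and the helper names, are hypothetical and are not taken from Non-Patent Document 4.

```python
import re

# Hypothetical class definitions, for illustration only.
WHOLE_VALUE_CLASSES = {
    "numeric": re.compile(r"[0-9]+"),
    "alpha": re.compile(r"[A-Za-z]+"),
}


def whole_value_class(value):
    """Return the class whose pattern matches the ENTIRE value, or None."""
    for name, pattern in WHOLE_VALUE_CLASSES.items():
        if pattern.fullmatch(value):
            return name
    return None


def is_abnormal(value, profile_class):
    """Detection: abnormal when the value's class differs from the profile class."""
    return whole_value_class(value) != profile_class
```

With these assumed classes, a value such as "img01" matches no single class as a whole, so `whole_value_class` returns None for it.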

With reference to FIG. 21, the problem of the prior art will be described.
In prior art 1, as shown in “Problem 1” in FIG. 21, a state transition model is created with each character appearing in the learning data as a state, so there is a problem that normal data not present in the learning data tends to be erroneously detected as abnormal.
In prior art 2, as shown in “Problem 2” in FIG. 21, only one character string class is assigned to one parameter, so there is a problem that no profile is created for a parameter having a complicated structure (for example, one in which multiple predefined character string classes are concatenated or combined).
Also, in the prior art 2, as shown in “Problem 3” in FIG. 21, there is a problem that no profile is created for values that look similar to a human but differ strictly in format and therefore do not match the regular expression of any prepared character string class.

The present invention has been made to solve the above-described problems of the prior art. An object of the present invention is to provide a log analysis device, an attack detection device, an attack detection method, and a program that can suppress the erroneous determination of normal data as abnormal for requests transmitted via a network to an information processing apparatus such as a Web server.

The log analysis device of the present invention for achieving the above object is a log analysis device that collects and analyzes access logs from an information processing device connected to a network, and has:
a storage unit that stores a profile serving as a reference for determining whether the analysis target data indicates an attack on the information processing apparatus;
a parameter extraction unit that extracts each parameter from a request in the access log;
a class conversion unit that, for each parameter extracted by the parameter extraction unit, compares the parameter value, part by part from the first character, with predefined character string classes, replaces each part with the character string class having the longest match, and converts the value into a class string in which the replaced character string classes are arranged in order;
a profile storage unit that, of the set of class strings obtained by the parameter extraction unit and the class conversion unit for the access log of normal data serving as learning data, stores in the storage unit, as the profile, the class strings whose appearance frequency is equal to or higher than a predetermined value; and
an abnormality detection unit that calculates the similarity between the profile and the class string obtained by the parameter extraction unit and the class conversion unit for the access log of the analysis target data, and determines, according to the similarity, whether an attack on the information processing apparatus has occurred.

The attack detection device of the present invention is an attack detection device that detects an attack on an information processing device connected to a network, and has:
a storage unit that stores a profile serving as a reference for determining whether an access request to the information processing apparatus attacks the information processing apparatus;
a parameter extraction unit that extracts each parameter from the access request;
a class conversion unit that, for each parameter extracted by the parameter extraction unit, compares the parameter value, part by part from the first character, with predefined character string classes, replaces each part with the character string class having the longest match, and converts the value into a class string in which the replaced character string classes are arranged in order;
a profile storage unit that, of the set of class strings obtained by the parameter extraction unit and the class conversion unit for the access requests of normal data serving as learning data, stores in the storage unit, as the profile, the class strings whose appearance frequency is equal to or higher than a predetermined value; and
an abnormality detection unit that calculates the similarity between the profile and the class string obtained by the parameter extraction unit and the class conversion unit for the access request to be analyzed, and determines, according to the similarity, whether an attack on the information processing apparatus has occurred.

The attack detection method of the present invention is an attack detection method performed by an attack detection device that detects an attack on an information processing device connected to a network, in which:
each parameter is extracted from access requests to the information processing apparatus for normal data serving as learning data; for each parameter, the parameter value is compared, part by part from the first character, with predefined character string classes, each part is replaced with the character string class having the longest match, and the value is converted into a class string in which the replaced character string classes are arranged in order; and, of the resulting set of class strings, the class strings whose appearance frequency is equal to or higher than a predetermined value are stored in a storage unit as a profile serving as a reference for determining whether analysis target data indicates an attack on the information processing apparatus;
each parameter is extracted from the access request of the analysis target data;
the value of the extracted parameter is converted into a class string based on the character string classes;
the similarity between the class string and the profile is calculated; and
whether an attack on the information processing apparatus has occurred is determined according to the similarity.

Furthermore, the program of the present invention causes a computer that detects an attack on an information processing apparatus connected to a network to execute:
a procedure for extracting each parameter from access requests to the information processing apparatus for normal data serving as learning data, comparing, for each parameter, the parameter value, part by part from the first character, with predefined character string classes, replacing each part with the character string class having the longest match, converting the value into a class string in which the replaced character string classes are arranged in order, and storing, of the resulting set of class strings, the class strings whose appearance frequency is equal to or higher than a predetermined value in a storage unit as a profile serving as a reference for determining whether analysis target data indicates an attack on the information processing apparatus;
a procedure for extracting each parameter from the access request of the analysis target data;
a procedure for converting the value of the extracted parameter into a class string based on the character string classes;
a procedure for calculating the similarity between the class string and the profile; and
a procedure for determining, according to the similarity, whether an attack on the information processing apparatus has occurred.

According to the present invention, for a request input to an information processing apparatus via a network, parameter values extracted from the request are abstracted into class strings that accommodate parameter values in various forms, and it is then determined whether the data to be analyzed is normal or abnormal. Therefore, it is possible to reduce the possibility of erroneous detection in which normal data to be analyzed is determined to be abnormal.

A block diagram showing a configuration example of a communication system including the WAF of the first embodiment.
A block diagram showing a configuration example of the WAF of the first embodiment.
A diagram showing the flow of processing of the attack detection method by the WAF of the first embodiment.
A flowchart showing the procedure of the learning process by the profiling unit in the first embodiment.
A diagram for explaining the details of the processing of steps 103 and 105 shown in FIG. 4.
A diagram for explaining the method of calculating the similarity to a profile in the first embodiment.
A diagram showing an example of the first embodiment.
A diagram for explaining Modification 3 in the second embodiment.
A flowchart showing the procedure of the learning process by the profiling unit in the second embodiment.
A diagram for explaining the method of calculating the similarity to a profile in the second embodiment.
A diagram showing an example of the second embodiment.
A diagram for explaining the profile creation method of the third embodiment.
A diagram for explaining the similarity calculation in the third embodiment.
A flowchart showing the procedure of the learning process by the profiling unit in the third embodiment.
A diagram showing an example of the third embodiment.
A block diagram showing a configuration example of a log analysis system including the attack detection apparatus of the present invention as a log analysis server.
A diagram for explaining the conventional attack detection method.
A diagram for explaining the feature amounts of a profile.
A diagram for explaining the method of creating the state transition model of prior art 1.
A diagram for explaining the abnormality determination method of prior art 2.
A diagram for explaining the problems of the prior art.
A diagram for explaining another problem of prior art 1.

The present invention relates to an information processing apparatus and a computer that detect an access that performs an attack on a Web server. In the following embodiments, the information processing apparatus is a WAF (it may also be a log analysis device that analyzes a log).

(First embodiment)
A configuration of a communication system including the WAF according to the present embodiment will be described.
FIG. 1 is a block diagram showing a configuration example of a communication system including a WAF according to the present embodiment.
As illustrated in FIG. 1, the communication system includes a Web server 60 that is a type of information processing apparatus that provides services to the client 70 via a network 80, and a WAF 10 that detects an attack on the Web server 60. The WAF 10 is provided between the network 80 and the Web server 60. The client 70 is connected to the web server 60 via the network 80 and the WAF 10.
FIG. 2 is a block diagram illustrating a configuration example of the WAF according to the present embodiment.
As shown in FIG. 2, the WAF 10 includes an input unit 11, a storage unit 12, a control unit 13, and a detection result output unit 14. The input unit 11 includes a learning data input unit 21 and an analysis target data input unit 22.
Normal data to the Web server 60 is input from the network 80 to the learning data input unit 21 as learning data. The analysis target data input unit 22 receives, from the network 80, analysis target data that is data to be determined whether or not the Web server 60 is attacked.

The storage unit 12 stores a profile serving as a reference for determining whether the analysis target data indicates an attack on the Web server 60.
The control unit 13 includes a profiling unit 40 and an analysis target data processing unit 50. The profiling unit 40 includes a parameter extraction unit 31, a character string class conversion unit 32, and a profile storage unit 43. The analysis target data processing unit 50 includes a parameter extraction unit 31, a character string class conversion unit 32, and an abnormality detection unit 53. The parameter extraction unit 31 and the character string class conversion unit 32 are involved in the processing of the profiling unit 40 and the analysis target data processing unit 50.
The control unit 13 includes a memory (not shown) that stores a program and a CPU (Central Processing Unit) (not shown) that executes processing according to the program. When the CPU executes the process according to the program, the parameter extraction unit 31, the character string class conversion unit 32, the profile storage unit 43, and the abnormality detection unit 53 are configured in the WAF 10. The memory (not shown) stores character string class information that defines how to classify character strings for parameter values extracted from access requests. Details of the string class will be described later.

The parameter extraction unit 31 extracts each parameter of access from an access request serving as learning data input from the Web server 60 via the learning data input unit 21 and outputs the extracted parameter to the character string class conversion unit 32. Further, the parameter extraction unit 31 extracts each parameter of access from the access request serving as analysis target data input from the network 80 via the analysis target data input unit 22 and outputs the parameter to the character string class conversion unit 32.
For learning data, the character string class conversion unit 32 converts the parameter value received from the parameter extraction unit 31 into a class string based on the character string class and outputs the class string to the profile storage unit 43. For analysis target data, the character string class conversion unit 32 likewise converts the parameter value received from the parameter extraction unit 31 into a class string based on the character string class and outputs the class string to the abnormality detection unit 53.
When the profile storage unit 43 receives the set of class strings converted by the character string class conversion unit 32 from the learning data, the profile storage unit 43 selects the class string that appears most frequently from the set of class strings of each parameter and stores the selected class string in the storage unit 12 as the profile of that parameter.
When the abnormality detection unit 53 receives the class string converted by the character string class conversion unit 32 from the analysis target data, the abnormality detection unit 53 calculates its similarity to the profile of the parameter and detects whether or not the access is abnormal by comparing the calculated similarity with a predetermined threshold value. Specifically, the abnormality detection unit 53 determines that the access is normal if the calculated similarity is larger than the threshold, and determines that the access is abnormal if the similarity is smaller than the threshold, that is, that an attack has occurred against the Web server 60 or the WebAP of the Web server 60. The abnormality detection unit 53 notifies the detection result output unit 14 of the detection result.
The detection result output unit 14 outputs the detection result received from the abnormality detection unit 53.

Next, the operation of the WAF of this embodiment will be described.
FIG. 3 is a diagram showing the flow of processing of the attack detection method by WAF of this embodiment.
The present embodiment is characterized by the “features used for profile creation”, the “profile creation method and the profile to be created (structure and data) at learning”, and the “comparison and collation between the profile and the analysis target at detection”.
The attack detection method of the present embodiment is divided into two phases of learning processing and detection processing.
In the learning process, the learning data input unit 21 acquires an access request (learning data) from the network 80. The profiling unit 40 extracts each parameter from the acquired access request (parameter extraction unit 31), and converts the parameter value into a class string (character string class conversion unit 32). Next, the class sequence that appears most frequently is selected from the set of class sequences for each parameter, and set as a parameter profile (profile storage unit 43).
In the detection process, the analysis target data input unit 22 acquires an access request (analysis target data) from the network 80. As in the learning process, the analysis target data processing unit 50 extracts each parameter from the acquired access request and converts its value into a class string (parameter extraction unit 31, character string class conversion unit 32). It then calculates the similarity between the class string of the parameter and the class string of the profile, and detects an abnormality by comparing the similarity with a threshold (abnormality detection unit 53). Thereafter, the detection result output unit 14 outputs the detection result of the abnormality detection unit 53.
Note that as the original data for extracting the request parameters, packet capture or the like may be used instead of the access request.

Next, the procedure of the learning process by the profiling unit 40 will be described in detail.
FIG. 4 is a flowchart showing the procedure of the learning process by the profiling unit in this embodiment.
The profiling unit 40 performs the following process for each learning target parameter p, and creates a profile L of the parameter p.
When all learning data relating to the parameter (parameter values: d1 to dn) is input (step 101), the profiling unit 40 extracts unprocessed learning data (dx) (step 102). Then, the profiling unit 40 converts the learning data dx into a class string cx based on a predetermined character string class definition and records it (step 103).
The profiling unit 40 determines whether there is unprocessed learning data (step 104). If there is unprocessed learning data, the profiling unit 40 returns to step 102; if there is none, it proceeds to step 105. In step 105, the profiling unit 40 selects, from all the recorded class strings, only the class string having the maximum number of appearances as the profile L. Thereafter, the profiling unit 40 records L as the profile of the parameter p in the storage unit 12 (step 106).
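The learning loop of steps 101 to 106 can be sketched roughly as below. This is a simplified illustration rather than the embodiment itself: `toy_converter` is a hypothetical stand-in for the class conversion of step 103, and persisting the result to the storage unit 12 is left out.

```python
from collections import Counter
from itertools import groupby


def toy_converter(value):
    """Hypothetical stand-in for step 103: one class per run of similar characters."""
    def kind(ch):
        if ch.isdigit():
            return "numeric"
        if ch.isalpha():
            return "alpha"
        return "symbol"
    return [k for k, _ in groupby(value, key=kind)]


def learn_profile(values, to_class_string):
    """Return the class string with the maximum number of appearances."""
    if not values:
        raise ValueError("learning data must not be empty")
    # Steps 102 to 104: convert every learning value and record its class string.
    counts = Counter(tuple(to_class_string(v)) for v in values)
    # Step 105: keep only the most frequent class string as the profile L.
    profile, _ = counts.most_common(1)[0]
    return profile
```

For example, `learn_profile(["a1.png", "b22.jpg", "readme"], toy_converter)` yields the class string shared by the first two values, because it appears twice while the class string of "readme" appears once.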

In the flowchart shown in FIG. 4, the processing of step 103 and step 105 will be described in detail using a specific example. FIG. 5 is a diagram for explaining the details of the processing of steps 103 and 105 shown in FIG. 4.
The upper part of FIG. 5 shows an example of the definition of a character string class in which a plurality of types of character strings indicating the same type of parameter values are classified into one class. Examples of the character string class include classes such as “numeric” and “space”.
The middle part of FIG. 5 shows how each part of the parameter value, from the first character to the last, is compared with the character string classes, how each part is replaced with the character string class having the longest match, and how the value is thereby converted into a class string arranged in order. The lower part of FIG. 5 shows how the class string is obtained for each parameter value as described above, the appearance frequency of each class string in the set of class strings is calculated, and the class string having the maximum appearance frequency is saved as a profile.

The above operation will be described with reference to FIG.
In step 103, when converting the parameter value into the class string, the profiling unit 40 treats the longest partial character string of the parameter value that matches the regular expression of a character string class prepared in advance as one class, and converts the entire string into classes in order from the left. As a result, parameters having a complicated structure, such as those in which several character string classes of the conventional definition are concatenated or combined, can also be converted into a class string.
In step 105, when selecting the class string, the profiling unit 40 selects the class string having the maximum appearance frequency and stores it as a profile.
Specifically, the process of step 103 is executed by the character string class conversion unit 32, and the process of step 105 is executed by the profile storage unit 43. Information on the definition of the character string class may be stored in the storage unit 12.
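The left-to-right longest-match conversion of step 103 might look like the following sketch. The entries of `STRING_CLASSES` are hypothetical examples chosen for illustration (the actual definitions are those of FIG. 5), and the fallback class "other" for unmatched characters is an assumption that the text does not specify.

```python
import re

# Hypothetical character string classes; not the definitions of FIG. 5.
STRING_CLASSES = [
    ("numeric", re.compile(r"[0-9]+")),
    ("alpha", re.compile(r"[A-Za-z]+")),
    ("space", re.compile(r"\s+")),
    ("date", re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")),
    ("symbol", re.compile(r"[^0-9A-Za-z\s]")),
]


def to_class_string(value):
    """Convert a parameter value into a list of class names, left to right."""
    classes = []
    pos = 0
    while pos < len(value):
        best_name, best_len = None, 0
        for name, pattern in STRING_CLASSES:
            m = pattern.match(value, pos)
            # Keep the class whose match starting at pos is the longest.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            # Assumption: a character no class matches is consumed as "other".
            best_name, best_len = "other", 1
        classes.append(best_name)
        pos += best_len
    return classes
```

Under these assumed classes, "2015-06-01 log" becomes date, space, alpha: at the first character the "date" pattern matches ten characters and therefore wins over "numeric", which matches only four.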

Next, the detection process in the analysis target data processing unit 50 will be described.
FIG. 6 is a diagram for explaining a method of calculating a similarity to a profile in the present embodiment. The abnormality detection unit 53 of the analysis target data processing unit 50 performs detection determination according to the following procedure. Here, test data is used as analysis target data.
(Procedure 1) Similar to the learning process, the parameter value is converted into a class string.
(Procedure 2) The class sequence similarity with the profile is obtained. As a similarity calculation method, for example, LCS (longest common subsequence) shown in FIG. 6 can be used.
(Procedure 3) If the similarity S is smaller than the threshold value St, it is determined as abnormal, otherwise it is determined as normal.
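Procedures 2 and 3 can be illustrated with an LCS-based similarity. The normalization used here, dividing the LCS length by the length of the longer class string, is an assumption made for this sketch; the exact formula is the one shown in FIG. 6.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two class strings."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(class_string, profile):
    """Assumed normalization: LCS length over the longer of the two lengths."""
    longest = max(len(class_string), len(profile))
    if longest == 0:
        return 1.0
    return lcs_length(class_string, profile) / longest


def is_abnormal(class_string, profile, threshold):
    """Procedure 3: abnormal if the similarity S is smaller than the threshold St."""
    return similarity(class_string, profile) < threshold
```

A class string identical to the profile scores 1.0 under this normalization, and the score falls as classes are inserted, dropped, or reordered.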

Examples of the present embodiment will be described. FIG. 7 is a diagram showing an example of this embodiment. In this embodiment, the case of the file parameter will be described. Test data is used as analysis target data.
In the learning process, the profiling unit 40 selects one class sequence having the maximum appearance frequency. In the detection process, the analysis target data processing unit 50 performs similarity calculation after class string conversion, and determines whether the result is normal or abnormal based on the result.

According to the present embodiment, in a WAF that uses the character string structure of the parameter values of a Web application, parameter values are abstracted into class strings that accommodate parameter values of various forms, using the characteristics of the parameters and the format of the character strings, and it is then determined whether the data to be analyzed is normal or abnormal. This reduces the possibility of erroneous detection in which normal data that is not in the learning data is determined to be abnormal.

(Second Embodiment)
In the first embodiment, only the one class string having the highest appearance frequency is selected and used as the profile. In this embodiment, other methods for selecting class strings are proposed: any one of the following Modifications 1 to 3 is applied.
(Modification 1) u class strings are selected in descending order of appearance frequency.
(Modification 2) Select a class string whose appearance frequency is v% or more.
(Modification 3) The appearance frequencies fx are sorted in descending order (f'1, f'2, f'3, ...), and the first u class strings (c'1, c'2, ..., c'u) are selected, where u is the number at which the sum of the appearance frequencies (contribution rate) exceeds Ft for the first time (f'1 + f'2 + ... + f'u > Ft).
FIG. 8 is a diagram for explaining a third modification of the present embodiment.
The profile storage unit 43 sorts the appearance frequencies from the graph showing the appearance frequencies, and extracts u class strings that satisfy the expression shown in FIG.
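The three selection rules of Modifications 1 to 3 can be sketched as follows. The sketch assumes that the appearance frequency is the relative frequency of each class string within the learning data; the function names are hypothetical.

```python
from collections import Counter


def select_top_u(counts, u):
    """Modification 1: the u most frequent class strings."""
    return [cs for cs, _ in counts.most_common(u)]


def select_min_share(counts, v_percent):
    """Modification 2: class strings whose appearance frequency is v% or more."""
    total = sum(counts.values())
    return [cs for cs, n in counts.most_common()
            if 100.0 * n / total >= v_percent]


def select_cumulative(counts, ft):
    """Modification 3: take class strings in descending order of frequency
    until the cumulative frequency exceeds Ft for the first time."""
    total = sum(counts.values())
    selected, cumulative = [], 0
    for cs, n in counts.most_common():
        selected.append(cs)
        cumulative += n
        if cumulative > ft * total:
            break
    return selected
```

For counts of 6, 3, and 1 over three class strings, all of u = 2, v = 30, and Ft = 0.8 select the two most frequent class strings.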

A learning process by the profiling unit in the present embodiment will be described.
FIG. 9 is a flowchart showing the procedure of the learning process by the profiling unit in this embodiment.
In the present embodiment, the process of step 105-abc shown in FIG. 9 is executed instead of the process of step 105 in the flowchart shown in FIG. In the present embodiment, the processing of step 105-abc will be described, and the description of the processing of other steps will be omitted.
In step 105-abc, the profiling unit 40 selects a plurality of class strings by any one of the methods of the first to third modifications among all the recorded class strings.

The similarity calculation at the time of detection in this embodiment will be described. FIG. 10 is a diagram for explaining a method of calculating a similarity with a profile in the present embodiment.
When u class strings are selected by Modifications 1 to 3, the similarities (s1, s2, ..., su) between the class string of the analysis target and each of the u class strings of the profile are calculated at detection, and the maximum similarity Smax = max(s1, s2, ..., su) is taken as the similarity to the profile.
In this example, the similarity S between the test data and the profile is 0.8.

Examples of the present embodiment will be described. FIG. 11 is a diagram showing an example of this embodiment. In this embodiment, the case of the file parameter will be described.
In the learning process, the profiling unit 40 selects a plurality of class strings using any one of the modifications 1 to 3. In the detection process, the analysis target data processing unit 50 performs similarity calculation after class string conversion, and determines whether the result is normal or abnormal based on the result.

(Third embodiment)
In the first embodiment, a single class string is used as a profile, and in the second embodiment, a plurality of class strings is used as a profile. In this embodiment, either a set of class strings or a class set that does not consider the order of classes is selected as the profile.
Note that the plurality of class strings selected in the second embodiment may be used in this embodiment, so any one of Modifications 1 to 3 described in the second embodiment can be applied.
Here, another problem of the related art 1 will be described. In the prior art 1, since a state transition model for each character that actually appears in the learning data is created, there is a problem that many false detections occur in a parameter with a high degree of freedom of the character string. This problem is referred to as “Problem 4”. An example of Problem 4 is shown in FIG.

The profile creation method of this embodiment will be described.
FIG. 12 is a diagram for explaining the profile creation method of the present embodiment, and shows a profile creation method using the compression ratio R.
In the present embodiment, as shown in FIG. 12, the profile storage unit 43 shown in FIG. 2 determines whether the compression rate (R) of the set of class strings is smaller than the threshold value Rt, and if the compression rate is smaller than the threshold, the set of class strings is used as the profile.
On the other hand, when the compression rate is greater than the threshold, the profile storage unit 43 sets the class set as a profile. A class set is a collection of unique classes that appear, and the appearance order of classes is not maintained. That is, in the class set, the character string classes (alpha, numeric, etc.) included in the set of class strings do not overlap, and the order of appearance is not determined.
In this embodiment, the set of class strings considers the order of character string classes, but the class set does not consider the order of character string classes.

The similarity calculation at the time of detection in this embodiment will be described. FIG. 13 is a diagram for explaining similarity calculation in the present embodiment.
In the present embodiment, the similarity calculation method used in detection must differ depending on whether the profile was created as a set of class sequences or as a class set.
FIG. 13A shows the similarity calculation method when the profile is of the class sequence set type, and FIG. 13B shows the similarity calculation method when the profile is of the class set type.
(1) When the profile is of the class sequence set type, the similarities (s1, s2, ..., su) between the class sequence of the test data and each of the u class sequences in the profile are calculated, and the maximum similarity Smax = max(s1, s2, ..., su) is used as the similarity to the profile (the same similarity calculation method as in Modifications 1 to 3).
(2) When the profile is of the class set type, the similarity S is set to 1.0 when the class set of the test data is included in the class set of the profile, and to 0.0 when it is not.
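The two calculations above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names are hypothetical, and normalizing the LCS length by the length of the longer sequence is an assumption, since the exact similarity formula is defined elsewhere in the description.

```python
from typing import FrozenSet, Sequence, Tuple

ClassSeq = Tuple[str, ...]


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two class sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    # Assumption: normalize by the longer sequence so the result lies in [0, 1].
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def similarity_to_sequence_set(test_seq: ClassSeq, profile: Sequence[ClassSeq]) -> float:
    """Case (1): Smax = max(s1, ..., su) over the u class sequences in the profile."""
    return max((lcs_similarity(test_seq, p) for p in profile), default=0.0)


def similarity_to_class_set(test_seq: ClassSeq, profile: FrozenSet[str]) -> float:
    """Case (2): 1.0 if every class in the test data is in the profile, else 0.0."""
    return 1.0 if set(test_seq) <= profile else 0.0


if __name__ == "__main__":
    seq_profile = [("alpha", "symbol", "alpha"),
                   ("alpha", "numeric", "symbol", "alpha")]
    print(similarity_to_sequence_set(("alpha", "numeric", "alpha"), seq_profile))
    set_profile = frozenset({"alpha", "numeric", "symbol"})
    print(similarity_to_class_set(("alpha", "space"), set_profile))
```

For the class set type, the order and number of occurrences of each class in the test data are ignored; only membership matters.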

A learning process by the profiling unit in the present embodiment will be described. Here, the case of Modification 2 in the second embodiment will be described.
FIG. 14 is a flowchart showing the procedure of the learning process by the profiling unit in this embodiment.
In this embodiment, step 105-abc in the flowchart shown in FIG. 9 is replaced by step 105-b, which corresponds to Modification 2, and steps 111 to 113 are added between step 105-b and step 106. Here, the processing of step 105-b and steps 111 to 113 will be described, and the description of the other steps will be omitted.
In step 105-b, the profiling unit 40 calculates the compression rate R from all the recorded class sequences (c1 to cn). In step 111, the profiling unit 40 determines whether or not the compression rate R is smaller than a predetermined compression rate threshold value Rt.
If R < Rt in the determination in step 111, the profiling unit 40 sets the unique class sequences among all recorded class sequences (the class sequence set) as the profile L (step 112). On the other hand, if R > Rt in the determination in step 111, the profiling unit 40 sets the unique set of all classes appearing in the recorded class sequences (the class set) as the profile L (step 113).
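Steps 105-b and 111 to 113 can be sketched as follows. This is a minimal sketch under stated assumptions: the compression rate R is taken here to be the number of unique class sequences divided by the number of recorded class sequences, which is an assumption because R is defined in the second embodiment, and the names `compression_rate` and `build_profile` are hypothetical. The case R = Rt is not specified in the text and is treated like R > Rt here.

```python
from typing import FrozenSet, List, Sequence, Tuple, Union

ClassSeq = Tuple[str, ...]
Profile = Tuple[str, Union[List[ClassSeq], FrozenSet[str]]]


def compression_rate(recorded: Sequence[ClassSeq]) -> float:
    # Assumption: R = (unique class sequences) / (all recorded class sequences).
    if not recorded:
        raise ValueError("no class sequences were recorded")
    return len(set(recorded)) / len(recorded)


def build_profile(recorded: Sequence[ClassSeq], rt: float) -> Profile:
    r = compression_rate(recorded)           # step 105-b
    if r < rt:                               # step 111
        # Step 112: keep the unique class sequences; class order is preserved.
        return ("class_sequence_set", sorted(set(recorded)))
    # Step 113: keep only which classes appear; order and duplicates are dropped.
    return ("class_set", frozenset(c for seq in recorded for c in seq))


if __name__ == "__main__":
    structured = [("alpha", "symbol", "alpha")] * 8 + \
                 [("alpha", "numeric", "symbol", "alpha")] * 2
    print(build_profile(structured, rt=0.3))
    free_form = [("alpha", "space", "alpha"), ("numeric", "alpha"),
                 ("alpha", "symbol", "numeric"), ("alpha",)]
    print(build_profile(free_form, rt=0.3))
```

Under this assumed definition, a parameter with a fixed format yields few unique class sequences and a small R, while a free-form parameter yields a large R and falls back to the class set.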

An example of the present embodiment will be described. FIG. 15 is a diagram showing an example of the present embodiment. In this example, the case of the file parameter is described.
In the learning process, the profiling unit 40 selects a plurality of class sequences by any one of the methods of Modifications 1 to 3. Then, because the compression rate R < Rt, the set of class sequences is stored as the profile. In the detection process, the analysis target data processing unit 50 converts the test data into a class sequence and, because the profile is of the class sequence set type, calculates the similarity with the class sequences in the profile and determines from the result whether the data is normal or abnormal.

The operation of the attack detection apparatus of the present invention will be described in comparison with the problems 1 to 4 described with reference to FIGS.
For Problem 1 described with reference to FIG. 21, the present invention abstracts character strings into classes, so that abnormality determination can take differences in subscripts into account and false detections can be reduced. Further, because the LCS similarity of class sequences is used at detection time, data to which something such as a subscript has been added relative to the learning data still shows a high degree of similarity, so false detections can be reduced.
For Problem 2 described with reference to FIG. 21, in the present invention the character string class conversion unit creates a class sequence on the assumption that a parameter value is a concatenation of a plurality of character string classes, so a profile that matches the parameter can be created.
For Problem 3 described with reference to FIG. 21, the present invention defines simple character string classes such as numeric and alpha in addition to complex character string classes such as url and ip. Therefore, even if the character string "2014.1.1" cannot be determined to be of the date type, the class sequence (numeric, symbol, numeric, symbol, numeric) can be created as a profile.
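The conversion of "2014.1.1" into (numeric, symbol, numeric, symbol, numeric) can be sketched as follows. The class definitions below are illustrative assumptions, not the character string classes of the embodiment, and `to_class_sequence` is a hypothetical name. The sketch scans the value from the first character and, at each position, takes the class whose match is longest.

```python
import re
from typing import List

# Assumed class definitions for illustration only: one complex class (ip)
# and three simple classes (numeric, alpha, symbol).
CLASSES = [
    ("ip", re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")),
    ("numeric", re.compile(r"[0-9]+")),
    ("alpha", re.compile(r"[A-Za-z]+")),
    ("symbol", re.compile(r"[^A-Za-z0-9]+")),
]


def to_class_sequence(value: str) -> List[str]:
    """Replace each part of the value with the class that has the longest match."""
    sequence = []
    pos = 0
    while pos < len(value):
        best_name, best_end = None, pos
        for name, pattern in CLASSES:
            m = pattern.match(value, pos)
            # On equal lengths the class listed first is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            # Not reachable with the classes above; kept as a safeguard.
            best_name, best_end = "unknown", pos + 1
        sequence.append(best_name)
        pos = best_end
    return sequence


if __name__ == "__main__":
    print(to_class_sequence("2014.1.1"))
    print(to_class_sequence("192.168.0.1"))
```

With these assumed classes, "2014.1.1" matches no complex class, so the simple classes still produce a usable class sequence, while "192.168.0.1" is absorbed by the longer ip match.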
For Problem 4 described with reference to FIG. 22, the invention described in the third embodiment introduces the concept of a class set, which imposes a looser constraint than the class order on parameters with a high degree of freedom. Because whether data is abnormal is determined only by whether a class appears, false detections can be reduced.

According to the present invention, in an attack detection method for Web applications, using the character string structure of the parameter value together with the characteristics of the parameter and of the character string format makes it possible to reduce both false detections in which normal data not included in the learning data is determined to be abnormal and false detections in parameters with a high degree of freedom.

The WAF described in the above embodiment may be applied to a log analysis system that includes a log analysis server. FIG. 16 is a block diagram showing a configuration example of a log analysis system including the attack detection apparatus of the present invention as a log analysis server.
The log analysis system includes a Web server 60, a log server 90, and a log analysis server 15. The log server 90 is connected to the Web server 60. The log server 90 periodically obtains access log information from the Web server 60 and stores it in the storage unit of its own device.
The log analysis server 15 is connected to the log server 90. The log analysis server 15 has the function of the WAF 10 described in the above embodiment, and detects an attack on the Web server 60 by reading and analyzing the access request from the access log.
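As an illustration of reading an access request from an access log, the following sketch extracts the parameters from one log line. The log format is an assumption (an Apache-style line with the request in double quotes); the description does not specify a format, and `extract_parameters` is a hypothetical name.

```python
import re
from typing import List, Tuple
from urllib.parse import parse_qsl, urlsplit

# Assumed format: the request line appears in double quotes,
# e.g. "GET /path?name=value HTTP/1.1".
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<protocol>[^"]*)"')


def extract_parameters(log_line: str) -> List[Tuple[str, str]]:
    """Return the (name, value) pairs of the query string in one log line."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return []
    query = urlsplit(match.group("target")).query
    # keep_blank_values retains parameters that were sent with an empty value.
    return parse_qsl(query, keep_blank_values=True)


if __name__ == "__main__":
    line = ('127.0.0.1 - - [01/Jun/2015:00:00:00 +0900] '
            '"GET /index.php?file=report2014.pdf&id=42 HTTP/1.1" 200 512')
    print(extract_parameters(line))
```

Each extracted value would then go through class sequence conversion and similarity calculation in the same way as an access request seen by the WAF 10.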

DESCRIPTION OF SYMBOLS
10 WAF
12 Storage unit
13 Control unit
15 Log analysis server
31 Parameter extraction unit
32 Character string class conversion unit
40 Profiling unit
43 Profile storage unit
50 Analysis target data processing unit
53 Abnormality detection unit
60 Web server

Claims

A log analysis device that collects and analyzes access logs from an information processing device connected to a network, the log analysis device comprising:
a storage unit that stores a profile serving as a reference for determining whether analysis target data indicates an attack on the information processing device;
a parameter extraction unit that extracts each parameter from a request in the access log;
a class conversion unit that, for each parameter extracted by the parameter extraction unit, compares each part of the parameter value, starting from the first character, with predefined character string classes, replaces the part with the character string class having the longest match, and converts the parameter value into a class sequence in which the replacing character string classes are arranged in order;
a profile storage unit that stores in the storage unit, as the profile, a class sequence whose appearance frequency is equal to or higher than a predetermined value among the set of class sequences obtained by the parameter extraction unit and the class conversion unit from the access log of normal data serving as learning data; and
an abnormality detection unit that calculates the similarity between the profile and the class sequence obtained by the parameter extraction unit and the class conversion unit from the access log of the analysis target data, and determines, according to the similarity, whether an attack on the information processing device has occurred.
An attack detection device that detects an attack on an information processing device connected to a network, the attack detection device comprising:
a storage unit that stores a profile serving as a reference for determining whether an access request to the information processing device is an attack on the information processing device;
a parameter extraction unit that extracts each parameter from the access request;
a class conversion unit that, for each parameter extracted by the parameter extraction unit, compares each part of the parameter value, starting from the first character, with predefined character string classes, replaces the part with the character string class having the longest match, and converts the parameter value into a class sequence in which the replacing character string classes are arranged in order;
a profile storage unit that stores in the storage unit, as the profile, a class sequence whose appearance frequency is equal to or higher than a predetermined value among the set of class sequences obtained by the parameter extraction unit and the class conversion unit from access requests of normal data serving as learning data; and
an abnormality detection unit that calculates the similarity between the profile and the class sequence obtained by the parameter extraction unit and the class conversion unit from the access request to be analyzed, and determines, according to the similarity, whether an attack on the information processing device has occurred.
The attack detection device according to claim 2, wherein
the profile storage unit stores in the storage unit, as the profile, the one class sequence having the maximum appearance frequency among the set of class sequences.
The attack detection device according to claim 2, wherein
the profile storage unit stores in the storage unit, as the profile, a plurality of class sequences whose appearance frequency is equal to or higher than a predetermined value among the set of class sequences.
The attack detection device according to claim 2, wherein
the profile storage unit stores in the storage unit, as the profile, a unique set of all the character string classes included in the set of class sequences when the set of class sequences satisfies a predetermined condition, and
the abnormality detection unit,
when the set of class sequences satisfies the predetermined condition, determines, in making the determination according to the similarity, whether an attack has occurred according to whether the profile includes the entire unique set of the character string classes in the class sequence of the analysis target data, and
when the set of class sequences does not satisfy the predetermined condition, calculates the similarity between the class sequence of the analysis target data and the profile.
The attack detection device according to claim 4, wherein
the profile storage unit stores in the storage unit, as the profile, a unique set of all the character string classes included in the plurality of class sequences when the plurality of class sequences satisfy a predetermined condition, and
the abnormality detection unit,
when the plurality of class sequences satisfy the predetermined condition, determines, in making the determination according to the similarity, whether an attack has occurred according to whether the profile includes the entire unique set of the character string classes in the class sequence of the analysis target data, and
when the plurality of class sequences do not satisfy the predetermined condition, makes the determination using the maximum of the similarities between the class sequence of the analysis target data and each of the plurality of class sequences included in the profile.
An attack detection method performed by an attack detection device that detects an attack on an information processing device connected to a network, the method comprising:
extracting each parameter from access requests to the information processing device for normal data serving as learning data, comparing, for each parameter, each part of the parameter value, starting from the first character, with predefined character string classes, replacing the part with the character string class having the longest match, converting the parameter value into a class sequence in which the replacing character string classes are arranged in order, and storing the result in a storage unit as a profile serving as a reference for determining whether analysis target data indicates an attack on the information processing device;
extracting parameters from the access request of the analysis target data;
converting the value of each extracted parameter into the class sequence based on the character string classes;
calculating the similarity between the class sequence and the profile; and
determining, according to the similarity, whether an attack on the information processing device has occurred.
A program for causing a computer that detects attacks on an information processing device connected to a network to execute:
a procedure of extracting each parameter from access requests to the information processing device for normal data serving as learning data, comparing, for each parameter, each part of the parameter value, starting from the first character, with predefined character string classes, replacing the part with the character string class having the longest match, converting the parameter value into a class sequence in which the replacing character string classes are arranged in order, and storing the result in a storage unit as a profile serving as a reference for determining whether analysis target data indicates an attack on the information processing device;
a procedure of extracting parameters from the access request of the analysis target data;
a procedure of converting the value of each extracted parameter into the class sequence based on the character string classes;
a procedure of calculating the similarity between the class sequence and the profile; and
a procedure of determining, according to the similarity, whether an attack on the information processing device has occurred.