CN112019508A - Method, system and electronic device for detecting DDos attack based on Web log analysis - Google Patents

Method, system and electronic device for detecting DDos attack based on Web log analysis

Info

Publication number
CN112019508A
Authority
CN
China
Prior art keywords
access
session
data
user
average
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010737398.2A
Other languages
Chinese (zh)
Inventor
范如
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dbappsecurity Technology Co Ltd filed Critical Hangzhou Dbappsecurity Technology Co Ltd
Priority to CN202010737398.2A priority Critical patent/CN112019508A/en
Publication of CN112019508A publication Critical patent/CN112019508A/en
Withdrawn legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application relates to a method, a system and an electronic device for detecting DDos attacks based on Web log analysis. The method comprises: acquiring log data of a website platform and extracting a session set of a user from the log data; acquiring, in the session set, access information of the user, wherein the access information comprises the access frequency, the average request traffic and the average access proportion of web pages; and determining, through a decision tree, that the access of the user is a DDos attack when the access information meets a judgment condition, wherein the judgment condition comprises at least one of: the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold. The application addresses the low accuracy of the related art, which distinguishes DDos attacks from normal access behavior by traffic information alone, and improves the speed and accuracy of DDos attack detection.

Description

Method, system and electronic device for detecting DDos attack based on Web log analysis
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method, a system, and an electronic device for detecting a DDos attack based on Web log analysis.
Background
With the rapid development of computer technology, network technology has been widely adopted and users access website platforms ever more frequently. Among these accesses, however, some are malicious and attack the website platform, posing a serious threat to its information security. The distributed denial-of-service (DDos) attack is a common form of network attack. Depending on the network communication protocol involved, DDos attacks fall into two main categories: the first exploits vulnerabilities in transport-layer and network-layer protocols, which is the traditional denial-of-service attack; the second targets the application layer, the highest layer of the network.
In the related art, traffic information returned by an OpenFlow switch is acquired, and an access is treated as a DDos attack when the traffic exceeds a certain traffic threshold. In an actual production environment, however, high-traffic access behavior arises easily from legitimate use, so distinguishing DDos attacks from normal access behavior by traffic information alone has low accuracy.
At present, no effective solution has been proposed for this low accuracy in the related art.
Disclosure of Invention
The embodiment of the application provides a method, a system, an electronic device and a storage medium for detecting DDos attack based on Web log analysis, so as to at least solve the problem that the accuracy is low when DDos attack and normal access behavior are distinguished through flow information in the related technology.
In a first aspect, an embodiment of the present application provides a method for detecting a DDos attack based on Web log analysis, where the method includes:
acquiring log data of a website platform, and extracting a session set of a user from the log data;
in the session set, acquiring access information of the user, wherein the access information comprises the access frequency, the average request flow and the average access proportion of a webpage;
and through a decision tree, under the condition that the access information meets a judgment condition, judging that the access of the user is DDos attack, wherein the judgment condition comprises at least one of the condition that the access frequency is greater than a frequency threshold, the average request flow is less than a request flow threshold and the average access proportion is less than an access proportion threshold.
In some of these embodiments, obtaining the average access proportion comprises:
acquiring the visit volume of each webpage in the session set and the total visit volume of the website platform according to the log data;
obtaining the access proportion of each webpage in the session set according to the access amount of each webpage and the total access amount;
and obtaining the average access proportion according to the sum of the access proportions and the number of requests in the session set.
In some of these embodiments, obtaining the average request traffic comprises:
calculating the average request traffic according to the number of requests of the user in the session set and the total request traffic.
In some of these embodiments, obtaining the access frequency comprises:
and calculating the access frequency according to the request times of the user in the session set and the session duration of the session set.
In some of these embodiments, where the frequency threshold, the request traffic threshold, and the access proportion threshold are all access thresholds, calculating the access threshold comprises:
obtaining a plurality of data of each kind of the access information from a plurality of historical session sets;
sorting the data according to the numerical values of the data, wherein the data comprises normal access data and attack data;
at the boundary of the normal access data and the attack data, taking a preset amount of data and calculating the information gain rate of each data;
and taking the data with the maximum information gain rate as the access threshold.
In some embodiments, said extracting the set of sessions of the user in the log data comprises:
acquiring a first session record and a second session record of the user according to the log data; the first session record comprises first identification information and a first session time; the second session record comprises second identification information and a second session time; the first session record and the second session record are session records adjacent in time;
and merging the first session record and the second session record into the session set under the condition that the first identification information and the second identification information are the same and the time difference between the first session time and the second session time is less than a session time threshold value.
In some embodiments, before extracting the set of sessions of the user from the log data, the method further comprises:
and acquiring a third session record according to the log data, and deleting the third session record when the character string information of the third session record belongs to a preset character string set, wherein the preset character string set comprises character strings of requests for embedded objects.
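The filtering and merging steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Record` fields, the suffix list in `EMBEDDED_SUFFIXES`, and the 30-minute value of `SESSION_GAP_SECONDS` are all assumptions introduced for the example, since the text does not fix the log format, the preset character string set, or the session time threshold.

```python
from collections import namedtuple

# Hypothetical record layout; the source does not specify the log fields.
Record = namedtuple("Record", ["user_id", "timestamp", "url", "size"])

# Assumed preset character string set: suffixes of embedded-object requests.
EMBEDDED_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

# Assumed session time threshold, in seconds.
SESSION_GAP_SECONDS = 30 * 60


def drop_embedded(records):
    """Delete records whose URL matches the preset embedded-object strings."""
    return [r for r in records if not r.url.lower().endswith(EMBEDDED_SUFFIXES)]


def build_sessions(records, gap=SESSION_GAP_SECONDS):
    """Merge time-adjacent records of the same user into session sets."""
    sessions = []
    open_session = {}  # user_id -> session set currently open for that user
    for rec in sorted(records, key=lambda r: (r.user_id, r.timestamp)):
        current = open_session.get(rec.user_id)
        if current is not None and rec.timestamp - current[-1].timestamp < gap:
            # Same identification information and time difference under the
            # threshold: merge into the existing session set.
            current.append(rec)
        else:
            # Otherwise start a new session set for this user.
            current = [rec]
            sessions.append(current)
            open_session[rec.user_id] = current
    return sessions
```

Running `build_sessions(drop_embedded(records))` yields one list of records per session set, which is the input shape assumed by any later feature computation.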
In a second aspect, an embodiment of the present application provides a system for detecting a DDos attack based on Web log analysis, where the system includes a session identification module, an information acquisition module, and a decision determination module:
the session identification module is used for acquiring log data of a website platform and extracting a session set of a user from the log data;
the information acquisition module is used for acquiring the access information of the user in the session set, wherein the access information comprises the access frequency, the average request flow and the average access proportion of the webpage;
the decision-making judgment module is configured to judge, through a decision tree, that the access of the user is a DDos attack when the access information meets a judgment condition, where the judgment condition includes at least one of the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold.
In a third aspect, an embodiment of the present application provides an electronic apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the method for detecting a DDos attack based on Web log analysis as described in the first aspect.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the method for detecting a DDos attack based on Web log analysis as described in the first aspect above.
Compared with the related art, the method for detecting DDos attacks based on Web log analysis provided by the embodiments of the application acquires log data of a website platform and extracts a session set of a user from the log data; acquires, in the session set, access information of the user, wherein the access information comprises the access frequency, the average request traffic and the average access proportion of web pages; and determines, through a decision tree, that the access of the user is a DDos attack when the access information meets a judgment condition, wherein the judgment condition comprises at least one of: the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold. This addresses the low accuracy of the related art, which distinguishes DDos attacks from normal access behavior by traffic information alone, and improves the speed and accuracy of DDos attack detection.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method for detecting DDos attacks based on Web log analysis according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of obtaining an average access proportion according to an embodiment of the present application;
FIG. 3 is a flow chart of a method of calculating an access threshold according to an embodiment of the present application;
FIG. 4 is a flow diagram of a method of extracting a set of sessions according to an embodiment of the present application;
FIG. 5 is a flow chart of a method of extracting a set of sessions in accordance with a preferred embodiment of the present application;
FIG. 6 is a block diagram of the hardware structure of a terminal for the method for detecting DDos attacks based on Web log analysis according to an embodiment of the present application;
FIG. 7 is a block diagram of a system for detecting DDos attacks based on Web log analysis according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a system for detecting DDos attacks based on Web log analysis according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not limit quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association relationship between associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.
In the field of network security, application-layer DDos attacks occur at the highest layer of the network OSI model, for example the HyperText Transfer Protocol (HTTP) layer. Application-layer DDos attacks fall mainly into two categories: bandwidth exhaustion and host resource exhaustion. In bandwidth exhaustion, a large number of high-frequency requests are sent to the target host, occupying the bandwidth of the target network so that legitimate users cannot access the Web site. In host resource exhaustion, the attacker sends HTTP request packets just as a normal user would, causing the attacked server to return large, memory-consuming files or to keep running complex programs, which can completely exhaust the resources of the host.
This embodiment provides a method for detecting DDos attacks based on Web log analysis. FIG. 1 is a flowchart of a method for detecting a DDos attack based on Web log analysis according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S110, obtaining log data of the website platform, and extracting a session set of the user from the log data.
The log data is obtained by recording a process event generated by an internet system of the computer, and the specific operation record of the user on the website platform at a certain moment can be obtained by looking up the log data. The log data is mainly sourced from a server, a storage center, network equipment, an operating system, middleware, a database, a service system and the like. The session set is a set of a plurality of session records of the user in a time window, and comprises request records of the user to different webpages in the website platform.
Step S120, in the session set, obtaining access information of the user, where the access information includes an access frequency, an average request traffic, and an average access proportion of the web page.
Specifically, the access frequency represents how often the user makes requests. Because raising the access frequency to load the server is a common tactic of attackers, the access behavior of users can be distinguished by the access frequency. The average request traffic is the average size of the data packets generated while a user browses web pages. The packets generated by an attacker are similar in size, and the traffic they consume is between 20 kb and 400 kb; the packets generated by ordinary users differ in size because different users browse different content, and the traffic they consume is more than 1 M. The access behavior of users can therefore be distinguished by the average request traffic. The average access proportion represents the average share of visits received by the web pages that the user requests in the website platform. Because an attacker's accesses are scattered across many web pages, while the accesses of ordinary users are usually concentrated on a small number of popular web pages, the access behavior of users can also be analyzed through the average access proportion.
Step S130, determining that the access of the user is a DDos attack by using the decision tree under the condition that the access information meets a determination condition, where the determination condition includes at least one of an access frequency greater than a frequency threshold, an average request traffic less than a request traffic threshold, and an average access proportion less than an access proportion threshold.
The decision tree is a machine learning method and is represented as a tree structure, and comprises nodes, branches and leaf nodes, wherein each node represents judgment on one characteristic, each branch represents output of a judgment result, and each leaf node represents a classification result. In this embodiment, the access information is used as the feature of the decision tree, in the process of forming the decision tree, the access threshold including the frequency threshold, the request traffic threshold and the access proportion threshold is determined according to the information gain rate of different features, and in the process of classifying through the decision tree, the obtained access information is compared with the access threshold to obtain the analysis result of the user access.
Through steps S110 to S130, this embodiment analyzes the session set using access information that includes the access frequency, the average request traffic and the average access proportion, and classifies the access through the decision tree according to that information. Because several characteristics of the user's access are considered, DDos attacks and normal access behavior are no longer distinguished by traffic information alone. The method in this embodiment therefore addresses the low accuracy of the related art and improves the speed and accuracy of DDos attack detection.
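As a rough illustration of the judgment condition in step S130, the sketch below flags a session when at least one of the three comparisons holds. The function name `is_ddos`, the dictionary keys, the units, and every threshold value are hypothetical; in the source the thresholds are derived from historical data, and the decision tree may test the features in an order this flat check does not model.

```python
# Hypothetical thresholds for illustration only; the source derives the real
# values from historical session sets.
THRESHOLDS = {
    "frequency": 10.0,     # requests per second (assumed unit)
    "traffic": 500_000.0,  # average traffic per request (assumed unit)
    "proportion": 0.05,    # average access proportion
}


def is_ddos(frequency, avg_traffic, avg_proportion, thresholds=THRESHOLDS):
    """Return True when at least one judgment condition is met."""
    return (
        frequency > thresholds["frequency"]         # unusually frequent requests
        or avg_traffic < thresholds["traffic"]      # unusually small requests
        or avg_proportion < thresholds["proportion"]  # accesses spread thinly
    )
```

A session with a low request rate, large and varied responses, and accesses concentrated on popular pages fails all three comparisons and is treated as normal access.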
In some embodiments, FIG. 2 is a flowchart of a method for obtaining an average access proportion according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:
Step S210, according to the log data, obtaining the visit volume of each web page in the session set and the total visit volume of the website platform.
Specifically, the visit volume of each web page is the click volume of the web page in the session set, and the total visit volume of the website platform is the sum of the visit volumes of all the web pages in the website platform.
Step S220, obtaining the access proportion of each web page in the session set according to the access amount of each web page and the total access amount.
The access proportion of each web page in the session set in this embodiment can be obtained by the following Formula 1:
$$P(WP_i) = \frac{C(WP_i)}{\sum_{j=1}^{n} C(WP_j)} \qquad \text{(Formula 1)}$$
In Formula 1, $1 \le i \le n$ indexes the different web pages, $P(WP_i)$ denotes the access proportion of web page $WP_i$, $C(WP_i)$ denotes the access volume of that web page at a given moment, and $\sum_{j=1}^{n} C(WP_j)$ is the total access volume of the website platform at that moment.
Step S230, obtaining the average access ratio according to the sum of the access ratios and the number of requests in the session set.
The average access proportion in this embodiment can be obtained by the following Formula 2:
$$\bar{p}_{wp} = \frac{\sum_{i=1}^{n} P(WP_i)}{C_k} \qquad \text{(Formula 2)}$$
In Formula 2, $1 \le i \le n$ indexes the different web pages, $\bar{p}_{wp}$ represents the average access proportion of the web pages of the website platform in a session set, $\sum_{i=1}^{n} P(WP_i)$ represents the sum of the access proportions of the different web pages, and $C_k$ represents the number of requests in the session set.
It should be noted that analysis of the web page log data identifies the most visited web pages in the website platform, and that these popular web pages account for only about 30% of the total number of web pages in the platform. A DDos attacker does not know how popular each web page is and can only access the web pages of the platform at random. An application-layer DDos attack closely resembles a legitimate surge in visits. The main difference is that a legitimate surge is caused by a hot topic, which draws users to particular web pages; the more visits a web page receives, the higher its popularity.
Ordinary users tend to visit popular web pages, so the pages they request have a large access proportion and the resulting average access proportion is high. An attacker accesses all web pages indiscriminately, so the pages requested have a low access proportion and the resulting average access proportion is low. The access behavior of users can therefore be distinguished by the average access proportion.
Through the steps S210 to S230, in this embodiment, based on the analysis of the popularity of the web pages, the average access proportion in the session set is calculated according to the access amount of each web page and the total access amount of the website platform, the access behavior of the user is determined according to the average access proportion, and the accuracy of detecting the DDos attack is improved.
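Steps S210 to S230 can be sketched in a few lines. The function names are hypothetical, and the sketch makes one assumption the text leaves open: the sum in Formula 2 is taken with one term per request in the session, so a page requested twice contributes its proportion twice.

```python
from collections import Counter


def access_proportions(site_urls):
    """Formula 1: each page's share of the platform's total visit volume."""
    counts = Counter(site_urls)
    total = sum(counts.values())
    return {url: count / total for url, count in counts.items()}


def average_access_proportion(session_urls, proportions):
    """Formula 2: sum of the requested pages' proportions over the request count.

    Assumption: one term per request in the session set.
    """
    if not session_urls:
        return 0.0
    total = sum(proportions.get(url, 0.0) for url in session_urls)
    return total / len(session_urls)


# Made-up platform log: /home is popular, /about is rarely visited.
site_log = ["/home"] * 6 + ["/news"] * 3 + ["/about"] * 1
proportions = access_proportions(site_log)

# A visitor who stays on popular pages scores higher than one who spreads
# requests evenly across all pages.
focused = average_access_proportion(["/home", "/home", "/news"], proportions)
scattered = average_access_proportion(["/home", "/news", "/about"], proportions)
```

With this made-up log, `focused` comes out higher than `scattered`, which is the direction of the difference the text relies on.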
In some embodiments, the average request traffic is calculated according to the number of requests of the user in the session set and the total request traffic. The average request traffic in this embodiment may be calculated according to the following Formula 3:
$$F_l = \frac{A_l}{C_k} \qquad \text{(Formula 3)}$$
In Formula 3, $F_l$ indicates the average request traffic of the user in the session set, $A_l$ indicates the total request traffic of the user in the session set, and $C_k$ represents the number of requests in the session set.
In particular, the average request traffic is an obvious feature for distinguishing normal access from DDos attacks. To achieve the attack quickly, an attacker generally increases the number of attack requests and consumes the resources of the website platform's host. Attackers usually rely on tools, and because these tools lack intelligence, the data packets they send are similar in size. Normal users visit websites according to their own interests, so different users browse different content and generate data packets of different sizes. This embodiment can therefore calculate the average request traffic from the total request traffic and the number of requests in the session set, and determine the access behavior of the user according to the average request traffic, further improving the accuracy of DDos attack detection.
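The average request traffic reduces to a single division, shown here as a small sketch. The function name and the sample sizes are hypothetical, and the unit of the sizes is left to whatever the log records.

```python
def average_request_traffic(request_sizes):
    """Formula 3: total request traffic A_l divided by request count C_k."""
    if not request_sizes:
        return 0.0  # an empty session set has no traffic to average
    return sum(request_sizes) / len(request_sizes)


# Made-up sizes: a tool sending near-identical small requests versus a user
# browsing pages of varied size.
tool_sizes = [310, 300, 305, 295]
user_sizes = [1200, 48000, 950, 230000]

tool_average = average_request_traffic(tool_sizes)
user_average = average_request_traffic(user_sizes)
```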
In some embodiments, the access frequency is calculated according to the number of requests of the user in the session set and the session duration of the session set. The access frequency in this embodiment can be obtained by the following Formula 4:
$$F_k = \frac{C_k}{T} \qquad \text{(Formula 4)}$$
In Formula 4, $F_k$ is the access frequency in the session set, $C_k$ represents the number of requests in the session set, and $T$ is the session duration of the session set. Optionally, the session duration in this embodiment is a 30-minute time window.
Specifically, when a DDos attack occurs, the attacker usually tries to achieve the attack at low cost, for example by requesting resources that place a heavy load on the server or by raising the access frequency so that the attack takes effect quickly. Raising the access frequency is a common tactic of attackers. Analysis of historical log data shows that more than 70% of users send at most one HTTP request per second, whereas dozens of HTTP requests are sent per second during a DDos attack, so the access frequency during an attack differs markedly from that of normal users. This embodiment can therefore calculate the access frequency from the number of requests in the session set and the session duration, and determine the access behavior of the user according to the access frequency, further improving the accuracy of DDos attack detection.
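A small sketch of the access frequency calculation follows, assuming the duration is measured in seconds so that the result is in requests per second. The function name and the request counts are hypothetical; the 30-minute window is the optional value mentioned in the text.

```python
def access_frequency(request_count, session_seconds):
    """Access frequency: request count C_k divided by session duration T."""
    if session_seconds <= 0:
        raise ValueError("session duration must be positive")
    return request_count / session_seconds


WINDOW_SECONDS = 30 * 60  # the optional 30-minute time window

# Made-up request counts for one window.
normal_rate = access_frequency(900, WINDOW_SECONDS)     # half a request per second
attack_rate = access_frequency(36_000, WINDOW_SECONDS)  # twenty requests per second
```

The two rates sit on opposite sides of the "one request per second" versus "dozens per second" gap described above.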
In some embodiments, the decision tree classifies user access behavior by comparing the obtained access information with the access thresholds. FIG. 3 is a flowchart of a method for calculating an access threshold according to an embodiment of the present application. As shown in FIG. 3, the method includes the following steps:
Step S310, acquiring a plurality of data of each kind of access information from a plurality of historical session sets.
The historical session set is a session set used in calculating an access threshold, and the access information comprises access frequency, average request flow and average access proportion. For example, in the case of acquiring 50 historical session sets, 50 access frequencies, 50 average request traffic and 50 average access ratios may be obtained.
Step S320, sorting the plurality of data according to the numerical values of the plurality of data, wherein the plurality of data includes normal access data and attack data.
For a plurality of data of each kind of access information, sorting is carried out according to the magnitude of the numerical value, and taking the access frequency as an example, sorting is carried out on 50 access frequencies obtained from a plurality of historical session sets from small to large according to the magnitude of the access frequency. Because the normal access behavior is obviously different from the attack behavior of an attacker, a boundary exists in the sorted data, the data on one side of the boundary are the data generated by the normal access behavior and are marked as normal access data, and the data on the other side of the boundary are the data generated by the attack behavior and are marked as attack data.
Step S330, at the boundary between the normal access data and the attack data, a preset amount of data is taken and the information gain rate of each data is calculated, and the taken data includes the normal access data and the attack data.
Optionally, the preset number in this embodiment is 8, that is, the information gain rate is calculated for 8 data. The information gain rate (the gain ratio of the C4.5 algorithm) represents how well the decision tree classifies when it splits on a given data value: the higher the information gain rate, the better the classification obtained by splitting on the corresponding data.
In step S340, the data with the largest information gain rate is used as the access threshold.
The access threshold of each kind of access information is obtained by performing this calculation on the preset amount of data of that kind. On this basis, the information gain rates of the different access thresholds are compared to determine the positions of the access frequency, the average request traffic and the average access proportion in the decision tree.
Optionally, this embodiment analyzes user access behavior based on an improved C4.5 decision tree algorithm. When a decision tree has many continuous features, the features also take many values; in this embodiment the features are the access frequency, the average request traffic and the average access proportion, all of which are continuous. Because continuous features take many values, a large amount of calculation is required, which reduces the efficiency of generating the decision tree. Through steps S310 to S340, this embodiment calculates the information gain rate for only a preset number of data and takes the value with the largest information gain rate as the access threshold. Compared with the C4.5 algorithm in the related art, which must traverse the information gain rates of all values of all features, this improvement reduces the space complexity of the algorithm and greatly improves its efficiency.
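Steps S310 to S340 can be sketched as follows for one kind of access information. This is a minimal illustration under stated assumptions: the historical data are assumed to carry the labels "normal" and "attack", the boundary is taken to be the position of the first attack sample in the sorted data, and the candidates are the preset number of values around that boundary. The function names `entropy`, `gain_ratio` and `select_threshold` are hypothetical; the gain ratio itself is the standard C4.5 definition (information gain divided by split information).

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def gain_ratio(values, labels, threshold):
    """C4.5 gain ratio of the binary split value <= threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    conditional = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    gain = entropy(labels) - conditional
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def select_threshold(values, labels, preset=8):
    """Evaluate only `preset` values around the normal/attack boundary."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [l for _, l in pairs]
    # Assumption: the boundary is the index of the first attack sample.
    boundary = next((i for i, l in enumerate(ys) if l == "attack"), len(ys))
    half = preset // 2
    candidates = xs[max(0, boundary - half):min(len(xs), boundary + half)]
    return max(candidates, key=lambda t: gain_ratio(xs, ys, t))


# Made-up access frequencies (requests per second) from historical session sets.
frequencies = [1, 2, 3, 4, 5, 6, 20, 30, 40, 50, 60, 70]
labels = ["normal"] * 6 + ["attack"] * 6
print(select_threshold(frequencies, labels))
```

Only the candidates near the boundary are scored, rather than every distinct value of the feature, which is the saving the embodiment describes.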
In some embodiments, Fig. 4 is a flowchart of a method for extracting a session set according to an embodiment of the present application. The method includes the following steps:
Step S410, a first session record and a second session record of the user are obtained according to the log data. The first session record includes first identification information and a first session time, the second session record includes second identification information and a second session time, and the first session record and the second session record are adjacent in time.
In this embodiment, the first session record and the second session record are session records in the log data, the first identification information and the second identification information may be the Internet Protocol (IP) address or the cookie of the user, and the first session time and the second session time are the specific points in time at which the sessions were generated. In this embodiment, users are distinguished by IP address, with different IP addresses representing different users: the log data of each IP address is extracted, the session records in the log data are arranged in chronological order, and session sets are extracted from the different session records.
Step S420, the first session record and the second session record are merged into one session set when the first identification information is the same as the second identification information and the time difference between the first session time and the second session time is smaller than the session time threshold. The session time threshold in this embodiment may be 30 min.
Typically, the log data of the Web includes 8 fields: client IP, date, request method, request object, protocol version number, status code, transfer file size, and user agent. Optionally, in the present embodiment, a Web access log (WAL for short) is selected as the log data, and the WAL may be represented by the following Equation 5:
$$\mathrm{WAL} = \{\mathrm{Date},\ \mathrm{Time},\ \mathrm{Method},\ \mathrm{URL},\ \mathrm{Protocol},\ \mathrm{Status},\ \ldots,\ \mathrm{agent}\} \tag{5}$$
In Equation 5, Date represents the date, Time represents the time, Method represents the request method, URL is the Uniform Resource Locator, Protocol represents the protocol version number, Status represents the status code, and agent represents the user agent. The WAL records users' accesses to the website in sequence over a period of time and includes a plurality of session records.
The multiple session records of a user can be expressed by the following Equation 6:
$$\mathrm{USER} = \{\mathrm{ID},\ \langle date_1, \ldots, agent_1\rangle,\ \ldots,\ \langle date_i, \ldots, agent_i\rangle,\ \ldots,\ \langle date_n, \ldots, agent_n\rangle\} \tag{6}$$
In Equation 6, $1 \le i \le n$, ID is the identity identifier of the user, and $\langle date_i, \ldots, agent_i\rangle$ represents the $i$-th session record of the user.
In particular, let $\langle date_j, \ldots, agent_j\rangle$ and $\langle date_{j+1}, \ldots, agent_{j+1}\rangle$ denote two consecutive session records of the user. If the time difference between $date_j$ and $date_{j+1}$ is less than or equal to the session time threshold, the two session records are considered to belong to the same time window and are treated as the content of one session set of the user.
Through the above steps S410 and S420, the present embodiment sets a session time threshold, and performs session set extraction from a plurality of session records of the user based on the session time threshold, thereby improving accuracy of session set extraction.
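Steps S410 and S420 can be sketched as follows. This is a minimal illustration, assuming each session record has already been reduced to an (identifier, timestamp) pair; the function name `extract_session_sets` and the sample IP addresses are hypothetical. The 30-minute threshold and the strict "smaller than" comparison follow step S420.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIME_THRESHOLD = timedelta(minutes=30)


def extract_session_sets(records, threshold=SESSION_TIME_THRESHOLD):
    """Group (identifier, timestamp) records into session sets.

    Two time-adjacent records of the same identifier are merged into one
    session set when their time difference is smaller than the threshold.
    Returns a list of (identifier, [timestamps]) tuples.
    """
    by_user = defaultdict(list)
    for identifier, timestamp in records:
        by_user[identifier].append(timestamp)

    session_sets = []
    for identifier, stamps in by_user.items():
        stamps.sort()  # arrange the user's records in chronological order
        current = [stamps[0]]
        for previous, following in zip(stamps, stamps[1:]):
            if following - previous < threshold:
                current.append(following)
            else:
                session_sets.append((identifier, current))
                current = [following]
        session_sets.append((identifier, current))
    return session_sets


t0 = datetime(2020, 7, 28, 12, 0, 0)
records = [
    ("10.0.0.1", t0),
    ("10.0.0.2", t0 + timedelta(minutes=5)),
    ("10.0.0.1", t0 + timedelta(minutes=10)),
    ("10.0.0.1", t0 + timedelta(minutes=60)),
]
for identifier, stamps in extract_session_sets(records):
    print(identifier, len(stamps))
```

In the sample data the first user produces two session sets, because the gap between the second and third request is 50 minutes.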
In some embodiments, before the session set of the user is extracted from the log data, the session records in the log data need to be cleaned. Specifically, a third session record is obtained according to the log data, and the third session record is deleted when its character string information belongs to a preset character string set, where the preset character string set includes the character strings of requests for embedded objects, and the third session record is also a session record in the log data. Each time a user clicks a web page, the browser sends many HTTP GET requests, including requests for page objects and requests for embedded objects. In this embodiment, the character string set is preset according to the character string patterns of requests for embedded objects, for example jpeg, js, gif, css, and the like. Embedded objects are usually irrelevant to the mining of user behavior, so these requests need to be filtered out, keeping URLs that end with "/" or ".html" and the like, as well as dynamic page URLs, which are identified by ending with ".php", ".asp", ".jsp" or ".cgi" or by containing "?". Specifically, the session records in one WAL are preprocessed: a session record whose character string information belongs to the character strings of requests for embedded objects is deleted, until all requests for embedded objects in the WAL have been deleted. By preprocessing the session records in the log data before extracting the session set and deleting the requests for embedded objects, which have no data mining value, this embodiment improves the accuracy of session set extraction.
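The cleaning step can be sketched as a suffix filter on the request path. This is a minimal illustration, assuming each session record is a dictionary with a hypothetical `"url"` key; the suffix set contains only the four examples named in the text and would need to be extended in practice.

```python
from urllib.parse import urlsplit

# Preset character string set: suffixes of requests for embedded objects.
# Only the examples named in the text are listed; the real set is larger.
EMBEDDED_SUFFIXES = (".jpeg", ".js", ".gif", ".css")


def is_embedded_object(url):
    """True when the request path ends with an embedded-object suffix."""
    # urlsplit separates the path from any "?query" part, so a URL such as
    # /logo.gif?v=2 is still recognised as an embedded object.
    path = urlsplit(url).path.lower()
    return path.endswith(EMBEDDED_SUFFIXES)


def clean_records(records):
    """Delete session records that request embedded objects."""
    return [r for r in records if not is_embedded_object(r["url"])]


log = [
    {"url": "/index.html"},
    {"url": "/static/app.js"},
    {"url": "/img/logo.gif?v=2"},
    {"url": "/search.php?q=ddos"},
    {"url": "/view.jsp"},
    {"url": "/"},
]
print([r["url"] for r in clean_records(log)])
```

Matching on the parsed path rather than the raw string keeps dynamic page URLs such as `/view.jsp` (which does not end with `.js`) and `/search.php?q=ddos` in the cleaned log.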
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
Fig. 5 is a flowchart of a method for extracting a session set according to a preferred embodiment of the present application. The method includes the following steps:
Step S510, acquiring a second session record in the WAL;
Step S520, judging whether the second identification information in the second session record is the same as the first identification information in the first session record, and, if they are the same, judging whether the time difference between the first session time and the second session time is less than 30 min;
Step S530, if the second identification information is different from the first identification information, or if the time difference between the first session time and the second session time is greater than or equal to 30 min, dividing the first session record and the second session record into different session sets;
Step S540, if the time difference between the first session time and the second session time is less than 30 min, merging the first session record and the second session record into one session set;
Step S550, once all session records in the WAL that belong to requests for page objects have been identified, determining that the WAL is empty; otherwise, acquiring the next session record and continuing to extract session sets.
Through the above steps S510 to S550, the present embodiment performs extraction of a session set from a plurality of session records of the user based on the session time threshold, so as to improve accuracy of session set extraction.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The method embodiments provided in the present application may be executed in a terminal, a computer or a similar computing device. Taking execution on a terminal as an example, Fig. 6 is a block diagram of the hardware structure of a terminal for the method for detecting a DDos attack based on Web log analysis according to an embodiment of the present application. As shown in Fig. 6, the terminal 60 may include one or more processors 602 (only one is shown in Fig. 6; the processor 602 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 604 for storing data, and optionally may also include a transmission device 606 for communication functions and an input/output device 608. It will be understood by those skilled in the art that the structure shown in Fig. 6 is only illustrative and does not limit the structure of the terminal. For example, the terminal 60 may include more or fewer components than shown in Fig. 6, or have a different configuration from that shown in Fig. 6.
The memory 604 may be used for storing computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method for detecting a DDos attack based on Web log analysis in the embodiment of the present application. The processor 602 executes various functional applications and data processing by running the computer programs stored in the memory 604, so as to implement the above-mentioned method. The memory 604 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 604 may further include memory located remotely from the processor 602, which may be connected to the terminal 60 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 606 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the terminal 60. In one example, the transmission device 606 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 606 may be a Radio Frequency (RF) module, which communicates with the internet wirelessly.
The present embodiment further provides a system for detecting a DDos attack based on Web log analysis. The system is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the system described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a block diagram of a system for detecting a DDos attack based on Web log analysis according to an embodiment of the present application. As shown in Fig. 7, the system includes a session identification module 71, an information acquisition module 72, and a decision determination module 73:
a session identification module 71, configured to obtain log data of a website platform, and extract a session set of a user from the log data;
an information acquisition module 72, configured to acquire, in the session set, access information of the user, where the access information includes an access frequency, an average request traffic, and an average access proportion of a web page;
a decision determination module 73, configured to determine, through a decision tree, that the access of the user is a DDos attack if the access information meets a determination condition, where the determination condition includes at least one of: the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold.
In the present embodiment, the session identification module 71 extracts the session set, the information acquisition module 72 acquires the access information, which includes the access frequency, the average request traffic and the average access proportion, and the decision determination module 73 analyzes the session set according to the access information. Because multiple characteristics of user access are considered, DDos attacks are distinguished from normal access behavior without relying on traffic information alone. The system in this embodiment therefore solves the problem in the related art of low accuracy caused by distinguishing DDos attacks from normal access behavior by traffic information alone, and improves the speed and accuracy of DDos attack detection.
Fig. 8 is a schematic diagram of a system for detecting a DDos attack based on Web log analysis according to an embodiment of the present application, and as shown in fig. 8, the system includes a session record collection module 81, a session record cleansing module 82, a session set extraction module 83, an access information extraction module 84, and a decision module 85.
The session record cleaning module 82 cleans the session records from the session record collection module 81, the session set extraction module 83 extracts session sets from the cleaned session records, and the access information extraction module 84 then calculates the access information of each user's session sets, where the access information includes the access frequency, the average request traffic and the average access proportion. The decision module 85 uses the obtained access information to detect application-layer DDos attacks.
The log data of the web page records information such as an IP address of a user, a URL of a user access page, the size of content returned to a user request, the version type of a browser of the user, state information correspondingly returned by a web server to the user request, access time of the user, interval time of the request, access interest and the like.
Observing the web access behavior of normal users reveals the web pages that users visit frequently and are interested in, and shows that the popularity of the pages of a website is highly skewed. An attacker, however, generally attacks by repeatedly visiting the same page or by traversing all links of the website platform, so the pages visited by the attacker do not center on a single topic, and the popularity of the pages visited during an attack deviates from normal page popularity. Analysis shows that 30% of web pages receive 60%-80% of requests, so attack detection works better when the access proportion threshold is set to 60%-80%.
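The average request traffic and the average access proportion calculated by the access information extraction module 84 can be sketched as follows, following the definitions given in claims 2 and 3 of this document. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the "sum of the access proportions" is interpreted here as a sum over the requests in the session set (so a page requested twice is counted twice), which the text does not state explicitly.

```python
from collections import Counter


def average_request_traffic(response_sizes):
    """Total request traffic of a session set divided by its request count."""
    if not response_sizes:
        return 0.0
    return sum(response_sizes) / len(response_sizes)


def average_access_proportion(session_urls, all_urls):
    """Mean popularity of the pages requested in one session set.

    session_urls: URLs requested in the session set, one entry per request.
    all_urls: URLs of every request in the log of the website platform.
    """
    total = len(all_urls)
    if not session_urls or total == 0:
        return 0.0
    visits = Counter(all_urls)  # access amount of each web page
    # Access proportion of a page = its access amount / total access amount.
    proportions = [visits[url] / total for url in session_urls]
    return sum(proportions) / len(session_urls)


# Made-up platform log: /a is popular, /c is rarely visited.
platform_log = ["/a"] * 6 + ["/b"] * 3 + ["/c"]
normal_session = ["/a", "/a", "/b"]
crawler_session = ["/a", "/b", "/c"]
print(average_access_proportion(normal_session, platform_log))
print(average_access_proportion(crawler_session, platform_log))
print(average_request_traffic([100, 200, 300]))
```

A session that traverses every link, including rarely visited pages, obtains a lower average access proportion than a session that stays on popular pages, which is the deviation the text relies on.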
The decision module 85 in this embodiment is implemented based on a decision tree and uses the C4.5 algorithm for prediction and classification. Classification starts at the root node of the decision tree, where the acquired access information is compared with an access threshold; the comparison result determines which branch below the root node is followed, and the classification result is finally obtained at a leaf node.
The C4.5 algorithm mainly calculates the information gain rate of each kind of access information to determine the root node, makes corresponding adjustments after the tree structure is formed, and finally arrives at a tree in an ideal state.
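The classification performed at the nodes of the decision tree can be sketched as a chain of threshold comparisons. This is a hand-written illustration, not a trained C4.5 tree: the node order (access frequency at the root) and all threshold values are assumptions chosen for the example, whereas in the embodiment both are derived from the information gain rate. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessInfo:
    access_frequency: float       # requests per second
    avg_request_traffic: float    # bytes per request
    avg_access_proportion: float  # value between 0 and 1


@dataclass
class Thresholds:
    frequency: float
    request_traffic: float
    access_proportion: float


def classify(info, thresholds):
    """Return "attack" when at least one determination condition holds."""
    # Root node (assumed): access frequency greater than the frequency threshold.
    if info.access_frequency > thresholds.frequency:
        return "attack"
    # Second node (assumed): average request traffic below its threshold.
    if info.avg_request_traffic < thresholds.request_traffic:
        return "attack"
    # Third node (assumed): average access proportion below its threshold.
    if info.avg_access_proportion < thresholds.access_proportion:
        return "attack"
    return "normal"


# Hypothetical thresholds; real values come from steps S310 to S340.
limits = Thresholds(frequency=6.0, request_traffic=500.0, access_proportion=0.6)
print(classify(AccessInfo(1.1, 2048.0, 0.7), limits))
print(classify(AccessInfo(20.3, 2048.0, 0.7), limits))
```

Because the determination condition is "at least one of" the three comparisons, each internal node can end in an attack leaf as soon as its comparison succeeds.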
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring log data of the website platform, and extracting a session set of a user from the log data;
S2, acquiring access information of the user in the session set, wherein the access information comprises the access frequency, the average request traffic and the average access proportion of a web page;
S3, determining, through the decision tree, that the access of the user is a DDos attack if the access information satisfies a determination condition, where the determination condition includes at least one of: the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the method for detecting a DDos attack based on Web log analysis in the above embodiments, the embodiments of the present application may provide a storage medium for implementing the method. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the methods for detecting a DDos attack based on Web log analysis in the above embodiments.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for detecting DDos attacks based on Web log analysis, the method comprising:
acquiring log data of a website platform, and extracting a session set of a user from the log data;
in the session set, acquiring access information of the user, wherein the access information comprises the access frequency, the average request traffic and the average access proportion of a web page;
and, through a decision tree, determining that the access of the user is a DDos attack when the access information meets a determination condition, wherein the determination condition comprises at least one of: the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold.
2. The method of claim 1, wherein obtaining the average access proportion comprises:
acquiring the visit volume of each webpage in the session set and the total visit volume of the website platform according to the log data;
obtaining the access proportion of each webpage in the session set according to the access amount of each webpage and the total access amount;
and obtaining the average access proportion according to the sum of the access proportions and the number of requests in the session set.
3. The method of claim 1, wherein obtaining the average request traffic comprises:
and calculating the average request traffic according to the number of requests of the user in the session set and the total request traffic.
4. The method of claim 1, wherein obtaining the access frequency comprises:
and calculating the access frequency according to the request times of the user in the session set and the session duration of the session set.
5. The method of claim 1, wherein the frequency threshold, the request traffic threshold, and the access proportion threshold are all access thresholds, and calculating the access threshold comprises:
obtaining a plurality of data of each kind of the access information from a plurality of historical session sets;
sorting the data according to the numerical values of the data, wherein the data comprises normal access data and attack data;
at the boundary of the normal access data and the attack data, taking a preset amount of data and calculating the information gain rate of each data;
and taking the data with the maximum information gain rate as the access threshold.
6. The method of claim 1, wherein extracting the set of sessions for the user in the log data comprises:
acquiring a first session record and a second session record of the user according to the log data; the first session record comprises first identification information and a first session time; the second session record comprises second identification information and a second session time; the first session record and the second session record are session records adjacent in time;
and merging the first session record and the second session record into the session set under the condition that the first identification information and the second identification information are the same and the time difference between the first session time and the second session time is less than a session time threshold value.
7. The method of claim 1, wherein prior to extracting the set of sessions for the user from the log data, the method further comprises:
and acquiring a third session record according to the log data, and deleting the third session record under the condition that the character string information of the third session record belongs to a preset character string set, wherein the preset character string set comprises the character string of the request of the embedded object.
8. A system for detecting DDos attack based on Web log analysis is characterized by comprising a session identification module, an information acquisition module and a decision judgment module:
the session identification module is used for acquiring log data of a website platform and extracting a session set of a user from the log data;
the information acquisition module is used for acquiring the access information of the user in the session set, wherein the access information comprises the access frequency, the average request traffic and the average access proportion of a web page;
the decision-making judgment module is configured to judge, through a decision tree, that the access of the user is a DDos attack when the access information meets a judgment condition, where the judgment condition includes at least one of the access frequency being greater than a frequency threshold, the average request traffic being less than a request traffic threshold, and the average access proportion being less than an access proportion threshold.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for detecting a DDos attack based on Web log analysis according to any one of claims 1 to 7.
10. A storage medium having stored thereon a computer program, wherein the computer program is arranged to execute the method for detecting a DDos attack based on Web log analysis of any of claims 1 to 7 when running.
CN202010737398.2A 2020-07-28 2020-07-28 Method, system and electronic device for detecting DDos attack based on Web log analysis Withdrawn CN112019508A (en)


Publications (1)

Publication Number Publication Date
CN112019508A true CN112019508A (en) 2020-12-01

Family

ID=73498726




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201201