CN110855648B - Early warning control method and device for network attack - Google Patents

Info

Publication number
CN110855648B
Authority
CN
China
Prior art keywords
attack
alarm data
data
similarity
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911067096.2A
Other languages
Chinese (zh)
Other versions
CN110855648A (en)
Inventor
聂利权
曾凡
阮华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911067096.2A
Publication of CN110855648A
Application granted
Publication of CN110855648B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application relates to the technical field of data processing and discloses an early warning control method and device for network attacks, which reduce the workload of analyzing network attacks and improve efficiency. The method comprises the following steps: acquiring a plurality of attack alarm data; determining the data similarity between every two attack alarm data in the plurality of attack alarm data; clustering the plurality of attack alarm data according to the data similarity between every two attack alarm data to obtain N cluster categories; and, for each of L cluster categories determined from the N cluster categories (1 ≤ L ≤ N), selecting at least one attack alarm data from that cluster category for early warning control.

Description

Early warning control method and device for network attack
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for early warning control of a network attack.
Background
The Internet has become part of everyday life and carries a large amount of data. Clients and servers communicate over communication protocols and exchange information in the form of data packets. During such communication, attacks from malicious parties are common.
With the diversification of Internet services, real-time data streams and connected devices, and with growing demand for search services, social networks, mobile commerce, open collaboration and the like, cloud computing has developed rapidly. Unlike earlier parallel and distributed computing, cloud computing is expected to change both the Internet model and enterprise management models at a conceptual level. In a cloud computing scenario, a large amount of user information is concentrated at the cloud computing provider; compared with traditional Internet services, the information is more concentrated, its asset value is higher, and it attracts more attacks. A cloud computing security solution therefore needs to address the differentiated security requirements of different services.
When a user accesses a website, the user sends data to the website; the data sent is called a request. After receiving the request, the website returns the requested data; the data returned is called a response. When attacking a website, an attacker adds attack code to a request in an attempt to trigger a website vulnerability, so that the response yields data the attacker should not be able to obtain, such as sensitive information or information that enables further attacks. To perceive and control the attack situation of the network in time, attack requests need to be followed up and analyzed, but following up and analyzing every attack request involves a huge workload and is inefficient.
Disclosure of Invention
The embodiment of the application provides a network attack early warning control method and device, which are used for reducing the workload of analyzing network attacks and improving the efficiency.
According to a first aspect of an embodiment of the present application, a method for early warning control of a network attack is provided, including:
acquiring a plurality of attack alarm data;
determining data similarity between every two attack alarm data in the plurality of attack alarm data;
clustering the attack alarm data according to the data similarity between every two attack alarm data to obtain N clustering categories;
for each cluster category in L cluster categories, selecting at least one attack alarm data from that cluster category for early warning control; wherein the L cluster categories are determined from the N cluster categories, and 1 ≤ L ≤ N.
According to a second aspect of the embodiments of the present application, there is provided a network attack early warning control apparatus, including:
an acquisition unit, configured to acquire a plurality of attack alarm data;
a determining unit, configured to determine the data similarity between every two attack alarm data in the plurality of attack alarm data;
a clustering unit, configured to cluster the plurality of attack alarm data according to the data similarity between every two attack alarm data to obtain N cluster categories;
a selecting unit, configured to select, for each cluster category in L cluster categories, at least one attack alarm data from that cluster category for early warning control; wherein the L cluster categories are determined from the N cluster categories, and 1 ≤ L ≤ N.
According to a third aspect of the embodiments of the present application, there is provided a computing device, including at least one processor and at least one memory, where the memory stores a computer program, and when the program is executed by the processor, the processor is caused to execute the steps of the method for early warning and controlling network attacks provided by the embodiments of the present application.
According to a fourth aspect of the embodiments of the present application, a storage medium is provided, where the storage medium stores computer instructions, and when the computer instructions are run on a computer, the computer is caused to execute the steps of the method for early warning and controlling a network attack provided in the embodiments of the present application.
In the embodiment of the application, the server acquires a plurality of attack alarm data, determines the data similarity between every two attack alarm data, and clusters all the attack alarm data according to the data similarity to obtain N cluster categories. It then determines L cluster categories from the N cluster categories, where 1 ≤ L ≤ N, and, for each of the L cluster categories, selects at least one attack alarm data from that cluster category for early warning control. Because different attack alarm data share a certain degree of similarity, the embodiment groups attack alarm data with a high degree of similarity into one category. For each category of attack alarm data, only one or some of the attack alarm data need to be extracted for attack analysis and early warning control. This effectively reduces the number of attack alarm data that must be followed up and further analyzed, reduces the workload of analyzing network attacks, and improves working efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application.
Fig. 1 is a system architecture diagram of an early warning control system for network attacks in an embodiment of the present application;
Fig. 2 is a flowchart of an early warning control method for network attacks in an embodiment of the present application;
Fig. 3 is a schematic diagram of a hierarchical clustering algorithm in an embodiment of the present application;
Fig. 4 is a block diagram illustrating the structure of an early warning control apparatus for network attacks according to an embodiment of the present application;
Fig. 5 is a block diagram illustrating a server according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
The terms "first" and "second" in the description and claims of the present application and in the above drawings are used to distinguish different objects, not to describe a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may optionally include other steps or elements that are not listed or that are inherent to such process, method, system, article, or apparatus.
Some concepts related to the embodiments of the present application are described below.
Network attack: any type of offensive action directed at a computer information system, infrastructure, computer network, or personal computer device. For computers and computer networks, destroying, revealing, modifying, or disabling software or services, or stealing or accessing data on any computer without authorization, is considered an attack. Attacks can be divided into active attacks and passive attacks. Active attacks tamper with certain data streams or create spurious data streams; they can be classified as tampering, falsification of message data, and termination (denial of service). In a passive attack, the attacker does not modify the data; instead, the attacker intercepts or eavesdrops, that is, obtains information or related data without the user's consent or approval. Passive attacks generally include interception, traffic analysis, and the cracking of weakly encrypted data streams.
Attack alarm data: the content recorded in a server's network log (weblog) for the network attacks the server has received. Based on the attack alarm data, the network attack situation suffered by the server can be analyzed.
Edit distance: also known as the Levenshtein distance, a string metric that measures the difference between two character sequences. The edit distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to convert one word into the other.
Hierarchical clustering: a clustering method that calculates the similarity between nodes using a chosen similarity measure and connects the nodes step by step in order of similarity from high to low.
Attack vector: refers to the type of network attack identified in the network attack detection system.
Artificial intelligence (AI): a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce intelligent machines that can react in a manner similar to human intelligence. Research in the field includes robotics, speech recognition, image recognition, natural language processing, and expert systems, among others. In the development of artificial intelligence, machine learning followed expert systems as another important application research area, and it is also one of the core research subjects of artificial intelligence and neural computation.
Referring to fig. 1, a diagram of an early warning control system architecture for network attacks according to an embodiment of the present application is shown, including a first server 101, a first terminal 102, a second server 103, and a second terminal 104.
A client or a browser is installed on the first terminal 102, and the first server 101 may provide a background service for the client or browser on the first terminal 102. For example, the first server 101 may receive a Hypertext Transfer Protocol (HTTP) request sent by the first terminal 102, where the HTTP request asks the first server 101 to provide a service to the first terminal 102. The first server 101 generates a corresponding HTTP response based on the HTTP request and sends it to the first terminal 102. The weblog on the first server 101 stores the data portions extracted from the received HTTP request and from the corresponding HTTP response.
The second server 103 may obtain the content in the weblog on the first server 101, that is, the attack warning data, analyze the network attack on the first server 101 based on the attack warning data, and send the analysis result to the second terminal 104. The information is presented to a user of the second terminal 104, such as an operation and maintenance engineer, on the second terminal 104, and the user of the second terminal 104 can further analyze the network attack on the first server 101 in detail according to the analysis result.
The first terminal 102 and/or the second terminal 104 may be an electronic device with a wireless communication function, such as a mobile phone, a tablet computer, or a dedicated handheld device, or may be a device connected to the internet in a wired access manner, such as a Personal Computer (PC), a notebook computer, or a server.
The first server 101 and/or the second server 103 may be a computer or other network device. The first server 101 and/or the second server 103 may be an independent device, or may be a server cluster formed by a plurality of servers. Preferably, the first server 101 and/or the second server 103 may perform information processing by using a cloud computing technology.
The network in the system may be the Internet, or may be a mobile communication system such as the Global System for Mobile Communications (GSM) or Long Term Evolution (LTE).
It should be noted that the above-mentioned application scenarios are only presented for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, the embodiments of the present application may be applied to any applicable scenario.
The following describes a data processing method provided in the embodiment of the present application with reference to an application scenario shown in fig. 1.
Referring to fig. 2, an embodiment of the present application provides a method for early warning and controlling a network attack, where as shown in fig. 2, the method includes:
step S201: and acquiring a plurality of attack alarm data.
In a specific implementation, the analysis server may read the network logs (weblogs) of one or more background servers. After receiving a request, a background server writes a record to its weblog; for an attack request, the background server likewise writes a corresponding record. The analysis server acquires the attack alarm data corresponding to the attack requests from the weblogs of the background servers, where one attack alarm data corresponds to one attack request.
Step S202: and determining the data similarity between every two attack alarm data in the plurality of attack alarm data.
For example, the attack warning data acquired by the analysis server is data a, data B, data C, and data D, which are 4 attack warning data in total. The analysis server needs to determine the similarity between data a and data B, data a and data C, data a and data D, data B and data C, data B and data D, and data C and data D, and then 6 similarities are obtained.
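The pair count in this example can be checked with a short sketch. It is only an illustration; the function name `all_pairs` is hypothetical and does not come from the application.

```python
from itertools import combinations


def all_pairs(items):
    """Return every unordered pair of distinct items (hypothetical helper)."""
    return list(combinations(items, 2))


pairs = all_pairs(["A", "B", "C", "D"])
# Four items give 4 * 3 / 2 = 6 unordered pairs.
print(len(pairs))
print(pairs)
```

In general, n attack alarm data require n(n-1)/2 similarity computations, so the cost of this step grows quadratically with the number of alarms.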
Specifically, the method for calculating the data similarity in the embodiment of the present application may be to calculate an edit distance between data, calculate the similarity between data in a manner of locality sensitive hashing, or calculate the data similarity by using another manner.
Step S203: and clustering the plurality of attack alarm data according to the data similarity between every two attack alarm data to obtain N clustering categories.
Specifically, in the embodiment of the present application, a hierarchical clustering method may be used to cluster a plurality of attack alarm data, or a density-based algorithm such as DBSCAN may be used to cluster the attack alarm data, or other clustering methods may be used to cluster the attack alarm data.
Step S204: and aiming at each cluster category in the L cluster categories, selecting at least one attack alarm data from the cluster categories to carry out early warning control.
Wherein, L cluster categories are determined from N cluster categories, L is more than or equal to 1 and less than or equal to N.
Preferably, after the plurality of attack alarm data are clustered into N cluster categories, in the embodiment of the present application, for each cluster category in the N cluster categories, attack alarm data are selected from the cluster category. Generally, one attack alarm data is selected from each cluster type, and of course, any number of attack alarm data can be selected from one cluster type to perform early warning control.
The number of attack alarm data selected from different cluster categories may be the same or different. For example, for 3 cluster categories, 2 attack alarm data may be selected from the first cluster category, 1 attack alarm data may be selected from the second cluster category, and 3 attack alarm data may be selected from the third cluster category. Or respectively selecting 2 attack alarm data from the first cluster category, the second cluster category and the third cluster category.
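As a rough sketch of this selection step, the following picks a configurable number of attack alarm data from each cluster category. The application does not say how the representatives are chosen, so random sampling is an assumption here, and the names `pick_representatives`, `counts`, and `default` are hypothetical.

```python
import random


def pick_representatives(clusters, counts=None, default=1, seed=0):
    """Pick sample alarms from each cluster category.

    clusters: dict mapping a category name to a non-empty list of alarms.
    counts:   optional dict mapping a category name to how many to pick;
              categories not listed fall back to `default`.
    Random sampling is an assumption; the source does not fix the rule.
    """
    rng = random.Random(seed)
    counts = counts or {}
    picked = {}
    for name, members in clusters.items():
        wanted = counts.get(name, default)
        # Pick at least one alarm and never more than the category holds.
        k = max(1, min(wanted, len(members)))
        picked[name] = rng.sample(members, k)
    return picked


clusters = {
    "first": ["a1", "a2", "a3"],
    "second": ["b1"],
    "third": ["c1", "c2", "c3", "c4"],
}
# 2 from the first category, 1 from the second, 3 from the third.
print(pick_representatives(clusters, counts={"first": 2, "third": 3}))
```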
In the embodiment of the application, the server acquires a plurality of attack alarm data, determines the data similarity between every two attack alarm data, and clusters all the attack alarm data according to the data similarity to obtain N cluster categories. It then determines L cluster categories from the N cluster categories, where 1 ≤ L ≤ N, and, for each of the L cluster categories, selects at least one attack alarm data from that cluster category for early warning control. Because different attack alarm data share a certain degree of similarity, the embodiment groups attack alarm data with a high degree of similarity into one category. For each category of attack alarm data, only one or some of the attack alarm data need to be extracted for attack analysis and early warning control. This effectively reduces the number of attack alarm data that must be followed up and further analyzed, reduces the workload of analyzing network attacks, and improves working efficiency.
In order to facilitate comparison of data similarity between two attack alarm data, the embodiment of the application sets a plurality of attack features for the attack alarm data. Step S202, determining a data similarity between every two attack alarm data in the multiple attack alarm data, specifically including:
for any two attack alarm data, determining the data similarity between the two attack alarm data according to the field similarity between the M characteristic fields in one attack alarm data and the M characteristic fields in the other attack alarm data;
wherein, the M characteristic fields are respectively corresponding to the set M attack characteristics.
In a specific implementation, the attack features of the attack alarm data may include a source IP (Internet Protocol address), a request host (host name), a request CGI (Common Gateway Interface), request parameters, request content, a request Cookie (small text data), an attack vector, and the like. The source IP and the request host identify the attack object of the attack request; the request CGI, request parameters, request content, request Cookie, and attack vector describe the attack mode of the attack request.
The data similarity of two attack alarm data is determined by comparing the field similarity of the fields that correspond to the same attack feature in the two attack alarm data. For example, the source IP of attack alarm data A is xxx, the request host is yyy, and the request CGI is zzz; the source IP of attack alarm data B is www, the request host is rrr, and the request CGI is uuu. When comparing the similarity between attack alarm data A and attack alarm data B, it is necessary to compare the field similarity between the source IPs of A and B, between the request hosts of A and B, between the request CGIs of A and B, and so on; that is, the field similarity between xxx and www, between yyy and rrr, between zzz and uuu, and so on.
In order to improve the accuracy of similarity calculation and further improve the accuracy of clustering, weights are distributed to different attack characteristics. Determining the data similarity between two attack alarm data according to the field similarity between the M characteristic fields in one attack alarm data and the M characteristic fields in the other attack alarm data, including:
respectively determining field similarity between characteristic fields corresponding to each attack characteristic in M attack characteristics of the two attack alarm data;
and determining the data similarity between the first attack alarm data and the second attack alarm data according to the field similarity of each attack characteristic and the corresponding weight.
That is, different attack features carry different weights; for example, the weight of the source IP is 20%, the weight of the request host is 15%, the weight of the request CGI is 5%, and so on. These weights are taken into account when calculating the data similarity between two attack alarm data.
Preferably, the weighted average of the field similarities is used as the data similarity between the two attack alarm data. For example, if the field similarity between the source IPs of attack alarm data A and attack alarm data B is x, the field similarity between their request hosts is y, the field similarity between their request CGIs is z, and so on, then the data similarity S between attack alarm data A and attack alarm data B is S = 20%·x + 15%·y + 5%·z + …
In summary, if the field similarities between attack alarm data A and attack alarm data B are $q_1, q_2, q_3, \ldots, q_n$ and the corresponding weights are $p_1, p_2, p_3, \ldots, p_n$, then the data similarity $S$ between attack alarm data A and attack alarm data B is calculated according to the following formula:
$$S = p_1 q_1 + p_2 q_2 + p_3 q_3 + \cdots + p_n q_n \qquad \text{(Equation 1)}$$
where $p_1 + p_2 + p_3 + \cdots + p_n = 1$.
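Equation 1 can be illustrated with a minimal sketch. The function name `data_similarity` and the dictionary layout are hypothetical, and the weight of 60% given to a fourth feature is invented only so that the example weights sum to 1; the application itself lists just the first three weights.

```python
import math


def data_similarity(field_sims, weights):
    """Weighted sum of per-feature field similarities (Equation 1).

    field_sims: dict mapping an attack feature to its field similarity q_i.
    weights:    dict mapping the same features to weights p_i summing to 1.
    """
    if set(field_sims) != set(weights):
        raise ValueError("field_sims and weights must cover the same features")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * field_sims[name] for name in weights)


# Hypothetical values; only the 20% / 15% / 5% weights come from the text.
weights = {"source_ip": 0.20, "host": 0.15, "cgi": 0.05, "other": 0.60}
field_sims = {"source_ip": 0.5, "host": 1.0, "cgi": 0.0, "other": 0.25}
print(data_similarity(field_sims, weights))
```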
Further, any way of calculating text similarity may be selected to calculate field similarity. In the embodiments of the present application, the edit distance is described as an example. The above determining field similarity between the feature fields corresponding to each attack feature in the M attack features of the two attack alarm data respectively includes:
determining an editing distance between two feature fields corresponding to the attack features aiming at each attack feature in the M attack features;
dividing the editing distance by the maximum character string length in the two characteristic fields corresponding to the attack characteristics to serve as the field similarity between the characteristic fields corresponding to the attack characteristics.
The edit distance is an index for measuring the similarity between two data. That is, for a pair of character strings <w1, w2>, the edit distance is the minimum number of single-character editing operations required to convert one string w1 into the other string w2. Exactly three single-character editing operations are defined here: insertion, deletion, and substitution. For example, for the two strings "kitten" and "sitting", the minimum single-character editing operations required to convert "kitten" into "sitting" are:
1. kitten → sitten (substitution of "s" for "k"), i.e., k is converted to s;
2. sitten → sittin (substitution of "i" for "e"), i.e., e is converted to i;
3. sittin → sitting (insertion of "g" at the end), i.e., g is inserted.
Therefore, the edit distance between the two strings "kitten" and "sitting" is 3.
Further, in the embodiment of the present application, the edit distance is divided by the maximum string length among the two feature fields. The above-mentioned character strings "kitten" and "sitting" are still taken as examples. The string length of the string "kitten" is 6, and the string length of "sitting" is 7, wherein the maximum string length is 7, the editing distance 3 is divided by the maximum string length 7 to obtain the field similarity between the strings "kitten" and "sitting", that is, the field similarity between "kitten" and "sitting" is 3/7.
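The calculation can be sketched as follows, using the standard dynamic-programming form of the Levenshtein distance. The function names `edit_distance` and `field_ratio` are hypothetical. Note that the ratio as defined in the text is 0 for identical strings and grows as the strings diverge, so a caller that needs a score where larger means more alike would have to transform it (for example, 1 minus the ratio); that transformation is an assumption and is not stated in the application.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def field_ratio(a, b):
    """Edit distance divided by the longer string length, as in the text."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty fields are identical
    return edit_distance(a, b) / longest


print(edit_distance("kitten", "sitting"))  # 3
print(field_ratio("kitten", "sitting"))    # 3/7
```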
After the data similarity between every two attack alarm data has been calculated in the above manner, all the attack alarm data are clustered. Any clustering algorithm can be selected to cluster the attack alarm data. In the embodiment of the present application, a hierarchical clustering algorithm is taken as an example. In step S203, clustering the plurality of attack alarm data according to the data similarity between every two attack alarm data to obtain N cluster categories includes:
performing multi-layer clustering on the plurality of attack alarm data by using a hierarchical clustering algorithm according to the data similarity between every two attack alarm data until the number of the clustering categories is less than or equal to the threshold value of the number of the categories, wherein the number of the clustering categories corresponding to different clustering layers is different; the category number threshold is less than or equal to N;
determining a contour coefficient corresponding to any clustering layer number according to all clustering categories corresponding to the clustering layer number;
and taking all the cluster categories corresponding to the cluster layer number with the maximum outline coefficient as the N cluster categories.
The basic idea of the hierarchical clustering method is as follows: the similarity between the nodes is calculated through a certain similarity measure, and the nodes are reconnected step by step according to the sequence of similarity from high to low. The method has the advantages that the division can be stopped at any time, and the method mainly comprises the following steps:
1. calculating the similarity between the data samples;
2. assuming each data sample to be a cluster category;
3. loop: merging the two cluster categories with the highest similarity, and then updating the similarity matrix;
4. when the number of cluster categories reaches a threshold, the loop terminates.
For a better understanding, the algorithm is explained in detail below. Assume that there are 6 attack alert data A, B, C, D, E, F.
In the first step, each attack alarm data is assumed to be a cluster category, so there are 6 cluster categories at this point. The similarity between every two attack alarm data is calculated.
In the second step, the similarities between every two attack alarm data are compared; if the similarity between B and C is the highest, B and C are merged into one cluster category, yielding 5 cluster categories, namely A, BC, D, E and F.
In the third step, the similarities between every two cluster categories are compared; if the similarity between BC and D is the highest, BC and D are merged into one cluster category, yielding 4 cluster categories, namely A, BCD, E and F.
In the fourth step, the similarities between every two cluster categories are compared; if the similarity between E and F is the highest, E and F are merged into one cluster category, yielding 3 cluster categories, namely A, BCD and EF.
In the fifth step, the similarities between every two cluster categories are compared; if the similarity between BCD and EF is the highest, BCD and EF are merged into one cluster category, yielding 2 cluster categories, namely A and BCDEF.
In the sixth step, the cluster categories A and BCDEF are merged.
It should be noted that, if the threshold of the number of categories is 2, the process is executed to the fifth step, and then clustering is finished; if the threshold of the number of categories is 3, the process is executed to the fourth step, and then the clustering is finished.
In order to record the aggregation process of the clustering categories, the hierarchical clustering algorithm can be visualized by using a tree diagram as shown in fig. 3.
In the above process, the similarity between cluster categories may be represented based on the similarity between the attack alarm data in the cluster categories. For example, the minimum similarity representation, the maximum similarity representation, the average similarity representation, the center similarity representation, or the minimum variance may be used. For example, under the minimum similarity representation, the data similarity of the pair of attack alarm data, one taken from each of the two cluster categories C1 and C2, that has the highest similarity is used as the similarity between C1 and C2. The remaining representations are analogous to the minimum similarity representation, and their description is omitted here.
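The six-item walk-through above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pairwise scores in `pairs` are invented so that the merge order matches the example, a higher score is taken to mean "more similar", and the similarity between two cluster categories is taken as the score of their most similar pair of members. The names `agglomerate`, `sim` and `category_threshold` are hypothetical.

```python
from itertools import combinations


def cluster_similarity(c1, c2, sim):
    """Score of the most similar pair with one member in each category."""
    return max(sim[frozenset((a, b))] for a in c1 for b in c2)


def agglomerate(items, sim, category_threshold):
    """Merge the two most similar categories until the number of categories
    is less than or equal to category_threshold; return every layer."""
    clusters = [frozenset([x]) for x in items]
    layers = [list(clusters)]
    while len(clusters) > category_threshold:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: cluster_similarity(pair[0], pair[1], sim))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        layers.append(list(clusters))
    return layers


# Hypothetical pairwise scores (higher = more similar); unlisted pairs get 0.1.
pairs = {"BC": 0.9, "CD": 0.8, "EF": 0.7, "BD": 0.6, "DE": 0.5, "DF": 0.3}
sim = {frozenset(p): 0.1 for p in combinations("ABCDEF", 2)}
sim.update({frozenset(k): v for k, v in pairs.items()})

layers = agglomerate("ABCDEF", sim, category_threshold=2)
for layer in layers:
    print(sorted("".join(sorted(c)) for c in layer))
```

Keeping every intermediate layer, rather than only the final one, is what allows an evaluation index to be computed per layer afterwards.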
Meanwhile, in order to measure the clustering effect, the contour coefficient (silhouette_score) is used as the evaluation index of the clustering effect in the embodiment of the present application. For example, with the category number threshold set to 2, the process is executed through the fifth step, at which point clustering ends; a contour coefficient is calculated after the clustering of each step is completed, and the number of cluster categories with the highest silhouette_score is selected as the final number of cluster categories. For example, if among the first to fifth steps the contour coefficient is largest when there are 3 cluster categories, the final number of cluster categories is 3, and the cluster categories are A, BCD, and EF.
Wherein the contour coefficient is calculated according to the following formula:
Figure GDA0003306168830000121
wherein j is the intra-class average similarity of each cluster category, and k is the similarity between one attack alarm data and the attack alarm data of the nearest cluster category to which it does not belong.
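The formula itself is only available as an image and is not reproduced in this text. As a hedged illustration, the sketch below assumes the standard silhouette definition, s = (k - j) / max(j, k), averaged over all data, with j and k computed from a precomputed matrix in which a smaller value means closer data (as is the case for an edit distance divided by the maximum string length). The function name `silhouette`, the toy points and the candidate labelings are all hypothetical.

```python
def silhouette(dist, labels):
    """Mean silhouette value for a precomputed matrix (smaller = closer)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two cluster categories are required")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a single-member category contributes 0
        # j: average value between item i and the other members of its category
        j = sum(dist[i][m] for m in own if m != i) / (len(own) - 1)
        # k: average value between item i and the nearest other category
        k = min(
            sum(dist[i][m] for m in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(j, k)
        if denom > 0:
            total += (k - j) / denom
    return total / len(labels)


# Toy data: four points on a line, two candidate clustering layers.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
candidates = {"two tight pairs": [0, 0, 1, 1], "lopsided": [0, 0, 0, 1]}
scores = {name: silhouette(dist, labels) for name, labels in candidates.items()}
best = max(scores, key=scores.get)  # layer with the largest coefficient
```

Selecting `best` mirrors the step of keeping the cluster categories of the layer whose contour coefficient is the largest.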
In addition, in the embodiment of the application, the basis for clustering the attack alarm data is to acquire a large amount of attack alarm data, so data filtering is required before calculation. After acquiring a plurality of attack alarm data, the method further comprises the following steps:
determining an identification field in each attack alarm data;
determining the alarm number of attack alarm data with the same identification field;
and deleting the attack alarm data corresponding to the identification fields with the alarm number smaller than the filtering threshold value.
In a specific implementation, the identification field may be the source IP, the attack vector, or both the source IP and the attack vector. In the last case, the alarm number of the attack alarm data having the same source IP and attack vector is determined, and if the alarm number is smaller than the filtering threshold, the attack alarm data having that source IP and attack vector are deleted. That is, attack alarm data that occur only in small numbers need not be aggregated and are therefore outside the scope considered by the embodiments of the present application.
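A minimal sketch of this filtering step is given below. The dictionary keys `source_ip` and `attack_vector`, the function name `filter_alarms`, and the threshold value are illustrative assumptions; the application does not specify a data layout or a threshold.

```python
from collections import Counter


def filter_alarms(alarms, id_fields=("source_ip", "attack_vector"),
                  filter_threshold=2):
    """Keep only alarms whose identification fields occur at least
    filter_threshold times; rarer alarms are dropped before clustering."""
    def key(alarm):
        return tuple(alarm[f] for f in id_fields)

    counts = Counter(key(a) for a in alarms)
    return [a for a in alarms if counts[key(a)] >= filter_threshold]


# Hypothetical alarm records.
alarms = [
    {"source_ip": "10.0.0.1", "attack_vector": "sqli"},
    {"source_ip": "10.0.0.1", "attack_vector": "sqli"},
    {"source_ip": "10.0.0.1", "attack_vector": "sqli"},
    {"source_ip": "10.0.0.2", "attack_vector": "xss"},
]
kept = filter_alarms(alarms, filter_threshold=2)
```

Passing `id_fields=("source_ip",)` would correspond to using the source IP alone as the identification field.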
The above flow is described in detail in the following with specific embodiments, which include the following steps:
the analysis server obtains a plurality of attack alarm data from the weblog of the background server.
And determining a source IP, a request host, a request CGI, request parameters, request content, a request Cookie and an attack vector field of each attack alarm data.
Determining the alarm number of the attack alarm data with the same source IP and attack vector field, and deleting the attack alarm data if the alarm number is smaller than the filtering threshold; if the alarm number is greater than or equal to the filtering threshold, executing the subsequent steps.
Based on the attack feature source IP, the request host, the request CGI, the request parameters, the request content, the request Cookie and the attack vector field, the field similarity between the corresponding feature fields of every two attack alarm data is calculated by using an edit distance algorithm.
And calculating the data similarity between every two attack alarm data according to the field similarity between the characteristic fields and the corresponding weight.
And performing multi-layer clustering on all attack alarm data by using a hierarchical clustering algorithm according to the data similarity between every two attack alarm data until the number of the clustering categories is less than or equal to the threshold of the number of the categories. A contour coefficient is calculated each time one layer of clustering is completed.
And taking all cluster categories corresponding to the cluster layer number with the maximum outline coefficient as final cluster categories.
And selecting one attack alarm data from each cluster category to perform early warning control.
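The step that combines field similarities with their weights can be sketched as follows. The text only states that the data similarity is determined from the field similarities and the corresponding weights; the weighted sum used here is one plausible reading and is an assumption, as are the field names, the weight values, and the function names.

```python
def edit_distance(w1: str, w2: str) -> int:
    """Levenshtein distance (insertion, deletion, substitution)."""
    prev = list(range(len(w2) + 1))
    for i, c1 in enumerate(w1, start=1):
        curr = [i]
        for j, c2 in enumerate(w2, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (c1 != c2)))
        prev = curr
    return prev[-1]


def field_similarity(f1: str, f2: str) -> float:
    """Edit distance divided by the longer string length (0.0 if both empty)."""
    longest = max(len(f1), len(f2))
    return edit_distance(f1, f2) / longest if longest else 0.0


def data_similarity(alarm1: dict, alarm2: dict, weights: dict) -> float:
    """Assumed combination rule: weighted sum of per-field similarities."""
    return sum(w * field_similarity(alarm1[field], alarm2[field])
               for field, w in weights.items())


# Hypothetical weights over three of the feature fields.
weights = {"source_ip": 0.2, "request_cgi": 0.3, "attack_vector": 0.5}
a = {"source_ip": "1.1.1.1", "request_cgi": "/login", "attack_vector": "kitten"}
b = {"source_ip": "1.1.1.1", "request_cgi": "/login", "attack_vector": "sitting"}
score = data_similarity(a, b, weights)
```

Here only the attack vector differs, so the result is 0.5 × 3/7; the pairwise values produced this way would then feed the clustering and contour coefficient steps.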
The following are embodiments of the apparatus of the present application, and for details not described in detail in the embodiments of the apparatus, reference may be made to the above-mentioned one-to-one corresponding method embodiments.
Referring to fig. 4, a block diagram of an apparatus according to an embodiment of the present application is shown. The apparatus is implemented by hardware, or by a combination of hardware and software, as all or part of the server 103 in fig. 1. The apparatus includes: an obtaining unit 401, a determining unit 402, a clustering unit 403, a selecting unit 404, and a filtering unit 405.
An obtaining unit 401, configured to obtain multiple attack alarm data;
a determining unit 402, configured to determine a data similarity between every two attack alarm data in the plurality of attack alarm data;
a clustering unit 403, configured to cluster the attack alarm data according to data similarity between every two attack alarm data, so as to obtain N clustering categories;
a selecting unit 404, configured to select, for each cluster category in the L cluster categories, at least one attack warning data from the cluster categories to perform early warning control; wherein the L cluster categories are determined from the N cluster categories, and L is more than or equal to 1 and less than or equal to N.
In an alternative embodiment, the determining unit 402 is specifically configured to:
for any two attack alarm data, determining the data similarity between the two attack alarm data according to the field similarity between the M characteristic fields in one attack alarm data and the M characteristic fields in the other attack alarm data;
the M characteristic fields are respectively corresponding to the set M attack characteristics.
In an alternative embodiment, the determining unit 402 is specifically configured to:
respectively determining field similarity between characteristic fields corresponding to each attack characteristic in M attack characteristics of the two attack alarm data;
and determining the data similarity between the first attack alarm data and the second attack alarm data according to the field similarity of each attack characteristic and the corresponding weight.
In an alternative embodiment, the determining unit 402 is specifically configured to:
determining an editing distance between two feature fields corresponding to the attack features aiming at each attack feature in the M attack features;
dividing the editing distance by the maximum character string length in the two characteristic fields corresponding to the attack characteristics to serve as the field similarity between the characteristic fields corresponding to the attack characteristics.
In an optional embodiment, the clustering unit 403 is specifically configured to:
performing multi-layer clustering on the plurality of attack alarm data by using a hierarchical clustering algorithm according to the data similarity between every two attack alarm data until the number of the clustering categories is less than or equal to the threshold value of the number of the categories, wherein the number of the clustering categories corresponding to different clustering layers is different; the category number threshold is less than or equal to N;
determining a contour coefficient corresponding to any clustering layer number according to all clustering categories corresponding to the clustering layer number;
and taking all the cluster categories corresponding to the cluster layer number with the maximum outline coefficient as the N cluster categories.
In an alternative embodiment, the apparatus further comprises a filtering unit 405 for:
determining an identification field in each attack alarm data;
determining the alarm number of attack alarm data with the same identification field;
and deleting the attack alarm data corresponding to the identification fields with the alarm number smaller than the filtering threshold value.
Referring to fig. 5, a block diagram of a server according to an embodiment of the present application is shown. The server 800 is implemented as the server 103 in fig. 1. Specifically:
the server 800 includes a Central Processing Unit (CPU)801, a system memory 804 including a Random Access Memory (RAM)802 and a Read Only Memory (ROM)803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The server 800 also includes a basic input/output system (I/O system) 806, which facilitates transfer of information between devices within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse, keyboard, etc. for user input of information. Wherein the display 808 and the input device 809 are connected to the central processing unit 801 through an input output controller 810 connected to the system bus 805. The basic input/output system 806 may also include an input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 810 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 804 and mass storage 807 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 800 may also run on a remote computer connected through a network such as the Internet. That is, the server 800 may be connected to the network 812 through the network interface unit 811 coupled to the system bus 805, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 811.
The memory further includes one or more programs, the one or more programs are stored in the memory, and the one or more programs include instructions for implementing the early warning control method for network attacks provided by the embodiments of the present application.
Those skilled in the art will understand that all or part of the steps in the early warning control method for network attacks according to the above embodiments may be implemented by a program instructing associated hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may include: Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A network attack early warning control method is characterized by comprising the following steps:
acquiring a plurality of attack alarm data;
determining an identification field in each attack alarm data; determining the number of alarms of attack alarm data with the same identification field, and deleting the attack alarm data corresponding to the identification field with the alarm number smaller than a filtering threshold value;
determining the data similarity between every two pieces of attack alarm data in the screened attack alarm data;
performing multi-layer clustering on the plurality of attack alarm data by using a hierarchical clustering algorithm according to the data similarity between every two attack alarm data until the number of the clustering categories is less than or equal to the threshold value of the number of the categories, wherein the number of the clustering categories corresponding to different clustering layers is different; the category number threshold is less than or equal to N; the similarity between the clustering classes is represented based on the minimum similarity or the central similarity or the minimum variance between attack alarm data in the clustering classes;
determining a contour coefficient corresponding to any clustering layer number according to all clustering categories corresponding to the clustering layer number;
acquiring N clustering categories corresponding to the clustering layer number with the maximum outline coefficient;
aiming at each cluster category in the L cluster categories, selecting at least one attack alarm data from the cluster categories to carry out early warning control; wherein the L cluster categories are determined from the N cluster categories, and L is more than or equal to 1 and less than or equal to N.
2. The method according to claim 1, wherein the determining of the data similarity between every two attack alarm data in the plurality of attack alarm data specifically comprises:
for any two attack alarm data, determining the data similarity between the two attack alarm data according to the field similarity between the M characteristic fields in one attack alarm data and the M characteristic fields in the other attack alarm data;
the M characteristic fields are respectively corresponding to the set M attack characteristics.
3. The method of claim 2, wherein the two attack alarm data comprise a first attack alarm data and a second attack alarm data;
the determining the data similarity between two attack alarm data according to the field similarity between the M characteristic fields in one attack alarm data and the M characteristic fields in the other attack alarm data comprises:
respectively determining field similarity between characteristic fields corresponding to each attack characteristic in M attack characteristics of the two attack alarm data;
and determining the data similarity between the first attack alarm data and the second attack alarm data according to the field similarity of each attack characteristic and the corresponding weight.
4. The method according to claim 3, wherein the determining field similarity between the feature fields corresponding to each attack feature in the M attack features of the two attack alarm data respectively comprises:
determining an editing distance between two feature fields corresponding to the attack features aiming at each attack feature in the M attack features;
dividing the editing distance by the maximum character string length in the two characteristic fields corresponding to the attack characteristics to serve as the field similarity between the characteristic fields corresponding to the attack characteristics.
5. An early warning control device for network attack, the device comprising:
an obtaining unit, configured to acquire a plurality of attack alarm data; determine an identification field in each attack alarm data; determine the alarm number of attack alarm data with the same identification field, and delete the attack alarm data corresponding to the identification field with the alarm number smaller than a filtering threshold value;
the determining unit is used for determining the data similarity between every two pieces of attack alarm data in the screened attack alarm data;
the clustering unit is used for carrying out multilayer clustering on the plurality of attack alarm data according to the data similarity between every two attack alarm data by utilizing a hierarchical clustering algorithm until the number of clustering categories is less than or equal to a category number threshold value, wherein the number of clustering categories corresponding to different clustering layer numbers is different; the category number threshold is less than or equal to N; the similarity between the clustering classes is represented based on the minimum similarity or the central similarity or the minimum variance between attack alarm data in the clustering classes; determining a contour coefficient corresponding to any clustering layer number according to all clustering categories corresponding to the clustering layer number; acquiring N clustering categories corresponding to the clustering layer number with the maximum outline coefficient;
a selecting unit, configured to select, for each cluster category in L cluster categories, at least one attack alarm data from the cluster category to perform early warning control; wherein the L cluster categories are determined from the N cluster categories, and L is more than or equal to 1 and less than or equal to N.
6. The apparatus of claim 5, wherein the determining unit is specifically configured to:
for any two attack alarm data, determining the data similarity between the two attack alarm data according to the field similarity between the M characteristic fields in one attack alarm data and the M characteristic fields in the other attack alarm data;
the M characteristic fields are respectively corresponding to the set M attack characteristics.
7. The apparatus according to claim 6, wherein the two attack warning data include a first attack warning data and a second attack warning data, and the determining unit is specifically configured to:
respectively determining field similarity between characteristic fields corresponding to each attack characteristic in M attack characteristics of the two attack alarm data;
and determining the data similarity between the first attack alarm data and the second attack alarm data according to the field similarity of each attack characteristic and the corresponding weight.
8. The apparatus of claim 7, wherein the determining unit is specifically configured to:
determining an editing distance between two feature fields corresponding to the attack features aiming at each attack feature in the M attack features;
dividing the editing distance by the maximum character string length in the two characteristic fields corresponding to the attack characteristics to serve as the field similarity between the characteristic fields corresponding to the attack characteristics.
9. A computer storage medium, wherein computer-executable instructions are stored in the computer storage medium, and the computer-executable instructions are used for executing the network attack early warning control method according to any one of claims 1 to 4.
10. A storage medium storing computer instructions, wherein the computer instructions, when executed on a computer, cause the computer to execute the method for early warning control of network attacks according to any one of claims 1 to 4.
CN201911067096.2A 2019-11-04 2019-11-04 Early warning control method and device for network attack Active CN110855648B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911067096.2A CN110855648B (en) 2019-11-04 2019-11-04 Early warning control method and device for network attack

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911067096.2A CN110855648B (en) 2019-11-04 2019-11-04 Early warning control method and device for network attack

Publications (2)

Publication Number Publication Date
CN110855648A CN110855648A (en) 2020-02-28
CN110855648B true CN110855648B (en) 2021-11-19

Family

ID=69598827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911067096.2A Active CN110855648B (en) 2019-11-04 2019-11-04 Early warning control method and device for network attack

Country Status (1)

Country Link
CN (1) CN110855648B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496179B (en) * 2020-04-08 2023-12-26 中国电信股份有限公司 Attacker analysis method and device
CN114205094B (en) * 2020-08-27 2023-04-14 腾讯科技(深圳)有限公司 Network attack alarm processing method, device, equipment and storage medium
CN112070161B (en) * 2020-09-08 2024-04-16 南方电网科学研究院有限责任公司 Network attack event classification method, device, terminal and storage medium
CN112564988B (en) * 2021-02-19 2021-06-18 腾讯科技(深圳)有限公司 Alarm processing method and device and electronic equipment
CN113315785B (en) * 2021-06-23 2023-05-12 深信服科技股份有限公司 Alarm reduction method, device, equipment and computer readable storage medium
CN113765923B (en) * 2021-09-08 2023-04-07 上海观安信息技术股份有限公司 Web end parameter detection method, device and system and computer storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853284A (en) * 2010-05-24 2010-10-06 哈尔滨工程大学 Extraction method and device for Internet-oriented meaningful strings
CN104182517A (en) * 2014-08-22 2014-12-03 北京羽乐创新科技有限公司 Data processing method and data processing device
CN105208040A (en) * 2015-10-12 2015-12-30 北京神州绿盟信息安全科技股份有限公司 Network attack detection method and device
CN106021063A (en) * 2016-05-09 2016-10-12 北京蓝海讯通科技股份有限公司 An event message aggregation method, application and system
CN106991033A (en) * 2017-04-01 2017-07-28 北京蓝海讯通科技股份有限公司 Notify method, device, server and the readable storage medium storing program for executing of alarm information
CN107169356A (en) * 2017-05-03 2017-09-15 上海上讯信息技术股份有限公司 System side's analysis method and equipment
CN107358075A (en) * 2017-07-07 2017-11-17 四川大学 A kind of fictitious users detection method based on hierarchical clustering
CN108415910A (en) * 2017-02-09 2018-08-17 中国传媒大学 Topic development cluster analysis system based on time series and method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227528B (en) * 2014-06-26 2018-09-28 华为技术有限公司 To the detection method and device of the attack of Web server group
CN106375339B (en) * 2016-10-08 2019-07-09 电子科技大学 Attack mode detection method based on event sliding window
US10721254B2 (en) * 2017-03-02 2020-07-21 Crypteia Networks S.A. Systems and methods for behavioral cluster-based network threat detection
CN109445936B (en) * 2018-10-12 2020-05-19 深圳先进技术研究院 Cloud computing load clustering method and system and electronic equipment
CN109933984B (en) * 2019-02-15 2020-10-27 中时瑞安(北京)网络科技有限责任公司 Optimal clustering result screening method and device and electronic equipment
CN110311925B (en) * 2019-07-30 2022-06-28 百度在线网络技术(北京)有限公司 DDoS reflection type attack detection method and device, computer equipment and readable medium

Also Published As

Publication number Publication date
CN110855648A (en) 2020-02-28

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40021989; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant