CN109286511B

CN109286511B - Data processing method and device

Info

Publication number: CN109286511B
Application number: CN201710592825.0A
Authority: CN
Inventors: 刘芳宁; 李拓
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2017-07-19
Filing date: 2017-07-19
Publication date: 2021-10-08
Anticipated expiration: 2037-07-19
Also published as: CN109286511A

Abstract

The invention discloses a data processing method and device in the technical field of firewalls. It addresses the problem that, when a large number of attacks break out, lost alarm logs impair the analysis of network attack events and the evaluation of network security. The method comprises the following steps:

- acquiring original alarm logs within a preset time period, each original alarm log being behavior data that records an attack on a given network;
- performing a first classification and aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters;
- performing a second classification and aggregation on the initial alarm clusters according to the association relationships in an attack graph to obtain a plurality of final alarm clusters, the attack graph being a connected graph formed by the initial alarm clusters that shows the network attack process within the preset time period;
- merging the alarm logs contained in each final alarm cluster into new alarm logs, which replace the original alarm logs.

The method and device are applied to the processing of alarm logs.

Description

Data processing method and device

Technical Field

The present invention relates to the field of firewall technologies, and in particular, to a method and an apparatus for data processing.

Background

In the Internet era, a firewall is usually deployed between an internal network and an external network. It forms a protective barrier that shields the internal network from attacks by illegal users. When the firewall detects an illegal attack, it defends against the attack and also raises a timely alarm to notify users. It then generates alarm logs and stores them in a log database, so that network attack events can be analyzed and network security evaluated after the attack.

An alarm is raised for every attack. As a result, alarm logs are generated and recorded repeatedly even when consecutive alarms are identical, and the log database accumulates a large number of them. However, the space for storing alarm logs is limited. When a large number of attacks break out, the number of alarms increases dramatically and the firewall lacks the space to store all of them, so some key alarm logs are discarded. Network attack events may later need to be analyzed and network security evaluated from the alarm logs. Because the logs are incomplete, it is then difficult to analyze the attack events comprehensively and accurately, or to evaluate network security correctly.

Disclosure of Invention

In view of the above problems, the present invention provides a data processing method and apparatus. They solve the problem that the loss of alarm logs during large-scale attack outbreaks impairs the analysis of network attack events and the evaluation of network security.

In order to solve the above technical problem, in a first aspect, the present invention provides a data processing method, including:

acquiring original alarm logs within a preset time period, wherein an original alarm log is behavior data recording an attack on a given network;

performing a first classification and aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters;

performing a second classification and aggregation on the plurality of initial alarm clusters according to the association relationships in an attack graph to obtain a plurality of final alarm clusters, wherein the attack graph is a connected graph which is formed by the plurality of initial alarm clusters and shows the network attack process within the preset time period;

and merging the alarm logs contained in each final alarm cluster to obtain new alarm logs, which replace the original alarm logs.

Optionally, before performing the second classification and aggregation on the multiple initial alarm clusters according to the association relationship in the attack graph, the method further includes:

performing association analysis on the plurality of initial alarm clusters according to a preset association rule, wherein the preset association rule is set according to alarm categories and causal relationships among alarms;

and generating the attack graph by taking the initial alarm clusters having association relationships as vertices and the causal relationships as edges, wherein the attack graph is a directed acyclic connected graph.

Optionally, the performing first classification and aggregation on the original alarm logs according to core attributes of the original alarm logs to obtain a plurality of initial alarm clusters includes:

determining a set of all original alarm logs as a root node of a tree structure;

splitting the original alarm logs in sequence with a top-down strategy, according to three restriction conditions: the alarm name, the source Internet Protocol (IP) address and the destination IP address of the original alarm logs;

and determining each leaf node in the split tree structure as an initial alarm cluster.

Optionally, performing second classification and aggregation on the multiple initial alarm clusters according to the association relationship in the attack graph to obtain multiple final alarm clusters, including:

selecting each initial alarm cluster that contains only one original alarm log, and determining it as a single alarm cluster;

determining whether, among all single alarm clusters obtained by splitting under the same alarm name, there are single alarm clusters located in the same attack graph;

if yes, aggregating the single alarm clusters located in the same attack graph;

and determining the aggregated single alarm clusters and the initial alarm clusters containing a plurality of original alarm logs as the final alarm clusters.

Optionally, the method further includes:

when splitting is performed according to the source IP address, if only one original alarm log corresponds to a given source IP address after splitting, fusing all such single original alarm logs under the same alarm name into a new node; and

splitting the new node according to the destination IP address;

when splitting is performed according to the source IP address, if a plurality of original alarm logs correspond to a given source IP address after splitting, directly splitting the set of those original alarm logs, as one node, according to the destination IP address.

Optionally, performing association analysis on the plurality of initial alarm clusters according to a preset association rule, including:

inputting the initial alarm clusters into a business rule engine, so that the initial alarm clusters are matched with a preset association rule, wherein the business rule engine defines the preset association rule;

if the matching succeeds, establishing attack fragments, wherein each attack fragment comprises at least two initial alarm clusters having a causal relationship;

generating the attack graph as a directed acyclic connected graph comprises:

forming an attack scene set from all attack fragments;

acquiring the initial alarm clusters in the attack scene set as the vertices of the attack graph;

and traversing the attack scene set and sequentially adding directed edges between the vertices to generate the directed acyclic connected graph.

In a second aspect, the present invention provides an apparatus for data processing, the apparatus comprising:

an acquisition unit, configured to acquire original alarm logs within a preset time period, wherein an original alarm log is behavior data recording an attack on a given network;

the first aggregation unit is used for carrying out first classification and aggregation on the original alarm logs according to the core attributes of the original alarm logs to obtain a plurality of initial alarm clusters;

the second aggregation unit, configured to perform a second classification and aggregation on the plurality of initial alarm clusters according to the association relationships in the attack graph to obtain a plurality of final alarm clusters, wherein the attack graph is a connected graph which is formed by the plurality of initial alarm clusters and shows the network attack process within the preset time period;

and the merging unit is used for respectively merging the alarm logs contained in the final alarm clusters to obtain new alarm logs so as to replace the original alarm logs with the new alarm logs.

Optionally, the apparatus further comprises:

the association analysis unit is used for performing association analysis on the plurality of initial alarm clusters according to a preset association rule before performing secondary classification and aggregation on the plurality of initial alarm clusters according to the association relation in the attack graph, wherein the preset association rule is set according to the alarm category and the causal relation between alarms;

and the generating unit, configured to generate the attack graph by taking the initial alarm clusters having association relationships as vertices and the causal relationships as edges, wherein the attack graph is a directed acyclic connected graph.

Optionally, the first aggregation unit includes:

a first determining module for determining a set of all original alarm logs as a root node of a tree structure;

the splitting module is used for sequentially splitting the original alarm log according to three restriction conditions of the alarm name, the source Internet Protocol (IP) address and the destination IP address of the original alarm log by adopting a top-down strategy;

and the second determining module is used for determining each leaf node in the split tree structure as an initial alarm cluster.

Optionally, the second aggregation unit comprises:

a selecting module, configured to select each initial alarm cluster that contains only one original alarm log and determine it as a single alarm cluster;

a judging module, configured to determine whether, among all single alarm clusters obtained by splitting under the same alarm name, there are single alarm clusters located in the same attack graph;

an aggregation module, configured to aggregate the single alarm clusters located in the same attack graph if such clusters exist;

and a third determining module, configured to determine the aggregated single alarm clusters and the initial alarm clusters containing a plurality of original alarm logs as the final alarm clusters.

Optionally, the apparatus further comprises:

the fusion unit is configured to operate when splitting is performed according to the source IP address. If only one original alarm log corresponds to a given source IP address after splitting, it fuses all such single original alarm logs under the same alarm name into a new node;

the splitting module is further configured to split the new node according to the destination IP address;

the splitting module is further configured to, when splitting is performed according to the source IP address, if an original alarm log corresponding to the same source IP address after splitting includes multiple original alarm logs, directly split a set of the multiple original alarm logs as a node according to the destination IP address.

Optionally, the association analysis unit includes:

the input module is used for inputting the initial alarm clusters into a business rule engine so as to match the initial alarm clusters with a preset association rule, and the business rule engine defines the preset association rule;

the establishment module is used for establishing attack fragments if the matching is successful, wherein each attack fragment comprises at least two initial alarm clusters with causal relationship;

the generation unit includes:

the composition module is used for composing all attack fragments into an attack scene set;

the acquisition module is used for acquiring the initial alarm cluster in the attack scene set as the vertex of the attack graph;

and the generating module is used for traversing the attack scene set and sequentially adding directed edges to the vertexes to generate a directed acyclic connected graph.

With the above technical solution, the data processing method and device provided by the invention classify and aggregate the original alarm logs twice. They divide the logs into a plurality of final alarm clusters and then merge the original alarm logs within each final alarm cluster. Repeated alarm logs, and similar alarm logs that can be aggregated, are thus merged according to their classification, which greatly reduces the storage space occupied.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flow chart illustrating a method of data processing provided by an embodiment of the present invention;

FIG. 2 is a flow chart illustrating another method of data processing provided by an embodiment of the present invention;

FIG. 3 is a diagram illustrating the first classification and aggregation of original alarm logs provided by an embodiment of the present invention;

FIG. 4 is a diagram illustrating several attack graphs provided by an embodiment of the present invention;

FIG. 5 is a block diagram illustrating the logical structure corresponding to a data processing method according to an embodiment of the present invention;

FIG. 6 is a block diagram illustrating an apparatus for data processing according to an embodiment of the present invention;

FIG. 7 is a block diagram illustrating another data processing apparatus according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

An embodiment of the present invention provides a data processing method, as shown in fig. 1, the method includes:

101. Acquire original alarm logs within a preset time period.

In this embodiment, an original alarm log is behavior data recorded by a firewall while it defends a network against an attack. However, the method of this embodiment is not limited to firewalls. It can also be applied to any other network security protection system that generates alarm logs and needs to perform security analysis of the network based on them.

Original alarm logs are usually recorded in a log database. Because the space of the log database is limited, original alarm logs are usually kept only for a certain period. The preset time period in this step is therefore usually no longer than the period for which the firewall retains original alarm logs. For example, if the firewall keeps original alarm logs for the last week, the preset time period may be one or several days within that week.

Each original alarm log mainly records the following fields:

- alarm name;
- source Internet Protocol (IP) address;
- destination IP address;
- source port;
- destination port;
- alarm time.

The source IP address and source port identify the attacker. The destination IP address and destination port identify the target of the attack, i.e., the attacked party.
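The record fields listed above, and the rule that the preset time period must fall within the retained log history, can be modeled in a short Python sketch. The class `AlarmLog` and the function `select_window` are hypothetical names introduced for illustration. They are not part of any real firewall API. The default seven-day retention reflects the "last week" example above and is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AlarmLog:
    """One original alarm log (illustrative field names, not a real firewall schema)."""
    alarm_name: str
    src_ip: str       # attacker side
    dst_ip: str       # attacked party
    src_port: int
    dst_port: int
    alarm_time: datetime


def select_window(logs, start, end, now, retention=timedelta(days=7)):
    """Return the logs whose alarm time lies in the preset period [start, end).

    The period must not start before the oldest retained log (now - retention);
    the 7-day default is an assumption taken from the example in the text.
    """
    if start < now - retention:
        raise ValueError("preset period starts before the retained log history")
    return [log for log in logs if start <= log.alarm_time < end]
```

With a one-day window ending at `now`, only logs from the last 24 hours are returned. A window reaching back eight days is rejected because it would extend past the assumed retention period.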

102. Perform a first classification and aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters.

The contents recorded in an original alarm log are its attributes. The core attributes are those that best distinguish one original alarm log from another. In this embodiment, the core attributes are the alarm name, the source IP address and the destination IP address.

The first classification and aggregation roughly classifies the original alarm logs according to these core attributes. In the final classification result, the original alarm logs of each class form an initial alarm cluster, which may contain one or several original alarm logs.

Note that the classification and aggregation is performed layer by layer, one core attribute at a time, rather than according to all three core attributes simultaneously. If the three attributes were used simultaneously, the same original alarm log could be classified into several initial alarm clusters. For example, suppose original alarm log a has the same alarm name as another log but differs from it in the other core attributes. Suppose a also has the same source IP address as a third log but differs from that log in the other core attributes. Original alarm log a would then be classified into two initial alarm clusters at once. To avoid this, this embodiment classifies and aggregates layer by layer, which ensures that each original alarm log belongs to exactly one initial alarm cluster.
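A short Python sketch shows the difference between layer-by-layer and simultaneous classification. Logs are plain dicts with hypothetical keys `name`, `src` and `dst`. The function `layered_clusters` splits on one core attribute at a time. The function `independent_clusters` groups by each attribute separately, which is one possible reading of "simultaneously" in the text. This simplified version omits the fusion of single-log nodes described later in step 202.

```python
from collections import defaultdict

CORE_ATTRIBUTES = ("name", "src", "dst")  # alarm name, source IP, destination IP


def group_by(logs, field):
    """Partition a list of logs by the value of one field."""
    groups = defaultdict(list)
    for log in logs:
        groups[log[field]].append(log)
    return list(groups.values())


def layered_clusters(logs):
    """Split layer by layer: every cluster is refined by the next core attribute."""
    clusters = [logs]
    for field in CORE_ATTRIBUTES:
        clusters = [sub for cluster in clusters for sub in group_by(cluster, field)]
    return clusters  # a partition: each log appears in exactly one cluster


def independent_clusters(logs):
    """Group by each core attribute on its own; a log may appear in several clusters."""
    return [c for field in CORE_ATTRIBUTES for c in group_by(logs, field)]
```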

103. Perform a second classification and aggregation on the plurality of initial alarm clusters according to the association relationships in the attack graph to obtain a plurality of final alarm clusters.

Merging the original alarm logs in each initial alarm cluster after the first classification and aggregation would already greatly reduce the space they occupy in the log database. To reduce that space further, this embodiment performs a second classification and aggregation on the initial alarm clusters. This second pass mainly reclassifies and merges the initial alarm clusters that contain only one original alarm log.

Specifically, the clusters are classified and aggregated according to the association relationships among the initial alarm clusters in an attack graph. The attack graph is a connected graph composed of a plurality of initial alarm clusters that shows the network attack process within the preset time period. The association relationships among initial alarm clusters in the attack graph are determined by the time order or the causal relationships of the attack steps. Each attack step generally triggers one alarm, so the time order and causal relationships of the attack steps correspond to the time order of the alarms and the causal relationships between them.

The time order or causal relationship of attack steps is usually determined largely from the experience of those skilled in the art. For example, a vulnerability scan may follow an IP address scan, privilege escalation may follow a buffer overflow, and a DoS attack may follow the upload of a Trojan.
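Expert knowledge of this kind can be written down as a set of ordered alarm-name pairs. In the sketch below, `CAUSAL_PAIRS` simply restates the three examples from the text, using illustrative alarm names. The function `may_precede` is a hypothetical helper that also requires the earlier step to occur no later than the later one. Alarm times are plain numbers for brevity.

```python
# Illustrative expert knowledge: (earlier attack step, later attack step).
CAUSAL_PAIRS = {
    ("IP address scan", "Vulnerability scan"),
    ("Buffer overflow", "Privilege escalation"),
    ("Trojan upload", "DoS attack"),
}


def may_precede(earlier, later):
    """True if alarm `earlier` can causally lead to alarm `later`.

    Each alarm is a (name, time) tuple; the pair must be a known causal pair
    and the alarms must occur in time order.
    """
    name_a, time_a = earlier
    name_b, time_b = later
    return (name_a, name_b) in CAUSAL_PAIRS and time_a <= time_b
```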

104. Merge the alarm logs contained in each final alarm cluster to obtain new alarm logs.

After the first and second classification and aggregation, some final alarm clusters may contain only one original alarm log, but most contain several. Final alarm clusters containing only one original alarm log are left unprocessed. Those containing several original alarm logs are merged as follows.

Before merging, the attributes of a final alarm cluster are defined. They mainly include:

- the alarm name;
- the source IP address set and the destination IP address set;
- the source port set and the destination port set;
- the earliest and latest timestamps of the contained alarms.

During merging, each attribute other than the earliest and latest timestamps is handled as follows. If all original alarm logs share the same value, that actual value is used as the attribute value. If the values differ, the distinct values are recorded as a set, and the attribute itself may be set to a preset fixed value.

For example, when the destination IP addresses of the original alarm logs in the final alarm cluster differ, the destination IP address attribute may be set to 0.0.0.0. The actual destination IP addresses are then recorded as a set in the destination IP address set attribute. Similarly, when the source ports differ, the source port attribute may be set to -1, and the actual source ports are recorded as a set in the source port set attribute. The preset fixed value may be any value outside the attribute's range of possible values, so that it cannot conflict with an actual attribute value.
In addition, the earliest and latest timestamps of the contained alarms are the earliest and latest occurrence times among all original alarm logs in the final alarm cluster. Together they define the time span covered by the final alarm cluster. After merging is completed, the merged new alarm logs replace the original alarm logs.
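The merge rule can be sketched in Python as follows. Each original alarm log is assumed to be a dict with the hypothetical keys `name`, `src_ip`, `dst_ip`, `src_port`, `dst_port` and a numeric `time`. The sentinels 0.0.0.0 and -1 are the example values from the text. The `*_set` keys hold the distinct values, and `earliest` and `latest` give the cluster's time span.

```python
IP_SENTINEL = "0.0.0.0"  # example placeholder from the text when IP addresses differ
PORT_SENTINEL = -1       # example placeholder from the text when ports differ
FIELDS = {"src_ip": IP_SENTINEL, "dst_ip": IP_SENTINEL,
          "src_port": PORT_SENTINEL, "dst_port": PORT_SENTINEL}


def merge_cluster(logs):
    """Merge the original alarm logs of one final alarm cluster into a new alarm log."""
    if len(logs) == 1:
        return dict(logs[0])  # single-log clusters are left unprocessed
    names = {log["name"] for log in logs}
    if len(names) != 1:
        raise ValueError("a final alarm cluster is expected to share one alarm name")
    merged = {"name": names.pop()}
    for field, sentinel in FIELDS.items():
        values = {log[field] for log in logs}
        merged[field + "_set"] = values
        # Keep the real value if all logs agree, otherwise store the sentinel.
        merged[field] = next(iter(values)) if len(values) == 1 else sentinel
    times = [log["time"] for log in logs]
    merged["earliest"], merged["latest"] = min(times), max(times)
    return merged
```

Recording the set unconditionally keeps every original value recoverable. The scalar attribute then only signals whether the logs agreed.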

The data processing method provided by this embodiment of the invention classifies and aggregates the original alarm logs twice: first according to the core attributes, then according to the attack graph. This divides the logs into a plurality of final alarm clusters, and the original alarm logs within each cluster are then merged. Repeated alarm logs, and similar alarm logs that can be aggregated, are thus merged according to their classification, which greatly reduces the storage space occupied.

To refine and extend the method shown in fig. 1, an embodiment of the present invention further provides a data processing method, as shown in fig. 2:

201. Acquire original alarm logs within a preset time period.

This step is implemented in the same way as step 101 in fig. 1 and is not described again here.

202. Split the original alarm logs in sequence with a top-down strategy, according to three restriction conditions: the alarm name, the source Internet Protocol (IP) address and the destination IP address of the original alarm logs.

In this embodiment, a tree structure is used, and the splitting steps are as follows:

(1) Determine the set of all original alarm logs as the root node of the tree structure.

(2) Split the root node according to alarm name, i.e., classify all original alarm logs of the root node by alarm name. Each branch node produced by the split forms a first-layer alarm cluster, in which all original alarm logs have the same alarm name.

(3) Take each branch node generated in step (2) as a new root node and continue splitting, i.e., classify the original alarm logs in each branch node of step (2) by source IP address. Each new branch node produced by the split forms a second-layer alarm cluster, in which all original alarm logs have the same alarm name and the same source IP address.

(4) When a second-layer alarm cluster contains more than one original alarm log, split its branch node from step (3) directly, i.e., classify the original alarm logs in that second-layer alarm cluster by destination IP address. When a second-layer alarm cluster contains exactly one original alarm log, fuse all such single-log second-layer alarm clusters under the same alarm name into one new branch node, and then split that node by destination IP address.

To illustrate the splitting process more clearly, consider the example in fig. 3. Set R is the root node of step (1), and its elements a, b, c, d, …, z are all the original alarm logs acquired within the preset time period.

Splitting R by alarm name yields one branch node per distinct alarm name; in fig. 3 these are alarm name A, alarm name B, …, alarm name X. The original alarm logs under each alarm name form a first-layer alarm cluster. The branch nodes of the different alarm names are then split further by source IP address. Since every branch node is split in the same way, the branch node of alarm name A is used as the example below.

As shown in fig. 3, splitting the branch node of alarm name A yields 6 different source IP addresses, and the original alarm logs under each source IP address form a second-layer alarm cluster. Of these 6 second-layer alarm clusters, source IP3, source IP4, source IP5 and source IP6 each contain one original alarm log. These 4 clusters are therefore fused into one new node (shown as source IP6 in fig. 3), which is then split by destination IP address. Source IP1 and source IP2 each contain more than one original alarm log, so they are split directly by destination IP address.

Splitting by destination IP address yields 6 leaf nodes, destination IP1 through destination IP6, and the splitting is complete.
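The three-level split with fusion of single-log nodes, as described in steps (1) to (4) above, might be sketched as follows. Logs are dicts with hypothetical keys `name`, `src` and `dst`, and the function returns the leaf nodes, i.e., the initial alarm clusters. It illustrates the described procedure under these assumptions and is not the patented implementation.

```python
from collections import defaultdict


def group_by(logs, field):
    """Partition a list of logs by the value of one field."""
    groups = defaultdict(list)
    for log in logs:
        groups[log[field]].append(log)
    return list(groups.values())


def first_aggregation(logs):
    """Return the leaf nodes (initial alarm clusters) of the three-level split tree."""
    leaves = []
    for same_name in group_by(logs, "name"):              # level 1: alarm name
        by_src = group_by(same_name, "src")                # level 2: source IP
        multi = [node for node in by_src if len(node) > 1]
        singles = [node[0] for node in by_src if len(node) == 1]
        # Fuse all single-log source-IP nodes under this alarm name into one node.
        nodes = multi + ([singles] if singles else [])
        for node in nodes:                                 # level 3: destination IP
            leaves.extend(group_by(node, "dst"))
    return leaves
```

Fusing the single-log nodes before the destination-IP split lets logs from different source IPs that hit the same destination end up in one leaf. In the test below, c and i illustrate this case.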

203. Determine each leaf node in the split tree structure as an initial alarm cluster.

In the tree structure, leaf nodes are terminal nodes. In the example of fig. 3, the leaf nodes are destination IP1, destination IP2, destination IP3, destination IP4, destination IP5 and destination IP6. The original alarm logs contained in each leaf node form an initial alarm cluster.

The process from the original alarm logs to the initial alarm clusters constitutes the first classification and aggregation of the original alarm logs.

204. Perform association analysis on the plurality of initial alarm clusters according to a preset association rule.

To further reduce the space occupied in the log database, this embodiment performs a second classification and aggregation on the initial alarm clusters, based on the association relationships among them. These association relationships are determined by an attack graph generated from the initial alarm clusters according to preset association rules. This step and step 205 describe how the attack graph is generated.

First, association analysis is performed on the initial alarm clusters according to preset association rules. The initial alarm clusters are input into a business rule engine and matched against the preset association rules defined in it. These rules are set according to alarm categories and the causal relationships between alarms, and the causal relationships between alarms follow from the causality of the attack steps when an attack occurs.

The alarm category is mainly determined by the alarm name. Different alarm names may involve the same attack steps, so the preset association rules also take the alarm category into account. This prevents attack steps belonging to different alarm names from being associated with each other. The rules also consider the time at which each alarm occurs, since alarm time is another reference for the causal relationship between alarms. As in step 103 of fig. 1, the time order or causal relationship of typical attack steps is usually determined mainly from experience in the related art.

The business rule engine in this embodiment is JBoss Rules. In practical applications it may be another open-source rule engine or a rule engine developed by a security vendor.
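The embodiment uses JBoss Rules as the business rule engine. The Python sketch below is only a stand-in that shows the kind of condition such a rule might check. It is not JBoss Rules syntax. `Cluster`, `AssociationRule` and `match_rules` are hypothetical names. The time condition, under which the effect cluster starts no earlier than the cause cluster and within `max_gap` of it, is an assumed way of using alarm time as a causal reference.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Cluster:
    cid: str
    name: str       # alarm name
    category: str   # alarm category (mainly derived from the alarm name)
    start: float    # earliest alarm time in the cluster


@dataclass(frozen=True)
class AssociationRule:
    category: str
    cause: str      # alarm name of the prerequisite step
    effect: str     # alarm name of the consequent step
    max_gap: float  # assumed upper bound on the delay between the two steps


def match_rules(clusters, rules):
    """Return (cause_id, effect_id) pairs of initial alarm clusters that match a rule."""
    matches = []
    for a, b in permutations(clusters, 2):
        if any(a.category == b.category == r.category
               and a.name == r.cause and b.name == r.effect
               and 0 <= b.start - a.start <= r.max_gap
               for r in rules):
            matches.append((a.cid, b.cid))
    return matches
```

Checking every ordered pair is quadratic in the number of clusters. A real rule engine indexes facts to avoid this, but the brute-force loop keeps the matching condition easy to read.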

If, after matching, some initial alarm clusters match the preset association rules successfully, attack fragments are established. The preset association rules defined in the business rule engine encode causal relationships between alarms. Successfully matched initial alarm clusters are therefore causally related, and so are the initial alarm clusters within each established attack fragment. Each attack fragment contains at least two causally related initial alarm clusters.

Each attack fragment represents an attack scenario. Suppose an attack scenario is $(x_1, x_2, \ldots, x_n)$, where each element represents an attack step. Then:

- $x_i$ is the prerequisite of $x_{i+1}$, i.e., attack step $x_i$ prepares for $x_{i+1}$;
- $x_{i+1}$ is the consequence of $x_i$, i.e., the result of attack step $x_i$.

Together, $x_i$ and $x_{i+1}$ form a sub-attack scenario $(x_i, x_{i+1})$ that describes the causal relationship between the two attack steps.
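The scenario notation above translates directly into code. In this sketch, `sub_scenarios` lists the pairs $(x_i, x_{i+1})$. The function `is_valid_fragment`, a hypothetical helper, accepts a scenario only if it has at least two steps and every consecutive pair is a known causal relationship. The step names are illustrative.

```python
def sub_scenarios(scenario):
    """Split an attack scenario (x1, ..., xn) into its sub-scenarios (xi, xi+1)."""
    return list(zip(scenario, scenario[1:]))


def is_valid_fragment(scenario, causal_pairs):
    """Accept a scenario with at least two steps where each step causes the next."""
    return len(scenario) >= 2 and all(p in causal_pairs for p in sub_scenarios(scenario))
```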

205. Generate an attack graph, taking the initial alarm clusters that have an association relationship as vertices and their causal relationships as edges.

From step 204, all initial alarm clusters within an attack segment are associated with one another. All attack segments matched in step 204 form an attack scene set, that is, a set of attack segments. The initial alarm clusters in the attack scene set are then taken as the vertices of the attack graph, and the attack scene set is traversed to add directed edges between the vertices one by one, generating a directed acyclic connected graph. Fig. 4 shows several attack graphs: attack graph (1) consists of 3 initial alarm clusters, and attack graph (2) consists of 2 initial alarm clusters. In an attack graph, a node with an out-degree of 0 represents the attacker's final attack step, and a node with an in-degree of 0 represents an initial attack step. The attack graph generated in this embodiment not only serves as the basis for the subsequent second classification and aggregation, but also visually displays the attack flow of an attack event, which makes the attacker's strategy easier to analyze.
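The graph construction can be sketched as follows, assuming each attack segment is supplied as a list of (cause, effect) cluster-ID edges. Acyclicity is checked with Kahn's topological sort, initial and final attack steps are read off the in-degree and out-degree, and the vertices are split into weakly connected components so that each component corresponds to one attack graph such as those in fig. 4. The function names and the edge-list representation are assumptions for illustration.

```python
from collections import defaultdict, deque


def build_attack_graph(attack_scene_set):
    """Collect vertices and directed edges from an attack scene set.

    Assumption: each attack segment is a list of (cause_id, effect_id) edges.
    """
    vertices, edges = set(), set()
    for segment in attack_scene_set:
        for cause, effect in segment:
            vertices.update((cause, effect))
            edges.add((cause, effect))
    return vertices, edges


def endpoints(vertices, edges):
    """Return (initial_steps, final_steps); raise ValueError if a cycle exists."""
    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        outdeg[u] += 1
    # Kahn's algorithm: every vertex can be ordered iff the graph is acyclic.
    remaining = dict(indeg)
    queue = deque(v for v in vertices if remaining[v] == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for w in succ[u]:
            remaining[w] -= 1
            if remaining[w] == 0:
                queue.append(w)
    if ordered != len(vertices):
        raise ValueError("attack graph contains a cycle")
    initial = {v for v in vertices if indeg[v] == 0}  # in-degree 0: initial attack steps
    final = {v for v in vertices if outdeg[v] == 0}   # out-degree 0: final attack steps
    return initial, final


def attack_graphs(vertices, edges):
    """Split the vertices into weakly connected components, one per attack graph."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


scene = [[("c1", "c2"), ("c2", "c3")], [("g", "h")]]
vertices, edges = build_attack_graph(scene)
print(endpoints(vertices, edges))
print(attack_graphs(vertices, edges))
```

In this toy scene set (one three-vertex and one two-vertex graph, with edges chosen arbitrarily), the initial steps are c1 and g and the final steps are c3 and h.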

206. Select each alarm cluster that contains only one original alarm log and determine it as a single alarm cluster.

207. Aggregate the single alarm clusters located in the same attack graph.

In this embodiment, single alarm clusters that are aggregated within the same attack graph must have the same alarm name. Aggregating the single alarm clusters located in the same attack graph therefore means determining whether, among all single alarm clusters obtained by splitting under the same alarm name, some are located in the same attack graph, and aggregating them if so. In the example of fig. 3, the single alarm clusters finally obtained by splitting alarm name A are destination IP2, destination IP3, destination IP5 and destination IP6, which contain the single original alarm logs c, d, g and h respectively. Assume that, among the attack graphs obtained for the example of fig. 3, the one involving these single alarm logs is attack graph (2) in fig. 4, whose two vertices represent the initial alarm clusters containing original alarm log g and original alarm log h respectively. Because g and h are located in the same attack graph, original alarm log g can be aggregated with original alarm log h, that is, the initial alarm clusters corresponding to destination IP5 and destination IP6 are aggregated.

208. Determine the aggregated single alarm clusters, together with the alarm clusters containing multiple original alarm logs, as the final alarm clusters.
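Steps 206 to 208 can be sketched as below. Clusters are assumed to be given as `cluster_id -> (alarm_name, log_ids)` and attack-graph membership as `cluster_id -> graph_id`. Single alarm clusters are grouped by (alarm name, attack graph), which enforces the "same alarm name, same attack graph" condition. How single clusters that belong to no attack graph are handled is not stated in the text; here they are assumed to remain final clusters on their own. The cluster `A/IP1` with logs a and b in the usage example is hypothetical; the single clusters c, d, g and h follow the fig. 3 example.

```python
from collections import defaultdict


def second_aggregation(clusters, graph_of):
    """Sketch of steps 206-208.

    clusters: dict cluster_id -> (alarm_name, list_of_log_ids)
    graph_of: dict cluster_id -> attack-graph id (clusters in no graph are absent)
    Returns a list of final clusters as (alarm_name, list_of_log_ids) tuples.
    """
    final = []
    groups = defaultdict(list)
    for cid, (name, logs) in clusters.items():
        if len(logs) > 1:
            final.append((name, list(logs)))  # multi-log cluster: already final
        elif cid in graph_of:
            # Step 206: a single alarm cluster located in some attack graph.
            groups[(name, graph_of[cid])].append(logs[0])
        else:
            final.append((name, list(logs)))  # assumption: unassociated singles stay as-is
    # Step 207: aggregate singles sharing an alarm name and an attack graph.
    for (name, _graph), logs in groups.items():
        final.append((name, sorted(logs)))
    return final  # step 208


clusters = {
    "A/IP1": ("A", ["a", "b"]),  # hypothetical multi-log cluster
    "A/IP2": ("A", ["c"]),
    "A/IP3": ("A", ["d"]),
    "A/IP5": ("A", ["g"]),
    "A/IP6": ("A", ["h"]),
}
graph_of = {"A/IP5": "graph2", "A/IP6": "graph2"}
print(second_aggregation(clusters, graph_of))
# [('A', ['a', 'b']), ('A', ['c']), ('A', ['d']), ('A', ['g', 'h'])]
```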

209. Merge the alarm logs contained in each final alarm cluster to obtain new alarm logs.

Implementation of this step is the same as that in step 104 in fig. 1, and is not described here again.
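The merging rule itself belongs to step 104 of fig. 1, which is not part of this excerpt. Purely as an assumption, the sketch below merges one final alarm cluster into a single summary record that keeps the distinct addresses, the destination ports, the first and last alarm times and a log count; the `AlarmLog` type and the output keys are hypothetical and need not match the format actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmLog:
    """Hypothetical original alarm log with the fields listed for unit 31."""
    alarm_name: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    alarm_time: float  # seconds since an arbitrary epoch


def merge_cluster(logs):
    """Merge the logs of one final alarm cluster into one summary record (assumed format)."""
    if not logs:
        raise ValueError("cannot merge an empty cluster")
    return {
        "alarm_name": "/".join(sorted({log.alarm_name for log in logs})),
        "src_ips": sorted({log.src_ip for log in logs}),
        "dst_ips": sorted({log.dst_ip for log in logs}),
        "dst_ports": sorted({log.dst_port for log in logs}),
        "first_time": min(log.alarm_time for log in logs),
        "last_time": max(log.alarm_time for log in logs),
        "count": len(logs),
    }


logs = [
    AlarmLog("A", "10.0.0.1", "10.0.0.5", 4444, 80, 100.0),
    AlarmLog("A", "10.0.0.2", "10.0.0.6", 4445, 80, 160.0),
]
print(merge_cluster(logs))
```

Replacing many near-identical records with one summary like this is what yields the storage saving the embodiment describes, at the cost of per-log detail such as individual source ports.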

For the method flow described in fig. 2, this embodiment provides a corresponding logical structure diagram, shown in fig. 5. First, the original alarm logs within a preset time period are obtained from the log database, and the first classification and aggregation is performed on them; the two rounds of classification and aggregation of the original alarm logs shown in fig. 5 correspond to the first and second classification and aggregation in fig. 2. The first classification and aggregation yields a plurality of initial alarm clusters, on which association analysis is then performed: the initial alarm clusters are input into the business rule engine and matched against the preset association rules to obtain attack segments, each composed of initial alarm clusters. The attack segments are then used to form attack graphs, which serve as the basis for the second classification and aggregation that yields the final alarm clusters. Finally, the logs in each final alarm cluster are merged, and the resulting new alarm logs are written back to the log database to replace the original alarm logs.

Further, as an implementation of the foregoing embodiments, another embodiment of the present invention provides a data processing apparatus for implementing the methods described in fig. 1 and fig. 2. As shown in fig. 6, the apparatus includes an acquisition unit 31, a first aggregation unit 32, a second aggregation unit 33 and a merging unit 34.

An acquisition unit 31, configured to obtain the original alarm logs within a preset time period, where an original alarm log is behavior data recording an attack on a network;

The content recorded in each original alarm log mainly comprises: alarm name, source IP address, destination IP address, source port, destination port and alarm time. The source IP address and source port describe the attacker, that is, the source of the attack, while the destination IP address and destination port describe the target of the attack, that is, the attacked party.
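A record holding the six fields listed above could be modelled as in the following sketch. The class name, the helper methods and the choice to validate addresses with Python's standard `ipaddress` module are assumptions for illustration, not part of the described apparatus.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmLog:
    """Hypothetical original alarm log with the fields listed above."""
    alarm_name: str
    src_ip: str        # attacker side
    dst_ip: str        # attacked side
    src_port: int
    dst_port: int
    alarm_time: datetime

    def __post_init__(self):
        # Reject malformed addresses (ip_address raises ValueError) and bad ports.
        ipaddress.ip_address(self.src_ip)
        ipaddress.ip_address(self.dst_ip)
        for port in (self.src_port, self.dst_port):
            if not 0 <= port <= 65535:
                raise ValueError(f"port out of range: {port}")

    def attacker(self):
        """Source information: who launched the attack."""
        return (self.src_ip, self.src_port)

    def target(self):
        """Destination information: the attacked party."""
        return (self.dst_ip, self.dst_port)


log = AlarmLog("A", "192.0.2.1", "198.51.100.7", 51000, 443, datetime(2017, 7, 19, 8, 0))
print(log.attacker(), log.target())
```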

A first aggregation unit 32, configured to perform first classification and aggregation on the original alarm logs according to core attributes of the original alarm logs to obtain a plurality of initial alarm clusters;

The second aggregation unit 33 is configured to perform second classification and aggregation on the multiple initial alarm clusters according to an association relationship in an attack graph to obtain multiple final alarm clusters, where the attack graph is a connected graph formed by the multiple initial alarm clusters and showing a network attack process in the preset time period;

A merging unit 34, configured to merge the alarm logs included in the multiple final alarm clusters to obtain new alarm logs, so as to replace the original alarm logs with the new alarm logs.

As shown in fig. 7, the apparatus further includes:

the association analysis unit 35 is configured to perform association analysis on the plurality of initial alarm clusters according to a preset association rule before performing second classification and aggregation on the plurality of initial alarm clusters according to association relations in the attack graph, where the preset association rule is set according to alarm categories and causal relations between alarms;

The association analysis on the plurality of initial alarm clusters according to the preset association rule is performed as follows: the initial alarm clusters are input into a business rule engine and matched against the preset association rules defined in that engine. The preset association rules are set according to alarm categories and the causal relationships among alarms. The causal relationship between alarms follows from how an attack unfolds, that is, from the causality of its attack steps. The alarm category is determined mainly by the alarm name; because different alarm names may involve the same attack steps, the rules must also take the alarm category into account so that attack steps belonging to different alarm names are not wrongly associated. The rules must further consider the time at which each alarm occurs, since the order of occurrence is also evidence of a causal relationship between alarms.

If, after matching, some initial alarm clusters match the preset association rule successfully, an attack segment is established. Because the causal relationships between alarms are encoded in the preset association rules defined in the business rule engine, successfully matched initial alarm clusters are causally related, and so are the initial alarm clusters in each established attack segment. Note that each attack segment contains at least two causally related initial alarm clusters. Each attack segment represents an attack scenario, written as $(x_1, x_2, \ldots, x_n)$, where each element represents an attack step. $x_i$ is the preparation (prerequisite) step for attack step $x_{i+1}$, and $x_{i+1}$ is the subsequent attack of $x_i$, that is, the result of attack $x_i$. The pair $x_i$ and $x_{i+1}$ forms a sub-attack scenario that describes the causal relationship between these two attack steps.

The generating unit 36 is configured to generate the attack graph, which is a directed acyclic connected graph, by using the initial alarm clusters having the association relationship as vertices and the causal relationship as edges.

As established by the association analysis unit 35, all initial alarm clusters within an attack segment are associated with one another. All attack segments matched by the association analysis unit 35 form an attack scene set, that is, a set of attack segments. The initial alarm clusters in the attack scene set are then taken as the vertices of the attack graph, and the attack scene set is traversed to add directed edges between the vertices one by one, generating a directed acyclic connected graph.

As shown in fig. 7, the first aggregation unit 32 includes:

a first determining module 321, configured to determine a set of all original alarm logs as a root node of a tree structure;

the splitting module 322 is configured to split the original alarm logs top-down, successively by three constraints: the alarm name, the source Internet Protocol (IP) address and the destination IP address of each original alarm log;

A second determining module 323, configured to determine each leaf node in the split tree structure as an initial alarm cluster.

As shown in fig. 7, the second aggregation unit 33 includes:

a selecting module 331, configured to select each alarm cluster that contains only one original alarm log and determine it as a single alarm cluster;

a judging module 332, configured to judge whether, among all single alarm clusters obtained by splitting under the same alarm name, there are single alarm clusters located in the same attack graph;

an aggregation module 333, configured to aggregate, if any, the single alarm clusters located in the same attack graph;

a third determining module 334, configured to determine the aggregated single alarm clusters, together with the alarm clusters containing multiple original alarm logs, as the final alarm clusters.

As shown in fig. 7, the apparatus further includes:

the merging unit 37 is configured to, when splitting is performed according to the source IP address, merge all single original alarm logs under the same alarm name into a new node, where a single original alarm log is one whose source IP address corresponds to only one original alarm log after the split;

the splitting module 322 is further configured to split the new node according to the destination IP address;

the splitting module 322 is further configured to, when splitting is performed according to the source IP address and a resulting source IP address corresponds to multiple original alarm logs, take the set of those original alarm logs directly as a node and split it according to the destination IP address.
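The top-down split performed by modules 321 to 323, together with the merging rule of unit 37, can be sketched as follows. Logs are assumed to be `(log_id, alarm_name, src_ip, dst_ip)` tuples, and `MERGED` is a hypothetical label for the new node built from logs whose source IP occurs only once under an alarm name. The usage data are loosely modelled on the fig. 3 example (single logs c, d, g and h ending up under destination IP2, IP3, IP5 and IP6); the source addresses and the logs a and b are invented.

```python
from collections import defaultdict

MERGED = "*"  # hypothetical label for the new node built from single-source logs


def first_aggregation(logs):
    """Split all logs (the root node) top-down into leaf clusters.

    logs: iterable of (log_id, alarm_name, src_ip, dst_ip) tuples (assumed shape).
    Returns {(alarm_name, source_node, dst_ip): [log_id, ...]} for every leaf.
    """
    # Level 1: split the root node by alarm name.
    by_name = defaultdict(list)
    for log in logs:
        by_name[log[1]].append(log)

    leaves = {}
    for name, name_logs in by_name.items():
        # Level 2: split by source IP address.
        by_src = defaultdict(list)
        for log in name_logs:
            by_src[log[2]].append(log)

        nodes, singles = [], []
        for src, src_logs in by_src.items():
            if len(src_logs) == 1:
                singles.extend(src_logs)       # lone log for this source IP
            else:
                nodes.append((src, src_logs))  # several logs: keep as its own node
        if singles:
            nodes.append((MERGED, singles))    # merge all lone logs into one new node

        # Level 3: split every node by destination IP; leaves are initial alarm clusters.
        for src, node_logs in nodes:
            for log_id, _name, _src, dst in node_logs:
                leaves.setdefault((name, src, dst), []).append(log_id)
    return leaves


logs = [
    ("a", "A", "S1", "IP1"), ("b", "A", "S1", "IP1"),
    ("c", "A", "S2", "IP2"), ("d", "A", "S3", "IP3"),
    ("g", "A", "S4", "IP5"), ("h", "A", "S5", "IP6"),
]
for leaf, ids in first_aggregation(logs).items():
    print(leaf, ids)
```

Merging the lone logs before the destination split keeps them from each becoming a separate source-IP branch, so logs that share a destination can still fall into the same initial alarm cluster.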

As shown in fig. 7, the association analysis unit 35 includes:

the input module 351 is configured to input the plurality of initial alarm clusters into a business rule engine, so that the plurality of initial alarm clusters are matched with a preset association rule, where the business rule engine defines the preset association rule;

In this embodiment, the business rule engine is JBoss Rules; in actual applications, other open-source rule engines, or rule engines developed by security vendors, may be used instead.

An establishing module 352, configured to establish attack segments if matching is successful, where each attack segment includes at least two initial alarm clusters having a causal relationship;

the generating unit 36 includes:

a composing module 361, configured to compose all attack segments into an attack scene set;

an obtaining module 362, configured to obtain an initial alarm cluster in the attack scene set as a vertex of the attack graph;

and the generating module 363 is configured to traverse the attack scene set, and sequentially add directed edges to the vertices to generate a directed acyclic connected graph.

With the data processing apparatus provided by this embodiment of the invention, the original alarm logs are classified and aggregated twice, first according to their core attributes and then according to the attack graph, which divides them into a plurality of final alarm clusters; the original alarm logs in each final alarm cluster are then merged. In this way, duplicate alarm logs, or similar alarm logs that can be aggregated, are merged according to their classification, which greatly reduces the storage space they occupy.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

It will be appreciated that related features of the method and apparatus described above may refer to one another. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not indicate that one embodiment is better than another.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the data processing device according to an embodiment of the invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims

1. A method of data processing, comprising:

performing second classification and aggregation on the plurality of initial alarm clusters according to the association relationship in an attack graph to obtain a plurality of final alarm clusters, wherein the attack graph is a connected graph which is formed by the plurality of initial alarm clusters and shows the network attack process in the preset time period, and the specific steps are as follows: selecting an original alarm cluster only containing one original alarm log, and determining the original alarm cluster as a single alarm cluster; judging whether single alarm clusters in the same attack graph exist in all single alarm clusters obtained by splitting the same alarm name; if yes, aggregating the single alarm clusters in the same attack graph; determining the aggregated single alarm cluster and an original alarm cluster containing a plurality of original alarm logs as a final alarm cluster;

2. The method of claim 1, wherein prior to performing the second classification and aggregation on the plurality of initial alarm clusters according to the association relationship in the attack graph, the method further comprises:

3. The method of claim 1, wherein the first classifying and aggregating the original alarm logs according to core attributes of the original alarm logs to obtain a plurality of initial alarm clusters comprises:

4. The method of claim 3, further comprising:

splitting the new node according to the destination IP address;

5. The method of claim 2, wherein performing association analysis on the plurality of initial alarm clusters according to a preset association rule comprises:

generating the directed acyclic connected graph comprises:

forming an attack scene set by all attack fragments;

6. An apparatus for data processing, comprising:

a merging unit, configured to merge the alarm logs included in the multiple final alarm clusters to obtain new alarm logs, so as to replace the original alarm logs with the new alarm logs;

the second aggregation unit comprises:

7. The apparatus of claim 6, further comprising:

8. The apparatus of claim 6, wherein the first aggregation unit comprises:

9. The apparatus of claim 8, further comprising:

10. The apparatus of claim 7, wherein the association analysis unit comprises:

the generation unit includes: