CN109286511A

CN109286511A - Data processing method and device

Info

Publication number: CN109286511A
Application number: CN201710592825.0A
Authority: CN
Inventors: 刘芳宁; 李拓
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2017-07-19
Filing date: 2017-07-19
Publication date: 2019-01-29
Anticipated expiration: 2037-07-19
Also published as: CN109286511B

Abstract

The invention discloses a data processing method and device in the field of firewall technology. It addresses a specific problem: when a large number of attacks break out at once, alarm logs are lost, which impairs both the analysis of network attack events and the assessment of network security. The method comprises the following steps. First, obtain the original alarm logs within a preset time period; each original alarm log is behavioral data recording an attack on a network. Second, perform a first classification aggregation on the original alarm logs according to their core attributes, obtaining a plurality of initial alarm clusters. Third, perform a second classification aggregation on the initial alarm clusters according to the association relations in an attack graph, obtaining a plurality of final alarm clusters. The attack graph is a connected graph, composed of initial alarm clusters, that shows the network attack process within the preset time period. Finally, merge the alarm logs contained in each final alarm cluster to obtain new alarm logs, which replace the original alarm logs. The invention is applied to alarm log processing.

Description

Data processing method and device

Technical field

The present invention relates to the field of firewall technology, and in particular to a data processing method and device.

Background art

In the Internet era, users protect the security of an internal network by deploying a firewall between it and the external network. The firewall acts as a protective barrier that shields the internal network from attacks by illegitimate users. When the firewall detects an illegal attack, it does more than defend against it. It also promptly raises an alarm to notify the user and generates an alarm log, which is stored in a log database. After the attack, these logs are used to analyze the attack and to assess the security of the network.

Every attack triggers an alarm, and even consecutive identical alarms are each recorded as separate alarm logs, so the log database holds a large number of alarm logs. The space available for storing them, however, is limited. When a large number of attacks break out and alarms surge, the firewall lacks the space to save all alarm logs, and some critical alarm logs are discarded. Later, the attack events must be analyzed and the network security assessed from the alarm logs. Because the record is incomplete, it is then difficult to analyze the attacks comprehensively and accurately or to assess network security correctly.

Summary of the invention

In view of the above problems, the present invention provides a data processing method and device. It solves the problem that, when a large number of attacks break out, alarm logs are lost, which impairs the analysis of network attack events and the assessment of network security.

To solve the above technical problem, in a first aspect, the present invention provides a data processing method, comprising:

obtaining the original alarm logs within a preset time period, the original alarm logs being behavioral data recording attacks on a network;

performing a first classification aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters;

performing a second classification aggregation on the plurality of initial alarm clusters according to the association relations in an attack graph to obtain a plurality of final alarm clusters, the attack graph being a connected graph, composed of the initial alarm clusters, that shows the network attack process within the preset time period;

merging the alarm logs contained in each of the final alarm clusters to obtain new alarm logs, and replacing the original alarm logs with the new alarm logs.

Optionally, before the second classification aggregation is performed on the plurality of initial alarm clusters according to the association relations in the attack graph, the method further comprises:

performing association analysis on the plurality of initial alarm clusters according to preset association rules, the preset association rules being set according to alarm categories and the causal relations between alarms;

generating the attack graph from the initial alarm clusters that have association relations, with the initial alarm clusters as vertices and the causal relations as edges, the attack graph being a directed acyclic connected graph.

Optionally, performing the first classification aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters comprises:

determining the set of all original alarm logs as the root node of a tree structure;

dividing the original alarm logs top-down, successively by three restriction conditions: the alarm name, the source Internet Protocol (IP) address and the destination IP address of the original alarm logs;

determining each leaf node of the divided tree structure as one initial alarm cluster.

Optionally, performing the second classification aggregation on the plurality of initial alarm clusters according to the association relations in the attack graph to obtain a plurality of final alarm clusters comprises:

selecting the initial alarm clusters that contain only one original alarm log and determining them as single alarm clusters;

judging whether, among all single alarm clusters divided under the same alarm name, there are single alarm clusters located in the same attack graph;

if so, aggregating the single alarm clusters located in the same attack graph;

determining the aggregated single alarm clusters and the initial alarm clusters containing multiple original alarm logs as the final alarm clusters.

Optionally, the method further comprises:

when dividing by source IP address, if a source IP address after division corresponds to only one original alarm log, fusing all such single original alarm logs under the same alarm name into one new node; and

dividing the new node by destination IP address;

when dividing by source IP address, if a source IP address after division corresponds to multiple original alarm logs, dividing the set of those original alarm logs directly, as a node, by destination IP address.

Optionally, performing association analysis on the plurality of initial alarm clusters according to the preset association rules comprises:

inputting the plurality of initial alarm clusters into a business rule engine in which the preset association rules are defined, so that the initial alarm clusters are matched against the preset association rules;

if the matching succeeds, establishing attack segments, each attack segment containing at least two initial alarm clusters having a causal relation;

generating the directed acyclic connected graph comprises:

forming an attack scenario set from all attack segments;

taking the initial alarm clusters in the attack scenario set as the vertices of the attack graph;

traversing the attack scenario set and adding directed edges between the vertices in turn to generate the directed acyclic connected graph.

In a second aspect, the present invention provides a data processing device, comprising:

an acquiring unit, configured to obtain the original alarm logs within a preset time period, the original alarm logs being behavioral data recording attacks on a network;

a first aggregation unit, configured to perform a first classification aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters;

a second aggregation unit, configured to perform a second classification aggregation on the plurality of initial alarm clusters according to the association relations in an attack graph to obtain a plurality of final alarm clusters, the attack graph being a connected graph, composed of the initial alarm clusters, that shows the network attack process within the preset time period;

a merging unit, configured to merge the alarm logs contained in each of the final alarm clusters to obtain new alarm logs, and to replace the original alarm logs with the new alarm logs.

Optionally, the device further comprises:

an association analysis unit, configured to perform association analysis on the plurality of initial alarm clusters according to preset association rules before the second classification aggregation is performed according to the association relations in the attack graph, the preset association rules being set according to alarm categories and the causal relations between alarms;

a generation unit, configured to generate the attack graph from the initial alarm clusters that have association relations, with the initial alarm clusters as vertices and the causal relations as edges, the attack graph being a directed acyclic connected graph.

Optionally, the first aggregation unit comprises:

a first determining module, configured to determine the set of all original alarm logs as the root node of a tree structure;

a dividing module, configured to divide the original alarm logs top-down, successively by three restriction conditions: the alarm name, the source IP address and the destination IP address of the original alarm logs;

a second determining module, configured to determine each leaf node of the divided tree structure as one initial alarm cluster.

Optionally, the second aggregation unit comprises:

a selecting module, configured to select the initial alarm clusters that contain only one original alarm log and determine them as single alarm clusters;

a judging module, configured to judge whether, among all single alarm clusters divided under the same alarm name, there are single alarm clusters located in the same attack graph;

an aggregation module, configured to aggregate, if so, the single alarm clusters located in the same attack graph;

a third determining module, configured to determine the aggregated single alarm clusters and the initial alarm clusters containing multiple original alarm logs as the final alarm clusters.

Optionally, the device further comprises:

a fusion unit, configured to operate when dividing by source IP address: if a source IP address after division corresponds to only one original alarm log, the unit fuses all such single original alarm logs under the same alarm name into one new node;

the dividing module is further configured to divide the new node by destination IP address;

the dividing module is further configured to operate when dividing by source IP address: if a source IP address after division corresponds to multiple original alarm logs, the module divides the set of those original alarm logs directly, as a node, by destination IP address.

Optionally, the association analysis unit comprises:

an input module, configured to input the plurality of initial alarm clusters into a business rule engine in which the preset association rules are defined, so that the initial alarm clusters are matched against the preset association rules;

an establishing module, configured to establish attack segments if the matching succeeds, each attack segment containing at least two initial alarm clusters having a causal relation;

the generation unit comprises:

a composing module, configured to form an attack scenario set from all attack segments;

an obtaining module, configured to take the initial alarm clusters in the attack scenario set as the vertices of the attack graph;

a generating module, configured to traverse the attack scenario set and add directed edges between the vertices in turn to generate the directed acyclic connected graph.

With the above technical solution, the data processing method and device provided by the invention perform a two-stage classification aggregation on the original alarm logs, dividing them into multiple final alarm clusters. They then merge the original alarm logs within each final alarm cluster. This classification and merging combines, class by class, both duplicate alarm logs and similar alarm logs that can be aggregated, which greatly reduces the storage space occupied. The log database can therefore better accommodate alarm logs when a large number of attacks break out, and discarding of alarm logs is avoided as far as possible. As a result, the attacks can afterwards be analyzed comprehensively and accurately and the network security assessed correctly.

The above is only an overview of the technical solution of the present invention. Specific embodiments are set out below. They make the technical means of the invention clearer, so that it can be implemented according to the description, and make the above and other objects, features and advantages of the invention more apparent.

Brief description of the drawings

Those of ordinary skill in the art will see various other advantages and benefits on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference numbers denote the same parts. In the drawings:

Fig. 1 shows a flowchart of a data processing method provided by an embodiment of the present invention;

Fig. 2 shows a flowchart of another data processing method provided by an embodiment of the present invention;

Fig. 3 shows a schematic diagram of the first classification aggregation performed on original alarm logs according to an embodiment of the present invention;

Fig. 4 shows a schematic diagram of several attack graphs provided by an embodiment of the present invention;

Fig. 5 shows a block diagram of the logical structure corresponding to a data processing method provided by an embodiment of the present invention;

Fig. 6 shows a block diagram of the composition of a data processing device provided by an embodiment of the present invention;

Fig. 7 shows a block diagram of the composition of another data processing device provided by an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. The drawings show exemplary embodiments, but the disclosure may be implemented in various forms and is not limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention is understood more thoroughly and the scope of the disclosure is fully conveyed to those skilled in the art.

An embodiment of the present invention provides a data processing method which, as shown in Fig. 1, comprises:

101. Obtain the original alarm logs within a preset time period.

In this embodiment, an original alarm log is a record of behavioral data describing an attack on a network, generated by the firewall while defending the network. The method is not limited to the firewall field, however. It can be used in any other network security protection system that generates alarm logs and needs to analyze network security from them.

Original alarm logs are usually recorded in a log database. Because the space of the log database is finite, usually only the original alarm logs from a certain period are retained. The preset time period in this step is therefore normally no longer than the duration for which the firewall can retain original alarm logs. For example, if the firewall keeps the original alarm logs of the most recent week, the preset time period may be one or several days within that week.

Each original alarm log mainly records six fields: the alarm name, the source Internet Protocol (IP) address, the destination IP address, the source port, the destination port and the alarm time. The source IP address and source port are the source information of the attacker. The destination IP address and destination port are the destination information of the attack, that is, information about the attacked party.
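To make the record layout concrete, the following Python sketch models one original alarm log with the six fields listed above and selects the logs that fall within a preset time period (step 101). The class name `AlarmLog`, its field names (including the added `log_id`), and the choice of a half-open interval [start, end) are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmLog:
    """One original alarm log record (hypothetical field names)."""
    log_id: str
    alarm_name: str
    src_ip: str           # source IP address of the attacker
    dst_ip: str           # destination IP address (attacked party)
    src_port: int
    dst_port: int
    alarm_time: datetime


def logs_in_period(logs, start, end):
    """Return the logs whose alarm time lies in the half-open period [start, end)."""
    return [log for log in logs if start <= log.alarm_time < end]
```

For example, with a firewall that retains one week of logs, `logs_in_period(logs, datetime(2017, 7, 19), datetime(2017, 7, 20))` would select a single day inside that week.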

102. Perform a first classification aggregation on the original alarm logs according to their core attributes to obtain a plurality of initial alarm clusters.

Every item recorded in an original alarm log is an attribute of that log. The core attributes are those that most clearly distinguish one original alarm log from another. In this embodiment the core attributes are the alarm name, the source IP address and the destination IP address.

The first classification aggregation performs a coarse classification of the original alarm logs by these core attributes. In the final classification result, the original alarm logs of each class are recorded as one initial alarm cluster, which may contain one or several original alarm logs. Note that the classification is performed layer by layer, one core attribute at a time, rather than on all three core attributes simultaneously. Classifying on all three at once could place the same original alarm log into several initial alarm clusters. Suppose, for example, that original alarm log A has the same alarm name as some logs but differs from them in the other core attributes. Suppose also that A has the same source IP address as a different group of logs but differs from them in the other core attributes. Log A would then be classified into two initial alarm clusters simultaneously. To avoid this, this embodiment classifies layer by layer, which ensures that each original alarm log belongs to exactly one initial alarm cluster.
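The difference between simultaneous and layer-by-layer classification can be sketched in Python as follows. Logs are plain dicts with hypothetical keys `id`, `alarm_name`, `src_ip` and `dst_ip`. The layered version is simplified to a nested key and omits the singleton-fusion step of the full method, so it only illustrates the one-cluster-per-log property.

```python
from collections import defaultdict

CORE_ATTRS = ("alarm_name", "src_ip", "dst_ip")


def group_simultaneously(logs):
    """Group on each core attribute independently.

    Returns log id -> list of (attribute, value) groups with two or more members
    that the log falls into; one log can land in several such groups.
    """
    membership = defaultdict(list)
    for attr in CORE_ATTRS:
        groups = defaultdict(list)
        for log in logs:
            groups[log[attr]].append(log["id"])
        for value, ids in groups.items():
            if len(ids) > 1:  # only groups that actually aggregate something
                for log_id in ids:
                    membership[log_id].append((attr, value))
    return dict(membership)


def group_layer_by_layer(logs):
    """Group by alarm name, then source IP, then destination IP (nested key).

    Every log ends up in exactly one cluster.
    """
    clusters = defaultdict(list)
    for log in logs:
        clusters[tuple(log[attr] for attr in CORE_ATTRS)].append(log["id"])
    return dict(clusters)
```

Suppose log A shares its alarm name with one log and its source IP with another. `group_simultaneously` then places A in two groups, whereas `group_layer_by_layer` places every log in exactly one cluster.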

103. Perform a second classification aggregation on the plurality of initial alarm clusters according to the association relations in the attack graph to obtain a plurality of final alarm clusters.

Merging the original alarm logs within each initial alarm cluster from the first classification aggregation would already greatly reduce the log database space occupied by the original alarm logs. To reduce it further, this embodiment performs a second classification aggregation. This stage mainly re-classifies and merges the initial alarm clusters that contain only one original alarm log. It classifies them according to the association relations between initial alarm clusters in the attack graph and aggregates the clusters that are associated. The attack graph is a connected graph, composed of multiple initial alarm clusters, that shows the network attack process within the preset time period.

The association relations between initial alarm clusters in the attack graph are determined by the chronological order or the causal relations of the attack steps. Each attack step normally produces one alarm. The chronological order and causal relations of the attack steps therefore correspond to the order in which the alarms occur and to the causal relations between the alarms. These orderings and causal relations are usually determined mainly from the experience of relevant technical personnel. For example, vulnerability scanning and buffer overflow attacks may follow IP address scanning; privilege escalation and trojan uploading may follow those; and a DoS attack may follow after that.
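Expert knowledge of this kind might be encoded as in the following minimal Python sketch. It uses a hypothetical table of which attack step may directly follow which, taken from the example above, and combines it with a check that the later alarm does not occur before the earlier one. The step names, the `MAY_FOLLOW` table and `is_associated` are assumptions for illustration only.

```python
# Hypothetical expert knowledge, following the example in the text:
# the attack steps that may directly follow a given attack step.
MAY_FOLLOW = {
    "IP address scan": {"Vulnerability scan", "Buffer overflow"},
    "Vulnerability scan": {"Privilege escalation", "Trojan upload"},
    "Buffer overflow": {"Privilege escalation", "Trojan upload"},
    "Privilege escalation": {"DoS attack"},
    "Trojan upload": {"DoS attack"},
}


def is_associated(step_a, time_a, step_b, time_b):
    """True if step_b may be caused by step_a and does not occur before it."""
    return step_b in MAY_FOLLOW.get(step_a, set()) and time_a <= time_b
```

Keeping the knowledge in a plain table means security staff can extend it without touching the matching logic.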

104. Merge the alarm logs contained in each of the final alarm clusters to obtain new alarm logs.

After the first and second classification aggregations, some of the resulting final alarm clusters may still contain only one original alarm log, but most contain several. A final alarm cluster with only one original alarm log needs no processing. One with several original alarm logs must be merged.

The merging works as follows. First, the attributes of a final alarm cluster are defined. They mainly include the alarm name, source IP address, source IP address set, destination IP address, destination IP address set, source port, source port set, destination port, destination port set, and the earliest and latest timestamps of the included alarms. During merging, every attribute except the two timestamps is handled the same way. If the attribute has the same value in all original alarm logs, that real value is used as the attribute value. If the values differ, the different values are recorded as a set, and the attribute itself may be set to a preset fixed value. For example, suppose the original alarm logs in a final alarm cluster have different destination IP addresses. The destination IP address attribute can then be set to 0.0.0.0, and the real destination IP addresses recorded as a set in the destination IP address set attribute. Likewise, if they have different source ports, the source port attribute can be set to -1, and the real source port values recorded as a set in the source port set attribute. The preset fixed value may be any value that the attribute itself cannot take, which ensures it never conflicts with a real attribute value.

The earliest and latest timestamps of the included alarms are the earliest and latest alarm times among all original alarm logs in the final alarm cluster. Together they indicate the time period covered by the final alarm cluster. After merging is complete, the merged new alarm logs replace the original alarm logs.
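The merging rule can be sketched as below. This is a hypothetical Python helper. It assumes each original alarm log is a dict with the keys `alarm_name`, `src_ip`, `dst_ip`, `src_port`, `dst_port` and a comparable `alarm_time` (for example epoch seconds). The `_set` suffix for set-valued attributes, stored here as sorted lists, is an illustrative naming choice. The preset values 0.0.0.0 and -1 are the ones from the example above.

```python
UNKNOWN_IP = "0.0.0.0"  # preset value meaning "several different IP addresses"
UNKNOWN_PORT = -1       # preset value meaning "several different ports"

# Attribute -> preset value used when the logs disagree on that attribute.
MERGEABLE = {
    "src_ip": UNKNOWN_IP,
    "dst_ip": UNKNOWN_IP,
    "src_port": UNKNOWN_PORT,
    "dst_port": UNKNOWN_PORT,
}


def merge_cluster(logs):
    """Merge the original alarm logs of one final alarm cluster into one new log."""
    names = {log["alarm_name"] for log in logs}
    if len(names) != 1:
        raise ValueError("logs in a final alarm cluster should share one alarm name")
    if len(logs) == 1:
        return dict(logs[0])                        # single-log cluster: left unchanged
    merged = {"alarm_name": names.pop()}
    for attr, preset in MERGEABLE.items():
        values = {log[attr] for log in logs}
        if len(values) == 1:
            merged[attr] = values.pop()             # all equal: keep the real value
        else:
            merged[attr] = preset                   # values differ: preset value ...
            merged[attr + "_set"] = sorted(values)  # ... plus the real values
    times = [log["alarm_time"] for log in logs]
    merged["earliest_time"] = min(times)            # time span covered by the cluster
    merged["latest_time"] = max(times)
    return merged
```

Recording the real values alongside the preset value keeps the merge lossless for these attributes. The original logs can then be dropped without losing which addresses or ports were involved.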

The data processing method provided by this embodiment performs a two-stage classification aggregation on the original alarm logs, first by core attributes and then by attack graph, dividing them into multiple final alarm clusters. It then merges the original alarm logs within each final alarm cluster. This classification and merging combines, class by class, both duplicate alarm logs and similar alarm logs that can be aggregated, which greatly reduces the space occupied. Alarm logs can therefore be better accommodated when a large number of attacks break out, and discarding of alarm logs is avoided as far as possible. As a result, the attacks can afterwards be analyzed comprehensively and accurately and the network security assessed correctly.

As a refinement and extension of the method shown in Fig. 1, an embodiment of the present invention further provides a data processing method, shown in Fig. 2:

201. Obtain the original alarm logs within a preset time period.

This step is implemented in the same way as step 101 in Fig. 1 and is not described again here.

202. Divide the original alarm logs top-down, successively by three restriction conditions: the alarm name, the source IP address and the destination IP address of the original alarm logs.

This embodiment uses a tree structure, and the division proceeds as follows:

(1) The set of all original alarm logs is determined as the root node of the tree.

(2) The root node is first divided by alarm name, i.e., all original alarm logs of the root node are classified by alarm name. After division, each branch node of the root node forms a first-layer alarm cluster, and all original alarm logs in a first-layer alarm cluster have the same alarm name.

(3) Each branch node produced in step (2) is divided further as a new root node, i.e., the original alarm logs in each branch node from step (2) are classified by their source IP addresses. After division, each new branch node forms a second-layer alarm cluster, and all original alarm logs in a second-layer alarm cluster have the same alarm name and source IP address.

(4) When a second-layer alarm cluster contains more than one original alarm log, its new branch node from step (3) is divided directly, i.e., the original alarm logs in that second-layer alarm cluster are classified by destination IP address. When a second-layer alarm cluster contains exactly one original alarm log, all such single-log second-layer alarm clusters under the same alarm name are fused into one new branch node. That node is then divided by destination IP address.
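Steps (1) to (4) can be sketched in Python as follows, assuming logs are dicts with hypothetical keys `id`, `alarm_name`, `src_ip` and `dst_ip`. The function returns the leaf nodes, i.e., the initial alarm clusters of step 203, as lists of log ids. It does not build an explicit tree object.

```python
from collections import defaultdict


def first_aggregation(logs):
    """Divide logs top-down by alarm name, source IP and destination IP.

    Returns the leaf nodes (initial alarm clusters) as lists of log ids.
    """
    clusters = []
    by_name = defaultdict(list)
    for log in logs:
        by_name[log["alarm_name"]].append(log)            # layer 1: alarm name
    for name_logs in by_name.values():
        by_src = defaultdict(list)
        for log in name_logs:
            by_src[log["src_ip"]].append(log)             # layer 2: source IP
        nodes = [group for group in by_src.values() if len(group) > 1]
        singles = [group[0] for group in by_src.values() if len(group) == 1]
        if singles:
            nodes.append(singles)  # step (4): fuse the single-log source-IP nodes
        for node in nodes:
            by_dst = defaultdict(list)
            for log in node:
                by_dst[log["dst_ip"]].append(log["id"])   # layer 3: destination IP
            clusters.extend(by_dst.values())              # leaf nodes
    return clusters
```

In this sketch, logs from different source IPs can share an initial alarm cluster only through the fused node, which mirrors step (4).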

A concrete example, shown in Fig. 3, clarifies this division. Set R in Fig. 3 is the root node described in step (1). Its elements a, b, c, d, ..., z are all the original alarm logs obtained within the preset time period. R is divided by alarm name into multiple branch nodes, one per distinct alarm name; in Fig. 3 these are alarm name A, alarm name B, ..., alarm name X. The original alarm logs under each alarm name form one first-layer alarm cluster. The branch node of each alarm name is then divided by source IP address. Every branch node is divided downward in the same way, so the rest of the division is explained using the branch node of alarm name A. As shown in Fig. 3, dividing this branch node yields 6 different source IP addresses, and the original alarm logs under each source IP address form one second-layer alarm cluster. Of these 6 second-layer alarm clusters, source IP 3, source IP 4, source IP 5 and source IP 6 each contain one original alarm log. These 4 clusters are therefore fused into one node, which is then divided further by destination IP address. The remaining second-layer alarm clusters, such as source IP 1, contain more than one original alarm log and are divided directly by destination IP address. Dividing by destination IP address finally yields 6 leaf nodes, destination IP 1 through destination IP 6, which completes the division.

203. Determine each leaf node of the divided tree structure as one initial alarm cluster.

In a tree structure, leaf nodes are terminal nodes. In the example of Fig. 3, the leaf nodes are destination IP 1 through destination IP 6, and the original alarm logs in each leaf node form one initial alarm cluster.

The process from original alarm logs to initial alarm clusters constitutes the first classification aggregation of the original alarm logs.

204. Perform association analysis on the plurality of initial alarm clusters according to preset association rules.

To further reduce the log database space occupied, this embodiment performs a second classification aggregation on the initial alarm clusters, classifying them according to the association relations between them. These association relations are determined by the attack graph, which is generated from the preset association rules. This step and step 205 explain how the attack graph is generated.

First, the initial alarm clusters are input into a business rule engine, which matches them against the preset association rules defined in it; this constitutes the association analysis. The preset association rules are set according to alarm categories and the causal relations between alarms. The causal relations between alarms follow from the causal relations of the attack steps during an attack. Alarm categories are determined mainly by the alarm name, and different alarm names may involve the same attack step. To avoid associating attack steps that belong to different alarm names, the preset association rules must therefore also take the alarm category into account. The rules should additionally consider the time at which alarms occur, which is a further reference for the causal relations between alarms. As in step 103 of Fig. 1, the chronological order or causal relations of attack steps in this step are usually determined mainly from the experience of relevant technical personnel. The business rule engine in this embodiment is JBoss Rules. In practice it may be another open-source rule engine or a commercial engine developed by a security vendor.

If matching the initial alarm clusters against the preset association rules yields successfully matched clusters, attack segments are established. The preset association rules defined in the business rule engine specify causal relations between alarms. Successfully matched initial alarm clusters are therefore causally related, and so are the initial alarm clusters within an established attack segment. Each attack segment contains at least two causally related initial alarm clusters and represents one attack scenario. Suppose an attack scenario is $(x_1, x_2, \ldots, x_n)$, where each element represents one attack step. Then $x_i$ is the preceding attack of $x_{i+1}$ and prepares for attack step $x_{i+1}$. Conversely, $x_{i+1}$ is the follow-up attack of $x_i$ and is the result of attack $x_i$. Together, $x_i$ and $x_{i+1}$ form a sub-attack-scenario that describes the causal relation between the two attack steps.
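The rule matching and segment construction might be sketched as below. This is not the JBoss Rules API. It is a plain Python stand-in in which each hypothetical rule names a cause category, an effect category and a maximum time gap. Each matched pair of initial alarm clusters is returned as a two-step attack segment, i.e., a sub-attack-scenario $(x_i, x_{i+1})$. The `Cluster` type, the category names and the rule values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """An initial alarm cluster reduced to what the hypothetical rules need."""
    cid: str
    category: str    # alarm category, derived from the alarm name
    first_time: int  # earliest alarm time in the cluster, in seconds


# Hypothetical association rules: (cause category, effect category, max gap in seconds).
RULES = [
    ("scan", "exploit", 3600),
    ("exploit", "escalation", 3600),
]


def attack_segments(clusters, rules=RULES):
    """Return two-step attack segments (cause id, effect id) for every rule match."""
    segments = []
    for a in clusters:
        for b in clusters:
            if a.cid == b.cid:
                continue
            gap = b.first_time - a.first_time
            for cause, effect, max_gap in rules:
                # categories must match the rule; the effect may not precede the cause
                if (a.category, b.category) == (cause, effect) and 0 <= gap <= max_gap:
                    segments.append((a.cid, b.cid))
                    break
    return segments
```

The pairwise loop is quadratic in the number of clusters. A production rule engine would index facts instead, but the quadratic version keeps the matching logic visible.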

205. From the initial alarm clusters that have association relationships, generate the attack graph, with the initial alarm clusters as vertices and the causal relationships as edges.

Step 204 established that all initial alarm clusters within one attack segment are related. All attack segments matched in step 204 form an attack scenario set, which is simply the set of attack segments. The initial alarm clusters in the attack scenario set are then taken as the vertices of the attack graph. Finally, the attack scenario set is traversed, and directed edges are added between the vertices in turn to generate a directed acyclic connected graph. Fig. 4 shows several attack graphs: attack graph (1) consists of three initial alarm clusters, and attack graph (2) consists of two. In an attack graph, a node with out-degree 0 represents the attacker's final attack step, and a node with in-degree 0 represents the initial attack step. The attack graph generated in this embodiment serves as the basis for the subsequent second classification aggregation. It also presents the attack process visually, which makes the attacker's strategy easier to analyze.
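The following is a minimal sketch of this construction. It assumes that each attack segment is an ordered tuple of initial-alarm-cluster IDs, and then:

- collects the vertices and directed edges from the attack scenario set;
- checks acyclicity with Kahn's topological sort;
- splits the result into weakly connected components, one per attack graph, as in Fig. 4;
- reports the in-degree-0 (initial) and out-degree-0 (final) steps.

The cluster IDs and helper names are hypothetical.

```python
from collections import defaultdict, deque


def build_attack_graph(scenario_set):
    """Vertices are cluster IDs; each consecutive pair in a segment is a directed edge."""
    vertices, edges = set(), set()
    for segment in scenario_set:
        vertices.update(segment)
        edges.update(zip(segment, segment[1:]))
    return vertices, edges


def assert_acyclic(vertices, edges):
    """Kahn's algorithm: raise if a cycle remains after removing in-degree-0 nodes."""
    indeg = {v: 0 for v in vertices}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(vertices):
        raise ValueError("attack graph contains a cycle")


def attack_graphs(vertices, edges):
    """Split the vertices into weakly connected components, one per attack graph."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(neighbours[node] - comp)
        seen |= comp
        components.append(comp)
    return components


def initial_and_final_steps(vertices, edges):
    """In-degree 0: initial attack steps; out-degree 0: final attack steps."""
    has_in = {v for _, v in edges}
    has_out = {u for u, _ in edges}
    return sorted(vertices - has_in), sorted(vertices - has_out)


segments = [("A", "B"), ("B", "C"), ("D", "E")]
V, E = build_attack_graph(segments)
assert_acyclic(V, E)
print(attack_graphs(V, E))            # two graphs: {A, B, C} and {D, E}
print(initial_and_final_steps(V, E))  # (['A', 'D'], ['C', 'E'])
```

The two components here have three and two vertices, echoing the shapes of attack graphs (1) and (2) in Fig. 4. The data itself is invented.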

206. Select each initial alarm cluster that contains only one original alarm log and designate it a single alarm cluster.

207. Aggregate the single alarm clusters that are located in the same attack graph.

The single alarm clusters aggregated in this embodiment must have the same alert name. Aggregating the single alarm clusters located in the same attack graph therefore works as follows. Among all single alarm clusters divided under the same alert name, determine whether any are located in the same attack graph; if so, aggregate them. In the example of Fig. 3, the single alarm clusters finally obtained under alert name A are those for destination IP2, destination IP3, destination IP5 and destination IP6. They contain the single original alarm logs c, d, g and h, respectively. Assume that, for this example, the only resulting attack graph involving single alarm logs is attack graph (2) in Fig. 4. Its two vertices represent the initial alarm clusters in Fig. 3 that contain original alarm logs g and h. Because g and h lie in the same attack graph, original alarm log g can be aggregated with original alarm log h; that is, the initial alarm clusters for destination IP5 and destination IP6 are aggregated.

208. Determine the aggregated single alarm clusters, together with the initial alarm clusters that contain multiple original alarm logs, as the final alarm clusters.
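Steps 206 to 208 can be sketched as follows. The sketch assumes that initial alarm clusters are given as a mapping from cluster ID to (alert name, list of log IDs), and that attack graphs are given as sets of cluster IDs. The text does not say how single alarm clusters that fall in no attack graph are treated. As an assumption, this sketch keeps them as final clusters on their own. The multi-log cluster "IP1" with logs a and b is invented for illustration.

```python
from collections import defaultdict


def second_aggregation(initial_clusters, attack_graphs):
    """Return the final alarm clusters as lists of original-log IDs."""
    graph_of = {cid: i for i, graph in enumerate(attack_graphs) for cid in graph}
    final = []
    groups = defaultdict(list)  # (alert name, attack-graph index) -> log IDs
    for cid, (alert_name, logs) in initial_clusters.items():
        if len(logs) > 1:
            final.append(list(logs))  # step 208: multi-log clusters are kept as-is
        elif cid in graph_of:
            # steps 206/207: single clusters with the same name in the same graph
            groups[(alert_name, graph_of[cid])].append(logs[0])
        else:
            final.append(list(logs))  # assumption: unmatched single clusters stay alone
    final.extend(groups.values())
    return final


# Fig. 3 example: destination IP5 (log g) and IP6 (log h) share attack graph (2)
initial = {
    "IP1": ("A", ["a", "b"]),
    "IP2": ("A", ["c"]),
    "IP3": ("A", ["d"]),
    "IP5": ("A", ["g"]),
    "IP6": ("A", ["h"]),
}
print(second_aggregation(initial, [{"IP5", "IP6"}]))
# [['a', 'b'], ['c'], ['d'], ['g', 'h']]
```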

209. Merge the alarm logs contained in each of the multiple final alarm clusters to obtain new alarm logs.

This step is implemented in the same way as step 104 in Fig. 1, and the details are not repeated here.

For the method flow described in Fig. 2, this embodiment provides a corresponding logical structure, shown in Fig. 5. The flow runs as follows:

1. The original alarm logs within the preset period are obtained from the log database.
2. The first classification aggregation is performed on these logs, yielding multiple initial alarm clusters. (The "double classification aggregation of the original alarm logs" shown in the figure comprises the first and second classification aggregations of Fig. 2.)
3. Association analysis is performed on the initial alarm clusters: they are input into the business rule engine and matched against the preset correlation rules, yielding attack segments made up of initial alarm clusters.
4. The attack segments form the attack graph.
5. Using the attack graph as its basis, the second classification aggregation produces the final alarm clusters.
6. The final alarm clusters are merged, and the results are written back to the log database, replacing the original alarm logs.

Further, as an implementation of the above embodiments, another embodiment of the present invention provides a data processing device for implementing the methods described in Fig. 1 and Fig. 2. As shown in Fig. 6, the device includes an acquiring unit 31, a first aggregation unit 32, a second aggregation unit 33 and a merging unit 34.

The acquiring unit 31 is configured to obtain the original alarm logs within a preset period, the original alarm logs being behavior data that record attacks on a network;

Each original alarm log mainly records the alert name, source IP address, destination IP address, source port, destination port and alarm time. The source IP address and source port are the attacker's source information. The destination IP address and destination port are the destination information of the attack, that is, information about the attacked party.
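As a sketch only, the record below models these fields in Python. The comma-separated input layout and the field names are hypothetical; the text does not define a storage format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmLog:
    log_id: str
    alert_name: str
    src_ip: str          # attacker side
    dst_ip: str          # attacked side
    src_port: int
    dst_port: int
    alarm_time: datetime

    @property
    def attacker(self):
        """Source information of the attacker."""
        return (self.src_ip, self.src_port)

    @property
    def victim(self):
        """Destination information, i.e. the attacked party."""
        return (self.dst_ip, self.dst_port)


def parse_alarm(line):
    """Parse a hypothetical 'id,name,src_ip,dst_ip,src_port,dst_port,ISO-time' record."""
    log_id, name, sip, dip, sport, dport, ts = (f.strip() for f in line.split(","))
    return AlarmLog(log_id, name, sip, dip, int(sport), int(dport),
                    datetime.fromisoformat(ts))


log = parse_alarm("g,SQL Injection,10.0.0.7,192.168.1.5,51234,80,2017-07-19T10:15:00")
print(log.attacker, log.victim)  # ('10.0.0.7', 51234) ('192.168.1.5', 80)
```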

The first aggregation unit 32 is configured to perform a first classification aggregation on the original alarm logs according to their core attributes, obtaining multiple initial alarm clusters;

The second aggregation unit 33 is configured to perform a second classification aggregation on the multiple initial alarm clusters according to the association relationships in an attack graph, obtaining multiple final alarm clusters. The attack graph is a connected graph, composed of the multiple initial alarm clusters, that presents the network attack process within the preset period;

The merging unit 34 is configured to merge the alarm logs contained in each of the multiple final alarm clusters to obtain new alarm logs, and to replace the original alarm logs with the new alarm logs.
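The text does not define which fields a merged alarm log keeps; it defers to step 104 of Fig. 1. Purely as an illustration, the sketch below assumes a merged record that keeps:

- the distinct alert names and addresses;
- the earliest and latest alarm times;
- a count of the original logs it replaces.

It then removes the originals from an in-memory store. The record layout and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmLog:
    log_id: str
    alert_name: str
    src_ip: str
    dst_ip: str
    alarm_time: datetime


@dataclass(frozen=True)
class MergedAlarm:
    alert_names: tuple
    src_ips: tuple
    dst_ips: tuple
    first_time: datetime
    last_time: datetime
    count: int  # how many original logs this record replaces


def merge_cluster(logs):
    """Collapse one final alarm cluster into a single new alarm log."""
    if not logs:
        raise ValueError("cannot merge an empty cluster")
    return MergedAlarm(
        alert_names=tuple(sorted({log.alert_name for log in logs})),
        src_ips=tuple(sorted({log.src_ip for log in logs})),
        dst_ips=tuple(sorted({log.dst_ip for log in logs})),
        first_time=min(log.alarm_time for log in logs),
        last_time=max(log.alarm_time for log in logs),
        count=len(logs),
    )


def replace_originals(store, final_clusters):
    """Remove each cluster's original logs from `store` and return the merged records."""
    return [merge_cluster([store.pop(i) for i in cluster]) for cluster in final_clusters]


store = {
    "g": AlarmLog("g", "A", "10.0.0.7", "10.1.0.5", datetime(2017, 7, 19, 10, 0)),
    "h": AlarmLog("h", "A", "10.0.0.8", "10.1.0.6", datetime(2017, 7, 19, 10, 5)),
}
merged = replace_originals(store, [["g", "h"]])
print(merged[0].count, merged[0].dst_ips, len(store))  # 2 ('10.1.0.5', '10.1.0.6') 0
```

Keeping a count and a time range preserves some evidence of how many alarms fired while storing only one record, which matches the space-saving goal stated in the text. What a real deployment should retain is a design decision the text leaves open.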

As shown in Fig. 7, the device further includes:

An association analysis unit 35, configured to perform association analysis on the multiple initial alarm clusters according to preset correlation rules before the second classification aggregation is performed on them according to the association relationships in the attack graph, the preset correlation rules being set according to alert categories and the causal relationships between alarms;

To perform association analysis according to the preset correlation rules, the multiple initial alarm clusters are input into a business rule engine, which matches them against the preset correlation rules defined in it. The preset correlation rules are set according to alert categories and the causal relationships between alarms. The causal relationship between two alarms is determined by the causal relationship between the corresponding steps of the attack. Alert categories are determined mainly by alert name. Because different alert names may contain identical attack steps, the rules must also take alert categories into account, so that attack steps belonging to different alert names are not wrongly associated. The rules should also take into account the time at which each alarm occurred, because the time of occurrence is a further reference for the causal relationship between alarms.

A generation unit 36, configured to generate the attack graph from the initial alarm clusters having association relationships, with the initial alarm clusters as vertices and the causal relationships as edges, the attack graph being a directed acyclic connected graph.

The association analysis unit 35 establishes that all initial alarm clusters within one attack segment are related. All attack segments matched by the association analysis unit 35 form an attack scenario set, which is simply the set of attack segments. The initial alarm clusters in the attack scenario set are then taken as the vertices of the attack graph. Finally, the attack scenario set is traversed, and directed edges are added between the vertices in turn to generate a directed acyclic connected graph.

As shown in Fig. 7, the first aggregation unit 32 includes:

A first determining module 321, configured to take the set of all original alarm logs as the root node of a tree;

A dividing module 322, configured to divide the original alarm logs successively, using a top-down strategy, according to three constraints: the alert name, the source Internet Protocol (IP) address and the destination IP address of the original alarm logs;

A second determining module 323, configured to determine each leaf node of the divided tree as an initial alarm cluster.
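Below is a recursive sketch of this top-down division. It assumes each original alarm log is a dict with the hypothetical keys 'id', 'name', 'src' and 'dst'. The full list of logs is the root, each level splits on the next constraint, and the returned leaves are the initial alarm clusters. This plain version omits the single-log fusion performed by the fusion unit 37.

```python
from collections import defaultdict


def split_top_down(logs, keys=("name", "src", "dst"), path=()):
    """Return {(name, src, dst): [log ids]}, one entry per leaf of the tree."""
    if not keys:
        return {path: [log["id"] for log in logs]}  # leaf -> initial alarm cluster
    children = defaultdict(list)
    for log in logs:
        children[log[keys[0]]].append(log)
    leaves = {}
    for value, subset in children.items():
        leaves.update(split_top_down(subset, keys[1:], path + (value,)))
    return leaves


logs = [
    {"id": "a", "name": "A", "src": "s1", "dst": "d1"},
    {"id": "b", "name": "A", "src": "s1", "dst": "d1"},
    {"id": "c", "name": "A", "src": "s2", "dst": "d2"},
    {"id": "d", "name": "B", "src": "s1", "dst": "d1"},
]
print(split_top_down(logs))
# {('A', 's1', 'd1'): ['a', 'b'], ('A', 's2', 'd2'): ['c'], ('B', 's1', 'd1'): ['d']}
```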

As shown in Fig. 7, the second aggregation unit 33 includes:

A selecting module 331, configured to select each initial alarm cluster containing only one original alarm log and determine it as a single alarm cluster;

A judging module 332, configured to judge whether, among all single alarm clusters divided under the same alert name, there are single alarm clusters located in the same attack graph;

An aggregation module 333, configured to aggregate the single alarm clusters located in the same attack graph, if any exist;

A third determining module 334, configured to determine the aggregated single alarm clusters and the initial alarm clusters containing multiple original alarm logs as the final alarm clusters.

As shown in Fig. 7, the device further includes:

A fusion unit 37, configured to act when dividing by source IP address: if, after division, a source IP address corresponds to only one original alarm log, all such single original alarm logs under the same alert name are fused into one new node;

The dividing module 322 is further configured to divide the new node according to destination IP address;

The dividing module 322 is further configured to act when dividing by source IP address: if, after division, the same source IP address corresponds to multiple original alarm logs, the set of those logs is divided directly as a node according to destination IP address.
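The fusion rule can be added to the division as in the sketch below, again assuming dict logs with the hypothetical keys 'id', 'name', 'src' and 'dst'. Under each alert name, logs whose source IP occurs only once are pooled into one fused node before the destination-IP split. The fused node is labelled '*' here, an arbitrary placeholder. As a result, single logs from different sources that hit the same destination end up in one initial alarm cluster.

```python
from collections import defaultdict


def group_by(logs, key):
    """Group logs by the value of `key`, preserving first-seen order."""
    groups = defaultdict(list)
    for log in logs:
        groups[log[key]].append(log)
    return groups


def split_with_fusion(logs):
    """Return {(name, src or '*', dst): [log ids]} following the fusion rule."""
    leaves = {}
    for name, name_logs in group_by(logs, "name").items():
        nodes, fused = [], []
        for src, src_logs in group_by(name_logs, "src").items():
            if len(src_logs) == 1:
                fused.extend(src_logs)         # single log for this source IP
            else:
                nodes.append((src, src_logs))  # multi-log node is divided directly
        if fused:
            nodes.append(("*", fused))         # one new node per alert name
        for src_label, node_logs in nodes:
            for dst, dst_logs in group_by(node_logs, "dst").items():
                leaves[(name, src_label, dst)] = [log["id"] for log in dst_logs]
    return leaves


logs = [
    {"id": "a", "name": "A", "src": "s1", "dst": "d1"},
    {"id": "b", "name": "A", "src": "s1", "dst": "d1"},
    {"id": "c", "name": "A", "src": "s2", "dst": "d5"},
    {"id": "d", "name": "A", "src": "s3", "dst": "d5"},
    {"id": "e", "name": "A", "src": "s4", "dst": "d6"},
]
print(split_with_fusion(logs))
# {('A', 's1', 'd1'): ['a', 'b'], ('A', '*', 'd5'): ['c', 'd'], ('A', '*', 'd6'): ['e']}
```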

As shown in Fig. 7, the association analysis unit 35 includes:

An input module 351, configured to input the multiple initial alarm clusters into a business rule engine so that they are matched against the preset correlation rules defined in the business rule engine;

The business rule engine in this embodiment is JBoss Rules; in practice, other open-source rule engines, or commercial engines developed in-house by security vendors, may also be used.

An establishing module 352, configured to establish an attack segment if matching succeeds, each attack segment containing at least two causally related initial alarm clusters;

The generation unit 36 includes:

A composing module 361, configured to form all attack segments into an attack scenario set;

An obtaining module 362, configured to take the initial alarm clusters in the attack scenario set as the vertices of the attack graph;

A generation module 363, configured to traverse the attack scenario set and add directed edges between the vertices in turn, generating a directed acyclic connected graph.

The data processing device provided by the embodiments of the present invention performs a double classification aggregation on the original alarm logs: the first according to their core attributes, the second according to the attack graph. This divides the original alarm logs into multiple final alarm clusters, and the original alarm logs within each final alarm cluster are then merged. Classifying and merging in this way combines duplicate alarm logs, as well as similar alarm logs that can be aggregated under the different classifications, which greatly reduces the storage space occupied. Alarm logs can therefore be better received during a large burst of attacks, and discarding of alarm logs is avoided as far as possible. As a result, the attacks that occurred can be analyzed comprehensively and accurately afterwards, and network security can be assessed correctly.

In the above embodiments, each description has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

It will be understood that related features of the above method and device may be cross-referenced. In addition, "first", "second" and the like in the above embodiments are used only to distinguish the embodiments and do not indicate that one is better than another.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above are given by the corresponding processes in the foregoing method embodiments; details are not repeated here.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. The invention described herein may be implemented in various programming languages, and the above description of a specific language is given to disclose the best mode of the invention.

The description provided here sets forth numerous specific details. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together into a single embodiment, figure or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may also be divided into multiple sub-modules, sub-units or sub-components. All features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of such features and/or processes or units are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.

Furthermore, some embodiments described herein include certain features found in other embodiments but not others. Those skilled in the art will appreciate that combinations of features of different embodiments are nonetheless within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or digital signal processor (DSP) may be used to implement some or all functions of some or all components of the data processing device according to embodiments of the invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.

Claims

1. A data processing method, characterized by comprising:

obtaining original alarm logs within a preset period, the original alarm logs being behavior data that record attacks on a network;

performing a first classification aggregation on the original alarm logs according to core attributes of the original alarm logs to obtain multiple initial alarm clusters;

performing a second classification aggregation on the multiple initial alarm clusters according to association relationships in an attack graph to obtain multiple final alarm clusters, the attack graph being a connected graph, composed of the multiple initial alarm clusters, that presents the network attack process within the preset period;

merging the alarm logs contained in each of the multiple final alarm clusters to obtain new alarm logs, and replacing the original alarm logs with the new alarm logs.

2. The method according to claim 1, characterized in that, before the second classification aggregation is performed on the multiple initial alarm clusters according to the association relationships in the attack graph, the method further comprises:

performing association analysis on the multiple initial alarm clusters according to preset correlation rules, the preset correlation rules being set according to alert categories and causal relationships between alarms;

generating the attack graph from the initial alarm clusters having association relationships, with the initial alarm clusters as vertices and the causal relationships as edges, the attack graph being a directed acyclic connected graph.

3. The method according to claim 1, characterized in that performing the first classification aggregation on the original alarm logs according to the core attributes of the original alarm logs to obtain multiple initial alarm clusters comprises:

dividing the original alarm logs successively, using a top-down strategy, according to three constraints: the alert name, the source Internet Protocol (IP) address and the destination IP address of the original alarm logs;

4. The method according to claim 3, characterized in that performing the second classification aggregation on the multiple initial alarm clusters according to the association relationships in the attack graph to obtain multiple final alarm clusters comprises:

judging whether, among all single alarm clusters divided under the same alert name, there are single alarm clusters located in the same attack graph;

determining the aggregated single alarm clusters and the initial alarm clusters containing multiple original alarm logs as the final alarm clusters.

5. The method according to claim 3, characterized in that the method further comprises:

when dividing by source IP address, if, after division, a source IP address corresponds to only one original alarm log, fusing all such single original alarm logs under the same alert name into one new node; and

dividing the new node according to destination IP address;

when dividing by source IP address, if, after division, the same source IP address corresponds to multiple original alarm logs, dividing the set of those original alarm logs directly as a node according to destination IP address.

6. The method according to claim 2, characterized in that performing association analysis on the multiple initial alarm clusters according to the preset correlation rules comprises:

inputting the multiple initial alarm clusters into a business rule engine so that they are matched against the preset correlation rules, the preset correlation rules being defined in the business rule engine;

and generating the directed acyclic connected graph comprises:

forming all attack segments into an attack scenario set;

7. A data processing device, characterized by comprising:

an acquiring unit, configured to obtain original alarm logs within a preset period, the original alarm logs being behavior data that record attacks on a network;

a first aggregation unit, configured to perform a first classification aggregation on the original alarm logs according to core attributes of the original alarm logs to obtain multiple initial alarm clusters;

a second aggregation unit, configured to perform a second classification aggregation on the multiple initial alarm clusters according to association relationships in an attack graph to obtain multiple final alarm clusters, the attack graph being a connected graph, composed of the multiple initial alarm clusters, that presents the network attack process within the preset period;

a merging unit, configured to merge the alarm logs contained in each of the multiple final alarm clusters to obtain new alarm logs, and to replace the original alarm logs with the new alarm logs.

8. The device according to claim 7, characterized in that the device further comprises:

an association analysis unit, configured to perform association analysis on the multiple initial alarm clusters according to preset correlation rules before the second classification aggregation is performed on them according to the association relationships in the attack graph, the preset correlation rules being set according to alert categories and causal relationships between alarms;

a generation unit, configured to generate the attack graph from the initial alarm clusters having association relationships, with the initial alarm clusters as vertices and the causal relationships as edges, the attack graph being a directed acyclic connected graph.

9. The device according to claim 7, characterized in that the first aggregation unit comprises:

a dividing module, configured to divide the original alarm logs successively, using a top-down strategy, according to three constraints: the alert name, the source Internet Protocol (IP) address and the destination IP address of the original alarm logs;

10. The device according to claim 9, characterized in that the second aggregation unit comprises:

a judging module, configured to judge whether, among all single alarm clusters divided under the same alert name, there are single alarm clusters located in the same attack graph;

a third determining module, configured to determine the aggregated single alarm clusters and the initial alarm clusters containing multiple original alarm logs as the final alarm clusters.