CN114978778B

CN114978778B - Multi-step attack detection method, device and equipment based on causal inference

Info

Publication number: CN114978778B
Application number: CN202210914607.5A
Authority: CN
Inventors: 黄亭
Original assignee: Beijing 6Cloud Technology Co Ltd; Beijing 6Cloud Information Technology Co Ltd
Current assignee: Beijing 6Cloud Technology Co Ltd; Beijing 6Cloud Information Technology Co Ltd
Priority date: 2022-08-01
Filing date: 2022-08-01
Publication date: 2022-10-28
Anticipated expiration: 2042-08-01
Also published as: CN114978778A

Abstract

The invention relates to the technical field of attack detection and provides a multi-step attack detection method, device and equipment based on causal inference. The method comprises the following steps: aggregating the alarm samples in an alarm sample set that meet a temporal aggregation degree into super alarms, and partitioning according to the time sequence of the super alarms; constructing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics, within a time window, of the data related to the features of the attack types in each partition, in the next partition, and at the boundary between the two partitions; and combining a detection event with the historical events in a preset time period, matching the combinations against the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result. The implementation provided by the invention can improve the efficiency and accuracy of multi-step attack detection.

Description

Multi-step attack detection method, device and equipment based on causal inference

Technical Field

The invention relates to the technical field of data processing, in particular to a multi-step attack detection method based on causal inference, a multi-step attack detection device based on causal inference, electronic equipment and a computer readable storage medium.

Background

In a network attack, the attacker generally carries out a series of steps to achieve the attack goal. The ATT&CK model, for example, holds that different attack techniques are used in different attack stages, that these stages have certain temporal and causal relationships, and that they trigger different alarm events. Detecting such threats is called event correlation. At present, multi-step attack detection schemes generally follow one of these approaches:

Event correlation analysis based on attributes: this technique analyzes the dependency relationships between attributes (e.g. a specific port, similar domain names) from the perspective of the events themselves and performs matching detection on the event attributes. Such methods rely on expert knowledge and cannot effectively correlate unknown problems.

Event correlation analysis based on logical reasoning: relevant knowledge is selected from the relationships among events and used, together with expert knowledge, to draw inferences. However, designing the reasoning control strategy places high demands on personnel and is inefficient.

Event correlation analysis based on statistics: starting from event occurrence probabilities and statistical data, the relationships between alarms are described probabilistically, revealing the temporal and causal relationships of network security events. However, computing statistics between events requires a large amount of computation, and the approach correlates poorly on unknown attack patterns and on event sets with many redundant alarms.

Event correlation analysis based on machine learning: a machine learning method is trained on a data set to generate event association rules. The disadvantages are that the algorithm is a black box, a large number of samples are needed for tuning, and the effect on new types of attacks cannot be evaluated.

However, each of the above solutions has its own problems; none of them simultaneously works without expert knowledge, avoids the need for a large number of offline samples, supports online learning, and identifies new multi-step attack behaviors. In addition, the statistics-based scheme requires a large number of pairwise comparisons between events, which is usually expensive and may even be impossible to complete within a given time limit.

Super alarm: starting from the original alarms of an alarm detection system (such as an IDS), events that share the same five-tuple and whose gap does not exceed the time window are merged into one event, namely the super alarm.

Disclosure of Invention

The embodiment of the invention aims to provide a multi-step attack detection method, device and equipment based on causal inference, so as to improve the efficiency and accuracy of multi-step attack detection.

In order to achieve the above object, a first aspect of the present invention provides a multi-step attack detection method based on causal inference, including: acquiring an alarm sample set; aggregating the alarm samples in the alarm sample set that meet a temporal aggregation degree into super alarms, and partitioning according to the time sequence of the super alarms; constructing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics, within a time window, of the data related to the features of the attack types in each partition, in the next partition, and at the boundary between the two partitions; and combining a detection event with the historical events in a preset time period, matching the combinations against the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.

Preferably, the partitioning according to the time sequence of the super alarm includes: judging whether the default partition number, a preset time window and the time span of the alarm sample set meet a preset relation or not; when a preset relation is met, partitioning the data in the alarm sample set by the default partition number; and when the preset relation is not met, adjusting the default partition number, and partitioning the data in the alarm sample set by using the adjusted default partition number.

Preferably, the constructing a first probability table and a second probability table corresponding to the attack type combination according to the distribution characteristics of the data related to the characteristics of the attack type in the time window of the partition, the next partition of the partition, and the partition boundary of the two includes: sorting the data in the partitions; calculating the value range of each characteristic of all attack types; traversing each partition data, and obtaining the occurrence probability corresponding to each feature of the attack type combination according to whether each data is in the value range of each feature of other alarm types; writing the attack type combination determined according to the occurrence probability into a first probability table; and writing the attack type combination with the abrupt change probability in the first probability table into a second probability table.

Preferably, the method further comprises: calculating the average data volume per partition and the data volume of each partition; if the data volume of the current partition is greater than M times the average data volume per partition, where M is a real number greater than 1, sampling the current partition without replacement, using (average data volume per partition × M) / (data volume of the current partition) as the sampling ratio.

Preferably, traversing the data of each partition and obtaining the occurrence probability corresponding to each feature of an attack type combination according to whether each piece of data is within the value range of each feature of the other alarm types comprises the following steps: acquiring all feature combinations of the first attack type, calculating the frequency with which the feature combinations fall within the value range of the second attack type, and recording it as $AF_{domain(B,F)}$, where F is a particular feature combination; calculating, for the first attack type and the second attack type within a preset time period, the number of events of the first attack type that satisfy the time sequence relation and whose F meets the similarity requirement, and recording it as $ABF_{domain(B,F)}$; and taking the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ as the occurrence probability of the attack type combination composed of the first attack type and the second attack type.

Preferably, writing the attack type combination determined according to the occurrence probability into the first probability table includes: performing multi-round iterative computation on attack type combinations under a plurality of feature combinations; screening attack type combinations under a plurality of feature combinations in each round according to a preset threshold; and combining the obtained attack type combinations, and writing the attack type combination with the largest number of feature combinations into the first probability table.

Preferably, the method further comprises: constructing a composite data structure to track the matching process; the composite data structure comprises: an event identifier, the event occurrence time, the latest time of an event in the sequence, and a compare flag indicating whether the sequence is still to be compared.

In a second aspect of the present invention, there is also provided a multi-step attack detection apparatus based on causal inference, the apparatus comprising: the sample acquisition module is used for acquiring an alarm sample set; the aggregation partitioning module is used for aggregating the alarm samples meeting the time aggregation degree in the alarm sample set into a super alarm and partitioning according to the time sequence of the super alarm; the probability establishing module is used for establishing a first probability table and a second probability table corresponding to the attack type combination according to the distribution characteristics of the partition, the next partition of the partition and the data related to the characteristics of the attack type in the time window of the partition boundary; and the event detection module is used for combining the detection event with the historical event in the preset time period, matching the detection event in the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.

In a third aspect of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the multi-step attack detection method based on causal inference when executing the computer program.

In a fourth aspect of the present invention, there is also provided a computer readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the aforementioned multi-step attack detection method based on causal inference.

A fifth aspect of the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the aforementioned multi-step attack detection method based on causal inference.

The technical scheme at least has the following beneficial effects:

(1) For all N features, the algorithm dynamically traverses combinations of 1 to N features and selects the attack pattern with the largest number of combined features, which reduces the false alarm probability; for that largest number, all corresponding feature combinations are retained, which minimizes missed detections.

(2) Because multi-step attacks are sequential in time, alarm events belonging to the same set are close together in time, so the algorithm can be distributed by partitioning the data and performing a second calculation at the partition boundaries; in addition, partitions whose density is too high are sampled (only a certain number of alarms of the same type need to be retained), which enables fast parallel computation on large-scale data sets.

Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:

FIG. 1 is a schematic diagram illustrating an implementation of a multi-step attack detection method based on causal inference according to an embodiment of the present invention;

FIG. 2 schematically illustrates a first probability table construction according to an embodiment of the invention;

FIG. 3 schematically shows a structural diagram of a multi-step attack detection device based on causal inference according to an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.

Fig. 1 schematically shows an implementation diagram of a multi-step attack detection method based on causal inference according to an embodiment of the present invention. As shown in fig. 1, a multi-step attack detection method based on causal inference includes:

S01, acquiring an alarm sample set; the alarms in the alarm sample set are original alarms, and the data sources include, but are not limited to, big data sources, online acquisition, or the results of mass data analysis.

S02, aggregating the alarm samples in the alarm sample set that meet the temporal aggregation degree into super alarms, and partitioning according to the time sequence of the super alarms; if alarm samples have the same attributes and the interval between them is less than the time window T₁, they are merged into a super alarm. Within a super alarm, the occurrence time of the first merged alarm sample is recorded as the start time, and the time of the last merged alarm sample is recorded as the end time.
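By way of non-limiting illustration, the merging rule of step S02 can be sketched as follows. The sketch assumes that "same attributes" means an identical key such as the five-tuple; the class `SuperAlarm`, the function `aggregate_super_alarms` and the parameter name `t1` are hypothetical names introduced here and do not appear in the original disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Tuple


@dataclass
class SuperAlarm:
    key: Hashable        # attributes shared by the merged alarms (assumed: the five-tuple)
    start_time: float    # time of the first merged raw alarm
    end_time: float      # time of the last merged raw alarm
    count: int = 1       # number of raw alarms merged into this super alarm


def aggregate_super_alarms(alarms: Iterable[Tuple[Hashable, float]],
                           t1: float) -> List[SuperAlarm]:
    """Merge raw alarms with the same key whose gap is less than t1."""
    open_alarms: Dict[Hashable, SuperAlarm] = {}  # key -> super alarm being extended
    result: List[SuperAlarm] = []
    for key, ts in sorted(alarms, key=lambda a: a[1]):
        current = open_alarms.get(key)
        if current is not None and ts - current.end_time < t1:
            # Same attributes and gap below the window: extend the super alarm.
            current.end_time = ts
            current.count += 1
        else:
            # Otherwise start a new super alarm for this key.
            current = SuperAlarm(key, ts, ts)
            open_alarms[key] = current
            result.append(current)
    return result
```

The resulting list is ordered by start time, so it can be partitioned by time directly in the next step.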

S03, constructing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics, within a time window, of the data related to the features of the attack types in each partition, in the next partition, and at the boundary between the two; the distribution characteristics and probabilities in this step are calculated according to causal relationships, following the idea of Bayesian inference. If, within the time window, the conditional probability that type B occurs given that event type A has occurred reaches a certain threshold Threshold₁, A is regarded as a precondition for the occurrence of B, and the feature-based similarity is taken as the confidence that type A is the cause of type B.

S04, combining the detection event with the historical events in the preset time period, matching the combinations against the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.

After a detection event is generated, the related events in the most recent time window are acquired by streaming or by periodic reading; each time new events are acquired, the event window is updated and the newly entered events are marked. For each detection event, the other events that occurred earlier than it and within the time window are paired with it one by one. Each pair is looked up in the first probability table and the second probability table to find all matching feature rule combinations, and the corresponding detection result is obtained from those feature combinations.
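By way of non-limiting illustration, the pairing and lookup of step S04 can be sketched as follows. The table layout (a mapping from an ordered attack type pair to its feature combinations), the use of strict equality as the similarity test, and the names `match_detection_event`, `Event` and `Table` are assumptions made for this sketch only.

```python
from typing import Dict, FrozenSet, List, Tuple

Event = Tuple[str, float, Dict[str, str]]            # (attack type, time, features)
Table = Dict[Tuple[str, str], List[FrozenSet[str]]]  # (type A, type B) -> feature combinations


def match_detection_event(event: Event, history: List[Event],
                          first_table: Table, second_table: Table,
                          window: float) -> List[Tuple[Tuple[str, str], FrozenSet[str]]]:
    """Pair the detection event with earlier events in the window and look the pairs up."""
    b_type, b_time, b_feat = event
    hits = []
    for a_type, a_time, a_feat in history:
        # Only events that occurred earlier than the detection event and within the window.
        if not 0 < b_time - a_time <= window:
            continue
        pair = (a_type, b_type)
        for table in (first_table, second_table):
            for combo in table.get(pair, []):
                # A rule matches when every feature in the combination is equal in both events.
                if all(f in a_feat and a_feat[f] == b_feat.get(f) for f in combo):
                    hits.append((pair, combo))
    return hits
```

Each hit names the ordered attack type pair and the feature combination that linked the two events; an empty list means the detection event could not be attached to any known pattern.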

In this embodiment, statistical means are adopted: the causal relationships between alarm types are inferred and mined on a Bayesian basis, the strongest causal combinations are selected through attribute equality or similarity, and new causal relationships are identified through surges in probability. On the implementation side, the method is partition-based: samples are first drawn according to partition density, and the global causal calculation is then converted into a per-partition calculation plus a second calculation at the partition boundaries. These two steps are run for N rounds to obtain dependence conditions of different strengths, the set of dependence conditions with the largest number of features is selected, and finally online alarm events are matched against the attack patterns to detect multi-step attacks.

In some embodiments provided by the present invention, partitioning according to the time sequence of the super alarms includes: judging whether the default partition number, a preset time window and the time span of the alarm sample set meet a preset relation; when the preset relation is met, partitioning the data in the alarm sample set with the default partition number; when the preset relation is not met, adjusting the default partition number and partitioning the data in the alarm sample set with the adjusted partition number. Specifically, with the default partition number denoted partition_num, the time span of the data denoted All_time, and the preset time window denoted T₂, the following determination is performed:

If the preset time window T₂ is greater than frac × (All_time / partition_num), the number of partitions is reduced so that T₂ ≈ frac × (All_time / partition_num_new), where partition_num_new is the new partition number; otherwise, the default partition number is left unchanged. Here frac is a decimal between 0 and 1.
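By way of non-limiting illustration, the determination above can be sketched as follows. Rounding the new partition number down and never going below one partition are assumptions of this sketch; the function name `adjust_partition_num` is hypothetical.

```python
import math


def adjust_partition_num(partition_num: int, all_time: float,
                         t2: float, frac: float) -> int:
    """Return the partition number to use for a data set spanning all_time."""
    if not 0 < frac < 1:
        raise ValueError("frac must be a decimal between 0 and 1")
    if t2 > frac * (all_time / partition_num):
        # Too many partitions for the window: pick the largest count n for which
        # t2 <= frac * (all_time / n), so that t2 is roughly frac * partition width.
        return max(1, math.floor(frac * all_time / t2))
    # The preset relation holds: keep the default partition number.
    return partition_num
```

For example, with a one-hour data set (3600 s), a 60 s window and frac = 0.5, a default of 100 partitions would be reduced to 30, so that each partition spans 120 s.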

In some embodiments, constructing the first probability table and the second probability table corresponding to the attack type combinations according to the distribution characteristics, within a time window, of the data related to the features of the attack types in a partition, in the next partition, and at the boundary between the two includes: sorting the data in the partitions; calculating the value range of each feature of all attack types; traversing the data of each partition, and obtaining the occurrence probability corresponding to each feature of an attack type combination according to whether each piece of data is within the value range of each feature of the other alarm types; writing the attack type combinations determined according to the occurrence probability into the first probability table; and writing the attack type combinations whose probability changes abruptly relative to the first probability table into the second probability table. This embodiment uses partitioned calculation. The data of each partition are traversed and, according to the alarm type of each piece of data, it is judged whether the data lies within the value range of each feature of the other alarm types: 1 is recorded if it does, 0 otherwise. Summing over the global data gives $AF_{domain(B,F)}$ for each feature of every attack type pair. Based on the event time window T₂, $ABF_{domain(B,F)}$ is first calculated within each partition; the results of each partition are temporarily recorded as a set rather than as a count, and the partition results are then merged into one set. While $ABF_{domain(B,F)}$ is calculated per partition, the calculation for the data at the tail of each partition cannot be known to be complete, so the tail data of each partition is extracted, broadcast, and compared a second time with the data of the next adjacent partition; the results are again represented as a set and merged after the calculation is finished.
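By way of non-limiting illustration, the two-pass scheme (a calculation inside each partition followed by a second comparison of each partition tail with the next partition) can be sketched as follows. The sketch only collects event pairs that fall within the window and omits the attack type and feature tests; it assumes the window `t2` is no larger than the width of one partition, so that a pair never spans more than two adjacent partitions. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Event = Tuple[str, float]  # (event id, occurrence time)


def split_by_time(events: List[Event], partition_num: int) -> List[List[Event]]:
    """Split events into equal-width time partitions, each sorted by time."""
    ordered = sorted(events, key=lambda e: e[1])
    start, end = ordered[0][1], ordered[-1][1]
    width = (end - start) / partition_num or 1.0
    parts: List[List[Event]] = [[] for _ in range(partition_num)]
    for ev in ordered:
        idx = min(int((ev[1] - start) / width), partition_num - 1)
        parts[idx].append(ev)
    return parts


def pairs_within(earlier: List[Event], later: List[Event], t2: float) -> Set[Tuple[str, str]]:
    """Ordered pairs (a, b) where b follows a by at most t2."""
    return {(a[0], b[0]) for a in earlier for b in later if 0 < b[1] - a[1] <= t2}


def partitioned_pairs(events: List[Event], partition_num: int, t2: float) -> Set[Tuple[str, str]]:
    parts = split_by_time(events, partition_num)
    found: Set[Tuple[str, str]] = set()
    for i, part in enumerate(parts):
        # First pass: pairs that lie entirely inside the partition.
        found |= pairs_within(part, part, t2)
        nxt = parts[i + 1] if i + 1 < len(parts) else []
        if nxt:
            # Second pass: only the tail of the partition can still reach the next one.
            tail = [e for e in part if nxt[0][1] - e[1] <= t2]
            found |= pairs_within(tail, nxt, t2)
    return found
```

Because the results are kept as sets rather than counts, merging the per-partition and boundary results cannot count the same pair twice.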

The sets obtained in the previous step are merged and their elements counted; the entries whose count is greater than minMumOfABF are retained (a sufficient count is required for the result to be reliable), giving the final frequency $ABF_{domain(B,F)}$.

Using the results of the preceding steps, the two frequencies for the same attack type combination AB (the combination of attack type A and attack type B) are divided to obtain $P(A \mid B_{F \in domain(B,F)})$; after keeping only the combinations whose probability is greater than Threshold₁, the attack patterns of the attack type combination AB for feature combinations of size 1 are obtained.

On the basis of the above steps, the attack patterns under larger feature combinations are calculated iteratively, layer by layer. In the N-th iteration, the feature combinations from the previous round whose probability is greater than or equal to the threshold Threshold₂ (generally Threshold₂ < Threshold₁) are combined, for the same attack type pair, with all single features from the first round whose probability is greater than or equal to Threshold₂; the resulting combinations serve as the full set of feature combinations for the ABF calculation in the current round, and the above steps are repeated to obtain the attack patterns under N feature combinations.

The iteration terminates when N reaches the maximum feature combination number maxNumOfFeaturesCombine. The results of rounds 1 to N are then merged; for all AB attack patterns, the pattern with the most combined features is taken as the attack pattern of type AB and stored in the first probability table, which may be named alertTypeCorTable and whose key fields include: attack type A, attack type B, the set of feature combinations, and the conditional probability. FIG. 2 schematically illustrates the construction of the first probability table according to an embodiment of the invention. As shown in FIG. 2, the first probability table can be obtained in the above manner.
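By way of non-limiting illustration, merging the results of rounds 1 to N and keeping, for each ordered attack type pair, every feature combination of the largest size can be sketched as follows. The tuple layout of a pattern and the name `build_first_table` are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, List, Tuple

# (attack type A, attack type B, feature combination, conditional probability)
Pattern = Tuple[str, str, FrozenSet[str], float]
Row = Tuple[FrozenSet[str], float]


def build_first_table(rounds: List[List[Pattern]]) -> Dict[Tuple[str, str], List[Row]]:
    """Keep, per (A, B) pair, all feature combinations of the largest size."""
    table: Dict[Tuple[str, str], List[Row]] = {}
    best_size: Dict[Tuple[str, str], int] = {}
    for patterns in rounds:
        for a, b, combo, prob in patterns:
            pair = (a, b)
            size = len(combo)
            if size > best_size.get(pair, 0):
                # A larger combination replaces everything kept so far for this pair.
                best_size[pair] = size
                table[pair] = [(combo, prob)]
            elif size == best_size.get(pair):
                # Combinations of the same largest size are all retained.
                table[pair].append((combo, prob))
    return table
```

Retaining every combination of the largest size, rather than a single one, corresponds to the stated aim of reducing false alarms without increasing missed detections.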

On the basis of the first probability table alertTypeCorTable, it is detected whether the probability of any attack type has changed abruptly; for an attack type A whose probability has changed abruptly, the conditional probabilities of the attack type combinations AX are recalculated, where X is any other attack type. Specifically: take the super alarms in the latest time period N, where N is greater than the time window T₂ and also greater than the calculation interval of process 2; compute the occurrence probability of each super alarm type; compare with the alertTypeCorTable table, and if the probability of an attack type has increased by more than J times (J being a real number greater than 1), recalculate the attack pair patterns conditioned on that attack type, adding the combinations of newly appearing attack patterns with known attack patterns, and rerun process 1 restricted to those attack type combinations. The result is stored in the temporary probability table TemportertTypeCorTable, namely the second probability table; each calculation only updates or adds entries in this table.
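By way of non-limiting illustration, the surge test can be sketched as follows. The sketch assumes that a baseline occurrence probability per attack type is available from the earlier calculation, and that an attack type absent from the baseline is treated as having surged; both points, and the name `surged_types`, are assumptions rather than details taken from the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, List


def surged_types(baseline_prob: Dict[str, float],
                 recent_types: Iterable[str], j: float) -> List[str]:
    """Attack types whose recent probability exceeds j times the baseline."""
    if j <= 1:
        raise ValueError("J must be a real number greater than 1")
    counts = Counter(recent_types)
    total = sum(counts.values())
    surged = []
    for attack_type, count in counts.items():
        recent = count / total
        base = baseline_prob.get(attack_type, 0.0)
        # Assumption: a type with no baseline is new and is always recalculated.
        if base == 0.0 or recent > j * base:
            surged.append(attack_type)
    return sorted(surged)
```

Only the attack type combinations involving the returned types would then be recalculated and written to the second probability table.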

In order to keep the amount of data in the partitions balanced, in some embodiments, the method further comprises: calculating the average data volume per partition and the data volume of each partition; if the data volume of the current partition is greater than M times the average data volume per partition, where M is a real number greater than 1, sampling the current partition without replacement, using (average data volume per partition × M) / (data volume of the current partition) as the sampling ratio.
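By way of non-limiting illustration, the density-based sampling can be sketched as follows. The sketch reads the condition as "current partition size greater than M times the average partition size", which is an interpretation of the translated text; the name `downsample_partitions` and the optional `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def downsample_partitions(partitions: Sequence[Sequence], m: float,
                          seed: Optional[int] = None) -> List[list]:
    """Sample over-dense partitions without replacement down to m times the average size."""
    if m <= 1:
        raise ValueError("M must be a real number greater than 1")
    rng = random.Random(seed)
    average = sum(len(p) for p in partitions) / len(partitions)
    limit = average * m
    sampled = []
    for part in partitions:
        if len(part) > limit:
            ratio = limit / len(part)             # sampling ratio, always below 1 here
            keep = round(len(part) * ratio)
            sampled.append(rng.sample(list(part), keep))  # without replacement
        else:
            sampled.append(list(part))            # partition is left untouched
    return sampled
```

With partitions of 10, 10 and 100 items and M = 1.5, the average is 40, the limit is 60, and only the third partition is reduced, to 60 items.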

In some embodiments provided by the present invention, traversing the data of each partition and obtaining the occurrence probability corresponding to each feature of an attack type combination according to whether each piece of data is within the value range of each feature of the other alarm types includes: acquiring all feature combinations of the first attack type, calculating the frequency with which the feature combinations fall within the value range of the second attack type, and recording it as $AF_{domain(B,F)}$, where F is a particular feature combination; calculating, for the first attack type and the second attack type within a preset time period, the number of events of the first attack type that satisfy the time sequence relation and whose F meets the similarity requirement, and recording it as $ABF_{domain(B,F)}$; and taking the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ as the occurrence probability of the attack type combination composed of the first attack type and the second attack type. The present embodiment provides the calculation of the aforementioned $AF_{domain(B,F)}$ and related parameters.
Specifically, for attack type B, the value range of each feature is calculated and denoted $domain(B, f_j)$, where $f_j$ is the j-th feature. For attack type A, over all feature combinations, the frequency with which attack type A occurs inside the value range $domain(B, f_j)$ of attack type B is calculated and denoted $AF_{domain(B,F)}$, where F is a particular feature combination. Within the time window T₂, the number of type-A events for which B(start_time) >= A(start_time), B(end_time) >= A(end_time) + T₂, and the attributes in F are equal (or highly similar) is counted and denoted $ABF_{domain(B,F)}$. If the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ is greater than or equal to Threshold₁, then under feature combination F, A is the cause of the occurrence of B, and the ordered combination AB is marked as an attack type combination pair.
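By way of non-limiting illustration, the ratio of the two frequencies can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the disclosure: the value range of a feature is the set of values observed for type B, similarity is strict equality, and the time sequence relation is simplified to "B starts no earlier than A starts and no later than T₂ after A ends". The names `Event` and `conditional_probability` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Event:
    attack_type: str
    start_time: float
    end_time: float
    features: Dict[str, str]


def conditional_probability(events: List[Event], type_a: str, type_b: str,
                            feature_combo: Sequence[str], t2: float) -> float:
    """Return ABF / AF for the ordered pair (type_a, type_b) under feature_combo."""
    a_events = [e for e in events if e.attack_type == type_a]
    b_events = [e for e in events if e.attack_type == type_b]
    # domain(B, f): values that feature f takes among type-B events (assumed definition).
    domain = {f: {b.features[f] for b in b_events} for f in feature_combo}
    # AF: type-A events whose features in the combination all lie in B's value range.
    in_domain = [a for a in a_events
                 if all(a.features[f] in domain[f] for f in feature_combo)]
    af = len(in_domain)
    # ABF: of those, the ones followed within the window by a type-B event with equal features.
    abf = sum(
        1 for a in in_domain
        if any(a.start_time <= b.start_time <= a.end_time + t2
               and all(a.features[f] == b.features[f] for f in feature_combo)
               for b in b_events)
    )
    return abf / af if af else 0.0
```

The returned value would then be compared with Threshold₁ to decide whether the ordered pair is recorded as an attack type combination.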

In some embodiments provided herein, the method further comprises: constructing a composite data structure to track the matching process; the composite data structure comprises: an event identifier, the event occurrence time, the latest time of an event in the sequence, and a compare flag. Specifically, the multi-step attack detection results are stored in a composite data structure in which each element is one multi-step attack result. Each result comprises the attack sequence of the original events; the elements of the sequence comprise the unique identifier and occurrence time of each alarm, and the result also records the latest time update_time of the alarms in the sequence and the compare flag compare_flag. Multi-step attacks are generated as follows: for an attack type combination that hits in the first probability table or the second probability table, denoted ab, the known attack sequences with match_flag == 1 are compared; if attack type a is in a known attack sequence and ab is within the time window T₂, b is appended to the sequence containing a; otherwise a new multi-step attack sequence is created with compare_flag = 1. This embodiment also includes a step of updating compare_flag: before each round of calculation on new data, if update_time is not within the window T₂ of the minimum alarm time of the new data, the multi-step attack sequence is considered expired, is no longer updated, and its match_flag is set to 0.
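By way of non-limiting illustration, the composite data structure and its update rules can be sketched as follows. The sketch uses a single flag named `compare_flag` for both the comparison test and the expiry step, and measures "within the time window" from the sequence's `update_time`; these choices, and the names `AttackSequence`, `add_matched_pair` and `expire_sequences`, are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Alarm = Tuple[str, float]  # (unique event identifier, occurrence time)


@dataclass
class AttackSequence:
    events: List[Alarm] = field(default_factory=list)
    update_time: float = 0.0   # latest alarm time in the sequence
    compare_flag: int = 1      # 1: still compared against new matches, 0: expired


def add_matched_pair(sequences: List[AttackSequence], a: Alarm, b: Alarm,
                     t2: float) -> AttackSequence:
    """Attach b to an active sequence that contains a, or start a new sequence."""
    for seq in sequences:
        if seq.compare_flag == 1 and a in seq.events and b[1] - seq.update_time <= t2:
            seq.events.append(b)
            seq.update_time = max(seq.update_time, b[1])
            return seq
    seq = AttackSequence(events=[a, b], update_time=b[1], compare_flag=1)
    sequences.append(seq)
    return seq


def expire_sequences(sequences: List[AttackSequence], min_new_time: float, t2: float) -> None:
    """Before a new round, stop updating sequences that fell out of the window."""
    for seq in sequences:
        if min_new_time - seq.update_time > t2:
            seq.compare_flag = 0
```

Expired sequences remain in the list as finished detection results; they are simply skipped when later matches are attached.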

Based on the same inventive concept, the invention also provides a multi-step attack detection device based on causal inference. Fig. 3 schematically shows a structural diagram of a multi-step attack detection device based on causal inference according to an embodiment of the present invention. As shown in fig. 3, a multi-step attack detection device based on causal inference includes: the sample acquisition module is used for acquiring an alarm sample set; the aggregation partitioning module is used for aggregating the alarm samples meeting the time aggregation degree in the alarm sample set into a super alarm and partitioning according to the time sequence of the super alarm; the probability establishing module is used for establishing a first probability table and a second probability table corresponding to the attack type combination according to the distribution characteristics of the partition, the next partition of the partition and the data related to the characteristics of the attack type in the time window of the partition boundary; and the event detection module is used for combining the detection event with the historical events in the preset time period, then matching the detection event with the historical events in the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.

In some optional embodiments, partitioning according to the timing of the super alarm includes: judging whether the default partition number, a preset time window and the time span of the alarm sample set meet a preset relation or not; when a preset relation is met, partitioning the data in the alarm sample set by the default partition number; and when the preset relation is not met, adjusting the default partition number, and partitioning the data in the alarm sample set by using the adjusted default partition number.

In some optional embodiments, constructing the first probability table and the second probability table corresponding to the attack type combination according to the distribution characteristics of the alarm samples related to the characteristics of the attack type in the partition includes: sorting the data in the partitions; calculating the value range of each characteristic of all attack types; traversing each partition data, and obtaining the occurrence probability corresponding to each feature of the attack type combination according to whether each data is in the value range of each feature of other alarm types; writing the attack type combination determined according to the occurrence probability into a first probability table; and writing the attack type combination with the abrupt change probability in the first probability table into a second probability table.

In some optional embodiments, the method further comprises: calculating the average data volume per partition and the data volume of each partition; if the data volume of the current partition is greater than M times the average data volume per partition, where M is a real number greater than 1, sampling the current partition without replacement, using (average data volume per partition × M) / (data volume of the current partition) as the sampling ratio.

In some optional embodiments, traversing the data of each partition and obtaining the occurrence probability corresponding to each feature of an attack type combination according to whether each piece of data is within the value range of each feature of the other alarm types includes: acquiring all feature combinations of the first attack type, calculating the frequency with which the feature combinations fall within the value range of the second attack type, and recording it as $AF_{domain(B,F)}$, where F is a particular feature combination; calculating, for the first attack type and the second attack type within a preset time period, the number of events of the first attack type that satisfy the time sequence relation and whose F meets the similarity requirement, and recording it as $ABF_{domain(B,F)}$; and taking the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ as the occurrence probability of the attack type combination composed of the first attack type and the second attack type.

In some optional embodiments, writing a combination of attack types determined according to the occurrence probability into the first probability table includes: performing multi-round iterative computation on attack type combinations under a plurality of feature combinations; screening attack type combinations under a plurality of feature combinations in each round according to a preset threshold; and merging the obtained attack type combinations, and writing the attack type combination with the largest number of feature combinations into the first probability table.
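One possible reading of the multi-round procedure above is sketched here. It assumes that round k evaluates feature combinations of size k, that a pair of attack types survives a round when at least one combination reaches the threshold, and that merging keeps, for each pair, the entry with the most features. The callable `prob` is a hypothetical stand-in for the occurrence probability computation, and the pruning of non-surviving pairs is an assumption rather than a detail stated in this description.

```python
from itertools import combinations


def build_first_table(type_pairs, features, prob, threshold):
    """Return {pair: (feature_combo, probability)} after iterative screening."""
    table = {}
    candidates = set(type_pairs)
    for size in range(1, len(features) + 1):
        survivors = set()
        for pair in candidates:
            for combo in combinations(features, size):
                p = prob(pair, combo)
                if p < threshold:
                    continue  # screened out in this round
                survivors.add(pair)
                best = table.get(pair)
                larger = best is None or len(combo) > len(best[0])
                better = (
                    best is not None
                    and len(combo) == len(best[0])
                    and p > best[1]
                )
                if larger or better:
                    table[pair] = (combo, p)
        candidates = survivors
        if not candidates:
            break
    return table


# Hypothetical probabilities, for illustration only.
probs = {
    (("scan", "exploit"), ("src",)): 0.9,
    (("scan", "exploit"), ("dst",)): 0.7,
    (("scan", "exploit"), ("src", "dst")): 0.8,
    (("scan", "dos"), ("src",)): 0.2,
    (("scan", "dos"), ("dst",)): 0.1,
    (("scan", "dos"), ("src", "dst")): 0.95,
}
table = build_first_table(
    [("scan", "exploit"), ("scan", "dos")],
    ["src", "dst"],
    lambda pair, combo: probs.get((pair, combo), 0.0),
    threshold=0.6,
)
```

Under these assumptions the pair ("scan", "dos") is dropped in the first round and never reaches the two-feature round, while ("scan", "exploit") is recorded with its largest qualifying feature combination.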

In some optional embodiments, the method further comprises: constructing a composite data structure to track the matching process; the composite data structure comprises: an event identifier, the event occurrence time, the latest time of an event in the sequence, and a flag indicating whether to compare.
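The four fields listed above could be held in a small record such as the one sketched below. The class name, field names, and the `advance` helper are hypothetical; in particular, clearing the compare flag once the sequence falls outside a time window is an assumed policy used only to show how the fields might interact.

```python
from dataclasses import dataclass


@dataclass
class MatchState:
    event_id: str              # event identifier
    occurred_at: float         # event occurrence time
    latest_in_sequence: float  # latest time of an event in the sequence
    compare: bool = True       # whether this entry should still be compared


def advance(state, event_time, window):
    """Try to extend the sequence with an event at event_time.

    Returns True if the event falls within `window` of the latest event in
    the sequence; otherwise marks the entry as no longer comparable.
    """
    if not state.compare:
        return False
    if 0 <= event_time - state.latest_in_sequence <= window:
        state.latest_in_sequence = event_time
        return True
    if event_time - state.latest_in_sequence > window:
        state.compare = False  # assumed expiry policy
    return False


state = MatchState(event_id="e1", occurred_at=100.0, latest_in_sequence=100.0)
first = advance(state, 130.0, window=60.0)
second = advance(state, 300.0, window=60.0)
```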

For the specific definition of each functional module in the multi-step attack detection device based on causal inference, reference may be made to the definition of the multi-step attack detection method based on causal inference, which is not repeated here. The modules in the above device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In some embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the multi-step attack detection method based on causal inference when executing the computer program. The processor here has numerical calculation and logical operation functions, and includes at least a central processing unit (CPU) with data processing capability, a random access memory (RAM), a read-only memory (ROM), various I/O ports, an interrupt system, and the like. The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the method is implemented by adjusting the kernel parameters. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.

In an embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of the multi-step attack detection method based on causal inference described above.

In one embodiment provided by the present invention, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the multi-step attack detection method based on causal inference described above.

The above various embodiments are applicable to mining complex multi-step attack behavior from alarm events of network devices. The method or the device operates on a distributed platform, the input data is a structured alarm event, and the complex multi-step attack behavior can be detected through calculation of the method.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A multi-step attack detection method based on causal inference, the method comprising:

acquiring an alarm sample set;

aggregating the alarm samples meeting the time aggregation degree in the alarm sample set into a super alarm, and partitioning according to the time sequence of the super alarm;

sorting the data in the partitions;

calculating the value range of each characteristic of all attack types;

traversing each partition data, and obtaining the occurrence probability corresponding to each feature of the attack type combination according to whether each data is in the value range of each feature of other alarm types;

writing the attack type combination determined according to the occurrence probability into a first probability table;

writing a combination of attack types with a sudden change in probability in the first probability table into a second probability table;

and combining the detection event with the historical event in the preset time period, matching in the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.

2. The method of claim 1, wherein partitioning according to the timing of super alarms comprises:

judging whether the default partition number, a preset time window and the time span of the alarm sample set meet a preset relation or not;

when a preset relation is met, partitioning the data in the alarm sample set by the default partition number;

and when the preset relation is not met, adjusting the default partition number, and partitioning the data in the alarm sample set by using the adjusted default partition number.

3. The method of claim 2, further comprising:

calculating the average data volume of the partitions and the data volume of each partition;

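if the data volume of the current partition > the partition average data volume × M, where M is a real number greater than 1, sampling the current partition without replacement, taking (partition average data volume × M) / (data volume of the current partition) as the sampling ratio.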

4. The method of claim 1, wherein traversing each partitioned data to obtain the probability of occurrence corresponding to each feature of the attack type combination according to whether each data is within the value range of each feature of other alarm types comprises:

acquiring all feature combinations of the first attack type A, calculating the frequency with which those feature combinations fall within the value range of the second attack type B, and recording it as AF_{domain(B,F)}, wherein F is a given feature combination;

counting, within a preset time period, the events of the first attack type A that satisfy the time-sequence relation with the second attack type B and whose F meets the similarity requirement, and recording the count as ABF_{domain(B,F)};

taking the conditional probability P(A | B_{F∈domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)} as the occurrence probability of the attack type combination composed of the first attack type A and the second attack type B.

5. The method of claim 1, wherein writing a combination of attack types determined from the probability of occurrence into a first probability table comprises:

computing attack type combinations under a plurality of feature combinations in a multi-round iterative manner;

screening attack type combinations under a plurality of feature combinations in each round according to a preset threshold;

and combining the obtained attack type combinations, and writing the attack type combination with the largest number of feature combinations into the first probability table.

6. The method of claim 5, further comprising:

constructing a composite data structure to identify a matching process;

the composite data structure comprises: an event identifier, the event occurrence time, the latest time of an event in the sequence, and a flag indicating whether to compare.

7. A multi-step attack detection device based on causal inference, the device comprising:

the sample acquisition module is used for acquiring an alarm sample set;

the aggregation partitioning module is used for aggregating the alarm samples meeting the time aggregation degree in the alarm sample set into a super alarm and partitioning according to the time sequence of the super alarm;

the probability establishing module is used for sorting the data in the partitions; calculating the value range of each feature of all attack types; traversing the data of each partition, and obtaining the occurrence probability corresponding to each feature of the attack type combination according to whether each piece of data is within the value range of each feature of other alarm types; writing the attack type combination determined according to the occurrence probability into a first probability table; and writing the attack type combinations with a sudden change in probability in the first probability table into a second probability table; and

the event detection module is used for combining a detection event with a historical event in a preset time period, matching the combination in the first probability table and the second probability table, and obtaining a detection result of the detection event according to the matching result.

8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the multi-step attack detection method based on causal inference as claimed in any of claims 1 to 6.

9. A computer readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the causal inference based multi-step attack detection method of any one of claims 1 to 6.