CN114978778B - Multi-step attack detection method, device and equipment based on causal inference - Google Patents


Info

Publication number
CN114978778B
Authority
CN
China
Prior art keywords
attack
alarm
data
partition
probability
Prior art date
Legal status
Active
Application number
CN202210914607.5A
Other languages
Chinese (zh)
Other versions
CN114978778A (en)
Inventor
黄亭
Current Assignee
Beijing 6Cloud Technology Co Ltd
Beijing 6Cloud Information Technology Co Ltd
Original Assignee
Beijing 6Cloud Technology Co Ltd
Beijing 6Cloud Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing 6Cloud Technology Co Ltd and Beijing 6Cloud Information Technology Co Ltd
Priority to CN202210914607.5A
Publication of CN114978778A
Application granted
Publication of CN114978778B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of attack detection and provides a multi-step attack detection method, device and equipment based on causal inference. The method comprises: aggregating the alarm samples in an alarm sample set that satisfy the time aggregation degree into super alarms, and partitioning according to the time sequence of the super alarms; constructing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in a partition, in the next partition, and across the boundary between the two; and combining a detection event with the historical events within a preset time period, matching the combinations in the first and second probability tables, and obtaining the detection result of the detection event from the matching result. The method provided by the invention can improve the efficiency and accuracy of multi-step attack detection.

Description

Multi-step attack detection method, device and equipment based on causal inference
Technical Field
The invention relates to the technical field of data processing, in particular to a multi-step attack detection method based on causal inference, a multi-step attack detection device based on causal inference, electronic equipment and a computer readable storage medium.
Background
In network attacks, attackers usually carry out a series of steps to achieve their goal. For example, the ATT&CK model holds that different attack techniques are used in different attack stages, that these stages follow a definite temporal order and causal relationship, and that they trigger different alarm events. Detecting such threats is called event correlation. Current schemes for detecting multi-step attack behavior generally follow one of these ideas:
Attribute-based event correlation analysis: analyzes the dependency relationships between attributes (e.g., a specific port or similar domain names) from the perspective of the events themselves and matches on event attributes. Such methods rely on expert knowledge and cannot effectively correlate unknown problems.
Logical-reasoning-based event correlation analysis: selects and applies relevant knowledge from the relationships among events and draws inferences using expert knowledge. However, designing the reasoning control strategy places high demands on people, and efficiency is low.
Statistics-based event correlation analysis: describes the relationships between alarms probabilistically, from the perspective of event occurrence probability and statistical data, to reveal the temporal order and causal relationships of network security events. However, computing statistics between events requires a large amount of computation, and correlation is poor for unknown attack patterns and for event sets containing many redundant alarms.
Machine-learning-based event correlation analysis: trains a machine learning model on a data set to generate event correlation rules. Its drawbacks are that the algorithm is a black box, a large number of samples is needed for tuning, and its effect on new types of attacks cannot be evaluated.
Each of the above schemes has its own problems: none of them can learn online and identify new multi-step attack behaviors without relying on expert knowledge or a large number of offline samples. In addition, statistics-based schemes require a large number of pairwise comparisons between events, which is usually expensive and may not even finish within a given time limit.
Super alarm: starting from the original alarms of an alarm detection system (e.g., an IDS), events with the same five-tuple whose interval does not exceed a time window are merged into one event, called a super alarm.
Disclosure of Invention
The embodiments of the invention aim to provide a multi-step attack detection method, device and equipment based on causal inference, so as to improve the efficiency and accuracy of multi-step attack detection.
To achieve the above object, a first aspect of the present invention provides a multi-step attack detection method based on causal inference, including: acquiring an alarm sample set; aggregating the alarm samples in the alarm sample set that satisfy the time aggregation degree into super alarms, and partitioning according to the time sequence of the super alarms; constructing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in a partition, in the next partition, and across the boundary between the partition and the next partition; and combining a detection event with the historical events within a preset time period, matching the combinations in the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.
Preferably, the partitioning according to the time sequence of the super alarm includes: judging whether the default partition number, a preset time window and the time span of the alarm sample set meet a preset relation or not; when a preset relation is met, partitioning the data in the alarm sample set by the default partition number; and when the preset relation is not met, adjusting the default partition number, and partitioning the data in the alarm sample set by using the adjusted default partition number.
Preferably, constructing the first probability table and the second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in the partition, in the next partition, and across the boundary between the two includes: sorting the data in each partition; calculating the value range of each feature of every attack type; traversing the data of each partition and, according to whether each record falls within the value range of each feature of the other alarm types, obtaining the occurrence probability corresponding to each feature of each attack type combination; writing the attack type combinations determined from the occurrence probabilities into the first probability table; and writing the attack type combinations in the first probability table whose probability changes abruptly into the second probability table.
Preferably, the method further comprises: calculating the average data volume per partition and the data volume of each partition; if the data volume of the current partition is greater than M times the average partition data volume, where M is a real number greater than 1, sampling the current partition without replacement at a sampling ratio of ($M \times$ average partition data volume) / (current partition data volume).
Preferably, traversing the data of each partition and obtaining the occurrence probability corresponding to each feature of the attack type combinations according to whether each record falls within the value range of each feature of the other alarm types comprises: obtaining all feature combinations of the first attack type and counting how often they fall within the value range of the second attack type, recorded as $AF_{domain(B,F)}$, where F is a feature combination; counting the number of events of the first attack type that have the required temporal relationship with the second attack type within a preset time period and whose features in F meet the similarity requirement, recorded as $ABF_{domain(B,F)}$; and taking the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ as the occurrence probability of the attack type combination formed by the first attack type and the second attack type.
Preferably, writing the attack type combinations determined according to the occurrence probabilities into the first probability table includes: performing multiple rounds of iterative computation on attack type combinations under multiple feature combinations; in each round, screening the attack type combinations under the feature combinations against a preset threshold; and merging the attack type combinations obtained and writing, for each attack type combination, the entry with the largest feature combinations into the first probability table.
Preferably, the method further comprises: constructing a composite data structure to record the matching process; the composite data structure comprises an event identifier, the event occurrence time, the latest time of the events in the sequence, and a compare flag.
In a second aspect of the present invention, there is also provided a multi-step attack detection device based on causal inference, comprising: a sample acquisition module for acquiring an alarm sample set; an aggregation partitioning module for aggregating the alarm samples in the alarm sample set that satisfy the time aggregation degree into super alarms and partitioning according to the time sequence of the super alarms; a probability establishing module for establishing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in the partition, in the next partition, and across the partition boundary; and an event detection module for combining a detection event with the historical events within a preset time period, matching the combinations in the first and second probability tables, and obtaining the detection result of the detection event from the matching result.
In a third aspect of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the multi-step attack detection method based on causal inference when executing the computer program.
In a fourth aspect of the present invention, there is also provided a computer readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the aforementioned multi-step attack detection method based on causal inference.
A fifth aspect of the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the aforementioned multi-step attack detection method based on causal inference.
The above technical solution has at least the following beneficial effects:
(1) For all N features, the algorithm dynamically traverses feature combinations of size 1 to N and selects the attack pattern with the largest number of combined features, which reduces the probability of false alarms; for that maximum combination size, all corresponding feature combinations are retained, which minimizes missed detections.
(2) Because of the temporal characteristics of multi-step attacks, alarms belonging to the same attack event set are close in time, so the algorithm is distributed by partitioning plus a second pass over partition boundaries. At the same time, overly dense partitions are sampled (only a certain number of alarms of the same type need to be kept), enabling fast parallel computation on large-scale data sets.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram illustrating an implementation of a multi-step attack detection method based on causal inference according to an embodiment of the present invention;
FIG. 2 schematically illustrates a first probability table construction according to an embodiment of the invention;
FIG. 3 schematically shows a structural diagram of a multi-step attack detection device based on causal inference according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
Fig. 1 schematically shows an implementation diagram of a multi-step attack detection method based on causal inference according to an embodiment of the present invention. As shown in fig. 1, a multi-step attack detection method based on causal inference includes:
S01, acquiring an alarm sample set. The alarms in the alarm sample set are original alarms, and the data sources include, but are not limited to, big data sources, online collection, or the results of mass data analysis.
S02, aggregating the alarm samples in the alarm sample set that satisfy the time aggregation degree into super alarms, and partitioning according to the time sequence of the super alarms. If alarm samples have the same attributes and their interval is less than the time window $T_1$, they are merged into a super alarm. The occurrence time of the first alarm sample in the merged set is recorded as the start time, and that of the last one as the end time.
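A minimal Python sketch of the aggregation in S02, assuming each original alarm carries an alarm type, a five-tuple and a numeric timestamp, and that "same attributes" means the same type and the same five-tuple. `Alarm`, `SuperAlarm` and `aggregate_super_alarms` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass
class Alarm:
    alarm_type: str
    five_tuple: FiveTuple
    time: float  # occurrence time, in seconds


@dataclass
class SuperAlarm:
    alarm_type: str
    five_tuple: FiveTuple
    start_time: float  # time of the first merged alarm
    end_time: float    # time of the last merged alarm
    count: int         # number of original alarms merged


def aggregate_super_alarms(alarms: List[Alarm], t1: float) -> List[SuperAlarm]:
    """Merge alarms with identical type and five-tuple whose gap is below t1."""
    open_groups: Dict[Tuple[str, FiveTuple], SuperAlarm] = {}
    result: List[SuperAlarm] = []
    for alarm in sorted(alarms, key=lambda a: a.time):
        key = (alarm.alarm_type, alarm.five_tuple)
        current = open_groups.get(key)
        if current is not None and alarm.time - current.end_time < t1:
            # Gap to the previous alarm with the same key is within T1: extend it.
            current.end_time = alarm.time
            current.count += 1
        else:
            # Otherwise start a new super alarm for this key.
            current = SuperAlarm(alarm.alarm_type, alarm.five_tuple,
                                 alarm.time, alarm.time, 1)
            open_groups[key] = current
            result.append(current)
    # Super alarms are created in time order, so `result` is sorted by start_time.
    return result
```

Because the output is already ordered by start time, it can be fed directly into the time-based partitioning step.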
S03, constructing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in the partition, in the next partition, and across their boundary. The distribution characteristics and probabilities in this step are computed from causal relationships, which follow the idea of Bayesian inference: if the conditional probability that event type B occurs, given that event type A has occurred within the time window, reaches a threshold $Threshold_1$, A is regarded as a precondition of B, and the feature-based similarity is taken as the confidence that type A is the cause of type B.
And S04, combining the detection event with the historical event in the preset time period, matching in the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.
After a detection event is generated, the related events of the most recent time window are obtained by streaming or periodic reads; the window is updated each time a new event is obtained, and newly entered events are marked. For a detection event, every other event that occurred earlier and within the time window is paired with it. These pairs are used to find all corresponding feature rule combinations in the first and second probability tables, and the detection result is obtained from those feature combinations.
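The online pairing step can be sketched as a sliding window, assuming events arrive in non-decreasing time order and that each probability table is modeled as a dictionary keyed by the ordered pair (earlier type, later type). `EventWindow` and `lookup` are hypothetical helpers; the "mark newly entered events" bookkeeping is omitted.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Pair = Tuple[str, str]  # (earlier attack type, later attack type)


class EventWindow:
    """Hypothetical helper holding the events of the most recent window T2."""

    def __init__(self, t2: float) -> None:
        self.t2 = t2
        self.events: Deque[Tuple[str, float]] = deque()  # (attack_type, time), oldest first

    def add(self, attack_type: str, time: float) -> List[Pair]:
        """Slide the window to `time`, then pair the new event with earlier ones."""
        while self.events and time - self.events[0][1] > self.t2:
            self.events.popleft()  # evict events older than the window
        pairs = [(t, attack_type) for t, ts in self.events if ts < time]
        self.events.append((attack_type, time))
        return pairs


def lookup(pairs: List[Pair], first_table: Dict[Pair, List[tuple]],
           second_table: Dict[Pair, List[tuple]]) -> List[Tuple[Pair, List[tuple]]]:
    """Return (pair, feature combinations) for every pair found in either table."""
    hits = []
    for pair in pairs:
        combos = first_table.get(pair, []) + second_table.get(pair, [])
        if combos:
            hits.append((pair, combos))
    return hits
```

Each returned hit carries the feature combinations under which the pair was learned, so the detection result can then be checked against those combinations.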
This embodiment uses statistical means: causal relationships between alarm types are inferred and mined on a Bayesian basis, the strongest causal combinations are selected through attribute equality or similarity, and new causal relationships are identified through sudden increases in probability. In terms of implementation, the samples are first sampled according to partition density, and the global causal computation is then converted into per-partition computation plus a second pass over partition boundaries. These two steps are repeated for N rounds to obtain dependency conditions of different strengths, the dependency set with the largest number of features is selected, and finally online alarm events are matched against the attack patterns to detect multi-step attacks.
In some embodiments provided by the present invention, partitioning according to the time sequence of the super alarms includes: determining whether the default number of partitions, a preset time window and the time span of the alarm sample set satisfy a preset relation; if they do, partitioning the data in the alarm sample set into the default number of partitions; if not, adjusting the default number of partitions and partitioning the data in the alarm sample set with the adjusted number. Specifically, let the default number of partitions be partition_num, the time span of the data be All_time, and the preset time window be $T_2$; the following check is performed:
If the preset time window $T_2$ is greater than $frac \times (All\_time / partition\_num)$, the number of partitions is reduced so that $T_2 \approx frac \times (All\_time / partition\_num\_new)$, where partition_num_new is the new number of partitions; otherwise, the default number of partitions is kept unchanged. Here frac is a decimal between 0 and 1, so each partition then spans roughly $T_2 / frac$, which is longer than $T_2$.
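A sketch of this partition-count check, assuming equal-width time partitions over the span All_time. `adjust_partition_num` and `assign_partitions` are hypothetical helper names, and rounding down is a choice made here so that each partition stays at least $T_2 / frac$ long.

```python
import math
from typing import List


def adjust_partition_num(partition_num: int, all_time: float, t2: float, frac: float) -> int:
    """Shrink the partition count when T2 is too large relative to one partition."""
    if not 0 < frac < 1:
        raise ValueError("frac must be strictly between 0 and 1")
    if t2 > frac * (all_time / partition_num):
        # Solve T2 ~= frac * (all_time / new_num) for new_num; flooring keeps
        # every partition at least T2 / frac long.
        return max(1, math.floor(frac * all_time / t2))
    return partition_num


def assign_partitions(start_times: List[float], start: float,
                      all_time: float, num: int) -> List[int]:
    """Map each super-alarm start time to an equal-width partition index."""
    width = all_time / num
    return [min(int((t - start) // width), num - 1) for t in start_times]
```

For one day of data (86,400 s), a one-hour window and frac = 0.5, a default of 100 partitions is reduced to 12, while a default of 10 is left alone.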
In some embodiments, constructing the first probability table and the second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in a partition, in the next partition, and across the boundary between the two includes: sorting the data in each partition; calculating the value range of each feature of every attack type; traversing the data of each partition and, according to whether each record falls within the value range of each feature of the other alarm types, obtaining the occurrence probability corresponding to each feature of each attack type combination; writing the attack type combinations determined from the occurrence probabilities into the first probability table; and writing the attack type combinations in the first probability table whose probability changes abruptly into the second probability table. This embodiment computes per partition. The data of each partition are traversed; for each record, according to its alarm type, it is determined whether the record falls within the value range of each feature of the other alarm types, marking 1 if so and 0 otherwise. The marks are then summed over the global data to obtain $AF_{domain(B,F)}$ for each feature of every attack type pair. Based on the event time window $T_2$, $ABF_{domain(B,F)}$ is first computed within each partition; instead of counting, the results in each partition are temporarily recorded as sets, and the partition results are then collected into one set. When $ABF_{domain(B,F)}$ is computed within a partition, it cannot be determined whether the computation for the data at the tail of the partition is complete, because their time window may reach into the next partition. The tail data of each partition are therefore extracted, broadcast, and compared a second time with the data of the next adjacent partition; these results are likewise represented as sets and collected once the computation finishes.
The sets obtained in the previous steps are merged, the number of elements is counted, and the entries whose count is greater than minMumOfABF are kept (a sufficient count indicates reliability), giving the final frequency $ABF_{domain(B,F)}$.
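The two-pass counting and the final merge can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: features are compared by exact equality, same-type pairs are skipped, and the precedence test is simplified to "B starts no earlier than A and at most $T_2$ after A ends". Only the next partition is checked in the boundary pass, which relies on each partition being longer than $T_2$. `SuperAlarm`, `_hits` and `partitioned_abf` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class SuperAlarm:
    eid: int
    etype: str
    start: float
    end: float
    feats: Tuple[Tuple[str, str], ...]  # (feature name, value) pairs


Hit = Tuple[str, str, int]  # (type A, type B, id of the type-A event)


def _hits(left: List[SuperAlarm], right: List[SuperAlarm],
          combo: Tuple[str, ...], t2: float) -> Set[Hit]:
    """Type-A events in `left` followed by a type-B event in `right` with equal features F."""
    out: Set[Hit] = set()
    for a in left:
        fa = dict(a.feats)
        for b in right:
            if b.etype == a.etype:
                continue  # same-type pairs are skipped in this sketch
            # Assumed precedence test: B starts no earlier than A and at most T2 after A ends.
            if not (a.start <= b.start <= a.end + t2):
                continue
            fb = dict(b.feats)
            if all(f in fa and fa[f] == fb.get(f) for f in combo):
                out.add((a.etype, b.etype, a.eid))
    return out


def partitioned_abf(partitions: List[List[SuperAlarm]], combo: Tuple[str, ...],
                    t2: float, min_num_of_abf: int) -> Dict[Tuple[str, str], int]:
    partial: List[Set[Hit]] = []
    for i, part in enumerate(partitions):
        # Pass 1: pairs entirely inside the partition.
        partial.append(_hits(part, part, combo, t2))
        nxt = partitions[i + 1] if i + 1 < len(partitions) else []
        if part and nxt:
            # Pass 2: tail records whose window reaches the next partition are
            # "broadcast" and compared with that partition only.
            first_next = min(b.start for b in nxt)
            tail = [a for a in part if a.end + t2 >= first_next]
            partial.append(_hits(tail, nxt, combo, t2))
    # Set union makes merging safe even if a hit is produced twice.
    merged: Set[Hit] = set().union(*partial)
    counts = Counter((a_type, b_type) for a_type, b_type, _ in merged)
    # Keep only pairs with enough support (count > minMumOfABF).
    return {pair: n for pair, n in counts.items() if n > min_num_of_abf}
```

Recording hits as sets of event ids, rather than counters, is what lets the per-partition and boundary results be combined without double counting.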
Combining the results of the previous steps, the frequencies of the same attack type combination AB (attack type A combined with attack type B) are divided to obtain $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$. The combinations whose probability is greater than $Threshold_1$ are retained, giving the attack patterns of attack type combination AB when each feature combination contains a single feature.
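The division and screening can be sketched with plain dictionaries keyed by (attack type pair, feature combination). The description alternates between "greater than" and "greater than or equal to" $Threshold_1$; this sketch uses the inclusive form. The function names are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]   # (attack type A, attack type B)
Combo = Tuple[str, ...]  # feature combination F
Key = Tuple[Pair, Combo]


def conditional_probabilities(abf: Dict[Key, int], af: Dict[Key, int]) -> Dict[Key, float]:
    """P(A | B_{F in domain(B,F)}) = ABF / AF for every key with a non-zero AF."""
    return {key: n / af[key] for key, n in abf.items() if af.get(key, 0) > 0}


def select_patterns(probs: Dict[Key, float], threshold_1: float) -> Dict[Key, float]:
    """Keep the (pair, F) entries whose probability reaches Threshold_1."""
    return {key: p for key, p in probs.items() if p >= threshold_1}
```

Keys without an AF count are dropped rather than treated as zero, which avoids a division by zero for feature combinations never observed in B's value range.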
On this basis, attack patterns under larger feature combinations are computed layer by layer. In the N-th iteration, for each attack type pair whose probability in the previous round was greater than or equal to the threshold $Threshold_2$ (generally $Threshold_2 < Threshold_1$), the previous round's feature combinations are combined with the single features of the same pair whose first-round (single-feature) probability was greater than or equal to $Threshold_2$, and these serve as all the feature combinations for the ABF computation in the current round. Repeating the above steps yields the attack patterns under combinations of N features.
The iteration terminates when N reaches the maximum feature combination size maxNumOfFeaturesCombine. The results of rounds 1 to N are then merged; for each AB pair, the attack pattern with the largest feature combinations is taken as the attack pattern of type AB and stored in the first probability table, which may be named alertTypeCorTable. Its key fields include attack type A, attack type B, the set of feature combinations, and the conditional probability. FIG. 2 schematically illustrates the construction of the first probability table according to an embodiment of the invention; as shown in FIG. 2, the first probability table is obtained in the manner described above.
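The layer-by-layer growth reads like an Apriori-style search, and can be sketched that way. This is our reading, not the patent's code: `prob(pair, combo)` is a hypothetical callback standing in for the partitioned AF/ABF computation, and `build_first_table` returns, for each pair, every feature combination of the largest size that passed $Threshold_1$.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]  # (attack type A, attack type B)
Combo = FrozenSet[str]  # a feature combination F


def build_first_table(
    pairs: List[Pair],
    features: List[str],
    prob: Callable[[Pair, Combo], float],
    threshold_1: float,
    threshold_2: float,
    max_num_of_features_combine: int,
) -> Dict[Pair, Dict[Combo, float]]:
    """Grow feature combinations round by round and keep the largest ones per pair."""
    patterns: Dict[Pair, Dict[Combo, float]] = {p: {} for p in pairs}
    seeds: Dict[Pair, Set[str]] = {}       # single features with P >= Threshold_2
    frontier: Dict[Pair, Set[Combo]] = {}  # last round's combos with P >= Threshold_2

    # Round 1: single features.
    for pair in pairs:
        seeds[pair], frontier[pair] = set(), set()
        for f in features:
            combo = frozenset([f])
            p = prob(pair, combo)
            if p >= threshold_2:
                seeds[pair].add(f)
                frontier[pair].add(combo)
            if p >= threshold_1:
                patterns[pair][combo] = p

    # Rounds 2..N: extend each surviving combo by one surviving single feature.
    for _ in range(2, max_num_of_features_combine + 1):
        for pair in pairs:
            seen: Set[Combo] = set()
            next_frontier: Set[Combo] = set()
            for combo in frontier[pair]:
                for f in seeds[pair] - combo:
                    cand = combo | {f}
                    if cand in seen:
                        continue
                    seen.add(cand)
                    p = prob(pair, cand)
                    if p >= threshold_2:
                        next_frontier.add(cand)
                    if p >= threshold_1:
                        patterns[pair][cand] = p
            frontier[pair] = next_frontier

    # For each pair, keep every combination of the largest size that passed Threshold_1.
    table: Dict[Pair, Dict[Combo, float]] = {}
    for pair, found in patterns.items():
        if found:
            best = max(len(c) for c in found)
            table[pair] = {c: p for c, p in found.items() if len(c) == best}
    return table
```

Keeping all combinations of the maximal size, instead of a single best one, mirrors the stated goal of reducing false alarms without increasing missed detections.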
On the basis of the first probability table alertTypeCorTable, abrupt changes in attack type probability are detected. For an attack type A whose probability changes abruptly, the conditional probabilities of the related attack type combinations AX, where X is any other attack type, are recalculated. Specifically: take the super alarms in the most recent time period N, where N is greater than the time window $T_2$ and also greater than the calculation interval of process 2; compute the probability of each super alarm type; compare it with the alertTypeCorTable table, and if the probability of an attack type has increased by more than J times (J being a real number greater than 1), recalculate the attack pair patterns conditioned on that attack type, add the combinations of newly appearing attack patterns with known attack patterns, and rerun process 1 restricted to these attack type combinations. The results are stored in the temporary probability table TemportertTypeCorTable, i.e., the second probability table; each calculation only updates existing rows or adds new ones.
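A sketch of the surge check, assuming a baseline share for each attack type is available alongside alertTypeCorTable (the description does not say where it is stored) and that a type absent from the baseline counts as newly appeared. `type_probabilities` and `surged_types` are hypothetical names; every returned type A would then trigger a recalculation of its AX pairs.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def type_probabilities(types: Iterable[str]) -> Dict[str, float]:
    """Share of each attack type among the super alarms of a period."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def surged_types(recent_types: Iterable[str], baseline: Dict[str, float], j: float) -> Set[str]:
    """Attack types whose share grew by more than J times (J > 1) over the baseline."""
    if j <= 1:
        raise ValueError("J must be greater than 1")
    surged = set()
    for attack_type, p in type_probabilities(recent_types).items():
        base = baseline.get(attack_type, 0.0)
        # Assumption: a type missing from the baseline is treated as a surge.
        if base == 0.0 or p > j * base:
            surged.add(attack_type)
    return surged
```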
To keep the amount of data in each partition balanced, in some embodiments the method further comprises: calculating the average data volume per partition and the data volume of each partition; if the data volume of the current partition is greater than M times the average partition data volume, where M is a real number greater than 1, sampling the current partition without replacement at a sampling ratio of ($M \times$ average partition data volume) / (current partition data volume).
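A sketch of the density-based sampling, reading the rule as "more than M times the average partition size". `balance_partitions` is a hypothetical name; it uses `random.sample` from the Python standard library, which draws without replacement, and sorts the drawn indices so that the retained records stay in time order.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balance_partitions(partitions: Sequence[List[T]], m: float, seed: int = 0) -> List[List[T]]:
    """Down-sample partitions holding more than M times the average partition size."""
    if m <= 1:
        raise ValueError("M must be greater than 1")
    rng = random.Random(seed)
    avg = sum(len(p) for p in partitions) / len(partitions)
    cap = m * avg
    balanced: List[List[T]] = []
    for part in partitions:
        if len(part) > cap:
            # Sampling ratio cap / len(part) applied to len(part) records keeps floor(cap) records.
            k = math.floor(cap)
            keep = sorted(rng.sample(range(len(part)), k))  # without replacement
            balanced.append([part[i] for i in keep])
        else:
            balanced.append(list(part))
    return balanced
```

With partition sizes 10, 10 and 100 and M = 1.5, the average is 40, so only the third partition exceeds the cap of 60 and is reduced to 60 records.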
In some embodiments provided by the present invention, traversing the data of each partition and obtaining the occurrence probability corresponding to each feature of the attack type combinations according to whether each record falls within the value range of each feature of the other alarm types includes: obtaining all feature combinations of the first attack type and counting how often they fall within the value range of the second attack type, recorded as $AF_{domain(B,F)}$, where F is a feature combination; counting the number of events of the first attack type that have the required temporal relationship with the second attack type within a preset time period and whose features in F meet the similarity requirement, recorded as $ABF_{domain(B,F)}$; and taking the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ as the occurrence probability of the attack type combination formed by the first attack type and the second attack type. This embodiment details how $AF_{domain(B,F)}$ and the related parameters are calculated.
Specifically, for attack type B, the value range of each of its features is computed and denoted $domain(B, f_j)$, where $f_j$ is the j-th feature. For attack type A, for every feature combination, the number of type-A events that fall within the value range $domain(B, f_j)$ of attack type B is counted and denoted $AF_{domain(B,F)}$, where F is a feature combination. Within the time window $T_2$, the number of type-A events for which $B(start\_time) \geq A(start\_time)$, $B(end\_time) \geq A(end\_time) + T_2$, and the attributes in F are equal (highly similar) is counted and denoted $ABF_{domain(B,F)}$. If the conditional probability $P(A \mid B_{F \in domain(B,F)}) = ABF_{domain(B,F)} / AF_{domain(B,F)}$ is greater than or equal to $Threshold_1$, then under feature combination F, A is a cause of B, and the ordered combination AB is recorded as an attack type combination pair.
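A sketch of $domain(B, f_j)$ and $AF_{domain(B,F)}$, assuming categorical features so that a "value range" is the set of observed values (numeric features would more likely use a min/max interval). `feature_domains` and `af_count` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (attack_type, {feature_name: value}); categorical feature values are assumed.
Event = Tuple[str, Dict[str, str]]


def feature_domains(events: Iterable[Event]) -> Dict[str, Dict[str, Set[str]]]:
    """domain(B, f_j): the set of values feature f_j takes among type-B events."""
    dom: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for etype, feats in events:
        for name, value in feats.items():
            dom[etype][name].add(value)
    return {t: dict(d) for t, d in dom.items()}


def af_count(events: List[Event], a_type: str, b_type: str,
             combo: Tuple[str, ...], dom: Dict[str, Dict[str, Set[str]]]) -> int:
    """AF_domain(B,F): type-A events whose value for every feature in F lies in domain(B, f)."""
    b_dom = dom.get(b_type, {})
    return sum(
        1
        for etype, feats in events
        if etype == a_type
        and all(f in feats and feats[f] in b_dom.get(f, set()) for f in combo)
    )
```

For example, with two scan events on different hosts and one exploit event on the first host, only one scan falls inside the exploit's dst_ip range, while both fall inside its port range.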
In some embodiments provided herein, the method further comprises: constructing a composite data structure to record the matching process; the composite data structure comprises an event identifier, the event occurrence time, the latest time of the events in the sequence, and a compare flag. Specifically, the multi-step attack detection results are stored in the composite data structure, where each element is one multi-step attack result. Each result contains an attack sequence of original events; for each alarm in the sequence, its unique identifier and occurrence time are stored, together with the latest alarm time in the sequence, update_time, and the compare flag compare_flag. Multi-step attack generation: for an attack type combination ab that is hit in the first or second probability table, the known attack sequences with match_flag == 1 are examined. If attack type a is in a known attack sequence and ab falls within the time window $T_2$, b is appended to the sequence containing a; otherwise a new multi-step attack sequence is created with compare_flag = 1. This embodiment also includes updating compare_flag: before each round of calculation on new data, if update_time is not within the window $T_2$ of the earliest alarm time in the new data, the attack sequence becomes invalid, the multi-step attack sequence is no longer updated, and match_flag is set to 0.
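A sketch of the composite structure and the sequence-building rules. The description uses both match_flag and compare_flag for what appears to be the same flag; this sketch assumes a single `compare_flag`. Matching is by attack type, as the text states, and the window check compares the occurrence times of a and b. `AttackSequence` and `MultiStepTracker` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

AlarmRef = Tuple[str, str, float]  # (event_id, attack_type, occurrence_time)


@dataclass
class AttackSequence:
    events: List[AlarmRef] = field(default_factory=list)
    update_time: float = 0.0  # latest alarm time in the sequence
    compare_flag: int = 1     # 1 = still compared against new hits, 0 = expired


class MultiStepTracker:
    """Hypothetical holder of the multi-step attack results."""

    def __init__(self, t2: float) -> None:
        self.t2 = t2
        self.sequences: List[AttackSequence] = []

    def expire(self, min_new_time: float) -> None:
        """Before each round: stop comparing sequences last updated outside T2."""
        for seq in self.sequences:
            if seq.compare_flag == 1 and min_new_time - seq.update_time > self.t2:
                seq.compare_flag = 0

    def on_hit(self, a: AlarmRef, b: AlarmRef) -> AttackSequence:
        """Record a matched pair ab: extend a live sequence containing type a, else start one."""
        _, a_type, a_time = a
        b_time = b[2]
        for seq in self.sequences:
            if seq.compare_flag != 1:
                continue
            has_a = any(t == a_type for _, t, _ in seq.events)
            if has_a and 0 <= b_time - a_time <= self.t2:
                seq.events.append(b)
                seq.update_time = max(seq.update_time, b_time)
                return seq
        seq = AttackSequence(events=[a, b], update_time=max(a_time, b_time))
        self.sequences.append(seq)
        return seq
```

A scan followed by an exploit and then an exfiltration, each within $T_2$ of the previous step, ends up as one three-event sequence; once the sequence expires, a later scan/exploit pair starts a new one.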
Based on the same inventive concept, the invention also provides a multi-step attack detection device based on causal inference. FIG. 3 schematically shows a structural diagram of a multi-step attack detection device based on causal inference according to an embodiment of the present invention. As shown in FIG. 3, the device includes: a sample acquisition module for acquiring an alarm sample set; an aggregation partitioning module for aggregating the alarm samples in the alarm sample set that satisfy the time aggregation degree into super alarms and partitioning according to the time sequence of the super alarms; a probability establishing module for establishing a first probability table and a second probability table corresponding to attack type combinations according to the distribution characteristics of the data related to the features of the attack types within a time window in the partition, in the next partition, and across the partition boundary; and an event detection module for combining a detection event with the historical events within a preset time period, matching the combinations in the first and second probability tables, and obtaining the detection result of the detection event from the matching result.
In some optional embodiments, partitioning according to the timing of the super alarms includes: determining whether the default partition number, a preset time window, and the time span of the alarm sample set satisfy a preset relation; when the preset relation is satisfied, partitioning the data in the alarm sample set into the default number of partitions; and when the preset relation is not satisfied, adjusting the default partition number and partitioning the data in the alarm sample set using the adjusted partition number.
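The preset relation is not specified here. The sketch below assumes, purely for illustration, that each partition must span at least one preset time window (time span / partition count >= window) and lowers the partition count to fit when it does not. `choose_partition_count` and `partition_by_time` are hypothetical names, and equal-width time slicing is one possible partitioning scheme, not necessarily the one used in the patent.

```python
import math
from typing import Any, List, Tuple


def choose_partition_count(default_n: int, window: float, time_span: float) -> int:
    """Keep default_n if every partition would span at least one window."""
    # Hypothetical preset relation: time_span / default_n >= window.
    if default_n >= 1 and time_span / default_n >= window:
        return default_n
    # Otherwise adjust the count so that each partition covers one window.
    return max(1, math.floor(time_span / window))


def partition_by_time(records: List[Tuple[float, Any]], n: int) -> List[List[Tuple[float, Any]]]:
    """Split (timestamp, payload) records into n equal-width time partitions."""
    records = sorted(records, key=lambda r: r[0])
    parts: List[List[Tuple[float, Any]]] = [[] for _ in range(n)]
    if not records:
        return parts
    t0, t1 = records[0][0], records[-1][0]
    width = (t1 - t0) / n or 1.0  # avoid dividing by zero when all times are equal
    for rec in records:
        idx = min(int((rec[0] - t0) / width), n - 1)  # last boundary goes to the last partition
        parts[idx].append(rec)
    return parts
```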
In some optional embodiments, constructing the first probability table and the second probability table for attack type combinations according to the distribution characteristics of the alarm samples in the partitions that relate to the features of each attack type includes: sorting the data in each partition; calculating the value range of each feature for every attack type; traversing the data of each partition and, according to whether each record falls within the value range of each feature of the other alarm types, obtaining the occurrence probability for each feature of the attack type combination; writing the attack type combinations determined from the occurrence probabilities into the first probability table; and writing the attack type combinations in the first probability table whose probability changes abruptly into the second probability table.
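One simple way to obtain the per-attack-type value ranges, and to test whether a record falls within the value range of another alarm type, is a single min/max pass over the alarms, sketched below. It assumes numeric features stored in dictionaries keyed by feature name, with the attack type under a hypothetical `"type"` key; categorical features would need a set of observed values rather than a (min, max) interval.

```python
from typing import Dict, List, Tuple

Ranges = Dict[str, Dict[str, Tuple[float, float]]]


def feature_ranges(alarms: List[dict]) -> Ranges:
    """Return {attack_type: {feature: (min, max)}} over all alarms."""
    ranges: Ranges = {}
    for alarm in alarms:
        per_type = ranges.setdefault(alarm["type"], {})
        for feature, value in alarm.items():
            if feature == "type":
                continue  # the type label is not a feature
            lo, hi = per_type.get(feature, (value, value))
            per_type[feature] = (min(lo, value), max(hi, value))
    return ranges


def in_range(alarm: dict, ranges: Ranges, other_type: str, feature: str) -> bool:
    """True if alarm[feature] lies within other_type's observed range."""
    bounds = ranges.get(other_type, {}).get(feature)
    return bounds is not None and bounds[0] <= alarm[feature] <= bounds[1]
```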
In some optional embodiments, the method further comprises: calculating the partition average data volume and the data volume of each partition; and, if the data volume of the current partition is greater than M times the partition average data volume, where M is a real number greater than 1, sampling the current partition without replacement, using (M × partition average data volume) / (current partition data volume) as the sampling proportion.
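The sketch below shows the down-sampling rule, reading the condition as "the partition holds more than M times the average partition size". `downsample_partitions` and its `seed` parameter are hypothetical; `random.Random.sample` from the Python standard library draws without replacement.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def downsample_partitions(partitions: List[List[T]], m: float,
                          seed: Optional[int] = None) -> List[List[T]]:
    """Sample oversized partitions without replacement down to M x average size."""
    if m <= 1:
        raise ValueError("M must be a real number greater than 1")
    if not partitions:
        return []
    rng = random.Random(seed)
    avg = sum(len(p) for p in partitions) / len(partitions)
    limit = m * avg  # M x partition average data volume
    result = []
    for part in partitions:
        if len(part) > limit:
            ratio = limit / len(part)  # sampling proportion from the text
            k = int(limit)             # equals floor(ratio * len(part))
            result.append(rng.sample(part, k))
        else:
            result.append(list(part))
    return result
```

Capping every partition at roughly M times the average keeps one burst of alarms from dominating the per-partition statistics.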
In some optional embodiments, traversing the data of each partition and obtaining the occurrence probability for each feature of the attack type combination according to whether each record falls within the value range of each feature of the other alarm types includes: acquiring all feature combinations of the first attack type A and counting how often each feature combination falls within the value range of the second attack type B, recorded as $AF_{\mathrm{domain}(B,F)}$, where F is a given feature combination; counting, within a preset time period, the number of events of the first attack type A that have the required timing relation with the second attack type B and whose F satisfies the similarity requirement, recorded as $ABF_{\mathrm{domain}(B,F)}$; and taking the conditional probability $P(A \mid B_{F \in \mathrm{domain}(B,F)}) = ABF_{\mathrm{domain}(B,F)} / AF_{\mathrm{domain}(B,F)}$ as the occurrence probability of the attack type combination formed by the first attack type A and the second attack type B.
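A minimal sketch of the ratio $ABF_{\mathrm{domain}(B,F)} / AF_{\mathrm{domain}(B,F)}$ follows. Two details are left open by the text and are filled in here as assumptions: the timing relation is taken to be "a B event occurs after the A event, within `window`", and the similarity requirement is taken to be exact equality of the A and B values on every feature in F. The function name `conditional_probability` and the event dictionary layout are likewise illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def conditional_probability(events: List[dict], a_type: str, b_type: str,
                            features: Sequence[str],
                            ranges: Dict[str, Dict[str, Tuple[float, float]]],
                            window: float) -> Tuple[int, int, float]:
    """Return (ABF, AF, ABF / AF) for attack types A, B and feature combination F."""
    a_events = [e for e in events if e["type"] == a_type]
    b_events = [e for e in events if e["type"] == b_type]
    b_ranges = ranges.get(b_type, {})

    def in_b_domain(e: dict) -> bool:
        # Every feature in F must lie inside B's observed value range.
        return all(f in b_ranges and b_ranges[f][0] <= e[f] <= b_ranges[f][1]
                   for f in features)

    af = [e for e in a_events if in_b_domain(e)]  # A events counted in AF

    abf = 0
    for a in af:
        # Assumed timing relation: B follows A within `window`.
        # Assumed similarity requirement: identical values on every feature in F.
        if any(0 < b["time"] - a["time"] <= window
               and all(b[f] == a[f] for f in features)
               for b in b_events):
            abf += 1

    prob = abf / len(af) if af else 0.0
    return abf, len(af), prob
```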
In some optional embodiments, writing a combination of attack types determined according to the occurrence probability into the first probability table includes: performing multi-round iterative computation on attack type combinations under a plurality of feature combinations; screening attack type combinations under a plurality of feature combinations in each round according to a preset threshold; and merging the obtained attack type combinations, and writing the attack type combination with the largest number of feature combinations into the first probability table.
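The multi-round screening and merging can be sketched as follows, under stated assumptions: round r evaluates every feature combination of size r, `prob_fn` is a hypothetical callback returning the occurrence probability for (A, B, F) (for instance the conditional probability defined earlier), and "the attack type combination with the largest number of feature combinations" is read as keeping, for each attack type pair, the surviving feature combination that contains the most features. `build_first_table` is an illustrative name.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Sequence, Tuple

Pair = Tuple[str, str]


def build_first_table(pairs: Iterable[Pair], features: Sequence[str],
                      prob_fn: Callable[[str, str, Tuple[str, ...]], float],
                      threshold: float) -> Dict[Pair, Tuple[Tuple[str, ...], float]]:
    """Return {(A, B): (feature_combination, probability)} for surviving pairs."""
    pairs = list(pairs)
    table: Dict[Pair, Tuple[Tuple[str, ...], float]] = {}
    for size in range(1, len(features) + 1):  # one round per combination size
        for a, b in pairs:
            for combo in combinations(features, size):
                p = prob_fn(a, b, combo)
                if p < threshold:
                    continue  # screened out in this round
                best = table.get((a, b))
                # Merge: keep the combination with the most features.
                if best is None or len(combo) > len(best[0]):
                    table[(a, b)] = (combo, p)
    return table
```

Evaluating every combination is exponential in the number of features; a real implementation might prune supersets of combinations that already failed the threshold.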
In some optional embodiments, the method further comprises: constructing a composite data structure to record the matching process; the composite data structure comprises an event identifier, the event occurrence time, the latest event time in the sequence, and a compare flag.
For the specific definition of each functional module in the multi-step attack detection device based on causal inference, refer to the definition of the multi-step attack detection method based on causal inference; it is not repeated here. The modules of the above device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In some embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the multi-step attack detection method based on causal inference when executing the computer program. The processor has numerical calculation and logical operation functions, and includes at least a central processing unit (CPU) with data processing capability, random access memory (RAM), read-only memory (ROM), various I/O ports, an interrupt system, and the like. The processor comprises a kernel, which calls the corresponding program unit from the memory. One or more kernels may be provided, and the method is implemented by adjusting kernel parameters. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.
In an embodiment of the present invention, there is also provided a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the steps of the multi-step attack detection method based on causal inference described above.
In one embodiment provided by the present invention, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the multi-step attack detection method based on causal inference described above.
The embodiments above are applicable to mining complex multi-step attack behavior from the alarm events of network devices. The method or device runs on a distributed platform, takes structured alarm events as input, and detects complex multi-step attack behavior through the computation described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (9)

1. A multi-step attack detection method based on causal inference, the method comprising:
acquiring an alarm sample set;
aggregating the alarm samples in the alarm sample set that satisfy the time aggregation degree into super alarms, and partitioning according to the timing of the super alarms;
sorting the data in the partitions;
calculating the value range of each characteristic of all attack types;
traversing the data of each partition, and obtaining the occurrence probability corresponding to each feature of the attack type combination according to whether each piece of data is within the value range of each feature of the other alarm types;
writing the attack type combination determined according to the occurrence probability into a first probability table;
writing a combination of attack types with a sudden change in probability in the first probability table into a second probability table;
and combining the detection event with the historical event in the preset time period, matching in the first probability table and the second probability table, and obtaining the detection result of the detection event according to the matching result.
2. The method of claim 1, wherein partitioning according to the timing of super alarms comprises:
judging whether the default partition number, a preset time window and the time span of the alarm sample set meet a preset relation or not;
when a preset relation is met, partitioning the data in the alarm sample set by the default partition number;
and when the preset relation is not met, adjusting the default partition number, and partitioning the data in the alarm sample set by using the adjusted default partition number.
3. The method of claim 2, further comprising:
calculating the average data volume of the partitions and the data volume of each partition;
if the data amount of the current partition > M × the partition average data amount, where M is a real number greater than 1,
sampling the current partition without replacement, using (M × partition average data amount) / (current partition data amount) as the sampling proportion.
4. The method of claim 1, wherein traversing the data of each partition to obtain the probability of occurrence corresponding to each feature of the attack type combination, according to whether each piece of data is within the value range of each feature of the other alarm types, comprises:
acquiring all feature combinations of the first attack type A, and counting how often each feature combination falls within the value range of the second attack type B, recorded as $AF_{\mathrm{domain}(B,F)}$, where F is a given feature combination;
calculating, within a preset time period, the timing relation between the first attack type A and the second attack type B, and the number of events of the first attack type A whose F satisfies the similarity requirement, recorded as $ABF_{\mathrm{domain}(B,F)}$;
using the conditional probability $P(A \mid B_{F \in \mathrm{domain}(B,F)}) = ABF_{\mathrm{domain}(B,F)} / AF_{\mathrm{domain}(B,F)}$ as the probability of occurrence of the attack type combination composed of the first attack type A and the second attack type B.
5. The method of claim 1, wherein writing a combination of attack types determined from the probability of occurrence into a first probability table comprises:
computing attack type combinations under a plurality of feature combinations in a multi-round iterative manner;
screening attack type combinations under a plurality of feature combinations in each round according to a preset threshold;
and merging the obtained attack type combinations, and writing the attack type combination with the largest number of feature combinations into the first probability table.
6. The method of claim 5, further comprising:
constructing a composite data structure to identify a matching process;
the composite data structure comprises: an event identifier, the event occurrence time, the latest event time in the sequence, and a compare flag.
7. A multi-step attack detection device based on causal inference, the device comprising:
the sample acquisition module is used for acquiring an alarm sample set;
the aggregation partitioning module is used for aggregating the alarm samples meeting the time aggregation degree in the alarm sample set into a super alarm and partitioning according to the time sequence of the super alarm;
the probability establishing module is used for sorting the data in the partitions; calculating the value range of each feature of all attack types; traversing the data of each partition, and obtaining the occurrence probability corresponding to each feature of the attack type combination according to whether each piece of data is within the value range of each feature of the other alarm types; writing the attack type combination determined according to the occurrence probability into a first probability table; and writing a combination of attack types whose probability changes abruptly in the first probability table into a second probability table; and
the event detection module is used for combining a detection event with the historical events within a preset time period, matching the combination in the first probability table and the second probability table, and obtaining a detection result of the detection event according to the matching result.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the multi-step attack detection method based on causal inference as claimed in any of claims 1 to 6.
9. A computer readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the causal inference based multi-step attack detection method of any one of claims 1 to 6.
CN202210914607.5A 2022-08-01 2022-08-01 Multi-step attack detection method, device and equipment based on causal inference Active CN114978778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210914607.5A CN114978778B (en) 2022-08-01 2022-08-01 Multi-step attack detection method, device and equipment based on causal inference

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210914607.5A CN114978778B (en) 2022-08-01 2022-08-01 Multi-step attack detection method, device and equipment based on causal inference

Publications (2)

Publication Number Publication Date
CN114978778A CN114978778A (en) 2022-08-30
CN114978778B true CN114978778B (en) 2022-10-28

Family

ID=82968751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210914607.5A Active CN114978778B (en) 2022-08-01 2022-08-01 Multi-step attack detection method, device and equipment based on causal inference

Country Status (1)

Country Link
CN (1) CN114978778B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102075516A (en) * 2010-11-26 2011-05-25 哈尔滨工程大学 Method for identifying and predicting network multi-step attacks
CN103746961A (en) * 2013-12-12 2014-04-23 中国人民解放军63928部队 Method, apparatus and server for mining causal knowledge of network attack scenario
EP2975801A1 (en) * 2014-07-18 2016-01-20 Deutsche Telekom AG Method for detecting an attack in a computer network
CN106341414A (en) * 2016-09-30 2017-01-18 重庆邮电大学 Bayesian network-based multi-step attack security situation assessment method
CN111541661A (en) * 2020-04-15 2020-08-14 全球能源互联网研究院有限公司 Power information network attack scene reconstruction method and system based on causal knowledge
CN112148772A (en) * 2020-09-24 2020-12-29 创新奇智(成都)科技有限公司 Alarm root cause identification method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
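Research on an insider attack intention inference algorithm based on probabilistic attack graphs; Chen Xiaojun et al.; Chinese Journal of Computers (《计算机学报》); 2014-01-31; Vol. 37, No. 1; pp. 62-72 *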

Also Published As

Publication number Publication date
CN114978778A (en) 2022-08-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant