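WO2021077642A1 - Cyberspace security threat detection method and system based on heterogeneous graph embedding - Google Patents

Cyberspace security threat detection method and system based on heterogeneous graph embedding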

Info

Publication number
WO2021077642A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
heterogeneous graph
vectorized
node
data item
Prior art date
Application number
PCT/CN2020/072591
Other languages
English (en)
French (fr)
Inventor
文雨
刘福承
张东雪
张博洋
杨纯
杜莹莹
郑阳
孟丹
Original Assignee
中国科学院信息工程研究所
Priority date
Filing date
Publication date
Application filed by 中国科学院信息工程研究所
Publication of WO2021077642A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Definitions

  • This application relates to the field of computer technology, and in particular to a cyberspace security threat detection method and system based on heterogeneous graph embedding.
  • The system is always at risk of cyberspace security threats.
  • The main cyberspace security threats take two forms: insider attack threats and advanced persistent threats (APT attacks).
  • Insider attack threats usually come from malicious internal employees who have legitimate access to information systems and abuse their access rights to damage the confidentiality, integrity, or availability of those systems.
  • The threat actor behind an APT attack usually first penetrates a host in the target network and steals legitimate accounts and permissions; on that basis, it then invades more hosts and covertly and continuously steals confidential information through the internal network.
  • Sequence-based threat detection methods model user behaviors in order to discover abnormal behaviors.
  • The various operations of the user (i.e., data items) are arranged into a sequence, and sequence analysis techniques, such as deep neural networks, are used to learn sequence patterns from historical events and predict the next event. If the actual event deviates substantially from the predicted event, it is considered an abnormal event.
  • This type of method recognizes and models the normal behavior patterns of users, and judges user behaviors that deviate from those patterns as abnormal.
  • This type of detection method ignores other useful relationships between data items. For example, comparing a user's overall behavior across unit time windows (such as a day or a week) is a common approach to insider threat detection.
  • That approach rests on the premise that a user's behavior within a unit time window is relatively stable over a certain period.
  • The above-mentioned sequence-based threat detection method ignores this important relationship, which leads to unsatisfactory detection performance.
  • This type of method also requires known normal behavior data, or even a large amount of labeled log data, to train the model. In real scenarios, however, attack behavior is very rare compared with normal behavior, which limits the ability of such methods to predict behavior accurately.
  • Today's threat detection technologies for APT attacks mainly include threat detection based on login structure diagrams, in which abnormal hosts are discovered by analyzing the login behavior of entities.
  • Although this method can usually analyze specific interactions between hosts, it cannot detect the aforementioned insider attacks, which involve many other operations (such as file operations and website browsing).
  • The suspicious hosts discovered by such methods inevitably also contain many normal behaviors and operations, which requires substantial manual correction afterwards.
  • The embodiments of the present application provide a cyberspace security threat detection method and system based on heterogeneous graph embedding, to overcome the defects of the prior art, in which cyberspace security threat detection covers only a single kind of object, has low precision, and depends excessively on training with detection samples, and thereby to achieve effective detection of cyberspace security threats.
  • An embodiment of the present application provides a cyberspace security threat detection method based on heterogeneous graph embedding, including: obtaining entity behavior data; associating all data items in the entity behavior data according to the meta-attribute association relationship to obtain a data item sequence, and constructing a heterogeneous graph based on the data item sequence; converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method, to obtain the vectorized expression of each node; and analyzing and processing the features of the vectorized expression to determine whether the data item corresponding to the vectorized expression is a malicious operation behavior.
  • Associating all data items in the entity behavior data according to the meta-attribute association relationship to obtain the data item sequence, and constructing a heterogeneous graph based on the data item sequence, includes:
  • Associating data items in each type of entity behavior data according to the meta-attribute association relationship includes: associating the data items in each type of entity behavior data according to one or more of the causal and sequential relationships of entity behaviors within a unit time window between the meta-attributes, the similarity logic relationship between entity behaviors within a unit time window, and the similarity logic relationship between operation objects.
  • Setting a plurality of meta-attributes includes: setting at least two of the data subject, the operation object, the operation type, the operation time, and the object host as meta-attributes.
  • Before associating the data items in each type of entity behavior data according to the meta-attribute association relationship, the method also includes: determining the importance of each meta-attribute association relationship according to the cyberspace security threat scenario, and determining, according to that importance, the degree to which the data items in the entity behavior data are associated.
  • Converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method, and obtaining the vectorized expression of each node, includes: determining the node sequence of each node with a random-walk graph traversal algorithm, according to the weight and type of each edge of the heterogeneous graph; and calculating the vectorized expression of each node from its node sequence with the word2vec algorithm.
  • Analyzing and processing the features of the vectorized expression with the above-mentioned classification method, to determine whether the data item corresponding to the vectorized expression is a malicious operation behavior, includes:
  • The vectorized expression is analyzed with an anomaly detection algorithm. If an abnormal vectorized expression is found, the corresponding data item is a malicious behavior.
  • Analyzing the vectorized expression based on anomaly detection, and treating the data item corresponding to an abnormal vectorized expression as malicious behavior, includes: if the vectorized expression does not belong to the expected classification, it is abnormal; or, if the vectorized expression does not belong to any cluster or does not belong to the expected distribution, it is abnormal; or, if the number of items in the cluster to which the vectorized expression belongs is less than the abnormal threshold, all the vectorized expressions in that cluster are abnormal; or, if the number of vectorized expressions contained in the distribution to which the vectorized expression belongs is less than the abnormal threshold, all the vectorized expressions in that distribution are abnormal.
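  • As an illustration of two of the rules above (membership in no cluster, and membership in a cluster smaller than the abnormal threshold), the following is a minimal Python sketch. It assumes that cluster labels have already been produced by some clustering step; the function name `flag_abnormal`, the use of `None` for "belongs to no cluster", and the sample labels are hypothetical and are not taken from the application.

```python
from collections import Counter


def flag_abnormal(labels, abnormal_threshold):
    """Return one boolean per vectorized expression (True = abnormal).

    labels[i] is the cluster label of expression i, or None if it was
    not assigned to any cluster (an assumed convention for this sketch).
    """
    sizes = Counter(label for label in labels if label is not None)
    flags = []
    for label in labels:
        if label is None:
            # Rule: not in any cluster -> abnormal.
            flags.append(True)
        else:
            # Rule: cluster has fewer items than the threshold -> abnormal.
            flags.append(sizes[label] < abnormal_threshold)
    return flags


# Hypothetical labels: cluster 0 is large, clusters 1 and 2 are small.
example_labels = [0, 0, 0, 0, 1, 0, 2, 2, None]
example_flags = flag_abnormal(example_labels, abnormal_threshold=3)
```

  • In this sketch the threshold is a free parameter; how it is chosen for a given detection scenario is left open here.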
  • The entity behavior data includes user behavior data and software behavior data.
  • An embodiment of the present application provides a cyberspace security threat detection system based on heterogeneous graph embedding, including an entity behavior data reading unit, a heterogeneous graph construction unit, a graph embedding unit, and a detection operation unit, wherein: the entity behavior data reading unit is configured to obtain entity behavior data; the heterogeneous graph construction unit is configured to associate all the data items in the entity behavior data according to the meta-attribute association relationship, obtain the data item sequence, and construct the heterogeneous graph based on the data item sequence; the graph embedding unit is configured to convert each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node; and the detection operation unit is configured to analyze and process the features of the vectorized expression to determine whether the data item corresponding to the vectorized expression is a malicious operation behavior.
  • An embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the cyberspace security threat detection method based on heterogeneous graph embedding described in the first aspect.
  • An embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the cyberspace security threat detection method based on heterogeneous graph embedding described in the first aspect above.
  • The cyberspace security threat detection method and system based on heterogeneous graph embedding provided by the embodiments of the present application establish a heterogeneous graph for threat detection, streamline and vectorize entity behavior data items, and provide data item-level cyberspace security threat detection that requires neither manual correction nor labeled behavior data as training samples, which effectively improves the accuracy and comprehensiveness of detection.
  • Figure 1 is a schematic diagram of a sequence-based threat detection method in the prior art provided by this application and a schematic diagram of cyberspace security threat detection based on a login behavior structure diagram;
  • Figure 2 is a schematic flow diagram of a cyberspace security threat detection method based on heterogeneous graph embedding provided by this application;
  • FIG. 3 is a schematic flow diagram of another cyberspace security threat detection method based on heterogeneous graph embedding provided by the implementation of this application;
  • FIG. 4 is a schematic structural diagram of a cyberspace security threat detection system based on heterogeneous graph embedding provided by an embodiment of the application;
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • Figure 1 is a schematic diagram of the prior-art sequence-based threat detection method and of prior-art cyberspace security threat detection based on a login behavior structure diagram.
  • Figure 1(a) shows the entity behavior data acquired over any three days; for convenience of presentation, only the log data is shown in the figure.
  • Figure 1(b) shows the attribute fields of the data items in Figure 1(a).
  • Figure 1(c) shows the sequence-based threat detection method in the prior art.
  • Figure 1(d) is a schematic flow diagram of a cyberspace security threat detection method based on a login behavior structure diagram in the prior art.
  • Each data item in Figure 1(a) is encoded and arranged into a sequence in chronological order. Deep learning network models, such as long short-term memory (LSTM) models, are then used to learn the patterns and rules between events from past events and to predict the next events.
  • This method mainly learns from previous cyber threat events, that is, it analyzes and remembers the causal and sequential relationships between the data items in previous cyber threat events, and uses them to judge whether a currently occurring event is a threat. The sequence-based threat detection method therefore performs threat detection only on the basis of the causal and sequential relationships between data items, ignores other valuable association relationships between the data items, and cannot obtain high-precision detection results.
  • Sequence-based threat detection methods consider only the causal and sequential relationships between data items and do not take into account the interaction relationships between hosts, so they cannot be applied to the detection of advanced persistent threats (APT attacks), which limits them to a single kind of detection.
  • The traditional detection method based on machine learning models the user's behavior within a certain unit time window (such as one day or one week) and outputs the specific time period that contains the user's suspicious behavior.
  • These methods are coarse-grained, because the time period given by the detection results inevitably includes a large number of normal operations.
  • As shown in Figure 1(d), the cyberspace security threat detection method based on a login behavior structure diagram analyzes the interactions between hosts and finds abnormal login behaviors in order to detect APT attacks. For example, an administrator may regularly log in to a group of hosts for system maintenance, while ordinary users can only access the hosts for which they have access rights. For the login behavior of an APT attack, the number of hosts involved usually differs from that of normal login behavior, and this abnormal login can be captured from the login trace data.
  • The intruded host can be identified from the abnormal login behavior, and the operation records (data items) on the intruded host can then be manually extracted to analyze whether it is subject to network security threats.
  • The identified data items of the compromised host often also contain many normal operations, resulting in low detection accuracy.
  • Manual extraction of domain-specific features is not suitable for the insider attack threat shown in Figure 1(c).
  • An embodiment of the present application provides a cyberspace security threat detection method based on heterogeneous graph embedding, as shown in FIG. 2, including but not limited to the following steps:
  • Step S21: Obtain entity behavior data;
  • Step S22: Associate all the data items in the entity behavior data according to the meta-attribute association relationship, obtain a data item sequence, and construct a heterogeneous graph based on the data item sequence;
  • Step S23: Based on the graph embedding learning method, convert each node in the heterogeneous graph into a low-dimensional vector, and obtain the vectorized expression of each node;
  • Step S24: Analyze and process the features of the vectorized expression to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
  • The entity behavior data may be obtained from each monitored host in real time, to achieve real-time monitoring, or collected from each monitored host at regular intervals, to achieve post-event detection.
  • Each piece of entity behavior data is composed of multiple data items (a data item refers to any item of data in the entity behavior data), and each data item is described through multiple meta-attributes and the relationships among them.
  • In step S22 of the embodiment of the present application, a plurality of meta-attributes are first defined, and all the data items in the entity behavior data to be analyzed are associated, based on the association relationships between the meta-attributes, to form a data item sequence.
  • The association relationships between the meta-attributes can include causality, order relationships, logical relationships, and so on. It should be noted that, when the data item sequence is formed, the many different association relationships within a single meta-attribute cannot be ignored. For example, when the meta-attribute is the object host, a data subject involves multiple different object hosts, and there are necessarily different association relationships among those object hosts.
  • The association relationships between the meta-attributes and the association relationships within each meta-attribute are combined to create the data item sequence, and the constructed data item sequence is then mapped to construct the heterogeneous graph.
  • Taking Figure 1(a) as an example, the entity behavior data shows that on day 2 the administrator logged in to his computer, then remotely logged in to the server and opened a folder to view the status of the system.
  • The meta-attributes can be set to the two attributes of time and subject, and, according to the association relationship between the two attributes, the data items of the same user are associated in order of time to obtain the operation data sequence.
  • The operation data sequence is converted into a part of the heterogeneous graph (also called a heterogeneous graph subgraph), and the subgraphs are then connected to form the heterogeneous graph according to the association relationships between the operation data sequences.
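  • The construction just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the `DataItem` fields, the two edge type names (`same_day_sequence`, `cross_day`), and the choice to connect daily subgraphs through the first item of each day are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    item_id: int
    subject: str   # who performed the operation
    op_type: str   # e.g. "logon", "file_open"
    day: int
    minute: int    # minutes since midnight


def build_graph(items):
    """Return {(source_id, target_id): edge_type} for a small typed graph."""
    by_user_day = defaultdict(list)
    for item in items:
        by_user_day[(item.subject, item.day)].append(item)

    edges = {}
    first_of_day = defaultdict(dict)  # subject -> {day: id of first item}
    for (subject, day), seq in by_user_day.items():
        # Same user, same day: associate data items in order of time.
        seq.sort(key=lambda it: it.minute)
        first_of_day[subject][day] = seq[0].item_id
        for a, b in zip(seq, seq[1:]):
            edges[(a.item_id, b.item_id)] = "same_day_sequence"

    # Connect the daily subgraphs of one user (assumed linking rule).
    for days in first_of_day.values():
        ordered = sorted(days)
        for d1, d2 in zip(ordered, ordered[1:]):
            edges[(days[d1], days[d2])] = "cross_day"
    return edges


# Made-up log records for one administrator over two days.
sample_items = [
    DataItem(1, "admin", "logon", day=1, minute=540),
    DataItem(2, "admin", "file_open", day=1, minute=600),
    DataItem(3, "admin", "logon", day=2, minute=545),
    DataItem(4, "admin", "remote_logon", day=2, minute=550),
]
sample_edges = build_graph(sample_items)
```

  • Each data item becomes a node, and the edge type records which association rule produced the edge, so that later steps can treat different kinds of association differently.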
  • The graph embedding learning method provided in this embodiment may be a graph representation method based on machine learning, which is mainly used to convert each node (that is, each data item) in the heterogeneous graph constructed in step S22 into a low-dimensional vector, to obtain the vectorized expression of each node.
  • A heterogeneous graph is a graph whose nodes take different forms and whose nodes are connected by many different forms of relationships.
  • The main methods are as follows: first, mapping heterogeneous graphs to homogeneous graphs; second, using different encoders for different types of nodes; third, extending pairwise decoders with type-specific parameters; and fourth, extending random walks.
  • The above-mentioned heterogeneous graph may be transformed based on random walks (a walking operator) to obtain the vectorized expression of each node.
  • In step S24 of the cyberspace security threat detection method based on heterogeneous graph embedding provided by the embodiment of the present application, the vectorized expression is analyzed and processed to determine whether the corresponding data item is a malicious operation.
  • The above analysis and processing can be based on an unsupervised analysis method, such as a clustering algorithm. That is, the vectorized expressions are first divided into different clusters. Because each vectorized expression corresponds to one operation data item, clustering divides all operation data items into multiple different clusters. Finally, a threat judgment is made for each cluster, which completes the cyberspace security threat detection based on heterogeneous graph embedding.
  • The above classification method can also be based on supervised classification, such as analysis with a deep learning model. That is, after the learning model has been trained with classification labels, any vectorized expression is input into the model to obtain the score corresponding to that vectorized expression. A judgment threshold is then set, and if the score is lower (or higher) than the threshold, the data item corresponding to the vectorized expression is judged to be a malicious operation behavior.
  • The method of analyzing and processing the vectorized expression is not specifically limited.
  • The cyberspace security threat detection method based on heterogeneous graph embedding provided by the embodiments of the present application establishes a heterogeneous graph for threat detection, streamlines and vectorizes the representation of entity behavior data items, and provides data item-level cyberspace security threat detection that requires neither manual correction nor labeled behavior data as training samples, which effectively improves the accuracy and comprehensiveness of detection.
  • Associating all data items in the entity behavior data according to the meta-attribute association relationship, obtaining a data item sequence, and constructing a heterogeneous graph based on the data item sequence, includes:
  • Associating the data items in each piece of entity behavior data according to the meta-attribute association relationship includes: associating the data items according to one or more of the causal and sequential relationships of user operations within a unit time window between the meta-attributes, the similarity logic relationship between user operations within a unit time window, and the similarity logic relationship between the operation objects.
  • Setting multiple meta-attributes includes: setting at least two of the data subject, operation object, operation type, operation time, and object host as meta-attributes.
  • Each data item can be summarized as being constituted by any combination of multiple meta-attributes (including subject, object, operation type, time, and host).
  • The nodes can then completely map the association relationships between the data items onto the heterogeneous graph.
  • Rule A can be set to associate all data items of the same user in chronological order by using the two meta-attributes of subject and time.
  • Rule B can also be set to associate the data items by using the three meta-attributes of subject, time, and operation type (such as device access). Because the data item sequence acquired in this way contains only data items whose operation type is device access, it contains far fewer data items than the data item sequence acquired through rule A.
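  • The difference between rule A and rule B can be sketched in Python as follows. The log records, the tuple layout `(subject, time, op_type)`, and the function names `rule_a` and `rule_b` are made up for illustration; only the two rules themselves come from the text above.

```python
from collections import defaultdict

# Hypothetical log records: (subject, time, op_type).
LOG = [
    ("alice", 1, "logon"),
    ("alice", 2, "device_access"),
    ("alice", 3, "file_open"),
    ("alice", 4, "device_access"),
    ("bob", 1, "logon"),
    ("bob", 5, "device_access"),
]


def rule_a(log):
    """Rule A: per subject, all data items in chronological order."""
    sequences = defaultdict(list)
    for subject, time, op_type in sorted(log, key=lambda rec: rec[1]):
        sequences[subject].append((time, op_type))
    return dict(sequences)


def rule_b(log, op_type="device_access"):
    """Rule B: like rule A, restricted to a single operation type."""
    return rule_a([rec for rec in log if rec[2] == op_type])


seq_a = rule_a(LOG)
seq_b = rule_b(LOG)
count_a = sum(len(seq) for seq in seq_a.values())
count_b = sum(len(seq) for seq in seq_b.values())
```

  • With these sample records, rule B keeps only the device access items, so its sequences are shorter than those produced by rule A.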
  • Each node can be expressed differently according to the weight difference.
  • Data items can be converted into data item sequences (or subgraphs) to form a heterogeneous graph.
  • For the edges of a heterogeneous graph, because different associations have different effects in various detection scenarios, edge types can be used instead of weights to distinguish them. That is, each edge type of the heterogeneous graph corresponds to a rule that defines a specific kind of association relationship.
  • By setting multiple meta-attributes and using different combinations of them, the cyberspace security threat detection method based on heterogeneous graph embedding provided by the embodiment of the application can effectively reduce the number of operation data items in the acquired data item sequence while still accurately reflecting the actual association relationships, which effectively improves the efficiency and accuracy of detection.
  • Converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method, and obtaining the vectorized expression of each node, includes: determining the node sequence of each node with a random-walk graph traversal algorithm, according to the weight and type of each edge in the heterogeneous graph; and calculating the vectorized expression of each node from its node sequence with the word2vec algorithm.
  • The graph-based embedding learning method includes two sub-steps. One is to determine the node sequence of each node according to the weight and type of each edge in the heterogeneous graph; the other is to calculate the vectorized expression of each node from that node sequence.
  • The node sequence of each node can be determined with a random walk algorithm: assuming that a walking operator is located on some node in the graph, the operator decides which node to visit next based on the weight and type of each edge.
  • The path generated by this operator, that is, the sequence of nodes, is regarded as the context of the nodes on this path. For example, in Figure 1(a), when the walking operator is located in the data item sequence containing device access on day 1 or day 2, it is less likely to choose a node in the data item sequence containing device access on day 3 as the next node, because the day-3 sequence has a smaller association weight with the sequences of the other two days.
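  • A walk that chooses the next node according to edge weight and edge type can be sketched in Python as follows. This is a minimal sketch under assumptions: the adjacency layout, the edge type names, the per-type multiplier `type_weight`, and the sample weights are hypothetical, and the application does not specify how weight and type are combined.

```python
import random


def random_walk(graph, type_weight, start, length, rng):
    """Walk up to `length` nodes, sampling each step in proportion to
    edge weight multiplied by a per-edge-type factor (assumed scheme).

    graph maps node -> list of (neighbor, edge_type, weight).
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        scores = [w * type_weight.get(t, 1.0) for _, t, w in neighbors]
        if not neighbors or sum(scores) <= 0:
            break  # dead end: no usable outgoing edge
        nodes = [n for n, _, _ in neighbors]
        walk.append(rng.choices(nodes, weights=scores, k=1)[0])
    return walk


# Hypothetical graph: day-1 and day-2 sequences are strongly associated,
# the day-3 sequence only weakly.
GRAPH = {
    "day1": [("day2", "similar", 0.9), ("day3", "dissimilar", 0.1)],
    "day2": [("day1", "similar", 0.9), ("day3", "dissimilar", 0.1)],
    "day3": [("day1", "dissimilar", 0.1), ("day2", "dissimilar", 0.1)],
}

# Default type factors: day3 is reachable, but only with low probability.
walk_default = random_walk(GRAPH, {}, "day1", 10, random.Random(0))
# Suppressing one edge type entirely keeps the walk away from day3.
walk_filtered = random_walk(GRAPH, {"dissimilar": 0.0}, "day1", 10, random.Random(0))
```

  • Setting the factor of an edge type to zero removes that kind of association from the walk, which is one simple way to let a detection scenario emphasize some edge types over others.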
  • After the node sequences have been obtained, the word2vec model can be used to calculate the vectorized expression of each node from the paths on which it appears.
  • The data items on the first day and the second day are located in the same path, so they share similar vectorized expressions, whereas the data items of the third day are expressed as vectors that differ more strongly.
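  • To show why nodes on the same path end up with similar vectors, the following Python sketch generates the (center, context) training pairs that the skip-gram variant of word2vec is trained on. Whether the application uses skip-gram or another word2vec variant is not stated, so this is an assumption; the function name `skipgram_pairs` and the window size are also hypothetical, and the actual vector training (normally done with an existing word2vec library) is not shown.

```python
def skipgram_pairs(walks, window):
    """Return (center, context) pairs from node sequences.

    Every node within `window` positions of the center node on the
    same walk counts as its context.
    """
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


# Hypothetical walks: day-1 and day-2 items share a path, day-3 does not.
walks = [["d1_logon", "d2_logon", "d1_device"], ["d3_device"]]
pairs = skipgram_pairs(walks, window=1)
```

  • Nodes that co-occur in many pairs are pushed toward similar vectors during training, while a node that never shares a walk with the others, such as `d3_device` here, contributes no pairs with them.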
  • Analyzing and processing the features of the vectorized expression to determine whether the corresponding data item is a malicious operation behavior includes: analyzing the vectorized expression based on anomaly detection; if an abnormal vectorized expression is found, the corresponding data item is a malicious behavior.
  • Analyzing the vectorized expression based on anomaly detection, and treating the data item corresponding to an abnormal vectorized expression as malicious behavior, includes:
  • If the vectorized expression does not belong to any cluster or does not belong to the expected distribution, it is abnormal;
  • The above-mentioned method for analyzing the vectorized expression based on anomaly detection may also be: clustering the vectorized expressions with a clustering algorithm according to the features of each vectorized expression, obtaining multiple clusters, and judging whether a malicious operation behavior class exists among the clusters; if a malicious operation behavior class exists, the data item corresponding to each vectorized expression in that class is a malicious operation behavior.
  • The above-mentioned clustering algorithm may be the support vector clustering (SVC) method.
  • Determining whether a malicious operation behavior class exists among the clusters, to complete the cyberspace security threat detection, includes but is not limited to the following steps:
  • A threat judgment threshold can be set according to the required detection accuracy and compared with the number of items contained in each cluster. When no cluster contains fewer items than the threat judgment threshold, it is judged that there is no malicious operation behavior class, that is, the current network is judged to be safe.
  • When the number of items in a cluster is less than the threat judgment threshold, that cluster is a malicious operation behavior class.
  • The entity behavior data may include user behavior data and software process data, and may also include other operation data.
  • The software process data can be a process log, which mainly includes system calls (such as the creation and termination of sub-processes or threads), the various access operations on files, and inter-process communication.
  • The user behavior data may be data generated by the user's operation of the software.
  • For example, the entity behavior data may also include logging in to Alipay and other payment software, entering passwords, and reading consumption records, as well as reading and downloading friend resources in communication software such as WeChat and QQ. This embodiment does not specifically limit how the entity behavior data is obtained or its specific content.
  • the embodiment of the present application also provides a cyberspace security threat detection system based on heterogeneous graph embedding, as shown in FIG. 4, including but not limited to:
  • the entity behavior data reading unit 41, the heterogeneous graph construction unit 42, the graph embedding unit 43, and the detection operation unit 44, wherein:
  • the entity behavior data reading unit 41 is configured to obtain entity behavior data
  • the heterogeneous graph construction unit 42 is configured to associate all data items in the entity behavior data according to the meta-attribute association relationship, obtain a data item sequence, and construct a heterogeneous graph based on the data item sequence;
  • the graph embedding unit 43 is configured to convert each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method, and obtain the vectorized expression of each node;
  • the detection operation unit 44 is configured to analyze and process the features of the vectorized expression based on a classification method to determine whether the data item corresponding to the vectorized expression is a malicious operation behavior.
  • the cyberspace security threat detection system based on heterogeneous graph embedding provided by the embodiments of the present application builds a heterogeneous graph for threat detection, condenses entity behavior data items, and represents them as vectors. It provides data-item-level cyberspace security threat detection that requires neither later manual correction nor labeled operation data as training samples, which effectively improves the accuracy and comprehensiveness of detection.
  • the embodiment of this application uses two data sets.
  • One is a synthetic data set, the insider threat test data set of the CERT Center at Carnegie Mellon University in the United States (corresponding to insider attack threats); the other is a real data set, the comprehensive cyberspace security event data set of Los Alamos National Laboratory (LANL) in the United States (corresponding to advanced persistent threats).
  • the CERT data set is a comprehensive data set that contains complete user behavior records and attack scenarios, and the latest version r6.2 of the data set is used in this embodiment.
  • the data set contains a total of 135,117,169 operations performed by 4,000 users in 516 days.
  • the data set contains 5 attack scenarios and 470 malicious operations by 6 malicious users.
  • This data set shows the extreme data imbalance that is common in insider threat detection.
  • the above five types of insider threat scenarios are used to evaluate whether log2vec can determine the importance of each edge type according to different scenarios, and extract and express these association relationships differently.
  • the LANL data set contains more than 1 billion log data collected on 12,425 users and 17,684 computers in the LANL internal network for 58 days. It contains a typical APT attack scenario, that is, 749 malicious host logins using 98 stolen accounts. We used two data files about identity authentication and process to verify Log2vec's malicious operation detection effect. This data set can be used to evaluate whether log2vec can detect APT attack scenarios.
  • the combination of the above two data sets can be used to prove the effectiveness of log2vec in detecting user malicious operations (including insider threats and APT attacks), and can cover various attack scenarios.
  • TIRESIAS needs pre-labeled security events for training, but the CERT data set and the LANL data set are both imbalanced. As shown in Table 2, some users performed only 22, 18, or even 4 malicious operations, so there are not enough training samples of malicious operations.
  • Deep learning methods (DNN and LSTM) differ from TIRESIAS and DeepLog.
  • TIRESIAS takes a chronologically ordered sequence of data items as input,
  • whereas LSTM uses statistical features extracted from each day's log data to form the input sequence.
  • Although DNN and LSTM are inferior to log2vec in detection at data-item granularity, they detect better than TIRESIAS and DeepLog because they consider more association relationships (for example, the logical similarity relationship between sequences across days).
  • Hidden Markov models (markov-s and markov-c) are designed to identify suspicious dates when malicious events occur.
  • STREAMSPOT is designed to detect malicious information flow graphs. Table 1 shows that these methods also fail to match the detection performance of log2vec.
  • Metapath2vec and node2vec are advanced graph embedding models. Since they include neither graph construction nor detection algorithms, we use the same graph construction and detection methods as log2vec. Node2vec is designed for homogeneous graphs, so its detection performance is poor. Metapath2vec can handle heterogeneous graphs. In fact, the main difference between metapath2vec and log2vec in graph embedding is that log2vec can adjust the proportion of each edge type, while metapath2vec cannot. If all edge types are given the same proportion by default, the two methods perform similarly. However, insider threat detection requires different proportions for different edge types, so log2vec achieves better detection performance.
  • Log2vec-Euclidean and log2vec-cosine use k-means with Euclidean distance and cosine distance, respectively, to detect malicious events. However, their performance is not ideal.
  • Fig. 5 illustrates a schematic diagram of the physical structure of a server.
  • the server may include a processor 510, a communications interface 520, a memory 530, and a communications bus 540, where the processor 510, the communications interface 520, and the memory 530 communicate with each other through the communications bus 540.
  • the processor 510 may call the logic instructions in the memory 530 to perform the following method: obtaining entity behavior data; associating all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences; converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node; and analyzing the features of the vectorized expressions based on a classification method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
  • the above-mentioned logical instructions in the memory 530 can be implemented in the form of a software functional unit and when sold or used as an independent product, they can be stored in a computer readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
  • the embodiments of the present application also provide a non-transitory computer-readable storage medium on which a computer program is stored.
  • when executed by a processor, the computer program performs the detection method provided in the foregoing embodiments, for example: obtaining entity behavior data; associating all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences; converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node; and analyzing the features of the vectorized expressions based on a classification method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the embodiments. Those of ordinary skill in the art can understand and implement this without creative work.
  • each implementation manner can be implemented by software plus a necessary general hardware platform, and of course, it can also be implemented by hardware.
  • the above technical solution, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in some parts of an embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Discrete Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

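Embodiments of the present application provide a cyberspace security threat detection method and system based on heterogeneous graph embedding, including: obtaining entity behavior data; associating all data items in the entity behavior data according to meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences; converting each node in the heterogeneous graph into a low-dimensional vector based on a graph embedding learning method to obtain a vectorized expression of each node; and analyzing the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious behavior. By building a heterogeneous graph for threat detection, the embodiments condense entity behavior data items and represent them as vectors, providing data-item-level threat detection for cyberspace security that requires neither later manual correction nor labeled data items as training samples, which effectively improves the accuracy and feasibility of detection.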

Description

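Cyberspace security threat detection method and system based on heterogeneous graph embedding
Cross-Reference
This application refers to Chinese patent application No. 2019110196209, filed on October 24, 2019 and entitled "Cyberspace security threat detection method and system based on heterogeneous graph embedding", which is incorporated into this application by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to a cyberspace security threat detection method and system based on heterogeneous graph embedding.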
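Background
Modern information systems have become an important and irreplaceable part of today's enterprises and organizations. These systems are constantly exposed to cyberspace security threats, which fall mainly into two categories: insider attack threats and advanced persistent threats (APT attacks).
Insider attack threats usually come from malicious employees inside the organization, who have legitimate access to the information system and may use that access to damage its confidentiality, integrity, or availability. In an APT attack, the attacker usually first penetrates a host in the target network and steals legitimate accounts and privileges, then uses this foothold to covertly and persistently compromise more hosts through the internal network and steal confidential information. Both kinds of attack are regarded as major security threats to modern enterprises. However, the techniques for detecting and discovering them differ.
Insider attack threats are generally detected with sequence-based threat detection methods, which model user behavior and use the model to find anomalous behavior. A user's operations (i.e., data items) are typically converted into sequences based on the temporal or causal relationships among the data items. Sequence analysis techniques, such as deep neural networks, then learn sequence patterns from historical events and predict the next events. If the events that actually occur deviate substantially from the predicted events, they are considered anomalous.
In essence, such methods identify and model users' normal behavior patterns and judge user behavior that deviates from them as anomalous. However, they ignore other useful relationships among data items. For example, comparing a user's overall behavior across unit time windows (such as a day or a week) is a common insider threat detection approach, based on the premise that a user's behavior within a unit time window follows a relatively stable regularity over a certain period. Sequence-based threat detection methods ignore this important relationship, so their detection performance is unsatisfactory. In addition, such methods require known normal behavior data, and may even need large amounts of labeled log data to train the model. In real scenarios, however, attack behavior is very rare compared with normal behavior, which limits the ability of such methods to predict behavior accurately.
In addition, current threat detection techniques for APT attacks mainly include threat detection methods based on login structure graphs, which find anomalous hosts by analyzing entity login behavior. Although such methods can usually analyze specific interactions between hosts, they cannot detect the insider attacks mentioned above, which involve many other operations (such as file operations and website browsing). Moreover, the suspicious hosts they find inevitably also contain many normal behaviors and operations, so a large amount of later manual correction is needed.
In summary, cyberspace security threat detection currently faces the following three problems:
1) how to detect insider attack threats and APT attacks at the same time;
2) how to detect APT attacks at fine granularity, in particular by deeply mining and analyzing the association relationships among host data items;
3) how to achieve threat detection without relying on training with attack samples.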
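Summary
The embodiments of this application provide a cyberspace security threat detection method and system based on heterogeneous graph embedding, to overcome the defects of the prior art, namely that cyberspace security threat detection targets a single kind of threat, has low accuracy, and relies excessively on training with detection samples, and thereby to detect cyberspace security threats effectively.
In a first aspect, an embodiment of this application provides a cyberspace security threat detection method based on heterogeneous graph embedding, including: obtaining entity behavior data; associating all data items in the entity behavior data according to meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences; converting each node in the heterogeneous graph into a low-dimensional vector based on a graph embedding learning method to obtain the vectorized expression of each node; and analyzing the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
Further, associating all data items in the entity behavior data according to the meta-attribute association relationships, obtaining data item sequences, and constructing a heterogeneous graph based on the data item sequences includes:
setting multiple meta-attributes and associating the data items in each type of entity behavior data according to the meta-attribute association relationships to obtain data item sequences; and constructing the heterogeneous graph by mapping each data item to a node and the data item sequences to edge types.
Further, associating the data items in each type of entity behavior data according to the meta-attribute association relationships includes: associating the data items in each type of entity behavior data according to one or more of the following relationships among the meta-attributes: the causal and sequential relationships of entity behaviors within a unit time window, the logical similarity relationships between entity behaviors in unit time windows, and the logical similarity relationships between operation objects.
Further, setting multiple meta-attributes includes: setting at least two of data subject, operation object, operation type, operation time, and object host as the meta-attributes.
Further, before associating the data items in each type of entity behavior data according to the meta-attribute association relationships, the method also includes: determining the importance of each meta-attribute association relationship according to the cyberspace security threat scenario, and determining, according to that importance, the degree to which all data items in the entity behavior data are associated.
Further, converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node includes: determining the node sequence of each node based on a random walk graph traversal algorithm, according to the weight and type of each edge in the heterogeneous graph; and computing the vectorized expression of each node based on the word2vec algorithm, according to the node sequence of each node.
Further, analyzing the features of the vectorized expressions based on a classification method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior includes:
analyzing the vectorized expressions with an anomaly detection algorithm according to the features of each vectorized expression; if an anomalous vectorized expression is found, its corresponding data item is a malicious behavior.
Further, analyzing the vectorized expressions based on anomaly detection, such that the data item corresponding to an anomalous vectorized expression is a malicious behavior, includes: if a vectorized expression does not belong to the expected class, it is anomalous; or, if a vectorized expression does not belong to any cluster or to the expected distribution, it is anomalous; or, if the number of items in the cluster to which a vectorized expression belongs is less than an anomaly threshold, all vectorized expressions in that cluster are anomalous; or, if the distribution to which a vectorized expression belongs contains fewer vectorized expressions than an anomaly threshold, all vectorized expressions in that distribution are anomalous.
Further, the entity behavior data includes user behavior data and software behavior data.
In a second aspect, an embodiment of this application provides a cyberspace security threat detection system based on heterogeneous graph embedding, including an entity behavior data reading unit, a heterogeneous graph construction unit, a graph embedding unit, and a detection operation unit, wherein: the entity behavior data reading unit is configured to obtain entity behavior data; the heterogeneous graph construction unit is configured to associate all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and to construct a heterogeneous graph based on the data item sequences; the graph embedding unit is configured to convert each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method and obtain the vectorized expression of each node; and the detection operation unit is configured to analyze the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
In a third aspect, an embodiment of this application provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the cyberspace security threat detection method based on heterogeneous graph embedding described in the first aspect.
In a fourth aspect, an embodiment of this application provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the cyberspace security threat detection method based on heterogeneous graph embedding described in the first aspect.
The cyberspace security threat detection method and system based on heterogeneous graph embedding provided by the embodiments of this application build a heterogeneous graph for threat detection, condense entity behavior data items, and represent them as vectors. They provide data-item-level threat detection for cyberspace security that requires neither later manual correction nor labeled behavior data as training samples, which effectively improves the accuracy and comprehensiveness of detection.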
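Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a schematic diagram, provided by an embodiment of this application, of the prior-art sequence-based threat detection method and of prior-art cyberspace security threat detection based on a login behavior structure graph;
Fig. 2 is a schematic flowchart of the cyberspace security threat detection method based on heterogeneous graph embedding provided by an embodiment of this application;
Fig. 3 is a schematic flowchart of another cyberspace security threat detection method based on heterogeneous graph embedding provided by an embodiment of this application;
Fig. 4 is a schematic structural diagram of the cyberspace security threat detection system based on heterogeneous graph embedding provided by an embodiment of this application;
Fig. 5 is a schematic structural diagram of the electronic device provided by an embodiment of this application.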
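Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative work fall within the protection scope of this application.
Fig. 1 is a schematic diagram, provided by an embodiment of this application, of the prior-art sequence-based threat detection method and of prior-art cyberspace security threat detection based on a login behavior structure graph. Fig. 1(a) shows entity behavior data obtained for any three days; for convenience, only log data is shown. Fig. 1(b) shows the attribute fields of the data items in Fig. 1(a). Fig. 1(c) shows the prior-art sequence-based threat detection method. Fig. 1(d) is a schematic flowchart of the prior-art method of cyberspace security threat detection based on a login behavior structure graph.
As shown in Fig. 1(c), the sequence-based threat detection method encodes each data item in Fig. 1(a) and arranges the items into a sequence in chronological order. A deep learning network model, such as a long short-term memory (LSTM) model, then learns the patterns and regularities among events from past events and predicts the events that follow. This method judges the threat posed by current events mainly by learning from earlier network threat events, that is, by analyzing and memorizing the causal and sequential relationships among the data items in those events. The sequence-based threat detection method therefore relies only on the causal and sequential relationships among data items, ignores other valuable association relationships among them, and cannot produce highly accurate detection results.
First, because a user's daily behavior is usually relatively stable and similar, directly comparing the user's behavior across unit time windows can reveal whether a network threat has occurred. For example, in Fig. 1, day3 contains far more device-connection and file-copy operations than day1 and day2, so a data leak (a network attack) is more likely. A sequence-based threat detection method that uses a deep network model (such as an LSTM model) can capture long-term dependencies between events in the time series, but it cannot judge whether the current event is a network attack from changes in the macroscopic similarity of user behavior (for example, day1 and day2 do not contain large numbers of device-connection and file-copy operations), which results in low detection accuracy.
Second, the sequence-based threat detection method considers only the causal and sequential relationships among data items and not the interactions between hosts, so it is not suitable for detecting advanced persistent threats (APT attacks), which limits it to a single kind of detection.
Third, the deep network model (or other model) used in the sequence-based threat detection method needs a large amount of labeled behavior data for pre-training. In real scenarios, however, even known attack behavior is quite rare, so effective training samples are hard to obtain.
Finally, traditional machine-learning-based detection methods model a user's behavior within a unit time window (such as a day or a week) and output the specific time periods that contain suspicious user behavior. Because the time periods in the detection results inevitably contain a large number of normal operations, these are all coarse-grained detection methods.
Fig. 1(d) is a schematic flowchart of the prior-art method of cyberspace security threat detection based on a login behavior structure graph. As shown in Fig. 1(d), this method detects APT attacks by analyzing the interactions between hosts and finding anomalous login behavior. For example, an administrator may regularly log on to a group of hosts for system maintenance, while an ordinary user can access only the hosts for which he or she has access rights. The number of hosts involved in the login behavior of an APT attack usually differs from that of normal login behavior, and such anomalous logins can be captured from login trace data. Compromised hosts can therefore be identified from anomalous login behavior, after which the operation records (data items) on those hosts are extracted manually and analyzed to determine whether a network security threat has occurred. However, the data items of an identified compromised host often also include many normal operations, which results in low detection accuracy. Moreover, manually extracting domain-specific features is not suitable for the insider attack threat shown in Fig. 1(c).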
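To overcome the many defects of prior-art cyberspace security threat detection, an embodiment of this application provides a cyberspace security threat detection method based on heterogeneous graph embedding which, as shown in Fig. 2, includes but is not limited to the following steps:
Step S21: obtain entity behavior data;
Step S22: associate all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and construct a heterogeneous graph based on the data item sequences;
Step S23: convert each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node;
Step S24: analyze the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
Specifically, in step S21 the entity behavior data may be obtained in real time from each monitored host to enable real-time monitoring, or collected periodically from each monitored host to enable after-the-fact detection. The embodiments of this application do not specifically limit how the entity behavior data is obtained.
Further, each piece of entity behavior data consists of multiple data items (a data item is any single entry in the entity behavior data), and each data item is necessarily described through the association relationships among multiple meta-attributes. In step S22, multiple meta-attributes are first defined, and all data items in the entity behavior data to be analyzed are associated, based on the association relationships among the meta-attributes, to form data item sequences. The association relationships among the meta-attributes may include causal, sequential, and logical relationships. It should be noted that, when forming the data item sequences, the many association relationships that exist within each meta-attribute itself cannot be ignored. For example, when the object host is set as a meta-attribute, the data subject involves multiple different object hosts, and different association relationships necessarily exist among those hosts. In the embodiments of this application, the data item sequences are created by combining the association relationships among the meta-attributes with those within each meta-attribute, and the resulting data item sequences are then mapped to build the heterogeneous graph.
For example, in Fig. 1(a), the entity behavior data shows that on day2 the administrator logged on to his or her own computer, then remotely logged on to a server and opened a folder to check the system status. With the method of this embodiment, time and subject can be set as the meta-attributes. According to the association between these two attributes, the data items of the same user are associated in chronological order to obtain an operation data sequence. The operation data sequence is then converted into part of the heterogeneous graph (also called a heterogeneous graph subgraph), and the subgraphs are connected according to the association relationships among the operation data sequences to form the heterogeneous graph.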
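Further, in step S23, the graph embedding learning method provided in this embodiment may be a machine-learning-based graph representation method, used mainly to convert the nodes (i.e., the data items) of the heterogeneous graph constructed in step S22 into low-dimensional vectors and obtain the vectorized expression of each node.
There are many network representation methods for homogeneous graphs, but heterogeneous graphs are more widely used in practice. A heterogeneous graph is a graph whose nodes take different forms and whose relationships between nodes also take multiple forms. The main approaches are: first, mapping the heterogeneous graph to a homogeneous graph; second, using different types of encoders for different types of nodes; third, extending pairwise decoders with type-specific parameters; and fourth, extending random walks. In the embodiments of this application, the heterogeneous graph may be converted based on random walks (a walker) to obtain the vectorized expression of each node. This processing preserves the proximity between each node and the other nodes in its node sequence, so that a node (data item) and its neighboring nodes (closely related data items) share similar embeddings (i.e., vectorized expressions). As shown in Fig. 1(a), the data items (device-connection operations) of day1 and day2 share similar vectorized expressions, while the data items of day3 are expressed as vectors that differ considerably.
In step S24 of the method, the features of the vectorized expressions are analyzed to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
The analysis may be unsupervised, for example a clustering algorithm: the vectorized expressions are first divided into different clusters. Because each vectorized expression corresponds to one operation data item, clustering divides all operation data items into multiple clusters. Finally, each cluster is judged for threat, which completes the cyberspace security threat detection based on heterogeneous graph embedding.
The analysis may also be supervised classification, such as analysis with a deep learning model: after the learning model has been trained with prepared class labels, any vectorized expression is fed into the model to obtain a score for that vectorized expression. A decision threshold is then set; if the score is lower (or higher) than the threshold, the data item corresponding to that vectorized expression is judged to be a malicious operation behavior.
It should be noted that the embodiments of this application do not specifically limit how the vectorized expression corresponding to each data item is analyzed once it has been obtained.
The cyberspace security threat detection method based on heterogeneous graph embedding provided by the embodiments of this application builds a heterogeneous graph for threat detection, condenses entity behavior data items, and represents them as vectors. It provides data-item-level threat detection for cyberspace security that requires neither later manual correction nor labeled behavior data as training samples, which effectively improves the accuracy and comprehensiveness of detection.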
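Based on the above embodiments, as an optional embodiment, associating all data items in the entity behavior data according to the meta-attribute association relationships, obtaining data item sequences, and constructing a heterogeneous graph based on the data item sequences includes:
setting multiple meta-attributes and associating the data items in each piece of entity behavior data according to the meta-attribute association relationships to obtain data item sequences; and constructing the heterogeneous graph by mapping each data item to a node and the data item sequences to edge types.
Associating the data items in each piece of entity behavior data according to the meta-attribute association relationships includes: associating the data items according to one or more of the following relationships among the meta-attributes: the causal and sequential relationships of user operations within a unit time window, the logical similarity relationships between user operations in unit time windows, and the logical similarity relationships between operation objects.
Setting multiple meta-attributes includes: setting at least two of data subject, operation object, operation type, operation time, and object host as the meta-attributes.
Specifically, to reflect the association relationships among data items in the heterogeneous graph more precisely, each data item can be organized as a combination of multiple meta-attributes (any combination of subject, object, operation type, time, and host). When the heterogeneous graph is built from the meta-attribute association relationships, both the relationships within each meta-attribute and the relationships among meta-attributes can be considered, and suitable meta-attributes can be combined so that the association relationships among data items are mapped completely into the heterogeneous graph while associating as few nodes as possible.
For example, in Fig. 1(a), the data for the second day is: the administrator logs on to his or her own computer, then remotely logs on to a server and opens a file to check the system status. As an optional embodiment, rule A can be set to associate all data items of the same user in chronological order using the two meta-attributes subject and time. As another embodiment, rule B can be set to associate the data using the three meta-attributes subject, time, and operation type (such as device connection). Because the resulting data item sequence then contains only data items of a single operation type, device connection, it contains far fewer data items than the sequence obtained with rule A.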
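The two rules above can be illustrated with a minimal sketch. It is not the applicant's implementation: the log records, the field names (`subject`, `time`, `op`, `host`), and the helper names `build_sequences` and `sequences_to_edges` are all hypothetical, and every record is assumed to fall within one unit time window.

```python
from collections import defaultdict

# Hypothetical data items for a single day; field names are assumptions.
LOGS = [
    {"id": 1, "subject": "admin", "time": "2019-10-02T09:00", "op": "logon", "host": "pc-1"},
    {"id": 2, "subject": "admin", "time": "2019-10-02T09:05", "op": "logon", "host": "srv-1"},
    {"id": 3, "subject": "admin", "time": "2019-10-02T09:07", "op": "file_open", "host": "srv-1"},
    {"id": 4, "subject": "admin", "time": "2019-10-02T10:00", "op": "device_connect", "host": "pc-1"},
    {"id": 5, "subject": "bob", "time": "2019-10-02T09:01", "op": "device_connect", "host": "pc-2"},
    {"id": 6, "subject": "admin", "time": "2019-10-02T11:00", "op": "device_connect", "host": "pc-1"},
]


def build_sequences(items, key_fields):
    """Group data items by the given meta-attributes and order each group by time."""
    groups = defaultdict(list)
    for item in items:
        groups[tuple(item[f] for f in key_fields)].append(item)
    return {
        key: [it["id"] for it in sorted(group, key=lambda it: it["time"])]
        for key, group in groups.items()
    }


def sequences_to_edges(sequences, edge_type):
    """Link consecutive items of each sequence with an edge of the given type."""
    return [
        (a, b, edge_type)
        for seq in sequences.values()
        for a, b in zip(seq, seq[1:])
    ]


rule_a = build_sequences(LOGS, ["subject"])        # rule A: subject + time
rule_b = build_sequences(LOGS, ["subject", "op"])  # rule B: subject + time + operation type
edges = sequences_to_edges(rule_a, "rule_a") + sequences_to_edges(rule_b, "rule_b")
```

In this sketch the rule B sequence for the administrator's device connections holds two items, while the rule A sequence for the same user holds five, which mirrors the reduction in sequence length described above. Each rule contributes its own edge type to the graph.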
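Further, after a data item sequence has been generated for the entity behavior data in each unit time window, other rules are set to associate these sequences with one another according to their similarity. Because the day-3 sequence involves far more device-connection operations than the sequences of the other two days, its association weight with those two sequences is small. In graph embedding learning, each node can then be given a different vectorized expression according to this difference in weight.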
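The text does not state how the similarity between daily sequences is measured. As a purely illustrative assumption, the sketch below uses the ratio of the shorter sequence length to the longer one as the association weight; the function names and the per-day counts are hypothetical.

```python
def sequence_similarity(len_a, len_b):
    """Assumed similarity: ratio of the smaller operation count to the larger one."""
    longest = max(len_a, len_b)
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return min(len_a, len_b) / longest


def pairwise_weights(counts):
    """Return an association weight for every unordered pair of days."""
    days = sorted(counts)
    return {
        (a, b): sequence_similarity(counts[a], counts[b])
        for i, a in enumerate(days)
        for b in days[i + 1:]
    }


# Hypothetical numbers of device-connection operations per day.
daily_device_ops = {"day1": 2, "day2": 3, "day3": 30}
weights = pairwise_weights(daily_device_ops)
```

With these made-up counts, the day1/day2 pair receives a much larger weight than either pair involving day3, which is the qualitative behavior the paragraph above describes. Any other similarity measure with the same property could be substituted.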
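Further, in this embodiment, data items can be converted into data item sequences (or subgraphs) according to different graph construction rules, and these together form a heterogeneous graph. As for the edges of the heterogeneous graph, because different association relationships play different roles in different detection scenarios, they are distinguished by edge type rather than by weight; each edge type of the heterogeneous graph corresponds to a rule that defines one specific association relationship.
By setting multiple meta-attributes and using different combinations of them, the cyberspace security threat detection method based on heterogeneous graph embedding provided by the embodiments of this application effectively reduces the number of operation data items in the resulting data item sequences while still accurately reflecting the actual association relationships, which effectively improves detection efficiency and accuracy.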
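Based on the above embodiments, as an optional embodiment, converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node includes: determining the node sequence of each node based on a random walk graph traversal algorithm, according to the weight and type of each edge in the heterogeneous graph; and computing the vectorized expression of each node based on the word2vec algorithm, according to the node sequence of each node.
With reference to Fig. 3, in the embodiments of this application the graph embedding learning method includes two sub-steps: first, determining the node sequence of each node according to the weight and type of each edge in the heterogeneous graph; second, computing the vectorized expression of each node from those node sequences.
The node sequence of each node may be determined with a random walk graph traversal algorithm. Suppose a walker is located at some node of the graph; the walker decides which node to visit next according to the weight and type of each edge. The path generated by the walker, i.e., the node sequence, is regarded as the context of the nodes on that path. For example, in Fig. 1(a), when the walker is in the day-1 or day-2 data item sequence containing device connections, it is unlikely to choose a node in the day-3 data item sequence containing device connections as the next node, because the association weight between the day-3 sequence and the sequences of the other two days is small. Likewise, when the walker is at a node in the day-3 sequence, it is unlikely to choose a node in the day-1 or day-2 sequence. Therefore, in the embodiments of this application, the node sequence of each node is created either from paths that contain day-1 or day-2 nodes or from paths that contain only day-3 nodes.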
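The walker described above can be sketched as follows. This is a simplified illustration rather than the patented procedure: the adjacency format, the `type_proportion` table, and the rule of multiplying edge weight by a per-type proportion are assumptions made for the example.

```python
import random


def next_node(graph, node, type_proportion, rng):
    """Pick the next node with probability proportional to
    edge weight times the proportion assigned to the edge's type."""
    neighbors = graph.get(node, [])
    scores = [weight * type_proportion.get(edge_type, 0.0)
              for _, edge_type, weight in neighbors]
    if not neighbors or sum(scores) <= 0:
        return None  # dead end: no usable outgoing edge
    targets = [target for target, _, _ in neighbors]
    return rng.choices(targets, weights=scores, k=1)[0]


def random_walk(graph, start, length, type_proportion, rng):
    """Generate one node sequence (the context) starting from `start`."""
    walk = [start]
    while len(walk) < length:
        following = next_node(graph, walk[-1], type_proportion, rng)
        if following is None:
            break
        walk.append(following)
    return walk


# Hypothetical graph: node -> list of (neighbor, edge_type, weight).
GRAPH = {
    "day1": [("day2", "similarity", 0.67), ("day3", "similarity", 0.07)],
    "day2": [("day1", "similarity", 0.67), ("day3", "similarity", 0.10)],
    "day3": [("day1", "similarity", 0.07), ("day2", "similarity", 0.10)],
}
walk = random_walk(GRAPH, "day1", 10, {"similarity": 1.0}, random.Random(42))
```

Because the edges into `day3` carry small weights, walks that start at `day1` or `day2` mostly stay between those two nodes. The resulting node sequences would then be passed to a word2vec-style model, with each walk playing the role of a sentence.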
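Further, the vectorized expression of each node on a path can be computed with the word2vec model. For example, in the entity behavior data shown in Fig. 1(a), the data items of day 1 and day 2 (which include device-connection operations) lie on the same paths and therefore share similar vectorized expressions, while the data items of day 3 are expressed as vectors that differ considerably.
Based on the above embodiments, as an optional embodiment, analyzing the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior includes: analyzing the vectorized expressions based on anomaly detection; if an anomalous vectorized expression is found, its corresponding data item is a malicious behavior.
Further, analyzing the vectorized expressions based on anomaly detection, such that the data item corresponding to an anomalous vectorized expression is a malicious behavior, includes:
if a vectorized expression does not belong to the expected class, it is anomalous;
or, if a vectorized expression does not belong to any cluster or to the expected distribution, it is anomalous;
or, if the number of items in the cluster to which a vectorized expression belongs is less than an anomaly threshold, all vectorized expressions in that cluster are anomalous;
or, if the distribution to which a vectorized expression belongs contains fewer vectorized expressions than an anomaly threshold, all vectorized expressions in that distribution are anomalous.
The analysis of the vectorized expressions based on anomaly detection may also proceed as follows: according to the features of each vectorized expression, cluster the vectorized expressions with a clustering algorithm to obtain multiple clusters, and determine whether a malicious operation behavior class exists among the clusters; if it does, the data item corresponding to each vectorized expression in that class is a malicious operation behavior. The clustering algorithm may be the SVC vector clustering method. As an optional embodiment, determining whether a malicious operation behavior class exists among the clusters, so as to complete the cyberspace security threat detection, includes but is not limited to the following steps:
set a threat judgment threshold; if the number of items in every cluster is greater than the threat judgment threshold, judge that no malicious operation behavior class exists among the clusters; if the number of items in any cluster is less than the threat judgment threshold, judge that cluster to be a malicious operation behavior class.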
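The threshold rule above amounts to counting cluster sizes and flagging the small clusters. The sketch below assumes that clustering has already produced one label per data item; the function name `flag_small_clusters` and the sample labels are hypothetical.

```python
from collections import Counter


def flag_small_clusters(labels, threshold):
    """Return per-item malicious flags and the set of clusters
    whose item count is below the threat judgment threshold."""
    sizes = Counter(labels)
    malicious = {cluster for cluster, size in sizes.items() if size < threshold}
    flags = [label in malicious for label in labels]
    return flags, malicious


# Hypothetical cluster labels for seven data items.
labels = [0, 0, 0, 0, 1, 1, 2]
flags, malicious_clusters = flag_small_clusters(labels, threshold=3)
```

With a threshold of 3, clusters 1 and 2 are treated as malicious operation behavior classes and every data item in them is flagged. An empty `malicious_clusters` set corresponds to the case in which no malicious operation behavior class exists.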
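Specifically, a user's normal operations are more closely related to one another than they are to malicious operations; by the same reasoning, malicious operations are more closely related to one another than they are to normal operations, while normal and malicious operations have few or no association relationships between them. In the embodiments of this application, these association relationships are therefore associated and expressed differentially, so that the operations are divided into different clusters. In addition, because malicious operations are far fewer than normal operations, clusters with a small number of items are more likely to contain malicious operations.
In the embodiments of this application, a threat judgment threshold can be set according to the required detection accuracy and compared with the number of items in each cluster. When the number of items in every cluster is greater than the threat judgment threshold, it is judged that no malicious operation behavior class exists, that is, the current network is judged to be safe.
Further, when the number of items in a cluster is less than the threat judgment threshold, that cluster can be judged to be a malicious operation behavior class.
Further, in the embodiments of this application, the entity behavior data may include user behavior data and software process data, and may also include other operation data. The software process data may be process logs, which mainly include system calls (such as the creation and termination of child processes or threads), the various access operations on files, and inter-process communication. The user behavior data may be data generated by a user's operation of software. For example, the entity behavior data may also include logging in to payment software such as Alipay, entering passwords, and reading consumption records, as well as reading and downloading friend resources in communication software such as WeChat and QQ. This embodiment does not specifically limit how the entity behavior data is obtained or what it specifically contains.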
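The embodiments of this application also provide a cyberspace security threat detection system based on heterogeneous graph embedding which, as shown in Fig. 4, includes but is not limited to:
an entity behavior data reading unit 41, a heterogeneous graph construction unit 42, a graph embedding unit 43, and a detection operation unit 44, wherein:
the entity behavior data reading unit 41 is configured to obtain entity behavior data;
the heterogeneous graph construction unit 42 is configured to associate all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and to construct a heterogeneous graph based on the data item sequences;
the graph embedding unit 43 is configured to convert each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method and obtain the vectorized expression of each node;
the detection operation unit 44 is configured to analyze the features of the vectorized expressions based on a classification method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
The cyberspace security threat detection system based on heterogeneous graph embedding provided by the embodiments of this application builds a heterogeneous graph for threat detection, condenses entity behavior data items, and represents them as vectors. It provides data-item-level threat detection for cyberspace security that requires neither later manual correction nor labeled operation data as training samples, which effectively improves the accuracy and comprehensiveness of detection.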
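To demonstrate more fully the advantages, in actual detection, of the cyberspace security threat detection method and system based on heterogeneous graph embedding provided by the embodiments of this application, they are verified below on two different types of data sets, as follows:
1. Data sets
In the verification, the embodiments of this application use two data sets. One is a synthetic data set, the insider threat test data set of the CERT Center at Carnegie Mellon University in the United States (corresponding to insider attack threats); the other is a real data set, the comprehensive cyberspace security event data set of Los Alamos National Laboratory (LANL) in the United States (corresponding to advanced persistent threats).
The CERT data set is a comprehensive data set that contains complete user behavior records and attack scenarios; this embodiment uses its latest version, r6.2. We use five data files, which record user logon operations, removable storage device usage, file operations, network operations, and email traffic, together with one more file that records user roles and the departments to which users belong. The data set contains a total of 135,117,169 operations performed by 4,000 users over 516 days. It contains 5 attack scenarios and 470 malicious operations by 6 malicious users. This data set exhibits the extreme data imbalance that is common in insider threat detection. The five insider threat scenarios are used to evaluate whether log2vec can determine the importance of each edge type according to the scenario and extract and express these association relationships differentially.
The LANL data set contains more than 1 billion log entries collected over 58 days from 12,425 users and 17,684 computers in the LANL internal network. It contains a typical APT attack scenario, namely 749 malicious host logins using 98 stolen accounts. We use two data files, one on authentication and one on processes, to verify log2vec's detection of malicious operations. This data set can be used to evaluate whether log2vec can detect APT attack scenarios.
Combined, the two data sets can be used to demonstrate the effectiveness of log2vec in detecting malicious user operations (including insider threats and APT attacks) and cover a variety of attack scenarios.
2. Baseline methods
Throughout the verification, a total of 11 baseline methods are used on the CERT data set, including: TIRESIAS and DeepLog, which are anomaly detection methods at data-item granularity; hidden Markov models (markov-s and markov-c) and deep learning models (DNN and LSTM), which are the current state of the art on the CERT data set; STREAMSPOT, an advanced method for detecting malicious information flows; node2vec and metapath2vec, used to compare against the heterogeneous graph random walk of the system provided in this embodiment (the log2vec system); and log2vec-Euclidean and log2vec-cosine, used to show that the clustering method of this system handles the clustering problem faced in this embodiment better than ordinary k-means. An ensemble detection method and TIRESIAS are used to show the effectiveness of the log2vec system in detecting APT attacks on the LANL data set. In addition, a new version of log2vec, log2vec++, is introduced, whose parameters can be set flexibly for different users and attack types.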
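3. Experimental results
Table 1: Detection performance of the different methods
Figure PCTCN2020072591-appb-000001
Table 2: Detection results of log2vec for the 6 malicious users in the CERT data set and for 50 attackers in the LANL data set
Figure PCTCN2020072591-appb-000002
This embodiment uses AUC (the area under the ROC curve) to compare the experimental results of the different methods. Table 1 shows that log2vec outperforms the other baseline methods. TIRESIAS and DeepLog are the current state-of-the-art anomaly detection methods at data-item granularity, but on the CERT data set they use only the causal and sequential relationships and do not consider the other two kinds of association relationships, namely the logical similarity relationships between sequences and the logical similarity relationships between operation objects. They therefore do not achieve satisfactory detection performance (0.39, 0.10). A lack of sufficient malicious operation samples also affects their detection performance. For example, TIRESIAS needs pre-labeled security events for training, but the CERT data set and the LANL data set are both imbalanced. As shown in Table 2, some users performed only 22, 18, or even 4 malicious operations, so there are not enough training samples of malicious operations.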
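For readers unfamiliar with the metric, AUC can be computed as the probability that a randomly chosen malicious item receives a higher anomaly score than a randomly chosen benign item, with ties counted as one half. The sketch below implements that definition directly; the scores and labels are made up for illustration and are not results from the application.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (malicious, benign) pairs in which the
    malicious item has the higher score; a tie counts as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one malicious and one benign item")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical anomaly scores; label 1 marks a malicious operation.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Here three of the four malicious/benign pairs are ordered correctly, so `example_auc` is 0.75. A value of 0.5 corresponds to random ranking, and values below 0.5 mean that malicious items tend to be ranked below benign ones.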
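Deep learning methods (DNN and LSTM) differ from TIRESIAS and DeepLog. Specifically, TIRESIAS takes a chronologically ordered sequence of data items as input, whereas LSTM uses statistical features extracted from each day's log data to form the input sequence. Although DNN and LSTM are inferior to log2vec in detection at data-item granularity, they detect better than TIRESIAS and DeepLog because they consider more association relationships (for example, the logical similarity relationship between sequences across days). The hidden Markov models (markov-s and markov-c) are designed to identify suspicious dates on which malicious events occur. STREAMSPOT is designed to detect malicious information flow graphs. Table 1 shows that these methods also fail to match the detection performance of log2vec.
Metapath2vec and node2vec are advanced graph embedding models. Since they include neither graph construction nor detection algorithms, we use the same graph construction and detection methods as log2vec. Node2vec is designed for homogeneous graphs, so its detection performance is poor. Metapath2vec can handle heterogeneous graphs. In fact, the main difference between metapath2vec and log2vec in graph embedding is that log2vec can adjust the proportion of each edge type, while metapath2vec cannot. If all edge types are given the same proportion by default, the two methods perform similarly. However, insider threat detection requires different proportions for different edge types, so log2vec achieves better detection performance.
Log2vec-Euclidean and log2vec-cosine use k-means with Euclidean distance and cosine distance, respectively, to detect malicious events. However, their performance is not ideal.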
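Fig. 5 is a schematic diagram of the physical structure of a server. As shown in Fig. 5, the server may include a processor 510, a communications interface 520, a memory 530, and a communications bus 540, where the processor 510, the communications interface 520, and the memory 530 communicate with each other through the communications bus 540. The processor 510 may call the logic instructions in the memory 530 to perform the following method: obtaining entity behavior data; associating all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences; converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node; and analyzing the features of the vectorized expressions based on a classification method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
In addition, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
The embodiments of this application also provide a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the detection method provided in the foregoing embodiments, for example: obtaining entity behavior data; associating all data items in the entity behavior data according to the meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences; converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node; and analyzing the features of the vectorized expressions based on a classification method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the embodiments. Those of ordinary skill in the art can understand and implement this without creative work.
From the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in some parts of an embodiment.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.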

Claims (12)

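  1. A cyberspace security threat detection method based on heterogeneous graph embedding, characterized by comprising:
    obtaining entity behavior data;
    associating all data items in the entity behavior data according to meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences;
    converting each node in the heterogeneous graph into a low-dimensional vector based on a graph embedding learning method to obtain a vectorized expression of each node;
    analyzing the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious behavior.
  2. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 1, characterized in that associating all data items in the entity behavior data according to meta-attribute association relationships to obtain data item sequences, and constructing a heterogeneous graph based on the data item sequences, comprises:
    setting a plurality of the meta-attributes, and associating the data items in each type of the entity behavior data according to the meta-attribute association relationships to obtain the data item sequences;
    constructing the heterogeneous graph by mapping each data item to a node and the data item sequences to edge types.
  3. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 2, characterized in that associating the data items in each type of the entity behavior data according to the meta-attribute association relationships comprises:
    associating the data items in each type of the entity behavior data according to one or more of the following relationships among the meta-attributes: the causal and sequential relationships of entity behaviors within a unit time window, the logical similarity relationships between entity behaviors in unit time windows, and the logical similarity relationships between operation objects.
  4. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 2, characterized in that setting a plurality of the meta-attributes comprises:
    setting at least two of data subject, operation object, operation type, operation time, and object host as the meta-attributes.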
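  5. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 2, characterized in that, before associating the data items in each type of the entity behavior data according to the meta-attribute association relationships, the method further comprises:
    determining the importance of each meta-attribute association relationship according to the cyberspace security threat scenario, and determining, according to the importance, the degree to which all data items in the entity behavior data are associated.
  6. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 1, characterized in that converting each node in the heterogeneous graph into a low-dimensional vector based on the graph embedding learning method to obtain the vectorized expression of each node comprises:
    determining the node sequence of each node based on a random walk graph traversal algorithm, according to the weight and type of each edge in the heterogeneous graph;
    computing the vectorized expression of each node based on the word2vec algorithm, according to the node sequence of each node.
  7. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 1, characterized in that analyzing the features of the vectorized expressions to determine whether the data item corresponding to each vectorized expression is a malicious behavior comprises:
    analyzing the vectorized expressions based on anomaly detection according to the features of each vectorized expression, wherein, if an anomalous vectorized expression is found, its corresponding data item is the malicious behavior.
  8. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 7, characterized in that analyzing the vectorized expressions based on anomaly detection, wherein the data item corresponding to an anomalous vectorized expression is the malicious behavior, comprises:
    if the vectorized expression does not belong to the expected class, it is anomalous;
    or, if the vectorized expression does not belong to any cluster or to the expected distribution, it is anomalous;
    or, if the number of items in the cluster to which the vectorized expression belongs is less than an anomaly threshold, all vectorized expressions in that cluster are anomalous;
    or, if the distribution to which the vectorized expression belongs contains fewer vectorized expressions than an anomaly threshold, all vectorized expressions in that distribution are anomalous.
  9. The cyberspace security threat detection method based on heterogeneous graph embedding according to claim 1, characterized in that the entity behavior data comprises user behavior data and software behavior data.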
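  10. A cyberspace security threat detection system based on heterogeneous graph embedding, characterized by comprising:
    an entity behavior data reading unit, a heterogeneous graph construction unit, a graph embedding unit, and a detection operation unit;
    the entity behavior data reading unit is configured to obtain entity behavior data;
    the heterogeneous graph construction unit is configured to associate all data items in the entity behavior data according to meta-attribute association relationships to obtain data item sequences, and to construct a heterogeneous graph based on the data item sequences;
    the graph embedding unit is configured to convert each node in the heterogeneous graph into a low-dimensional vector based on a graph embedding learning method and obtain the vectorized expression of each node;
    the detection operation unit is configured to analyze the features of the vectorized expressions based on an anomaly detection method to determine whether the data item corresponding to each vectorized expression is a malicious operation behavior.
  11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the cyberspace security threat detection method based on heterogeneous graph embedding according to any one of claims 1 to 9.
  12. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the cyberspace security threat detection method based on heterogeneous graph embedding according to any one of claims 1 to 9.
PCT/CN2020/072591 2019-10-24 2020-01-17 Cyberspace security threat detection method and system based on heterogeneous graph embedding WO2021077642A1 (zh)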

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911019620.9A CN110958220B (zh) 2019-10-24 2019-10-24 Cyberspace security threat detection method and system based on heterogeneous graph embedding
CN201911019620.9 2019-10-24

Publications (1)

Publication Number Publication Date
WO2021077642A1 true WO2021077642A1 (zh) 2021-04-29

Family

ID=69975694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/072591 WO2021077642A1 (zh) 2019-10-24 2020-01-17 Cyberspace security threat detection method and system based on heterogeneous graph embedding

Country Status (2)

Country Link
CN (1) CN110958220B (zh)
WO (1) WO2021077642A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
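CN113190844A (zh) * 2021-05-20 2021-07-30 深信服科技股份有限公司 Detection method, related method, and related apparatus
CN114024736A (zh) * 2021-11-02 2022-02-08 北京丁牛科技有限公司 Threat source correlation identification and processing method and apparatus, electronic device, and storage medium
CN115859279A (zh) * 2023-03-01 2023-03-28 北京微步在线科技有限公司 Host behavior detection method and apparatus, electronic device, and storage medium
CN116781430A (zh) * 2023-08-24 2023-09-19 克拉玛依市燃气有限责任公司 Network information security system for gas pipeline networks and method thereof
WO2024114827A1 (zh) * 2022-12-01 2024-06-06 南京南瑞信息通信科技有限公司 APT detection method and system based on a continuous-time dynamic heterogeneous graph neural network
WO2024113927A1 (zh) * 2022-12-01 2024-06-06 南京南瑞信息通信科技有限公司 Method and system for detecting complex multi-step attacks on power systems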

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
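CN111967011B (zh) * 2020-07-10 2022-10-14 电子科技大学 Interpretability-based insider threat assessment method
CN113971285A (zh) * 2020-07-24 2022-01-25 深信服科技股份有限公司 Terminal malicious process identification method, apparatus, device, and readable storage medium
CN112328801B (zh) * 2020-09-28 2022-06-14 西南电子技术研究所(中国电子科技集团公司第十研究所) Method for predicting mass incidents with an event knowledge graph
CN112487421B (zh) * 2020-10-26 2024-06-11 中国科学院信息工程研究所 Android malicious application detection method and system based on heterogeneous networks
CN112257066B (zh) * 2020-10-30 2021-09-07 广州大学 Malicious behavior identification method, system, and storage medium for weighted heterogeneous graphs
CN112333195B (zh) * 2020-11-10 2021-11-30 西安电子科技大学 APT attack scenario reconstruction and detection method and system based on multi-source log correlation analysis
CN112291260A (zh) * 2020-11-12 2021-01-29 福建奇点时空数字科技有限公司 Method for identifying concealed targets of network security threats in APT attacks
CN112291272B (zh) * 2020-12-24 2021-05-11 鹏城实验室 Network threat detection method, apparatus, device, and computer-readable storage medium
CN113206855B (zh) * 2021-05-10 2022-10-28 中国工商银行股份有限公司 Data access anomaly detection method and apparatus, electronic device, and storage medium
CN113162951B (zh) * 2021-05-20 2023-05-12 深信服科技股份有限公司 Threat detection and model generation methods and apparatus, electronic device, and storage medium
CN113689288B (zh) * 2021-08-25 2024-05-14 深圳前海微众银行股份有限公司 Risk identification method, apparatus, device, and storage medium based on entity lists
CN114329455B (zh) * 2022-03-08 2022-07-29 北京大学 User abnormal behavior detection method and apparatus based on heterogeneous graph embedding
CN115118451B (zh) * 2022-05-17 2023-09-08 北京理工大学 Network intrusion detection method combining graph embedding and knowledge modeling
CN114900364B (zh) * 2022-05-18 2024-03-08 桂林电子科技大学 Advanced persistent threat detection method based on provenance graphs and heterogeneous graph neural networks
CN115037532B (zh) * 2022-05-27 2023-03-24 中国科学院信息工程研究所 Malicious domain name detection method based on heterogeneous graphs, electronic apparatus, and storage medium
CN115982646B (zh) * 2023-03-20 2023-07-18 西安弘捷电子技术有限公司 Cloud-platform-based method and system for managing multi-source test data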

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032724A1 (en) * 2015-04-16 2018-02-01 Nec Laboratories America, Inc. Graph-based attack chain discovery in enterprise security systems
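CN109558494A (zh) * 2018-10-29 2019-04-02 中国科学院计算机网络信息中心 Scholar name disambiguation method based on heterogeneous network embedding
CN110138763A (zh) * 2019-05-09 2019-08-16 中国科学院信息工程研究所 Insider threat detection system and method based on dynamic web browsing behavior
CN110189167A (zh) * 2019-05-20 2019-08-30 华南理工大学 Mobile advertising fraud detection method based on heterogeneous graph embedding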

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
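CN105516127B (zh) * 2015-12-07 2019-01-25 中国科学院信息工程研究所 User cross-domain behavior pattern mining method for insider threat detection
KR20180089212A (ko) * 2017-01-31 2018-08-08 주식회사 퓨쳐시스템 Apparatus and method for integrated management of heterogeneous devices using instances of vendor classes
CN107036594A (zh) * 2017-05-07 2017-08-11 郑州大学 Positioning and multi-granularity environment perception technology for intelligent power station inspection agents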
US10841302B2 (en) * 2017-05-24 2020-11-17 Lg Electronics Inc. Method and apparatus for authenticating UE between heterogeneous networks in wireless communication system
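CN109495513B (zh) * 2018-12-29 2021-06-01 极客信安(北京)科技有限公司 Unsupervised encrypted malicious traffic detection method, apparatus, device, and medium
CN110223106B (zh) * 2019-05-20 2021-09-21 华南理工大学 Deep-learning-based fraudulent application detection method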

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUO YIKE, FAROOQ FAISAL, FAN YUJIE, HOU SHIFU, ZHANG YIMING, YE YANFANG, ABDULHAYOGLU MELIH: "Gotcha - Sly Malware! : Scorpion A Metagraph2vec Based Malware Detection System", PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING KDD 18, ACM PRESS, NEW YORK, NEW YOR, US, 19 August 2018 (2018-08-19), US, pages 253 - 262, XP055805704, ISBN: 978-1-4503-5552-0, DOI: 10.1145/3219819.3219862 *
LIU FUCHENG LIUFUCHENG@IIE.AC.CN; WEN YU WENYU@IIE.AC.CN; ZHANG DONGXUE ZHANGDONGXUE@IIE.AC.CN; JIANG XIHE JIANGXIHE@IIE.AC.CN; XI: "Log2vec A Heterogeneous Graph Embedding Based Approach for Detecting Cyber Threats within Enterprise", COMPUTER AND COMMUNICATIONS SECURITY, ACM, 2 PENN PLAZA, SUITE 701NEW YORKNY10121-0701USA, 6 November 2019 (2019-11-06) - 15 November 2019 (2019-11-15), 2 Penn Plaza, Suite 701New YorkNY10121-0701USA, pages 1777 - 1794, XP058448653, ISBN: 978-1-4503-6747-9, DOI: 10.1145/3319535.3363224 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
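CN113190844A (zh) * 2021-05-20 2021-07-30 深信服科技股份有限公司 Detection method, related method, and related apparatus
CN113190844B (zh) * 2021-05-20 2024-05-28 深信服科技股份有限公司 Detection method, related method, and related apparatus
CN114024736A (zh) * 2021-11-02 2022-02-08 北京丁牛科技有限公司 Threat source correlation identification and processing method and apparatus, electronic device, and storage medium
CN114024736B (zh) * 2021-11-02 2024-04-12 丁牛信息安全科技(江苏)有限公司 Threat source correlation identification and processing method and apparatus, electronic device, and storage medium
WO2024114827A1 (zh) * 2022-12-01 2024-06-06 南京南瑞信息通信科技有限公司 APT detection method and system based on a continuous-time dynamic heterogeneous graph neural network
WO2024113927A1 (zh) * 2022-12-01 2024-06-06 南京南瑞信息通信科技有限公司 Method and system for detecting complex multi-step attacks on power systems
CN115859279A (zh) * 2023-03-01 2023-03-28 北京微步在线科技有限公司 Host behavior detection method and apparatus, electronic device, and storage medium
CN116781430A (zh) * 2023-08-24 2023-09-19 克拉玛依市燃气有限责任公司 Network information security system for gas pipeline networks and method thereof
CN116781430B (zh) * 2023-08-24 2023-12-01 克拉玛依市燃气有限责任公司 Network information security system for gas pipeline networks and method thereof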

Also Published As

Publication number Publication date
CN110958220A (zh) 2020-04-03
CN110958220B (zh) 2020-12-29

Similar Documents

Publication Publication Date Title
WO2021077642A1 (zh) Cyberspace security threat detection method and system based on heterogeneous graph embedding
Li et al. Significant permission identification for machine-learning-based android malware detection
US10686829B2 (en) Identifying changes in use of user credentials
US11558408B2 (en) Anomaly detection based on evaluation of user behavior using multi-context machine learning
Siadati et al. Detecting structurally anomalous logins within enterprise networks
Jia et al. Big-data analysis of multi-source logs for anomaly detection on network-based system
Syarif Feature selection of network intrusion data using genetic algorithm and particle swarm optimization
Vidal et al. Online masquerade detection resistant to mimicry
WO2021155344A1 (en) Aggregation and flow propagation of elements of cyber-risk in an enterprise
Srivastava et al. Big data analytics technique in cyber security: a review
Chen et al. Advanced persistent threat organization identification based on software gene of malware
Harbola et al. Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set
Datta et al. Real-time threat detection in ueba using unsupervised learning algorithms
KR102437278B1 (ko) Apparatus and method for detecting document-type malware by combining machine learning and signature matching
Bahl et al. Improving classification accuracy of intrusion detection system using feature subset selection
KR102311997B1 (ko) EDR apparatus and method based on artificial-intelligence behavior analysis
Wang et al. Combating Advanced Persistent Threats: Challenges and Solutions
AL-Maliki et al. Comparison study for NLP using machine learning techniques to detecting SQL injection vulnerabilities
Wei et al. Age: authentication graph embedding for detecting anomalous login activities
Muhammad et al. Device-type profiling for network access control systems using clustering-based multivariate Gaussian outlier score
Sapegin et al. Evaluation of in‐memory storage engine for machine learning analysis of security events
Wang et al. Network attack detection based on domain attack behavior analysis
Yang et al. [Retracted] Computer User Behavior Anomaly Detection Based on K‐Means Algorithm
CN111917801A (zh) Petri-net-based user behavior authentication method in a private cloud environment
Alguliev et al. Illegal access detection in the cloud computing environment

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20878832; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 20878832; Country of ref document: EP; Kind code of ref document: A1