WO2024124640A1 - Node analysis method and apparatus based on a threat analysis graph - Google Patents

Info

Publication number
WO2024124640A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
target
data
graph
representation
Prior art date
Application number
PCT/CN2022/144095
Other languages
English (en)
Chinese (zh)
Inventor
刘浩然
王占一
吴萌
黄朝文
白敏
汪列军
Original Assignee
奇安信科技集团股份有限公司
奇安信网神信息技术(北京)股份有限公司
Priority date
Filing date
Publication date
Application filed by 奇安信科技集团股份有限公司, 奇安信网神信息技术(北京)股份有限公司
Publication of WO2024124640A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • the present application relates to the field of network security technology, and in particular to a node analysis method and device based on a threat analysis graph.
  • APT: Advanced Persistent Threat
  • log data is usually obtained by detecting the network layer, and the log data is analyzed to obtain threat intelligence from the massive log data.
  • the embodiments of the present application provide a node analysis method and device based on a threat analysis graph.
  • an embodiment of the present application provides a node analysis method based on a threat analysis graph, comprising:
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • extracting target subgraph data associated with the seed node from the threat analysis graph stored in the graph database includes:
  • the target association data includes node data and edge data;
  • the node data of the seed node and the target associated data are combined to obtain the target subgraph data.
  • determining the node representation of the target node in the target subgraph data includes:
  • the node representation of the target node is determined based on the graph embedding vector of each node.
  • the determining of the graph embedding vector of each node in the target subgraph data includes:
  • when the current business scenario is one of searching for structurally similar nodes, determining a graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm;
  • when the current business scenario is one of searching for content-similar nodes, determining a graph embedding vector of each node in the target subgraph data based on a content similarity algorithm.
  • determining the node representation of the target node based on the graph embedding vector of each node includes:
  • the target graph neural network model is obtained by training based on graph embedding vector samples of multiple nodes.
  • the target graph neural network model includes an acquisition module and an aggregation module
  • the step of inputting the graph embedding vector of each of the nodes into the target graph neural network model to obtain the node representation of the target node output by the target graph neural network model includes:
  • the node aggregation information is determined as a node representation of the target node.
  • the analyzing the target node based on the node representation of the target node includes:
  • the node data of the target node is stored in an indicator-of-compromise (IOC) graph database, or an alarm is raised for the target node, or the node data of the target node and the node data of the associated nodes of the target node are displayed.
  • the analyzing the target node based on the node representation of the target node includes:
  • the node representation of the target node is compared and analyzed with the node representations of other nodes to determine nodes similar to the target node.
  • the embodiment of the present application further provides a node analysis device based on a threat analysis graph, including:
  • a first extraction unit is used to extract target data from the source data and use the target data as a seed node; the target data is data with security risks;
  • a second extraction unit configured to extract target subgraph data associated with the seed node from the threat analysis graph stored in the graph database
  • a determination unit configured to determine a node representation of a target node in the target subgraph data; the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • An analyzing unit is used to analyze the target node based on the node representation of the target node.
  • an embodiment of the present application further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the steps of the node analysis method based on the threat analysis graph as described in the first aspect are implemented.
  • an embodiment of the present application further provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the node analysis method based on the threat analysis graph as described in the first aspect.
  • an embodiment of the present application further provides a computer program product having executable instructions stored thereon, which, when executed by a processor, enables the processor to implement the steps of the node analysis method based on the threat analysis graph described in the first aspect.
  • the node analysis method and device based on the threat analysis graph use the target data with security risks, extracted from the source data, as the seed node; extract the target subgraph data associated with the seed node from the threat analysis graph; determine the node representation of the target node in the target subgraph data, where the node representation includes the node data of the target node and the node data of its neighboring nodes; and finally perform a correlation analysis on the target node based on that node representation. Because the node representation is determined only within the target subgraph data associated with the seed node, there is no need to compute over and analyze all the graph data in the threat analysis graph, which improves the efficiency of data analysis.
  • FIG1 is a schematic diagram of a process flow of a node analysis method based on a threat analysis graph provided in an embodiment of the present application
  • FIG2 is a schematic diagram of target subgraph data extraction provided by an embodiment of the present application.
  • FIG3 is a second flow chart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application
  • FIG4 is a schematic diagram of the structure of an initial autoencoder model provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of converting target subgraph data into a node representation of a target node according to an embodiment of the present application
  • FIG6 is a schematic diagram of the structure of a node analysis system based on a threat analysis graph according to an embodiment of the present application
  • FIG7 is a schematic diagram of the structure of a node analysis device based on a threat analysis graph provided in an embodiment of the present application;
  • FIG8 is a schematic diagram of the physical structure of an electronic device provided in an embodiment of the present application.
  • FIG. 1 is a flowchart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application. As shown in FIG. 1 , the node analysis method based on a threat analysis graph includes the following steps:
  • Step 101: Extract target data from source data and use the target data as a seed node; the target data is data with security risks.
  • the source data can be data in the sandbox, crawler data, or Indicator of Compromise (IOC) data, etc.
  • IOC: Indicator of Compromise
  • An IOC is a type of threat intelligence, i.e., intelligence about the remote command-and-control servers that an attacker uses to control victim hosts.
  • An IOC usually includes a domain name, Internet Protocol (IP) address, uniform resource locator (URL), Secure Sockets Layer (SSL) certificate, hash value, etc.
  • IP: Internet Protocol
  • URL: uniform resource locator
  • SSL: Secure Sockets Layer
  • massive amounts of data are generated every day, including but not limited to network behavior data generated by malicious samples running in sandboxes, Internet threat risk data crawled by web crawlers, threat intelligence data in open source security reports, etc.
  • Multiple source data are collected regularly, and target data with security risks such as domains, URLs, and IP addresses are extracted from the source data as seed nodes.
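  • As an illustration of Step 101, the following is a minimal sketch of pulling candidate seed nodes (URLs, IP addresses, domain names) out of raw source text. The function name `extract_seed_nodes` and the regular expressions are assumptions made for this example; they are deliberately simple and are not the extraction rules used in the application.

```python
import re

# Hypothetical, deliberately loose patterns; production IOC extraction needs stricter validation.
URL_RE = re.compile(r'https?://[^\s<>"]+')
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


def extract_seed_nodes(text):
    """Return a sorted list of (label, value) seed candidates found in raw text."""
    seeds = set()
    for url in URL_RE.findall(text):
        seeds.add(("url", url.rstrip(".,;")))
    # Remove URLs first so that path fragments such as "payload.bin"
    # are not mistaken for domain names.
    remainder = URL_RE.sub(" ", text)
    for ip in IP_RE.findall(remainder):
        # Keep only dotted quads whose octets are all in 0..255.
        if all(int(part) <= 255 for part in ip.split(".")):
            seeds.add(("ip", ip))
    for domain in DOMAIN_RE.findall(remainder):
        seeds.add(("domain", domain.lower()))
    return sorted(seeds)
```

  • Each returned pair carries a label matching the node types used later in the graph (domain, IP, URL), so the values can be looked up as seed nodes in the threat analysis graph.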
  • Step 102: Extract target subgraph data associated with the seed node from the threat analysis graph stored in the graph database.
  • the graph database may be NebulaGraph, a distributed graph database that stores the threat analysis graph at a scale of tens of billions of records.
  • the threat analysis graph contains multiple nodes and multiple edges, where a node represents an entity and an edge represents the relationship between two entities.
  • An undirected data-association graph is first determined from the data types (node types, edge types) of the graph data and the relationships among the graph data. The direction of each relationship is then added on top of it to construct the threat analysis graph, which is put into practical use.
  • the relationship network is very flexible and can display heterogeneous information in a unified view. Through the built-in services provided by NebulaGraph, graph data and the relationship between graph data can be queried based on different rules.
  • Node types include but are not limited to V(IP), V(domain), and V(URL), where V represents a node; edge types include but are not limited to E(connect), E(release), E(download), and E(delivery), where E represents an edge.
  • When data initiates a network connection, it may connect to an IP address, domain name, or URL; this relationship is called connect. Data may release files, and this relationship is called release. Data may download files, and this relationship is called download. An IP address, domain name, or URL may also be used to distribute malicious files, and this relationship is called delivery. All of the above data types and relationships can be defined from the threat intelligence of a specified network environment, so the threat analysis graph can be used directly in the threat intelligence analysis process. Because the threat analysis graph is tailored to the user's network environment, it is also easier to obtain threat intelligence relevant to that environment from it.
  • the subgraph extraction module uses the point and edge query services provided by NebulaGraph to flexibly extract target subgraph data of different scales associated with the seed node from NebulaGraph.
  • the extracted target subgraph data is saved as point data and edge data, and the data can be in json format.
  • the main fields used in the point data include but are not limited to: fields for identifying the content of the node data, fields for indicating the node type, and fields for indicating the unique identification of the node in the graph data.
  • the node data can be: {"name":"b**du.com","label":"domain","vertexId":"0005d1b1f7fde4c98455d29ece315570"}
  • the field name stores the node data
  • the field label indicates that the node type is a domain name
  • the field vertexId indicates that the hash value of the node is 0005d1b1f7fde4c98455d29ece315570.
  • the main fields used in the edge data include but are not limited to: fields that indicate the unique identifiers of the source and destination nodes in the graph data, and a field that indicates the edge type.
  • the edge data can be: {"srcId":"2c238667ca0068cead9c529e06b8675d","dstId":"d878b8a1a12e3920a6a713f12a3d18e2","label":"contain"}.
  • the field srcId indicates that the hash value of node 1 is 2c238667ca0068cead9c529e06b8675d
  • the field dstId indicates that the hash value of node 2 is d878b8a1a12e3920a6a713f12a3d18e2
  • the field label indicates that the edge type is contain.
  • the direction of the edge is from the node represented by the field srcId to the node represented by the field dstId.
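  • The point and edge records described above can be loaded into typed structures with a few lines of code. This is a minimal sketch: the field names `name`, `label`, `vertexId`, `srcId` and `dstId` come from the examples in the text, while the class names `Vertex` and `Edge` and the parser functions are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    name: str       # content of the node, e.g. a domain name
    label: str      # node type, e.g. "domain"
    vertex_id: str  # unique identifier (hash value) of the node


@dataclass(frozen=True)
class Edge:
    src_id: str  # identifier of the node the edge starts from
    dst_id: str  # identifier of the node the edge points to
    label: str   # edge type, e.g. "contain"


def parse_vertex(line):
    """Parse one JSON point record into a Vertex."""
    record = json.loads(line)
    return Vertex(record["name"], record["label"], record["vertexId"])


def parse_edge(line):
    """Parse one JSON edge record into an Edge (directed from src_id to dst_id)."""
    record = json.loads(line)
    return Edge(record["srcId"], record["dstId"], record["label"])
```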
  • Step 103: Determine a node representation of a target node in the target subgraph data; the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node.
  • After the target subgraph data is extracted, the neighbor nodes of the target node are determined in the target subgraph data, and the node data of the target node and the node data of its neighbor nodes are aggregated to obtain the node representation of the target node. There can be one or more target nodes, and the specific number can be determined based on actual needs.
  • Step 104: Analyze the target node based on the node representation of the target node.
  • When the node representation of each target node has been obtained, threat analysis, similarity analysis, etc. may be performed based on the node representation of each target node.
  • the node analysis method based on the threat analysis graph uses the target data with security risks, extracted from the source data, as the seed node; extracts the target subgraph data associated with the seed node from the threat analysis graph; determines the node representation of the target node in the target subgraph data, where the node representation includes the node data of the target node and the node data of its neighboring nodes; and finally performs a correlation analysis on the target node based on that node representation. Because the node representation is determined only within the target subgraph data associated with the seed node, there is no need to compute over and analyze all the graph data in the threat analysis graph, which improves the efficiency of data analysis.
  • step 102 may be implemented in the following manner:
  • the target association data includes node data and edge data;
  • the node data of the seed node and the target associated data are combined to obtain the target subgraph data.
  • the preset number of hops may be 1 hop, 2 hops, or 3 hops, etc., and may be set based on demand.
  • the point and edge query service provided by the graph database NebulaGraph can be used to extract node data and edge data of different scales associated with the seed node from NebulaGraph, and then the node data of the seed node, the node data and edge data associated with the seed node are combined to obtain the target subgraph data; the specific size of the target subgraph data is determined based on the preset number of hops.
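  • The hop-limited extraction can be sketched as a breadth-first search. The application performs this step through NebulaGraph's point and edge query services; the sketch below instead works on an in-memory edge list so that it stays self-contained. The function name `extract_subgraph` and the choice to count hops without regard to edge direction are assumptions made for this example.

```python
from collections import defaultdict, deque


def extract_subgraph(edges, seed_ids, max_hops):
    """Collect the nodes and edges within max_hops of any seed node.

    edges: iterable of (src_id, dst_id, label) tuples.
    Assumption: hops are counted ignoring edge direction.
    """
    edges = list(edges)
    neighbors = defaultdict(set)
    for src, dst, _label in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    hops = {seed: 0 for seed in seed_ids}  # node id -> distance from nearest seed
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the preset number of hops
        for nxt in neighbors[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Keep every edge whose two endpoints were both reached (induced subgraph).
    kept_edges = [edge for edge in edges if edge[0] in hops and edge[1] in hops]
    return set(hops), kept_edges
```

  • Raising `max_hops` from 1 to 2 or 3 grows the returned subgraph, which corresponds to the preset number of hops controlling the size of the target subgraph data.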
  • Figure 2 is a schematic diagram of the target subgraph data extraction provided by an embodiment of the present application. As shown in Figure 2, the seed node 202 is extracted from the source data 201, and the subgraph extraction module 203 extracts the target subgraph data 205 based on the seed node 202 in the threat analysis graph of the graph database 204.
  • the source data 201 can be sandbox data, crawler data or indicator-of-compromise (IOC) data, and the seed nodes 202 are illustrated with node A, node B, node C, node E and node F.
  • the node analysis method based on the threat analysis graph provided in the embodiment of the present application is based on the point and edge query service provided by the graph database NebulaGraph to extract the target subgraph data associated with the seed node, and the extraction is convenient.
  • FIG. 3 is a second flow chart of a node analysis method based on a threat analysis graph provided in an embodiment of the present application. As shown in FIG. 3 , the above step 103 can be specifically implemented by the following steps:
  • Step 1031 Determine the graph embedding vector of each node in the target subgraph data.
  • determining the graph embedding vector of each node in the target subgraph data may be specifically implemented in the following manner:
  • when the current business scenario is one of searching for structurally similar nodes, determining a graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm;
  • when the current business scenario is one of searching for content-similar nodes, determining a graph embedding vector of each node in the target subgraph data based on a content similarity algorithm.
  • the target subgraph data consists of edge data and node data.
  • the network relationship in the target subgraph data belongs to non-Euclidean space data, which is not convenient for direct processing and calculation.
  • Euclidean space is a vector space with a richer set of methods and tools.
  • Graph embedding is the process of mapping graph data into low-dimensional dense vectors. It addresses the problem that graph data is difficult to feed efficiently into machine learning algorithms, because the resulting vectors can be processed in Euclidean space.
  • Graph embedding is more practical than adjacency matrix because graph embedding can pack node attributes into a vector with smaller dimension, and vector operations are simpler and faster than operations on graphs.
  • the purpose of graph embedding is to represent nodes and edges using vectors.
  • graph embedding is to convert the node data of each node in the target subgraph data into the corresponding graph embedding vector.
  • Graph embedding captures the topological structure of the target subgraph data, and encoding more attributes into the embedding can yield better results in downstream tasks.
  • the corresponding algorithm can be selected according to the business scenario. That is, in the business scenario of searching for structurally similar nodes, a structural similarity algorithm can be selected to determine the graph embedding vector of each node in the target subgraph data; in the business scenario of searching for content-similar nodes, a content similarity algorithm can be selected to determine the graph embedding vector of each node in the target subgraph data.
  • the content similarity algorithm is an algorithm for graph embedding representation of nodes and relationships in a graph structure, including but not limited to the TransE algorithm. It can be widely used in various subsequent graph-based tasks.
  • a piece of content can be represented as a triple (srcId, label, dstId).
  • the triple can be represented as: {"srcId":"2c238667ca0068cead9c529e06b8675d","dstId":"d878b8a1a12e3920a6a713f12a3d18e2","label":"contain"}.
  • the fields srcId and dstId are both nodes, which can be represented by the hash value (md5) of the node in the target subgraph data.
  • Contain is a relation, which is represented by the edge in the target subgraph data.
  • the dimension size of the graph embedding vector is between 64 and 512, which can be flexibly selected according to the actual effect of the downstream task and business needs.
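  • To make the content similarity idea concrete, the sketch below shows the scoring rule of TransE, which the text names as one possible algorithm: a triple (srcId, label, dstId) is considered plausible when the head vector plus the relation vector lands close to the tail vector. The function names are assumptions made for this example, the vectors are tiny hand-written lists rather than trained 64 to 512 dimensional embeddings, and the training loop that would update the vectors is omitted.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss for one (observed triple, corrupted triple) pair.

    Each argument is a (head, relation, tail) tuple of vectors. The loss is zero
    once the observed triple is closer than the corrupted one by at least `margin`.
    """
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))
```

  • During training, embeddings would be adjusted to reduce this loss over many observed and corrupted triples; the resulting node vectors then serve as the graph embedding vectors.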
  • the structural similarity algorithm is specifically as follows: the edge type statistics corresponding to each node in the target subgraph data are input into the target autoencoder model to obtain the graph embedding vector of each node output by the target autoencoder model.
  • the target autoencoder model is trained based on the edge type statistics sample information corresponding to each node in the graph structure sample.
  • the specific training process of the target autoencoder model is as follows. A large number of graph structure samples are obtained, and the edge type statistical sample information corresponding to each node in each graph structure sample is determined. This information is input into the pre-created initial autoencoder model, which performs feature analysis on it and outputs edge type statistical prediction information. A loss function is then constructed based on the edge type statistical prediction information and the edge type statistical sample information, and the model parameters of the initial autoencoder model are optimized based on the loss function until the convergence condition is reached, at which point model training is complete.
  • FIG 4 is a schematic diagram of the structure of the initial autoencoder model provided in an embodiment of the present application.
  • the layer numbered 1 is the input layer
  • the layer numbered 2 is the middle hidden layer
  • the layer numbered 3 is the output layer.
  • the input layer numbered 1 and the middle hidden layer numbered 2 are used as the target autoencoder model, that is, the part in the dotted box is used as the target autoencoder model.
  • the initial autoencoder model can be a three-layer deep neural network (DNN); the number of layers of the deep neural network can be increased, or other network structures can be used; vector dimensionality reduction (such as PCA) or other encoding techniques can also be used, and this application does not limit this.
  • DNN: deep neural network
  • PCA: principal component analysis, a vector dimensionality reduction technique
  • The edge type statistics of the nodes can be fed into the target autoencoder model in batches for inference. After inference, each node in the target subgraph data has a corresponding graph embedding vector, and the dimension of the graph embedding vector is the encoding layer dimension of the target autoencoder model or of the other encoding structure used.
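  • The input to the autoencoder is the edge type statistics of each node. The text does not define the exact statistic, so the sketch below assumes one plausible form: for every node, the number of outgoing and incoming edges of each type. The function name `edge_type_statistics` and the vector layout are assumptions made for this example; the edge types listed are the ones named earlier in the text.

```python
from collections import Counter

# Edge types named in the text; the real graph may contain more.
EDGE_TYPES = ("connect", "release", "download", "delivery")


def edge_type_statistics(edges, edge_types=EDGE_TYPES):
    """Map each node id to [outgoing count per type] + [incoming count per type].

    edges: iterable of (src_id, dst_id, label) tuples.
    Edges whose label is not in edge_types are ignored.
    """
    out_counts, in_counts = {}, {}
    for src, dst, label in edges:
        out_counts.setdefault(src, Counter())[label] += 1
        in_counts.setdefault(dst, Counter())[label] += 1
    nodes = set(out_counts) | set(in_counts)
    empty = Counter()
    return {
        node: [out_counts.get(node, empty)[t] for t in edge_types]
        + [in_counts.get(node, empty)[t] for t in edge_types]
        for node in nodes
    }
```

  • Two nodes with similar count vectors play structurally similar roles in the graph, which is why such vectors are a reasonable input for an encoder whose hidden layer becomes the graph embedding vector.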
  • Step 1032 Determine a node representation of the target node based on the graph embedding vector of each node.
  • the graph embedding vector of each of the nodes is input into a target graph neural network model to obtain a node representation of the target node output by the target graph neural network model.
  • the target graph neural network model is obtained by training based on graph embedding vector samples of multiple nodes.
  • the message passing paradigm is a paradigm that aggregates adjacent node information to update central node information. It generalizes the convolution operator to the field of irregular data and realizes the connection between graphs and neural networks. The message passing paradigm is widely used because of its simple and powerful characteristics.
  • the present application determines the node representation of the target node based on the graph embedding vector of each node and the target graph neural network model.
  • the target graph neural network model includes an acquisition module and an aggregation module; the graph embedding vector of each node is input into the target graph neural network model to obtain the node representation of the target node output by the target graph neural network model, which can be specifically implemented in the following manner:
  • the node aggregation information is determined as a node representation of the target node.
  • the target graph neural network model can have built-in multiple mainstream graph neural network algorithms to meet the usage requirements of different security scenarios, including but not limited to the GraphSAGE algorithm.
  • the GraphSAGE algorithm is taken as an example below.
  • GraphSAGE is a graph neural network algorithm that addresses limitations of the Graph Convolutional Neural Network (GCN). GCN training requires the adjacency matrix of the entire graph, which ties the model to a specific graph structure, so it can generally only be used for transductive learning.
  • GCN uses multiple layers of aggregation functions. Each layer of aggregation function aggregates the information of the node and its neighbors to obtain the feature vector of the next layer.
  • GraphSAGE uses the neighborhood information of the node and does not depend on the global graph structure.
  • GraphSAGE includes a sampling module and an aggregation module.
  • connection information between nodes is used to sample neighboring nodes, and then the information of adjacent nodes is continuously aggregated through multiple layers of aggregation functions to obtain node aggregation information, and the node aggregation information is determined as the node representation of the target node.
  • the aggregation function can be any of the following: mean aggregator, graph convolution aggregator (GCN aggregator), long short-term memory network aggregator (LSTM aggregator), pooling aggregator (Pooling aggregator).
  • FIG5 is a schematic diagram of converting the target subgraph data provided by the embodiment of the present application into the node representation of the target node.
  • the target subgraph data 501 includes nodes A, B, C, D, E and F.
  • the connection relationship between the specific six nodes is shown in FIG5.
  • FIG5 shows the process of transmitting the node information of a neighbor node to the target node.
  • the neighbor nodes of node B include nodes A and C.
  • the node data of node A and the node data of node C are each linearly transformed and aggregated to node B.
  • the node data of node B is then combined with the linearly transformed node data of node A and node C and linearly transformed again to obtain the node aggregation information of node B.
  • the neighbor nodes of node C include nodes A, B, E and F.
  • the node data of node A, node B, node E and node F are each linearly transformed and aggregated to node C.
  • the node data of node C is then combined with the linearly transformed node data of node A, node B, node E and node F and linearly transformed again to obtain the node aggregation information of node C.
  • the neighboring nodes of node D include node A.
  • the node data of node A is linearly transformed and then aggregated to node D.
  • the node data of node D and the node data of node A after linear transformation are linearly transformed again to obtain the node aggregation information of node D.
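  • The aggregation walked through above can be sketched as a single GraphSAGE-style layer with a mean aggregator. This is a simplified illustration, not the model of the application: neighbor sampling is omitted (all listed neighbors are used), only one layer is applied, and the weight matrices `w_self` and `w_neigh` as well as the feature values in the usage note are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One mean-aggregation layer: relu(w_self @ h_v + w_neigh @ mean(h_u for u in N(v)))."""
    updated = {}
    for node, h_v in features.items():
        neigh = [features[u] for u in neighbors.get(node, [])]
        # A node without listed neighbors aggregates a zero vector.
        h_n = mean_vectors(neigh) if neigh else [0.0] * len(h_v)
        combined = [a + b for a, b in zip(mat_vec(w_self, h_v), mat_vec(w_neigh, h_n))]
        updated[node] = [max(0.0, x) for x in combined]  # ReLU
    return updated
```

  • With the neighbor lists from the walkthrough, `neighbors = {"B": ["A", "C"], "C": ["A", "B", "E", "F"], "D": ["A"]}`, the updated vector of node B mixes its own features with the mean of nodes A and C. Stacking several such layers lets information from nodes more than one hop away reach the target node.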
  • the training process of the target graph neural network model can be as follows. Graph embedding vector samples of multiple nodes are input into the initial graph neural network model, which can adopt the GraphSAGE algorithm. The initial graph neural network model collects the node data of the neighbor nodes of each sample node, and aggregates the node data of the sample node and the node data of its neighbor nodes based on the aggregation function to obtain the node representation of the sample node. A loss function is constructed based on the node representation of the sample node and the graph embedding vector of the sample node, and the initial graph neural network model is optimized based on the loss function until the convergence condition is reached, finally yielding the target graph neural network model.
  • the node analysis method based on the threat analysis graph determines the node representation of the target node based on the graph embedding vector of each node and the target graph neural network model, and adds the node information of the neighboring nodes of the target node to the target node, so that the node representation of the target node contains more information. In this way, when the target node is subsequently analyzed based on the node representation of the target node, the accuracy of the analysis can be improved.
  • step 104 may be implemented in the following manner:
  • the node data of the target node is stored in an indicator-of-compromise (IOC) graph database, or an alarm is raised for the target node, or the node data of the target node and the node data of the associated nodes of the target node are displayed.
  • a target graph neural network model is used to detect, analyze, and track threat events.
  • the graph neural network is used to analyze the node representation of the target node to obtain the threat risk coefficient of the target node, and then the threat risk coefficient of the target node is compared with the preset coefficient value.
  • If the threat risk coefficient of the target node is greater than the preset coefficient value, the target node is a risk node.
  • In that case, the node data of the target node can be treated as indicator-of-compromise (IOC) data and stored in the IOC graph database, so that security experts can view the node data of the target node in the graph database; alternatively, when the threat risk coefficient of the target node is greater than the preset coefficient value, an alarm can be raised for the target node to give early warning of the risk node; in addition, the node data of the target node and the node data of its neighboring nodes can be displayed visually to assist security experts in operation, analysis and confrontation.
  • the target node can be further manually judged and analyzed.
  • the node data of the target node is stored in the IOC graph database.
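  • The threshold comparison and the follow-up actions can be sketched as follows. Everything here is illustrative: the function name `handle_risk`, the set `risk_store` (a stand-in for the graph database that holds confirmed risk nodes) and the list `alerts` (a stand-in for an alerting channel) are assumptions, and the risk coefficient is taken as given rather than computed by a model.

```python
def handle_risk(node_id, risk_coefficient, threshold, risk_store, alerts):
    """Store and alert on a node whose risk coefficient exceeds the preset threshold.

    Returns True when the node is treated as a risk node, False otherwise.
    """
    if risk_coefficient <= threshold:
        return False
    risk_store.add(node_id)  # stand-in for writing the node data to the graph database
    alerts.append(f"risk node {node_id}: coefficient {risk_coefficient:.2f}")
    return True
```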
  • the node analysis method based on the threat analysis graph provided in the embodiment of the present application can utilize the target graph neural network model to continuously monitor the massive data generated daily, realize the prediction of unknown risk nodes and the early warning of risk nodes, and in addition, can also display the node data of the target node and the node data of the neighboring nodes of the target node, which can assist security experts in operation, analysis and confrontation.
  • step 104 may be implemented in the following manner:
  • the node representation of the target node is compared and analyzed with the node representations of other nodes to determine nodes similar to the target node.
  • node representations of multiple nodes can be obtained, and the node representation of the target node can be compared with the node representations of other nodes for similarity, thereby determining nodes similar to the target node. In this way, if the target node is determined to be a risk node, nodes similar to the target node are also risk nodes.
  • the node analysis method based on the threat analysis graph provided in the embodiment of the present application can use the target graph neural network model to continuously monitor the massive data generated daily and to search for similar nodes.
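  • A similarity comparison between node representations might look like the sketch below. The text does not name a similarity measure, so cosine similarity is an assumption, as are the function names and the threshold of 0.9.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_nodes(target_id, representations, threshold=0.9):
    """Return (node_id, score) pairs at or above the threshold, best first."""
    target = representations[target_id]
    scored = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in representations.items()
        if other_id != target_id
    ]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Toy node representations; real ones would come from the graph model.
representations = {"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 1.0]}
matches = similar_nodes("a", representations)
```

  • Here node "b" points in the same direction as "a" and is returned, while the orthogonal node "c" is not.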
  • the node analysis system based on a threat analysis graph can be deployed on the server side.
  • the node analysis system based on a threat analysis graph includes a graph data storage module 601, a subgraph extraction module 602, a graph embedding module 603, a graph calculation module 604, a data post-processing module 605, and a data acquisition module 606; wherein the data acquisition module 606 is used to collect source data; the graph data storage module 601 is used to store the threat analysis graph and provide query services; the subgraph extraction module 602 is used to extract the target subgraph data from the threat analysis graph based on the seed node; the graph embedding module 603 is used to determine the graph embedding vector of each node in the target subgraph data; the graph calculation module 604 is used to determine the node representation of the target node based on the target graph neural network model and the
  • the node analysis method based on the threat analysis graph provided in the embodiment of the present application is based on the threat analysis graph and, combined with the network infrastructure used by APT organizations, performs correlation analysis on massive heterogeneous multi-source data to compute unknown risk nodes, give early warning of risk nodes, and search for similar nodes.
  • FIG7 is a schematic diagram of the structure of a node analysis device based on a threat analysis graph according to an embodiment of the present application.
  • the node analysis device 700 based on a threat analysis graph includes a first extraction unit 701, a second extraction unit 702, a determination unit 703 and an analysis unit 704; wherein:
  • the first extraction unit 701 is used to extract target data from the source data and use the target data as a seed node; the target data is data with security risks;
  • the second extraction unit 702 is used to extract target subgraph data associated with the seed node from the threat analysis graph stored in the graph database;
  • the determining unit 703 is used to determine a node representation of a target node in the target subgraph data; the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the analysis unit 704 is used to analyze the target node based on the node representation of the target node.
  • the node analysis device based on the threat analysis graph uses the target data with security risks extracted from the source data as the seed node, extracts the target subgraph data associated with the seed node from the threat analysis graph, and determines the node representation of the target node in the target subgraph data, where the node representation includes the node data of the target node and the node data of its neighboring nodes. Finally, it performs correlation analysis on the target node based on that node representation. The present application therefore determines node representations only within the target subgraph data associated with the seed node, without computing over and analyzing all the graph data in the threat analysis graph, which improves the efficiency of data analysis.
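  • The four units can be read as a pipeline. The sketch below wires them together as pluggable callables; the function `analyze_nodes` and its parameter names are hypothetical, and the toy lambdas only stand in for the real extraction, representation and analysis logic.

```python
def analyze_nodes(source_records, is_risky, extract_subgraph, represent, analyze):
    """Seed selection -> subgraph extraction -> representation -> analysis."""
    results = {}
    for record in source_records:
        # First extraction: only data with security risks become seed nodes.
        if not is_risky(record):
            continue
        # Second extraction: restrict work to the seed's subgraph.
        subgraph_nodes = extract_subgraph(record)
        for node in subgraph_nodes:
            representation = represent(node, subgraph_nodes)
            results[node] = analyze(representation)
    return results


# Toy stand-ins (assumptions for illustration only).
graph = {"bad.example": ["198.51.100.9"]}
results = analyze_nodes(
    source_records=["good.example", "bad.example"],
    is_risky=lambda record: record.startswith("bad"),
    extract_subgraph=lambda seed: [seed] + graph.get(seed, []),
    represent=lambda node, nodes: len(nodes),
    analyze=lambda representation: representation >= 2,
)
```

  • Nodes unrelated to any seed ("good.example" here) are never touched, which is the efficiency argument made above.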
  • the second extraction unit 702 is specifically configured to:
  • the target association data includes node data and edge data;
  • the node data of the seed node and the target associated data are combined to obtain the target subgraph data.
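  • One way to collect the associated node data and edge data around a seed is a bounded breadth-first search, sketched below. The hop limit `max_hops`, the undirected treatment of edges and the function name are assumptions; the text here does not say how far the association extends.

```python
from collections import deque


def extract_subgraph(edges, seed, max_hops=2):
    """Return the nodes within max_hops of seed and the edges among them."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    nodes = set(hops)
    # Combine the seed with its associated node data and edge data.
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, sub_edges


edges = [("seed", "ip1"), ("ip1", "dom1"), ("dom1", "hash1"), ("x", "y")]
nodes, sub_edges = extract_subgraph(edges, "seed", max_hops=2)
```

  • With a limit of two hops, "hash1" (three hops away) and the unrelated edge ("x", "y") are left out of the subgraph.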
  • the determining unit 703 is specifically configured to:
  • the node representation of the target node is determined based on the graph embedding vector of each node.
  • the determining unit 703 is further specifically configured to:
  • when the current business scenario includes a business scenario of searching for structurally similar nodes, determine a graph embedding vector of each node in the target subgraph data based on a structural similarity algorithm;
  • a graph embedding vector of each node in the target subgraph data is determined based on a content similarity algorithm.
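  • Selecting the embedding algorithm by scenario can be sketched as a simple dispatch. Everything concrete here is an assumption: the scenario labels "structural" and "content", the degree-based structural features, and the bag-of-attributes content vector are toy stand-ins for whatever structural and content similarity algorithms are actually used.

```python
def structural_embedding(node, adjacency):
    """Toy structural features: degree and number of two-hop neighbors."""
    neighbors = adjacency.get(node, set())
    two_hop = set()
    for neighbor in neighbors:
        two_hop |= adjacency.get(neighbor, set())
    two_hop -= neighbors | {node}
    return [float(len(neighbors)), float(len(two_hop))]


def content_embedding(node, attributes, vocabulary):
    """Toy content features: one slot per vocabulary term the node carries."""
    tokens = attributes.get(node, set())
    return [1.0 if term in tokens else 0.0 for term in vocabulary]


def embed(node, scenario, adjacency, attributes, vocabulary):
    """Pick the embedding algorithm according to the business scenario."""
    if scenario == "structural":
        return structural_embedding(node, adjacency)
    if scenario == "content":
        return content_embedding(node, attributes, vocabulary)
    raise ValueError(f"unknown scenario: {scenario}")


adjacency = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
attributes = {"a": {"ip", "cn"}}
vocabulary = ["ip", "domain", "cn"]
structural_vector = embed("a", "structural", adjacency, attributes, vocabulary)
content_vector = embed("a", "content", adjacency, attributes, vocabulary)
```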
  • the determining unit 703 is further specifically configured to:
  • the target graph neural network model is obtained by training based on graph embedding vector samples of multiple nodes.
  • the target graph neural network model includes an acquisition module and an aggregation module
  • the determining unit 703 is further specifically configured to:
  • the node aggregation information is determined as a node representation of the target node.
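  • The acquisition and aggregation steps can be illustrated with an untrained mean aggregator. This is a sketch under stated assumptions: the text does not specify the aggregation function, so averaging the target's embedding with those of its neighbors is a stand-in, and a trained graph neural network would additionally apply learned weights. All function names are hypothetical.

```python
def acquire_neighbors(node, adjacency, embeddings):
    """Acquisition step: collect the embedding vectors of the node's neighbors."""
    return [
        embeddings[neighbor]
        for neighbor in sorted(adjacency.get(node, ()))
        if neighbor in embeddings
    ]


def aggregate(own_vector, neighbor_vectors):
    """Aggregation step: element-wise mean of own and neighbor vectors."""
    vectors = [own_vector] + neighbor_vectors
    dimension = len(own_vector)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dimension)]


def node_representation(node, adjacency, embeddings):
    """The aggregated information serves as the node representation."""
    neighbors = acquire_neighbors(node, adjacency, embeddings)
    return aggregate(embeddings[node], neighbors)


embeddings = {"t": [1.0, 0.0], "n1": [0.0, 1.0], "n2": [2.0, 2.0]}
adjacency = {"t": {"n1", "n2"}}
representation = node_representation("t", adjacency, embeddings)
```

  • The resulting vector mixes the target's own data with that of its neighbors, matching the statement that the node representation includes both.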
  • the analysis unit 704 is specifically used for:
  • the node data of the target node is stored in an indicator-of-compromise (IOC) graph database, or an alarm is raised for the target node, or the node data of the target node and the node data of the associated nodes of the target node are displayed.
  • the analysis unit 704 is specifically used for:
  • the node representation of the target node is compared and analyzed with the node representations of other nodes to determine nodes similar to the target node.
  • FIG8 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present application.
  • the electronic device may include: a processor 810, a communications interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communications interface 820, and the memory 830 communicate with each other through the communication bus 840.
  • the processor 810 may call the logic instructions in the memory 830 to execute the following method: extract target data from the source data, and use the target data as a seed node; the target data is data with security risks;
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the target node is analyzed based on the node representation of the target node.
  • the logic instructions in the above-mentioned memory 830 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions that enable a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, etc.
  • the embodiment of the present application further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the node analysis method based on the threat analysis graph provided in the above embodiments is implemented, for example, including: extracting target data from source data, and using the target data as a seed node; the target data is data with security risks;
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the target node is analyzed based on the node representation of the target node.
  • the present application further provides a non-transitory computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, it performs the node analysis method based on the threat analysis graph provided by the above methods, the method comprising: extracting target data from source data, and using the target data as a seed node; the target data is data with security risks;
  • the node representation of the target node includes node data of the target node and node data of neighboring nodes of the target node;
  • the target node is analyzed based on the node representation of the target node.
  • the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
  • each implementation can be realized by software plus a necessary general-purpose hardware platform, and of course it can also be realized by hardware.
  • the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods described in each embodiment or in some parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application relate to the technical field of network security and provide a node analysis method and apparatus based on a threat analysis graph. The method comprises: extracting target data from source data and using the target data as a seed node, the target data being data with a security risk; extracting, from a threat analysis graph stored in a graph database, target subgraph data associated with the seed node; determining a node representation of a target node in the target subgraph data, the node representation of the target node comprising node data of the target node and node data of neighboring nodes of the target node; and analyzing the target node on the basis of the node representation of the target node. In the present application, only the node representation of the target node in the target subgraph data associated with the seed node is determined, and there is no need to calculate and analyze all the graph data in the threat analysis graph, thereby improving the efficiency of data analysis.
PCT/CN2022/144095 2022-12-12 2022-12-30 Node analysis method and apparatus based on threat analysis graph WO2024124640A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211600664.2 2022-12-12
CN202211600664.2A CN116248325A (zh) 2022-12-12 2022-12-12 Node analysis method and apparatus based on threat analysis graph

Publications (1)

Publication Number Publication Date
WO2024124640A1 (fr) 2024-06-20

Family

ID=86626633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/144095 WO2024124640A1 (fr) 2022-12-12 2022-12-30 Node analysis method and apparatus based on threat analysis graph

Country Status (2)

Country Link
CN (1) CN116248325A (fr)
WO (1) WO2024124640A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032724A1 (en) * 2015-04-16 2018-02-01 Nec Laboratories America, Inc. Graph-based attack chain discovery in enterprise security systems
US20200396230A1 (en) * 2019-06-13 2020-12-17 International Business Machines Corporation Real-time alert reasoning and priority-based campaign discovery
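CN113364802A (zh) * 2021-06-25 2021-09-07 中国电子科技集团公司第十五研究所 Method and apparatus for assessing the threat level of security alarms
CN114584351A (zh) * 2022-02-21 2022-06-03 北京恒安嘉新安全技术有限公司 Monitoring method and apparatus, electronic device, and storage medium
CN114928493A (zh) * 2022-05-23 2022-08-19 昆明元叙网络科技有限公司 Threat intelligence generation method based on threat attack big data, and AI security system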

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MEICONG LI; WEI HUANG; YONGBIN WANG; WENQING FAN: "The optimized attribute attack graph based on APT attack stage model", 2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), IEEE, 14 October 2016 (2016-10-14), pages 2781 - 2785, XP033094970, DOI: 10.1109/CompComm.2016.7925204 *
宋晓峰等 (SONG, XIAOFENG ET AL.): "基于大数据引擎的军事信息网络安全防护系统 (Research on Security Defense System for Military Information Network Based on Big Data Engine)", 电子信息对抗技术 (ELECTRONIC INFORMATION WARFARE TECHNOLOGY), no. 3, 15 May 2019 (2019-05-15) *

Also Published As

Publication number Publication date
CN116248325A (zh) 2023-06-09
