CN117708821B - Method, system, device and medium for detecting ransomware based on heterogeneous graph embedding - Google Patents


Info

Publication number
CN117708821B
Authority
CN
China
Prior art keywords
paths
path
node
edges
types
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410166029.0A
Other languages
Chinese (zh)
Other versions
CN117708821A (en
Inventor
杨英
李雨颖
闫莉莉
王伟
侯仰志
马文豪
于召勇
李富坤
王德淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Qilu University of Technology
Priority to CN202410166029.0A priority Critical patent/CN117708821B/en
Publication of CN117708821A publication Critical patent/CN117708821A/en
Application granted granted Critical
Publication of CN117708821B publication Critical patent/CN117708821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a method, system, device and medium for detecting ransomware based on heterogeneous graph embedding, belonging to the technical field of software security. The method comprises: acquiring a software sample; acquiring multiple behavior features of the software sample; taking all behavior features as nodes and the relationships among the behavior features as edges between adjacent nodes to construct a heterogeneous graph; obtaining multiple classes of meta-paths according to the types of edges in the heterogeneous graph; for any two classes of meta-paths, determining the edges that share the same path in the two classes and connecting those edges to obtain a multi-meta-path; acquiring the node embedding of each multi-meta-path, and aggregating the node embeddings of all multi-meta-paths to obtain a graph embedding; and determining the identification result of the software sample according to the graph embedding. The accuracy of ransomware detection is thereby improved.

Description

Method, system, device and medium for detecting ransomware based on heterogeneous graph embedding
Technical Field
The invention relates to the technical field of software security, and in particular to a method, system, device and medium for detecting ransomware based on heterogeneous graph embedding.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Detection methods for ransomware-type malware fall into two main classes: static analysis and dynamic analysis. Static analysis disassembles an executable without executing the sample, and extracts static structural features of the code with a disassembly tool in order to infer the sample's behavior. Dynamic analysis runs the sample in real time in a controlled, isolated environment, comprehensively analyzes its behavior and operations during execution, and extracts behavior features such as file system operations, registry operations and application programming interface (API) calls in order to determine whether the sample is ransomware. Whereas static analysis is easily bypassed by code obfuscation techniques, dynamic analysis focuses on the runtime behavior patterns of executables and achieves a higher detection rate.
Currently, dynamic analysis of ransomware relies mainly on the study of system API calls, because software makes many API calls at runtime and these calls record its various behaviors. Conventional graph neural network approaches typically consider only homogeneous graphs, in which nodes and relations are each of a single type. This is limiting for complex relational networks: ransomware calls many different types of APIs when it executes, involving many heterogeneous entities. Methods based on homogeneous graphs may therefore fail to capture the complex associations between these heterogeneous entities, resulting in poor detection performance.
Disclosure of Invention
In order to solve the above problems, the invention provides a method, system, device and medium for detecting ransomware based on heterogeneous graph embedding, which achieve accurate identification of ransomware.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, a method for detecting ransomware based on heterogeneous graph embedding is provided, including:
acquiring a software sample;
acquiring multiple behavior features of the software sample;
taking all behavior features as nodes and the relationships among the behavior features as edges between adjacent nodes, and constructing a heterogeneous graph;
obtaining multiple classes of meta-paths according to the types of edges in the heterogeneous graph;
for any two classes of meta-paths, determining the edges that share the same path in the two classes, and connecting those edges to obtain a multi-meta-path;
acquiring the node embedding of each multi-meta-path, and aggregating the node embeddings of all multi-meta-paths to obtain a graph embedding;
and determining the identification result of the software sample according to the graph embedding.
In a second aspect, a system for detecting ransomware based on heterogeneous graph embedding is provided, including:
a preprocessing module, used for acquiring a software sample and acquiring multiple behavior features of the software sample;
a graph construction module, used for taking all behavior features as nodes and the relationships among the behavior features as edges between adjacent nodes, and constructing a heterogeneous graph;
a graph embedding module, used for obtaining multiple classes of meta-paths according to the types of edges in the heterogeneous graph; for any two classes of meta-paths, determining the edges that share the same path in the two classes, and connecting those edges to obtain a multi-meta-path; acquiring the node embedding of each multi-meta-path; and aggregating the node embeddings of all multi-meta-paths to obtain a graph embedding;
and a model detection module, used for determining the identification result of the software sample according to the graph embedding.
In a third aspect, an electronic device is provided, including a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, perform the steps of the method for detecting ransomware based on heterogeneous graph embedding.
In a fourth aspect, a computer-readable storage medium is provided for storing computer instructions that, when executed by a processor, perform the steps of the method for detecting ransomware based on heterogeneous graph embedding.
Compared with the prior art, the invention has the beneficial effects that:
By using a hierarchical attention mechanism, node features can be learned at different levels, and the structural and semantic information of the graph can be captured more comprehensively. In addition, multi-path sampling is performed on the basis of predefined meta-paths. Each resulting multi-meta-path contains several paths between nodes and can capture the semantic information shared between different paths, so that the model captures the rich context and semantic relationships between nodes at different levels and forms a deeper semantic abstraction, which helps in handling the complex behavior patterns of ransomware.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
FIG. 1 is a flow chart of the ransomware detection method based on heterogeneous graph embedding disclosed in Embodiment 1;
FIG. 2 is a schematic structural diagram of the ransomware detection system based on heterogeneous graph embedding disclosed in Embodiment 2.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiment 1
In this embodiment, a method for detecting ransomware based on heterogeneous graph embedding is disclosed, as shown in FIG. 1, including:
S1: acquiring a software sample, and acquiring multiple behavior features of the software sample. The process comprises:
S11: obtaining a behavior analysis report of the software sample.
A Cuckoo sandbox environment is built, and the collected software samples are uploaded to it for dynamic analysis. The behavior of each sample is monitored during execution, and a behavior analysis report of the sample is produced from the result of the dynamic analysis.
The Cuckoo sandbox is an automatic, open-source malware analysis system, and aims to provide a safe and controllable simulation environment for dynamically analyzing suspicious software samples.
The behavior analysis report records in detail the behavior trace of the software sample in the virtual environment and the information related to its various operations. Key behavior features of the software sample can be extracted from the report, which facilitates the identification of potentially malicious behavior.
S12: extracting multiple behavior features of the software sample from the behavior analysis report.
When executed, a ransomware sample typically interacts with processes in the system and invokes them to carry out its malicious activity, which may involve process API calls, file API calls, registry API calls and so on. Analyzing these processes and their associated operations therefore helps to identify potentially malicious behavior.
This embodiment extracts multiple behavior features of the software sample from the behavior analysis report, including file operations, process operations, registry operations, process API calls, registry API calls, file API calls, and the like.
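The extraction in S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the report has been loaded as a dictionary in which `behavior.processes[].calls[]` entries carry `category` and `api` fields (the layout commonly seen in Cuckoo JSON reports, which may differ between versions), and the names `CATEGORY_TO_API_TYPE` and `extract_behaviors` are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from a call's "category" field to the API-call
# node types named in the text. The field names are assumptions about
# the report layout, not taken from the patent.
CATEGORY_TO_API_TYPE = {
    "process": "process_api",
    "registry": "registry_api",
    "file": "file_api",
}

def extract_behaviors(report):
    """Return (features per node type, list of process -> API call edges)."""
    features = defaultdict(set)
    edges = []
    for proc in report.get("behavior", {}).get("processes", []):
        proc_name = proc.get("process_name", "unknown")
        features["process_op"].add(proc_name)
        for call in proc.get("calls", []):
            api_type = CATEGORY_TO_API_TYPE.get(call.get("category"))
            if api_type is None:
                continue  # skip categories outside the three API families
            features[api_type].add(call["api"])
            edge = (proc_name, api_type, call["api"])
            if edge not in edges:
                edges.append(edge)
    return {k: sorted(v) for k, v in features.items()}, edges

# Tiny in-memory report standing in for a parsed report file.
sample_report = {
    "behavior": {
        "processes": [
            {
                "process_name": "sample.exe",
                "calls": [
                    {"category": "file", "api": "NtWriteFile"},
                    {"category": "registry", "api": "RegSetValueExW"},
                    {"category": "network", "api": "connect"},
                ],
            }
        ]
    }
}
features, edges = extract_behaviors(sample_report)
```

File and registry operations (the objects acted on) would be read from each call's arguments in the same loop; they are omitted here because the argument names vary by API.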
S2: taking all behavior features as nodes and the relationships among the behavior features as edges between adjacent nodes, and constructing a heterogeneous graph.
A heterogeneous graph is a special graph structure, denoted $G = (V, E)$, that contains different types of nodes and edges. A heterogeneous graph involves two key mapping functions, namely the node type mapping function $\phi: V \rightarrow A$ and the edge type mapping function $\psi: E \rightarrow R$, where $A$ and $R$ denote the sets of node types and edge types respectively and satisfy $|A| + |R| > 2$. In the dynamic analysis of ransomware, a heterogeneous graph can comprehensively capture the complex associations between different entities and between entities and relations. It thus provides richer semantic information for the detection task, and allows the runtime behavior of a sample to be understood and analyzed more fully, improving the accuracy and robustness of detection.
In this embodiment, all behavior features are used as nodes and the relationships among the behavior features are used as edges between adjacent nodes to construct the heterogeneous graph. The constructed heterogeneous graph contains six types of nodes and five types of edges. Specifically, the nodes $V$ include file operations, process operations, registry operations, process API calls, registry API calls, and file API calls. The edges $E$ include five connection relationships: the call relationship between a process operation and a process API (PPA), the call relationship between a process operation and a registry API (PRA), the call relationship between a process operation and a file API (PFA), the execution relationship between a registry API call and a registry operation (RAR), and the execution relationship between a file API call and a file operation (FAF).
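One way to hold such a graph in memory is a set of typed edge lists, as in the sketch below. The six node types and five edge types follow the text; the class name `HeteroGraph`, the type labels such as `process_op`, and the example node names are illustrative assumptions.

```python
from collections import defaultdict

# Edge type -> (source node type, target node type), following the five
# relations named in the text. The type labels are illustrative.
EDGE_SCHEMA = {
    "PPA": ("process_op", "process_api"),
    "PRA": ("process_op", "registry_api"),
    "PFA": ("process_op", "file_api"),
    "RAR": ("registry_api", "registry_op"),
    "FAF": ("file_api", "file_op"),
}

class HeteroGraph:
    """Heterogeneous graph stored as per-type node indices and edge lists."""

    def __init__(self):
        self.node_ids = defaultdict(dict)  # node type -> {name: index}
        self.edges = defaultdict(list)     # edge type -> [(src_idx, dst_idx)]

    def _index(self, node_type, name):
        ids = self.node_ids[node_type]
        # A new node receives the next free index within its own type.
        return ids.setdefault(name, len(ids))

    def add_edge(self, edge_type, src_name, dst_name):
        src_type, dst_type = EDGE_SCHEMA[edge_type]
        pair = (self._index(src_type, src_name), self._index(dst_type, dst_name))
        self.edges[edge_type].append(pair)

g = HeteroGraph()
g.add_edge("PPA", "sample.exe", "CreateProcessInternalW")
g.add_edge("PRA", "sample.exe", "RegSetValueExW")
g.add_edge("RAR", "RegSetValueExW", "HKCU\\Software\\Run")
```

Indexing nodes separately per type keeps each edge list directly usable as the (source index, target index) rows from which adjacency matrices are later built.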
S3: obtaining a multi-class meta-path according to the types of edges in the heterograms; for any two types of element paths, determining edges with the same paths in the two types of element paths, and connecting the edges with the same paths to obtain a multiple element path; acquiring node embedding of each multiple path; and aggregating node embeddings of all the multiple paths to obtain graph embeddings.
The iso-graph constructed in this embodiment includes five connection relations, that is, five types of edges, each type of edge determines a meta-path, and classifies all meta-paths to obtain multi-class meta-paths.
The multi-class meta-paths include a process meta-path, a registry meta-path, and a file meta-path, as shown in table 1.
As can be seen from Table 1, the meta-paths defined by PPA type edges are process meta-paths, the meta-paths defined by PRA type and RAR type edges are registry meta-paths, and the meta-paths defined by PFA and FAF type edges are file meta-paths.
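TABLE 1
Meta-path class        Edge types defining the meta-paths
Process meta-path      PPA
Registry meta-path     PRA, RAR
File meta-path         PFA, FAF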
Because the behavior of ransomware typically involves complex relationships between multiple entity types, with several levels of semantic abstraction, a single meta-path may not represent the diverse relationships between these entity types well. To overcome this, the present embodiment introduces the concept of a multi-meta-path, which contains several paths between nodes. Each multi-meta-path describes one kind of connection relationship, so that when the graph embedding is determined, the rich context and semantic relationships between nodes can be captured more comprehensively at different levels, forming a deeper semantic abstraction that helps in handling the complex behavior patterns of ransomware.
In this embodiment, for any two classes of meta-paths, the edges that share the same path in the two classes are determined, and those edges are connected to obtain a multi-meta-path.
Specifically, based on the determined classes of meta-paths, common-path sampling is performed on the classes pairwise in order to define the multi-meta-paths. The specific steps are as follows:
constructing an adjacency matrix for each of the two classes of meta-paths;
determining, through the adjacency matrices, the edges that share the same path in the two classes of meta-paths;
and performing a dot product operation on the two adjacency matrices that have edges sharing the same path, so as to connect those edges and obtain a multi-meta-path.
The source node and target node of each edge in a meta-path serve as index data for constructing the adjacency matrix. When the adjacency matrix is constructed from the index data, all index data are first de-duplicated, an initial adjacency matrix is built from the de-duplicated index data, and the elements on the diagonal of the initial adjacency matrix are removed to obtain the final adjacency matrix.
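The adjacency construction just described (de-duplicate the index data, fill the matrix, clear the diagonal) can be sketched with NumPy as below. The function name `build_adjacency` and the choice of a dense 0/1 integer matrix are assumptions made for illustration.

```python
import numpy as np

def build_adjacency(edge_index, num_nodes):
    """Build a 0/1 adjacency matrix from rows of (source, target) indices."""
    edge_index = np.asarray(edge_index, dtype=np.int64).reshape(-1, 2)
    # De-duplicate so that every (source, target) row is unique.
    unique_edges = np.unique(edge_index, axis=0)
    adj = np.zeros((num_nodes, num_nodes), dtype=np.int64)
    adj[unique_edges[:, 0], unique_edges[:, 1]] = 1
    # Remove self-connections, which the text treats as potential noise.
    np.fill_diagonal(adj, 0)
    return adj

# The edge 0 -> 1 appears twice and 2 -> 2 is a self-loop.
edges = [[0, 1], [0, 1], [1, 2], [2, 2]]
adj = build_adjacency(edges, num_nodes=3)
```

For large graphs a sparse matrix would be the more practical container; the dense form is used here only to keep the example short.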
The process of determining a multi-meta-path is illustrated here by selecting one meta-path from the process meta-paths and one from the registry meta-paths. The two meta-paths are:
Meta-path 1: "Process API call - Process operation - Process API call".
Meta-path 2: "Registry API call - Process operation - Registry API call".
Two NumPy arrays are used to store the edge information of the two meta-paths respectively; NumPy is an open-source numerical computing extension.
Array one stores the edge information of meta-path 1. Each row represents one edge and has two columns: the first column is the index of the source node and the second column is the index of the target node. Array two stores the edge information of meta-path 2 in the same way: each row represents one edge, with the source node index in the first column and the target node index in the second.
A de-duplication operation is performed on array one and array two respectively to ensure that each row is unique.
An initial adjacency matrix M1 is constructed from the index data in the de-duplicated array one, and another initial adjacency matrix M2 is constructed from the index data in the de-duplicated array two. The elements on the diagonals of the two initial adjacency matrices are then removed to obtain the final adjacency matrices of the two meta-paths, because diagonal elements usually represent a node's connection to itself and may introduce noise.
Common-path sampling is then performed on the two meta-paths, that is, the edges that share the same path are found by inspecting the adjacency matrices of the two meta-paths.
Specifically, meta-path 1 and meta-path 2 both contain the node type "Process operation", so common-path sampling yields the common edges, for example "Process operation - Process API call" and "Process operation - Registry API call".
A dot product operation is performed on the two adjacency matrices that have edges sharing the same path to generate a new adjacency matrix. The new adjacency matrix obtained in this step is the defined multi-meta-path, for example "Process API call - Process operation - Registry API call".
In summary, before common-path sampling there are two original adjacency matrices, each defined by a different meta-path. Common-path sampling extracts from them the edges that share the same path, and the dot product of the two adjacency matrices is then taken on the basis of those common edges. The result is a new adjacency matrix that contains the information of the common paths, reflects the interaction between the two original matrices, and is regarded as the representation of the multi-meta-path. The dot product can thus be viewed as an integration over the common paths of the two matrices, so that the final adjacency matrix better captures the structural information of the multi-meta-path.
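The text does not spell out the matrix shapes, so the sketch below shows only one possible reading of the dot product step: the two meta-paths share the "Process operation" node type, so the adjacency from process API calls to process operations is multiplied by the adjacency from process operations to registry API calls. The array names, the toy values, and the final binarization are all assumptions.

```python
import numpy as np

# Rows: 3 process API calls; columns: 2 process operations.
# An entry of 1 means the process operation invoked that API call.
pa_to_p = np.array([[1, 0],
                    [1, 1],
                    [0, 0]])

# Rows: 2 process operations; columns: 3 registry API calls.
p_to_ra = np.array([[1, 0, 1],
                    [0, 1, 0]])

def compose(left, right):
    """Connect two relations through their shared middle node type."""
    counts = left @ right                 # number of shared process operations
    return (counts > 0).astype(np.int64)  # keep only reachability (assumption)

# Adjacency of "Process API call - Process operation - Registry API call".
multi_meta_path = compose(pa_to_p, p_to_ra)
```

Entry (i, j) of the product is nonzero exactly when some process operation links process API call i to registry API call j, which is the sense in which the common edges are "connected".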
After the multi-meta-paths are acquired, they are analyzed by a heterogeneous graph attention network to obtain the graph embedding.
The Heterogeneous Graph Attention Network (HAN) is a deep learning model for heterogeneous graph data. It uses a multi-level attention mechanism together with meta-paths to learn embedded representations of nodes. Through hierarchical attention aggregation over different node types and relations, it can effectively capture the complex structural information in a heterogeneous graph and provides efficient representation learning for node classification tasks.
Specifically, the HAN model includes two levels of attention: node-level attention and semantic-level attention. In node-level attention, the model learns the weights between a node and its meta-path-based neighbors, giving more attention to the neighbors that are more relevant to the node. In semantic-level attention, the model learns weights over the different meta-paths, which allows it to learn the complex semantic relationships between different types of nodes more fully. This design lets the HAN model handle heterogeneous graph data well and gives it strong modeling capability for node classification tasks.
The node embedding of each multi-meta-path is obtained with node-level attention, and the node embeddings of all multi-meta-paths are aggregated with semantic-level attention to obtain the graph embedding. The specific process is as follows:
S31: determining the neighbor nodes of each node i in the multiple paths, determining the attention weight of the neighbor nodes of each node i by using the node level attention, summarizing the attention weights of all the neighbor nodes of the node i to obtain node embedments of the node i, and carrying out weighted aggregation on the node embedments of all the node i to obtain the node embedments of the multiple paths.
The method comprises aggregating the interior of multiple paths, mapping the features of different types of nodes into the same low-dimensional space by linear transformation to ensure that the representations of different types of nodes are located in the same dimension, and giving a transformation matrixFor vector/>Mapped results/>Can be expressed as/>
Based on each multiple path to find the neighbor node of each node contained in the multiple path, calculating each neighbor node pair by using the node level attention of HAN algorithmWeights between, i.e. node/>Pair node/>Is then usedThe function normalizes the weights of all the neighbor nodes of the node i to obtain the attention weight of each neighbor node, and finally the calculated node/>Summarizing the attention weights of all neighbor nodes to obtain node/>Is embedded in the node of (a).
Extending node level attention to multi-head attention, i.e. repetitionSecondary node level attention operations and concatenating learned embeddings to get/>The group node is embedded.
Specifically, given a multiple path setAfter inputting the node characteristics into the node level attention, the/>Group node embedding, noted/>
Wherein,Is/>Multiple paths,/>Is/>Multiple paths/>Is embedded in the node of (a).
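A single-head version of the node-level attention in S31 can be sketched in NumPy as follows. It follows the usual HAN-style formulation (project, score each pair with a shared attention vector, softmax over neighbors, weighted sum), which is an assumption about details the text leaves open. The parameters `proj` and `attn_vec` would normally be learned; here they are random, a single projection matrix is used for all nodes, and the output nonlinearity and multi-head concatenation are omitted.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def node_level_attention(features, adj, proj, attn_vec):
    """Aggregate each node's neighbors under one multi-meta-path adjacency."""
    h = features @ proj  # map raw features into the shared space
    out = np.zeros_like(h)
    for i in range(h.shape[0]):
        neighbors = np.flatnonzero(adj[i])
        if neighbors.size == 0:
            out[i] = h[i]  # assumption: isolated nodes keep their projection
            continue
        # Score each pair (i, j) from the concatenation [h_i || h_j].
        pairs = np.concatenate(
            [np.tile(h[i], (neighbors.size, 1)), h[neighbors]], axis=1
        )
        scores = leaky_relu(pairs @ attn_vec)
        scores = scores - scores.max()          # numerical stability
        alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over neighbors
        out[i] = alpha @ h[neighbors]           # attention-weighted aggregation
    return out

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 5))   # 3 nodes, 5 raw features each
proj = rng.normal(size=(5, 4))       # projection into 4 dimensions
attn_vec = rng.normal(size=8)        # acts on the 2 * 4 concatenation
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 0, 0]])
z = node_level_attention(features, adj, proj, attn_vec)
```

Running the function once per multi-meta-path adjacency gives the $P$ groups of node embeddings described above.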
S32: determining the attention weight of each multiple element path by using semantic level attention; and carrying out weighted aggregation on node embedding of all nodes by using the attention weights of all the multiple paths to obtain graph embedding.
The step is aggregation among multiple element paths, and because the heterogram contains various types of semantic information, the embedding of a single node cannot comprehensively reflect the node, so that the semantic information of the element paths is also required to be fused.
Embedding the nodes determined in S32 as input of semantic level attention, learning the attention weight of each multiple path, usingThe function normalizes the learned attention weight of each multiple path to obtain the weight of each multiple path.
And finally, fusing the weights of the node level and the weights of the semantic level in a linear weighted summation mode, namely weighting and aggregating the node embedding of all the nodes by using the attention weights of all the multiple paths to obtain the final graph embedding. The process can retain node level and semantic level information, thereby obtaining a more comprehensive node representation.
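The semantic-level fusion in S32 can be sketched as below, again following the common HAN-style formulation as an assumption: each multi-meta-path is scored by averaging a small learned transformation of its node embeddings, the scores are normalized with softmax, and the embeddings are summed with those weights. The parameters `W`, `b` and `q` are hypothetical learned parameters, and the mean-pooling readout that turns node embeddings into a single vector is an added assumption, since the text does not state how a graph-level vector is formed.

```python
import numpy as np

def semantic_attention(path_embeddings, W, b, q):
    """Fuse per-path node embeddings of shape (P, N, d) into one (N, d) matrix."""
    z = np.asarray(path_embeddings, dtype=float)
    # Importance of each multi-meta-path, averaged over all nodes.
    scores = (np.tanh(z @ W + b) @ q).mean(axis=1)   # shape (P,)
    scores = scores - scores.max()                   # numerical stability
    beta = np.exp(scores) / np.exp(scores).sum()     # softmax over paths
    fused = np.tensordot(beta, z, axes=1)            # weighted sum, shape (N, d)
    return beta, fused

rng = np.random.default_rng(0)
path_embeddings = rng.normal(size=(2, 4, 3))  # 2 paths, 4 nodes, 3 dimensions
W = rng.normal(size=(3, 6))
b = np.zeros(6)
q = rng.normal(size=6)

beta, fused = semantic_attention(path_embeddings, W, b, q)
# Assumed readout: average the fused node embeddings into one graph vector.
graph_vec = fused.mean(axis=0)
```

The weights `beta` indicate how much each multi-meta-path contributes to the fused representation, which can also be inspected for interpretability.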
S4: determining the identification result of the software sample according to the graph embedding.
This embodiment classifies the software sample by feeding the graph embedding to a multi-layer perceptron (MLP) model, and thereby determines the identification result of the software sample. The specific steps are as follows:
A trained MLP model is used as the ransomware classifier. The final embedding vector obtained from the graph embedding network layer is input into the MLP model, which learns the high-level complex relationships among the nodes of the graph through the nonlinear transformations of several hidden layers. The sigmoid activation function then maps the output of the MLP model to the interval [0, 1] to obtain a classification probability. Finally, the probability is used to classify the sample as normal or malicious, thereby detecting ransomware.
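The classification step can be sketched as a forward pass through a small MLP. This sketch assumes that the output activation is a sigmoid (consistent with the [0, 1] output range and the binary cross-entropy loss used in training), that hidden layers use ReLU, and that the decision threshold is 0.5; none of these choices, nor the toy weights, are stated in the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_predict(embedding, layers, threshold=0.5):
    """Return (probability of ransomware, label) for one graph embedding.

    `layers` is a list of (weights, bias) pairs; the last pair maps the
    final hidden vector to a single logit.
    """
    h = np.asarray(embedding, dtype=float)
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer (assumption)
    W_out, b_out = layers[-1]
    prob = float(sigmoid(h @ W_out + b_out))
    label = "ransomware" if prob >= threshold else "benign"
    return prob, label

# Toy parameters standing in for a trained model.
layers = [
    (np.eye(3), np.zeros(3)),                 # hidden layer, 3 -> 3
    (np.array([1.0, 1.0, 0.0]), 0.0),         # output layer, 3 -> 1 logit
]
prob, label = mlp_predict([1.0, 1.0, -5.0], layers)
```

In practice the parameters would come from training on labeled graph embeddings, and the threshold could be tuned on a validation set.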
In this embodiment, m ransomware samples and n benign software samples are collected to obtain a ransomware training set $R = \{r_1, r_2, \dots, r_m\}$ and a benign software training set $B = \{b_1, b_2, \dots, b_n\}$.
The graph embedding of each training sample is determined according to steps S1 to S3. The graph embeddings of the training samples and their corresponding labels are used as training data to train the MLP model, and the trained MLP model is obtained when training is complete.
In the training process, binary cross entropy is used as a loss function, and model parameters are continuously adjusted to optimize the model, so that the accuracy of sample classification is improved.
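For reference, the binary cross-entropy over a batch of predictions can be computed as in the sketch below. The function name and the clipping constant `eps` are illustrative choices; labels are assumed to be 1 for ransomware and 0 for benign.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for fully confident predictions.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))

# An uninformative classifier that always outputs 0.5.
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

The loss is small when predicted probabilities agree with the labels and grows without bound as confident predictions become wrong, which is what drives the parameter updates during training.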
By using a hierarchical attention mechanism, the method disclosed in this embodiment can learn node features at different levels and capture the structural and semantic information of the graph more comprehensively. In addition, multi-path sampling is performed on the basis of predefined meta-paths. Each resulting multi-meta-path contains several paths between nodes and can capture the semantic information shared between different paths, so that the model captures the rich context and semantic relationships between nodes at different levels and forms a deeper semantic abstraction, which helps in handling the complex behavior patterns of ransomware.
Embodiment 2
In this embodiment, a ransomware detection system based on heterogeneous graph embedding is disclosed, as shown in FIG. 2, comprising:
a preprocessing module, used for acquiring a software sample and acquiring multiple behavior features of the software sample;
a graph construction module, used for taking all behavior features as nodes and the relationships among the behavior features as edges between adjacent nodes, and constructing a heterogeneous graph;
a graph embedding module, used for obtaining multiple classes of meta-paths according to the types of edges in the heterogeneous graph; for any two classes of meta-paths, determining the edges that share the same path in the two classes, and connecting those edges to obtain a multi-meta-path; acquiring the node embedding of each multi-meta-path; and aggregating the node embeddings of all multi-meta-paths to obtain a graph embedding;
and a model detection module, used for determining the identification result of the software sample according to the graph embedding.
The invention also discloses an electronic device, which comprises a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, perform the steps of the ransomware detection method based on heterogeneous graph embedding disclosed in Embodiment 1.
The invention also discloses a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the ransomware detection method based on heterogeneous graph embedding disclosed in Embodiment 1.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, and any modifications and equivalents are intended to be included within the scope of the invention.

Claims (9)

1. A method for detecting ransomware based on heterogeneous graph embedding, characterized by comprising the following steps:
acquiring a software sample;
acquiring multiple behavior features of the software sample;
taking all behavior features as nodes and the relationships among the behavior features as edges between adjacent nodes, and constructing a heterogeneous graph;
obtaining multiple classes of meta-paths according to the types of edges in the heterogeneous graph;
for any two classes of meta-paths, determining the edges that share the same path in the two classes, and connecting those edges to obtain a multi-meta-path;
acquiring the node embedding of each multi-meta-path, and aggregating the node embeddings of all multi-meta-paths to obtain a graph embedding;
determining the identification result of the software sample according to the graph embedding;
wherein, for any two classes of meta-paths, determining the edges that share the same path in the two classes and connecting those edges to obtain a multi-meta-path comprises: constructing an adjacency matrix for each of the two classes of meta-paths; determining, through the adjacency matrices, the edges that share the same path in the two classes of meta-paths; and performing a dot product operation on the two adjacency matrices that have edges sharing the same path, so as to connect those edges and obtain the multi-meta-path.
2. The ransomware detection method based on heterogeneous graph embedding of claim 1, wherein a behavior analysis report of the software sample is obtained;
and the plurality of behavior features of the software sample are extracted from the behavior analysis report.
3. The ransomware detection method based on heterogeneous graph embedding of claim 1, wherein the plurality of behavior features comprise file operations, process operations, registry operations, process API calls, registry API calls, and file API calls;
and the multiple classes of meta-paths comprise a process meta-path, a registry meta-path, and a file meta-path.
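As a rough illustration of how meta-path classes might be derived from edge types, the sketch below groups typed edges into process, registry, and file meta-paths. It assumes that each edge links an operation node to an API-call node of the same category; the claims do not state which node pairs are connected, so the `Edge` type, the example edges, and `group_into_meta_paths` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source: str     # behavior-feature node, here an operation
    target: str     # behavior-feature node, here an API call
    edge_type: str  # "process", "registry", or "file"


# Hypothetical edges of a heterogeneous graph built from one sample.
edges = [
    Edge("process_op:create", "process_api:NtCreateProcess", "process"),
    Edge("registry_op:set_value", "registry_api:RegSetValueExW", "registry"),
    Edge("file_op:write", "file_api:NtWriteFile", "file"),
    Edge("file_op:delete", "file_api:NtDeleteFile", "file"),
]


def group_into_meta_paths(edge_list):
    """Collect the (source, target) pairs of each edge type into one meta-path class."""
    meta_paths = defaultdict(list)
    for edge in edge_list:
        meta_paths[edge.edge_type].append((edge.source, edge.target))
    return dict(meta_paths)


meta_paths = group_into_meta_paths(edges)
for name, pairs in meta_paths.items():
    print(name, len(pairs))
```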
4. The ransomware detection method based on heterogeneous graph embedding of claim 1, wherein the source node and the target node of each edge in a meta-path are used as index data to construct the adjacency matrix; when the adjacency matrix is constructed from the index data, all index data are de-duplicated, an initial adjacency matrix is constructed from the de-duplicated index data, and the elements on the diagonal of the initial adjacency matrix are removed to obtain the final adjacency matrix.
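The construction in claim 4 can be sketched as follows, assuming the index data is a (2, E) integer array whose columns are (source, target) node indices and that "removing" diagonal elements means setting them to zero. The function name `build_adjacency` and the example indices are hypothetical.

```python
import numpy as np


def build_adjacency(edge_index, num_nodes):
    """Build a 0/1 adjacency matrix from a (2, E) array of [source; target] indices."""
    # De-duplicate the (source, target) pairs; each column of edge_index is one edge.
    unique_edges = np.unique(edge_index, axis=1)
    # Initial adjacency matrix from the de-duplicated index data.
    initial = np.zeros((num_nodes, num_nodes), dtype=np.int64)
    initial[unique_edges[0], unique_edges[1]] = 1
    # Zero the diagonal (self-loops) to obtain the final adjacency matrix.
    final = initial.copy()
    np.fill_diagonal(final, 0)
    return final


# Hypothetical index data: edge 0->1 appears twice and 2->2 is a self-loop.
edge_index = np.array([[0, 0, 1, 2],
                       [1, 1, 2, 2]])
adj = build_adjacency(edge_index, num_nodes=3)
print(adj)
```

With a 0/1 matrix the de-duplication does not change the result, but it would matter if edge counts were accumulated instead of set.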
5. The ransomware detection method based on heterogeneous graph embedding of claim 1, wherein the node embedding of each multiple meta-path is obtained using node-level attention; and the node embeddings of all multiple meta-paths are aggregated based on semantic-level attention to obtain the graph embedding.
6. The ransomware detection method based on heterogeneous graph embedding of claim 5, wherein the neighbor nodes of each node i in a multiple meta-path are determined; the attention weight of each neighbor node of node i is determined using node-level attention; the attention weights of all neighbor nodes of node i are aggregated to obtain the node embedding of node i; and the node embeddings of all nodes i are weighted and aggregated to obtain the node embedding of the multiple meta-path;
the attention weight of each multiple meta-path is determined using semantic-level attention; and the node embeddings of all nodes are weighted and aggregated using the attention weights of all multiple meta-paths to obtain the graph embedding.
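The two attention levels of claims 5 and 6 can be sketched in NumPy. This is only an illustration under stated assumptions: the scoring functions (a LeakyReLU over concatenated features for node-level attention, a tanh projection for semantic-level attention), the mean pooling used to summarize each multiple meta-path, and the random parameters `a` and `q` are choices made for this sketch, not details taken from the claims. In a real model the parameters would be learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def node_level_attention(h, adj, a):
    """Return one embedding per node as an attention-weighted sum of its neighbors.

    h: (N, d) node features; adj: (N, N) 0/1 adjacency of one multiple meta-path;
    a: (2d,) attention vector.
    """
    z = np.zeros_like(h)
    for i in range(h.shape[0]):
        neighbors = np.flatnonzero(adj[i])
        if neighbors.size == 0:
            z[i] = h[i]  # assumption: an isolated node keeps its own features
            continue
        # Score each neighbor j from the concatenation [h_i || h_j].
        h_i = np.repeat(h[i][None, :], neighbors.size, axis=0)
        scores = np.concatenate([h_i, h[neighbors]], axis=1) @ a
        scores = np.where(scores > 0, scores, 0.2 * scores)  # LeakyReLU
        alpha = softmax(scores)       # attention weights over the neighbors of i
        z[i] = alpha @ h[neighbors]   # weighted sum of neighbor features
    return z


def semantic_level_attention(path_embeddings, q):
    """Weight each multiple meta-path and return (weights, graph embedding)."""
    # Assumption: each path is summarized by mean pooling over its nodes.
    summaries = np.stack([z.mean(axis=0) for z in path_embeddings])  # (P, d)
    beta = softmax(np.tanh(summaries) @ q)                            # (P,)
    return beta, beta @ summaries                                     # (d,)


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
adj_1 = np.array([[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]])
adj_2 = np.array([[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
a = rng.normal(size=6)
q = rng.normal(size=3)

z_paths = [node_level_attention(h, adj, a) for adj in (adj_1, adj_2)]
beta, graph_embedding = semantic_level_attention(z_paths, q)
print(beta.sum(), graph_embedding.shape)
```

The resulting `graph_embedding` vector would then be passed to a classifier to decide whether the sample is ransomware; that classifier is outside the scope of this sketch.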
7. A ransomware detection system based on heterogeneous graph embedding, characterized by comprising:
a preprocessing module, configured to acquire a software sample and acquire a plurality of behavior features of the software sample;
a graph construction module, configured to take all behavior features as nodes and the relations among the behavior features as edges between adjacent nodes, to construct a heterogeneous graph;
a graph embedding module, configured to obtain multiple classes of meta-paths according to the types of edges in the heterogeneous graph; for any two classes of meta-paths, determine the edges having the same path in the two classes of meta-paths and connect the edges having the same path to obtain a multiple meta-path; obtain the node embedding of each multiple meta-path; and aggregate the node embeddings of all multiple meta-paths to obtain a graph embedding;
a model detection module, configured to determine an identification result for the software sample according to the graph embedding;
wherein, for any two classes of meta-paths, determining the edges having the same path in the two classes of meta-paths and connecting the edges having the same path to obtain a multiple meta-path comprises: constructing an adjacency matrix for each of the two classes of meta-paths; determining, by means of the adjacency matrices, the edges having the same path in the two classes of meta-paths; and performing a dot product operation on the two adjacency matrices that have edges with the same path, thereby connecting the edges having the same path to obtain the multiple meta-path.
8. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the ransomware detection method based on heterogeneous graph embedding of any one of claims 1-6.
9. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the ransomware detection method based on heterogeneous graph embedding of any one of claims 1-6.
CN202410166029.0A 2024-02-06 2024-02-06 Method, system, equipment and medium for detecting Lesu software based on heterogeneous graph embedding Active CN117708821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410166029.0A CN117708821B (en) 2024-02-06 2024-02-06 Method, system, equipment and medium for detecting Lesu software based on heterogeneous graph embedding

Publications (2)

Publication Number Publication Date
CN117708821A CN117708821A (en) 2024-03-15
CN117708821B CN117708821B (en) 2024-04-30

Family

ID=90155672

Country Status (1)

Country Link
CN (1) CN117708821B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577710A (en) * 2017-08-01 2018-01-12 广州市香港科大霍英东研究院 Recommendation method and device based on Heterogeneous Information network
CN113095439A (en) * 2021-04-30 2021-07-09 东南大学 Heterogeneous graph embedding learning method based on attention mechanism
CN113704566A (en) * 2021-10-29 2021-11-26 贝壳技术有限公司 Identification number body identification method, storage medium and electronic equipment
CN114141375A (en) * 2021-12-10 2022-03-04 哈尔滨工业大学(深圳) Heterogeneous graph representation method, device, equipment and storage medium for disease prediction
CN115146312A (en) * 2022-07-04 2022-10-04 广西师范大学 Social influence prediction method and system based on heterogeneous graph neural network privacy protection
CN115344863A (en) * 2022-08-19 2022-11-15 重庆邮电大学 Malicious software rapid detection method based on graph neural network
CN115391778A (en) * 2022-08-16 2022-11-25 广东工业大学 Android malicious program detection method and device based on special-pattern attention network
CN115660688A (en) * 2022-10-24 2023-01-31 西南财经大学 Financial transaction abnormity detection method and cross-region sustainable training method thereof
CN116010947A (en) * 2021-09-03 2023-04-25 西安胡门网络技术有限公司 Android malicious software detection method based on heterogeneous network
CN116204882A (en) * 2023-01-05 2023-06-02 北京航空航天大学 Android malicious software detection method and device based on different composition
CN116305111A (en) * 2023-01-13 2023-06-23 河北师范大学 Primitive path embedding-based graph neural network android malicious software detection method
CN116467710A (en) * 2023-03-21 2023-07-21 重庆邮电大学 Unbalanced network-oriented malicious software detection method
CN116804997A (en) * 2023-07-19 2023-09-26 中国人民解放军国防科技大学 Chinese similar case recommending method, device and equipment based on graph neural network
CN116894180A (en) * 2023-09-11 2023-10-17 南京航空航天大学 Product manufacturing quality prediction method based on different composition attention network
CN117113350A (en) * 2023-09-11 2023-11-24 上海计算机软件技术开发中心 Path self-adaption-based malicious software detection method, system and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11463472B2 (en) * 2018-10-24 2022-10-04 Nec Corporation Unknown malicious program behavior detection using a graph neural network
CN112257066B (en) * 2020-10-30 2021-09-07 广州大学 Malicious behavior identification method and system for weighted heterogeneous graph and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
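Ransomware detection method based on machine learning; Xiang Zihao (项子豪); Qiu Weidong (邱卫东); Information Technology (信息技术); 2018-05-25 (No. 05); full text *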

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant