CN115391778A - Android malware detection method and device based on heterogeneous graph attention network - Google Patents
Android malware detection method and device based on heterogeneous graph attention network
- Publication number
- CN115391778A; CN202210983464.3A
- Authority
- CN
- China
- Prior art keywords
- android
- meta
- node
- attention network
- heterogeneous graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/561—Virus type analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/53—Decompilation; Disassembly
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Virology (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an Android malware detection method based on a heterogeneous graph attention network, comprising the following steps. S1: download apps and label them; S2: decompile the APKs and extract multiple kinds of key feature entities; S3: construct a heterogeneous graph attention network, convert it into multiple meta-structures, and compute the adjacency matrix of each meta-structure; S4: obtain low-dimensional vector embeddings; S5: train a logistic regression model and obtain the node embedding of the Android application to be detected; S6: obtain the detection result. The invention also provides an Android malware detection device based on a heterogeneous graph attention network, which implements the method. The method and device address the problem that existing malware detection techniques cannot effectively classify and detect malicious Android applications.
Description
Technical field
The present invention relates to the technical field of information security, and more specifically to an Android malware detection method and device based on a heterogeneous graph attention network.
Background
Driven by the rapid development of Internet services, mobile applications have entered every aspect of daily life, such as communication, finance, travel, and entertainment. Android is currently the largest operating system platform in the global smartphone market, and the extensibility and openness of the Android platform expose users to threats and attacks from various malicious programs, including privacy violations, data leaks, spam advertising, and transaction and payment operations that endanger users' personal property. Research on methods for identifying and detecting Android malware therefore has significant practical value.
Traditional Android malware detection methods include static analysis of APK files: the manifest, code files, and resource files are characterized as features, and similarity comparison is then used to decide whether an application is malicious. However, simple obfuscation can defeat this approach, so it cannot effectively identify malicious applications that use code obfuscation techniques or exploit Android vulnerabilities. Dynamic analysis instead collects system information by running the program code, including system calls, API calls, and network information, builds a feature library, and identifies malware by similarity comparison; its drawback is a heavy dependence on the operating system version and the program's running time. To address this, current detection techniques based on machine learning extract key features and apply classification algorithms to distinguish malicious from benign applications. However, these methods do not take into account the rich semantic information between nodes and cannot detect malicious applications that disguise or hide their features.
Therefore, existing malware detection techniques cannot effectively classify and detect malicious Android applications.
Summary of the invention
To overcome the technical defect that existing malware detection techniques cannot effectively classify and detect malicious Android applications, the present invention provides an Android malware detection method and device based on a heterogeneous graph attention network.
To solve the above technical problem, the technical solution of the present invention is as follows:
An Android malware detection method based on a heterogeneous graph attention network, comprising the following steps:
S1: Download Android applications (apps) and label them to obtain a set of Android applications, where the set includes benign Android applications and malicious Android applications;
S2: Decompile the installation package (APK) of each Android application and extract multiple kinds of key feature entities from the decompiled files;
S3: Construct a heterogeneous graph attention network from the relations between the Android applications and the key feature entities, convert the heterogeneous graph attention network into multiple meta-structures, and compute the adjacency matrix of each meta-structure;
S4: Obtain the low-dimensional vector embeddings of the existing nodes from the adjacency matrices of the meta-structures;
S5: Train a logistic regression model on the low-dimensional vector embeddings and labels of the existing nodes to obtain a trained logistic regression model, and obtain the node embedding of the Android application to be detected;
S6: Feed the node embedding of the Android application to be detected into the trained logistic regression model to obtain a detection result indicating whether the application is malicious or benign.
In the above scheme, multiple kinds of key feature entities are first extracted by decompiling the APK. A heterogeneous graph attention network is constructed from the relations between the Android applications and the key feature entities and converted into multiple meta-structures. The low-dimensional vector embeddings of the existing nodes are then obtained from the adjacency matrices of the meta-structures, and a logistic regression model is trained on these embeddings and the labels. Finally, the node embedding of the Android application to be detected is obtained and fed into the trained logistic regression model, which yields a detection result of malicious or benign.
Preferably, the key feature entities include APIs, permissions, permission types, classes, interfaces, and .so files.
Preferably, in-graph relation matrices $R_l^{in}$, $l \in [1,6]$, are formed from the relations between the Android applications and the key feature entities, where $R_1^{in}$ denotes the relation between an App and an API, $R_2^{in}$ the relation between an App and a permission, $R_3^{in}$ the permission type to which an App belongs, $R_4^{in}$ the relation between an App and a class, $R_5^{in}$ the relation between an App and an interface, and $R_6^{in}$ the relation between an App and a .so file.
Preferably, the heterogeneous graph attention network is a graph $G = (V, E, A, R)$ whose node types include App, API, permission, permission type, class, interface, and .so file, and whose edge types include $R_1^{in}$, $R_2^{in}$, $R_3^{in}$, $R_4^{in}$, $R_5^{in}$, and $R_6^{in}$. Here $V$ denotes the set of nodes, $E$ the set of edges, $A$ the set of node types, and $R$ the set of edge types, with $|A| + |R| > 2$.
Preferably, a meta-structure is either a meta-path or a meta-graph. A meta-path is a path defined on the heterogeneous graph attention network, with a source object and a target object at its two ends; if there are multiple meta-paths between the source object and the target object, they form a meta-graph.
Preferably, the adjacency matrices of $K$ meta-structures form the adjacency matrix set $\{\Psi_{M_1}, \ldots, \Psi_{M_k}, \ldots, \Psi_{M_K}\}$, where the adjacency matrix of a meta-structure is either the adjacency matrix of a meta-path or the adjacency matrix of a meta-graph.
The adjacency matrix of a meta-path is computed as:
$$\Psi_{MP} = R_{A_1A_2} \cdot \ldots \cdot R_{A_iA_{i+1}} \cdot \ldots \cdot R_{A_{n-1}A_n}$$
The adjacency matrix of a meta-graph is computed as:
$$\Psi_{MG} = \Psi_{MP_1} \odot \ldots \odot \Psi_{MP_j} \odot \ldots \odot \Psi_{MP_m}$$
where $\Psi_{M_k}$ denotes the adjacency matrix of the $k$-th meta-structure, $R_{A_iA_{i+1}}$ denotes the relation matrix between the $i$-th node and the $(i+1)$-th node, $i = 1, 2, \ldots, n-1$, $n$ denotes the number of nodes on the meta-path, $\Psi_{MP_j}$ denotes the $j$-th $\Psi_{MP}$, $\odot$ denotes the Hadamard product, and $m$ denotes the number of $\Psi_{MP}$ matrices.
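The two formulas above can be illustrated with a small NumPy sketch. The toy relation matrices, the function names, and the choice of the meta-paths App → API → App and App → permission → App are assumptions made for illustration only; they are not taken from the patent.

```python
import numpy as np

# Hypothetical toy relation matrices (not from the patent):
# 3 apps x 4 APIs and 3 apps x 2 permissions, entries in {0, 1}.
R_app_api = np.array([[1, 0, 1, 0],
                      [1, 1, 0, 0],
                      [0, 0, 0, 1]])
R_app_perm = np.array([[1, 0],
                       [1, 1],
                       [0, 1]])

def metapath_adjacency(relations):
    """Multiply the relation matrices along a meta-path, left to right."""
    psi = relations[0]
    for rel in relations[1:]:
        psi = psi @ rel
    return psi

def metagraph_adjacency(metapath_matrices):
    """Combine meta-path adjacency matrices with the Hadamard product."""
    psi = metapath_matrices[0]
    for other in metapath_matrices[1:]:
        psi = psi * other
    return psi

# Assumed meta-path App -> API -> App: apps linked through shared APIs.
psi_api = metapath_adjacency([R_app_api, R_app_api.T])
# Assumed meta-path App -> permission -> App.
psi_perm = metapath_adjacency([R_app_perm, R_app_perm.T])
# Meta-graph built from both meta-paths.
psi_mg = metagraph_adjacency([psi_api, psi_perm])
```

In this sketch, entry (i, j) of `psi_api` counts the APIs that apps i and j share, and an entry of `psi_mg` is nonzero only when both meta-paths connect the two apps.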
Preferably, step S4 includes the following steps:
S41: Encode each node as a one-hot vector to obtain a matrix $H$; combine $H$ with the adjacency matrix of a given meta-structure $M_k$, and obtain the adjacency matrix of the nodes inside the meta-structure through a normalization operation:
$$\Psi_{M_k}' = \mathrm{Normalize}(H \cdot H^{T} \odot \Psi_{M_k})$$
An edge-weight-aware GAT model is then introduced to update the embeddings of the nodes inside meta-structure $M_k$: $\Phi_{M_k} = \mathrm{GAT}(H; \Psi_{M_k}')$;
S42: Use a multi-layer perceptron to learn the weight $\beta_{M_k}$ of each meta-structure $M_k$ in the fusion:
$$(\beta_{M_1}, \ldots, \beta_{M_k}, \ldots, \beta_{M_K}) = \mathrm{softmax}(NN(\Phi_{M_1}), \ldots, NN(\Phi_{M_k}), \ldots, NN(\Phi_{M_K}))$$
where $NN$ is a vanilla neural network that maps a given matrix to a scalar value,
thereby obtaining the low-dimensional vector embeddings of the existing nodes: [formula not reproduced in the source text]
Preferably, in step S5, the node embedding of the Android application to be detected is obtained through the following steps:
S51: Form out-of-graph relation matrices $R_l^{out}$, $l \in [1,6]$, from the relations between the Android application to be detected and the key feature entities;
S52: Form the incremental segment of the node adjacency matrix (its symbol and column dimension are not reproduced in the source text). The segment has $j$ rows, where $j$ is the number of in-graph nodes, and the value in row $j$ is the number of meta-structures between the new node and the in-graph node $v_j$;
S53: Sort the values of the incremental segment with a top-k algorithm and select the $t$ in-graph nodes with the largest values as the in-graph neighbor nodes $v_s$, $s = 1, 2, \ldots, t$; then aggregate the vectors of the new node and its in-graph neighbor nodes to obtain the node embedding of the Android application to be detected: [formula not reproduced in the source text]
In that formula, one term denotes the weight of $v_s$ on meta-path $M_k$ and another denotes the number of meta-structures between the new node and the in-graph neighbor node $v_s$ (the symbols themselves are not reproduced in the source text).
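The neighbor selection in S53 can be sketched as follows. Because the aggregation formula is not reproduced in this text, the count-weighted average used here is an assumption rather than the patent's formula, and it ignores the per-meta-path weight mentioned above. The function name and the toy data are hypothetical.

```python
import numpy as np

def embed_new_app(counts, embeddings, t):
    """Aggregate the embeddings of the t in-graph nodes with the largest counts.

    counts[j] is the number of meta-structures between the new node and
    in-graph node v_j; embeddings[j] is the learned embedding of v_j.
    The count-weighted average below is an assumption, not the patent's formula.
    """
    counts = np.asarray(counts, dtype=float)
    # Indices of the t largest counts, largest first (stable for ties).
    top = np.argsort(-counts, kind="stable")[:t]
    weights = counts[top]
    if weights.sum() == 0:
        # No meta-structure links at all: fall back to a zero vector.
        return top, np.zeros(embeddings.shape[1])
    weights = weights / weights.sum()
    return top, weights @ embeddings[top]

# Hypothetical data: 4 in-graph nodes with 2-dimensional embeddings.
counts = [0, 3, 1, 2]
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
neighbors, z_new = embed_new_app(counts, embeddings, t=2)
```

Here nodes 1 and 3 have the largest counts, so they are chosen as the in-graph neighbors and their embeddings are averaged with weights 3/5 and 2/5.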
Preferably, the predicted value output by the logistic regression model is: [formula not reproduced in the source text]
where $b$ denotes the bias parameter, $w$ denotes the weight, and the remaining term denotes the node embedding of the Android application to be detected;
when the predicted value $a$ output by the logistic regression model is greater than 0.5, the detection result is malicious; otherwise, the detection result is benign.
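The decision rule can be sketched with the standard logistic regression form $a = \sigma(w \cdot x + b)$. The prediction formula itself is not reproduced in this text, so the sigmoid form is an assumption based on the usual definition of logistic regression; the function name and example values are hypothetical.

```python
import math

def predict(embedding, w, b, threshold=0.5):
    """Return (a, label), where a = sigmoid(w . embedding + b).

    The sigmoid form is the standard logistic regression model and is
    assumed here; the source text only states the 0.5 threshold rule.
    """
    z = sum(wi * xi for wi, xi in zip(w, embedding)) + b
    a = 1.0 / (1.0 + math.exp(-z))
    # Strictly greater than the threshold means malicious, as in the text.
    return a, ("malicious" if a > threshold else "benign")

# Hypothetical weights and embeddings.
a1, label1 = predict([0.8, 0.6], w=[1.0, -2.0], b=0.5)
a2, label2 = predict([0.0, 1.0], w=[1.0, -2.0], b=0.5)
```

A prediction of exactly 0.5 is classified as benign, because the rule requires the value to be greater than 0.5 for a malicious result.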
An Android malware detection device based on a heterogeneous graph attention network, used to implement the Android malware detection method based on a heterogeneous graph attention network described above, comprising:
a feature engineering module, used to label the apps, decompile the APKs, and extract the key feature entities;
a graph construction module, used to construct the heterogeneous graph attention network in the form of nodes and edges from the relations between the Android applications and the key feature entities, and also used to convert the heterogeneous graph attention network into multiple meta-structures and compute the adjacency matrix of each meta-structure;
a node aggregation module, used to obtain the node embeddings of Android applications and to obtain the low-dimensional vector embeddings of nodes from the adjacency matrices of the meta-structures;
a detection module, used to learn the classification from the low-dimensional vector embeddings and labels of the nodes, to perform detection based on the node embedding of the Android application to be detected, and to output a detection result indicating whether that application is malicious or benign.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows:
The present invention provides an Android malware detection method and device based on a heterogeneous graph attention network. Multiple kinds of key feature entities are first extracted by decompiling the APK. A heterogeneous graph attention network is constructed from the relations between the Android applications and the key feature entities and converted into multiple meta-structures. The low-dimensional vector embeddings of the existing nodes are then obtained from the adjacency matrices of the meta-structures, and a logistic regression model is trained on these embeddings and the labels. Finally, the node embedding of the Android application to be detected is obtained and fed into the trained logistic regression model, which yields a detection result of malicious or benign.
Description of the drawings
Fig. 1 is a flowchart of the implementation steps of the technical solution of the present invention;
Fig. 2 is a schematic structural diagram of the heterogeneous graph attention network in the present invention;
Fig. 3 is a schematic structural diagram of a meta-path in the present invention;
Fig. 4 is a schematic structural diagram of a meta-graph in the present invention.
Detailed description
The drawings are for illustrative purposes only and shall not be construed as limiting this patent;
to better illustrate the embodiments, some parts in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of an actual product;
those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, an Android malware detection method based on a heterogeneous graph attention network comprises the following steps:
S1: Download Android applications (apps) and label them to obtain a set of Android applications, where the set includes benign Android applications and malicious Android applications;
S2: Decompile the installation package (APK) of each Android application and extract multiple kinds of key feature entities from the decompiled files;
S3: Construct a heterogeneous graph attention network from the relations between the Android applications and the key feature entities, convert the heterogeneous graph attention network into multiple meta-structures, and compute the adjacency matrix of each meta-structure;
S4: Obtain the low-dimensional vector embeddings of the existing nodes from the adjacency matrices of the meta-structures;
S5: Train a logistic regression model on the low-dimensional vector embeddings and labels of the existing nodes to obtain a trained logistic regression model, and obtain the node embedding of the Android application to be detected;
S6: Feed the node embedding of the Android application to be detected into the trained logistic regression model to obtain a detection result indicating whether the application is malicious or benign.
In a specific implementation, multiple kinds of key feature entities are first extracted by decompiling the APK. A heterogeneous graph attention network is constructed from the relations between the Android applications and the key feature entities and converted into multiple meta-structures. The low-dimensional vector embeddings of the existing nodes are then obtained from the adjacency matrices of the meta-structures, and a logistic regression model is trained on these embeddings and the labels. Finally, the node embedding of the Android application to be detected is obtained and fed into the trained logistic regression model, which yields a detection result of malicious or benign.
Embodiment 2
An Android malware detection method based on a heterogeneous graph attention network, comprising the following steps:
S1: Download Android applications (apps) and label them to obtain a set of Android applications. The set includes benign Android applications, labeled 0, and malicious Android applications, labeled 1. Benign applications are downloaded from the Google Play store, and malicious applications are downloaded from virusshare.com;
S2: Decompile the installation package (APK) of each Android application with the decompilation tool apktool, and extract multiple kinds of key feature entities from the decompiled files;
S3: Construct a heterogeneous graph attention network from the relations between the Android applications and the key feature entities, convert the heterogeneous graph attention network into multiple meta-structures, and compute the adjacency matrix of each meta-structure;
S4: Obtain the low-dimensional vector embeddings of the existing nodes from the adjacency matrices of the meta-structures;
S5: Train a logistic regression model on the low-dimensional vector embeddings and labels of the existing nodes to obtain a trained logistic regression model, and obtain the node embedding of the Android application to be detected;
S6: Feed the node embedding of the Android application to be detected into the trained logistic regression model to obtain a detection result indicating whether the application is malicious or benign.
More specifically, the key feature entities include APIs, permissions, permission types, classes, interfaces, and .so files.
As shown in Figs. 2 to 4, A denotes an API (application programming interface); P denotes a permission, which governs the operations an application may perform; T denotes a permission type; C denotes a class, which abstracts common attributes and behaviors into a relatively complex data type; I denotes an interface, an abstract data structure used to define a specification; and S denotes a .so file, an Android dynamic link library. A, C, and I are taken from the decompiled smali files; P and T from the decompiled AndroidManifest.xml file; and S from the decompiled lib directory.
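As one illustration of extracting a key feature entity, the sketch below reads the requested permissions (P) from a decoded AndroidManifest.xml using Python's standard XML parser. The function name and the sample manifest are hypothetical; the sketch assumes the manifest has already been decoded to plain XML by a tool such as apktool, and it is not the patent's own extraction code.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the android: attribute prefix in manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

def extract_permissions(manifest_xml):
    """Return the sorted, de-duplicated permission names requested in a manifest."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"
    return sorted({
        elem.get(name_attr)
        for elem in root.iter("uses-permission")
        if elem.get(name_attr)
    })

# Hypothetical decoded manifest for illustration.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.INTERNET"/>
</manifest>"""
permissions = extract_permissions(SAMPLE)
```

The resulting list of permission names can serve as the permission entities of one app when the App-permission relation is built.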
More specifically, in-graph relation matrices $R_l^{in}$, $l \in [1,6]$, are formed from the relations between the Android applications and the key feature entities, where $R_1^{in}$ denotes the relation between an App and an API, $R_2^{in}$ the relation between an App and a permission, $R_3^{in}$ the permission type to which an App belongs, $R_4^{in}$ the relation between an App and a class, $R_5^{in}$ the relation between an App and an interface, and $R_6^{in}$ the relation between an App and a .so file.
In a specific implementation, every entry of these matrices is binary. For $R_1^{in}$, $a_{ij} \in \{0,1\}$ indicates whether $App_i$ contains $API_j$: $a_{ij} = 1$ if it does and $a_{ij} = 0$ otherwise. For $R_2^{in}$, $P_{ij} \in \{0,1\}$ indicates whether $App_i$ has permission $j$. For $R_3^{in}$, $T_{ij} \in \{0,1\}$ indicates whether permission $i$ belongs to type $j$. For $R_4^{in}$, $C_{ij} \in \{0,1\}$ indicates whether $App_i$ contains class $j$. For $R_5^{in}$, $I_{ij} \in \{0,1\}$ indicates whether $App_i$ contains interface $j$. For $R_6^{in}$, $S_{ij} \in \{0,1\}$ indicates whether $App_i$ contains .so file $j$. In each case the entry is 1 if the relation holds and 0 otherwise.
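A binary relation matrix of this kind can be built from per-app feature sets as in the following minimal sketch. The function name, the vocabulary ordering, and the sample permission names are assumptions for illustration.

```python
def relation_matrix(app_features, vocabulary):
    """Build a 0/1 matrix: rows are apps, columns are entities in vocabulary order."""
    index = {entity: j for j, entity in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in app_features]
    for i, features in enumerate(app_features):
        for entity in features:
            # Entities outside the vocabulary are ignored.
            if entity in index:
                matrix[i][index[entity]] = 1
    return matrix

# Hypothetical example: two apps and a vocabulary of three permissions.
apps = [{"INTERNET", "SEND_SMS"}, {"INTERNET"}]
vocab = ["INTERNET", "READ_CONTACTS", "SEND_SMS"]
R2_in = relation_matrix(apps, vocab)
```

Row i of `R2_in` plays the role of the entries $P_{ij}$ for $App_i$; the same helper could be reused for APIs, classes, interfaces, and .so files.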
More specifically, the heterogeneous graph attention network is a graph $G = (V, E, A, R)$ whose node types include App, API, permission, permission type, class, interface, and .so file, and whose edge types include $R_1^{in}$, $R_2^{in}$, $R_3^{in}$, $R_4^{in}$, $R_5^{in}$, and $R_6^{in}$. Here $V$ denotes the set of nodes, $E$ the set of edges, $A$ the set of node types, and $R$ the set of edge types, with $|A| + |R| > 2$.
More specifically, a meta-structure is either a meta-path or a meta-graph. A meta-path is a path defined on the heterogeneous graph attention network, with a source object and a target object at its two ends; if there are multiple meta-paths between the source object and the target object, they form a meta-graph.
More specifically, the adjacency matrices of $K$ meta-structures form the adjacency matrix set $\{\Psi_{M_1}, \ldots, \Psi_{M_k}, \ldots, \Psi_{M_K}\}$, where the adjacency matrix of a meta-structure is either the adjacency matrix of a meta-path or the adjacency matrix of a meta-graph.
The adjacency matrix of a meta-path is computed as:
$$\Psi_{MP} = R_{A_1A_2} \cdot \ldots \cdot R_{A_iA_{i+1}} \cdot \ldots \cdot R_{A_{n-1}A_n}$$
The adjacency matrix of a meta-graph is computed as:
$$\Psi_{MG} = \Psi_{MP_1} \odot \ldots \odot \Psi_{MP_j} \odot \ldots \odot \Psi_{MP_m}$$
where $\Psi_{M_k}$ denotes the adjacency matrix of the $k$-th meta-structure, $R_{A_iA_{i+1}}$ denotes the relation matrix between the $i$-th node and the $(i+1)$-th node, $i = 1, 2, \ldots, n-1$, $n$ denotes the number of nodes on the meta-path, $\Psi_{MP_j}$ denotes the $j$-th $\Psi_{MP}$, $\odot$ denotes the Hadamard product, and $m$ denotes the number of $\Psi_{MP}$ matrices.
More specifically, step S4 includes the following steps:
S41: Encode each node as a one-hot vector to obtain a matrix $H$, combine $H$ with the adjacency matrix of a given meta-structure $M_k$, and obtain the adjacency matrix of the nodes inside the meta-structure through a normalization operation:
$$\Psi_{M_k}' = \mathrm{Normalize}(H\cdot H^{T}\odot\Psi_{M_k})$$
Then introduce an edge-weight-aware GAT model to update the node embeddings inside meta-structure $M_k$: $\Phi_{M_k} = \mathrm{GAT}(H;\Psi_{M_k}')$;
S42: Use a multi-layer perceptron to learn the weight $\beta_{M_k}$ of each meta-structure $M_k$ in the fusion:
$$(\beta_{M_1},\ldots,\beta_{M_k},\ldots,\beta_{M_K}) = \mathrm{softmax}(\mathrm{NN}(\Phi_{M_1}),\ldots,\mathrm{NN}(\Phi_{M_k}),\ldots,\mathrm{NN}(\Phi_{M_K}))$$
where NN is a plain (vanilla) neural network that maps a given matrix to a scalar value,
thereby obtaining the low-dimensional vector embeddings of the existing nodes:
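The fused-embedding formula itself is not reproduced in the text above. A minimal sketch of step S42 follows, under two loudly stated assumptions: the fused embedding is taken to be the $\beta$-weighted sum of the per-meta-structure embeddings $\Phi_{M_k}$, and the scoring network NN is replaced by a hypothetical stand-in (`score`, the mean of all entries), since the patent does not specify its architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(phi):
    """Hypothetical stand-in for NN: maps an embedding matrix to one scalar."""
    flat = [x for row in phi for x in row]
    return sum(flat) / len(flat)


def fuse(embeddings):
    """Return (betas, fused), where fused is the beta-weighted sum (assumed form)."""
    betas = softmax([score(phi) for phi in embeddings])
    rows, cols = len(embeddings[0]), len(embeddings[0][0])
    fused = [[sum(beta * phi[i][j] for beta, phi in zip(betas, embeddings))
              for j in range(cols)]
             for i in range(rows)]
    return betas, fused
```

Because the weights come out of a softmax, they are positive and sum to 1, so the fused embedding stays on the same scale as the individual $\Phi_{M_k}$.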
More specifically, in step S5, the node embedding of the Android application to be detected is obtained through the following steps:
S51: Form the out-of-graph relation matrices $Rl_{out}$, $l\in[1,6]$, according to the relationships between the Android application to be detected and the key feature entities;
S52: Form the incremental segment of the node adjacency matrix, in the form of a j-row matrix, where j denotes the number of in-graph nodes and the value in the j-th row represents the number of meta-structures between the new node and the in-graph node $v_j$;
S53: Use the top-k algorithm to sort these values, select the t in-graph nodes with the largest values as the in-graph neighbor nodes $v_s$, $s=1,2,\ldots,t$, and aggregate the vectors of the new node and its in-graph neighbor nodes to obtain the node embedding of the Android application to be detected:
where the two coefficients in the aggregation denote, respectively, the weight of $v_s$ on the meta-path $M_k$ and the number of meta-structures between the new node and the in-graph neighbor node $v_s$.
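The aggregation formula is not reproduced in the text above, so the following is only a sketch of steps S52 and S53 under stated assumptions: the new node's embedding is approximated as the average of its top-t neighbors' embeddings, weighted by meta-structure counts, and in-graph nodes with a count of zero are never treated as neighbors. The per-meta-path weight of $v_s$ mentioned in the text is omitted here. The function names and the toy inputs are hypothetical.

```python
import heapq


def top_t_neighbors(counts, t):
    """Indices of the t in-graph nodes with the largest meta-structure counts."""
    ranked = heapq.nlargest(t, range(len(counts)), key=lambda j: counts[j])
    # Assumption: a node sharing no meta-structure with the new app is not a neighbor.
    return [j for j in ranked if counts[j] > 0]


def aggregate(counts, embeddings, t):
    """Count-weighted average of the top-t neighbor embeddings (assumed form)."""
    neighbors = top_t_neighbors(counts, t)
    total = sum(counts[s] for s in neighbors)
    if total == 0:
        raise ValueError("the new node has no in-graph neighbors")
    dim = len(embeddings[0])
    return [sum(counts[s] * embeddings[s][d] for s in neighbors) / total
            for d in range(dim)]
```

For example, with counts `[3, 0, 1, 5]` and `t = 2`, the neighbors are nodes 3 and 0, and their embeddings are mixed in the ratio 5:3. Since only the incremental count vector is needed, the existing in-graph embeddings are reused and the whole graph does not have to be re-embedded for each new application.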
More specifically, the predicted value output by the logistic regression model is:
where b denotes the bias parameter, w denotes the weight, and the input is the node embedding of the Android application to be detected;
when the predicted value a output by the logistic regression model is greater than 0.5, the detection result is malicious; otherwise, the detection result is benign.
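The prediction formula is not reproduced in the text above. Assuming the standard logistic regression form, in which the predicted value a is the sigmoid of the weighted sum of the embedding plus the bias b, the decision rule can be sketched as follows; the function names and the example weights are illustrative only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(embedding, w, b):
    """Return (a, label): a is the predicted value, label follows the 0.5 threshold."""
    z = sum(wi * xi for wi, xi in zip(w, embedding)) + b
    a = sigmoid(z)
    # The text labels an app malicious only when a is strictly greater than 0.5.
    return a, ("malicious" if a > 0.5 else "benign")
```

Note that a predicted value of exactly 0.5 falls on the benign side, because the rule in the text uses a strict inequality.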
Example 3
An Android malware detection device based on a heterogeneous graph attention network, used to implement the Android malware detection method based on a heterogeneous graph attention network described above, comprising:
a feature engineering module, configured to label APPs, decompile APKs, and extract key feature entities;
a graph construction module, configured to construct the heterogeneous graph attention network in the form of nodes and edges according to the relationships between Android applications and key feature entities, and further configured to convert the heterogeneous graph attention network into multiple meta-structures and compute the adjacency matrix of each meta-structure;
a node aggregation module, configured to obtain the node embeddings of Android applications and to obtain the low-dimensional vector embeddings of nodes from the adjacency matrices of the meta-structures;
a detection module, configured to learn a classifier from the low-dimensional vector embeddings of the nodes and their labels, to perform detection based on the node embedding of the Android application to be detected, and to output a detection result indicating whether that application is malicious or benign.
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementation. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210983464.3A CN115391778A (en) | 2022-08-16 | 2022-08-16 | Android malware detection method and device based on heterogeneous graph attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115391778A (en) | 2022-11-25 |
Family
ID=84121404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210983464.3A Pending CN115391778A (en) | 2022-08-16 | 2022-08-16 | Android malware detection method and device based on heterogeneous graph attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115391778A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116074092A (en) * | 2023-02-07 | 2023-05-05 | 电子科技大学 | A Heterogeneous Graph Attention Network Based Attack Scene Reconstruction System |
CN116074092B (en) * | 2023-02-07 | 2024-02-20 | 电子科技大学 | Attack scene reconstruction system based on heterogram attention network |
CN117708821A (en) * | 2024-02-06 | 2024-03-15 | 山东省计算中心(国家超级计算济南中心) | Method, system, equipment and medium for detecting ransomware based on heterogeneous graph embedding |
CN117708821B (en) * | 2024-02-06 | 2024-04-30 | 山东省计算中心(国家超级计算济南中心) | Ransomware detection method, system, device and medium based on heterogeneous graph embedding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||