WO2022077755A1 - Cryptographic API usage analysis method and system (一种加密API使用分析方法及系统)


Info

Publication number
WO2022077755A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
api
encrypted
hidden markov
encrypted api
Application number
PCT/CN2020/136140
Other languages
English (en)
French (fr)
Inventor
许智武
蔡树彬
明仲
胡雄亚
Original Assignee
深圳大学 (Shenzhen University)
Application filed by 深圳大学 (Shenzhen University)
Publication of WO2022077755A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/74 Reverse engineering; Extracting design information from source code

Definitions

  • The invention relates to the technical field of cryptographic APIs, and in particular to a method and system for analyzing cryptographic API usage.
  • In the prior art, predictive analysis of API call sequences either manually analyzes a certain number of API calls and infers API calling conventions from frequently occurring calls, which depends heavily on the quality of the data set, or relies on N-gram models, which in the prior art carry many constraints and impose strict requirements on the APIs, so their predictions of API call sequences are poor.
  • Other prior art uses the CrySL specification language to constrain API call sequences; this approach is largely domain-limited and hard to maintain. As a result, the main cryptographic API calling specifications currently available are defined by hand, are hard to maintain, and are associated with high misuse rates.
  • The technical problem to be solved by the present invention is, in view of the above defects of the prior art, to provide a method and system for analyzing cryptographic API usage, aiming to solve the problems that cryptographic API calling specifications in the prior art are essentially defined by hand, are hard to maintain, and have high misuse rates, and that there is no data set of correct cryptographic API usage.
  • The present invention provides a cryptographic API usage analysis method, wherein the method includes:
  • acquiring the APK data set and obtaining the cryptographic API call sequence data set from it includes:
  • classifying the Dalvik instructions to obtain their classification information, and constructing the cryptographic API call sequence data set.
  • acquiring the APK data set and preprocessing it includes:
  • classifying the Dalvik instructions to obtain their classification information includes:
  • classifying the Dalvik instructions according to their read/write type, number of operands, and number of constant operands.
  • performing misuse detection and usage recommendation on cryptographic APIs includes:
  • the cryptographic API at the preset position is successfully recommended.
  • The preset threshold is set as follows:
  • the scores are sorted in descending order, and the score at the position closest to 80% of the ranking is taken as the preset threshold.
  • The present invention provides a cryptographic API usage analysis system, wherein the system includes:
  • a data acquisition module for acquiring an APK data set and obtaining from it a cryptographic API call sequence data set, which is constructed based on classification information of Dalvik instructions;
  • a model training module for performing hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
  • The present invention provides an intelligent terminal comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors.
  • The one or more programs include instructions for performing the method according to any one of the above solutions.
  • The present invention provides a non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of the above solutions.
  • The present invention provides a method and system for analyzing cryptographic API usage.
  • The method includes: acquiring an APK data set and obtaining from it a cryptographic API call sequence data set, which is constructed based on classification information of Dalvik instructions; performing hidden Markov model training and N-gram model training using the training set and the validation set in that data set, to obtain a trained hidden Markov model and N-gram model; and performing misuse detection and usage recommendation on cryptographic APIs based on the trained hidden Markov model and N-gram model.
  • Symbolic analysis is introduced when constructing the cryptographic API call sequence data set, so that a data set of correct cryptographic API usage is constructed; with the hidden Markov model and N-gram model trained according to the present invention, cryptographic API usage analysis becomes noticeably more efficient and more effective.
  • FIG. 1 is a flowchart of a specific embodiment of the cryptographic API usage analysis method provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of a specific implementation of the cryptographic API usage analysis method provided by an embodiment of the present invention.
  • FIG. 3 is a chart analyzing the cryptographic API recommendation results of the cryptographic API usage analysis method provided by an embodiment of the present invention.
  • FIG. 4 is a schematic block diagram of the cryptographic API usage analysis system provided by an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of the internal structure of the intelligent terminal provided by an embodiment of the present invention.
  • This embodiment provides a method for analyzing cryptographic API usage. Specifically, as shown in FIG. 1 and FIG. 2, the method includes:
  • Step S100: acquiring an APK data set and obtaining from it a cryptographic API call sequence data set, which is constructed based on classification information of Dalvik instructions;
  • Step S200: performing hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and N-gram model;
  • Step S300: performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model.
  • An API in this embodiment is a calling interface that the operating system exposes to application programs; an application makes the operating system execute its commands by calling the operating system's APIs.
  • The Android operating system adopts a layered architecture with four main layers: the Linux kernel layer, the Android runtime layer, the Android application framework layer, and the application layer, each with a clearly defined role. Android applications are written in Java, and the Dalvik virtual machine interprets DEX executables to run Dalvik bytecode. Dalvik bytecode is converted from Java bytecode and is difficult to read.
  • The APK (application installation package) data set (i.e., the APK set in FIG. 2) is first acquired and then preprocessed.
  • An Android application is essentially an Android application package with the ".apk" suffix, consisting of compiled DEX executables, resources, assets, certificates, and an XML manifest. Therefore, in this embodiment, tools such as Apktool or dex2jar can be used to decompile the Android applications. After successful decompilation, a series of directories and files is generated.
  • The subdirectories and files in these directories follow the same organization as the source directory at development time, and the Dalvik bytecode is thereby turned into highly readable smali files.
  • The smali files in this embodiment are long and contain many instructions, but the smali format is relatively fixed and follows defined grammar rules.
  • The control flow graph (CFG) and the data flow graph (DFG) are typical representations of a program's semantic properties.
  • A control flow graph is an abstract representation of a program: it represents all paths that may be traversed during execution and thus reflects the program's execution process. A control flow graph is essentially a directed graph whose nodes represent program statements and whose edges represent execution paths.
  • A data flow graph reflects how data flows, is processed, and is stored while a program runs. Data flow analysis is built on control flow.
  • In this embodiment, the execution path of each application in the preprocessed APK data set is extracted and Dalvik instructions are constructed; the Dalvik instructions are then classified to obtain their classification information, and the cryptographic API call sequence data set is constructed.
  • Because a control flow graph represents all paths traversed during program execution, this embodiment builds the control flow graph at Dalvik-instruction granularity, with each Dalvik instruction as one node. Specifically, the execution path of each application in the preprocessed APK data set is extracted and Dalvik instructions are constructed.
  • This embodiment also uses the open-source tool FlowDroid to generate the call graph of each Android application. Since an Android application has multiple possible entry points, FlowDroid also builds a main method that models them.
  • Data set construction methods in the prior art are not suitable for analyzing Android cryptographic API call sequences, because cryptographic API usage analysis must consider not only the order of cryptographic API calls but also how their parameters are used, for example the choice of encryption algorithm. Therefore, when constructing the cryptographic API call sequence data set, this embodiment classifies Dalvik instructions according to their read/write type, number of operands, and number of constant operands to obtain classification information, and then constructs the data set from that information. The instructions are divided into 14 categories in total, among which the "inst_op", "inst_array", and "inst_invoke" instruction structures are relatively special: these three categories require additional operations to be defined when reading register values and when updating register values and instructions.
  • Symbolic analysis is used when constructing the cryptographic API call sequence data set from the classification information. A data set built this way supports better usage analysis, and because different categories of cryptographic APIs follow different execution paths, cryptographic API calls can be captured more comprehensively.
  • The cryptographic API call sequence data set in this embodiment includes a training set, a validation set, and a test set.
  • The training, validation, and test sets may contain 11,856, 3,957, and 3,953 Android applications, respectively.
  • The training set and the validation set are used for model training.
  • The training set and the validation set in the cryptographic API call sequence data set are used for hidden Markov model training and N-gram model training, yielding a trained hidden Markov model and N-gram model.
  • Hidden Markov models, N-gram models, and RNN models are used to learn cryptographic API usage specifications automatically.
  • The hidden Markov model (HMM) is a statistical model widely used in speech recognition, natural language processing, bioinformatics, and other fields.
  • The N-gram model is a statistical language model widely used in natural language processing.
  • Once trained, the hidden Markov model and the N-gram model can be used to detect whether a cryptographic API call sequence is misused.
  • The trained hidden Markov model and N-gram model are used to compute the scores of all cryptographic API call sequences in the test set of the cryptographic API call sequence data set; a sequence whose score is less than the preset threshold is judged to be misused, and a sequence whose score is greater than the preset threshold is judged not to be misused.
  • The preset threshold is set as follows: the scores of all cryptographic API call sequences in the training set are computed; the scores are then sorted in descending order, and the score at the position closest to 80% of the ranking is taken as the preset threshold.
  • For comparison, this embodiment also trains a hidden Markov model and an N-gram model on API sequences built without symbolic analysis, and uses them, together with the hidden Markov model and N-gram model trained on symbolic cryptographic API call sequences in this embodiment, to analyze cryptographic API usage in the test set, as shown in Table 2.
  • SYM_HMM is the best hidden Markov model trained with symbolic analysis (8 hidden states);
  • SYM-NGRAM is the N-gram model trained with symbolic analysis (N = 5).
  • NO_HMM is the hidden Markov model trained with non-symbolic analysis (9 hidden states);
  • NO_NGRAM is the N-gram model trained with non-symbolic analysis (N = 4).
  • BASE is the baseline, with a detection accuracy of 50% on both positive and negative samples. As shown in Table 2, to show the effect of cryptographic API misuse detection more directly, the positive and negative samples are swapped and each evaluation metric is recalculated; these results are marked "T" in the table.
  • The experimental results show that the hidden Markov model and the N-gram model trained with symbolic analysis in this embodiment differ only slightly in classification performance.
  • Their accuracies are 70.38% and 71.23% and their precisions are 59.93% and 61.60%, respectively, i.e., the two models differ by about 1% on these two metrics.
  • Their recalls are 76.28% and 72.83%, respectively.
  • Judging by the F1 score, the hidden Markov model trained with symbolic analysis performs slightly better than the N-gram model.
  • The classification results of the conventional hidden Markov model and N-gram model trained with non-symbolic analysis are also shown in Table 2.
  • Their accuracies are 57.23% and 57.67%, precisions 47.68% and 47.25%, recalls 81.10% and 71.19%, and F1 scores 60.05% and 57.14%, respectively.
  • Both models exceed the baseline on all four metrics, indicating that they have some ability to classify cryptographic API call sequences.
  • Overall, symbolic analysis improves the models' ability to analyze cryptographic API call sequences: it makes the sequences carry more parameter information, so the trained models learn more complete cryptographic API usage specifications instead of being limited to the call-order specifications learned under non-symbolic analysis.
  • This embodiment also uses the trained hidden Markov model and N-gram model to recommend cryptographic API usage. Specifically, the non-misused cryptographic API call sequences in the test set are obtained and a cryptographic API candidate set is constructed; if the original cryptographic API call appears at a preset position in the candidate set (for example, within the TOP-N positions), the cryptographic API at that position is counted as successfully recommended.
  • Considering parameters makes the cryptographic call sequences in the data set more scattered, which reduces the accuracy of cryptographic API recommendation.
  • This embodiment can recommend not only cryptographic APIs but also the parameters they use.
  • Recommendation of cryptographic APIs without parameters is performed by merging usages that share the same cryptographic API name but differ in parameters.
  • The recommendation results are shown as SYM-HMM0 and SYM-NGRAM0 in FIG. 3; their accuracy is very close to that of cryptographic API usage recommendation based on the non-symbolic data set.
  • In summary, this embodiment provides a method and system for analyzing cryptographic API usage. The method includes: acquiring an APK data set and obtaining from it a cryptographic API call sequence data set, which is constructed based on classification information of Dalvik instructions; performing hidden Markov model training and N-gram model training using the training set and the validation set of that data set, to obtain a trained hidden Markov model and N-gram model; and performing misuse detection and usage recommendation on cryptographic APIs based on the trained hidden Markov model and N-gram model.
  • An embodiment of the present invention provides a cryptographic API usage analysis system.
  • The system includes a data acquisition module 10, a model training module 20, and a usage analysis module 30.
  • The data acquisition module 10 is configured to acquire an APK data set and obtain from it a cryptographic API call sequence data set, which is constructed based on classification information of Dalvik instructions.
  • The model training module 20 is configured to perform hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and N-gram model.
  • The usage analysis module 30 is configured to perform misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model.
  • The present invention also provides an intelligent terminal, whose schematic block diagram may be as shown in FIG. 5.
  • The intelligent terminal includes a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus.
  • The processor of the intelligent terminal provides computing and control capabilities.
  • The memory of the intelligent terminal includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system and a computer program.
  • The internal memory provides an environment for running the operating system and computer program stored in the non-volatile storage medium.
  • The network interface of the intelligent terminal is used to communicate with external terminals over a network connection.
  • When executed by the processor, the computer program implements a cryptographic API usage analysis method.
  • The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor is preset inside the intelligent terminal to detect the operating temperature of the internal components.
  • FIG. 5 is only a block diagram of part of the structure related to the solution of the present invention and does not limit the intelligent terminal to which the solution is applied; a specific intelligent terminal may include more or fewer components than shown, combine some components, or arrange the components differently.
  • The intelligent terminal includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors.
  • The one or more programs contain instructions for performing the cryptographic API usage analysis method described above.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • The present invention discloses a method and system for analyzing cryptographic API usage. The method includes: acquiring an APK data set and obtaining from it a cryptographic API call sequence data set, which is constructed based on classification information of Dalvik instructions; performing hidden Markov model training and N-gram model training using the training set and the validation set of that data set, to obtain a trained hidden Markov model and N-gram model; and performing misuse detection and usage recommendation on cryptographic APIs based on the trained hidden Markov model and N-gram model.
  • Symbolic analysis is introduced when constructing the cryptographic API call sequence data set, which noticeably improves the efficiency and effectiveness of cryptographic API usage analysis.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Telephonic Communication Services (AREA)
  • Storage Device Security (AREA)

Abstract

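A cryptographic API usage analysis method and system. The method includes: acquiring an APK data set, and obtaining a cryptographic API call sequence data set from the APK data set, the cryptographic API call sequence data set being constructed based on classification information of Dalvik instructions (S100); performing hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model (S200); and performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model (S300). Symbolic analysis is introduced when constructing the cryptographic API call sequence data set, and cryptographic API usage analysis becomes noticeably more efficient and more effective.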

Description

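Cryptographic API Usage Analysis Method and System
Technical Field
The present invention relates to the technical field of cryptographic APIs, and in particular to a cryptographic API usage analysis method and system.
Background
In the prior art, predictive analysis of API call sequences either manually analyzes a certain number of API calls and infers API calling conventions from frequently occurring calls, which depends heavily on the quality of the data set, or relies on N-gram models, which in the prior art carry many constraints and impose strict requirements on the APIs, so their predictions of API call sequences are poor. Other prior art constrains API call sequences with the CrySL specification language, which is largely domain-limited and hard to maintain. As a result, the main cryptographic API calling specifications currently available are essentially defined by hand; they are hard to maintain, misuse rates are high, and there is no data set of correct cryptographic API usage.
Therefore, the prior art still needs to be improved.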
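Summary of the Invention
The technical problem to be solved by the present invention is, in view of the above defects of the prior art, to provide a cryptographic API usage analysis method and system, aiming to solve the problems that cryptographic API calling specifications in the prior art are essentially defined by hand, are hard to maintain, and have high misuse rates, and that there is no data set of correct cryptographic API usage.
To solve the above technical problem, the present invention adopts the following technical solutions:
In a first aspect, the present invention provides a cryptographic API usage analysis method, wherein the method includes:
acquiring an APK data set, and obtaining a cryptographic API call sequence data set from the APK data set, the cryptographic API call sequence data set being constructed based on classification information of Dalvik instructions;
performing hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model.
In one implementation, acquiring the APK data set and obtaining the cryptographic API call sequence data set from it includes:
acquiring the APK data set and preprocessing it;
extracting the execution path of each application in the preprocessed APK data set, and constructing Dalvik instructions;
classifying the Dalvik instructions to obtain their classification information, and constructing the cryptographic API call sequence data set.
In one implementation, acquiring the APK data set and preprocessing it includes:
decompiling all applications in the APK data set with the Apktool tool;
when the decompilation succeeds, obtaining a series of directories and files, whose subdirectories and files follow the same organization as the source directory at development time.
In one implementation, classifying the Dalvik instructions to obtain their classification information includes:
classifying the Dalvik instructions according to their read/write type, number of operands, and number of constant operands.
In one implementation, performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model includes:
computing, with the trained hidden Markov model and N-gram model, the scores of all cryptographic API call sequences in the test set of the cryptographic API call sequence data set;
when the score of a cryptographic API call sequence is less than a preset threshold, determining that the sequence is misused;
when the score of a cryptographic API call sequence is greater than the preset threshold, determining that the sequence is not misused.
In one implementation, performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model includes:
obtaining the non-misused cryptographic API call sequences in the test set, and constructing a cryptographic API candidate set;
if the original cryptographic API call sequence is located at a preset position in the cryptographic API candidate set, determining that the cryptographic API at the preset position is successfully recommended.
In one implementation, the preset threshold is set by:
computing the scores of all cryptographic API call sequences in the training set;
sorting the scores in descending order, and taking the score at the position closest to 80% of the ranking as the preset threshold.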
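The threshold rule and the misuse decision can be sketched in a few lines of Python. Reading "the position closest to 80%" as the 1-based rank `round(0.8 * n)` in the descending ranking (so that roughly 80% of training sequences score at or above the threshold), and treating a score exactly equal to the threshold as not misused, are assumptions: the text defines only the strict "less than" and "greater than" cases.

```python
def threshold_at(train_scores: list[float], ratio: float = 0.8) -> float:
    """Score at the position closest to `ratio` of the descending ranking."""
    if not train_scores:
        raise ValueError("need at least one training score")
    ranked = sorted(train_scores, reverse=True)
    # 1-based rank nearest to ratio * n, clamped to [1, n] (assumption).
    k = min(len(ranked), max(1, round(ratio * len(ranked))))
    return ranked[k - 1]


def is_misused(score: float, threshold: float) -> bool:
    """Below the threshold means misuse; a tie counts as not misused (assumption)."""
    return score < threshold


# Hypothetical per-sequence scores from a trained model (higher = more typical).
TRAIN_SCORES = [-0.2, -0.5, -0.1, -0.9, -0.3, -0.4, -1.2, -0.6, -0.8, -0.7]
THRESHOLD = threshold_at(TRAIN_SCORES)
print(THRESHOLD, is_misused(-1.0, THRESHOLD), is_misused(-0.3, THRESHOLD))
```

Because the threshold comes from the training distribution itself, about one in five correct training sequences would fall below it; the 80% figure therefore trades false alarms against missed misuse.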
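In a second aspect, the present invention provides a cryptographic API usage analysis system, wherein the system includes:
a data acquisition module, configured to acquire an APK data set and obtain a cryptographic API call sequence data set from it, the cryptographic API call sequence data set being constructed based on classification information of Dalvik instructions;
a model training module, configured to perform hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
a usage analysis module, configured to perform misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model.
In a third aspect, the present invention provides an intelligent terminal, including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the method according to any one of the above solutions.
In a fourth aspect, the present invention provides a non-transitory computer-readable storage medium, wherein, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of the above solutions.
Beneficial effects: Compared with the prior art, the present invention provides a cryptographic API usage analysis method and system. The method includes: acquiring an APK data set, and obtaining a cryptographic API call sequence data set from the APK data set, the cryptographic API call sequence data set being constructed based on classification information of Dalvik instructions; performing hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model; and performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model. Symbolic analysis is introduced when constructing the cryptographic API call sequence data set, so that a data set of correct cryptographic API usage is constructed, and the hidden Markov model and N-gram model trained according to the present invention make cryptographic API usage analysis noticeably more efficient and more effective.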
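Brief Description of the Drawings
FIG. 1 is a flowchart of a specific embodiment of the cryptographic API usage analysis method provided by an embodiment of the present invention.
FIG. 2 is a flowchart of a specific implementation of the cryptographic API usage analysis method provided by an embodiment of the present invention.
FIG. 3 is a chart analyzing the cryptographic API recommendation results of the cryptographic API usage analysis method provided by an embodiment of the present invention.
FIG. 4 is a schematic block diagram of the cryptographic API usage analysis system provided by an embodiment of the present invention.
FIG. 5 is a schematic block diagram of the internal structure of the intelligent terminal provided by an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and effects of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely intended to explain the present invention, not to limit it.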
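To solve the problems of the prior art, this embodiment provides a cryptographic API usage analysis method. Specifically, as shown in FIG. 1 and FIG. 2, the method includes:
Step S100: acquiring an APK data set, and obtaining a cryptographic API call sequence data set from the APK data set, the cryptographic API call sequence data set being constructed based on classification information of Dalvik instructions;
Step S200: performing hidden Markov model training and N-gram model training using the training set and the validation set in the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
Step S300: performing misuse detection and usage recommendation on cryptographic APIs according to the trained hidden Markov model and N-gram model.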
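Symbolic analysis is introduced when constructing the cryptographic API call sequence data set of the present invention, so that a data set of correct cryptographic API usage is constructed, and the hidden Markov model and N-gram model trained according to the present invention make cryptographic API usage analysis noticeably more efficient and more effective.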
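Specifically, an API in this embodiment is a calling interface that the operating system exposes to application programs; an application makes the operating system execute its commands by calling the operating system's APIs. The Android operating system adopts a layered architecture with four main layers: the Linux kernel layer, the Android runtime layer, the Android application framework layer, and the application layer, each with a clearly defined role. Android applications are written in Java, and the Dalvik virtual machine interprets DEX executables to run Dalvik bytecode. Dalvik bytecode is converted from Java bytecode and is difficult to read.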
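Therefore, when constructing the cryptographic API call sequence data set, this embodiment first acquires the APK (application installation package) data set (i.e., the APK set in FIG. 2) and preprocesses it. An Android application is essentially an Android application package ending with the ".apk" suffix, consisting of compiled DEX executables, resources, assets, certificates, an XML manifest, and other files. This embodiment can therefore use tools such as Apktool or dex2jar to decompile the Android applications. After successful decompilation, a series of directories and files is generated whose subdirectories and files follow the same organization as the source directory at development time, and the Dalvik bytecode is thereby turned into highly readable smali files. The smali files in this embodiment are long and contain many instructions, but the smali format is relatively fixed and follows defined grammar rules.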
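Because the smali format is fixed, cryptographic calls can be located with simple pattern matching once the APK has been decoded (for example with `apktool d app.apk -o out_dir`). The Python sketch below scans smali text for `invoke-*` instructions whose target class lies in a cryptographic package. Treating every class under `javax/crypto/` and `java/security/` as a cryptographic API, and the helper name `crypto_calls`, are assumptions for illustration; the patent does not say how cryptographic APIs are identified.

```python
import re

# Matches smali invoke instructions such as
#   invoke-static {v0}, Ljavax/crypto/Cipher;->getInstance(Ljava/lang/String;)Ljavax/crypto/Cipher;
# group 1: opcode, group 2: class path without 'L' and ';', group 3: method name
INVOKE_RE = re.compile(
    r"^\s*(invoke-[a-z/-]+)\s+\{[^}]*\},\s*L([\w/$]+);->([\w<>$]+)\("
)

# Assumption: classes under these packages count as cryptographic APIs.
CRYPTO_PREFIXES = ("javax/crypto/", "java/security/")


def crypto_calls(smali_text: str) -> list[str]:
    """Return 'Class.method' for each cryptographic invoke, in file order."""
    calls = []
    for line in smali_text.splitlines():
        m = INVOKE_RE.match(line)
        if m and m.group(2).startswith(CRYPTO_PREFIXES):
            simple_class = m.group(2).rsplit("/", 1)[-1]
            calls.append(f"{simple_class}.{m.group(3)}")
    return calls


SAMPLE_SMALI = """
    const-string v0, "AES/ECB/PKCS5Padding"
    invoke-static {v0}, Ljavax/crypto/Cipher;->getInstance(Ljava/lang/String;)Ljavax/crypto/Cipher;
    move-result-object v1
    invoke-virtual {v1, v2, v3}, Ljavax/crypto/Cipher;->init(ILjava/security/Key;)V
    invoke-virtual {v4}, Ljava/lang/String;->getBytes()[B
"""

print(crypto_calls(SAMPLE_SMALI))
```

File order is not execution order; the order that matters for the models comes from the execution paths extracted in the next step.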
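Syntax is the form of a language: the set of rules for combining symbols into well-formed sentences (or programs). Syntax defines the formal relationships among the elements of a language and thereby describes the structure of all legal statements; it describes only form and structure, not meaning. Semantics concerns the meaning of legal statements; for a programming language, semantics describes the behavior that occurs when a computer executes a program. The control flow graph (CFG) and the data flow graph (DFG) are typical representations of a program's semantic properties. A control flow graph is an abstract representation of a program that represents all paths that may be traversed during its execution, and thus reflects the program's execution process. It is essentially a directed graph whose nodes represent program statements and whose edges represent execution paths. A data flow graph reflects how data flows, is processed, and is stored while a program runs. Data flow analysis is built on control flow.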
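As shown in FIG. 2, after preprocessing the APK data set, this embodiment extracts the execution path of each application in the preprocessed APK data set and constructs Dalvik instructions; it then classifies the Dalvik instructions to obtain their classification information and constructs the cryptographic API call sequence data set. Specifically, because a control flow graph represents all paths traversed during program execution, this embodiment builds the control flow graph at Dalvik-instruction granularity, with each Dalvik instruction as one node, extracting the execution path of each application in the preprocessed APK data set and constructing Dalvik instructions.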
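A minimal Python sketch of a CFG at Dalvik-instruction granularity, with one node per instruction, and of enumerating execution paths through it. The simplified instruction list, the handling of only `goto`, `if-*`, and `return*` (no `switch`, `throw`, or exception edges), and cutting a loop the first time a node would be revisited are assumptions made to keep the example short.

```python
from collections import defaultdict


def build_cfg(insns: list[str]) -> dict[int, list[int]]:
    """One node per instruction index; edges follow fall-through and jumps.

    Simplified smali-like input (assumption): ':label' lines are no-op nodes,
    'goto :L' jumps, 'if-* ..., :L' branches, 'return*' ends a path.
    """
    labels = {ins: i for i, ins in enumerate(insns) if ins.startswith(":")}
    succ: dict[int, list[int]] = defaultdict(list)
    for i, ins in enumerate(insns):
        op = ins.split()[0]
        if op.startswith("return"):
            continue                                  # exit node: no successors
        if op.startswith("goto"):
            succ[i].append(labels[ins.split()[-1]])   # unconditional jump only
            continue
        if op.startswith("if-"):
            succ[i].append(labels[ins.split()[-1]])   # branch-taken edge
        if i + 1 < len(insns):
            succ[i].append(i + 1)                     # fall-through edge
    return succ


def execution_paths(insns: list[str], succ: dict[int, list[int]],
                    start: int = 0) -> list[list[str]]:
    """Enumerate paths from `start` by DFS; a loop is cut at its first revisit."""
    paths: list[list[str]] = []

    def dfs(node: int, path: list[int]) -> None:
        path.append(node)
        nexts = [n for n in succ.get(node, []) if n not in path]
        if not nexts:
            paths.append([insns[i] for i in path])
        for n in nexts:
            dfs(n, path)
        path.pop()

    dfs(start, [])
    return paths


# Hypothetical method body: the algorithm string depends on a branch.
EXAMPLE_METHOD = [
    "const/4 v0, 0x0",
    "if-eqz v1, :cond_0",
    'const-string v2, "AES"',
    "goto :goto_0",
    ":cond_0",
    'const-string v2, "DES"',
    ":goto_0",
    "return-void",
]

for p in execution_paths(EXAMPLE_METHOD, build_cfg(EXAMPLE_METHOD)):
    print(p)
```

The two printed paths differ only in which algorithm string reaches the join point, which is exactly the kind of parameter difference that a per-path analysis can preserve.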
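During development, Android applications may accumulate obsolete code and a large amount of code that is never executed at run time; such code was not written with security, completeness, or correctness in mind. Reachable methods of Android applications are extracted to exclude the influence of this code, to ensure that the extracted Android cryptographic API call sequences are correct and complete, and to reduce the time needed to extract them. When building the call graph, this embodiment uses the open-source tool FlowDroid to generate the call graph of the corresponding Android application. Because an Android application has multiple possible entry points, FlowDroid also constructs a main method that models them.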
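The reachability step amounts to a graph search from a single synthetic entry method, which plays the role of FlowDroid's generated main method. The Python sketch below performs a breadth-first search over a call graph stored as a plain dictionary; the method names are hypothetical, and the dictionary is not FlowDroid's API (FlowDroid is a Java library).

```python
from collections import deque


def reachable_methods(call_graph: dict[str, list[str]], entry: str) -> set[str]:
    """Breadth-first search over caller -> callees edges from one entry method."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical call graph; "dummyMain" stands for the synthetic entry method
# that models the app's possible entry points (lifecycle callbacks).
EXAMPLE_CALL_GRAPH = {
    "dummyMain": ["MainActivity.onCreate", "SyncService.onStartCommand"],
    "MainActivity.onCreate": ["CryptoUtil.encrypt"],
    "SyncService.onStartCommand": [],
    "CryptoUtil.encrypt": ["Cipher.getInstance", "Cipher.doFinal"],
    "LegacyUtil.oldEncrypt": ["Cipher.getInstance"],  # never called: dead code
}

print(sorted(reachable_methods(EXAMPLE_CALL_GRAPH, "dummyMain")))
```

`LegacyUtil.oldEncrypt` is excluded even though it calls a cryptographic API, which is the point of restricting extraction to reachable methods.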
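Current data-driven API usage analysis captures only the order of API calls and often ignores API parameters. Existing data set construction methods are therefore unsuitable for analyzing Android cryptographic API call sequences, because cryptographic API usage analysis must consider not only the order of cryptographic API calls but also how their parameters are used, for example the choice of encryption algorithm. For this reason, when constructing the cryptographic API call sequence data set, this embodiment classifies Dalvik instructions according to their read/write type, number of operands, and number of constant operands to obtain classification information, and then constructs the data set from that information. In this embodiment, the instructions are divided into 14 categories by read/write type, number of operands, and number of constant operands, as shown in Table 1; among them, the "inst_op", "inst_array", and "inst_invoke" instruction structures are relatively special. These three categories require additional operations to be defined when reading register values and when updating register values and instructions.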
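Table 1 is available only as an image, so its 14 categories cannot be reproduced here. The Python sketch below only computes the three features named in the text (read/write type, number of operands, number of constant operands) for a single smali instruction. The opcode-prefix tables, the rule that every non-register operand counts as a constant, and reducing a `{vA .. vB}` register range to its two endpoints are all assumptions.

```python
import re

# Assumed opcode-prefix tables (illustrative, not the patent's Table 1).
WRITE_PREFIXES = ("const", "move", "new-", "aget", "iget", "sget",
                  "add-", "sub-", "mul-", "div-", "rem-", "and-", "or-", "xor-")
READ_PREFIXES = ("aput", "iput", "sput", "if-", "return", "invoke", "throw")

# Operand tokens: a {...} register list, a quoted string, or any comma-free run.
TOKEN_RE = re.compile(r'\{[^}]*\}|"(?:[^"\\]|\\.)*"|[^,\s]+')
REGISTER_RE = re.compile(r"[vp]\d+")


def classify(insn: str) -> tuple[str, int, int]:
    """Return (read/write type, number of operands, number of constant operands)."""
    opcode, _, rest = insn.strip().partition(" ")
    n_registers = n_constants = 0
    for tok in TOKEN_RE.findall(rest):
        if tok.startswith("{"):
            # Invoke argument list; a '{vA .. vB}' range counts only its endpoints.
            n_registers += len(REGISTER_RE.findall(tok))
        elif REGISTER_RE.fullmatch(tok):
            n_registers += 1
        else:
            n_constants += 1   # literal, string, label, or type/field/method reference
    if opcode.startswith(WRITE_PREFIXES):
        rw = "write"
    elif opcode.startswith(READ_PREFIXES):
        rw = "read"
    else:
        rw = "other"
    return rw, n_registers + n_constants, n_constants


for insn in ('const-string v0, "AES/ECB/PKCS5Padding"',
             "invoke-virtual {v1, v2, v3}, Ljavax/crypto/Cipher;->init(ILjava/security/Key;)V",
             "return-void"):
    print(classify(insn), insn)
```

Each distinct feature triple would then map to one category label; that mapping is exactly what Table 1 defines and is not guessed here.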
Table 1 Dalvik instruction classification
Figure PCTCN2020136140-appb-000001
Figure PCTCN2020136140-appb-000002
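In this embodiment, the Dalvik instructions are classified and the resulting categories are distinguished by different character symbols; symbolic analysis is therefore used when constructing the cryptographic API call sequence data set from the classification information. A data set built with symbolic analysis supports better usage analysis, and because different categories of cryptographic APIs follow different execution paths at run time, cryptographic API calls can be captured more comprehensively.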
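What symbolic analysis adds to a call sequence can be illustrated with a deliberately simple register tracker along one execution path: string constants loaded by `const-string` are propagated into the cryptographic call that consumes them, so `Cipher.getInstance` becomes a token that carries its algorithm string. With `keep_params=False` the same code produces the parameter-free tokens of the non-symbolic baseline. This Python sketch assumes a single straight-line path and a short, hypothetical list of register-overwriting opcodes; the patent's symbolic analysis, which works on the instruction categories of Table 1, is not described in enough detail to reproduce.

```python
import re

CONST_STRING_RE = re.compile(
    r'^\s*const-string(?:/jumbo)?\s+([vp]\d+),\s*"((?:[^"\\]|\\.)*)"')
# Other instructions that overwrite their first register (simplified list, assumption).
OVERWRITE_RE = re.compile(
    r"^\s*(?:move-result\S*|const(?:/\S+)?|new-instance)\s+([vp]\d+)")
INVOKE_RE = re.compile(
    r"^\s*invoke-[a-z/-]+\s+\{([^}]*)\},\s*L([\w/$]+);->([\w<>$]+)\(")
CRYPTO_PREFIXES = ("javax/crypto/", "java/security/")  # assumed crypto packages


def symbolic_sequence(insns: list[str], keep_params: bool = True) -> list[str]:
    """Turn one straight-line instruction path into cryptographic API tokens.

    With keep_params=True, string constants reaching a call are appended,
    e.g. 'Cipher.getInstance(AES/ECB/PKCS5Padding)'; otherwise only the name.
    """
    consts: dict[str, str] = {}   # register -> string constant it currently holds
    seq: list[str] = []
    for ins in insns:
        if m := CONST_STRING_RE.match(ins):
            consts[m.group(1)] = m.group(2)
        elif m := OVERWRITE_RE.match(ins):
            consts.pop(m.group(1), None)      # register no longer holds the string
        elif (m := INVOKE_RE.match(ins)) and m.group(2).startswith(CRYPTO_PREFIXES):
            name = m.group(2).rsplit("/", 1)[-1] + "." + m.group(3)
            args = [consts[r] for r in re.findall(r"[vp]\d+", m.group(1)) if r in consts]
            seq.append(f"{name}({','.join(args)})" if keep_params and args else name)
    return seq


EXAMPLE_PATH = [
    'const-string v0, "AES/ECB/PKCS5Padding"',
    "invoke-static {v0}, Ljavax/crypto/Cipher;->getInstance(Ljava/lang/String;)Ljavax/crypto/Cipher;",
    "move-result-object v1",
    "invoke-virtual {v1, v2, v3}, Ljavax/crypto/Cipher;->init(ILjava/security/Key;)V",
]
print(symbolic_sequence(EXAMPLE_PATH))
print(symbolic_sequence(EXAMPLE_PATH, keep_params=False))
```

The symbolic token exposes the ECB mode, which an order-only sequence would hide; this is the kind of parameter information that distinguishes the SYM_* models from the NO_* models in the comparison that follows.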
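In one implementation, the cryptographic API call sequence data set in this embodiment includes a training set, a validation set, and a test set, which may contain 11,856, 3,957, and 3,953 Android applications, respectively. The training and validation sets are used for model training: this embodiment performs hidden Markov model training and N-gram model training with the training set and the validation set of the cryptographic API call sequence data set to obtain a trained hidden Markov model and N-gram model. Hidden Markov models, N-gram models, and RNN models are used to learn cryptographic API usage specifications automatically. The hidden Markov model (HMM) is a statistical model widely used in speech recognition, natural language processing, bioinformatics, and other fields. The N-gram model is a statistical language model widely used in natural language processing. In this embodiment, once the hidden Markov model and the N-gram model are trained, they can be used to detect whether a cryptographic API call sequence is misused. Specifically, the trained hidden Markov model and N-gram model compute the scores of all cryptographic API call sequences in the test set of the cryptographic API call sequence data set; when the score of a sequence is less than the preset threshold, the sequence is judged to be misused, and when its score is greater than the preset threshold, it is judged not to be misused. In this embodiment, the preset threshold is set as follows: compute the scores of all cryptographic API call sequences in the training set, sort the scores in descending order, and take the score at the position closest to 80% of the ranking as the preset threshold.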
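The N-gram scoring step can be sketched as follows. The model counts N-grams over API tokens and scores a sequence by its average log-probability per predicted token, so that low scores indicate unusual, possibly misused sequences, matching the "below the threshold means misuse" rule. The add-k smoothing, the `<s>`/`</s>` padding, and the length normalization are assumptions; the text does not say how scores are computed. The default `n=5` mirrors the N reported for SYM-NGRAM.

```python
import math
from collections import Counter


class NGramModel:
    """Add-k smoothed N-gram model over API tokens (smoothing is an assumption)."""

    def __init__(self, n: int = 5, k: float = 0.1):
        self.n, self.k = n, k
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def _pad(self, seq):
        return ["<s>"] * (self.n - 1) + list(seq) + ["</s>"]

    def fit(self, sequences):
        for seq in sequences:
            toks = self._pad(seq)
            self.vocab.update(toks)
            for i in range(self.n - 1, len(toks)):
                context = tuple(toks[i - self.n + 1:i])
                self.ngram_counts[context + (toks[i],)] += 1
                self.context_counts[context] += 1
        return self

    def score(self, seq) -> float:
        """Average log-probability per predicted token; higher = more typical."""
        toks = self._pad(seq)
        v = len(self.vocab) + 1          # +1 leaves probability mass for unseen tokens
        total = 0.0
        for i in range(self.n - 1, len(toks)):
            context = tuple(toks[i - self.n + 1:i])
            num = self.ngram_counts[context + (toks[i],)] + self.k
            den = self.context_counts[context] + self.k * v
            total += math.log(num / den)
        return total / (len(toks) - self.n + 1)


# Toy training data (hypothetical); n=2 keeps the example small.
train = [["Cipher.getInstance", "Cipher.init", "Cipher.doFinal"]] * 5
model = NGramModel(n=2).fit(train)
print(model.score(["Cipher.getInstance", "Cipher.init", "Cipher.doFinal"]))
print(model.score(["Cipher.doFinal", "Cipher.getInstance"]))
```

Normalizing by length keeps long but correct sequences from being penalized merely for having more factors in their probability.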
作为对比,本实施例还选用采用无符号方式的API训练成的隐马尔可夫模型和N-gram模型,与本实施例中采用符号方式的加密API调用序列训练得到的隐马尔可夫模型和N-gram模型,来对测试集中的加密API的使用情况进行分析,如表2中所示。
Table 2 Cryptographic API usage analysis on the test set
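In the table, "SYM_HMM" is the best hidden Markov model trained on the symbolic-analysis data set (8 hidden states); "SYM-NGRAM" is the N-gram model trained on the symbolic-analysis data set (N = 5); "NO_HMM" is the HMM trained without symbolic analysis (9 hidden states); and "NO_NGRAM" is the N-gram model trained without symbolic analysis (N = 4). "BASE" is the baseline, which assumes a detection accuracy of 50% on both the positive and the negative samples. To show the effect of cryptographic API misuse detection more directly, Table 2 also swaps the positive and negative samples and recomputes every metric; these rows are marked "T".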
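The text does not say how the HMM turns a sequence into a score. One common choice, assumed here, is the log-likelihood of the observed API tokens computed with the scaled forward algorithm, optionally divided by the sequence length. The sketch takes already-trained parameters (initial distribution `start`, transition matrix `trans`, and emission matrix `emit` indexed by integer token IDs); training itself, for example with Baum-Welch, is not shown, and the toy two-state model is hypothetical, whereas SYM_HMM uses 8 hidden states.

```python
import math
from typing import List, Sequence


def hmm_log_likelihood(
    obs: Sequence[int],
    start: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> float:
    """log P(obs | HMM) via the forward algorithm, rescaling alpha each step to avoid underflow."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                for i in range(n)
            ]
        scale = sum(alpha)
        log_p += math.log(scale)  # accumulate log of each scaling factor
        alpha = [a / scale for a in alpha]
    return log_p


def hmm_score(obs: Sequence[int], start, trans, emit) -> float:
    """Length-normalised score: average log-likelihood per API token."""
    return hmm_log_likelihood(obs, start, trans, emit) / len(obs)


# Hypothetical 2-state model over 2 token IDs (0 and 1).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
```

As a hand check for observations `[0, 1]`: the first forward step gives alpha = [0.54, 0.08], the second gives [0.041, 0.168], so P = 0.209, which is what `exp(hmm_log_likelihood([0, 1], start, trans, emit))` returns.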
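The experimental results show that the HMM and the N-gram model trained on the symbolic-analysis data set in this embodiment classify almost equally well. Their accuracies are 70.38% and 71.23%, and their precisions are 59.93% and 61.60%, respectively, so the two models differ by only about one to two percentage points on these two metrics. Their recalls are 76.28% and 72.83%, respectively. Judging by the F1 score, the symbolically trained HMM in this embodiment performs slightly better than the N-gram model.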
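The F1 values behind this comparison can be recomputed from the precision and recall reported above as their harmonic mean, F1 = 2PR / (P + R). The short check below does this for the two symbolic models and uses only numbers quoted in the text.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the symbolic-analysis models.
sym_hmm_f1 = f1(0.5993, 0.7628)
sym_ngram_f1 = f1(0.6160, 0.7283)
print(f"SYM_HMM F1 = {sym_hmm_f1:.1%}, SYM-NGRAM F1 = {sym_ngram_f1:.1%}")
```

This gives about 67.1% for SYM_HMM and about 66.7% for SYM-NGRAM: the HMM's higher recall outweighs its lower precision, which is why it edges ahead on F1.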
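By contrast, Table 2 shows the classification results of the conventional HMM and N-gram model trained without symbolic analysis: their accuracies are 57.23% and 57.67%, their precisions 47.68% and 47.25%, their recalls 81.10% and 71.19%, and their F1 scores 60.05% and 57.14%, respectively. Both models exceed the baseline on all four metrics, which shows that they have some ability to classify cryptographic API call sequences.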
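Overall, symbolic analysis improves the models' ability to analyze cryptographic API call sequences. The reason is that symbolic analysis makes the call sequences carry more parameter information, so the trained models learn more complete cryptographic API usage conventions instead of being limited to the call-order conventions learned from the non-symbolic data set.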
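In addition, this embodiment uses the trained HMM and N-gram model to recommend cryptographic API usage. Specifically, it takes the cryptographic API call sequences in the test set that are not misused and builds a cryptographic API candidate set. If the original cryptographic API call is located at a preset position in the candidate set (for example, within the TOP-N positions), the cryptographic API at that position counts as successfully recommended.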
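The TOP-N success criterion can be sketched as follows. For illustration, the ranking model is swapped for plain bigram successor counts (the embodiment ranks candidates with its trained HMM and N-gram models), and each position in a held-out sequence counts as a success when the API actually used there appears among the K highest-ranked candidates given the previous token. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence

Successors = DefaultDict[str, Counter]


def train_successors(sequences: Sequence[Sequence[str]]) -> Successors:
    """Count which API follows which, using <s> as the sequence start."""
    succ: Successors = defaultdict(Counter)
    for seq in sequences:
        toks = ["<s>", *seq]
        for prev, cur in zip(toks, toks[1:]):
            succ[prev][cur] += 1
    return succ


def recommend(succ: Successors, prev: str, k: int) -> List[str]:
    """Top-k candidate APIs after `prev`, most frequent first."""
    return [api for api, _ in succ[prev].most_common(k)]


def top_k_accuracy(succ: Successors, sequences: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of positions whose actual API is among the top-k recommendations."""
    hits = total = 0
    for seq in sequences:
        toks = ["<s>", *seq]
        for prev, actual in zip(toks, toks[1:]):
            total += 1
            hits += actual in recommend(succ, prev, k)
    return hits / total if total else 0.0


gcm = ["Cipher.getInstance(AES/GCM/NoPadding)", "Cipher.init", "Cipher.doFinal"]
cbc = ["Cipher.getInstance(AES/CBC/PKCS5Padding)", "Cipher.init", "Cipher.doFinal"]
succ = train_successors([gcm] * 3 + [cbc])
acc_top1 = top_k_accuracy(succ, [cbc], k=1)
acc_top2 = top_k_accuracy(succ, [cbc], k=2)
```

With K = 1 the first call of the CBC sequence is missed because the GCM variant is more frequent, giving 2/3; with K = 2 every position is a hit. Plotting this accuracy against K gives curves of the kind discussed for FIG. 3.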
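The experimental results show that the proposed method has a certain cryptographic API recommendation capability. As shown in FIG. 3, for K of at most 10 the recommendation accuracy of SYM_HMM rises rapidly, reaching 80% for TOP-7 recommendations. SYM_NGRAM recommends cryptographic APIs better than the HMM, reaching 90% accuracy at K = 3. The reason is that SYM_HMM takes parameters into account, which makes the cryptographic API call sequences in the data set highly dispersed; the hidden states of the HMM then lose their effect and the model degenerates into a 2-gram model. Recommendation based on the non-symbolic data set achieves higher accuracy than recommendation based on the symbolic data set for the same reason: considering parameters disperses the call sequences in the data set and lowers recommendation accuracy. The method described here can recommend not only cryptographic APIs but also the parameters they use. Parameter-free cryptographic API recommendation was also performed by merging usages of cryptographic APIs that have the same name but different parameters; the results are shown as SYM-HMM0 and SYM-NGRAM0 in FIG. 3, and their accuracy is very close to that of recommendation based on the non-symbolic data set.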
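Merging usages that share an API name but differ in parameters, as done for SYM-HMM0 and SYM-NGRAM0, can be sketched as stripping the parameter part of each symbolic token. The token syntax `Name(arg,...)` is a hypothetical format. The vocabulary count illustrates why parameters make the data more dispersed: the same number of sequences is spread over more distinct tokens, so each token is seen less often.

```python
from typing import List, Sequence


def merge_parameters(seq: Sequence[str]) -> List[str]:
    """Drop the parameter part: 'Cipher.getInstance(DES)' -> 'Cipher.getInstance'."""
    return [tok.split("(", 1)[0] for tok in seq]


def vocabulary_size(sequences: Sequence[Sequence[str]]) -> int:
    """Number of distinct API tokens across all sequences."""
    return len({tok for seq in sequences for tok in seq})


with_params = [
    ["Cipher.getInstance(AES/GCM/NoPadding)", "Cipher.doFinal(?)"],
    ["Cipher.getInstance(AES/CBC/PKCS5Padding)", "Cipher.doFinal(?)"],
    ["Cipher.getInstance(DES)", "Cipher.doFinal(?)"],
]
merged = [merge_parameters(seq) for seq in with_params]
```

Three sequences use four distinct tokens with parameters but only two after merging, so counts per token, and hence the statistics a model can learn from, are higher in the merged data.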
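It can thus be seen that this embodiment provides a cryptographic API usage analysis method and system. The method comprises: obtaining an APK data set and deriving from it a cryptographic API call sequence data set, which is built on the classification information of Dalvik instructions; training a hidden Markov model and an N-gram model on the training set and validation set of the cryptographic API call sequence data set to obtain a trained HMM and a trained N-gram model; and performing cryptographic API misuse detection and usage recommendation with the trained HMM and N-gram model. Symbolic analysis is introduced when building the cryptographic API call sequence data set, which yields a data set of correct cryptographic API usage, and the HMM and N-gram model trained according to the present invention analyze cryptographic API usage noticeably more efficiently and with better results.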
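As shown in FIG. 4, an embodiment of the present invention provides a cryptographic API usage analysis system comprising a data acquisition module 10, a model training module 20, and a usage analysis module 30. Specifically, the data acquisition module 10 is configured to obtain an APK data set and derive from it a cryptographic API call sequence data set, which is built on the classification information of Dalvik instructions. The model training module 20 is configured to train a hidden Markov model and an N-gram model on the training set and validation set of the cryptographic API call sequence data set to obtain a trained HMM and a trained N-gram model. The usage analysis module 30 is configured to perform cryptographic API misuse detection and usage recommendation with the trained HMM and N-gram model.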
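Based on the above embodiments, the present invention further provides a smart terminal, whose functional block diagram may be as shown in FIG. 5. The smart terminal comprises a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus. The processor of the smart terminal provides computing and control capabilities. The memory of the smart terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and computer program stored in the non-volatile storage medium. The network interface of the smart terminal communicates with external terminals over a network. When executed by the processor, the computer program implements a cryptographic API usage analysis method. The display screen of the smart terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor is preset inside the smart terminal to detect the operating temperature of internal components.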
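Those skilled in the art will understand that the block diagram in FIG. 5 shows only part of the structure relevant to the solution of the present invention and does not limit the smart terminal to which the solution is applied; a specific smart terminal may include more or fewer components than shown, combine certain components, or arrange its components differently.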
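In one embodiment, a smart terminal is provided, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
obtaining an APK data set, and deriving from the APK data set a cryptographic API call sequence data set, the cryptographic API call sequence data set being built on classification information of Dalvik instructions;
training a hidden Markov model and an N-gram model using the training set and the validation set of the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
performing cryptographic API misuse detection and usage recommendation according to the trained hidden Markov model and N-gram model.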
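Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).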
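In summary, the present invention discloses a cryptographic API usage analysis method and system. The method comprises: obtaining an APK data set and deriving from it a cryptographic API call sequence data set, which is built on the classification information of Dalvik instructions; training a hidden Markov model and an N-gram model on the training set and validation set of the cryptographic API call sequence data set to obtain a trained HMM and a trained N-gram model; and performing cryptographic API misuse detection and usage recommendation with the trained HMM and N-gram model. Symbolic analysis is introduced when building the cryptographic API call sequence data set, and cryptographic API usage is analyzed noticeably more efficiently and with better results.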
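Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.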

Claims (10)

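  1. A cryptographic API usage analysis method, characterized in that the method comprises:
    obtaining an APK data set, and deriving from the APK data set a cryptographic API call sequence data set, the cryptographic API call sequence data set being built on classification information of Dalvik instructions;
    training a hidden Markov model and an N-gram model using the training set and the validation set of the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
    performing cryptographic API misuse detection and usage recommendation according to the trained hidden Markov model and N-gram model.
  2. The cryptographic API usage analysis method according to claim 1, characterized in that obtaining the APK data set and deriving the cryptographic API call sequence data set from the APK data set comprises:
    obtaining the APK data set and preprocessing the APK data set;
    extracting the execution path of each application in the preprocessed APK data set, and constructing Dalvik instructions;
    classifying the Dalvik instructions to obtain classification information of the Dalvik instructions, and building the cryptographic API call sequence data set.
  3. The cryptographic API usage analysis method according to claim 1, characterized in that obtaining the APK data set and preprocessing the APK data set comprises:
    decompiling all applications in the APK data set using the APKTool tool;
    after the decompilation succeeds, obtaining a series of directories and files, wherein the subdirectories and files of the directories follow the same organizational structure as the source code directories used during development.
  4. The cryptographic API usage analysis method according to claim 3, characterized in that classifying the Dalvik instructions to obtain the classification information of the Dalvik instructions comprises:
    classifying the Dalvik instructions according to their read/write type, number of operands, and number of constant operands.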
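  5. The cryptographic API usage analysis method according to claim 1, characterized in that performing cryptographic API misuse detection and usage recommendation according to the trained hidden Markov model and N-gram model comprises:
    computing, with the trained hidden Markov model and N-gram model, the scores of all cryptographic API call sequences in the test set of the cryptographic API call sequence data set;
    when the score of a cryptographic API call sequence is less than a preset threshold, determining that the cryptographic API call sequence is misused;
    when the score of a cryptographic API call sequence is greater than the preset threshold, determining that the cryptographic API call sequence is not misused.
  6. The cryptographic API usage analysis method according to claim 5, characterized in that performing cryptographic API misuse detection and usage recommendation according to the trained hidden Markov model and N-gram model comprises:
    obtaining the cryptographic API call sequences in the test set that are not misused, and building a cryptographic API candidate set;
    if the original cryptographic API call sequence is located at a preset position in the cryptographic API candidate set, determining that the cryptographic API at the preset position is successfully recommended.
  7. The cryptographic API usage analysis method according to claim 5, characterized in that the preset threshold is set by:
    computing the scores of all cryptographic API call sequences in the training set;
    sorting the scores in descending order, and taking the score at approximately the 80% position as the preset threshold.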
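  8. A cryptographic API usage analysis system, characterized in that the system comprises:
    a data acquisition module, configured to obtain an APK data set and derive from the APK data set a cryptographic API call sequence data set, the cryptographic API call sequence data set being built on classification information of Dalvik instructions;
    a model training module, configured to train a hidden Markov model and an N-gram model using the training set and the validation set of the cryptographic API call sequence data set, to obtain a trained hidden Markov model and a trained N-gram model;
    a usage analysis module, configured to perform cryptographic API misuse detection and usage recommendation according to the trained hidden Markov model and N-gram model.
  9. A smart terminal, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 1 to 7.
  10. A non-transitory computer-readable storage medium, characterized in that, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of claims 1 to 7.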
PCT/CN2020/136140 2020-10-16 2020-12-14 Cryptographic API usage analysis method and system WO2022077755A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011110320.4 2020-10-16
CN202011110320.4A CN112199095B (zh) 2020-10-16 Cryptographic API usage analysis method and system

Publications (1)

Publication Number Publication Date
WO2022077755A1 (zh) 2022-04-21

Family

ID=74010371

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136140 WO2022077755A1 (zh) 2020-10-16 2020-12-14 Cryptographic API usage analysis method and system

Country Status (2)

Country Link
CN (1) CN112199095B (zh)
WO (1) WO2022077755A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
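CN106096405A (zh) * 2016-04-26 2016-11-09 Zhejiang University of Technology Android malicious code detection method based on Dalvik instruction abstraction
CN107153789A (zh) * 2017-04-24 2017-09-12 Xidian University Method for detecting Android malware in real time using a random forest classifier
US20180191739A1 (en) * 2015-10-20 2018-07-05 Sophos Limited Mitigation of anti-sandbox malware techniques
CN108959924A (zh) * 2018-06-12 2018-12-07 Zhejiang University of Technology Android malicious code detection method based on word vectors and deep neural networks
CN109753801A (zh) * 2019-01-29 2019-05-14 Chongqing University of Posts and Telecommunications System-call-based dynamic malware detection method for smart terminals
CN111523117A (zh) * 2020-04-10 2020-08-11 Xidian University Android malware detection and malicious code localization system and method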

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US10699010B2 (en) * 2017-10-13 2020-06-30 Ping Identity Corporation Methods and apparatus for analyzing sequences of application programming interface traffic to identify potential malicious actions
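CN109492355B (zh) * 2018-11-07 2021-09-07 Institute of Information Engineering, Chinese Academy of Sciences Deep-learning-based software anti-analysis method and system
CN113112030B (zh) * 2019-04-28 2023-12-26 Fourth Paradigm (Beijing) Technology Co., Ltd. Method and system for training a model, and method and system for predicting sequence data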


Also Published As

Publication number Publication date
CN112199095B (zh) 2022-04-26
CN112199095A (zh) 2021-01-08

Similar Documents

Publication Publication Date Title
US11379227B2 (en) Extraquery context-aided search intent detection
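CN109766540B (zh) General text information extraction method and apparatus, computer device, and storage medium
CN107908635B (zh) Method and apparatus for building a text classification model and for text classification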
Cabrera Lozoya et al. Commit2vec: Learning distributed representations of code changes
US10839207B2 (en) Systems and methods for predictive analysis reporting
Ahasanuzzaman et al. CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues
Rau et al. Transferring tests across web applications
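CN112511546A (zh) Log-analysis-based vulnerability scanning method, apparatus, device, and storage medium
CN112132238A (zh) Method, apparatus, device, and readable medium for identifying private data
CN113778852B (zh) Code analysis method based on regular expressions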
Kaur et al. A systematic literature review on the use of machine learning in code clone research
Wu et al. Fcdp: Fidelity calculation for description-to-permissions in android apps
Liu et al. Autoupdate: Automatically recommend code updates for android apps
Liu et al. On the reliability and explainability of language models for program generation
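CN106650450A (zh) Heuristic malicious script detection method and system based on code fingerprint identification
WO2022077755A1 (zh) Cryptographic API usage analysis method and system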
Zhao et al. A fine-grained chinese software privacy policy dataset for sequence labeling and regulation compliant identification
Ebrahimi et al. Self-admitted technical debt in ethereum smart contracts: a large-scale exploratory study
Wang et al. An extensive study of the effects of different deep learning models on code vulnerability detection in Python code
Siavvas et al. A self-adaptive approach for assessing the criticality of security-related static analysis alerts
Tang et al. Neural SZZ algorithm
US12008341B2 (en) Systems and methods for generating natural language using language models trained on computer code
Alonso-García et al. Machine Learning Based System for the Control and Evaluation of Programming Vulnerabilities
US20240201983A1 (en) Software development artifact name generation
US20240069907A1 (en) Software development context history operations

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20957529

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180723)

122 Ep: pct application non-entry in european phase

Ref document number: 20957529

Country of ref document: EP

Kind code of ref document: A1