CN107358099B - Useless variable detection method based on LLVM intermediate representation program slicing technology

Info

Publication number
CN107358099B
CN107358099B CN201710431448.2A CN201710431448A CN107358099B CN 107358099 B CN107358099 B CN 107358099B CN 201710431448 A CN201710431448 A CN 201710431448A CN 107358099 B CN107358099 B CN 107358099B
Authority
CN
China
Prior art keywords
variable
program
variables
graph
useless
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710431448.2A
Other languages
Chinese (zh)
Other versions
CN107358099A (en)
Inventor
张迎周
王星
陈星昊
尹秀
赵莲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201710431448.2A
Publication of CN107358099A
Application granted
Publication of CN107358099B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a useless variable detection method based on program slicing of the LLVM intermediate representation. Starting from program source code to which useless variables may have been added, the method first converts the source code into LLVM intermediate representation, then analyzes that representation with a program slicing technique to obtain a program dependency graph. The program dependency graph is extracted and simplified into a variable distance graph. Finally, a distance threshold is set, the distance between the output variable and every other variable is calculated on the variable distance graph, and the result is used to judge whether useless variables exist in the source code. The method can effectively detect useless variables added to source code and applies generally to source code written in different languages.

Description

Useless variable detection method based on LLVM intermediate representation program slicing technology
Technical Field
The invention relates to the technical field of malicious code analysis, in particular to a useless variable detection method based on LLVM intermediate representation program slicing technology.
Background
With the rapid development of internet technology in the information age, people's lives have become more convenient and efficient, but network users have also become more vulnerable to attacks by malicious code. Network information security receives increasing attention, and new malicious code analysis methods are continually being proposed. To make malicious code harder to analyze, its authors often protect it by various means, of which code obfuscation is one of the most common. Code obfuscation increases the effort reverse engineers must spend analyzing the code and also allows malware to evade detection by security tools.
Control flow obfuscation and data flow obfuscation are the most widely used code obfuscation methods. The former alters the control flow structure of a program by various means so that, without changing the program's execution result, its control flow becomes complicated and hard for people to analyze and understand. The latter converts data or data structures in the program into an unintelligible form without affecting the execution result, making it difficult for a deobfuscator to analyze the program's data. Inserting useless variables is one control flow obfuscation method: it inserts variables into the source program that are unrelated to the program's execution result, thereby hindering the deobfuscator's analysis of the code.
Researchers in China and abroad have proposed various code deobfuscation methods. One paper proposes a logic-oriented opaque predicate detection method, which captures the intrinsic characteristics of opaque predicates in a general logic formula, decides whether a predicate is opaque through symbolic execution and constraint solving, and then restores the program's control flow structure. Another paper combines static and dynamic analysis: it uses static analysis to supplement the results of dynamic analysis, adding possible control flow edges to the dynamically obtained control flow graph so as to recover the control flow graph of the obfuscated code. A third paper proposes a semantics-preserving program transformation method which, combined with a taint identification technique, recovers the internal logic of a program from code protected by different obfuscation techniques.
Although these deobfuscation methods recover the control flow structure by various technical means, they neither target inserted useless variables specifically nor analyze obfuscated code written in different programming languages in a uniform way. General yet targeted methods for detecting useless variables therefore still require further research.
Disclosure of Invention
The invention provides a useless variable detection method based on program slicing of the LLVM intermediate representation. Starting from source code to which useless variables may have been added, the method analyzes the code with a program slicing technique, detects the useless variables inserted in it, and restores the original control flow structure of the program. The method can analyze source code written in different languages in a uniform way, which reduces manual analysis overhead and improves detection efficiency.
The invention uses program slicing to analyze source code that may contain added useless variables and obtain a program dependency graph. A variable distance graph is then constructed by extracting and simplifying the program dependency graph, and the distances between variables are calculated on that graph. Finally, the variables that were inserted into the source code and are unrelated to the program's execution result are detected.
The method comprises the following steps:
S1, acquiring source code into which useless variables may have been inserted;
S2, converting the source code from S1 into LLVM intermediate representation;
S3, slicing the LLVM intermediate representation obtained in S2 by using a program slicing technique to obtain a program dependency graph;
S4, extracting and simplifying the program dependency graph to construct a variable distance graph;
S5, setting the variable distance threshold r to the number n of variables in the source code, calculating on the variable distance graph the distance d between each other variable and the output variable, and, if d > r, regarding that variable as a useless variable unrelated to the program execution result.
The conversion of the source code into LLVM intermediate representation in S2 is performed by the Clang compiler.
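As an illustration of S2, the following is a minimal sketch in Python of invoking Clang to emit textual LLVM intermediate representation. The flags `-S -emit-llvm` are standard Clang options; the helper names `clang_emit_ir_command` and `emit_llvm_ir`, the file name `demo.c`, and the choice of a `.ll` output file are assumptions made for this example and are not specified by the method itself.

```python
import subprocess
from pathlib import Path


def clang_emit_ir_command(source, output=None, clang="clang"):
    """Build the Clang command line that emits textual LLVM IR (.ll).

    Hypothetical helper: by default the output path is the source path
    with its suffix replaced by ".ll".
    """
    source = Path(source)
    output = Path(output) if output else source.with_suffix(".ll")
    # -S -emit-llvm asks Clang for human-readable LLVM IR instead of object code.
    return [clang, "-S", "-emit-llvm", str(source), "-o", str(output)]


def emit_llvm_ir(source, output=None, clang="clang"):
    """Run Clang (it must be installed and on PATH) and return the IR file path."""
    cmd = clang_emit_ir_command(source, output, clang)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if Clang fails
    return Path(cmd[-1])


# Only the command is built here, so the example runs without Clang installed.
print(clang_emit_ir_command("demo.c"))
```

Separating command construction from execution keeps the sketch testable on machines without an LLVM toolchain; calling `emit_llvm_ir("demo.c")` would perform the actual conversion.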
The process of constructing the variable distance graph in S4 is as follows:
S4-1, traversing the nodes in the program dependency graph and adding every variable to the variable distance graph as a node, keeping only one copy of any repeated node;
S4-2, traversing the edges in the program dependency graph: for each directed edge, letting B be the set of variables in the edge's start node and E the set of variables in its end node, and adding to the variable distance graph directed edges pointing from the variables in B to the variables in E; only one copy of any repeated edge is kept.
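The construction in S4-1 and S4-2 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the input format (a dictionary from program dependency graph node id to the variable names in that node, plus a list of directed edges between node ids), the function name `build_vdg`, and the reading that every variable in B is connected to every variable in E are all assumptions of the example.

```python
def build_vdg(pdg_nodes, pdg_edges):
    """Build a variable distance graph (VDG) from a program dependency graph (PDG).

    pdg_nodes: dict mapping a PDG node id to the variable names in that node.
    pdg_edges: iterable of (start_node_id, end_node_id) directed PDG edges.
    Returns (vdg_nodes, vdg_edges) as sets, so duplicates are kept only once.
    """
    # S4-1: every variable in any PDG node becomes one VDG node.
    vdg_nodes = set()
    for variables in pdg_nodes.values():
        vdg_nodes.update(variables)

    # S4-2: for each PDG edge, connect the variables of the start node (B)
    # to the variables of the end node (E). Self-loops are kept as-is because
    # the text does not say to drop them; they do not affect path distances.
    vdg_edges = set()
    for start, end in pdg_edges:
        for b in pdg_nodes[start]:
            for e in pdg_nodes[end]:
                vdg_edges.add((b, e))
    return vdg_nodes, vdg_edges


# Hypothetical PDG: node 3 mentions a variable that nothing else depends on.
example_nodes = {1: {"a"}, 2: {"a", "b"}, 3: {"junk"}, 4: {"b", "out"}}
example_edges = [(1, 2), (2, 4), (1, 2)]  # the repeated edge is deduplicated
nodes, edges = build_vdg(example_nodes, example_edges)
print(sorted(nodes))
print(sorted(edges))
```

Using sets for both nodes and edges directly realizes the "only one copy is kept" rule without a separate deduplication pass.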
As a method for detecting useless variables in source code, the invention remedies the shortcomings of traditional methods for detecting control flow obfuscated code. It analyzes the source program containing added useless variables with a program slicing technique, extracts and simplifies the resulting program dependency graph, and constructs a variable distance graph. The distances between variables are then calculated on the variable distance graph to detect the useless variables added to the source program.
The invention brings the following advantageous effects:
(1) The source program containing added useless variables is analyzed with a program slicing technique, a variable distance graph is constructed, and the distances between variables are calculated on that graph, which gives high accuracy in detecting useless variables;
(2) The source program is converted into LLVM intermediate representation, and slicing analysis is then performed on that representation. Because of this conversion, source programs written in different languages can be analyzed and processed in a uniform way, so the method is highly general in detecting useless variables.
Drawings
Fig. 1 is a flowchart of a useless variable detection method based on LLVM intermediate representation program slicing technology according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A useless variable detection method based on program slicing of the LLVM intermediate representation starts from a source program to which useless variables have been added, obtains the variable distance graph of the source program by means of LLVM intermediate representation and program slicing, calculates the distance between the output variable and the other variables on the variable distance graph, and detects the useless variables inserted in the source program. Fig. 1 shows the overall process of the method, which comprises the following steps:
Step 1): Obtain source code to which useless variables may have been added. Useless variables are variables in a program that have no effect on the output, but they may have control or data dependencies with other variables in the program. Adding useless variables makes the source code complex and hard for people to analyze and understand. Source code with inserted useless variables can be downloaded from a malicious code website.
Step 2): Convert the source code into LLVM intermediate representation. LLVM, short for Low Level Virtual Machine, is a compiler framework written in C++. The LLVM intermediate representation can represent source code written in different languages in a uniform way. The LLVM intermediate representation file can be compiled from the source code by the Clang compiler.
Step 3): Perform slicing analysis on the LLVM intermediate representation obtained in Step 2 by using a program slicing technique to obtain a program dependency graph (PDG). The program dependency graph is composed of a control flow graph, a control dependency graph and a data dependency graph. The control flow graph contains the control flow information of the program, the control dependency graph contains its control dependency information, and the data dependency graph contains its data dependency information. The program slicing technique can analyze the various dependency relations that may exist within a procedure and generate the program dependency graph.
Step 4): extracting and simplifying the program dependence graph obtained in the step 3, and constructing a Variable Distance Graph (VDG).
Step 4.1): nodes in the PDG are traversed, and all variables in the PDG nodes are added as nodes to the VDG. Only one of the repeated nodes is reserved;
step 4.2): traversing the edges in the PDG, and setting the variable set in the starting node of the edge as B and the variable set in the ending node of the edge as E for one directed edge in the PDG. Adding a directed edge in the VDG that points from the variable in set B to the variable in set E. Only one edge is reserved for the repeated edge.
Step 5): a variable distance threshold r is set, and the variables output from the source code are analyzed to detect useless variables in the program.
Step 5.1): setting the variable number n in the source code as a variable distance threshold r;
step 5.2): and (4) setting the weight values of edges in the VDG to be 1, and calculating the distance d between the output variable and other variables on the VDG. If a directed path does not exist between a certain variable and an output variable, d is infinite;
step 5.3): the relation between d and r is judged, and variables satisfying d > r are considered to be useless variables irrelevant to the program execution result.
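Step 5 can be sketched in Python with a breadth-first search, which yields shortest path lengths when every edge has weight 1. This is a sketch under stated assumptions rather than the patented implementation: the function names are hypothetical, n is taken to be the number of VDG nodes, and d is taken to be the length of the shortest directed path from a variable to the output variable (the text does not fix the direction). Under these assumptions a finite shortest path visits at most n nodes and so has at most n - 1 edges, which means d > r = n can hold only when no directed path exists.

```python
import math
from collections import deque


def distances_to_output(vdg_nodes, vdg_edges, output):
    """Shortest directed path length from every variable to `output`.

    All edges have weight 1, so a breadth-first search over reversed edges
    starting at `output` suffices. Unreachable variables keep distance inf.
    """
    predecessors = {v: [] for v in vdg_nodes}
    for b, e in vdg_edges:
        predecessors[e].append(b)

    dist = {v: math.inf for v in vdg_nodes}
    dist[output] = 0
    queue = deque([output])
    while queue:
        current = queue.popleft()
        for p in predecessors[current]:
            if dist[p] == math.inf:  # first visit gives the shortest distance
                dist[p] = dist[current] + 1
                queue.append(p)
    return dist


def find_useless_variables(vdg_nodes, vdg_edges, output):
    """Return the variables whose distance d to `output` exceeds r = n."""
    r = len(vdg_nodes)  # assumption: n is the number of variables in the VDG
    dist = distances_to_output(vdg_nodes, vdg_edges, output)
    return {v for v, d in dist.items() if d > r}


# Hypothetical VDG: "junk" depends on "a", but "out" never depends on "junk".
vdg_nodes = {"a", "b", "junk", "out"}
vdg_edges = {("a", "b"), ("b", "out"), ("a", "junk")}
dist = distances_to_output(vdg_nodes, vdg_edges, "out")
useless = find_useless_variables(vdg_nodes, vdg_edges, "out")
print(dist["a"], dist["junk"], useless)
```

The example shows a variable that has a data dependency on another variable yet is still reported, which matches the description of useless variables in Step 1.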

Claims (2)

1. A useless variable detection method based on LLVM intermediate representation program slicing technology, characterized by comprising the following steps:
S1, acquiring source code into which useless variables may have been inserted;
S2, converting the source code from S1 into LLVM intermediate representation;
S3, slicing the LLVM intermediate representation obtained in S2 by using a program slicing technique to obtain a program dependency graph;
S4, extracting and simplifying the program dependency graph to construct a variable distance graph, wherein the process comprises the following steps:
S4-1, traversing the nodes in the program dependency graph and adding every variable to the variable distance graph as a node, keeping only one copy of any repeated node;
S4-2, traversing the edges in the program dependency graph: for each directed edge, letting B be the set of variables in the edge's start node and E the set of variables in its end node, and adding to the variable distance graph directed edges pointing from the variables in B to the variables in E; only one copy of any repeated edge is kept;
S5, setting the variable distance threshold r to the number n of variables in the source code, calculating on the variable distance graph the distance d between each other variable and the output variable, and, if d > r, regarding that variable as a useless variable unrelated to the program execution result.
2. The useless variable detection method according to claim 1, wherein the conversion of the source code into LLVM intermediate representation in S2 is performed by the Clang compiler.
CN201710431448.2A 2017-06-09 2017-06-09 Useless variable detection method based on LLVM intermediate representation program slicing technology Active CN107358099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710431448.2A CN107358099B (en) 2017-06-09 2017-06-09 Useless variable detection method based on LLVM intermediate representation program slicing technology

Publications (2)

Publication Number Publication Date
CN107358099A CN107358099A (en) 2017-11-17
CN107358099B CN107358099B (en) 2020-05-05

Family

ID=60272710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710431448.2A Active CN107358099B (en) 2017-06-09 2017-06-09 Useless variable detection method based on LLVM intermediate representation program slicing technology

Country Status (1)

Country Link
CN (1) CN107358099B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943516B (en) * 2017-12-06 2020-11-13 南京邮电大学 Clone code detection method based on LLVM
CN110543407B (en) 2019-08-21 2021-11-05 杭州趣链科技有限公司 Static analysis method for performance of identity intelligent contract
CN112528240B (en) * 2020-12-02 2022-08-09 上海交通大学 Password code-oriented automatic program sensitive data protection method
CN114417332A (en) * 2022-01-07 2022-04-29 西南交通大学 Program credibility verification method and device for C program source code

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572476A (en) * 2015-01-30 2015-04-29 南京邮电大学 Software safety testing method based on program slicing
CN104881274A (en) * 2014-02-28 2015-09-02 上海斐讯数据通信技术有限公司 Method for identifying useless codes
CN105700893A (en) * 2016-02-23 2016-06-22 南京邮电大学 LLVM IR program slicing method based on improved system dependence graph
CN106802860A (en) * 2015-11-25 2017-06-06 阿里巴巴集团控股有限公司 Useless class detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5779077B2 (en) * 2011-11-22 2015-09-16 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Apparatus and method for supporting program generation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
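Implementation of Constraint Systems for Useless Variable Elimination; Daniel M. Roy; Research Science Institute 1998; 19981231; main text pp. 24-31 *
Research on Code Deobfuscation Methods Based on Program Slicing; 王星; Wanfang Data Knowledge Service Platform (online publication); 20181217; pp. 1-6 *
胡正军; China Master's Theses Full-text Database, Information Science and Technology; 20130215; main text pp. 16-39 *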

Also Published As

Publication number Publication date
CN107358099A (en) 2017-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant