CN113742732A - Code vulnerability scanning and positioning method - Google Patents


Info

Publication number
CN113742732A
Authority
CN
China
Prior art keywords
vulnerability
scanning
slicing
source file
code
Prior art date
Legal status
Pending
Application number
CN202010487204.8A
Other languages
Chinese (zh)
Inventor
房春荣
葛宇
刘子夕
葛修婷
钱美缘
李宁
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
2020-05-27
Filing date
2020-05-27
Publication date
2021-12-03
Application filed by Nanjing University
Priority to CN202010487204.8A
Publication of CN113742732A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

A code vulnerability scanning and positioning method first scans a code dataset with several vulnerability scanning tools and parses the scan results to extract basic vulnerability information. A voting strategy then labels each reported vulnerability as a true positive or a false positive, and the false positives are filtered out. Finally, using the known basic vulnerability information, the source code is sliced with the existing slicing tool WALA. The invention improves the slicing module in three respects: selection of the slicing mode, type-specific handling of the slicing-point instruction, and filtering of irrelevant statements. Together these effectively improve the precision of vulnerability localization.

Description

Code vulnerability scanning and positioning method
Technical Field
The invention belongs to the field of software engineering, in particular to the application of static program analysis within software engineering, and is used to process the vulnerability information produced by static code vulnerability scanning tools.
Background
Static code analysis tools are known to report a large number of false positives. One cause is the inherent limitation of static analysis: by Rice's theorem, any non-trivial semantic property of programs is undecidable in general. Another is that most static analysis tools model programs imprecisely, so the analysis model differs considerably from the program's actual execution, and many tools use conservative, flow-insensitive or context-insensitive analyses. The result is high false positive and false negative rates, and reducing both has become a central problem in software vulnerability analysis. Triaging a large number of false positives is time-consuming for developers and can erode their confidence in static code analysis tools, which makes false positives one of the biggest obstacles to adopting static code analysis. Determining whether a warning from a static code analysis tool actually indicates an error, and thereby reducing the number of false positives developers must deal with, is therefore an urgent problem.
Academia has already studied the false-positive problem. A study reviewing software vulnerability reports introduces an approach to analyzing false reports, and Samsung's bug feedback system offers a further idea; both, however, run into the unavoidable issue of data granularity. Samsung uses an in-house checker to extract the statements associated with a vulnerability, but its implementation is not public. Another study uses manual slicing: source code elements unrelated to the false positive are deleted one by one until deleting the next element would make the false-positive message disappear. Manual slicing certainly gives the best results, but for a dataset of this size its cost is considerable, as the literature also notes.
The industry offers many vulnerability scanning tools, yet missed reports (false negatives) remain a serious problem. The results of scanning a source file with a single tool are very unstable, so the likelihood of false reports is high, and the vulnerability ends up being located inaccurately.
Several mature slicing tools are available, so automatic slicing itself is not difficult. However, the following problems remain:
1. The slice may contain statements irrelevant to the source file, for example statements from Java class libraries imported by the source file, and WALA's own runtime classes.
2. The choice of data-dependence and control-dependence parameters strongly affects the final slicing result.
3. While analyzing a source file, the slicing tool generates an intermediate representation (IR), and optimizations performed during IR construction remove certain statements, such as simple assignments (x = y, y = z). These Java statements never appear in the IR and therefore never appear in the slice. Yet by the principle of taint propagation, such assignments may well be where contamination is introduced or passed along.
4. The choice of slicing mode has a great influence on the final slicing result.
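Problem 3 can be made concrete with a toy model. The sketch below is not WALA's IR builder; it assumes straight-line code in which every variable is assigned once (as in SSA form) and a hypothetical statement encoding in which op "copy" marks a plain assignment. Copy propagation deletes each copy and rewrites later uses, so the source lines holding `y = z` and `x = y` leave no instruction behind to be sliced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy copy propagation over straight-line, single-assignment code.
 * The Stmt encoding is hypothetical; it only illustrates why plain
 * assignments vanish from an optimized IR and hence from slices.
 */
public class CopyPropagationDemo {

    /** "target = op(args...)"; op "copy" means target = args[0]. */
    record Stmt(int line, String target, String op, List<String> args) {}

    static List<Stmt> propagateCopies(List<Stmt> code) {
        Map<String, String> alias = new HashMap<>(); // copied variable -> original value
        List<Stmt> out = new ArrayList<>();
        for (Stmt s : code) {
            // Rewrite operands through earlier copies (chains resolve transitively).
            List<String> args = new ArrayList<>();
            for (String a : s.args()) {
                args.add(alias.getOrDefault(a, a));
            }
            if (s.op().equals("copy")) {
                alias.put(s.target(), args.get(0)); // remember the copy, emit nothing
            } else {
                out.add(new Stmt(s.line(), s.target(), s.op(), args));
            }
        }
        return out;
    }
}
```

For the input `z = getParameter()` (line 1), `y = z` (line 2), `x = y` (line 3), `r = execute(x)` (line 4), only lines 1 and 4 survive, and line 4 now reads `execute(z)`; a slice computed on this IR can never report lines 2 and 3.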
This led to the following idea: scan the code source files with several vulnerability scanning tools and compare their scan reports to obtain basic vulnerability information. From that information, a good slicing result is obtained automatically, with only a small amount of redundancy compared with manual slicing, which yields more accurate vulnerability localization.
Disclosure of Invention
The problems the invention aims to solve are that current scanning tools miss too many vulnerabilities (a high false-negative rate) and locate vulnerabilities inaccurately.
1) Scan the code source file with three vulnerability scanning tools, in the following steps:
1.1) First, package the code source file into a JAR, invoke each vulnerability scanning tool's command-line mode automatically from Java, and save the scan results.
1.2) From the result files saved in step 1.1), extract the basic vulnerability information: vulnerability category, severity level, vulnerability ID, method name, and class name. A voting mechanism decides whether a vulnerability is a positive report. Each warning is weighted by its severity level: low risk has weight 1, medium risk 2, and high risk 3. The underlying rule is that a warning reported for the same source file by all three scanning tools is a positive report, and otherwise a false report. Taking as the critical case three scanning tools each reporting on the same source file, the critical weight of a source file is 3; whether a vulnerability is positive is therefore decided by computing the weighted sum for each source file and checking whether it is greater than 3.
2) Obtain the program information of the code source file, in the following steps:
2.1) Read the source file (in JAR form) and generate its class inheritance graph (class hierarchy). From the class hierarchy, build the system call graph CallGraph of the source file.
2.2) From the call graph CallGraph, compute the data-dependence and control-dependence graphs.
3) Locate the exact slicing point, in the following steps:
3.1) Traverse the CGNodes of the call graph CallGraph, stopping at the node whose method name and class name match those in the basic vulnerability information;
3.2) For the CGNode found in step 3.1), traverse its instructions, stopping at the first instruction that satisfies the condition given by the basic vulnerability information;
3.3) If the statement is an ordinary statement, wrap it directly in a WALA NormalStatement object. If it is an inter-method call statement, then in addition to the slicing point's own information, locate the called method by the method name invoked at the slicing point and add that method's information, yielding a precise NormalStatement.
4) Generate the final slicing result, in the following steps:
4.1) Select the slicing mode with a reasonable strategy based on the slicing point's use count and def count, then slice the source file with WALA.
4.2) Pruning: based on the system dependence graph, define a filter that removes statements from common Java base class libraries and from classes imported by the source file.
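The severity-weighted vote in step 1.2) can be sketched as follows. This is a minimal sketch, not the authors' code: the `Warning` record and its fields are a hypothetical stand-in for whatever the parsed scanner reports contain, and warnings are summed per source file as the description states. The text says "greater than 3", under which three low-risk reports (sum exactly 3) would not count as positive even though 3 is called the critical case; if `>=` was intended, only the comparison in `label` changes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of severity-weighted voting over several scanners' warnings (hypothetical data model). */
public class WeightedVoting {

    /** Severity levels with the weights given in the description. */
    enum Severity {
        LOW(1), MEDIUM(2), HIGH(3);

        final int weight;

        Severity(int weight) {
            this.weight = weight;
        }
    }

    /** One warning parsed from one tool's report; this record shape is an assumption. */
    record Warning(String tool, String sourceFile, String vulnId, Severity severity) {}

    /** Threshold from the description: positive if the weighted sum is greater than 3. */
    static final int THRESHOLD = 3;

    /** Sums the severity weights of all warnings reported for each source file. */
    static Map<String, Integer> weightPerFile(List<Warning> warnings) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Warning w : warnings) {
            sums.merge(w.sourceFile(), w.severity().weight, Integer::sum);
        }
        return sums;
    }

    /** Labels each source file: true = positive report, false = false report. */
    static Map<String, Boolean> label(List<Warning> warnings) {
        Map<String, Boolean> labels = new TreeMap<>();
        weightPerFile(warnings).forEach((file, sum) -> labels.put(file, sum > THRESHOLD));
        return labels;
    }

    public static void main(String[] args) {
        List<Warning> warnings = List.of(
                new Warning("toolA", "CWE89_Example.java", "SQLI", Severity.HIGH),
                new Warning("toolB", "CWE89_Example.java", "SQLI", Severity.MEDIUM),
                new Warning("toolC", "Helper.java", "NULL_DEREF", Severity.LOW));
        System.out.println(label(warnings)); // {CWE89_Example.java=true, Helper.java=false}
    }
}
```

The tool names, file names and vulnerability IDs in `main` are invented for illustration.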
The invention is characterized as follows: 1. vulnerabilities are labeled as true or false positives by a voting strategy over the results of several vulnerability scanning tools; 2. inter-method call statements are handled specially according to the instruction type of the slicing point, which enriches the slicing point's information; 3. an appropriate slicing mode is selected from the slicing point's use count and def count; 4. classes from the Java base class libraries are filtered out based on the system dependence graph. Together, these four points reduce false positives in the resulting vulnerability localization to some extent, and the resulting slices are very fine-grained.
These steps provide benefits including, but not limited to, the following: some false positives are filtered out more accurately; WALA's runtime classes are effectively filtered out of the slices; and statements that do not appear in the source file are eliminated.
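Point 2, locating the slicing point and specially handling inter-method calls (steps 3.1) to 3.3) above), can be sketched over a simplified call graph. `Node` and `Instr` are hypothetical stand-ins for WALA's `CGNode` and SSA instructions, not the WALA API; matching the instruction by source line follows the detailed description, and resolving the callee by method name alone is a simplification of the text's "found according to the method name".

```java
import java.util.List;
import java.util.Optional;

/** Sketch of slicing-point lookup over a simplified, hypothetical call-graph model. */
public class SlicePointLocator {

    /** Simplified instruction: source line, whether it is a call, and the callee's name. */
    record Instr(int line, boolean isCall, String calleeMethod) {}

    /** Simplified call-graph node: declaring class, method name, instructions. */
    record Node(String className, String methodName, List<Instr> instrs) {}

    /** The seed instruction, its node, and (for calls) the callee node, which may be null. */
    record SlicePoint(Node node, Instr instr, Node callee) {}

    static Optional<SlicePoint> locate(List<Node> callGraph, String cls, String method, int line) {
        for (Node n : callGraph) {
            // 3.1) stop at the node whose class and method names match the report
            if (!n.className().equals(cls) || !n.methodName().equals(method)) continue;
            for (Instr i : n.instrs()) {
                // 3.2) stop at the instruction mapped to the reported source line
                if (i.line() != line) continue;
                Node callee = null;
                if (i.isCall()) {
                    // 3.3) for an inter-method call, also resolve the called method by name
                    for (Node m : callGraph) {
                        if (m.methodName().equals(i.calleeMethod())) {
                            callee = m;
                            break;
                        }
                    }
                }
                return Optional.of(new SlicePoint(n, i, callee));
            }
        }
        return Optional.empty();
    }
}
```

An ordinary instruction yields a `SlicePoint` with a null callee (the plain `NormalStatement` case); a call instruction carries the callee's node as the extra information.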
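Keeping WALA's runtime classes and unrelated Java libraries out of the analysis (step 3 of the embodiment) relies on an exclusion list. The sketch below models such a list as one regular expression per line over slash-separated JVM class names, in the style of WALA's exclusions files. The concrete patterns are illustrative assumptions, not the authors' list, and how the file is passed to WALA's analysis scope depends on the WALA version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of an analysis-scope exclusion list: one regex per line over "a/b/C" class names. */
public class ExclusionList {

    /** Illustrative patterns only; the real list would be tuned to the analyzed project. */
    static final String EXCLUSIONS = """
            java/awt/.*
            javax/swing/.*
            sun/.*
            com/sun/.*
            com/ibm/wala/.*
            """;

    private final List<Pattern> patterns = new ArrayList<>();

    ExclusionList(String text) {
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
    }

    /** True if the class would be left out of the analysis scope. */
    boolean excludes(String className) {
        for (Pattern p : patterns) {
            if (p.matcher(className).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

Excluding classes up front shrinks the class hierarchy and call graph, which is cheaper than removing their statements from the slice afterwards; the pruning filter of step 8 still catches what the exclusions leave in.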
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 shows source code (left) and the corresponding slice (right).
Detailed Description
The key technology involved in the invention is Wala.
Wala
The T. J. Watson Libraries for Analysis (WALA) mainly provide static analysis for Java bytecode and related languages, as well as for JavaScript. WALA supports interprocedural dataflow analysis, a context-sensitive tabulation-based slicer, pointer analysis and call graph construction, and a general framework for iterative dataflow analysis on Java bytecode. In the invention, the vulnerability location given in the vulnerability report serves as the starting point, and program slicing is performed using the dependence relations (including control dependence) that WALA computes for the program, yielding a reduced vulnerability code segment.
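To make "slicing by dependence relations" concrete, here is a toy slicer over an explicit dependence graph, together with the use/def direction rule described in step 6 of the embodiment. It is not WALA's `Slicer`, which works on a system dependence graph with context-sensitive tabulation; the integer statement IDs and the graph encoding are hypothetical. A backward slice collects everything the seed transitively depends on; a forward slice collects everything that transitively depends on the seed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy slicer: an edge from -> to means "to depends on from" (data or control). */
public class ToySlicer {

    enum Direction { BACKWARD, FORWARD }

    private final Map<Integer, Set<Integer>> dependents = new HashMap<>();   // a -> {b : b depends on a}
    private final Map<Integer, Set<Integer>> dependencies = new HashMap<>(); // b -> {a : b depends on a}

    void addDependence(int from, int to) {
        dependents.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        dependencies.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
    }

    /** Rule as stated in the text: more uses than defs -> backward, otherwise forward. */
    static Direction chooseDirection(int numberOfUses, int numberOfDefs) {
        return numberOfUses > numberOfDefs ? Direction.BACKWARD : Direction.FORWARD;
    }

    /** Worklist reachability in the chosen direction; the seed itself is included. */
    Set<Integer> slice(int seed, Direction dir) {
        Map<Integer, Set<Integer>> edges = dir == Direction.BACKWARD ? dependencies : dependents;
        Set<Integer> result = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>(List.of(seed));
        while (!work.isEmpty()) {
            int s = work.pop();
            if (!result.add(s)) continue; // already visited
            for (int next : edges.getOrDefault(s, Set.of())) {
                work.push(next);
            }
        }
        return result;
    }
}
```

With statements 1 `z = param`, 2 `q = "SELECT " + z`, 3 `log(z)`, 4 `execute(q)` and edges 1→2, 1→3, 2→4, the backward slice from 4 is {1, 2, 4}, while the forward slice from 1 is {1, 2, 3, 4}.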
The following describes the steps of the method with a specific example and shows the results.
To explain the technical content, objectives, and final effects of the invention in detail, specific embodiments are described below.
The slicing tool used is WALA, and the dataset is Juliet Java (the Juliet Test Suite for Java).
We chose a piece of code from the Juliet Java test case dataset.
The specific implementation steps are as follows:
1. Package the code source file as a JAR, invoke the command-line mode of the three vulnerability scanning tools automatically from Java, and save the scan results.
2. From the saved result files, extract the basic vulnerability information: vulnerability category, severity level, vulnerability ID, method name, and class name. A voting mechanism decides whether a vulnerability is a positive report. Each warning is weighted by its severity level: low risk has weight 1, medium risk 2, and high risk 3. The underlying rule is that a warning reported for the same source file by all three scanning tools is a positive report, and otherwise a false report. Taking as the critical case three scanning tools each reporting on the same source file, the critical weight of a source file is 3; whether a vulnerability is positive is therefore decided by computing the weighted sum for each source file and checking whether it is greater than 3.
3. List WALA's runtime classes in WALA's analysis exclusions, which restricts the scope of the WALA slicer and shrinks its analysis domain. Here we define the class libraries to exclude, including the Java class libraries unrelated to the analysis.
4. Package the Juliet Java test case dataset as a JAR. Because the dataset is built with Ant, we recommend the command ant -f MyProject\build.xml clean Jar, which ensures the JAR is executable. WALA reads the source file's JAR, with the exclusion file defined in step 3 as the analysis-scope parameter, loads all classes in the JAR into its analysis pool, and generates the JAR's class inheritance graph (class hierarchy); the system call graph is then computed from the class hierarchy.
5. Find the current slicing point from the vulnerability localization information. Using the source-file class name and method name recorded there, obtain from the system call graph the CGNode of the function containing the slicing point (WALA internally wraps each function body in a CGNode object). Then traverse the IR instructions in that CGNode (the IR is WALA's intermediate representation during analysis), mapping each instruction's position back to the source file, until reaching the IR instruction whose mapped position equals the vulnerable line number recorded in the localization information; this is the target instruction. If it is an ordinary instruction, wrap it directly in a WALA NormalStatement object; if it is an inter-method call statement, then in addition to the slicing point's own information, find the CGNode of the called method by the method name invoked at the slicing point and add the called method's information.
6. Select the slicing mode with a reasonable strategy based on the slicing point's use count and def count (obtainable with getNumberOfUses and getNumberOfDefs). Choosing the right slicing mode matters: it affects not only the slicing result but also whether slicing can complete at all. For a seed statement s, if only a few statements depend on the result of s but s depends on the results of many other statements, a forward slice can consume a large amount of memory and directly trigger an OutOfMemoryError (OOM); in that case a backward slice is taken instead. How many statements s depends on, and how many depend on s, are most directly reflected by its use count and def count. We therefore use a rough comparison: if the seed statement's use count is greater than its def count, a backward slice is taken; otherwise, a forward slice is taken.
7. Slice the source file with WALA. According to the slicing mode determined in step 6, call the static method Slicer.computeBackwardSlice or Slicer.computeForwardSlice on the seed statement, supplying the system call graph and the data-dependence and control-dependence options. The call graph is the one obtained in step 4. We set both dependence options to FULL (DataDependenceOptions.FULL and ControlDependenceOptions.FULL); in our experiments this setting produced more statements in the slice.
8. Pruning. Based on the system dependence graph, define a Predicate filter that removes statements from common Java base class libraries and from classes imported by the source file. Building the system dependence graph requires the system call graph and the data-dependence and control-dependence options, all of which were produced in the previous steps and can be reused directly.
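The pruning in step 8 amounts to a `Predicate` over slice statements. The sketch below is illustrative only: `SliceStmt` is a hypothetical stand-in for a WALA `Statement`, class names are in dotted Java form (real WALA type names use the JVM's `Ljava/lang/String` form and would need converting), and "common base class library" is approximated by the `java.` and `javax.` prefixes. Imported classes are read from single-type `import a.b.C;` lines; wildcard and static imports are ignored in this sketch.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch of slice pruning with a Predicate (hypothetical statement model). */
public class SlicePruner {

    record SliceStmt(String declaringClass, int line) {}

    /** Matches single-type imports such as "import org.example.Util;". */
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    static Set<String> importedClasses(String source) {
        Set<String> names = new HashSet<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** True for statements worth keeping: not in a base library and not from an imported class. */
    static Predicate<SliceStmt> keepFilter(String source) {
        Set<String> imported = importedClasses(source);
        return s -> !s.declaringClass().startsWith("java.")
                && !s.declaringClass().startsWith("javax.")
                && !imported.contains(s.declaringClass());
    }

    static List<SliceStmt> prune(Collection<SliceStmt> slice, String source) {
        return slice.stream().filter(keepFilter(source)).collect(Collectors.toList());
    }
}
```

Only statements declared in the analyzed file's own classes survive, which is what keeps the resulting slices fine-grained.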

Claims (5)

1. A code vulnerability scanning and positioning method, characterized by comprising the following steps: (1) labeling vulnerabilities as true or false positives with a voting strategy based on the scan results of a plurality of vulnerability scanning tools; (2) handling inter-method call statements specially according to the instruction type of the slicing point, thereby enriching the information of the slicing point; (3) selecting an appropriate slicing mode according to the use count and def count of the slicing point; (4) filtering classes of the Java base class library based on the system dependence graph.
2. The code vulnerability scanning and positioning method according to claim 1, wherein labeling vulnerabilities as true or false positives with a voting strategy based on the scan results of a plurality of vulnerability scanning tools mainly comprises the following steps:
First, the code source file is packaged as a JAR, the command-line mode of each vulnerability scanning tool is invoked automatically from Java, and the scan results are saved. Then the basic vulnerability information is extracted from the saved result files: vulnerability category, severity level, vulnerability ID, method name, and class name. A voting mechanism decides whether a vulnerability is a positive report, with each warning weighted by its severity level: low risk has weight 1, medium risk 2, and high risk 3. The underlying rule is that a warning reported for the same source file by all three scanning tools is a positive report, and otherwise a false report. Taking as the critical case three scanning tools each reporting on the same source file, the critical weight of a source file is 3; whether a vulnerability is positive is therefore decided by computing the weighted sum for each source file and checking whether it is greater than 3.
3. The code vulnerability scanning and positioning method according to claim 1, wherein inter-method call statements are handled specially according to the instruction type of the slicing point, so as to enrich the information of the slicing point: first, the source file (in JAR form) is read and its class inheritance graph is generated; then the system call graph CallGraph of the source file is obtained from the class inheritance graph, and its CGNodes are traversed until one whose method name and class name match those in the basic vulnerability information is found; next, the instructions in that CGNode are traversed until one satisfying the condition given by the basic vulnerability information is matched; finally, an ordinary statement is wrapped directly in a WALA NormalStatement object, whereas for an inter-method call statement the called method is located by the method name invoked at the slicing point and its information is added to the slicing point's own information, yielding a precise NormalStatement.
4. The code vulnerability scanning and positioning method according to claim 1, wherein an appropriate slicing mode is selected according to the use count and def count of the slicing point (obtainable with getNumberOfUses and getNumberOfDefs). These counts are the most direct measures of how many statements the slicing point depends on and how many depend on it. A rough comparison is used: if the slicing point's use count is greater than its def count, a backward slice is taken; otherwise, a forward slice is taken.
5. The code vulnerability scanning and positioning method according to claim 1, wherein classes of the Java base class library are filtered based on the system dependence graph: a Predicate filter is defined on the system dependence graph to remove common Java base class libraries and statements from classes imported by the source file.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010487204.8A CN113742732A (en) 2020-05-27 2020-05-27 Code vulnerability scanning and positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010487204.8A CN113742732A (en) 2020-05-27 2020-05-27 Code vulnerability scanning and positioning method

Publications (1)

Publication Number Publication Date
CN113742732A 2021-12-03

Family

ID=78727945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010487204.8A Pending CN113742732A (en) 2020-05-27 2020-05-27 Code vulnerability scanning and positioning method

Country Status (1)

Country Link
CN (1) CN113742732A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101017458A (en) * 2007-03-02 2007-08-15 Beijing University of Posts and Telecommunications Software safety code analyzer based on static analysis of source code and testing method therefor
CN109271278A (en) * 2017-07-18 2019-01-25 Alibaba Group Holding Ltd. Method and apparatus for determining the reference count of disk snapshot data slices
CN109391636A (en) * 2018-12-20 2019-02-26 Guangdong Power Grid Co., Ltd. Vulnerability management method and device based on a classified-protection asset tree
CN110245496A (en) * 2019-05-27 2019-09-17 Huazhong University of Science and Technology Source code vulnerability detection method and detector, and training method and system therefor
CN110378122A (en) * 2019-06-28 2019-10-25 The Third Research Institute of the Ministry of Public Security System and method for reducing false negatives and false positives in Web scanner vulnerability reports
CN111191248A (en) * 2019-12-31 2020-05-22 Beijing Tsinghua Yaxun Electronic Information Research Institute Vulnerability detection system and method for Android vehicle-mounted terminal system
US20230401274A1 (en) * 2020-03-04 2023-12-14 Karl Louis Denninghoff Relative fuzziness for fast reduction of false positives and false negatives in computational text searches

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHARY LIU: "Program slicing", Retrieved from the Internet <URL:程序切片> *
STANISLAV DASHEVSKYI等: "A Screening Test for Disclosed Vulnerabilities in FOSS Components", 《IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 》, vol. 45, no. 10, 1 October 2019 (2019-10-01), pages 945, XP011750584, DOI: 10.1109/TSE.2018.2816033 *
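Yu Jie et al.: "Research on scenario techniques in active Web vulnerability scanning", 《Computer Engineering & Science》, vol. 32, no. 03, 15 March 2010 (2010-03-15), pages 31 - 34 *
Feng Yang et al.: "Construction method for highly trustworthy crowdsourcing groups", 《Scientia Sinica Informationis》, vol. 49, no. 11, 30 November 2019 (2019-11-30), pages 1412 - 1427 *
Liu Yifan: "Research on SQL injection vulnerability detection based on static analysis", 《China Master's Theses Full-text Database》, 31 March 2019 (2019-03-31), pages 139 - 77 *
Ge Xiuting et al.: "Applications of machine learning techniques in software testing", 《Journal of Southwest University of Science and Technology》, vol. 33, no. 04, 31 December 2018 (2018-12-31), pages 90 - 97 *
Chen Chun et al.: "An Android application capability leak detection framework based on static taint analysis", 《Modern Computer (Professional Edition)》, no. 09, 25 March 2019 (2019-03-25), pages 94 - 100 *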

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626068A (en) * 2022-02-24 2022-06-14 Nankai University High-precision third-party library vulnerability module detection method based on JAVA function call sequence
CN114626068B (en) * 2022-02-24 2024-06-07 Nankai University High-precision third-party library vulnerability module detection method based on JAVA function call sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination