CN113742731A - Data collection method for code vulnerability intelligent detection - Google Patents

Data collection method for code vulnerability intelligent detection

Info

Publication number
CN113742731A
CN113742731A CN202010487163.2A CN202010487163A CN113742731A CN 113742731 A CN113742731 A CN 113742731A CN 202010487163 A CN202010487163 A CN 202010487163A CN 113742731 A CN113742731 A CN 113742731A
Authority
CN
China
Prior art keywords
code
vulnerability
detection
judgment
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010487163.2A
Other languages
Chinese (zh)
Inventor
房春荣
钱美缘
葛修婷
王旭
曹振飞
李彤宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202010487163.2A
Publication of CN113742731A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3668 Software testing
    • G06F11/3672 Test management
    • G06F11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

A data collection method for intelligent code vulnerability detection. An initial code vulnerability data set is constructed first; a trained machine learning model then processes unlabeled code, and the data set is expanded according to the results of model labeling and manual labeling. The initial data set is built by combining the results of a code vulnerability detection tool with the judgment of testers, and the machine learning model is trained on this initial data set. For unlabeled code, the judgment of the machine learning model is combined with the judgment of testers to determine whether a false alarm has occurred, and the data set is expanded with the false alarms found.

Description

Data collection method for code vulnerability intelligent detection
Technical Field
The invention belongs to the field of software engineering, and in particular relates to the application of code vulnerability false alarm detection and machine learning methods in software engineering, for constructing and collecting code vulnerability data sets.
Background
Because modern software products are increasingly complex, manual testing alone cannot detect software vulnerabilities quickly enough. The theory behind traditional vulnerability discovery techniques is mature: vulnerabilities can be mined from code by model checking, fuzz testing, symbolic execution, binary comparison, and similar means. These mature techniques have been largely automated and can scan the software under test for specific types of vulnerabilities with minimal human intervention. However, automated code vulnerability detection tools also face problems, such as:
1) Code vulnerability detection tools must trade off detection efficiency against accuracy. Whether a tool analyzes the syntax or the execution paths of the code, it has to construct a complex analysis model, which is prone to excessively large solving problems or path explosion. Given these limitations, accurate analysis takes considerable time, which is often unacceptable in practice.
2) Code vulnerability detection tools rely on rules preset by human experts, so the vulnerabilities they detect are often limited to certain specific types. Manually defined vulnerability rules are highly subjective and can hardly cover all cases, and imperfect rules lead to missed detections and false alarms.
3) The detection capability of a code vulnerability detection tool is fixed. For programs with a low security level, most detected vulnerabilities are real. However, as vulnerabilities are fixed, the security of the program improves and the false alarm rate rises with it. If the capability of the detection tool is not improved, developers waste most of their time manually checking and marking invalid vulnerabilities.
In summary, missed detections and false alarms are very common when automated detection tools are used. The problem of excessive false alarms can be addressed by improving the model. With continuing breakthroughs in machine learning and deep learning, machine learning can help a code vulnerability detection tool improve its detection accuracy and reduce its false alarm ratio. However, the accuracy of a machine learning model depends heavily on the size of the data set, and overfitting may occur when too little data is available for training.
Existing techniques for constructing and collecting code vulnerability data sets have the following problems:
1) Vulnerability data are collected in different ways and vary in quality, and the data sets come in different formats. A universal and efficient data set is currently lacking, so researchers can only construct data sets themselves, for example by web crawling.
2) They do not take into account that code vulnerabilities keep accumulating during use; the data set cannot be updated, so the detection model cannot be effectively improved.
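The effect described in item 3 can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that a tool reports every real vulnerability plus a fixed number of spurious warnings per scan; the numbers and the function name `false_alarm_share` are hypothetical and do not come from the patent.

```python
def false_alarm_share(real_vulns: int, spurious_warnings: int) -> float:
    """Fraction of reported warnings that are false alarms.

    Illustrative assumption: the tool reports every real vulnerability
    plus a fixed number of spurious warnings.
    """
    total = real_vulns + spurious_warnings
    return spurious_warnings / total if total else 0.0


# Hypothetical scan history: 20 spurious warnings per scan while the
# number of real vulnerabilities drops as they are fixed.
for real in (80, 20, 5):
    print(real, round(false_alarm_share(real, 20), 2))
```

With the tool unchanged, the share of warnings that are false alarms grows as the program becomes more secure, which is the situation the data set is meant to help with.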
Disclosure of Invention
In view of the shortcomings of the prior art, the technical problem to be solved by the invention is that, in intelligent code vulnerability detection, the original data set cannot be expanded, which limits the accuracy of false alarm detection.
To solve this problem, the invention adopts the following technical scheme: a data set expansion method for intelligent code vulnerability detection, comprising the following steps:
1) sending the original code to an automated vulnerability detection tool for detection;
2) handing the original code to a tester for vulnerability labeling;
3) comparing the detection results with the labels, determining which reported vulnerabilities are false alarms, and constructing an initial code vulnerability data set;
4) using a machine learning model to learn the relation between vulnerability code and whether it is a false alarm;
5) processing unlabeled code with the trained machine learning model;
6) submitting the vulnerability code that the model identifies as a false alarm to a tester for review;
7) adding the vulnerability and the review result to the previously constructed code vulnerability data set.
By means of this technical scheme, the invention provides a method for expanding a code vulnerability data set. The original data set can be expanded continuously while the false alarm detection model is in use, the model can then be retrained iteratively, and higher accuracy is obtained in later detection.
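Steps 5 to 7 can be sketched as a small loop. This is a minimal illustration, not the patented implementation: the `Record` type, the function `expand_dataset`, and both callables are hypothetical names, and the model and the tester are represented by plain functions. The sketch stores the review result in either case, following step 7.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Record:
    snippet: str          # code segment flagged by the detection tool
    vuln_type: str        # vulnerability type taken from the tool report
    is_false_alarm: bool  # tester's verdict after review


def expand_dataset(
    dataset: List[Record],
    flagged: Iterable[Tuple[str, str]],                  # (snippet, vuln_type) pairs
    predicts_false_alarm: Callable[[str], bool],         # trained model (step 5)
    tester_confirms_false_alarm: Callable[[str], bool],  # manual review (step 6)
) -> List[Record]:
    """Screen tool findings with the model, review them, store the result."""
    for snippet, vuln_type in flagged:
        if not predicts_false_alarm(snippet):
            continue  # the model does not suspect a false alarm
        verdict = tester_confirms_false_alarm(snippet)
        dataset.append(Record(snippet, vuln_type, verdict))  # step 7
    return dataset


# Stand-in model and tester, for illustration only.
data = expand_dataset(
    [],
    [("strcpy(buf, src);", "buffer-overflow"), ("int x = 0;", "null-deref")],
    predicts_false_alarm=lambda s: "strcpy" not in s,
    tester_confirms_false_alarm=lambda s: True,
)
print(len(data))
```

The expanded data set can then be used to retrain the model, which is the iterative training the scheme refers to.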
Drawings
FIG. 1 is an overall flow chart of the present invention.
Detailed Description
To explain the technical content, the objectives achieved, and the final results of the invention in detail, a specific embodiment is described below:
1) Run a code vulnerability detection tool for automated detection, and obtain the vulnerability types and locations from its detection report.
2) Extract the code segments that may contain vulnerabilities, and have a tester judge whether each vulnerability exists.
3) Compare the tool's detection results with the tester's judgments. If they agree, the detection is considered correct; if a vulnerability reported by the tool is not marked as a vulnerability by the tester, it is considered a false alarm. The vulnerability source code segments and the judgments of whether they are false alarms are then combined to construct an initial code vulnerability false alarm data set.
4) Train a machine learning algorithm on this data set to learn the relation between the vulnerability source code text and whether a false alarm occurs, obtaining a trained model.
5) After the code vulnerability detection tool has been run on other code, process the vulnerability code indicated in its detection report with the trained model. If the model identifies a code segment as a false alarm, hand it to a tester for judgment; if the tester judges that it contains no vulnerability, mark it as a false alarm and add it to the data set.
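Steps 3 and 4 of the embodiment can be sketched as follows. The patent does not name a learning algorithm or a feature representation; this sketch assumes a bag-of-tokens multinomial naive Bayes classifier, chosen only because it fits in a few lines of standard-library Python. All names (`tokenize`, `build_initial_dataset`, `NaiveBayesFalseAlarm`) and the sample snippets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(code: str) -> list:
    """Split code text into identifiers and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\S", code)


def build_initial_dataset(tool_findings, tester_says_vulnerable):
    """Step 3: a tool finding the tester does not confirm is a false alarm."""
    return [(s, not tester_says_vulnerable(s)) for s in tool_findings]


class NaiveBayesFalseAlarm:
    """Step 4 stand-in: naive Bayes over code tokens with add-one smoothing."""

    def fit(self, dataset):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        for snippet, is_false_alarm in dataset:
            self.docs[is_false_alarm] += 1
            self.counts[is_false_alarm].update(tokenize(snippet))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, label):
        total = sum(self.counts[label].values()) + len(self.vocab)
        prior = (self.docs[label] + 1) / (sum(self.docs.values()) + 2)
        return math.log(prior) + sum(
            math.log((self.counts[label][t] + 1) / total)
            for t in tokens
            if t in self.vocab  # tokens never seen in training are ignored
        )

    def predict(self, snippet: str) -> bool:
        """Return True if the snippet looks like a false alarm."""
        tokens = tokenize(snippet)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical tool findings and tester verdicts.
findings = [
    "strcpy(buf, user_input);",
    "strncpy(buf, src, sizeof(buf));",
    "gets(line);",
    "fgets(line, sizeof(line), stdin);",
]
confirmed = {"strcpy(buf, user_input);", "gets(line);"}
dataset = build_initial_dataset(findings, lambda s: s in confirmed)
model = NaiveBayesFalseAlarm().fit(dataset)
print(model.predict("fgets(name, sizeof(name), stdin);"))
```

A four-example data set is far too small for a real model; it only shows how the tool output, the tester's verdicts, and the training step fit together.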

Claims (4)

1. A data collection method for intelligent code vulnerability detection, characterized in that an initial code vulnerability data set is constructed by combining the results of a code vulnerability detection tool with the judgment of a tester; a machine learning model for false alarm judgment is then trained on the initial code vulnerability data set; finally, vulnerabilities that are false alarms are determined by combining the judgment of the machine learning model with the judgment of the tester, and are added to the code vulnerability data set.
2. The data collection method for intelligent code vulnerability detection according to claim 1, characterized in that the results of the code vulnerability detection tool are combined with the judgment of the tester as follows: first, the detection reports of several different code vulnerability detection tools are integrated as the final tool detection result; then the tester judges each vulnerability detected by the tools, and if the tester judges that it is not a vulnerability, a false alarm is recorded; the data items of the data set include: vulnerability code segment, vulnerability type, and whether it is a false alarm.
3. The data collection method for intelligent code vulnerability detection according to claim 1, characterized in that the code vulnerability data set of claim 2 is used to train a machine learning model for false alarm judgment; after training, the model can predict, for a newly given code segment, whether it is a false alarm.
4. The data collection method for intelligent code vulnerability detection according to claim 1, characterized in that vulnerabilities that are false alarms are determined by combining the judgment of the machine learning model with the judgment of the tester: each code segment that may contain a vulnerability is first input into the machine learning model of claim 3, which judges whether it is a false alarm produced during vulnerability detection; if the model judges it to be a false alarm, the code segment is handed to a tester for inspection; if inspection confirms that the code segment is not a real vulnerability, it is added to the initial data set.
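The integration of several tool reports in claim 2 could be sketched as a deduplicated union. The claim does not say how reports are integrated, so this is one assumed reading; the `Finding` fields and the function `merge_reports` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Finding(NamedTuple):
    file: str
    line: int
    vuln_type: str


def merge_reports(reports: Iterable[Iterable[Finding]]) -> List[Finding]:
    """Union of all tools' findings, with exact duplicates kept once."""
    seen = set()
    merged = []
    for report in reports:
        for finding in report:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)
    return merged


# Hypothetical reports from two tools that overlap on one finding.
tool_a = [Finding("auth.c", 42, "buffer-overflow")]
tool_b = [Finding("auth.c", 42, "buffer-overflow"), Finding("io.c", 7, "null-deref")]
print(len(merge_reports([tool_a, tool_b])))
```

Taking the union favors coverage, so more candidate findings reach the tester; an intersection would reduce the review load at the cost of missing findings only one tool reports.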
CN202010487163.2A 2020-05-27 2020-05-27 Data collection method for code vulnerability intelligent detection Pending CN113742731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010487163.2A CN113742731A (en) 2020-05-27 2020-05-27 Data collection method for code vulnerability intelligent detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010487163.2A CN113742731A (en) 2020-05-27 2020-05-27 Data collection method for code vulnerability intelligent detection

Publications (1)

Publication Number Publication Date
CN113742731A (en) 2021-12-03

Family

ID=78727931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010487163.2A Pending CN113742731A (en) 2020-05-27 2020-05-27 Data collection method for code vulnerability intelligent detection

Country Status (1)

Country Link
CN (1) CN113742731A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942497A (en) * 2013-09-11 2014-07-23 杭州安恒信息技术有限公司 Forensics type website vulnerability scanning method and system
CN104486141A (en) * 2014-11-26 2015-04-01 国家电网公司 Misdeclaration self-adapting network safety situation predication method
CN110517469A (en) * 2019-08-08 2019-11-29 武汉兴图新科电子股份有限公司 A kind of intelligent alarm convergence method suitable for audio-video convergence platform
CN110753047A (en) * 2019-10-16 2020-02-04 杭州安恒信息技术股份有限公司 Method for reducing false alarm of vulnerability scanning
CN110929267A (en) * 2019-11-29 2020-03-27 深信服科技股份有限公司 Code vulnerability detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination