CN116663019B - Source code vulnerability detection method, device and system - Google Patents

Publication number
CN116663019B
CN116663019B CN202310823880.1A CN202310823880A CN116663019B CN 116663019 B CN116663019 B CN 116663019B CN 202310823880 A CN202310823880 A CN 202310823880A CN 116663019 B CN116663019 B CN 116663019B
Authority
CN
China
Prior art keywords
ast
source code
cnn model
code
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310823880.1A
Other languages
Chinese (zh)
Other versions
CN116663019A (en
Inventor
索雯琪
胡雨涛
吴月明
李珍
邹德清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202310823880.1A
Publication of CN116663019A
Application granted
Publication of CN116663019B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Virology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a source code vulnerability detection method, device and system, belonging to the technical field of information security. The method comprises the following steps: performing static analysis on the code segments in the training set to obtain the corresponding enhanced ASTs, and converting each enhanced AST into a grayscale image corresponding to its state probability matrix; training an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model; converting the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST; and inputting the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result. By statically analyzing the code and extending the AST, the application retains the syntactic and semantic information of the program more completely and comprehensively. The AST is converted into an image that represents its patterns while the program structure information is retained, and the trained CNN model is then used to detect vulnerabilities, so that detection efficiency can be improved and multiple programming languages can be supported.

Description

Source code vulnerability detection method, device and system
Technical Field
The application belongs to the technical field of information security, and particularly relates to a method, a device and a system for detecting source code vulnerabilities.
Background
In recent years, network security incidents such as hacker reconnaissance, botnet attacks and user information leakage have occurred frequently. Software systems are an important component of cyberspace, and their vulnerabilities pose a serious security threat to it. According to National Vulnerability Database (NVD) statistics, the number of vulnerabilities worldwide keeps increasing: the number of security vulnerabilities disclosed in 2021 reached 20137, and the growth rate is also rising. Automated attack and defense has gradually become a research trend, and within it the discovery and mining of vulnerabilities is the most basic stage. Actively discovering the security vulnerabilities of a system is therefore of great significance for attack and defense.
Common vulnerability detection methods convert the code into an intermediate representation in order to learn code representations. According to how the source code is converted, existing research can be divided into four types: text-based detection, token-based detection, syntax-tree-based detection, and graph-based detection. Text-based deep learning vulnerability detection takes the code text directly as input, but cannot accurately capture the semantic information of the program. Token-based deep learning vulnerability detection divides each line of code into a token sequence according to lexical rules, but still treats the source code as plain text and lacks program semantics and context information. Syntax-tree-based deep learning vulnerability detection represents code by its syntactic structure, such as a parse tree or an Abstract Syntax Tree (AST), which provides more accurate syntax information, but tree analysis is very complex and costly. Graph-based deep learning vulnerability detection describes source code with graphs (PDG, CFG), in which nodes represent statements or identifier separators and edges represent control or data dependence, so the syntactic and semantic information of the program can be retained completely and comprehensively. However, graph analysis is time-consuming and difficult to scale. Moreover, generating some graphs (such as the PDG) requires compilation and supports only C/C++, so it is not applicable to other languages.
Therefore, existing intelligent vulnerability detection methods cannot be applied to large-scale real-world software, mainly because of the following two defects: 1) efficiency and accuracy are difficult to achieve at the same time; 2) generally only one programming language is supported, so the methods are not applicable to other languages.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the application provides a source code vulnerability detection method, device and system. The aim is to extend the AST through static analysis of the code, so that the syntactic and semantic information of the program is retained more completely and comprehensively, and to convert the AST into an image that represents its patterns while retaining the program structure information, so that a trained CNN model can be used for vulnerability detection, which can improve detection efficiency and support multiple programming languages. This addresses the technical problems that efficiency and accuracy are difficult to achieve together and that compatibility is poor when vulnerability detection methods are applied to large-scale real-world software.
To achieve the above object, according to one aspect of the present application, there is provided a source code vulnerability detection method, including:
Training phase:
S1: for the code segments in the training set, obtaining the corresponding enhanced abstract syntax tree (AST) through static analysis;
S2: converting the enhanced AST of each code segment in the training set into a grayscale image corresponding to the state probability matrix of the enhanced AST;
S3: training an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model;
Detection phase:
S4: converting the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST, according to steps S1 and S2;
S5: inputting the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
In one embodiment, S1 includes:
generating the AST of each code segment in the training set through static analysis;
adding control flow and data flow to the AST of each code segment in the training set to obtain the enhanced AST of that code segment.
In one embodiment, the enhanced AST specifies the following types of edges, which represent data flow and control flow:
parent-child relationship: connecting a non-terminal node to all of its child nodes, according to the AST rules;
sibling relationship: connecting a node to its sibling node;
next token: connecting a terminal node to the next terminal node;
data flow: connecting the node where a variable is used to the node where it next appears;
control flow: edges representing the control flow of if, for and while statements, and edges representing statement order.
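The edge types listed above can be illustrated on a toy tree. The following is a minimal sketch, not the patented implementation: the tree for `x = x + y`, the node ids, the relation names and the function `enhanced_edges` are all hypothetical, the data-flow rule is simplified to "link a terminal to the next terminal with the same name", and control-flow edges are omitted because the toy statement has no branches.

```python
from collections import defaultdict

# Hypothetical toy tree for the statement `x = x + y` (ids and labels are illustrative).
labels = {0: "Assignment", 1: "x", 2: "BinaryOp", 3: "x", 4: "y"}
children = {0: [1, 2], 2: [3, 4]}  # node id -> ordered child ids


def enhanced_edges(labels, children, root=0):
    """Return typed edge lists that extend a plain tree (simplified sketch)."""
    edges = defaultdict(list)
    for parent, kids in children.items():
        # Parent-child: a non-terminal node is connected to all of its children.
        for kid in kids:
            edges["child"].append((parent, kid))
        # Sibling: each node is connected to its next sibling.
        for left, right in zip(kids, kids[1:]):
            edges["sibling"].append((left, right))

    # Collect terminal nodes in source order with a depth-first traversal.
    terminals, stack = [], [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if not kids:
            terminals.append(node)
        stack.extend(reversed(kids))

    # Next token: each terminal is connected to the following terminal.
    for left, right in zip(terminals, terminals[1:]):
        edges["next_token"].append((left, right))

    # Data flow (simplified assumption): a variable occurrence is connected
    # to the next occurrence of the same name.
    last_seen = {}
    for node in terminals:
        name = labels[node]
        if name in last_seen:
            edges["data_flow"].append((last_seen[name], node))
        last_seen[name] = node
    return dict(edges)


edges = enhanced_edges(labels, children)
for relation, pairs in edges.items():
    print(relation, pairs)
```

Keeping one edge list per relation type means that several relations between the same pair of nodes, here the next-token and data-flow edges between the two occurrences of `x`, can coexist without overwriting each other.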
In one embodiment, S2 includes:
S21: for each subtree in the enhanced AST, counting the information of the two nodes connected by each edge to obtain the number of times one state transitions to another; establishing an AST-based Markov chain model by counting all state transitions;
S22: generating a state transition matrix from the state transition counts recorded in the AST-based Markov chain model;
S23: converting the state transition matrix into a transition probability matrix, and converting the values in the transition probability matrix into gray values to obtain the corresponding grayscale image.
In one embodiment, S23 includes:
normalizing all data in the state transition matrix to determine the probability of each state transitioning to another state, thereby obtaining the transition probability matrix;
converting the values in the transition probability matrix into gray values to obtain the corresponding grayscale image.
In one embodiment, the state of each subtree includes: statement expressions, call statements, parameter lists, and identifiers.
In one embodiment, S5 includes:
inputting the grayscale image corresponding to the source code to be detected into the target CNN model;
if the vulnerability detection result output by the target CNN model is 1, the source code to be detected contains a vulnerability;
if the vulnerability detection result output by the target CNN model is 0, the source code to be detected does not contain a vulnerability.
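The text does not say how the raw CNN output is turned into the 0/1 result. One common arrangement, shown here purely as an assumption, is a two-class output layer whose scores are passed through a softmax, with the larger class probability deciding the label. The function names `softmax` and `detection_result` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def detection_result(logits):
    """Return 1 (vulnerable) or 0 (not vulnerable) from two class scores.

    Assumption: logits[0] scores the non-vulnerable class and logits[1]
    scores the vulnerable class.
    """
    probabilities = softmax(logits)
    return 1 if probabilities[1] > probabilities[0] else 0


print(detection_result([0.2, 1.5]))   # class 1 has the higher score
print(detection_result([2.0, -1.0]))  # class 0 has the higher score
```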
According to another aspect of the present application, there is provided a source code vulnerability detection apparatus, including:
the training module is used for acquiring a corresponding enhanced abstract syntax tree AST through static analysis aiming at the code fragments in the training set; converting the enhanced AST of the code segments in the training set into a gray level image corresponding to a state probability matrix of the enhanced AST; training an original CNN model by using the gray level image corresponding to the code segment in the training set to obtain a target CNN model;
the detection module is used for converting the source code to be detected into a gray image of a state probability matrix corresponding to the enhanced AST; and inputting the gray level image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
According to another aspect of the present application there is provided a source code vulnerability detection system comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.
According to another aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
In general, the above technical solutions conceived by the present application, compared with the prior art, enable the following beneficial effects to be obtained:
(1) In the source code vulnerability detection method for large-scale real-world software, the code is statically analyzed on the basis of the AST to extend the AST, so the syntactic and semantic information of the program can be retained completely and comprehensively. The AST is converted into an image that represents its patterns while the program structure information is retained, and the trained CNN model is then used to detect vulnerabilities, so detection efficiency can be improved and multiple programming languages can be supported. By analyzing and enhancing the AST, the application addresses the problems of detection efficiency and accuracy, and achieves fast and accurate large-scale vulnerability detection that supports multiple programming languages.
(2) The scheme makes full use of the code semantics and structure information in the AST nodes, and adds edges representing control flow, data flow and statement execution order to extend the AST into the enhanced AST, obtaining code features comparable to those of a graph in a short time. The semantic and syntactic information of the program is extracted as fully as possible while efficiency is ensured.
(3) The generated enhanced AST is expressed as a Markov chain and finally converted into a grayscale image. This represents the AST in a simpler form while maintaining the program structure information, fuses the AST information into an image, and makes CNN-based vulnerability classification more efficient. The tool used to extract the AST during static analysis, tree-sitter, is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It supports parsing multiple programming languages, including Python, Java and C. The method can therefore be applied to other languages and data sets with only a small amount of modification.
Drawings
Fig. 1 is a schematic diagram of a source code vulnerability detection method for large-scale real software according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a generation process of a source code corresponding enhanced AST according to an embodiment of the present application.
Fig. 3 is a schematic diagram of converting an enhanced AST into a grayscale image according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. In addition, the technical features of the embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, a source code vulnerability detection method is provided, which mainly includes two stages: a training phase and a detection phase.
The purpose of the training phase is to train a target CNN model that analyzes whether the grayscale image generated from the AST is suspicious. It mainly comprises three steps: obtaining the enhanced AST through static analysis; converting the enhanced AST into a state probability matrix and converting the matrix into a grayscale image; and training a CNN model with the grayscale images generated from the enhanced ASTs.
The purpose of the detection phase is to classify whether the source code to be detected contains a vulnerability: an output of 1 indicates a vulnerability, and an output of 0 indicates no vulnerability. First, the information of the two nodes connected by each edge in the AST is counted to obtain the number of times one state transitions to another, and an AST-based Markov chain model and its corresponding state transition matrix are established. The values in the transition probability matrix obtained from it are then converted into gray values to obtain the corresponding grayscale image. Finally, the trained CNN model is applied to the generated grayscale image to judge whether the code contains a vulnerability.
A convolutional neural network (CNN) is a type of neural network designed to process data with a grid-like structure, such as image data (which can be regarded as a two-dimensional grid of pixels). Unlike a fully connected layer, the neurons of adjacent layers in a CNN are not all directly connected; instead they are linked through a convolution kernel, and sharing the kernel greatly reduces the number of parameters in the hidden layers. A simple CNN is a sequence of layers, each of which transforms one quantity into another through a differentiable function. These layers mainly include the convolutional layer, the pooling layer and the fully connected layer.
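To make the three layer types concrete, the following is a minimal pure-Python sketch of a forward pass through one convolutional layer, one pooling layer and one fully connected unit. It is only an illustration of kernel sharing: the image, kernel, weights and function names are hypothetical, and the patent does not specify the architecture of its CNN model.

```python
def conv2d(image, kernel):
    """Slide one shared kernel over the image ('valid' cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise non-linearity applied after the convolution."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def dense(features, weights, bias=0.0):
    """Fully connected unit: every input has its own weight."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Hypothetical 5x5 "image" with a bright diagonal and a 2x2 diagonal detector.
image = [[1 if row == col else 0 for col in range(5)] for row in range(5)]
kernel = [[1, 0], [0, 1]]

pooled = max_pool(relu(conv2d(image, kernel)))
flat = [value for row in pooled for value in row]
score = dense(flat, [0.5, 0.5, 0.5, 0.5])
print(pooled, score)
```

The convolution here uses only the four kernel weights regardless of image size, whereas a fully connected layer over the same 5x5 input would need 25 weights per output unit, which is the parameter saving the paragraph above describes.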
An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language in the form of a tree, in which each node represents a construct in the source code. An abstract syntax tree is an ordered tree in which internal nodes are operators (e.g., "+" and "=") and leaf nodes are operands (e.g., constants and identifiers). It shows in detail how operands and operators make up the expressions and statements of the program, and thus shows the overall form of the program.
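As a small illustration of what such a tree looks like, the sketch below parses one assignment with Python's built-in `ast` module and lists the node types in depth-first order. This module is used only as a convenient stand-in; it is not the parser named in the patent, and the helper `describe` and the sample statement are hypothetical.

```python
import ast

source = "total = price + 1"
tree = ast.parse(source)


def describe(node, depth=0):
    """Return the node type names of the tree, indented by depth."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(describe(child, depth + 1))
    return lines


node_types = describe(tree)
print("\n".join(node_types))
```

In this particular grammar the assignment and the binary operation appear as internal nodes, while the names and the constant sit at or near the leaves; other parsers label and arrange nodes differently, which is why the state names used later depend on the parser chosen.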
In one embodiment, S1 comprises: generating the AST of each code segment in the training set through static analysis; and adding control flow and data flow to the AST of each code segment to obtain its enhanced AST.
The enhanced AST is constructed by adding several types of edges, representing different kinds of control flow and data flow, to the AST. This addresses the problem that the plain AST cannot make full use of the structural information of a code segment, in particular semantic information such as control flow and data flow. Control flow represents all paths that may be traversed during the execution of a program and reflects the run-time behavior of the process. Data flow gathers information about the properties of a particular data item by tracking the possible definitions and uses of the data. The enhanced AST of a program is represented as a directed multigraph, in which statements, code blocks or values are nodes, and direct relationships between two nodes (e.g., parent-child relationships and other relationships) are recorded as edges. Since a pair of nodes may be connected by several relationships, each type of relationship (nine in total) is recorded in its own relation graph. The node connectivity of each relation graph is encoded as an adjacency matrix. The graph representation of the enhanced AST is purely AST-based and can easily be extended to other programming languages.
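The idea of one relation graph per edge type, each encoded as an adjacency matrix, can be sketched as follows. The relation names, the three-node example and the function `adjacency_matrices` are hypothetical; only two of the relation types are shown.

```python
def adjacency_matrices(num_nodes, typed_edges):
    """Build one num_nodes x num_nodes 0/1 matrix per relation type."""
    matrices = {}
    for relation, pairs in typed_edges.items():
        matrix = [[0] * num_nodes for _ in range(num_nodes)]
        for source, target in pairs:
            matrix[source][target] = 1  # directed edge source -> target
        matrices[relation] = matrix
    return matrices


# Hypothetical three-node graph: node 0 is the parent of nodes 1 and 2.
typed_edges = {
    "child": [(0, 1), (0, 2)],
    "sibling": [(1, 2)],
}
matrices = adjacency_matrices(3, typed_edges)
for relation, matrix in matrices.items():
    print(relation, matrix)
```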
In one embodiment, the enhanced AST specifies the following types of edges, which represent data flow and control flow:
parent-child relationship: connecting a non-terminal node to all of its child nodes, according to the AST rules;
sibling relationship: connecting a node to its sibling node;
next token: connecting a terminal node to the next terminal node;
data flow: connecting the node where a variable is used to the node where it next appears;
control flow: edges representing the control flow of if, for and while statements, and edges representing statement order.
Fig. 2 illustrates the generation of an enhanced AST, taking code with a buffer overflow vulnerability as an example. As shown in Fig. 2, the enhanced AST specifies the edge types listed above to represent data flow; several other edges are used to represent control flow, namely edges representing the control flow of if, for and while statements and edges representing statement order. The enhanced AST is then converted into a state probability matrix, and the matrix into a grayscale image.
Fig. 3 is a schematic diagram of converting an enhanced AST into a grayscale image. In one embodiment, the grayscale image generation process is divided into three parts: generating a Markov chain, generating a state transition matrix, and generating a transition probability matrix, from which the corresponding grayscale image is finally produced.
In the process of generating the Markov chain, the information of the two nodes connected by each edge in the AST is first counted to obtain the number of times one state transitions to another. There are four states in total for the subtree as shown in Fig. 3: statement expressions, call statements, parameter lists, identifiers Assignment, operator, member Reference, and Identifier. From the direction of the edges in the AST, the number of transitions from the state "parameter list" to the state "identifier" is 3. By counting all state transitions, an AST-based Markov chain model is established.
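The counting step can be sketched with a small example that reproduces the count of 3 mentioned above. The subtree, the node ids and the function `count_transitions` are hypothetical; each edge is read as a transition from the state (node type) of its source node to the state of its target node.

```python
from collections import Counter

# Hypothetical subtree: a statement expression wrapping a call with three arguments.
node_state = {
    0: "statement expression",
    1: "call statement",
    2: "parameter list",
    3: "identifier",
    4: "identifier",
    5: "identifier",
}
edges = [(0, 1), (1, 2), (2, 3), (2, 4), (2, 5)]


def count_transitions(node_state, edges):
    """Count how often an edge leads from one state to another."""
    return Counter((node_state[src], node_state[dst]) for src, dst in edges)


transitions = count_transitions(node_state, edges)
print(transitions[("parameter list", "identifier")])
```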
A Markov chain (MC) is a stochastic process that moves from one state to another within a state space and is required to be "memoryless": the probability distribution of the next state is determined only by the current state and is independent of the events that preceded it in the sequence. This particular kind of memorylessness is known as the Markov property. Modeling with a Markov chain allows the states of events to be converted into a probability matrix. After the state transition matrix has been applied a sufficient number of times, a stable probability distribution can finally be obtained that is independent of the initial state distribution.
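In symbols, the memoryless property and the stable distribution described above can be written as follows. The notation ($X_t$ for the state at step $t$, $p_{ij}$ for a transition probability, $P$ for the matrix of the $p_{ij}$, $\pi$ for the stable distribution, $\mu_0$ for the initial distribution) is introduced here for illustration and does not appear in the original text. Convergence to $\pi$ from any starting point is assumed to hold under the usual conditions of an irreducible, aperiodic finite chain.

```latex
\[
\Pr\bigl(X_{t+1}=j \mid X_t=i,\ X_{t-1}=i_{t-1},\ \dots,\ X_0=i_0\bigr)
  = \Pr\bigl(X_{t+1}=j \mid X_t=i\bigr) = p_{ij},
\qquad \sum_{j} p_{ij} = 1 .
\]
\[
\pi = \pi P,
\qquad
\lim_{n\to\infty} \mu_0 P^{\,n} = \pi
\quad \text{for any initial distribution } \mu_0 .
\]
```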
In the process of generating the state transition matrix, the matrix is built from the state transition counts recorded in the previously generated Markov chain model. As shown in Fig. 3, according to the records in the Markov chain model, the number of transitions from the state "parameter list" to the state "identifier" is 3. For convenience, in the state matrix the letter A denotes the state "parameter list" and the letter I denotes the state "identifier", so the entry Matrix[A][I] of the state transition matrix is 3. The corresponding state transition matrix is generated in this way.
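A minimal sketch of this step is shown below. The letters A and I follow the text; the letters S and C for the other two states, the ordering of the states, the sample counts and the function `build_matrix` are assumptions made for illustration.

```python
# State letters: A (parameter list) and I (identifier) come from the text;
# S (statement expression) and C (call statement) are hypothetical labels.
STATES = ["S", "C", "A", "I"]

# Hypothetical transition counts, including the A -> I count of 3 from the text.
counts = {("S", "C"): 1, ("C", "A"): 1, ("A", "I"): 3}


def build_matrix(counts, states):
    """Place each transition count at row = source state, column = target state."""
    index = {state: position for position, state in enumerate(states)}
    matrix = [[0] * len(states) for _ in states]
    for (source, target), number in counts.items():
        matrix[index[source]][index[target]] = number
    return matrix


matrix = build_matrix(counts, STATES)
a, i = STATES.index("A"), STATES.index("I")
print(matrix[a][i])
```

Fixing the state order once and reusing it for every code segment keeps the matrices the same size and layout, so the resulting images are comparable across samples.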
In one embodiment, in the process of generating the transition probability matrix, all data in the state transition matrix are normalized to obtain the probability of each state transitioning to another state, which yields the corresponding transition probability matrix. The values are then converted into gray values to obtain the corresponding grayscale image. Finally, the grayscale images obtained from the whole training set are input into the CNN model to obtain the trained CNN model.
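The normalization and graying steps can be sketched as follows. Two choices here are assumptions, since the text does not fix them: each row is normalized by its own sum (the usual convention for a Markov transition matrix), and a probability p is mapped to the 8-bit gray value round(255 * p). The sample matrix and the function names are hypothetical.

```python
def to_probabilities(matrix):
    """Row-normalize counts so each non-empty row sums to 1 (assumed convention)."""
    probabilities = []
    for row in matrix:
        total = sum(row)
        probabilities.append([value / total if total else 0.0 for value in row])
    return probabilities


def to_gray(probabilities):
    """Map each probability in [0, 1] to an 8-bit gray value (assumed mapping)."""
    return [[round(p * 255) for p in row] for row in probabilities]


# Hypothetical 4x4 state transition matrix of counts.
matrix = [
    [0, 1, 3, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]

probabilities = to_probabilities(matrix)
gray = to_gray(probabilities)
for row in gray:
    print(row)
```

A row with no outgoing transitions is left as zeros rather than divided by zero, which renders as black pixels in the image.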
According to another aspect of the present application, there is provided a source code vulnerability detection apparatus, including:
the training module is used for acquiring a corresponding enhanced abstract syntax tree AST through static analysis aiming at the code fragments in the training set; converting the enhanced AST of the code segments in the training set into a gray level image corresponding to the state probability matrix of the enhanced AST; training an original CNN model by using gray images corresponding to code segments in a training set to obtain a target CNN model;
the detection module is used for converting the source code to be detected into a gray image of a state probability matrix corresponding to the enhanced AST; and inputting the gray level image corresponding to the source code to be detected into a target CNN model to obtain a vulnerability detection result.
According to another aspect of the present application there is provided a source code vulnerability detection system comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method described above when executing the computer program.
According to another aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the application and is not intended to limit the application; any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (7)

1. A method for detecting source code vulnerabilities, comprising:
Training phase:
S1: for the code segments in the training set, obtaining the corresponding enhanced abstract syntax tree AST through static analysis;
S2: converting the state probability matrix corresponding to the enhanced abstract syntax tree AST of each code segment in the training set into a grayscale image;
S3: training an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model;
Detection phase:
S4: converting the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced abstract syntax tree AST, according to steps S1 and S2;
S5: inputting the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result;
the S1 comprises the following steps: generating AST of the code fragments in the training set through static analysis; adding a control stream and a data stream to the AST of the code segments in the training set to obtain an enhanced abstract syntax tree AST of the code segments in the training set;
the step S2 comprises the following steps: s21: counting the information of two nodes connected by one edge of each sub tree in the enhanced abstract syntax tree AST to obtain the times of transferring one state to the other state; establishing a Markov chain model based on an enhanced abstract syntax tree AST by counting all state transition conditions; s22: generating a state transition matrix according to the state transition times recorded in the Markov chain model based on the enhanced abstract syntax tree AST; s23: converting the state transition matrix into a transition probability matrix, and graying values in the transition probability matrix to obtain a corresponding gray image;
the S23 includes: normalizing all data in the state transition matrix to determine the probability of one state transition to another state, and finally obtaining a transition probability matrix; and graying the values in the transition probability matrix to obtain a corresponding gray image.
2. The source code vulnerability detection method of claim 1, wherein the enhanced abstract syntax tree (AST) specifies the following types of edges for representing data flow and control flow:
parent-child relationship: connecting each non-terminal node to all of its child nodes according to the AST rules;
sibling relationship: connecting a node to its sibling node;
next token: connecting a terminal node to the next terminal node;
data flow: connecting a node where a variable is used to the node where that variable next appears;
control flow: edges representing the control flow of if, for, and while statements, and edges representing statement order.
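The edge types listed above can be sketched in Python on a toy tree. This is an illustration under stated assumptions, not the patented procedure: the `Node` class and `enhanced_edges` function are hypothetical, sibling edges are assumed to link each node to its next sibling, data-flow edges are matched by identifier name only (scoping is ignored), and control-flow edges are omitted because they depend on the statement grammar of the language being analyzed.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # node type, e.g. "assign" or "identifier"
    text: str = ""                 # token text, used for terminal nodes
    children: list = field(default_factory=list)

def terminals(node):
    """Yield terminal (leaf) nodes in left-to-right order."""
    if not node.children:
        yield node
    for child in node.children:
        yield from terminals(child)

def enhanced_edges(root):
    """Return (edge_type, source, target) tuples for a toy enhanced AST."""
    edges = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Parent-child: a non-terminal node is connected to all its children.
        for child in node.children:
            edges.append(("child", node, child))
        # Sibling: each child is connected to its next sibling (assumption).
        for left, right in zip(node.children, node.children[1:]):
            edges.append(("sibling", left, right))
        stack.extend(reversed(node.children))
    # Next token: each terminal node is connected to the following one.
    leaves = list(terminals(root))
    for current, following in zip(leaves, leaves[1:]):
        edges.append(("next_token", current, following))
    # Data flow: a use of a variable is connected to its next appearance.
    last_seen = {}
    for leaf in leaves:
        if leaf.kind == "identifier":
            if leaf.text in last_seen:
                edges.append(("data_flow", last_seen[leaf.text], leaf))
            last_seen[leaf.text] = leaf
    return edges

# Toy tree for the two statements "x = 1; y = x;".
root = Node("block", children=[
    Node("assign", children=[Node("identifier", "x"), Node("literal", "1")]),
    Node("assign", children=[Node("identifier", "y"), Node("identifier", "x")]),
])

summary = Counter(edge_type for edge_type, _, _ in enhanced_edges(root))
print(dict(summary))
```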
3. The source code vulnerability detection method of claim 1, wherein the state of each subtree comprises: statement expressions, call statements, parameter lists, and identifiers.
4. The source code vulnerability detection method of any one of claims 1-3, wherein S5 comprises:
inputting the grayscale image corresponding to the source code to be detected into the target CNN model;
wherein a vulnerability detection result of 1 output by the target CNN model indicates that the source code to be detected contains a vulnerability;
and a vulnerability detection result of 0 output by the target CNN model indicates that the source code to be detected contains no vulnerability.
5. A source code vulnerability detection apparatus for performing the source code vulnerability detection method of any one of claims 1-4, comprising:
a training module configured to: obtain, through static analysis, the enhanced abstract syntax tree (AST) of each code fragment in the training set; convert the enhanced AST of each code fragment in the training set into the grayscale image corresponding to the state probability matrix of that AST; and train an original CNN model with the grayscale images corresponding to the code fragments in the training set to obtain a target CNN model;
a detection module configured to: convert the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST; and input the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
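The division of labor between the two modules can be sketched in Python as follows. Everything here is hypothetical scaffolding: `TrainingModule`, `DetectionModule`, `toy_image`, and `NearestImageModel` are illustrative names, the image function is a placeholder for steps S1 and S2, and the classifier is a trivial stand-in rather than a CNN, included only so the sketch runs end to end.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]

class TrainingModule:
    """Turns labeled code fragments into images and fits the model."""
    def __init__(self, to_image: Callable[[str], Image], model):
        self.to_image = to_image
        self.model = model

    def train(self, fragments: Sequence[str], labels: Sequence[int]):
        images = [self.to_image(code) for code in fragments]
        self.model.fit(images, list(labels))
        return self.model

class DetectionModule:
    """Applies the same image conversion, then queries the trained model."""
    def __init__(self, to_image: Callable[[str], Image], model):
        self.to_image = to_image
        self.model = model

    def detect(self, source: str) -> int:
        return self.model.predict(self.to_image(source))

class NearestImageModel:
    """Placeholder classifier (NOT a CNN): nearest training image by pixel sum."""
    def fit(self, images, labels):
        self.samples = [(sum(map(sum, img)), lab) for img, lab in zip(images, labels)]

    def predict(self, image):
        total = sum(map(sum, image))
        return min(self.samples, key=lambda sample: abs(sample[0] - total))[1]

def toy_image(code: str) -> Image:
    # Placeholder for steps S1 and S2; a real system would build the
    # enhanced AST and its transition probability matrix here.
    return [[len(code) % 256]]

trainer = TrainingModule(toy_image, NearestImageModel())
model = trainer.train(["strcpy(buf, input);", "return 0;"], [1, 0])
detector = DetectionModule(toy_image, model)
result = detector.detect("strcpy(dst, src);")
# Claim 4: 1 means a vulnerability is present, 0 means none was found.
print("vulnerable" if result == 1 else "not vulnerable")
```

Passing the same conversion function to both modules reflects step S4, which requires the source code under test to be converted by the same steps as the training data.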
6. A source code vulnerability detection system comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1-4.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.
CN202310823880.1A 2023-07-06 2023-07-06 Source code vulnerability detection method, device and system Active CN116663019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310823880.1A CN116663019B (en) 2023-07-06 2023-07-06 Source code vulnerability detection method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310823880.1A CN116663019B (en) 2023-07-06 2023-07-06 Source code vulnerability detection method, device and system

Publications (2)

Publication Number Publication Date
CN116663019A CN116663019A (en) 2023-08-29
CN116663019B CN116663019B (en) 2023-10-24

Family

ID=87724225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310823880.1A Active CN116663019B (en) 2023-07-06 2023-07-06 Source code vulnerability detection method, device and system

Country Status (1)

Country Link
CN (1) CN116663019B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435246B (en) * 2023-12-14 2024-03-05 四川大学 Code clone detection method based on Markov chain model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110661778A (en) * 2019-08-14 2020-01-07 中国电力科学研究院有限公司 Method and system for testing industrial control network protocol based on reverse analysis fuzzy
CN113779590A (en) * 2021-09-16 2021-12-10 中国民航大学 Source code vulnerability detection method based on multi-dimensional representation
CN115146282A (en) * 2022-08-31 2022-10-04 中国科学院大学 AST-based source code anomaly detection method and device
CN115600200A (en) * 2022-10-16 2023-01-13 武汉纺织大学(Cn) Android malicious software detection method based on entropy spectrum density and adaptive contraction convolution
CN115913655A (en) * 2022-10-28 2023-04-04 华中科技大学 Shell command injection detection method based on flow analysis and semantic analysis
CN116209997A (en) * 2020-09-28 2023-06-02 埃森哲环球解决方案有限公司 System and method for classifying software vulnerabilities
CN116361797A (en) * 2023-03-28 2023-06-30 山东省计算中心(国家超级计算济南中心) Malicious code detection method and system based on multi-source collaboration and behavior analysis

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370473A1 (en) * 2018-05-30 2019-12-05 Nvidia Corporation Detecting vulnerabilities to fault injection in computer code using machine learning
US11568055B2 (en) * 2019-08-23 2023-01-31 Praetorian System and method for automatically detecting a security vulnerability in a source code using a machine learning model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
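Function-level code vulnerability detection method using a graph neural network based on enhanced AST; 顾守珂 et al.; 《计算机科学》 (Computer Science); full text *
Buffer overflow vulnerability detection method based on source code analysis; 尹茗; 张功萱; 江苏大学学报(自然科学版) (Journal of Jiangsu University, Natural Science Edition) (04); full text *
IaaS-layer vulnerability mining method based on adaptive fuzzing; 沙乐天; 肖甫; 杨红柯; 喻辉; 王汝传; 软件学报 (Journal of Software) (05); full text *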

Also Published As

Publication number Publication date
CN116663019A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN110245496B (en) Source code vulnerability detection method and detector and training method and system thereof
CN111639344B (en) Vulnerability detection method and device based on neural network
CN108647520B (en) Intelligent fuzzy test method and system based on vulnerability learning
CN113360915B (en) Intelligent contract multi-vulnerability detection method and system based on source code diagram representation learning
CN113961922B (en) Malicious software behavior detection and classification system based on deep learning
CN112311780B (en) Method for generating multi-dimensional attack path and attack graph
Sethi et al. DLPaper2Code: Auto-generation of code from deep learning research papers
Lin et al. Deep structured scene parsing by learning with image descriptions
CN116663019B (en) Source code vulnerability detection method, device and system
CN111124487A (en) Code clone detection method and device and electronic equipment
CN109871686A (en) Rogue program recognition methods and device based on icon representation and software action consistency analysis
CN113297580B (en) Code semantic analysis-based electric power information system safety protection method and device
CN113611356A (en) Drug relocation prediction method based on self-supervision graph representation learning
CN116756327A (en) Threat information relation extraction method and device based on knowledge inference and electronic equipment
CN116340952A (en) Intelligent contract vulnerability detection method based on operation code program dependency graph
CN113760358A (en) Countermeasure sample generation method for source code classification model
CN117370980A (en) Malicious code detection model generation and detection method, device, equipment and medium
CN117272142A (en) Log abnormality detection method and system and electronic equipment
CN117201082A (en) Network intrusion detection method integrating textCNN and GAN
CN116841869A (en) Java code examination comment generation method and device based on code structured information and examination knowledge
CN116578985A (en) Intelligent contract vulnerability detection method based on model independent element learning
CN113420127A (en) Threat information processing method, device, computing equipment and storage medium
CN115454473A (en) Data processing method based on deep learning vulnerability decision and information security system
Su et al. Modeling regex operators for solving regex crossword puzzles
CN111723301A (en) Attention relation identification and labeling method based on hierarchical theme preference semantic matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant