CN116663019A - Source code vulnerability detection method, device and system - Google Patents
Source code vulnerability detection method, device and system
- Publication number
- CN116663019A (application CN202310823880.1A)
- Authority
- CN
- China
- Prior art keywords
- ast
- source code
- enhanced
- code
- vulnerability detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a source code vulnerability detection method, device and system in the technical field of information security. The method comprises the following steps: performing static analysis on the code segments in a training set to obtain the corresponding enhanced AST, and converting each enhanced AST into a grayscale image corresponding to its state probability matrix; training an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model; converting the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST; and inputting that grayscale image into the target CNN model to obtain a vulnerability detection result. Because the code is analyzed statically and the AST is extended, the syntactic and semantic information of the program is retained more completely. Because the AST is converted into an image that preserves the program structure and the image is then classified by the trained CNN model, detection efficiency is improved and multiple programming languages can be supported.
Description
Technical Field
The application belongs to the technical field of information security, and particularly relates to a method, a device and a system for detecting source code vulnerabilities.
Background
In recent years, network security incidents such as hacker reconnaissance, botnet attacks and user information leakage have occurred frequently. Software systems are an important component of cyberspace, and their vulnerabilities pose a serious security threat to it. According to National Vulnerability Database (NVD) statistics, the number of vulnerabilities worldwide keeps increasing: the number of security vulnerabilities disclosed in 2021 reached 20,137, and the growth rate is also rising. Automated attack and defense has gradually become a research trend, and within it the discovery and mining of vulnerabilities is the most basic stage. Actively discovering the security vulnerabilities of a system is therefore of great significance for attack and defense.
Common vulnerability detection methods convert the code into an intermediate representation in order to learn a representation of the code. According to how the source code is converted, existing research can be divided into four types: text-based detection, token-based detection, syntax-tree-based detection, and graph-based detection. Text-based deep learning vulnerability detection uses the code text directly as input, but cannot accurately capture the semantic information of the program. Token-based deep learning vulnerability detection divides each line of code into a token sequence according to lexical rules, but still treats the source code as plain text and lacks program semantics and context information. Syntax-tree-based deep learning vulnerability detection represents code by its syntactic structure, such as a parse tree or an abstract syntax tree (AST), which provides more accurate syntax information, but tree analysis is complex and costly. Graph-based deep learning vulnerability detection describes source code with graphs such as the program dependence graph (PDG) and the control flow graph (CFG), in which nodes represent statements, identifiers or separators and edges represent control or data dependence, so the syntactic and semantic information of the program can be retained completely. However, graph analysis is time-consuming and difficult to scale. In addition, generating some graphs (such as the PDG) requires compilation, supports only C/C++, and is not applicable to other languages.
As a result, existing intelligent vulnerability detection methods cannot be applied to large-scale real-world software, mainly because of two defects: 1) efficiency and accuracy are difficult to achieve at the same time; 2) usually only one programming language is supported, so the method cannot be used to detect vulnerabilities in other languages.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the application provides a source code vulnerability detection method, device and system. The AST is extended through static analysis of the code, so that the syntactic and semantic information of the program is retained more completely. The AST is then converted into an image that preserves the program structure information, and a trained CNN model performs vulnerability detection on that image, which improves detection efficiency and allows multiple programming languages to be supported. This solves the technical problems that efficiency and accuracy are difficult to achieve together and that compatibility is poor when vulnerability detection methods are applied to large-scale real-world software.
To achieve the above object, according to one aspect of the present application, there is provided a source code vulnerability detection method, including:
Training phase:
S1: for the code segments in the training set, obtaining the corresponding enhanced abstract syntax tree (AST) through static analysis;
S2: converting the enhanced AST of each code segment in the training set into a grayscale image corresponding to the state probability matrix of the enhanced AST;
S3: training an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model;
Detection phase:
S4: converting the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST, following steps S1 and S2;
S5: inputting the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
In one embodiment, S1 includes:
generating the AST of each code segment in the training set through static analysis;
and adding control flow and data flow edges to the AST of each code segment in the training set to obtain the enhanced AST of that code segment.
In one embodiment, the enhanced AST specifies the following edge types, which include edges representing data flow and control flow:
parent-child relationship: following the AST rules, connecting a non-terminal node to all of its child nodes;
sibling relationship: connecting a node to its sibling node;
next token: connecting a terminal node to the next terminal node;
data flow: connecting a node where a variable is used to the node where that variable next appears;
control flow: edges representing the control flow of if, for and while statements, and edges representing statement order.
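The data flow edge above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard `ast` module as a stand-in parser (not the tool named in the application), identifies a node by its (line, column) position, and the function name `data_flow_edges` is hypothetical.

```python
import ast
from collections import defaultdict


def data_flow_edges(source):
    """Link every occurrence of a variable to its next occurrence in source order."""
    tree = ast.parse(source)
    occurrences = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Identify each occurrence by its (line, column) position.
            occurrences[node.id].append((node.lineno, node.col_offset))
    edges = []
    for name, positions in occurrences.items():
        positions.sort()  # ast.walk is breadth-first, so restore source order
        for here, following in zip(positions, positions[1:]):
            edges.append((name, here, following))
    return sorted(edges)


code = "x = 1\ny = x + 2\nprint(x, y)\n"
edges = data_flow_edges(code)
for edge in edges:
    print(edge)
```

A name that occurs only once (here `print`) produces no edge, since there is no next occurrence to connect to.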
In one embodiment, S2 includes:
S21: for each subtree in the enhanced AST, counting the information of the two nodes connected by each edge to obtain the number of times one state transitions to another state; establishing an AST-based Markov chain model by counting all state transitions;
S22: generating a state transition matrix according to the state transition counts recorded in the AST-based Markov chain model;
S23: converting the state transition matrix into a transition probability matrix, and converting the values in the transition probability matrix into gray values to obtain the corresponding grayscale image.
In one embodiment, S23 includes:
normalizing all data in the state transition matrix to determine the probability of transitioning from one state to another, thereby obtaining the transition probability matrix;
and converting the values in the transition probability matrix into gray values to obtain the corresponding grayscale image.
In one embodiment, the states of each subtree include: statement expressions, call statements, parameter lists, and identifiers.
In one embodiment, S5 includes:
inputting the grayscale image corresponding to the source code to be detected into the target CNN model;
if the vulnerability detection result output by the target CNN model is 1, the source code to be detected contains a vulnerability;
and if the vulnerability detection result output by the target CNN model is 0, the source code to be detected does not contain a vulnerability.
According to another aspect of the present application, there is provided a source code vulnerability detection apparatus, including:
a training module, configured to obtain, for the code segments in the training set, the corresponding enhanced abstract syntax tree (AST) through static analysis; to convert the enhanced AST of each code segment in the training set into a grayscale image corresponding to the state probability matrix of the enhanced AST; and to train an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model;
a detection module, configured to convert the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST, and to input the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
According to another aspect of the present application there is provided a source code vulnerability detection system comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.
According to another aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
In general, the above technical solutions conceived by the present application, compared with the prior art, enable the following beneficial effects to be obtained:
(1) In the source code vulnerability detection method for large-scale real-world software, static analysis based on the AST is used to extend the AST, so that the syntactic and semantic information of the program is retained completely. The AST is converted into an image that preserves the program structure information, and the trained CNN model then detects vulnerabilities in that image, which improves detection efficiency and allows multiple programming languages to be supported. By analyzing and enhancing the AST, the application addresses both detection efficiency and accuracy, and achieves fast and accurate large-scale vulnerability detection across multiple programming languages.
(2) The scheme makes full use of the code semantics and structure information in the AST nodes. Edges representing control flow, data flow and statement execution order are added to extend the AST into the enhanced AST, so that code features comparable to those of graph-based methods are obtained in a short time. The semantic and syntactic information of the program is extracted as fully as possible while efficiency is maintained.
(3) The generated enhanced AST is represented as a Markov chain and finally converted into a grayscale image. The AST is thus expressed in a simpler form while the program structure information is preserved, the AST information is fully fused into an image, and CNN-based classification makes vulnerability detection more efficient. The tool used in static analysis to extract the AST is tree-sitter, a parser generator tool and incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. It supports parsing of multiple programming languages, including Python, Java and C. The method therefore requires only a small amount of modification to be applied to other languages and datasets.
Drawings
Fig. 1 is a schematic diagram of a source code vulnerability detection method for large-scale real software according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the process of generating the enhanced AST corresponding to a piece of source code according to an embodiment of the present application.
Fig. 3 is a schematic diagram of converting the enhanced AST into a grayscale image according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. In addition, the technical features of the embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
As shown in Fig. 1, a source code vulnerability detection method is provided, which mainly includes two phases: a training phase and a detection phase.
The purpose of the training phase is to train a target CNN model that analyzes whether the grayscale image generated from an AST is suspicious. It mainly comprises three steps: obtaining the enhanced AST through static analysis; converting the enhanced AST into a state probability matrix and converting the matrix into a grayscale image; and training a CNN model with the grayscale images generated from the enhanced ASTs.
The purpose of the detection phase is to classify whether the code to be detected contains a vulnerability, where an output of 1 indicates a vulnerability and an output of 0 indicates no vulnerability. First, the information of the two nodes connected by each edge in the AST is counted to obtain the number of times one state transitions to another, and an AST-based Markov chain model and its corresponding state transition matrix are established. The values in the transition probability matrix obtained from it are then converted into gray values to obtain the corresponding grayscale image. Finally, the trained CNN model classifies the generated grayscale image to determine whether a vulnerability is present.
A convolutional neural network (CNN) is a type of neural network designed to process data with a grid-like structure, such as image data (which can be regarded as a two-dimensional grid of pixels). Unlike in a fully connected layer, the neurons of adjacent layers in a CNN are not all directly connected. Instead, they are connected through a convolution kernel, and because the kernel is shared across positions, the number of hidden-layer parameters is greatly reduced. A simple CNN is a sequence of layers, each of which transforms one volume of values into another through a differentiable function. These layers mainly include the convolutional layer, the pooling layer, and the fully connected layer.
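To make kernel sharing and pooling concrete, the following is a minimal pure-Python sketch of one convolution followed by one max-pooling step. It is not the CNN architecture of the application, which the text does not specify; the 5x5 input, the vertical-edge kernel and the function names are assumptions chosen for illustration.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # The same kernel weights are reused at every position (r, c).
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A 5x5 grayscale image with a bright vertical stripe in the middle column.
image = [[0, 0, 255, 0, 0] for _ in range(5)]
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # responds to left-to-right brightness changes

features = conv2d(image, kernel)
pooled = max_pool2d(features)
print(features)
print(pooled)
```

The kernel has nine weights regardless of the image size, whereas a fully connected layer mapping the 25 input pixels to the 9 outputs would need 225 weights.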
An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language in the form of a tree, in which each node represents a construct in the source code. An abstract syntax tree is an ordered tree whose internal nodes are operators (e.g., "+" and "=") and whose leaf nodes are operands (e.g., constants and identifiers). The abstract syntax tree shows in detail how operands and operators make up the expressions and statements of the program, and thus shows the overall form of the program.
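As a small illustration of operators as internal nodes and operands as leaves, the sketch below parses one assignment with Python's standard `ast` module. The module is used here only as a convenient stand-in parser, and the statement `total = price + tax` is an invented example.

```python
import ast

tree = ast.parse("total = price + tax")
assign = tree.body[0]

# Internal nodes: the assignment ("=") and the binary operation ("+").
print(type(assign).__name__)           # Assign
print(type(assign.value).__name__)     # BinOp
print(type(assign.value.op).__name__)  # Add

# Leaf operands: the identifiers on both sides.
target = assign.targets[0].id
operands = (assign.value.left.id, assign.value.right.id)
print(target, operands)
```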
In one embodiment, S1 comprises: generating the AST of each code segment in the training set through static analysis; and adding control flow and data flow edges to the AST of each code segment in the training set to obtain the enhanced AST of that code segment.
The enhanced AST is constructed by adding several types of edges, representing different kinds of control flow and data flow, to the AST. This addresses the problem that a plain AST cannot make full use of the structural information of a code segment, in particular semantic information such as control flow and data flow. Control flow represents all paths that may be traversed during the execution of a program and reflects how the program executes at run time. Data flow gathers information about the properties of a particular data item by tracking where the data may be defined and used. The enhanced AST of a program is presented as a directed multigraph, in which statements, code blocks or values are nodes, and direct relationships between two nodes (e.g., parent-child relationships and other relationships) are recorded as edges. Since a pair of nodes may be related in several ways, each type of relationship (nine relationships in total) is recorded in its own relationship graph. The node connectivity of each relationship graph is encoded as an adjacency matrix. The graph representation of the enhanced AST is based purely on the AST and can easily be extended to other programming languages.
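One way to picture "one relationship graph per edge type, each encoded as an adjacency matrix" is the sketch below. The relation names, the three-node example and the function `build_relation_matrices` are assumptions made for illustration; the text does not enumerate all nine relationships or prescribe a data layout.

```python
def build_relation_matrices(num_nodes, typed_edges):
    """Return one num_nodes x num_nodes adjacency matrix per relation type."""
    matrices = {}
    for relation, src, dst in typed_edges:
        matrix = matrices.setdefault(
            relation, [[0] * num_nodes for _ in range(num_nodes)]
        )
        matrix[src][dst] = 1  # directed edge src -> dst
    return matrices


# Hypothetical nodes: 0 = assignment, 1 = identifier, 2 = literal.
typed_edges = [
    ("parent_child", 0, 1),
    ("parent_child", 0, 2),
    ("sibling", 1, 2),
    ("next_token", 1, 2),  # same node pair as the sibling edge, different relation
]

matrices = build_relation_matrices(3, typed_edges)
for relation, matrix in matrices.items():
    print(relation, matrix)
```

Keeping a separate matrix per relation is what lets the pair (1, 2) carry both a sibling edge and a next-token edge without one overwriting the other.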
In one embodiment, the enhanced AST specifies the following edge types, which include edges representing data flow and control flow:
parent-child relationship: following the AST rules, connecting a non-terminal node to all of its child nodes;
sibling relationship: connecting a node to its sibling node;
next token: connecting a terminal node to the next terminal node;
data flow: connecting a node where a variable is used to the node where that variable next appears;
control flow: edges representing the control flow of if, for and while statements, and edges representing statement order.
Fig. 2 illustrates the generation of an enhanced AST, taking code with a buffer overflow vulnerability as an example. As shown in Fig. 2, the enhanced AST adds edges representing data flow, together with several other edges representing control flow: edges for the control flow of if, for and while statements and edges for statement order. The enhanced AST is then converted into a state probability matrix, and the matrix into a grayscale image.
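The three structural edge types (parent-child, sibling, next token) can be derived from a tree with a single traversal. The sketch below is an illustration only: the `Node` class, the labels, and the small tree for a call such as `strcpy(buf, src)` are invented for the example and are not taken from Fig. 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def structural_edges(root):
    """Collect parent-child, sibling and next-token edges in one pre-order walk."""
    parent_child, sibling, leaves = [], [], []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            leaves.append(node.label)  # terminal node, recorded in source order
        for child in node.children:
            parent_child.append((node.label, child.label))
        for left, right in zip(node.children, node.children[1:]):
            sibling.append((left.label, right.label))
        stack.extend(reversed(node.children))  # visit children left to right
    next_token = list(zip(leaves, leaves[1:]))
    return parent_child, sibling, next_token


# Hypothetical tree for the call strcpy(buf, src).
tree = Node("call", [
    Node("strcpy"),
    Node("args", [Node("buf"), Node("src")]),
])

parent_child, sibling, next_token = structural_edges(tree)
print(parent_child)
print(sibling)
print(next_token)
```

Data flow and control flow edges need more than the tree shape (variable occurrences and statement kinds), so they are left out of this sketch.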
Fig. 3 is a schematic diagram of converting the enhanced AST into a grayscale image. In one embodiment, the grayscale image generation process is divided into three parts: generating a Markov chain, generating a state transition matrix, and generating a transition probability matrix, from which the corresponding grayscale image is finally generated.
In the process of generating the Markov chain, the information of the two nodes connected by each edge in the AST is first counted to obtain the number of times one state transitions to another. As shown in Fig. 3, the subtree has four states in total: statement expressions, call statements, parameter lists and identifiers (Assignment, Operator, Member Reference, and Identifier). From the direction of the edges in the AST, the number of transitions from the parameter list state to the identifier state is 3. By counting all state transitions, an AST-based Markov chain model is established.
A Markov chain (MC) is a random process over a state space that moves from one state to another and is required to be "memoryless": the probability distribution of the next state is determined only by the current state, and is independent of the events that preceded it in the sequence. This kind of memorylessness is known as the Markov property. Modeling with a Markov chain allows the states of the events to be converted into a probability matrix. After the state transition matrix has been applied repeatedly, a stable probability distribution can be obtained that is independent of the initial state distribution (for a finite chain, this holds when the chain is irreducible and aperiodic).
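The claim that repeated transitions lead to a distribution independent of the starting point can be checked numerically. The following sketch uses an invented two-state transition matrix (not data from the application) and plain power iteration; the function names are hypothetical.

```python
def step(distribution, P):
    """One Markov step: new[j] = sum_i distribution[i] * P[i][j]."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]


def iterate(P, start, iterations=200):
    """Apply the transition matrix repeatedly to a starting distribution."""
    dist = start
    for _ in range(iterations):
        dist = step(dist, P)
    return dist


# Invented example: each row sums to 1; the chain is irreducible and aperiodic.
P = [[0.9, 0.1],
     [0.5, 0.5]]

from_state_0 = iterate(P, [1.0, 0.0])
from_state_1 = iterate(P, [0.0, 1.0])
print(from_state_0)
print(from_state_1)
```

Both runs approach the same distribution, (5/6, 1/6), which is the solution of pi = pi P for this matrix.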
In the process of generating the state transition matrix, the matrix is built from the state transition counts recorded in the previously generated Markov chain model. As shown in Fig. 3, according to the records in the Markov chain model, the number of transitions from the parameter list state to the identifier state is 3. For convenience, in the state transition matrix the letter A denotes the parameter list state and the letter I denotes the identifier state, so the entry Matrix[A][I] of the state transition matrix is 3. The full state transition matrix is generated in this way.
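The counting step can be sketched as follows. The letters A (parameter list) and I (identifier) follow the text; the letters S and C for the other two states, the edge list, and the function name are assumptions, with the edge list chosen so that the A-to-I count is 3 as in the example above.

```python
from collections import Counter

# Assumed letters: S = statement expression, C = call statement,
# A = parameter list, I = identifier (A and I follow the description).
STATES = ["S", "C", "A", "I"]


def transition_counts(edges):
    """Count how often each (source state, destination state) pair occurs."""
    counts = Counter(edges)
    return {src: {dst: counts[(src, dst)] for dst in STATES} for src in STATES}


# Hypothetical (parent state, child state) pairs read off the edges of one subtree.
edges = [("S", "C"), ("C", "I"), ("C", "A"), ("A", "I"), ("A", "I"), ("A", "I")]

Matrix = transition_counts(edges)
print(Matrix["A"]["I"])
for src in STATES:
    print(src, [Matrix[src][dst] for dst in STATES])
```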
In one embodiment, in the process of generating the transition probability matrix, all data in the state transition matrix are normalized to obtain the probability of transitioning from one state to another, which yields the corresponding transition probability matrix. These values are then converted into gray values to obtain the corresponding grayscale image. Finally, the grayscale images obtained from all code segments in the training set are input into the CNN model to obtain the trained CNN model.
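A minimal sketch of normalization and graying is given below. Two choices in it are assumptions, since the text does not fix them: each row is normalized by its own sum (the usual convention for Markov transition probabilities), and a probability p is mapped to the 8-bit gray value round(255 * p). The count matrix is an invented example.

```python
def to_probabilities(counts):
    """Row-normalize a count matrix; rows with no transitions stay all zero."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def to_grayscale(probabilities):
    """Map each probability in [0, 1] to an 8-bit gray value in [0, 255]."""
    return [[round(p * 255) for p in row] for row in probabilities]


# Invented 4x4 count matrix; the third row holds the "3 transitions" example.
counts = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]

probabilities = to_probabilities(counts)
pixels = to_grayscale(probabilities)
for row in pixels:
    print(row)
```

Because the matrix size depends only on the number of states, every code segment yields an image of the same dimensions, which is convenient for a CNN with a fixed input size.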
According to another aspect of the present application, there is provided a source code vulnerability detection apparatus, including:
a training module, configured to obtain, for the code segments in the training set, the corresponding enhanced abstract syntax tree (AST) through static analysis; to convert the enhanced AST of each code segment in the training set into a grayscale image corresponding to the state probability matrix of the enhanced AST; and to train an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model;
a detection module, configured to convert the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST, and to input the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
According to another aspect of the present application there is provided a source code vulnerability detection system comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method described above when executing the computer program.
According to another aspect of the present application there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the application and is not intended to limit the application, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.
Claims (10)
1. A method for detecting source code vulnerabilities, comprising:
Training phase:
S1: for the code segments in the training set, obtaining the corresponding enhanced abstract syntax tree (AST) through static analysis;
S2: converting the state probability matrix corresponding to the enhanced AST of each code segment in the training set into a grayscale image;
S3: training an original CNN model with the grayscale images corresponding to the code segments in the training set to obtain a target CNN model;
Detection phase:
S4: converting the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST, following steps S1 and S2;
S5: inputting the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
2. The source code vulnerability detection method of claim 1, wherein S1 comprises:
generating the AST of each code segment in the training set through static analysis;
and adding control flow and data flow edges to the AST of each code segment in the training set to obtain the enhanced AST of that code segment.
3. The source code vulnerability detection method of claim 2, wherein the enhanced AST specifies the following edge types, which include edges representing data flow and control flow:
parent-child relationship: following the AST rules, connecting a non-terminal node to all of its child nodes;
sibling relationship: connecting a node to its sibling node;
next token: connecting a terminal node to the next terminal node;
data flow: connecting a node where a variable is used to the node where that variable next appears;
control flow: edges representing the control flow of if, for and while statements, and edges representing statement order.
4. The source code vulnerability detection method of claim 1, wherein S2 comprises:
S21: for each subtree in the enhanced AST, counting the information of the two nodes connected by each edge to obtain the number of times one state transitions to another state; establishing an AST-based Markov chain model by counting all state transitions;
S22: generating a state transition matrix from the state transition counts recorded in the AST-based Markov chain model;
S23: converting the state transition matrix into a transition probability matrix, and converting the values in the transition probability matrix to gray levels to obtain a corresponding grayscale image.
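The transition counting of S21 and S22 can be sketched as below. The state names and the sample edge list are hypothetical, loosely modeled on the state categories of claim 6; each edge is assumed to have already been reduced to a (source state, target state) pair.

```python
def transition_matrix(edges, states):
    """Count how often each state transitions to each other state.

    edges  : iterable of (source_state, target_state) pairs
    states : ordered list of state names; fixes the row/column order
    """
    index = {state: i for i, state in enumerate(states)}
    matrix = [[0] * len(states) for _ in states]
    for src, dst in edges:
        matrix[index[src]][index[dst]] += 1
    return matrix

# Hypothetical states and edges for illustration only.
STATES = ["ExprStmt", "Call", "ParamList", "Identifier"]
EDGES = [
    ("ExprStmt", "Call"),
    ("Call", "ParamList"),
    ("Call", "Identifier"),
    ("ParamList", "Identifier"),
    ("ExprStmt", "Call"),
]
counts = transition_matrix(EDGES, STATES)
```

Fixing the state order up front keeps the matrix the same size for every code fragment, which is what allows the resulting images to be fed to a single CNN.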
5. The source code vulnerability detection method of claim 4, wherein S23 comprises:
normalizing all data in the state transition matrix to determine the probability of one state transitioning to another state, thereby obtaining a transition probability matrix;
converting the values in the transition probability matrix to gray levels to obtain a corresponding grayscale image.
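A minimal sketch of the normalization and graying steps follows. The claim does not spell out the normalization or the gray-level mapping, so two assumptions are made here: each row is divided by its row sum (the usual Markov chain convention, with all-zero rows left at zero), and a probability p maps to the 8-bit gray level obtained by rounding 255·p. The function names and the sample matrix are hypothetical.

```python
def to_probabilities(counts):
    """Row-normalize a count matrix; rows with no transitions stay all zero."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

def to_gray(probs):
    """Map each probability in [0, 1] to an 8-bit gray level in [0, 255]."""
    return [[int(p * 255 + 0.5) for p in row] for row in probs]

# Hypothetical 4x4 state transition matrix (transition counts).
COUNTS = [
    [0, 2, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
probs = to_probabilities(COUNTS)
gray = to_gray(probs)
```

Under these assumptions, a transition that always occurs becomes a white pixel (255) and one that never occurs becomes a black pixel (0).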
6. The source code vulnerability detection method of claim 4, wherein the state of each subtree comprises: statement expressions, call statements, parameter lists, and identifiers.
7. The source code vulnerability detection method of any one of claims 1-6, wherein S5 comprises:
inputting the grayscale image corresponding to the source code to be detected into the target CNN model;
if the vulnerability detection result output by the target CNN model is 1, the source code to be detected contains a vulnerability;
if the vulnerability detection result output by the target CNN model is 0, the source code to be detected does not contain a vulnerability.
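How the binary result of claim 7 might be derived and interpreted can be sketched as follows. The claim only states that the model outputs 1 or 0; the assumption here is that the CNN ends in a sigmoid producing a score in [0, 1], which is thresholded at 0.5. The threshold and the names `detection_result` and `describe` are hypothetical.

```python
def detection_result(score: float, threshold: float = 0.5) -> int:
    """Turn an assumed sigmoid score from the CNN into the 1/0 detection result."""
    return 1 if score >= threshold else 0

def describe(result: int) -> str:
    """Interpret the detection result as stated in claim 7."""
    if result == 1:
        return "source code contains a vulnerability"
    return "source code does not contain a vulnerability"

# Hypothetical scores for two inputs.
flagged = detection_result(0.9)
clean = detection_result(0.1)
```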
8. A source code vulnerability detection apparatus, comprising:
a training module, configured to obtain, for the code fragments in a training set, a corresponding enhanced abstract syntax tree (AST) through static analysis; to convert the state probability matrix corresponding to the enhanced AST of each code fragment in the training set into a grayscale image; and to train an original CNN model with the grayscale images corresponding to the code fragments in the training set to obtain a target CNN model;
a detection module, configured to convert the source code to be detected into a grayscale image of the state probability matrix corresponding to its enhanced AST, and to input the grayscale image corresponding to the source code to be detected into the target CNN model to obtain a vulnerability detection result.
9. A source code vulnerability detection system comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310823880.1A CN116663019B (en) | 2023-07-06 | 2023-07-06 | Source code vulnerability detection method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116663019A (en) | 2023-08-29 |
CN116663019B (en) | 2023-10-24 |
Family
ID=87724225
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310823880.1A Active CN116663019B (en) | 2023-07-06 | 2023-07-06 | Source code vulnerability detection method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116663019B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190370473A1 (en) * | 2018-05-30 | 2019-12-05 | Nvidia Corporation | Detecting vulnerabilities to fault injection in computer code using machine learning |
CN110661778A (en) * | 2019-08-14 | 2020-01-07 | 中国电力科学研究院有限公司 | Method and system for testing industrial control network protocol based on reverse analysis fuzzy |
US20210056211A1 (en) * | 2019-08-23 | 2021-02-25 | Praetorian | System and method for automatically detecting a security vulnerability in a source code using a machine learning model |
CN113779590A (en) * | 2021-09-16 | 2021-12-10 | 中国民航大学 | Source code vulnerability detection method based on multi-dimensional representation |
CN115146282A (en) * | 2022-08-31 | 2022-10-04 | 中国科学院大学 | AST-based source code anomaly detection method and device |
CN115600200A (en) * | 2022-10-16 | 2023-01-13 | 武汉纺织大学(Cn) | Android malicious software detection method based on entropy spectrum density and adaptive contraction convolution |
CN115913655A (en) * | 2022-10-28 | 2023-04-04 | 华中科技大学 | Shell command injection detection method based on flow analysis and semantic analysis |
CN116209997A (en) * | 2020-09-28 | 2023-06-02 | 埃森哲环球解决方案有限公司 | System and method for classifying software vulnerabilities |
CN116361797A (en) * | 2023-03-28 | 2023-06-30 | 山东省计算中心(国家超级计算济南中心) | Malicious code detection method and system based on multi-source collaboration and behavior analysis |
Non-Patent Citations (3)
Title |
---|
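Yin Ming; Zhang Gongxuan: "Buffer overflow vulnerability detection method based on source code analysis" (基于源码分析的缓冲区溢出漏洞检测方法), Journal of Jiangsu University (Natural Science Edition), no. 04 * |
Sha Letian; Xiao Fu; Yang Hongke; Yu Hui; Wang Ruchuan: "IaaS-layer vulnerability mining method based on adaptive fuzzing" (基于自适应模糊测试的IaaS层漏洞挖掘方法), Journal of Software, no. 05 * |
Gu Shouke et al.: "Function-level code vulnerability detection method using a graph neural network based on enhanced AST" (基于增强AST的图神经网络函数级代码漏洞检测方法), Computer Science * |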
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435246A (en) * | 2023-12-14 | 2024-01-23 | 四川大学 | Code clone detection method based on Markov chain model |
CN117435246B (en) * | 2023-12-14 | 2024-03-05 | 四川大学 | Code clone detection method based on Markov chain model |
CN118536122A (en) * | 2024-05-16 | 2024-08-23 | 北京云弈科技有限公司 | Source code vulnerability detection method, system, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116663019B (en) | 2023-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245496B (en) | Source code vulnerability detection method and detector and training method and system thereof | |
CN116663019B (en) | Source code vulnerability detection method, device and system | |
CN111639344B (en) | Vulnerability detection method and device based on neural network | |
CN108647520B (en) | Intelligent fuzzy test method and system based on vulnerability learning | |
CN113961922B (en) | Malicious software behavior detection and classification system based on deep learning | |
Sethi et al. | DLPaper2Code: Auto-generation of code from deep learning research papers | |
Lin et al. | Deep structured scene parsing by learning with image descriptions | |
CN111124487A (en) | Code clone detection method and device and electronic equipment | |
CN109871686A (en) | Rogue program recognition methods and device based on icon representation and software action consistency analysis | |
CN113611356A (en) | Drug relocation prediction method based on self-supervision graph representation learning | |
CN116361788A (en) | Binary software vulnerability prediction method based on machine learning | |
CN117201082A (en) | Network intrusion detection method integrating textCNN and GAN | |
CN113760358A (en) | Countermeasure sample generation method for source code classification model | |
CN117370980A (en) | Malicious code detection model generation and detection method, device, equipment and medium | |
CN116578985A (en) | Intelligent contract vulnerability detection method based on model independent element learning | |
CN115454473A (en) | Data processing method based on deep learning vulnerability decision and information security system | |
CN113704108A (en) | Similar code detection method and device, electronic equipment and storage medium | |
CN115033883B (en) | Intelligent contract vulnerability detection method and system based on strategy Fuzzer | |
CN115879868B (en) | Expert system and deep learning integrated intelligent contract security audit method | |
CN112162745B (en) | API (application program interface) -based program synthesis method using probability model | |
CN112256838B (en) | Similar domain name searching method and device and electronic equipment | |
CN117435246B (en) | Code clone detection method based on Markov chain model | |
Zhang et al. | Class-based Core Feature Extraction Network for Few-shot Classification | |
Li | [Retracted] Application of Artificial Intelligence Technology in Computer Network Security Communication | |
Hao et al. | Leveraging Hierarchies: HMCAT for Efficiently Mapping CTI to Attack Techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |