CN115935372A - Vulnerability detection method based on graph embedding and bidirectional gated graph neural network - Google Patents

Vulnerability detection method based on graph embedding and bidirectional gated graph neural network Download PDF

Info

Publication number
CN115935372A
CN115935372A
Authority
CN
China
Prior art keywords
graph
vector
text
neural network
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211470625.5A
Other languages
Chinese (zh)
Inventor
俞东进
黄琛
王思轩
金宝清
程淑涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211470625.5A priority Critical patent/CN115935372A/en
Publication of CN115935372A publication Critical patent/CN115935372A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a vulnerability detection method based on graph embedding and a bidirectional gated graph neural network. The method first obtains function-level source code with vulnerabilities and source code without vulnerabilities extracted from a code base, and converts all of the source code into program dependency graphs using a source code analysis tool. The program dependency graphs are then converted into a graph-embedded representation of the code using an improved node2vec method; this representation contains both the graph structure information and the text structure information of the source code, which improves, to a certain extent, the ability of the features to represent nonlinear information. Finally, the preprocessed code embeddings are used to train a bidirectional gated graph neural network model through deep learning. The training result is applied to a target program to detect and evaluate its code vulnerabilities.

Description

Vulnerability detection method based on graph embedding and bidirectional gated graph neural network
Technical Field
The invention relates to the field of preprocessing of source codes and vulnerability detection in software programs, in particular to a code vulnerability detection method based on graph embedding and a bidirectional gated graph neural network.
Background
Software vulnerabilities are responsible for many system attacks and data leakage events. Machine learning is a viable means of identifying common software vulnerabilities by building tools and models. Since different vulnerabilities may exhibit similar underlying patterns, machine learning can first learn the underlying patterns expressed by vulnerable programs from training samples and then apply these patterns to new software projects to identify potentially vulnerable code.
Recently, researchers have used deep learning to learn the program structure of source code and identify potential software vulnerabilities in it. Compared with classic machine learning techniques, deep learning has the advantage that structural features can be learned automatically from training samples, without requiring experts to manually engineer the program structure. Existing deep-learning-based program modeling approaches typically use a recurrent neural network (RNN), such as long short-term memory (LSTM) or its variants. However, LSTMs are designed for sequential data and are not suitable for modeling the control flow and data flow of a program structure. Therefore, previous LSTM-based methods can only capture shallow, superficial structural or grammatical information of the source code text and cannot adequately learn the deeper semantic features of the program structure.
In order to better perform feature learning on complex code structures, the invention provides a method that can operate directly on the program structure graph and learn semantic information from it. Doing so allows the model to retain a large amount of control-dependence and data-dependence information and thereby capture the underlying code structure of many software vulnerabilities. In view of these problems and their practical significance, the invention improves the data preprocessing capability, fully learns the graph structure information of the code, and trains a bidirectional gated graph neural network (BGGNN) with optimized parameter settings so as to achieve better detection performance.
Disclosure of Invention
In order to solve the problem that existing static vulnerability mining methods cannot effectively represent the nonlinear semantic information in the code graph structure, and to effectively improve the performance of the neural network model, the invention provides a vulnerability detection method based on graph embedding and a bidirectional gated graph neural network. The technical scheme adopted by the invention is as follows:
a vulnerability detection method based on graph embedding and a bidirectional gated graph neural network comprises the following steps:
s1, acquiring and labeling a data set, and specifically comprising the following substeps:
S11, acquiring a source code data set, and extracting from it function-level source code containing vulnerabilities and source code without vulnerabilities, comprising k functions in total;
S12, marking whether each function contains a vulnerability, and obtaining a label Y_i ∈ {0,1}, i ∈ [1,k], for each function file, where 0 indicates that no vulnerability exists and 1 indicates that a vulnerability exists.
S2, generating program dependency graphs, and obtaining the program dependency graph set G = {V, E} corresponding to all source codes in the whole project, wherein V represents the set of nodes and E represents the set of edges. The method specifically comprises the following substeps:
S21, after the source code is imported into a source code analysis tool, using a query statement as input according to a function name in the source code, generating the program dependency graph (PDG) corresponding to the function name, and outputting it as a dot-type graph description file;
S22, mapping the user-defined variable names and function names one-to-one onto symbolic names in the PDG graph description file by using a uniform variable name mapping scheme, obtaining the preprocessed PDG; an illustrative sketch of this mapping is given below.
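As an illustration of the uniform variable name mapping in S22, a minimal Python sketch is given below; the VAR{i}/FUN{i} symbol names and the whole-word replacement rule are assumptions made for illustration and are not fixed by this disclosure.

```python
import re

def normalize_identifiers(dot_text, user_vars, user_funcs):
    """Map user-defined variable and function names in a PDG dot description
    onto uniform symbolic names (S22). The VAR{i}/FUN{i} convention and the
    whole-word replacement rule are illustrative assumptions."""
    mapping = {}
    for i, name in enumerate(sorted(set(user_vars)), start=1):
        mapping[name] = f"VAR{i}"
    for i, name in enumerate(sorted(set(user_funcs)), start=1):
        mapping[name] = f"FUN{i}"
    for old, new in mapping.items():
        # Replace whole-word occurrences only, so substrings stay untouched.
        dot_text = re.sub(rf"\b{re.escape(old)}\b", new, dot_text)
    return dot_text, mapping
```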
S3, extracting the required edge information and node text information from all the program dependency graphs. The method specifically comprises the following substeps:
S31, extracting the directed edge relations E_ij = V_i → V_j between the nodes in the dot file by means of regular-expression matching, acquiring the set of all directed edges, and storing it as a text file;
S32, extracting the code text V_i = [Text_1, Text_2, ..., Text_n] corresponding to each node ID in the dot file by means of regular-expression matching, acquiring the set of all node texts, and storing it as a dictionary file; a sketch of this extraction is given below.
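A minimal sketch of the regular-expression extraction in S31 and S32 follows; the exact dot syntax emitted by the source code analysis tool is an assumption (edges of the form `"12" -> "15"` and node labels of the form `"12" [label = "..."]` are assumed), so the patterns may need to be adapted to the actual tool.

```python
import json
import re

EDGE_RE = re.compile(r'"(\d+)"\s*->\s*"(\d+)"')             # e.g.  "12" -> "15"
NODE_RE = re.compile(r'"(\d+)"\s*\[label\s*=\s*"([^"]*)"')  # e.g.  "12" [label = "..."]

def extract_pdg(dot_path, edge_out_path, node_out_path):
    """S31/S32: pull the directed edge set and the node-ID -> code-text map
    out of one dot file and save them as a text file and a dictionary file."""
    with open(dot_path, encoding="utf-8") as f:
        dot = f.read()
    # S31: directed edge relations E_ij = V_i -> V_j, one pair per line.
    with open(edge_out_path, "w", encoding="utf-8") as f:
        for src, dst in EDGE_RE.findall(dot):
            f.write(f"{src} {dst}\n")
    # S32: node ID -> code text, stored as a JSON dictionary file.
    nodes = dict(NODE_RE.findall(dot))
    with open(node_out_path, "w", encoding="utf-8") as f:
        json.dump(nodes, f, ensure_ascii=False, indent=2)
    return nodes
```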
S4, using node2vec to perform feature training to obtain a feature vector dictionary, and specifically comprising the following substeps:
S41, taking the preprocessed text file of directed edges stored in S31 as input, reasonably setting the sampling strategy parameters of the node2vec model, training the text features, and outputting, for each minimum text unit Text_i, the corresponding vector_t_i, i ∈ [1,n];
S42, storing all output text feature vectors in a dictionary Dict_t = ∪_{i∈[1,n]} {key: Text_i, value: vector_t_i};
S43, taking the preprocessed text file of directed edges stored in S31 as input, reasonably setting the sampling strategy parameters of the node2vec model, identifying each node by its unique node ID_i instead of the text attributes described above, training the node dependency features, and outputting the dependency feature vector vector_n_i between graph nodes, i ∈ [1,m];
S44, storing all output node dependency feature vectors in a dictionary Dict_n = ∪_{i∈[1,m]} {key: ID_i, value: vector_n_i}; a sketch of this step is given below.
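The following sketch shows how the node-dependency dictionary Dict_n of S43/S44 could be built with the open-source node2vec and networkx packages; the package choice, the 16-dimensional embedding size (inferred from the N × 16 and M × 16 matrices in S54) and the sampling parameters are assumptions. Dict_t of S41/S42 is obtained in the same way with nodes represented by their minimum text units instead of their IDs.

```python
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec (assumed implementation)

def build_dict_n(edge_file_path, dim=16):
    """S43/S44: train node-dependency embeddings on the directed-edge file.
    The 16-dimensional size is inferred from the N x 16 / M x 16 matrices in
    S54; the sampling parameters below are placeholders, not fixed values."""
    graph = nx.read_edgelist(edge_file_path, create_using=nx.DiGraph(), nodetype=str)
    n2v = Node2Vec(graph, dimensions=dim, walk_length=10, num_walks=10,
                   p=0.1, q=0.8, workers=1, quiet=True)
    model = n2v.fit(window=5, min_count=1)
    # Dict_n = { node ID_i : vector_n_i }
    return {node: model.wv[node].tolist() for node in graph.nodes()}

# Dict_t (S41/S42) is built analogously, with each node represented by its
# minimum text units instead of its numeric ID, so that the learned vectors
# are keyed by Text_i rather than ID_i.
```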
S5, based on the text feature vector and the edge feature vector obtained by training in S4, converting all PDGs into matrix representation of feature vectors at function level, and specifically comprising the following substeps:
S51, merging the text description of each node into one line, splitting the character string into several text units, and converting the text attributes of the node into the corresponding embedded vectors nodeTextvec_i = [vector_t_i1, vector_t_i2, ..., vector_t_in] based on the text vector dictionary Dict_t obtained in S42, so as to obtain a text vector for each node;
S52, for a directed edge, which has a pair of head node and tail node, querying the node ID dictionary obtained in S44 with the two node IDs ID_s, ID_e as keys to obtain the head node vector vector_n_s and the tail node vector vector_n_e;
S53, subtracting the head node vector from the tail node vector to obtain the embedded vector v_s→e = vector_n_e - vector_n_s corresponding to the directed edge. The above processing is carried out for each directed edge in the list of each program dependency graph to obtain the edge vectors of all PDGs.
S54, encapsulating the node text vectors and the edge vectors into a JSON file corresponding to the program dependency graph, as the input of the subsequent neural network model; an illustrative sketch of this conversion is given below. The JSON file can be viewed as the combination of an N × 16 two-dimensional vector matrix and an M × 16 two-dimensional vector matrix, where N represents the number of nodes in the program dependency graph and M represents the number of edges.
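A sketch of the conversion in S51–S54 follows. Averaging the token vectors of each node into a single 16-dimensional vector (so that the node matrix is N × 16) and the whitespace token split are assumptions made for illustration, while the edge vector is the tail-node vector minus the head-node vector exactly as in S53.

```python
import json

def pdg_to_feature_json(node_texts, edges, dict_t, dict_n, out_path, dim=16):
    """S51-S54: package one PDG as the JSON input of the neural network.
    node_texts: {node_id: code text}, edges: list of (head_id, tail_id),
    dict_t / dict_n: the dictionaries from S42 / S44."""
    zero = [0.0] * dim
    # S51: look up each text unit of a node in Dict_t; here the token vectors
    # are averaged into a single 16-d vector (aggregation choice assumed).
    node_matrix = []
    for node_id, text in node_texts.items():
        vecs = [dict_t.get(tok, zero) for tok in text.split()] or [zero]
        node_matrix.append([sum(col) / len(vecs) for col in zip(*vecs)])
    # S52/S53: v_{s->e} = vector_n_e - vector_n_s for every directed edge.
    edge_matrix = []
    for head_id, tail_id in edges:
        vn_s, vn_e = dict_n.get(head_id, zero), dict_n.get(tail_id, zero)
        edge_matrix.append([e - s for s, e in zip(vn_s, vn_e)])
    # S54: an N x 16 node matrix plus an M x 16 edge matrix per function.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"nodes": node_matrix, "edges": edge_matrix}, f)
```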
S6, taking the JSON files output by S5 as input, and training the bidirectional gated graph neural network model, specifically comprising the following substeps:
s61, segmenting a training set and a testing set: selecting d% of data samples in the JSON file data set generated in S53 as a training set, and the rest as a test set;
S62, learning the feature data contained in the data set by applying a bidirectional gated graph neural network (BGGNN). A BGGNN consists of two directional gated graph neural networks (GGNN): one is a forward GGNN_1 with L_1 layers that accepts the input in the forward direction; the other is a reverse GGNN_2 with L_2 layers that learns the input in reverse. The model output y_t is obtained by combining the forward output of GGNN_1 with the reverse output of GGNN_2;
S63, carrying out l iterations of training based on this network, and saving the neural network Model after training is finished so that it can be loaded quickly at a later stage; an illustrative sketch of the network follows.
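A minimal PyTorch Geometric sketch of the bidirectional gated graph neural network of S62 is shown below. GatedGraphConv is used as the GGNN layer, the two directions are combined by concatenation, the graph-level readout is mean pooling, and the M × 16 edge-feature matrix is omitted; these combination and readout choices are assumptions, since the disclosure only fixes the forward/reverse GGNN structure with L_1 = L_2 = 3.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GatedGraphConv, global_mean_pool

class BGGNN(nn.Module):
    """Bidirectional gated graph neural network: a forward GGNN_1 with L_1
    layers and a reverse GGNN_2 with L_2 layers whose outputs are combined
    for a binary vulnerable / non-vulnerable prediction."""

    def __init__(self, in_dim=16, hidden=64, l1=3, l2=3):
        super().__init__()
        self.lin_in = nn.Linear(in_dim, hidden)
        self.ggnn_fwd = GatedGraphConv(out_channels=hidden, num_layers=l1)
        self.ggnn_bwd = GatedGraphConv(out_channels=hidden, num_layers=l2)
        self.classifier = nn.Linear(2 * hidden, 2)

    def forward(self, x, edge_index, batch):
        h = self.lin_in(x)
        h_fwd = self.ggnn_fwd(h, edge_index)          # forward direction
        h_bwd = self.ggnn_bwd(h, edge_index.flip(0))  # reversed edge direction
        h_cat = torch.cat([h_fwd, h_bwd], dim=-1)     # combine both directions
        return self.classifier(global_mean_pool(h_cat, batch))
```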
S7, carrying out code vulnerability detection on the target program, specifically comprising the following substeps:
S71, firstly preprocessing the target program source code as in steps S2 and S3 to obtain the preprocessed PDGs;
S72, performing the PDG-to-feature-vector-matrix conversion of S5 on the basis of the pre-trained dictionaries Dict_t and Dict_n from S4, and storing the result as the JSON files of the target program;
S73, reusing the neural network Model generated in S6 and taking the function-level feature vector matrices generated from the target program as input to perform function-level code vulnerability detection;
S74, outputting a list of the function names in the target program that contain potential code vulnerabilities, so that relevant personnel can inspect and improve the program; a sketch of this detection step is given below.
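The detection of S73/S74 can be sketched as follows, reusing the saved Model; the dictionary of per-function graph tensors, the softmax over two classes and the 0.5 decision threshold are assumptions made for illustration.

```python
import torch

def detect_vulnerable_functions(model, function_graphs, threshold=0.5):
    """S73/S74: apply the trained Model to the target program.
    function_graphs: {function_name: (x, edge_index, batch)} tensors built
    from the target program's JSON files as in S5."""
    model.eval()
    flagged = []
    with torch.no_grad():
        for name, (x, edge_index, batch) in function_graphs.items():
            prob = torch.softmax(model(x, edge_index, batch), dim=-1)[0, 1]
            if prob.item() >= threshold:
                flagged.append((name, prob.item()))
    # Return the function names most likely to contain a vulnerability first.
    return [name for name, _ in sorted(flagged, key=lambda item: -item[1])]
```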
Preferably, the forward GGNN_1 in step S62 has L_1 layers, with L_1 = 3.
Preferably, the reverse GGNN_2 in step S62 has L_2 layers, with L_2 = 3.
Preferably, the number of training iterations l in step S63 is 150.
The invention has the following beneficial effects:
the code vulnerability detection method based on graph embedding learning uses a real vulnerability data set to extract control flow and dependency relationship from source codes, so that the information expressed by the codes is more specific and comprehensive. After the code is subjected to node2vec training, the graph embedding information and the text embedding information of the code are learned. The two-way gated graph neural network BGGNN model architecture is used as a classifier, wherein a cycle structure can effectively learn neighborhood information of graph nodes, good performance is achieved, an objective function is optimized by using random gradient rise, and the feature resolution capability is effectively improved. By properly adopting some training skills, ideal network parameters, an optimization algorithm and the setting of the learning rate are selected, the network is more stable, the result is more reliable, and the accuracy of code vulnerability detection is improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the bidirectional gated graph neural network trained for vulnerability detection according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
The embodiment provides a code vulnerability detection method based on graph embedding and a bidirectional gated graph neural network, as shown in FIG. 1, comprising the following steps:
s1, acquiring and labeling a data set, and specifically comprising the following substeps:
S11, acquiring a source code data set, and extracting from it function-level source code containing vulnerabilities and source code without vulnerabilities, comprising k functions in total;
S12, marking whether each function contains a vulnerability, and obtaining a label Y_i ∈ {0,1}, i ∈ [1,k], for each function file, where 0 indicates that no vulnerability exists and 1 indicates that a vulnerability exists.
S2, generating program dependency graphs, and obtaining the program dependency graph set G = {V, E} corresponding to all source codes in the whole project, wherein V represents the set of nodes and E represents the set of edges. The method specifically comprises the following substeps:
S21, after the source code is imported into a source code analysis tool, using a query statement as input according to a function name in the source code, generating the program dependency graph (PDG) corresponding to the function name, and outputting it as a dot-type graph description file;
S22, mapping the user-defined variable names and function names one-to-one onto symbolic names in the PDG graph description file by using a uniform variable name mapping scheme, to obtain the preprocessed PDG.
S3, extracting the required edge information and node text information from all the program dependency graphs. The method specifically comprises the following substeps:
S31, extracting the directed edge relations E_ij = V_i → V_j between the nodes in the dot file by means of regular-expression matching, acquiring the set of all directed edges, and storing it as a text file;
S32, extracting the code text V_i = [Text_1, Text_2, ..., Text_n] corresponding to each node ID in the dot file by means of regular-expression matching, acquiring the set of all node texts, and storing it as a dictionary file.
S4, using node2vec to perform feature training to obtain a feature vector dictionary, and specifically comprising the following substeps:
S41, taking the preprocessed text file of directed edges stored in S31 as input, reasonably setting the sampling strategy parameters of the node2vec model, training the text features, and outputting, for each minimum text unit Text_i, the corresponding vector_t_i, i ∈ [1,n];
S42, storing all output text feature vectors in a dictionary Dict_t = ∪_{i∈[1,n]} {key: Text_i, value: vector_t_i};
S43, taking the preprocessed text file of directed edges stored in S31 as input, reasonably setting the sampling strategy parameters of the node2vec model, identifying each node by its unique node ID_i instead of the text attributes described above, training the node dependency features, and outputting the dependency feature vector vector_n_i between graph nodes, i ∈ [1,m];
S44, storing all output node dependency feature vectors in a dictionary Dict_n = ∪_{i∈[1,m]} {key: ID_i, value: vector_n_i}.
In this embodiment, the parameters of the node2vec model in S41 and S43 are set reasonably; the specific parameters include walk_length, num_walks, p, q and window_size, which respectively represent the walk length, the number of walks, the probability of returning to the previous node, whether the walk is biased toward depth-first or breadth-first search (q < 1 biases toward depth-first, q > 1 toward breadth-first), and the window size, where walk_length is 10, num_walks is 10, p is 0.1, q is 0.8 and window_size is 5.
S5, based on the text feature vector and the edge feature vector obtained by training in S4, converting all PDGs into matrix representation of feature vectors at function level, and specifically comprising the following substeps:
S51, merging the text description of each node into one line, splitting the character string into several text units, and converting the text attributes of the node into the corresponding embedded vectors nodeTextvec_i = [vector_t_i1, vector_t_i2, ..., vector_t_in] based on the text vector dictionary Dict_t obtained in S42, so as to obtain a text vector for each node;
S52, for a directed edge, which has a pair of head node and tail node, querying the node ID dictionary obtained in S44 with the two node IDs ID_s, ID_e as keys to obtain the head node vector vector_n_s and the tail node vector vector_n_e;
S53, subtracting the head node vector from the tail node vector to obtain the embedded vector v_s→e = vector_n_e - vector_n_s corresponding to the directed edge. The above processing is carried out for each directed edge in the list of each program dependency graph to obtain the edge vectors of all PDGs.
S54, encapsulating the node text vectors and the edge vectors into a JSON file corresponding to the program dependency graph, as the input of the subsequent neural network model. The JSON file can be viewed as the combination of an N × 16 two-dimensional vector matrix and an M × 16 two-dimensional vector matrix, where N represents the number of nodes in the program dependency graph and M represents the number of edges.
S6, taking the JSON files output by S5 as input, and training the bidirectional gated graph neural network model, specifically comprising the following substeps:
s61, segmenting a training set and a testing set: selecting d% of data samples in the JSON file data set generated in S53 as a training set, and the rest as a test set; wherein d is 70.
S62, learning the feature data contained in the data set by applying a bidirectional gated graph neural network (BGGNN). A BGGNN consists of two directional gated graph neural networks (GGNN): one is a forward GGNN_1 with L_1 layers that receives the input in the forward direction; the other is a reverse GGNN_2 with L_2 layers that learns the input in reverse. The model output y_t is obtained by combining the forward output of GGNN_1 with the reverse output of GGNN_2, wherein L_1 is 3 and L_2 is 3;
S63, carrying out l iterations of training based on this network, and saving the neural network Model after training is finished so that it can be loaded quickly at a later stage, wherein l is 150; a sketch of this training procedure is given below.
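A sketch of the training procedure of S61–S63 in this embodiment (70% training split, 150 iterations) follows; the Adam optimizer, the learning rate and the cross-entropy loss are assumptions, since the embodiment only fixes d = 70 and l = 150.

```python
import random

import torch
import torch.nn.functional as F

def train_bggnn(model, dataset, d=70, iterations=150, lr=1e-3):
    """S61-S63 of this embodiment: d% / (100 - d)% split and l training iterations.
    dataset: list of (x, edge_index, batch, label) tuples built from the JSON
    files of S5; label is a LongTensor of shape [1] (0 = no vulnerability)."""
    random.shuffle(dataset)
    split = len(dataset) * d // 100
    train_set, test_set = dataset[:split], dataset[split:]    # S61: d = 70
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # optimizer and lr assumed
    model.train()
    for _ in range(iterations):                               # S63: l = 150
        for x, edge_index, batch, label in train_set:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x, edge_index, batch), label)
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), "Model.pt")                # save for fast reloading
    return test_set
```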
S7, carrying out code vulnerability detection on the target program, specifically comprising the following substeps:
S71, firstly preprocessing the target program source code as in steps S2 and S3 to obtain the preprocessed PDGs;
S72, performing the PDG-to-feature-vector-matrix conversion of S5 on the basis of the pre-trained dictionaries Dict_t and Dict_n from S4, and storing the result as the JSON files of the target program;
S73, reusing the neural network Model generated in S6 and taking the function-level feature vector matrices generated from the target program as input to perform function-level code vulnerability detection;
S74, outputting a list of the function names in the target program that contain potential code vulnerabilities, so that relevant personnel can inspect and improve the program.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments, including the components, without departing from the principles and spirit of the invention, and still fall within the scope of the invention.

Claims (10)

1. A vulnerability detection method based on graph embedding and a bidirectional gated graph neural network is characterized by comprising the following steps:
s1, acquiring a data set of a source code and marking whether a vulnerability exists or not;
s2, generating a program dependency graph, and obtaining a program dependency graph set G = { V, E } corresponding to all source codes in the whole project, wherein V represents a set of nodes, and E represents a set of edges;
s3, extracting a required directed edge set and a required node set for all the program dependency graphs, and respectively storing the directed edge set and the node set as a text file and a dictionary file;
s4, carrying out feature training by using the node2vec to obtain a feature vector dictionary;
s5, obtaining text characteristic vectors and edge characteristic vectors based on the training in the S4, and converting all PDGs into matrix representation of the characteristic vectors at the function level;
S6, taking the JSON files output in S5 as input, and training the bidirectional gated graph neural network model;
S7, carrying out code vulnerability detection on the target program by applying steps S1-S6 to the target program code to complete the vulnerability detection.
2. The vulnerability detection method based on graph embedding and a bidirectional gated graph neural network according to claim 1, characterized in that the labeling method for the source code in the data set in S1 is: extracting function-level source code containing vulnerabilities and source code without vulnerabilities from the source code data set, comprising k functions in total; and marking whether each function contains a vulnerability to obtain a label Y_i ∈ {0,1}, i ∈ [1,k], for each function file, where 0 indicates that no vulnerability exists and 1 indicates that a vulnerability exists.
3. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 1, wherein in S2, the method for generating the program dependency graph is as follows:
S21, after the source code is imported into a source code analysis tool, using a query statement as input according to a function name in the source code, generating the program dependency graph corresponding to the function name, and outputting it as a dot-type graph description file;
S22, mapping the user-defined variable names and function names one-to-one onto symbolic names in the program dependency graph description file by using a uniform variable name mapping scheme, to obtain the preprocessed program dependency graph.
4. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 1, characterized in that in S3, a regular matching manner is used to extract a required directed edge set and a required node set.
5. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 4, wherein the feature vector dictionary in S4 comprises a text vector dictionary and a node ID dictionary, the text vector dictionary takes a text file as input to obtain a text feature vector and uses one dictionary for storage, and the node ID dictionary takes a dictionary file as input to obtain a node dependent feature vector and uses one dictionary for storage.
6. The vulnerability detection method based on graph embedding and a bidirectional gated graph neural network according to claim 5, wherein the text vector dictionary is obtained as follows: taking the preprocessed text file of directed edges stored in S31 as input, setting the sampling strategy parameters of the node2vec model, training the text features, and outputting, for each minimum text unit Text_i, the corresponding vector_t_i, i ∈ [1,n];
storing all output text feature vectors in a dictionary Dict_t = ∪_{i∈[1,n]} {key: Text_i, value: vector_t_i}.
7. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 6, wherein the node ID dictionary obtaining method is as follows:
taking the preprocessed text file of directed edges stored in S31 as input, reasonably setting the sampling strategy parameters of the node2vec model, identifying each node by its unique node ID_i instead of the text attributes described above, training the node dependency features, and outputting the dependency feature vector vector_n_i between graph nodes, i ∈ [1,m];
storing all output node dependency feature vectors in a dictionary Dict_n = ∪_{i∈[1,m]} {key: ID_i, value: vector_n_i}.
8. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 7, wherein the S5 specifically comprises the following sub-steps:
S51, merging the text description of each node into one line, splitting the character string into several text units, and converting the text attributes of the node, based on the text vector dictionary Dict_t, into the corresponding embedded vectors nodeTextvec_i = [vector_t_i1, vector_t_i2, ..., vector_t_in], so as to obtain a text vector for each node;
S52, for a directed edge, which has a pair of head node and tail node, querying the node ID dictionary with the two node IDs ID_s, ID_e as keys to obtain the head node vector vector_n_s and the tail node vector vector_n_e;
S53, subtracting the head node vector from the tail node vector to obtain the embedded vector v_s→e = vector_n_e - vector_n_s corresponding to the directed edge, and carrying out the above processing for each directed edge in the list of each program dependency graph to obtain the edge vectors of all PDGs;
S54, encapsulating the node text vectors and the edge vectors together into a JSON file corresponding to the program dependency graph as the input of the subsequent neural network model, wherein the JSON file is regarded as the combination of an N × 16 two-dimensional vector matrix and an M × 16 two-dimensional vector matrix, where N represents the number of nodes in the program dependency graph and M represents the number of edges.
9. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 8, wherein the S6 specifically comprises the following sub-steps:
s61, segmenting a training set and a testing set: selecting d% of data samples in the JSON file data set generated in S53 as a training set, and the rest as a test set;
S62, learning the feature data contained in the data set by using a bidirectional gated graph neural network, wherein the BGGNN consists of two directional gated graph neural networks: one is a forward GGNN_1 with L_1 layers that accepts the input in the forward direction; the other is a reverse GGNN_2 with L_2 layers that learns the input in reverse, and the model output y_t is obtained by combining the forward output of GGNN_1 with the reverse output of GGNN_2;
S63, carrying out l iterations of training based on this network, and saving the neural network Model after training is finished so that it can be loaded quickly at a later stage.
10. The vulnerability detection method based on graph embedding and bidirectional gated graph neural network according to claim 8, wherein the S7 specifically comprises the following sub-steps:
S71, firstly preprocessing the target program source code as in steps S2 and S3 to obtain the preprocessed program dependency graphs;
S72, performing the PDG-to-feature-vector-matrix conversion of S5 on the basis of the pre-trained dictionaries Dict_t and Dict_n from S4, and storing the result as the JSON files of the target program;
S73, reusing the neural network Model generated in S6 and taking the function-level feature vector matrices generated from the target program as input to perform function-level code vulnerability detection;
S74, outputting a list of the function names in the target program that contain potential code vulnerabilities, for relevant personnel to inspect and improve the program.
CN202211470625.5A 2022-11-23 2022-11-23 Vulnerability detection method based on graph embedding and bidirectional gated graph neural network Pending CN115935372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211470625.5A CN115935372A (en) 2022-11-23 2022-11-23 Vulnerability detection method based on graph embedding and bidirectional gated graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211470625.5A CN115935372A (en) 2022-11-23 2022-11-23 Vulnerability detection method based on graph embedding and bidirectional gated graph neural network

Publications (1)

Publication Number Publication Date
CN115935372A true CN115935372A (en) 2023-04-07

Family

ID=86699820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211470625.5A Pending CN115935372A (en) 2022-11-23 2022-11-23 Vulnerability detection method based on graph embedding and bidirectional gated graph neural network

Country Status (1)

Country Link
CN (1) CN115935372A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116661852A (en) * 2023-04-06 2023-08-29 华中师范大学 Code searching method based on program dependency graph
CN117290238A (en) * 2023-10-10 2023-12-26 湖北大学 Software defect prediction method and system based on heterogeneous relational graph neural network
CN117290238B (en) * 2023-10-10 2024-04-09 湖北大学 Software defect prediction method and system based on heterogeneous relational graph neural network

Similar Documents

Publication Publication Date Title
CN108446540B (en) Program code plagiarism type detection method and system based on source code multi-label graph neural network
CN108647520B (en) Intelligent fuzzy test method and system based on vulnerability learning
CN115935372A (en) Vulnerability detection method based on graph embedding and bidirectional gated graph neural network
CN111506714A (en) Knowledge graph embedding based question answering
CN106295338B (en) SQL vulnerability detection method based on artificial neuron network
CN115048316B (en) Semi-supervised software code defect detection method and device
CN112633002A (en) Sample labeling method, model training method, named entity recognition method and device
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
Janz et al. Learning a generative model for validity in complex discrete structures
CN115277587B (en) Network traffic identification method, device, equipment and medium
CN112131888A (en) Method, device and equipment for analyzing semantic emotion and storage medium
CN112905188A (en) Code translation method and system based on generation type countermeasure GAN network
CN117236677A (en) RPA process mining method and device based on event extraction
CN113904844B (en) Intelligent contract vulnerability detection method based on cross-mode teacher-student network
CN110162972B (en) UAF vulnerability detection method based on statement joint coding deep neural network
CN116484024A (en) Multi-level knowledge base construction method based on knowledge graph
CN116340952A (en) Intelligent contract vulnerability detection method based on operation code program dependency graph
CN115328782A (en) Semi-supervised software defect prediction method based on graph representation learning and knowledge distillation
CN117094325B (en) Named entity identification method in rice pest field
CN113378178A (en) Deep learning-based graph confidence learning software vulnerability detection method
CN112148879B (en) Computer readable storage medium for automatically labeling code with data structure
CN110554952B (en) Search-based hierarchical regression test data generation method
CN115840884A (en) Sample selection method, device, equipment and medium
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
Zhang et al. Research on Defect Location Method of C Language Code Based on Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination