CN116702160A - Source code vulnerability detection method based on data dependency enhancement program slice

Info

Publication number: CN116702160A
Application number: CN202310982855.8A
Authority: CN
Inventors: 胡勇; 陈晓
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2023-08-07
Filing date: 2023-08-07
Publication date: 2023-09-05
Anticipated expiration: 2043-08-07
Also published as: CN116702160B

Abstract

The invention discloses a source code vulnerability detection method based on data-dependency-enhanced program slicing. The source code is analyzed to obtain its data dependency information and control dependency information, a program dependency graph is constructed, and the program dependency graph is then enhanced. Program slicing is performed with program slicing points of interest as slicing points to obtain subgraphs of the program dependency graph, and the vulnerability classification label of each subgraph is determined by whether it contains vulnerable code statements. User-defined identifiers in the source code are anonymized, and each semantic unit in the code is then converted into a vector using Word2Vec to form a dictionary. The code statements of the nodes in each subgraph obtained by program slicing are converted into vector sequences according to the dictionary. By analyzing the code under test, the method reports the location of each vulnerability, which helps repair personnel locate it quickly, and reports its type, which helps them repair it quickly.

Description

Source code vulnerability detection method based on data dependency enhancement program slice

Technical Field

The invention relates to the technical field of source code vulnerability detection, and in particular to a source code vulnerability detection method based on data-dependency-enhanced program slicing.

Background

As people rely more and more on the internet, software programs, which serve as the bridge connecting people to it, are needed by an ever larger number of users. As user requirements grow more complex, the code structure of software naturally grows more complex as well, which makes it easier to introduce vulnerabilities, so software security can no longer be neglected. Relevant surveys show that, across millions of software programs, every 1000 lines of code contain one vulnerability on average. Incidents in which software vulnerabilities cause serious losses occur again and again. Since software first appeared, many methods have been tried for detecting vulnerabilities in programs, such as rule-matching scanning, taint analysis, symbolic execution, fuzz testing, and code similarity measurement. Detecting vulnerabilities in source code can eliminate them during the development phase. The earlier a vulnerability is discovered, the smaller its impact and the lower the cost of repairing it.

With the spread and successful application of artificial intelligence in recent years, some scholars have tried to detect vulnerabilities with artificial intelligence algorithms, and their results show excellent detection performance. At present, vulnerability detection based on artificial intelligence is generally divided into three steps: first, the source code is preprocessed with static analysis techniques to extract and construct a representation that contains its syntactic and semantic information; next, the data in character form is converted into vectors and a neural network extracts features from them; finally, a classifier is trained on the extracted feature vectors to perform classification. The preprocessing stage commonly uses data flow and control flow analysis, abstract syntax tree (AST) construction, program slicing, and similar techniques. The feature extraction step generally uses a graph neural network or a recurrent neural network. A recurrent neural network extracts features from preprocessed character sequences (such as an abstract syntax tree traversal sequence or the character sequence of a program fragment), while a graph neural network extracts features from preprocessed graph-structured data (such as an abstract syntax tree, a control flow graph, a data dependency graph, a program dependency graph, or a code property graph). In the classification step, the classifier is usually trained on the extracted feature vectors to improve its ability to classify new data correctly.

However, as vulnerability patterns in real projects become more and more complex, current advanced methods generally still use fairly basic static analysis techniques in the preprocessing stage and fail to extract the syntactic and semantic information needed for complex vulnerabilities. Li et al. (Li Z, Zou D, Xu S, et al. SySeVR: A framework for using deep learning to detect software vulnerabilities [J]. IEEE Transactions on Dependable and Secure Computing, 2021, 19(4): 2244-2258.), who were among the earliest to use deep learning algorithms for vulnerability detection, proposed obtaining program slices on a program dependency graph, converting them into strings, extracting features with a BiLSTM (Bidirectional Long Short-Term Memory) network, and detecting vulnerabilities with a multi-layer perceptron. However, because a conventional recurrent neural network accepts only sequence information, the code is simply arranged into a sequence. As a result, some code fragments that are strongly related in syntax and semantics end up far apart, semantic information cannot be passed effectively between them, and the model has difficulty recognizing vulnerabilities. Some scholars have therefore tried graph neural networks for vulnerability detection. For example, Zhuang Rongfei (Zhuang Rongfei. Key technology research for vulnerability mining based on graph networks [D]. Harbin Institute of Technology, 2020.) converted code into a graph-structured representation and used graph networks for feature extraction, achieving a detection effect significantly better than that of traditional machine learning methods. However, when such graph neural network methods embed code statements into node vectors, they rely on simple static techniques such as Word2Vec or Doc2Vec, so the whole model is constrained by the pre-trained model and cannot generalize well.

Disclosure of Invention

The invention aims to provide a source code vulnerability detection method based on data-dependency-enhanced program slicing. By analyzing the code under test, the method reports the location of each vulnerability, which helps repair personnel locate it quickly, and reports its type, which helps them repair it quickly.

The invention is realized by the following technical solution: a source code vulnerability detection method based on data-dependency-enhanced program slicing, comprising the following steps:

1) Generating a program dependency graph and enhancing its data dependencies: the source code is analyzed to obtain its data dependency information and control dependency information, a program dependency graph is constructed, and an enhancement operation is then performed on the program dependency graph;

2) Program slicing is performed with program slicing points of interest as slicing points to obtain subgraphs of the program dependency graph, and the vulnerability classification label of each subgraph is determined by whether it contains vulnerable code statements. Specifically, if a subgraph contains vulnerable code statements, it is regarded as vulnerable and its vulnerability type is the same as the label of the program dependency graph from which it was generated; if a subgraph contains no vulnerable code statements, it is regarded as vulnerability-free.

3) User-defined identifiers in the source code are anonymized, and each semantic unit in the code is then converted into a vector using Word2Vec to form a dictionary;

4) The code statements of the nodes in each subgraph obtained by program slicing are converted into vector sequences according to the dictionary generated in step 3);

5) Since the original code of the nodes differs in length, the vector sequences of the nodes also differ in length. So that a graph neural network can be used in the subsequent steps, a gated recurrent neural network is used to embed each initial node vector sequence into a vector of uniform length.

6) The subgraphs with embedded node vectors are fed into a graph neural network model for training and testing to obtain a multi-class vulnerability detection model for software source code.

7) After the source code to be detected has been processed by steps 1) to 4), the multi-class vulnerability detection model trained in step 6) performs inference on it, which completes the detection of the vulnerability type.
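The labeling rule of step 2) can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function name `label_subgraph`, the label string `non-vulnerable`, and the use of code line numbers to identify statements are all hypothetical and are not taken from the method itself.

```python
from typing import Iterable, Set

SAFE_LABEL = "non-vulnerable"  # hypothetical name for the vulnerability-free class


def label_subgraph(slice_lines: Iterable[int],
                   vulnerable_lines: Set[int],
                   pdg_label: str) -> str:
    """Label a slice subgraph according to step 2).

    slice_lines: line numbers of the statements in the subgraph (assumed encoding).
    vulnerable_lines: line numbers of the known vulnerable statements.
    pdg_label: vulnerability type of the program dependency graph the slice came from.
    """
    # A subgraph that touches any vulnerable statement inherits the PDG's label.
    if any(line in vulnerable_lines for line in slice_lines):
        return pdg_label
    # Otherwise the subgraph is treated as vulnerability-free.
    return SAFE_LABEL


vulnerable_slice = label_subgraph([3, 7, 9], {7}, "CWE-119")
clean_slice = label_subgraph([3, 4], {7}, "CWE-119")
```

The label `CWE-119` is used here only as an example of a vulnerability type.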

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: because the traditional program dependency graph handles function call statements in a special way, it cannot record data pollution that occurs inside a function call statement. The method therefore performs a data dependency enhancement operation on the program dependency graph, which corrects the inaccurate data dependencies caused by that special handling and strengthens the data dependency relationship between each code statement and the function call statements. The enhancement operation on the program dependency graph comprises the following specific steps:

1.1) After the program dependency graph is constructed, scan all nodes to find the function call nodes that take reference-type or pointer-type parameters;

1.2) For each function call node found, find the data dependency node of the parameter, and perform backward slicing on the program dependency graph with that node as the starting node;

1.3) Among the nodes in the backward slice obtained in step 1.2), select those whose node index (that is, the corresponding code line number) is larger than the index of the function call node, establish a data dependency relationship between each of them and the function call node, and add these relationships to the original program dependency graph.
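One possible reading of steps 1.1) to 1.3) is sketched below in Python. It is an illustration under stated assumptions, not the patented implementation: the graph is reduced to a set of data dependency edges over line numbers, the hypothetical `pointer_calls` argument stands in for the scan of step 1.1), and the slice of step 1.2) is assumed to collect the statements that depend on the parameter's definition, since only such statements can lie after the call as step 1.3) requires.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # assumed encoding: (defining line, line that uses the data)


def dependents(start: int, edges: Iterable[Edge]) -> Set[int]:
    """Collect every line that transitively depends on `start`."""
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
    seen: Set[int] = set()
    stack = [start]
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def enhance_pdg(data_edges: Set[Edge],
                pointer_calls: Dict[int, List[int]]) -> Set[Edge]:
    """Add call -> later-use edges for pointer/reference arguments.

    pointer_calls maps the line of a call (step 1.1) to the lines that
    define its pointer or reference arguments (step 1.2).
    """
    enhanced = set(data_edges)
    for call_line, def_lines in pointer_calls.items():
        for def_line in def_lines:
            for node in dependents(def_line, data_edges):
                if node > call_line:  # step 1.3: only statements after the call
                    enhanced.add((call_line, node))
    return enhanced


# Toy program:  1: char buf[8];   2: fill(buf);   3: puts(buf);
# The plain graph links 1 -> 2 and 1 -> 3 but misses that fill() may modify buf.
toy_edges = {(1, 2), (1, 3)}
enhanced_edges = enhance_pdg(toy_edges, {2: [1]})
```

In the toy program the enhancement adds the edge from the call on line 2 to the later use on line 3, so a slice through line 3 now also reaches the call that may have modified the buffer.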

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: the specific steps of program slicing are as follows:

2.1) Perform normal forward slicing and backward slicing from the slicing point, and merge the results into the final slice result;

2.2) Identify the conditional statement nodes in the final slice result, and perform forward slicing with each of them as the slicing point to find its data-dependent nodes;

2.3) Perform backward slicing again with the nodes in the forward slice result of step 2.2) as starting points, and merge into the final slice result those nodes whose index is larger than the index of the conditional statement node.
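Steps 2.1) to 2.3) can be sketched in Python as follows. The sketch rests on several assumptions that the text does not fix: an edge (u, v) means statement v depends on statement u, forward slicing follows edges from u to v, backward slicing follows them in reverse, and data and control dependencies are merged into one edge set. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]  # assumed encoding: (u, v) means statement v depends on u


def _reach(start: int, adj: Dict[int, Set[int]]) -> Set[int]:
    """Nodes reachable from `start` through the adjacency map."""
    seen: Set[int] = set()
    stack = [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def conditional_slice(edges: Iterable[Edge], poi: int,
                      conditions: Set[int]) -> Set[int]:
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    # Step 2.1: ordinary forward and backward slices from the point of interest.
    result = {poi} | _reach(poi, succ) | _reach(poi, pred)
    # Steps 2.2 and 2.3: extra slicing around each conditional statement found.
    for cond in sorted(result & set(conditions)):
        for node in _reach(cond, succ):            # forward slice from the condition
            extra = _reach(node, pred)             # backward slice from each result
            result |= {n for n in extra if n > cond}
    return result


# Toy graph: 2 is the point of interest, 3 is a loop condition, and statement 5
# feeds statement 4 on the next loop iteration (a loop-carried dependency).
toy_slice = conditional_slice({(1, 2), (2, 3), (3, 4), (5, 4)}, poi=2, conditions={3})
```

In the toy graph the ordinary slice from statement 2 yields statements 1 to 4; the extra pass around the condition on line 3 adds statement 5, which the ordinary slice misses.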

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: a program slicing point of interest is a code statement that contains a code structure likely to cause program vulnerabilities, specifically a code statement that uses one or more of the following code structures: arithmetic expressions, pointers, arrays, and sensitive library function calls.
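As a rough illustration, the four kinds of points of interest could be recognized in C statements with simple patterns, as in the Python sketch below. Everything in it is an assumption made for illustration: the regular expressions are heuristics, the list of sensitive functions is hypothetical, and a practical implementation would more plausibly query the parsed program rather than match text.

```python
import re
from typing import Set

# Hypothetical list of sensitive library functions; the text does not enumerate them.
SENSITIVE_CALLS = {"strcpy", "strcat", "sprintf", "memcpy", "gets"}

CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
ARRAY_RE = re.compile(r"[A-Za-z_]\w*\s*\[")
# Pointer use: member access through a pointer, a leading dereference,
# or a declaration such as "char *p".
POINTER_RE = re.compile(
    r"->|^\s*\*\s*[A-Za-z_]|\b(?:char|int|long|short|float|double|void)\s*\*")
# Binary "*" is left out because a pattern cannot tell it from a pointer declarator.
ARITH_RE = re.compile(r"[\w)\]]\s*[+\-/%]\s*[\w(]|[+\-*/%]=")


def interest_kinds(statement: str) -> Set[str]:
    """Return the kinds of point of interest found in one C statement."""
    kinds: Set[str] = set()
    if ARITH_RE.search(statement):
        kinds.add("arithmetic")
    if POINTER_RE.search(statement):
        kinds.add("pointer")
    if ARRAY_RE.search(statement):
        kinds.add("array")
    if any(name in SENSITIVE_CALLS for name in CALL_RE.findall(statement)):
        kinds.add("sensitive_call")
    return kinds


def is_point_of_interest(statement: str) -> bool:
    return bool(interest_kinds(statement))
```

A statement such as `strcpy(dst, src);` is reported as a sensitive call, while a plain initialization such as `int n = 5;` is not a point of interest under these heuristics.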

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: embedding the initial node vector sequences into vectors of uniform length with the gated recurrent neural network comprises the following specific steps:

5.1) Pad or truncate the vector sequence of each node so that all node vector sequences have the same length, which is set to 20 sequence elements;

5.2) Feed the fixed-length vector sequence of each node into the gated recurrent neural network for feature extraction. In this network each recurrent unit processes one sequence element and passes its information to the next unit, so the last unit receives the information of all preceding units; the hidden state of the last unit is taken as the embedding vector of the node, and each node is finally represented as a 256-dimensional vector;

5.3) The parameters of the gated recurrent neural network are updated together with the rest of the network model during back-propagation.
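Steps 5.1) and 5.2) can be illustrated with a dependency-free Python sketch. It assumes that the gated recurrent neural network is a standard gated recurrent unit (GRU), that padding appends zero vectors at the end, and that biases can be omitted for brevity; the class `TinyGRU` is hypothetical, its weights are random and untrained, and the training of step 5.3) is not shown. The demonstration uses a hidden size of 8 instead of 256 only to keep the pure-Python example fast.

```python
import math
import random
from typing import List

Vector = List[float]
SEQ_LEN = 20   # sequence length from step 5.1)
HIDDEN = 256   # embedding size from step 5.2)


def pad_or_truncate(seq: List[Vector], dim: int, length: int = SEQ_LEN) -> List[Vector]:
    """Cut the sequence to `length` elements or append zero vectors (assumed padding)."""
    seq = seq[:length]
    return seq + [[0.0] * dim for _ in range(length - len(seq))]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(matrix: List[Vector], vec: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class TinyGRU:
    """Minimal GRU cell without biases, with random (untrained) weights."""

    def __init__(self, in_dim: int, hidden: int = HIDDEN, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: mat(hidden, in_dim) for g in "zrn"}
        self.U = {g: mat(hidden, hidden) for g in "zrn"}
        self.hidden = hidden

    def step(self, x: Vector, h: Vector) -> Vector:
        wx = {g: _matvec(self.W[g], x) for g in "zrn"}
        uh = {g: _matvec(self.U[g], h) for g in "zrn"}
        z = [_sigmoid(a + b) for a, b in zip(wx["z"], uh["z"])]
        r = [_sigmoid(a + b) for a, b in zip(wx["r"], uh["r"])]
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wx["n"], r, uh["n"])]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def embed(self, seq: List[Vector]) -> Vector:
        """Run the whole sequence and return the last hidden state."""
        h = [0.0] * self.hidden
        for x in seq:
            h = self.step(x, h)
        return h


tokens = [[0.1, 0.2, 0.3]] * 5              # a node with five 3-dimensional token vectors
fixed = pad_or_truncate(tokens, dim=3)      # now exactly 20 elements
embedding = TinyGRU(in_dim=3, hidden=8).embed(fixed)
```

Taking the last hidden state gives every node an embedding of the same size regardless of how long its statement was, which is what the graph neural network in the later steps requires.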

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: the overall architecture of the graph neural network model comprises four convolution-pooling blocks, each combining graph convolution with graph pooling, and a multi-layer perceptron. The loss function of the whole graph neural network is a cross-entropy loss with a penalty factor, where the penalty factor mitigates the effect of sample imbalance in multi-class classification.
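A cross-entropy loss with a per-class penalty factor can be sketched as follows. The text does not say how the penalty factor is computed, so the inverse-frequency weighting in `class_weights` is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed penalty factor: rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


def weighted_cross_entropy(logits: List[List[float]],
                           labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Weighted mean of the per-sample cross-entropy over raw class scores."""
    loss, norm = 0.0, 0.0
    for row, y in zip(logits, labels):
        top = max(row)  # subtract the maximum for numerical stability
        log_sum = top + math.log(sum(math.exp(v - top) for v in row))
        loss += weights[y] * (log_sum - row[y])
        norm += weights[y]
    return loss / norm


labels = [0, 0, 0, 1]                        # class 1 is the rare class
weights = class_weights(labels)
uniform_loss = weighted_cross_entropy([[0.0, 0.0]] * 4, labels, weights)
```

With three samples of class 0 and one of class 1, the rare class receives three times the weight of the common class, so misclassifying it costs correspondingly more.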

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: when the subgraphs with embedded nodes are fed into the graph neural network model for training and testing, the data set is divided into a training set, a validation set, and a test set in the ratio 8:1:1. In the graph neural network, the parameters of every layer are updated with the Adam optimization algorithm, and the hyperparameters of the graph neural network are selected by ten-fold cross-validation: the learning rate is set to 0.001, the batch_size is 64, and the number of hidden units of the convolution layers is 256.
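The 8:1:1 division and the stated hyperparameters can be written down as a small Python sketch. The shuffling with a fixed seed and the names `split_8_1_1` and `HYPERPARAMS` are assumptions for illustration; the text does not say how samples are assigned to the three sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Values stated in the text; the dictionary itself is only an illustrative container.
HYPERPARAMS = {"optimizer": "Adam", "learning_rate": 0.001,
               "batch_size": 64, "hidden_units": 256}


def split_8_1_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and divide them into training, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


train_set, val_set, test_set = split_8_1_1(range(100))
```

Integer arithmetic is used for the set sizes so that no sample is lost to rounding; any remainder falls into the test set.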

Further, to better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing, the following arrangement is adopted: anonymizing the user-defined identifiers in the source code specifically means: user-defined variables are uniformly normalized to "VAR_i", where i is the order in which the corresponding variable name appears in the code and i ∈ {1, 2, ..., n}; user-defined functions are uniformly normalized to "FUNC_i", where i is the order in which the corresponding function name appears in the code and i ∈ {1, 2, ..., m}; user-defined structures are uniformly normalized to "TYPE_i", where i is the order in which the corresponding structure name appears in the code and i ∈ {1, 2, ..., p}.
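The anonymization rule can be sketched in Python as follows. The sketch assumes that the sets of user-defined variable, function, and structure names have already been obtained elsewhere (for example from a parser) and are passed in; the function name `anonymize` is hypothetical, and the simple identifier pattern would also rename matching words inside string literals and comments.

```python
import re
from typing import Dict, Iterable


def anonymize(code: str,
              variables: Iterable[str],
              functions: Iterable[str],
              types: Iterable[str]) -> str:
    """Rename user-defined identifiers to VAR_i, FUNC_i and TYPE_i.

    The index i follows the order in which each name first appears in the code.
    """
    kinds: Dict[str, str] = {}
    for prefix, names in (("VAR", variables), ("FUNC", functions), ("TYPE", types)):
        for name in names:
            kinds[name] = prefix
    counters = {"VAR": 0, "FUNC": 0, "TYPE": 0}
    mapping: Dict[str, str] = {}

    def rename(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name not in kinds:
            return name  # keywords and library names stay unchanged
        if name not in mapping:
            counters[kinds[name]] += 1
            mapping[name] = f"{kinds[name]}_{counters[kinds[name]]}"
        return mapping[name]

    return re.sub(r"[A-Za-z_]\w*", rename, code)


source = "struct packet p; int len = parse(&p); copy(buf, len);"
anonymized = anonymize(source,
                       variables={"p", "len", "buf"},
                       functions={"parse", "copy"},
                       types={"packet"})
```

For the sample statement the result is `struct TYPE_1 VAR_1; int VAR_2 = FUNC_1(&VAR_1); FUNC_2(VAR_3, VAR_2);`, so programs that differ only in naming map to the same token sequence.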

Compared with the prior art, the invention has the following advantages:

The invention improves the program dependency graph by enhancing the data dependency relationships in the original graph, so that the program dependency graph can model function calls together with their actual arguments, identify data pollution, and express more information. Because the traditional program dependency graph handles function call statements in a special way, it cannot record data pollution that occurs inside a function call statement. The invention therefore performs a data dependency enhancement operation on the program dependency graph, which corrects the inaccurate data dependencies caused by that special handling and strengthens the data dependency relationship between each code statement and the function call statements.

The invention provides a new slicing method that performs additional program slicing on conditional statements, so that the final program slice subgraph contains more information and the model can recognize complex condition-checking structures. In real scenarios, many vulnerabilities arise from complex loop structures driven by dynamic factors, and their direct cause is an improperly set loop termination condition. Conditional statements are therefore important in vulnerability detection, and the additional slices supply more relevant information about them.

The invention uses a gated recurrent neural network to embed the nodes, which extracts the information of code statements better, and updates this network dynamically during training, so that the embedding result approaches the optimal embedding.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 is a diagram of the neural network according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples, but embodiments of the present invention are not limited thereto.

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. The following detailed description of the embodiments, as presented in the figures, is therefore not intended to limit the claimed scope of the invention but merely represents selected embodiments. All other embodiments that a person of ordinary skill in the art can obtain from these embodiments without inventive effort fall within the scope of the invention.

Explanation of terms:

Word2Vec: a pre-trained embedding model for natural language and programming language;

GCN: graph convolutional network;

Joern: the name of an open-source tool;

PDG: program dependency graph;

joern-parse: a subcommand of the open-source tool Joern;

Program Slicing Points of Interest: the points of interest used as starting points for program slicing.

Example 1:

A source code vulnerability detection method based on data-dependency-enhanced program slicing analyzes the code under test, reports the location of each vulnerability to help repair personnel locate it quickly, and reports its type to help them repair it quickly. The method comprises the following steps:

2) Program slicing is performed with program slicing points of interest as slicing points to obtain subgraphs of the program dependency graph, and the vulnerability classification label of each subgraph is determined by whether it contains vulnerable code statements. Specifically, if a subgraph contains vulnerable code statements, it is regarded as vulnerable and its vulnerability type is the same as the label of the program dependency graph from which it was generated; if a subgraph contains no vulnerable code statements, it is regarded as vulnerability-free.

5) Since the original code of the nodes differs in length, the vector sequences of the nodes also differ in length. So that a graph neural network can be used in the subsequent steps, a gated recurrent neural network is used to embed each initial node vector sequence into a vector of uniform length.

6) The subgraphs with embedded nodes are fed into a graph neural network model for training and testing to obtain a multi-class vulnerability detection model for software source code.

7) After the source code to be detected has been processed by steps 1) to 4), the trained multi-class vulnerability detection model performs inference on it, which completes the detection of the vulnerability type.

Example 2:

This embodiment is further optimized on the basis of the above embodiment; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: because the traditional program dependency graph handles function call statements in a special way, it cannot record data pollution that occurs inside a function call statement. The method therefore performs a data dependency enhancement operation on the program dependency graph, which corrects the inaccurate data dependencies caused by that special handling and strengthens the data dependency relationship between each code statement and the function call statements. The enhancement operation on the program dependency graph comprises the following specific steps:

Example 3:

This embodiment is further optimized on the basis of any of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: the specific steps of program slicing are as follows:

Example 4:

This embodiment is further optimized on the basis of any of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: a program slicing point of interest is a code statement that contains a code structure likely to cause program vulnerabilities, specifically a code statement that uses one or more of the following code structures: arithmetic expressions, pointers, arrays, and sensitive library function calls.

Example 5:

This embodiment is further optimized on the basis of any of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: embedding the initial node vector sequences into vectors of uniform length with the gated recurrent neural network comprises the following specific steps:

Example 6:

This embodiment is further optimized on the basis of any of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: the overall architecture of the graph neural network model comprises four convolution-pooling blocks, each combining graph convolution with graph pooling, and a multi-layer perceptron. The loss function of the whole graph neural network is a cross-entropy loss with a penalty factor, where the penalty factor mitigates the effect of sample imbalance in multi-class classification.

Example 7:

This embodiment is further optimized on the basis of any of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: when the subgraphs with embedded nodes are fed into the graph neural network model for training and testing, the data set is divided into a training set, a validation set, and a test set in the ratio 8:1:1. In the graph neural network, the parameters of every layer are updated with the Adam optimization algorithm, and the hyperparameters of the graph neural network are selected by ten-fold cross-validation: the learning rate is set to 0.001, the batch_size is 64, and the number of hidden units of the convolution layers is 256.

Example 8:

This embodiment is further optimized on the basis of any of the above embodiments; features identical to the foregoing technical solution are not repeated here. To better implement the source code vulnerability detection method based on data-dependency-enhanced program slicing according to the invention, the following arrangement is adopted: anonymizing the user-defined identifiers in the source code specifically means: user-defined variables are uniformly normalized to "VAR_i", where i is the order in which the corresponding variable name appears in the code and i ∈ {1, 2, ..., n}; user-defined functions are uniformly normalized to "FUNC_i", where i is the order in which the corresponding function name appears in the code and i ∈ {1, 2, ..., m}; user-defined structures are uniformly normalized to "TYPE_i", where i is the order in which the corresponding structure name appears in the code and i ∈ {1, 2, ..., p}.

Example 9:

A source code vulnerability detection method based on data-dependency-enhanced program slicing obtains relevant information by detecting vulnerabilities in unknown code, helps repair personnel locate vulnerabilities quickly, and reports the vulnerability type to help them repair vulnerabilities quickly. With reference to FIGS. 1 and 2, the method comprises the following steps:

Training phase:

1) Generating a program dependency graph and enhancing its data dependencies: the source code file is analyzed to obtain its data dependency information and control dependency information (extraction of data flow and control flow information), a program dependency graph is constructed, and data dependency enhancement is then performed on it (data-dependency-enhanced program dependency graph). The specific steps of the enhancement operation are as follows:

1.2) For each function call node found in the previous step, find the data dependency node of the parameter, and perform backward slicing on the program dependency graph with that node as the starting node;

1.3) Among the nodes in the backward slice obtained in the previous step, select those whose node index (that is, the corresponding code line number) is larger than the index of the function call node, and establish a data dependency relationship between each of them and the function call node;

2) Program slicing is performed with program slicing points of interest as slicing points to obtain subgraphs of the program dependency graph, and the vulnerability classification label of each subgraph is determined by whether it contains vulnerable code. A program slicing point of interest is a code statement that contains a code structure likely to cause program vulnerabilities, specifically a code statement that uses one or more of the following code structures: arithmetic expressions, pointers, arrays, and sensitive library function calls. The vulnerability classification label of a subgraph is determined by whether it contains vulnerable code statements, specifically: if a subgraph contains vulnerable code statements, it is regarded as vulnerable and its vulnerability type is the same as the label of the program dependency graph from which it was generated; if a subgraph contains no vulnerable code statements, it is regarded as vulnerability-free. The specific steps of program slicing are as follows:

2.3) Perform backward slicing again with the nodes in the forward slice result of the previous step as starting points, and merge into the final slice result those nodes whose index is larger than the index of the conditional statement node.

3) User-defined identifiers in the source code are anonymized, and each semantic unit in the code is then converted into a vector using Word2Vec to form a dictionary. The specific steps are as follows:

3.1) User-defined variables are uniformly normalized to "VAR_i", where i is the order in which the corresponding variable name appears in the code and i ∈ {1, 2, ..., n}; user-defined functions are uniformly normalized to "FUNC_i", where i is the order in which the corresponding function name appears in the code and i ∈ {1, 2, ..., m}; user-defined structures are uniformly normalized to "TYPE_i", where i is the order in which the corresponding structure name appears in the code and i ∈ {1, 2, ..., p}.

3.2) After the code is tokenized, a Word2Vec model is trained on the tokens and used as the dictionary.

4) Vector representation of program slices: the code statements of the nodes in each subgraph obtained by program slicing are converted into vector sequences according to the dictionary generated in the previous step (step 3));
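The dictionary lookup of step 4) can be illustrated with the short Python sketch below. The tokenizer pattern, the zero vector used for tokens missing from the dictionary, and the two-dimensional toy dictionary are all assumptions for illustration; in the method the dictionary comes from the Word2Vec model trained in step 3.2).

```python
import re
from typing import Dict, List

Vector = List[float]

# Hypothetical tokenizer: identifiers, integer literals, then any other single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def statement_to_vectors(statement: str,
                         dictionary: Dict[str, Vector],
                         dim: int) -> List[Vector]:
    """Map each token of a statement to its vector in the dictionary."""
    unknown = [0.0] * dim  # assumed fallback for tokens that are not in the dictionary
    return [dictionary.get(token, unknown) for token in TOKEN_RE.findall(statement)]


toy_dictionary = {"VAR_1": [1.0, 0.0], "=": [0.0, 1.0]}
sequence = statement_to_vectors("VAR_1 = 5;", toy_dictionary, dim=2)
```

The statement `VAR_1 = 5;` yields four token vectors, which shows why nodes end up with vector sequences of different lengths before they are embedded in step 5).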

5) Since the original code of the nodes differs in length, the initial node vector sequences also differ in length. So that a graph neural network can be used in the subsequent steps, a gated recurrent neural network is used to embed each initial node vector sequence into a vector of uniform length. Embedding the initial node vector sequences into vectors of uniform length with the gated recurrent neural network comprises the following specific steps:

6) The subgraphs with embedded nodes are fed into a graph neural network model for training and testing to obtain a multi-class vulnerability detection model for software source code. The overall architecture of the graph neural network model comprises four convolution-pooling blocks, each combining graph convolution with graph pooling, and a multi-layer perceptron. The loss function of the whole graph neural network is a cross-entropy loss with a penalty factor, where the penalty factor mitigates the effect of sample imbalance in multi-class classification. During training and testing, the data set is divided into a training set, a validation set, and a test set in the ratio 8:1:1. The optimal hyperparameters of the network are selected by ten-fold cross-validation.

Detection:

The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any simple modification or equivalent variation of the above embodiment made according to the technical substance of the present invention falls within the scope of the present invention.

Claims

1. A source code vulnerability detection method based on data dependency enhancement program slicing, characterized by comprising the following steps:

2) Program slicing is performed with the program slice interest points as slicing points to obtain subgraphs of the program dependency graph, and the vulnerability classification label of each subgraph is determined by whether it contains vulnerable code statements;

5) Embedding the initial node vectors into vectors of uniform length by adopting a gated recurrent neural network;

6) Sending the subgraphs with the node vectors embedded into a graph neural network model for training and testing to obtain a vulnerability multi-classification detection model of the software source code;

7) After the source code to be detected is processed through steps 1) to 4), the vulnerability multi-classification detection model for software source code trained in step 6) performs inference and prediction on the processed source code, completing the vulnerability type detection.

2. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: the specific steps of the enhancement operation on the program dependency graph comprise:

1.3 Among the nodes in the backward slice result obtained in step 1.2), selecting the nodes whose node index is larger than that of the function call node, establishing a data dependency relationship between each such node and the function call node, and adding the data dependency relationship to the original program dependency graph.

3. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: the specific steps of the program slice are as follows:

4. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: the program slice interest point refers to a code statement that uses one or more of the following code structures: arithmetic expressions, pointers, arrays, and sensitive library function calls.

5. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: the specific steps of embedding the initial node vectors into vectors of uniform length by adopting the gated recurrent neural network are as follows:

6. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: the overall architecture of the graph neural network model comprises four layers of convolution-pooling blocks, each built from graph convolution and graph pooling, and a multi-layer perceptron; the loss function of the whole graph neural network is a cross-entropy loss function with a penalty factor, wherein the penalty factor mitigates the effect of sample imbalance in multi-class classification.

7. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: in the training and testing process of feeding the subgraphs with embedded nodes into the graph neural network model, the data set is divided into a training set, a validation set, and a test set in the ratio 8:1:1; the optimal hyperparameter settings of the graph neural network are selected by ten-fold cross-validation.

8. The method for detecting source code vulnerabilities based on data-dependent enhancement program slices as claimed in claim 1, wherein: anonymizing the user-defined identifiers in the source code specifically comprises: normalizing user-defined variables to "VAR_i", where i is the order in which the corresponding variable name appears in the code, i ∈ {1, 2, ..., n}; normalizing user-defined functions to "FUNC_i", where i is the order in which the corresponding function name appears in the code, i ∈ {1, 2, ..., m}; normalizing user-defined structures to "TYPE_i", where i is the order in which the corresponding structure name appears in the code, i ∈ {1, 2, ..., p}.