CN107967208B

CN107967208B - Python resource sensitive defect code detection method based on deep neural network

Info

Publication number: CN107967208B
Application number: CN201610915633.4A
Authority: CN
Inventors: 陈林; 潘陶; 陈芝菲; 李言辉; 徐宝文
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2016-10-20
Filing date: 2016-10-20
Publication date: 2020-01-17
Anticipated expiration: 2036-10-20
Also published as: CN107967208A

Abstract

The invention relates to a Python resource sensitive defect code detection method based on a deep neural network, which comprises the following steps: 1) acquiring a source code of a historical version and a source code of a version to be tested of the same software; 2) extracting resource sensitive code modes of all versions by utilizing type inference; 3) extracting relevant characteristics of the resource sensitive code mode; 4) calculating each feature similarity between the defect code mode and the safety code mode, and between the defect code mode and the code mode to be tested, generating a feature vector, and obtaining a training set and a test set; 5) training the deep neural network model by using a training set to perform feature merging, and then calculating the correlation and sequencing by using the deep neural network model for the mode in the test set; 6) in the program development and maintenance stage, reminding resource object operation which is possibly wrong according to the relevance ranking result, and assisting development and maintenance; the invention solves the problems that an automatic method aiming at Python language resource sensitive code identification and defect code detection is lacked at present, and the like, thereby reducing the software risk and improving the software quality, and further improving the software development and maintenance efficiency of developers and maintainers.

Description

Python resource sensitive defect code detection method based on deep neural network

Technical Field

The invention belongs to the field of computer technology, in particular software technology, and specifically relates to a Python resource sensitive defect code detection method based on a deep neural network.

Background

As software application technology develops, users demand ever higher software quality, and software developers apply a range of techniques to meet those demands. Resource sensitive code is a block or statement of code that processes a resource object. During software development and maintenance, much resource sensitive code carries latent exception risks that are often discovered only during maintenance. With the growing popularity of agile development, versions change frequently, and resource sensitive code often starts raising exceptions unexpectedly. The most traditional way to handle exceptions in resource sensitive code is to catch and process them with the try-except keywords. However, developers often neglect exception handling during development, so the program raises an unexpected exception and the application crashes. Identifying and detecting dangerous operations on resource objects is therefore an indispensable step in program development and maintenance: it effectively improves program quality and helps development and maintenance personnel find program problems in time and respond with a more effective solution.

Python has become a programming language highly favored by developers. Python applications continue to emerge in the major open source communities, forming a huge ecosystem. Python is an object-oriented, interpreted programming language that is concise, elegant, and practical. As a dynamic language, Python is widely used for internet applications, graphical user interfaces, and scripting, all of which involve various types of resources. Because Python is a dynamic language, developers tend to change variable types dynamically, which leads to many unsafe operations. Moreover, when Python code operates on resource objects, exceptions often arise from resource configuration and similar causes, and problems caused by resource sensitive operations are not easy to discover. At present, developers control such code defects through condition checks, exception handling, and similar means.

At this stage, methods for identifying and detecting resource objects fall roughly into two categories. The first is program-based data analysis, which locates dangerous resource object operations through logical and semantic analysis. The second uses information retrieval to identify resource objects and detects defect code by machine learning. The first category is based on semantic analysis and produces results quickly, but it suffers from low accuracy and from semantic rules that are difficult to define. The second category extracts features from context and similar sources and then learns and predicts by machine learning; although it produces results more slowly, it offers high accuracy and strong practicability. The invention adopts the machine learning approach for detection.

In the maintenance phase, a single submission by a developer may repair several instances of the same defect at once, so the defect codes within one version are strongly correlated. The invention distinguishes defect code from safety code according to historical repair information and, using the correlation between defect codes, presumes that code similar to historical defect code is likely to be defective as well. On this basis it provides a Python resource sensitive defect code detection method based on a deep neural network.

Disclosure of Invention

The invention provides a Python resource sensitive defect code detection method based on a deep neural network. By mining and comparing the defect codes repaired in historical versions, the method finds code in the version to be tested that is similar to repaired defect code and reminds developers and maintainers that the same problems may exist, so that the code can be repaired as early as possible. The method collects the historical versions and the version to be tested of the same Python software from a software version control system. For the historical versions, it identifies resource sensitive code modes through type inference, extracts the corresponding mode features, forms related mode pairs and non-related mode pairs from the defect code modes and safety code modes according to historical repair information, and calculates feature similarities to generate feature vectors, which yields a training set. For the version to be tested, it extracts the modes and corresponding features in the same way, pairs the historical defect code modes with the modes of the version to be tested, and calculates feature similarities to generate feature vectors, which yields a test set. The deep neural network model is then trained with the training set, and the trained model performs feature merging on the test set to obtain the degree of correlation between the code to be tested and the defect code. Finally, the results are sorted by degree of correlation to identify potentially dangerous code in the version to be tested that closely resembles resource sensitive code repaired in historical versions, which provides suggestions to program developers and maintainers and helps prevent exceptions.
The invention aims to solve the problems that an automatic method aiming at Python language resource sensitive code identification and defect code detection is lacked at present, and the like, so that the software risk is reduced, the software quality is improved, and the software development efficiency of developers is improved.

In order to achieve the above object, the present invention provides a Python resource sensitive defect code detection method based on a deep neural network, which comprises the following steps:

1) acquiring a source code of a historical version and a source code of a version to be tested of the same software;

2) extracting resource sensitive code modes of all versions by utilizing type inference;

3) extracting relevant characteristics of the resource sensitive code mode;

4) calculating each feature similarity between the defect code mode and the safety code mode, and between the defect code mode and the code mode to be tested, generating a feature vector, and obtaining a training set and a test set;

5) training the deep neural network model by using a training set to perform feature merging, and then calculating the correlation and sequencing by using the deep neural network model for the mode in the test set;

6) and in the stage of program development and maintenance, reminding the resource object operation which possibly has errors according to the result of the relevance ranking, and assisting development and maintenance.

Further, the specific steps of the step 1) are as follows:

step 1) -1: an initial state;

step 1) -2: acquiring a source program repaired in a historical version and a source program of a version to be detected in the same software from an open source version control system according to the file name and the version information;

step 1) -3: and finishing the acquisition of the source programs of different versions of the software.

Further, the specific steps of the step 2) are as follows:

step 2) -1: an initial state;

step 2) -2: performing lexical analysis and syntactic analysis on the source programs of the versions respectively, and generating abstract syntax trees corresponding to the versions by using an ast module in a Python standard library;

step 2) -3: packaging each Python type according to the abstract syntax defined in the Python standard library, wherein each type has a mapping table that contains the internal attribute names or API interface names of the type;

Step 2) -4: traversing the abstract syntax tree, inferring the possible types of each variable based on the packaged types and modules, and extracting the variables of resource object type;

Step 2) -5: for a variable whose type is not identified, if the variable is an interface name and one of its parameters has a resource object type, identifying the variable as a resource object type; otherwise, if the variable is a member of another variable and the calling variable has a resource object type, also marking it as a resource object type;

Step 2) -6: taking the code segments that call variables of resource object type as resource sensitive code modes;

Step 2) -8: and finishing the collection of the resource sensitive code mode information.
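The sub-steps of step 2) can be illustrated with a minimal sketch built on the standard `ast` module. It is not the patented type inference: the `RESOURCE_FACTORIES` table and the function `find_resource_patterns` are hypothetical names introduced here, and the inference is reduced to one rule (a variable assigned from a known resource-producing call is treated as a resource object).

```python
import ast

# Hypothetical table of call names assumed to return resource objects.
RESOURCE_FACTORIES = {"open", "socket", "connect"}

def find_resource_patterns(source):
    """Return (resource_vars, patterns) for one source string.

    resource_vars: names inferred to hold resource objects.
    patterns: (line, variable, method) for each method call on such a name.
    """
    tree = ast.parse(source)
    resource_vars = set()
    # Pass 1: a variable assigned from a resource factory call is a resource.
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in RESOURCE_FACTORIES:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        resource_vars.add(target.id)
    # Pass 2: every method call on a resource variable is a sensitive pattern.
    patterns = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in resource_vars):
            patterns.append((node.lineno, node.func.value.id, node.func.attr))
    return resource_vars, sorted(patterns)

sample = "f = open('data.txt')\nn = 3\ndata = f.read()\nf.close()\n"
resource_vars, patterns = find_resource_patterns(sample)
```

For the sample above, `f` is the only inferred resource variable, and the calls `f.read()` and `f.close()` are reported as resource sensitive code modes.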

Further, the specific steps of the step 3) are as follows:

step 3) -1: an initial state;

step 3) -2: according to the resource code mode information, the operation position of the resource object is positioned, and the API (parameter type, parameter sequence), the resource name, the calling structure, the function internal structure and the like are extracted as characteristics.

Step 3) -3: the API (parameter type, number), resource name, call structure and function structure are named uniformly.

Step 3) -4: and finishing the extraction of the resource code mode characteristic information.
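A rough sketch of the feature extraction in step 3) follows, assuming Python 3.8 or later. The function `extract_features` and the dictionary keys are hypothetical; parameter types are approximated by AST node class names, the function structure by the statement kinds of the enclosing function body, and the call structure feature is omitted for brevity.

```python
import ast

def extract_features(source, resource_name):
    """Collect simple features for each method call on `resource_name`."""
    tree = ast.parse(source)
    features = []
    for func_def in ast.walk(tree):
        if not isinstance(func_def, ast.FunctionDef):
            continue
        for node in ast.walk(func_def):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == resource_name):
                features.append({
                    "api": node.func.attr,
                    # Parameter types in order, approximated by AST node class names.
                    "param_types": [type(arg).__name__ for arg in node.args],
                    "resource_name": resource_name,
                    # Function structure: statement kinds of the enclosing function body.
                    "function_structure": [type(s).__name__ for s in func_def.body],
                })
    return features

sample = (
    "def load(path):\n"
    "    f = open(path)\n"
    "    data = f.read(1024)\n"
    "    return data\n"
)
features = extract_features(sample, "f")
```

Using node class names keeps the feature names uniform across versions, which is the point of the unified naming in step 3) -3.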

Further, the specific steps of the step 4) are as follows:

step 4) -1: an initial state;

step 4) -2: dividing the identified resource sensitive code modes into three classes, namely a defect code mode, a safety code mode and a code mode to be detected;

step 4) -3: for the historical version, pairwise matching similar defect code modes according to historical repair information to form a related mode pair; pairing the defect code mode and the security code mode similar to the defect code mode pairwise to form a non-relevant mode pair;

step 4) -4: for the version to be tested, pairing the defect code mode and the code mode to be tested in pairs to form a mode pair to be tested;

step 4) -5: calculating the similarity of each feature of different mode pairs and generating a feature vector;

step 4) -6: obtaining a training set by a feature vector set formed by the code mode pairs of the historical version, and obtaining a test set by a feature vector set formed by the code mode pairs of the version to be tested;

step 4) -7: finishing the collection of the training set test set information;
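The pairing in steps 4) -3 and 4) -4 can be sketched as follows. This is a simplification under stated assumptions: the function `build_pairs` is hypothetical, every defect mode is paired with every other mode rather than only with "similar" ones, and the labels 1, 0 and None are chosen here for illustration.

```python
from itertools import combinations, product

def build_pairs(defect_patterns, safe_patterns, test_patterns):
    """Return (training_pairs, test_pairs) as lists of (pattern_a, pattern_b, label)."""
    # Related pairs: two defect patterns from the historical versions, label 1.
    related = [(a, b, 1) for a, b in combinations(defect_patterns, 2)]
    # Non-related pairs: a defect pattern with a safety pattern, label 0.
    unrelated = [(d, s, 0) for d, s in product(defect_patterns, safe_patterns)]
    # Pairs to be tested: a historical defect pattern with a pattern under test.
    to_test = [(d, t, None) for d, t in product(defect_patterns, test_patterns)]
    return related + unrelated, to_test

training_pairs, test_pairs = build_pairs(["d1", "d2"], ["s1"], ["t1", "t2"])
```

Each pair would then be turned into a feature vector by computing one similarity value per feature, as described in step 4) -5.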

Further, the specific steps of the step 5) are as follows:

step 5) -1: an initial state;

step 5) -2: training the deep neural network by using the training set similarity data generated in the step 4) to obtain each parameter value of the model;

step 5) -3: taking the test set generated in the step 4) as an input, and obtaining a correlation value through a trained deep neural network model;

step 5) -4: and sequencing the correlation degrees among all the code pairs from large to small according to the calculated correlation degree values, taking the first k test mode pairs as a resource sensitive code detection result, and marking the version code to be detected as a possible resource sensitive defect code.

Step 5) -5: and marking the possible resource sensitive defect codes.
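The ranking in step 5) -4 amounts to a sort followed by a cut at k. A minimal sketch, with the hypothetical function `top_k_pairs` and made-up relevance values used purely for illustration:

```python
def top_k_pairs(scored_pairs, k):
    """scored_pairs: list of (defect_pattern, test_pattern, relevance)."""
    ranked = sorted(scored_pairs, key=lambda pair: pair[2], reverse=True)
    return ranked[:k]

# Illustrative scores only; real values would come from the trained model.
scored = [("d1", "t1", 0.31), ("d1", "t2", 0.92), ("d2", "t1", 0.77)]
flagged = [test for _, test, _ in top_k_pairs(scored, k=2)]
```

The test-side patterns in `flagged` are the ones that would be marked as possible resource sensitive defect code.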

Further, the specific steps of the step 6) are as follows:

step 6) -1: an initial state;

step 6) -2: for code marked as resource sensitive defect code, development and maintenance personnel are shown the location of the associated historical version code, advised to modify the code, and presented with a repair solution.

Step 6) -3: in the program development and maintenance stage, the system automatically detects the submitted codes and gives a warning for the operation with potential dangerous resources.

Step 6) -4: and the newly submitted version program is used as historical version data for next comparison, so that the detection result is more accurate.

Step 6) -5: and finishing prompting the resource sensitive defect codes in the codes to be tested.

The invention carries out feature combination based on the deep neural network, and adopts a standard metric value to measure the correlation level between the code to be tested and the defect code in the historical version, thereby being capable of positioning the resource sensitive defect code block to be deep into the basic statement level. After identifying resource sensitive code based on type inference, automatic repairs are made and developers and maintainers are prompted based on solutions in historical versions similar thereto. By the method, the resource sensitive codes and dangerous operations thereof are identified, the software development efficiency is improved, and high-quality software application products are beneficially developed.

Drawings

Fig. 1 is an overall architecture diagram of a Python resource-sensitive defect code detection method based on a deep neural network according to an embodiment of the present invention.

Fig. 2 is a flowchart of a Python resource sensitive defect code detection method based on a deep neural network according to an embodiment of the present invention.

Fig. 3 is a diagram of a possible abstract syntax tree for a loop control structure.

Detailed Description

The method first collects, through a software version control system such as CVS, the source code of all repaired historical versions of the same Python software. Lexical analysis and syntactic analysis are then performed on the source code of the historical versions and of the version to be tested; type inference is carried out on the generated abstract syntax trees; variables involved in resource object operations are labeled; and resource code modes are identified. According to historical repair information, defect code modes and safety code modes are selected from the resource sensitive code modes of the historical versions to form related mode pairs and non-related mode pairs. The resource sensitive code modes of the version to be tested are then paired with the historical defect code modes to form test mode pairs. Next, the similarity of each feature is calculated for every mode pair from the extracted mode features, and feature vectors are generated to obtain the corresponding training set and test set. The deep neural network model is trained with the training set, and the trained model performs feature merging on the test set to obtain the degree of correlation between each code mode to be tested and the historical defect code modes. Finally, the pairs are sorted by degree of correlation, the first k related mode pairs are selected as results, and the code to be tested in those pairs is marked as resource sensitive code with potential defects, which assists development and maintenance personnel and helps avoid exceptions.

To better explain the technical contents of the present invention, the accompanying drawings are shown as follows.

The general architecture of the present invention is shown in fig. 1, and the flow chart is shown in fig. 2. The invention provides a Python resource sensitive defect code detection method based on a deep neural network, which comprises the following 6 steps:

Step 1: Acquire the source code of the repaired historical versions and the source code of the version to be tested of the same software. All versions of the program are stored in a software version control system such as CVS and labeled with version numbers. The source code of the historical versions and of the version to be tested of the same Python software is obtained according to the established version numbers.

Step 2: Extract the resource code modes from the program source code of each version by type inference. First, lexical analysis and syntactic analysis are performed on the source code of each version acquired in step 1, and an abstract syntax tree is generated with the corresponding function of the ast module in the Python standard library. Each node and subtree in the abstract syntax tree corresponds to a source code entity. To support type inference, several abstract types (Types) are encapsulated according to the types defined by Python. Each type has a table attribute, which holds the names in the abstract syntax tree related to the attributes or calls of that type, such as API names. For each node in the abstract syntax tree, a type and a value are set, together with a unique identifier id. For each node x in the tree, T(x) represents the type of the node, such as an assignment statement; V(x) represents the value of the node, a text representation of the node such as the specific content of the assignment statement; and Id(x) represents the unique identifier that distinguishes the node from other nodes.

For example, an assignment statement is a simple statement and corresponds to a leaf node in the abstract syntax tree, whose type is "Assignment Statement" and whose value is the content of the assignment statement. A while loop statement corresponds to a subtree in the abstract syntax tree: the type of the subtree's root node is "While Statement", its value is the loop condition of the while statement, and its child nodes are the statements inside the while body and the statements that exit the loop. Fig. 3 shows a possible abstract syntax tree for a loop statement structure.

Finally, the whole abstract syntax tree is traversed in post-order. The type of each variable is inferred from the type information in the abstract syntax tree and from the attributes and other information that each type's table maps to in the node, and each code segment inferred to call a resource object variable is marked as a resource sensitive code mode. A resource sensitive code mode is a code fragment that operates on a resource object (a file object, a graphical user interface object, etc.).
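The node annotations T(x), V(x) and Id(x) and the post-order traversal can be sketched with the standard `ast` module, assuming Python 3.8 or later for `ast.get_source_segment`. The function `annotate_postorder` is a hypothetical name; the node type is taken to be the AST class name and the identifier a running counter, both of which are assumptions of this sketch.

```python
import ast
from itertools import count

def annotate_postorder(source):
    """Return a list of (id, type, value) tuples, children before parents."""
    tree = ast.parse(source)
    ids = count(1)
    out = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            visit(child)
        # T(x): node class name; V(x): source text when the node has a position.
        value = ast.get_source_segment(source, node) if hasattr(node, "lineno") else ""
        out.append((next(ids), type(node).__name__, value or ""))

    visit(tree)
    return out

nodes = annotate_postorder("x = 1\n")
```

Because children are visited before their parent, the root `Module` node always comes last, and the `Assign` node carries the full statement text as its value.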

For example:

In this code segment, self is a resource object, and the switch_backings function is called to operate on it. The segment is therefore a resource sensitive code mode.

Step 3: Step 2 has extracted the resource code modes from the source code. The relevant features of the resource sensitive code mode extracted by the invention are: API (parameter type, parameter order), resource name, call structure, and function structure.

The extracted feature names are then normalized. For API features, feature similarity is calculated from the parameter types and the parameter order; for resource name features, it is calculated from the word sequences in the resource names; for the call structure feature, the call structure similarity is used as the feature similarity; and for the function structure feature, the function structure similarity is used as the feature similarity.

Step 4: For the historical versions, similar defect code modes are first paired two by two according to historical repair information to form related mode pairs, and each defect code mode is paired with the safety code modes similar to it to form non-related mode pairs. For the version to be tested, the defect code modes and the code modes to be tested are paired two by two to form mode pairs to be tested. With the feature information extracted in step 3, the similarity of each feature can be calculated for the different mode pairs.

The feature similarity of the API is computed with the rVSM algorithm. For parameter types, the weight is calculated with the TF-IDF algorithm:

$$w_{type} = TF \times \log\left(\frac{Total_{api}}{Contain_{type}}\right)$$

where TF is the frequency with which the type occurs in the API, Total_api is the total number of APIs, and Contain_type is the number of APIs containing that type. This value serves as the weight in the feature vector formed from the API. The type order is measured with 2-grams, which is robust to changes in the type order. The measurements of the type order and the parameter types together form a feature vector. The similarity of the feature vectors generated from the two versions is then calculated with the rVSM algorithm. In this method, the cosine distance between the historical version feature vector a and the feature vector b of the version to be tested represents the similarity:

$$sim(a, b) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}$$

where $\vec{a}$ and $\vec{b}$ denote the historical version feature vector a and the feature vector b of the version to be tested, respectively, and $\vec{a} \cdot \vec{b}$ denotes the inner product of the two feature vectors.
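The API similarity can be sketched in a few lines. This is a plain vector space model rather than a faithful rVSM implementation, and it assumes the standard TF-IDF form TF × log(Total_api / Contain_type); the functions `api_vector` and `cosine` and the sample corpus are hypothetical.

```python
import math
from collections import Counter

def api_vector(param_types, all_apis):
    """Sparse vector for one API: TF-IDF weight per parameter type plus 2-gram counts."""
    total_api = len(all_apis)
    vec = {}
    for t, freq in Counter(param_types).items():
        contain_type = max(1, sum(1 for api in all_apis if t in api))
        vec[("type", t)] = freq * math.log(total_api / contain_type)
    # 2-grams over the parameter type sequence capture the parameter order.
    for a, b in zip(param_types, param_types[1:]):
        vec[("2gram", a, b)] = vec.get(("2gram", a, b), 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical corpus: each API is represented by its parameter type sequence.
corpus = [["str", "int"], ["str", "bool"], ["bytes", "int"]]
vec_a = api_vector(corpus[0], corpus)
vec_c = api_vector(corpus[2], corpus)
same = cosine(vec_a, vec_a)
different = cosine(vec_a, vec_c)
```

Identical signatures score 1, while signatures that share only one parameter type score strictly between 0 and 1.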

The resource name feature similarity is computed with a text similarity algorithm. First, the resource name is parsed into a sequence of words. Next, the similarity is calculated for the resource name R₁ in the historical version and the resource name R₂ in the version to be tested, where lcs(R₁, R₂) represents the sub-words of R₁ that also occur in R₂. This yields a quantized value for the resource name, from which the related vector is generated. Examples of name pairs compared in this way are "length" and "getLength", and "getLength" and "getLength".
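One way to realize a word-sequence similarity of this kind is sketched below. The exact formula is not available in this text, so the normalization used here (longest common subsequence of words divided by the length of the longer word sequence) is an assumption, as are the function names `split_words`, `lcs_length` and `name_similarity`.

```python
import re

def split_words(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def name_similarity(r1, r2):
    """Assumed normalization: shared words divided by the longer word count."""
    w1, w2 = split_words(r1), split_words(r2)
    longest = max(len(w1), len(w2))
    return lcs_length(w1, w2) / longest if longest else 0.0

partial = name_similarity("length", "getLength")
identical = name_similarity("getLength", "getLength")
```

Under this assumed normalization, "length" against "getLength" shares one of two words, and identical names score 1; these are outputs of the sketch, not values taken from the patent.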

For the function structure feature similarity and the calling structure feature similarity, the tree structure is traversed according to the abstract syntax tree obtained in step 2, and the similarity is obtained from the number of identical tree nodes and the probability calculated from it. Finally, a training set is obtained from the feature vector set formed by the code mode pairs of the historical versions, and a test set is obtained from the feature vector set formed by the code mode pairs of the version to be tested.
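A simple stand-in for the structure similarity is to count node types in both trees and take the share of matching nodes. The measure below (shared node count divided by the size of the larger tree) is an assumption of this sketch, and `node_type_counts` and `structure_similarity` are hypothetical names.

```python
import ast
from collections import Counter

def node_type_counts(source):
    """Count AST nodes by class name for one code fragment."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))

def structure_similarity(src_a, src_b):
    """Share of identical node types, relative to the larger tree."""
    a, b = node_type_counts(src_a), node_type_counts(src_b)
    shared = sum((a & b).values())
    return shared / max(sum(a.values()), sum(b.values()))

looped = "for i in x:\n    f.read()\n"
plain = "f.read()\n"
same_structure = structure_similarity(plain, plain)
partial_structure = structure_similarity(looped, plain)
```

This ignores where in the tree the matching nodes sit; a closer approximation would compare subtrees rather than node counts.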

Step 5: Step 4 yields a training set and a test set composed of feature vectors. Because an individual feature similarity cannot indicate on its own whether a pair as a whole is related to a dangerous resource object operation, a deep neural network is used to merge the features and calculate the degree of correlation.

First, the deep neural network is trained using the generated training set. The neural network designed by the invention consists of an input layer, two hidden layers (hidden layer-1 and hidden layer-2), and an output layer. Hidden layer-1 has twice as many nodes as the input layer, and hidden layer-2 has half as many nodes as the input layer. Each node H1_i of hidden layer-1 is calculated from the input node values Input_i, with w_1i and b as the parameters to be trained. Hidden layer-2 is derived from hidden layer-1 in the same way. To train w and b, the invention adopts a batch gradient descent method, which comprises the following steps:

1) initialization: set Δw^(l) = 0 and Δb^(l) = 0, and randomly initialize w and b to small values;

2) with m iterations, for i from 1 to m, calculate the gradients with the BP (backpropagation) algorithm and accumulate them into Δw^(l) and Δb^(l);

3) update the parameters w and b from the accumulated gradients, where λ is an optional parameter, taken as 2 in the invention.

The deep neural network model is trained by this method.
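The accumulate-then-update pattern of batch gradient descent can be shown on a deliberately reduced model. The sketch below trains a single sigmoid unit instead of the full network; the update rule with learning rate `alpha` and a `weight_decay` term, the cross-entropy loss, and the toy similarity vectors are all assumptions introduced for illustration, not the parameters of the invention.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_batch_gd(samples, labels, epochs=2000, alpha=0.5, weight_decay=0.0, seed=0):
    """Batch gradient descent for one sigmoid unit (a reduced stand-in for the network)."""
    rng = random.Random(seed)
    n, m = len(samples[0]), len(samples)
    w = [rng.uniform(-0.01, 0.01) for _ in range(n)]  # small random initial weights
    b = 0.0
    for _ in range(epochs):
        dw = [0.0] * n  # accumulated weight gradients, reset to 0 each pass
        db = 0.0        # accumulated bias gradient, reset to 0 each pass
        for x, y in zip(samples, labels):
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = out - y  # gradient of the cross-entropy loss at the pre-activation
            for j in range(n):
                dw[j] += err * x[j]
            db += err
        # One parameter update per full pass over the training set.
        w = [wj - alpha * (dw[j] / m + weight_decay * wj) for j, wj in enumerate(w)]
        b -= alpha * db / m
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy similarity vectors: related pairs (label 1) have high feature similarities.
samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
w, b = train_batch_gd(samples, labels)
```

In a multi-layer network the inner loop would run backpropagation to obtain the gradient for every layer, but the accumulation and the single update per pass stay the same.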

In the detection stage, the feature vector of each mode pair in the test set is used as input, and calculation is carried out through the node formula. The final output is a correlation value representing the degree of correlation of the pattern pair. The nonlinear neural network method is more effective than the linear information retrieval method, and can better reflect the correlation level.
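The detection-stage computation is a forward pass through the layers. The sketch below follows the layer sizes stated in the description (hidden layer-1 with twice the input width, hidden layer-2 with half of it); the sigmoid activation, the single output node, the random untrained weights and the names `init_network` and `forward` are assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_inputs, seed=0):
    """Build layers sized n -> 2n -> n/2 -> 1 with small random weights."""
    rng = random.Random(seed)
    sizes = [n_inputs, 2 * n_inputs, max(1, n_inputs // 2), 1]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)] for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers

def forward(layers, features):
    """Propagate one feature vector through all layers; return the output node value."""
    activations = features
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + bias)
            for row, bias in zip(weights, biases)
        ]
    return activations[0]

# Four inputs: API, resource name, call structure and function structure similarity.
layers = init_network(4)
score = forward(layers, [0.8, 0.5, 1.0, 0.6])
```

With untrained weights the score carries no meaning; after training, the same call would produce the correlation value used for ranking.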

In the deep neural network, the weight of each link between the input layer and the intermediate layers is obtained by training on historical version data. Through extensive training, some of the links and weights between neurons are adjusted, which optimizes the output.

And for the obtained correlation values, sorting the correlation values from large to small, and selecting the first k mode pairs as output results.

Step 6: Based on the detected sensitive code with high correlation, development and maintenance personnel are reminded of its location and of the historical resource operations related to it, shown the exception handling scheme previously used for the resource, and sent a warning. The detected Python source code is then used as historical version data for the next detection, which improves detection accuracy. Newly submitted Python source code is detected automatically, and an alarm is sent to development and maintenance personnel according to the result.

For example: in the historical version, the operations on the resource object somewhere are as follows:

In the historical version, the self variable is a resource object, and a read operation is performed on it. To prevent exceptions, the developer wrapped the statement in try-except exception handling.

And the source code of the version to be tested has the following statements:

    def read_bytes(self, num_bytes, callback=None, streaming_callback=None,
                   partial=False):
        self._try_inline_read()

Here the resource object is read again using the same API, but no exception handling is performed. The two code fragments are combined into a code pair, and the method identifies whether they are related. This determines whether the code to be tested is resource sensitive defect code, so that developers and maintainers are reminded to handle it and are given the related historical version code information.
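The difference between the two fragments can be made concrete with a small rule-based check. This is not the learned detection of the invention, only an illustration of what separates the repaired code from the code to be tested; the function `unguarded_calls` is hypothetical, and the `historical` fragment is a stand-in written for this sketch because the original historical code is not shown in this text.

```python
import ast

def unguarded_calls(source, api_name):
    """Return line numbers of calls to `api_name` that are not inside a try body."""
    tree = ast.parse(source)
    guarded = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Try):
            for stmt in node.body:
                for inner in ast.walk(stmt):
                    guarded.add(id(inner))
    lines = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == api_name
                and id(node) not in guarded):
            lines.append(node.lineno)
    return sorted(lines)

# Hypothetical repaired historical fragment (assumed shape, not from the source).
historical = (
    "def read_until(self):\n"
    "    try:\n"
    "        self._try_inline_read()\n"
    "    except Exception:\n"
    "        raise\n"
)
under_test = (
    "def read_bytes(self, num_bytes, callback=None, streaming_callback=None,\n"
    "               partial=False):\n"
    "    self._try_inline_read()\n"
)
safe_lines = unguarded_calls(historical, "_try_inline_read")
risky_lines = unguarded_calls(under_test, "_try_inline_read")
```

The historical fragment reports no unguarded call, while the fragment to be tested reports the call on its third line, which is the kind of location a warning would point to.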

In conclusion, the Python resource sensitive defect code detection method based on the deep neural network solves the problems that an automatic method aiming at Python language resource sensitive code detection and dangerous operation identification is lacked at present, improves software application quality and ensures controllability in a software evolution process.

Claims

1. A Python resource sensitive defect code detection method based on a deep neural network is characterized in that a historical version and a version to be detected of the same Python software are collected from a software version control system; for the historical version, identifying a resource sensitive code mode through type inference, extracting corresponding mode characteristics, forming a relevant mode pair and a non-relevant mode pair by the defect code mode and the safety code mode according to historical repair information, and calculating characteristic similarity to generate a characteristic vector to obtain a training set; for the version to be tested, different modes and corresponding features are extracted by using the same method, a historical version defect code mode and the version to be tested form a mode pair, and feature similarity is calculated to generate feature vectors to obtain a test set; secondly, training a deep neural network model by using a training set, and performing feature combination on the test set by using the trained deep neural network model to obtain the correlation degree between the code to be tested and the defect code; finally, sorting is carried out according to the relevance, the first k relevant code pairs are selected as results, the codes to be detected in the code pairs are marked as resource sensitive codes with potential defects, dangerous resource object operation is detected, and auxiliary information is provided; the method comprises the following steps:

1) acquiring a source code of a historical version and a source code of a version to be tested of the same software; all versions of software are stored in the software version control system and submitted, and the version numbers are standardized; the historical version and the source code of the version to be tested of the same Python software can be obtained according to the established version number;

2) extracting resource sensitive code modes of all versions by utilizing type inference; performing lexical and syntactic analysis on the source codes of the historical version and the version to be detected which are collected in the step 1, generating a corresponding abstract syntax tree by using an ast module in a Python standard library, abstracting Python types, setting a type and a value for each node, and extracting a resource sensitive code mode by using a global type inference method;

the resource sensitive code mode refers to a code segment for operating a resource object;

definition 1: the Python standard library is issued along with the Python language and comprises built-in modules providing various system level functions;

definition 2: type inference is a method of inferring variable types in dynamic languages by performing static analysis on source code;

definition 3: the type is used for identifying node type information in the abstract syntax tree, and the concrete value of the type is from the abstract syntax defined by Python;

definition 4: value is a textual representation of the contents of a node in the abstract syntax tree;

3) extracting the relevant features of the resource sensitive code patterns; step 2) has extracted the resource sensitive code patterns from the source code; the features extracted by the method for each pattern are: API (parameter types, parameter order), resource name, call structure and function structure; finally, the extracted feature names are normalized;

definition 1: for API features, calculating feature similarity by using parameter types and parameter sequences;

definition 2: for the resource name features, calculating feature similarity by using word sequences in the resource names;

definition 3: for the call structure feature, the call structure similarity is used as the feature similarity;

definition 4: for the function structure feature, the function structure similarity is used as the feature similarity;

4) calculating each feature similarity between defect code patterns and safe code patterns, and between defect code patterns and code patterns to be tested, generating feature vectors, and obtaining a training set and a test set; for the historical version, similar defect code patterns are paired with one another according to the historical repair information to form related pattern pairs; each defect code pattern is paired with the safe code patterns similar to it to form non-related pattern pairs; for the version to be tested, each defect code pattern is paired with each code pattern to be tested to form pattern pairs to be tested; then, from the features extracted in step 3), each feature similarity of every pattern pair is calculated and a feature vector is generated; finally, the feature vectors of the historical-version pattern pairs form the training set, and the feature vectors of the pattern pairs of the version to be tested form the test set;

definition 1: a defect code pattern refers to a resource sensitive code pattern that the historical repair information shows was later repaired;

definition 2: a safe code pattern refers to a resource sensitive code pattern that is similar to a defect code pattern but in which no defect has been found;

definition 3: the feature similarity of the API adopts the VSM algorithm, wherein for the parameter types the TF-IDF algorithm is adopted to calculate the weight, and the formula is as follows:

$$w = TF \times \log\frac{\mathrm{Total\_api}}{\mathrm{Contain\_type}}$$

wherein TF is the frequency with which the type occurs in the API, Total_api is the total number of APIs, and Contain_type is the number of APIs that contain the type; the method uses this value as the weight in the feature vector formed from the API; the type order is measured with 2-grams, which is robust to changes in the type order, and the measures of the type order and of the parameter types are combined into one feature vector; the similarity of the feature vectors generated for the two versions is calculated with the VSM algorithm; in the method, the cosine distance between the feature vector a of the historical version and the feature vector b of the version to be tested represents the similarity, and the formula is as follows:

$$\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

wherein a · b denotes the inner product of the two feature vectors, and ‖a‖ and ‖b‖ denote their norms;
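
By way of non-limiting illustration, the API feature similarity of definition 3 could be sketched as follows. The weight is taken to be the standard TF-IDF form TF × log(Total_api / Contain_type), which is an assumption consistent with the symbols named above; the way the 2-gram counts and the TF-IDF weights are merged into a single vector is likewise an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(api_types, all_apis):
    """Weight each parameter type of one API as TF * log(Total_api / Contain_type)."""
    tf = Counter(api_types)
    total_api = len(all_apis)
    weights = {}
    for type_name, freq in tf.items():
        contain_type = sum(1 for api in all_apis if type_name in api)
        # Guard against a type that appears in no listed API.
        weights[type_name] = freq * math.log(total_api / max(contain_type, 1))
    return weights


def bigram_counts(api_types):
    """Count adjacent type pairs (2-grams) to capture parameter order."""
    return Counter(zip(api_types, api_types[1:]))


def feature_vector(api_types, all_apis):
    """Merge type weights and 2-gram counts into one sparse vector (assumed layout)."""
    vec = dict(tfidf_weights(api_types, all_apis))
    # 2-gram keys are tuples, so they cannot collide with the string type names.
    vec.update(bigram_counts(api_types))
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Each API is represented here as a list of parameter type names in call order, for example `["str", "int"]`; two APIs that share types but order them differently still overlap on the TF-IDF components while differing on the 2-gram components.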

definition 4: the feature similarity of the resource names adopts a text similarity algorithm; first, each resource name is split into a sequence of words; next, for a resource name R₁ in the historical version and a resource name R₂ in the version to be tested, the calculation formula is as follows:

wherein lcs(R₁, R₂) represents the number of words of R₁ that also appear in R₂; a quantized value for the resource names is thereby obtained, and the related vector is generated;
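
By way of non-limiting illustration, the resource name similarity of definition 4 could be sketched as follows. Only the quantity lcs(R₁, R₂) is taken from the text; the word-splitting rule and the normalization by the longer word sequence are assumptions made for this sketch, and the function names are hypothetical.

```python
import re


def split_words(name):
    """Split a snake_case or camelCase resource name into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def lcs_count(r1, r2):
    """Number of words of r1 that also occur in r2."""
    words_r2 = set(split_words(r2))
    return sum(1 for word in split_words(r1) if word in words_r2)


def name_similarity(r1, r2):
    """Shared-word count normalised by the longer name (assumed normalisation)."""
    longest = max(len(split_words(r1)), len(split_words(r2)))
    if longest == 0:
        return 0.0
    return lcs_count(r1, r2) / longest
```

Under these assumptions, `log_file` and `logFileHandle` share two of at most three words.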

definition 5: the VSM algorithm is the vector space model, an algorithm for calculating similarity;
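
By way of non-limiting illustration, the pairing of step 4) into a labelled training set and an unlabelled test set could be sketched as follows. The callable `features` is a hypothetical stand-in for the per-feature similarity calculations of definitions 3 and 4, and the sketch pairs all patterns exhaustively rather than only the similar ones selected from the historical repair information.

```python
from itertools import combinations, product


def build_training_set(defect_patterns, safe_patterns, features):
    """Return (feature_vector, label) rows.

    Label 1 marks a related pair (defect, defect);
    label 0 marks a non-related pair (defect, safe).
    """
    rows = []
    for first, second in combinations(defect_patterns, 2):
        rows.append((features(first, second), 1))
    for defect, safe in product(defect_patterns, safe_patterns):
        rows.append((features(defect, safe), 0))
    return rows


def build_test_set(defect_patterns, candidate_patterns, features):
    """Return ((defect, candidate), feature_vector) rows for the version to be tested."""
    return [
        ((defect, candidate), features(defect, candidate))
        for defect, candidate in product(defect_patterns, candidate_patterns)
    ]
```

Keeping the pattern pair next to its feature vector in the test set lets the later ranking step report which code to be tested each score belongs to.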

5) training the deep neural network model on the training set to perform feature merging, and then using the model to calculate and rank the relevance of the patterns in the test set; the deep neural network model is trained with the training set generated in step 4), the trained model then performs feature merging on the test set generated in step 4) and calculates the relevance; finally, the relevance values between the defect code patterns and the code patterns to be tested are sorted in descending order, and the top k code pairs are selected as the output result;
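
By way of non-limiting illustration, the ranking of step 5) could be sketched as follows. The function `relevance` is a single sigmoid unit with hand-supplied weights that merely stands in for the trained deep neural network; the network architecture and training procedure are not specified here, and all names are hypothetical.

```python
import math


def relevance(feature_vector, weights, bias=0.0):
    """Stand-in scorer: merge the feature similarities into one value in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, feature_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def top_k(test_rows, weights, k):
    """Score every ((defect, candidate), feature_vector) row and keep the k highest."""
    scored = [(relevance(vec, weights), pair) for pair, vec in test_rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

The candidate in each returned pair is the code that would be marked as resource sensitive code with a potential defect.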

6) in the program development and maintenance stage, flagging resource object operations that may be erroneous according to the relevance ranking, to assist development and maintenance; for each resource sensitive code pattern to be tested that has a high relevance, the development and maintenance personnel are reminded of its location and of the related historical resource operations, shown the earlier exception-handling scheme for that resource, and alerted; Python source code that has been checked is used as historical version data for the next detection, thereby improving detection accuracy; newly committed Python source code is detected automatically, and an alert is sent to the development and maintenance personnel according to the result.