CN109726120B - Software defect confirmation method based on machine learning - Google Patents

Software defect confirmation method based on machine learning

Info

Publication number
CN109726120B
Authority
CN
China
Prior art keywords
class
classifier
defect
code
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811477275.9A
Other languages
Chinese (zh)
Other versions
CN109726120A (en)
Inventor
柯文俊
刘悦悦
江山
李雅斯
王坤龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications
Priority to CN201811477275.9A
Publication of CN109726120A
Application granted
Publication of CN109726120B
Legal status: Active
Anticipated expiration

Landscapes

  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a software defect confirmation method based on machine learning, comprising the following steps. Step one: constructing a feature vector. Step two: constructing a defect code knowledge base based on cluster analysis, comprising: taking the defect code feature vector set as the input data set and clustering it; performing cluster integration on the data set by first generating a plurality of clustering results and then integrating them; and forming defect code knowledge base samples. Step three: confirming defect code based on supervised learning, comprising: taking the obtained defect code knowledge base samples as input, constructing a multi-class classifier, and using test samples to judge whether the classifier meets the evaluation indexes; if it does not, introducing a cost function to iteratively optimize the classifier until the evaluation indexes are met. The invention separates false-alarm defects from non-false-alarm defects, thereby confirming software defects accurately and improving testing efficiency.

Description

Software defect confirmation method based on machine learning
Technical Field
The invention relates to a software technology, in particular to a software defect confirmation method based on machine learning.
Background
As software grows more complex and the amount of code increases, detecting and confirming software defects becomes more and more important. Traditional software static analysis searches for possible errors in the code, or evaluates the code, without executing the program. It scans the text of the program code and analyzes the program's data flow, control flow, and so on, so that the system design meets the requirements of modularization, structuring, and object orientation, and it improves code reliability by monitoring coding standards and code quality.
Existing software static analysis is usually approximation-based, so the information it provides is not always accurate: the program is not actually executed but is analyzed by statically scanning the code. Judging the detection results manually will not be able to meet the demands of rapid software development in the future.
Disclosure of Invention
The present invention aims to provide a software defect confirmation method based on machine learning that solves the above problems of the prior art.
The invention relates to a software defect confirmation method based on machine learning, comprising the following steps. Step one: constructing a feature vector, comprising: first extracting the defect code segments in a defect code set one by one and reducing each to a minimized defect code segment by code filtering based on slice analysis; then converting the code segments into abstract syntax trees by a syntax analysis tree method; selecting a suitable C language keyword set according to the different code rules to form a feature matrix of the multi-line code; and finally obtaining, by a feature matrix merging method, a defect code feature vector set for subsequent machine learning. Step two: constructing a defect code knowledge base based on cluster analysis, comprising: taking the defect code feature vector set as the input data set and clustering it; performing cluster integration on the data set by first generating a plurality of clustering results and then integrating them; and forming defect code knowledge base samples. Step three: confirming defect code based on supervised learning, comprising: taking the obtained defect code knowledge base samples as input, constructing a multi-class classifier, and using test samples to judge whether the classifier meets the evaluation indexes; if it does not, introducing a cost function to iteratively optimize the classifier until the evaluation indexes are met.
The invention provides a software defect confirmation method based on machine learning that takes the detection results of a software static analysis tool as input. First, the minimized defect code segment corresponding to each defect code line is extracted by slice analysis, and a minimized defect code feature vector is constructed from the syntax tree. Then, from these feature vectors, a defect code knowledge base is built by cluster analysis using feature selection and cluster integration. Finally, a software defect code confirmation model is built from the defect code knowledge base with a supervised learning method, and the model is trained and continuously optimized until the specified accuracy is reached. This separates false-alarm defects from non-false-alarm defects, so that software defects are confirmed accurately and testing efficiency is improved.
Drawings
FIG. 1 shows a flow of a code feature vector construction method;
FIG. 2 is an exemplary diagram of a simple program static slice;
FIG. 3 is a diagram illustrating the construction of an abstract syntax tree for an example code;
FIG. 4 is a schematic diagram of a clustering integration process;
FIG. 5 is a schematic diagram of the one-versus-rest method for a classification problem;
FIG. 6 illustrates a four-class problem DAG structure.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
The invention relates to a software defect confirmation method based on machine learning, which comprises the following steps:
Step one: constructing a feature vector;
FIG. 1 shows the flow of the code feature vector construction method. As shown in FIG. 1, code feature vector construction mainly uses code filtering based on slice analysis and feature vector construction based on the syntax analysis tree. The defect code segments in the defect code set are first extracted one by one and reduced to minimized defect code segments by code filtering based on slice analysis. The code segments are then converted into abstract syntax trees by a syntax analysis tree method, and a suitable C language keyword set is selected according to the different code rules to form a feature matrix of the multi-line code. Finally, a feature matrix merging method yields a quantized, accurately described defect code feature vector set for subsequent machine learning.
1. The slice analysis based code filtering includes: performing slice analysis on the defect code with a static slicing method. A group of variables of interest is first extracted from a defect code segment (one or more statements). Irrelevant code is then filtered out according to a static backward slicing criterion, and the statements in the source code that affect the values of the variables of interest are extracted to form a new code segment, which is the minimized defect code segment related to those variable values. This supports the subsequent construction of feature vectors for the software code.
FIG. 2 shows a simple example of static program slicing. As shown in FIG. 2, a static slicing algorithm based on a System Dependency Graph (SDG) is adopted, following the static backward slicing criterion, to implement the slice analysis based code filtering. The SDG captures control dependencies, data dependencies, and procedure call relationships in a single structure. Point of interest p is the statement "system.
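The backward slicing criterion described above can be illustrated with a minimal Python sketch. It assumes the dependency graph is already available as a dictionary that maps each statement number to the statements it depends on (data or control dependence). The ten-statement example program, the `deps` table, and the name `backward_slice` are hypothetical and do not come from the patent, and the procedure-call edges of a full SDG are left out.

```python
from collections import deque

def backward_slice(deps, criterion):
    """Return the statements that can affect `criterion`.

    deps: dict mapping a statement number to the set of statements it
          depends on (data or control dependence), an assumed input format.
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for dep in deps.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)

# Hypothetical program (statement numbers on the left):
#  1: n = read()
#  2: i = 1
#  3: s = 0
#  4: p = 1
#  5: while i <= n:
#  6:     s = s + i
#  7:     p = p * i
#  8:     i = i + 1
#  9: print(s)
# 10: print(p)
deps = {
    5: {1, 2, 8},
    6: {2, 3, 5, 6, 8},
    7: {2, 4, 5, 7, 8},
    8: {2, 5, 8},
    9: {3, 6},
    10: {4, 7},
}

# Slice with respect to the variable s at statement 9.
print(backward_slice(deps, 9))
```

In this sketch the statements that only compute `p` (4, 7, and 10) are filtered out of the slice for statement 9, which mirrors the idea of keeping only the code that influences the variables of interest.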
As shown in fig. 1, 2, the feature vector construction based on the parse tree includes:
Feature vector construction based on the syntax analysis tree analyzes the syntactic and semantic characteristics of the core code fragment, extracts specific key node types from the syntax analysis tree to describe the code comprehensively, and constructs the corresponding feature vector. This quantifies the code features and serves as the basis for building the defect code knowledge base and the software defect confirmation model.
As shown in fig. 1, establishing a rule set-based code keyword library includes:
A common C language code rule set covering the code structures, function names, parameter variables, constants, operators, and so on of the software system is formulated by analyzing C language programming specifications such as GJB 5369 (C language safety subset for aerospace model software) and GJB 8114-2013 (C/C++ language programming safety subset). A keyword library for the corresponding code is then designed for each rule in the rule library. Table 1 shows an example C language keyword library.
TABLE 1
No.   Key node name   Information represented by the key node
1     for             for loop structure
2     stmtexp         Statement
3     decl            Declaration operation
4     incr            Increment operation
5     cond            Comparison operation
6     vari            Ordinary variable
7     para            Environment variable
8     assign          Assignment operation
9     block           Code enclosed in braces
10    mul             Multiplication operation
11    add             Addition operation
12    cons            Constant
13    type            Variable type
14    fun_call        Function call
15    fname           Function name
Building the parse tree includes:
For the set of minimized defect code fragments extracted in the previous stage, code is mapped to a syntax analysis tree according to the keywords of the different rules. The syntax analysis tree expresses the syntax and semantic structure of the source code in tree form: a subtree represents a contiguous section of source code, and each code section is parsed into a syntax analysis tree composed of nodes of various types.
Establishing the code feature matrix and the feature vector comprises the following steps:
The number of occurrences of each relevant node in the syntax analysis tree is counted to construct the corresponding feature matrix. For a target code segment, separate feature vectors must be generated for the contexts of the point of interest in order to construct the feature matrix.
Specifically, establishing the code feature matrix and the feature vector comprises:
The feature vector generation comprises:
For the syntax analysis tree of the above code fragment, 6 non-key nodes (for, vari, para, cons, type, and block) are defined. Considering the overall syntax structure, for nodes and block nodes are defined as non-key nodes so that the differences between loop structures are hidden. Because parameters and variables may be added or deleted in the code, the vari, para, and cons nodes are defined as non-key nodes to hide the code differences these changes cause. For clarity, the feature vector depicted in fig. 3 omits the 6 non-key nodes. The above example can be described with a 10-dimensional vector (stmtexp, decl, incr, cond, assign, mul, add, fun_call, fname, fpara).
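The counting step can be sketched in Python as follows. The dimension order is the 10-dimensional order listed above; the tuple encoding `(node_type, children)`, the example tree, and the name `feature_vector` are assumptions made for illustration, since the patent's own tree in FIG. 3 is not reproduced in the text.

```python
from collections import Counter

# Key node types, in the order of the 10-dimensional vector given in the text.
KEY_NODES = ("stmtexp", "decl", "incr", "cond", "assign",
             "mul", "add", "fun_call", "fname", "fpara")

def feature_vector(tree):
    """Count key-node occurrences in a tree of (node_type, children) tuples."""
    counts = Counter()
    stack = [tree]
    while stack:
        node_type, children = stack.pop()
        counts[node_type] += 1
        stack.extend(children)
    # Non-key nodes (for, vari, para, cons, type, block) are counted but
    # never read back, so they do not appear in the vector.
    return [counts[name] for name in KEY_NODES]

# Hypothetical tree for: for (int i; i < n; i++) { s = s + i * 2; }
tree = ("for", [
    ("decl", [("type", []), ("vari", [])]),
    ("cond", [("vari", []), ("para", [])]),
    ("incr", [("vari", [])]),
    ("block", [
        ("stmtexp", [
            ("assign", [
                ("vari", []),
                ("add", [("vari", []),
                         ("mul", [("vari", []), ("cons", [])])]),
            ]),
        ]),
    ]),
])

print(feature_vector(tree))
```

Two loops that differ only in variable names or constants produce the same vector under this encoding, which is the effect the non-key node definition is meant to achieve.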
The feature matrix generation comprises:
Feature matrix generation extends feature vector generation: each row vector of a feature matrix corresponds to the key node vector of a context of the point of interest. This example describes the generation of the feature matrix for the root node, whose feature matrix is the cumulative sum of the feature matrices of all its child nodes and the initialization feature matrix of the for node. The model requires one post-order traversal of the entire tree to generate the feature matrices of certain nodes.
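A minimal sketch of the bottom-up accumulation is given below. As a simplification it keeps a single count vector per node rather than a full matrix, and the shortened key-node list, the tuple encoding `(node_type, children)`, and the name `subtree_vectors` are assumptions, not the patent's data structures.

```python
# Shortened, hypothetical key-node list to keep the example small.
KEY_NODES = ("decl", "cond", "incr", "assign", "add")

def subtree_vectors(tree, rows=None):
    """Post-order traversal that accumulates count vectors bottom-up.

    Returns (vector_of_tree, rows), where rows lists (node_type, vector)
    for every node, children before parents.
    """
    if rows is None:
        rows = []
    node_type, children = tree
    # Initialization vector of this node: 1 in its own dimension, if it is a key node.
    vec = [1 if node_type == name else 0 for name in KEY_NODES]
    for child in children:
        child_vec, _ = subtree_vectors(child, rows)
        vec = [a + b for a, b in zip(vec, child_vec)]  # cumulative sum over children
    rows.append((node_type, vec))
    return vec, rows

tree = ("for", [
    ("decl", []),
    ("cond", []),
    ("incr", []),
    ("block", [("assign", [("add", [])])]),
])

root_vec, rows = subtree_vectors(tree)
for node_type, vec in rows:
    print(node_type, vec)
```

Because every child is finished before its parent is emitted, one pass over the tree is enough to obtain the vector of the root and of every intermediate node.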
The feature matrix combination comprises the following steps:
During merging, a parameter must be set to control the number of merged nodes. The parameter depends on the total number of lexical units of the nodes and on the number of merged nodes, and it is chosen so as to reduce the false alarm rate as much as possible. The result is a quantized, accurately described code feature vector.
Step two, building a defect code knowledge base based on cluster analysis, which comprises the following steps:
The defect code feature vector set from step one is taken as the input data set. The cluster integration process is as follows. Suppose the data set $X$ has $n$ data objects, $X = \{x_1, x_2, \ldots, x_n\}$. First, clustering is run $N$ times on $X$ to obtain $N$ clusterings $P = \{P_1, P_2, \ldots, P_N\}$, where $P_i$ $(i = 1, 2, \ldots, N)$ is the clustering result obtained by the $i$-th clustering algorithm. Then the consensus function $T$ integrates the clustering results in $P$ to obtain a new data partition $P'$.
As this process shows, cluster integration of a data set first generates a plurality of clustering results and then integrates them.
(1) Multiple clustering result collection
Because clustering quality varies with the clustering algorithm, the data set, and the feature vectors, each base clusterer is selected according to an analysis of the data set.
(2) Multiple clustering result integration
The clustering results are integrated with a voting-based method: for a data point, if most of the clusterings in the set assign it to the i-th class, it is finally assigned to the i-th class.
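The voting step can be sketched in Python as below. One detail the text leaves open is that cluster labels from different runs are arbitrary, so the sketch first relabels every clustering against the first one by maximum overlap; that alignment rule, the function names `align` and `vote`, and the example partitions are assumptions made for illustration.

```python
from collections import Counter

def align(reference, labels):
    """Relabel `labels` so each cluster takes the reference label it overlaps most."""
    mapping = {}
    for lab in set(labels):
        overlap = Counter(r for r, l in zip(reference, labels) if l == lab)
        mapping[lab] = overlap.most_common(1)[0][0]
    return [mapping[l] for l in labels]

def vote(partitions):
    """Consensus function T: majority vote per data point over aligned partitions."""
    reference = partitions[0]
    aligned = [list(reference)] + [align(reference, p) for p in partitions[1:]]
    return [Counter(column).most_common(1)[0][0] for column in zip(*aligned)]

# Three hypothetical clusterings P1, P2, P3 of six data points.
P = [
    [0, 0, 0, 1, 1, 1],   # P1
    [1, 1, 1, 0, 0, 0],   # P2: same grouping as P1 with swapped label names
    [0, 0, 1, 1, 1, 1],   # P3: disagrees on the third point
]

print(vote(P))
```

The third point is placed in the first cluster by two of the three clusterings once the labels are aligned, so the vote keeps it there.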
Step three, defect code confirmation based on supervised learning comprises the following steps:
The defect code knowledge base samples obtained in step two are taken as input to construct a multi-class classifier, and test samples are used to judge whether the classifier meets the evaluation indexes. If it does not, a cost function is introduced to iteratively optimize the classifier until the evaluation indexes are met, improving classification accuracy as much as possible while ensuring code recall.
1. The defect classifier construction based on the defect code knowledge base comprises the following steps:
In step one and step two, the defect codes are clustered according to the extracted feature vectors to obtain defect code knowledge base samples with class labels. To construct the defect code classifier, the sample knowledge base is divided into training samples and test samples. The former are used as input to the learning algorithm to learn the classifier; the latter are used as input for classifier testing to evaluate whether the classifier satisfies the corresponding indexes.
In the invention, the software static analysis results need to be divided into three classes. The multi-class classifier is designed by decomposing the multi-class classification problem into several two-class classification problems, and the defect classifier is constructed mainly with the following three schemes.
FIG. 5 is a schematic diagram of the one-versus-rest method for a classification problem. As shown in FIG. 5, (1) the one-versus-rest method (OVR) includes:
The one-versus-rest method constructs k two-class classifiers (for k classes), where the i-th classifier separates the i-th class from the remaining classes. During training, the i-th classifier takes the samples of the i-th class in the training set as the positive class ("+1") and the samples of all other classes as the negative class ("-1"). During discrimination, a test sample is passed through each of the k classifiers to obtain k output values. If exactly one output is +1, the corresponding class is the class of the test sample. If the classification overlaps (more than one +1) or the sample cannot be classified (no output is +1), the distance between the test sample and the training samples of each class is computed, and the class with the minimum distance is taken as the class of the test sample.
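The decision rule, including the distance fallback, can be sketched in Python. The patent does not name the underlying two-class learner or the distance measure, so this sketch assumes a nearest-centroid binary classifier and Euclidean distance to the class centroid; the class labels "A", "B", "C", the sample points, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def train_ovr(X, y):
    """For each class, store the centroid of that class and of all remaining classes."""
    model = {}
    for cls in sorted(set(y)):
        pos = [x for x, label in zip(X, y) if label == cls]
        neg = [x for x, label in zip(X, y) if label != cls]
        model[cls] = (centroid(pos), centroid(neg))
    return model

def predict_ovr(model, x):
    """Run all k binary classifiers, then apply the rule described in the text."""
    outputs = {cls: (+1 if math.dist(x, pos) < math.dist(x, neg) else -1)
               for cls, (pos, neg) in model.items()}
    positives = [cls for cls, out in outputs.items() if out == +1]
    if len(positives) == 1:
        return positives[0]
    # Overlap (several +1) or rejection (no +1): fall back to the nearest class.
    return min(model, key=lambda cls: math.dist(x, model[cls][0]))

X = [(0, 0), (1, 0), (0, 1),
     (10, 0), (11, 0), (10, 1),
     (0, 10), (1, 10), (0, 11)]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

model = train_ovr(X, y)
print(predict_ovr(model, (9, 1)))   # exactly one classifier outputs +1
print(predict_ovr(model, (5, 5)))   # no classifier outputs +1, distance fallback
```

The point (5, 5) is rejected by all three binary classifiers in this example, so the fallback branch decides its class.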
(2) One-versus-one method (one against one)
This method trains one classifier between every pair of classes, so a k-class problem requires k(k-1)/2 classifiers. When an unknown sample is classified, each classifier decides its class and votes for that class, and the class with the most votes is taken as the class of the unknown sample.
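A short Python sketch of pairwise training and voting follows. The pairwise learner is assumed to be a nearest-centroid rule purely for illustration; the labels, sample points, and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def train_ovo(X, y):
    """Build one classifier per unordered pair of classes: k(k-1)/2 in total."""
    cents = {cls: centroid([x for x, label in zip(X, y) if label == cls])
             for cls in set(y)}
    return [(a, b, cents[a], cents[b]) for a, b in combinations(sorted(cents), 2)]

def predict_ovo(classifiers, x):
    """Each pairwise classifier casts one vote; the class with most votes wins."""
    votes = Counter(a if math.dist(x, ca) <= math.dist(x, cb) else b
                    for a, b, ca, cb in classifiers)
    return votes.most_common(1)[0][0]

X = [(0, 0), (1, 0), (0, 1),
     (10, 0), (11, 0), (10, 1),
     (0, 10), (1, 10), (0, 11)]
y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

classifiers = train_ovo(X, y)
print(len(classifiers))                  # number of pairwise classifiers
print(predict_ovo(classifiers, (9, 1)))
```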
(3) DAG method (directed acyclic graph)
FIG. 6 shows the DAG structure for a four-class problem. As shown in FIG. 6, the DAG method is derived from the decision directed acyclic graph. Its training process is similar to that of the one-versus-one method, but for a k-class problem only (k-1) classifiers are called when a sample is actually classified.
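The reason only (k-1) classifiers are evaluated can be seen in the following Python sketch: each pairwise decision eliminates one candidate class. The elimination order (first candidate against last), the nearest-centroid pairwise rule, and the class names and centroids are assumptions for illustration.

```python
import math

def predict_dag(classes, decide, x):
    """Walk the decision DAG: every pairwise test removes one candidate class.

    decide(a, b, x) must return a or b, the winner of the pairwise classifier.
    Returns the predicted class and the number of classifiers evaluated.
    """
    candidates = list(classes)
    calls = 0
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        winner = decide(a, b, x)
        calls += 1
        if winner == a:
            candidates.pop()      # b is eliminated
        else:
            candidates.pop(0)     # a is eliminated
    return candidates[0], calls

# Hypothetical class centroids for a four-class problem.
CENTROIDS = {"c1": (0, 0), "c2": (10, 0), "c3": (0, 10), "c4": (10, 10)}

def nearest_of_pair(a, b, x):
    """Stand-in pairwise classifier: pick the class with the closer centroid."""
    return a if math.dist(x, CENTROIDS[a]) <= math.dist(x, CENTROIDS[b]) else b

label, calls = predict_dag(["c1", "c2", "c3", "c4"], nearest_of_pair, (9, 9))
print(label, calls)
```

Training still needs all k(k-1)/2 pairwise classifiers; the saving is only at classification time.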
Because of the limitations of learning techniques and the noisy data in the sample knowledge base, an ideal classifier is often difficult to obtain. It is therefore necessary to select appropriate evaluation indexes to measure the performance of the classifier.
TABLE 2 Comparison of classifier results with actual labels
Figure GDA0001971343810000071
Table 2 compares the classification results of the multi-class classifier with the actual labels of the test samples. The evaluation indexes of the multi-class classifier defined from this table are as follows.
(1) Accuracy
Accuracy, also called the overall success rate, is the ratio of the number of correctly classified samples to the total number of samples in the test sample set, i.e.:
Figure GDA0001971343810000081
(2) Precision
Precision is the number of "need to be modified" samples that the classifier classifies correctly, as a percentage of the number of samples it classifies as "need to be modified", i.e.:
Figure GDA0001971343810000082
(3) Recall
Recall reflects how completely the classifier retrieves the "need to be modified" class, that is, the share of the actual "need to be modified" samples that the classifier classifies as such, i.e.:
Figure GDA0001971343810000083
The three indexes are calculated from the test results on the test sample set. If they do not meet the standard, the learning model is trained again to obtain a multi-class classifier that satisfies the performance indexes, providing technical support for software defect confirmation.
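The three indexes can be computed as in the Python sketch below, which follows the verbal definitions above because the formula images are not reproduced in the text. The label names `"modify"`, `"false_alarm"`, and `"other"` and the sample data are hypothetical; only the "need to be modified" class is named in the source.

```python
def accuracy(actual, predicted):
    """Correctly classified samples divided by all test samples."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def precision(actual, predicted, target):
    """Correct `target` predictions divided by all `target` predictions."""
    hits = sum(a == target and p == target for a, p in zip(actual, predicted))
    flagged = sum(p == target for p in predicted)
    return hits / flagged if flagged else 0.0

def recall(actual, predicted, target):
    """Correct `target` predictions divided by all samples that truly are `target`."""
    hits = sum(a == target and p == target for a, p in zip(actual, predicted))
    relevant = sum(a == target for a in actual)
    return hits / relevant if relevant else 0.0

M, F, O = "modify", "false_alarm", "other"   # hypothetical class names
actual    = [M, M, M, M, F, F, O, O]
predicted = [M, M, M, F, M, F, O, O]

print(accuracy(actual, predicted))
print(precision(actual, predicted, M))
print(recall(actual, predicted, M))
```

In a retraining loop, these values would be compared against the chosen thresholds to decide whether the classifier already satisfies the performance indexes.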
2. Iterative optimization of defect classifier based on cost function
When the classifier designed in part 1 of step three confirms software defects, the costs incurred by misclassifying different classes are asymmetric. Misclassification cost is therefore taken as the research focus: based on the classification performance of the designed multi-class classifier and the common classifier evaluation indexes, a new multi-class classifier evaluation index is defined to measure classifier performance, and the cost function and its parameters are adjusted according to the evaluation results to iteratively optimize the classifier until a multi-class classifier meeting the indexes is obtained. The construction of a cost-function-based classifier is briefly described below, taking a two-class classifier as an example.
Constructing a cost function
The binary cost function may be constructed as follows:
Figure GDA0001971343810000091
where $x_i$ is the proportion of class $c_i$, $x_j$ is the proportion of class $c_j$, and $F(c_i, c_j)$ is the cost of misclassifying class $c_i$ as class $c_j$.
For this cost function, a cost matrix as shown in Table 3 is constructed, where $c_0$ is the positive class and $c_1$ is the negative class. $F(c_0, c_0)$ and $F(c_1, c_1)$ have the value 0, and the values of $F(c_0, c_1)$ and $F(c_1, c_0)$ are given by the above formula. $F(c_0, c_1)$ is the cost of misclassifying a positive-class sample as negative, and $F(c_1, c_0)$ is the cost of misclassifying a negative-class sample as positive.
TABLE 3 cost matrix
Figure GDA0001971343810000092
(1) Constructing a risk function
For the aforementioned dichotomy problem, the risk function can be expressed as:
$R(c_0 \mid X) = P(c_0 \mid X)\,F(c_0, c_0) + P(c_1 \mid X)\,F(c_1, c_0) = P(c_1 \mid X)\,F(c_1, c_0)$
$R(c_1 \mid X) = P(c_0 \mid X)\,F(c_0, c_1) + P(c_1 \mid X)\,F(c_1, c_1) = P(c_0 \mid X)\,F(c_0, c_1)$
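A small numeric illustration of the risk function is given below in Python. The posterior probabilities and the cost values are invented for the example, since the patent's cost formula is only available as an image; `cost[i][j]` stands for F(c_i, c_j), and the rule of choosing the class with the smallest risk is the usual minimum-risk decision, stated here as an assumption.

```python
def conditional_risk(posteriors, cost):
    """R(c_j | X) = sum over i of P(c_i | X) * F(c_i, c_j), for every class j."""
    k = len(posteriors)
    return [sum(posteriors[i] * cost[i][j] for i in range(k)) for j in range(k)]

def min_risk_class(posteriors, cost):
    """Index of the class whose conditional risk is smallest."""
    risks = conditional_risk(posteriors, cost)
    return min(range(len(risks)), key=risks.__getitem__)

# Hypothetical values: c0 is the positive class, c1 the negative class.
posteriors = [0.3, 0.7]          # P(c0 | X), P(c1 | X)
cost = [
    [0.0, 5.0],                  # F(c0, c0), F(c0, c1): missing a positive is expensive
    [1.0, 0.0],                  # F(c1, c0), F(c1, c1)
]

print(conditional_risk(posteriors, cost))
print(min_risk_class(posteriors, cost))
```

With these numbers the sample is assigned to the positive class even though its posterior probability is lower, because misclassifying a positive sample as negative is assumed to be five times as costly.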
(2) Adjusting cost function parameters
The cost function parameters can only be determined through repeated experiments. Initial parameters are first set according to the class distribution in the training samples and relevant experience, and the cost function parameters are then determined through multiple tests.
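One way to picture the repeated tests is the Python sketch below, which tries increasing values for the cost of missing a positive sample until a target recall is reached on held-out data. The candidate grid, the fixed cost of 1.0 for the opposite error, the use of recall as the stopping index, and all data values are assumptions made for illustration; the patent does not specify the search procedure.

```python
def predict(p_pos, cost_miss, cost_false_alarm=1.0):
    """Minimum-risk decision for one sample; p_pos is P(positive | X)."""
    risk_if_positive = (1.0 - p_pos) * cost_false_alarm
    risk_if_negative = p_pos * cost_miss
    return 1 if risk_if_positive <= risk_if_negative else 0

def recall(actual, predicted):
    """Share of true positives among all actual positives."""
    hits = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    return hits / sum(actual)

def tune_cost(probs, actual, target_recall, candidates):
    """Return the first candidate cost whose recall meets the target, else None."""
    for cost_miss in candidates:
        predicted = [predict(p, cost_miss) for p in probs]
        achieved = recall(actual, predicted)
        if achieved >= target_recall:
            return cost_miss, achieved
    return None

# Hypothetical validation data: posterior of the positive class and true labels.
probs  = [0.9, 0.6, 0.4, 0.25, 0.3, 0.1]
actual = [1,   1,   1,   1,    0,   0]

print(tune_cost(probs, actual, target_recall=1.0, candidates=[1, 2, 4, 8]))
```

Raising the cost also admits more false alarms, so in practice the loop would check precision or accuracy alongside recall before accepting a parameter value.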
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A software defect confirmation method based on machine learning is characterized by comprising the following steps:
step one: constructing a feature vector, comprising:
firstly, extracting defect code segments in a defect code set one by one, filtering the defect code segments into minimized defect code segments by adopting code filtering based on slice analysis, then converting the code segments into an abstract syntax tree by using a syntax analysis tree method, selecting a proper C language keyword set to form a characteristic matrix of a plurality of lines of codes according to different code rules, and finally obtaining a defect code characteristic vector set for subsequent machine learning according to a characteristic matrix merging method;
step two: the defect code knowledge base construction based on cluster analysis comprises the following steps:
inputting the defect code feature vector set as a data set and clustering it; performing cluster integration on the data set by first generating a plurality of clustering results and then integrating them, which comprises collecting the plurality of clustering results and integrating the plurality of clustering results; and forming defect code knowledge base samples;
step three: supervised learning based defect code validation, comprising:
taking the obtained defect code knowledge base sample as input, constructing a multi-class classifier, and judging whether the classifier meets evaluation indexes or not by using a test sample; if the evaluation index is not met, a cost function is introduced to carry out iterative optimization on the classifier until the evaluation index is met.
2. The machine learning-based software defect confirmation method of claim 1, wherein the slice analysis based code filtering comprises: performing slice analysis on the defect code with a static slicing method, first extracting a corresponding group of variables of interest from a defect code segment, then filtering out irrelevant code according to a static backward slicing criterion, and extracting the statements in the source code that affect the values of the group of variables of interest to form a new code segment, thereby obtaining a minimized defect code segment related to the variable values.
3. The machine learning-based software defect confirmation method of claim 2, wherein the slice analysis based code filtering is implemented with a static slicing algorithm based on a system dependency graph according to the static backward slicing criterion.
4. The machine learning-based software defect confirmation method of claim 2, wherein the feature vector construction based on the syntax analysis tree comprises:
building a syntax analysis tree, comprising:
for the extracted set of minimized defect code fragments, mapping code to a syntax analysis tree according to the keywords of the different rules, wherein the syntax analysis tree expresses the syntax and semantic structure logic information of the source code in tree form, a subtree represents a contiguous section of source code, and each code section is parsed into a syntax analysis tree composed of nodes of multiple types; and establishing a code feature matrix and a feature vector;
wherein the corresponding feature matrix is constructed by counting the number of occurrences of each relevant node in the syntax analysis tree, and for a target code segment, separate feature vectors of the contexts of the point of interest are generated to construct the feature matrix; the feature matrix generation comprises: performing one post-order traversal of the whole tree to generate the feature matrices of certain nodes; and feature matrix merging is performed.
5. The machine learning-based software defect confirmation method of claim 1, wherein the cluster integration process comprises: supposing a data set $X$ has $n$ data objects, $X = \{x_1, x_2, \ldots, x_n\}$, first running clustering $N$ times on the data set $X$ to obtain $N$ clusterings $P = \{P_1, P_2, \ldots, P_N\}$, where $P_i$ $(i = 1, 2, \ldots, N)$ is the clustering result obtained by the $i$-th clustering algorithm, and integrating the clustering results in $P$ through a consensus function $T$ to obtain a new data partition $P'$.
6. The machine learning-based software defect confirmation method of claim 1, wherein the defect classifier construction based on the defect code knowledge base comprises: in step one and step two, performing cluster analysis on the defect codes according to the extracted feature vectors to obtain defect code knowledge base samples with class labels; for the construction of the defect code classifier, dividing the sample knowledge base into training samples and test samples; using the training samples as input to a learning algorithm to learn the classifier; and using the test samples as input for classifier testing to evaluate whether the classifier satisfies the corresponding indexes.
7. The machine learning-based software defect confirmation method of claim 1, wherein the method of constructing a defect classifier comprises:
a one-versus-rest method comprising:
constructing k two-class classifiers, wherein the i-th classifier separates the i-th class from the remaining classes; during training, the i-th classifier takes the i-th class in the training set as the positive class and the samples of the remaining classes as the negative class; during discrimination, a test sample is passed through each of the k classifiers to obtain k output values, and if exactly one +1 occurs, the corresponding class is the class of the test sample; if the classification overlaps or the sample cannot be classified, the distance between the test sample and the training samples of each class is determined, and the class with the smallest distance is the class of the test sample;
a one-versus-one method comprising:
training one classifier between every two classes, so that a k-class problem has k(k-1)/2 classifiers; when an unknown sample is classified, each classifier decides its class and votes for the corresponding class, and the class with the most votes is finally taken as the class of the unknown sample.
8. The machine learning-based software defect confirmation method of claim 1, wherein selecting suitable evaluation indexes to measure the performance of the classifier comprises:
defining the evaluation indexes of the multi-class classifier, comprising:
accuracy, which represents the ratio of the number of all correctly classified samples to the number of all samples in the test sample set;
precision, which represents the number of "need to be modified" samples correctly classified by the classifier as a percentage of the number of all samples classified as "need to be modified";
recall, which reflects the recall of the classifier with respect to the "need to be modified" class;
and calculating the evaluation indexes of the multi-class classifier from the test results on the test sample set, and if they do not meet the standard, training the learning model again to obtain a multi-class classifier satisfying the performance indexes.
9. The machine learning-based software defect confirmation method of claim 1, wherein, when the classifier in step three confirms software defects, the costs incurred by different classes are asymmetric; according to the classification performance of the multi-class classifier and in combination with the classifier evaluation indexes, a new multi-class classifier evaluation index is defined to measure the performance of the classifier; and the cost function and its parameters are adjusted according to the evaluation results to iteratively optimize the classifier, so as to obtain a multi-class classifier satisfying the indexes.
10. The machine learning-based software defect confirmation method of claim 9, further comprising:
constructing a cost function, comprising:
the binary cost function is constructed as follows:
Figure FDA0001892470360000041
wherein xiIs of class ciRatio of (a) xjIs of class cjRatio of (A), (B), (C)i,cj) Is of class ciIs misjudged as category cjConstructing a cost matrix aiming at the cost function;
constructing a risk function comprising:
for the binary problem, the risk function is expressed as:
R(c0|X)=P(c0|X)F(c0,c0)+P(c1|X)F(c1,c0)=P(c1|X)F(c1,c0);
R(c1|X)=P(c0|X)F(c0,c1)+P(c1|X)F(c1,c1)=P(c0|X)F(c0,c1);
wherein c0 is the positive class, c1 is the negative class, F(c0,c0) and F(c1,c1) have a value of 0, F(c0,c1) represents the cost of misclassifying the positive class as the negative class, and F(c1,c0) represents the cost of misclassifying the negative class as the positive class;
and adjusting the cost function parameters, wherein initial parameters are set according to the class distributions in the training samples and relevant experience, and the cost function parameters are determined through multiple tests.
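The two-class risk functions of claim 10 can be sketched in Python as follows. This is a sketch under stated assumptions, not the claimed method. The posterior P(c0|X) is assumed to come from some already trained classifier. The helper names, the fixed F(c1,c0) = 1, and the candidate list are hypothetical. The candidate search merely stands in for the claim's "multiple tests". The cost function formula shown in the patent figure is not reproduced here, so the costs are passed in directly.

```python
from typing import Optional, Sequence, Tuple


def conditional_risks(p_c0: float, cost_c0_as_c1: float,
                      cost_c1_as_c0: float) -> Tuple[float, float]:
    """Return (R(c0|X), R(c1|X)) with zero cost for correct decisions."""
    p_c1 = 1.0 - p_c0
    risk_c0 = p_c1 * cost_c1_as_c0  # R(c0|X) = P(c1|X) F(c1,c0)
    risk_c1 = p_c0 * cost_c0_as_c1  # R(c1|X) = P(c0|X) F(c0,c1)
    return risk_c0, risk_c1


def decide(p_c0: float, cost_c0_as_c1: float, cost_c1_as_c0: float) -> int:
    """Pick the class index (0 or 1) with the smaller conditional risk."""
    risk_c0, risk_c1 = conditional_risks(p_c0, cost_c0_as_c1, cost_c1_as_c0)
    return 0 if risk_c0 <= risk_c1 else 1


def tune_cost(posteriors: Sequence[float], labels: Sequence[int],
              min_recall: float, candidates: Sequence[float]) -> Optional[float]:
    """Return the first candidate F(c0,c1) whose recall on c0 is high enough.

    F(c1,c0) is held at 1.0 (an assumption); None means no candidate passed.
    """
    actual = sum(1 for y in labels if y == 0)
    for cost in candidates:
        decisions = [decide(p, cost, 1.0) for p in posteriors]
        hits = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 0)
        if actual and hits / actual >= min_recall:
            return cost
    return None


# Hypothetical posteriors P(c0|X) and true labels (0 = c0, 1 = c1).
posteriors = [0.9, 0.4, 0.2, 0.6, 0.1]
labels = [0, 0, 0, 1, 1]
chosen = tune_cost(posteriors, labels, min_recall=1.0, candidates=[1, 2, 3, 5, 8])
```

Raising F(c0,c1) relative to F(c1,c0) lowers the posterior needed to choose c0. This is how asymmetric costs shift the decision toward the class whose misclassification is more expensive.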

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811477275.9A CN109726120B (en) 2018-12-05 2018-12-05 Software defect confirmation method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811477275.9A CN109726120B (en) 2018-12-05 2018-12-05 Software defect confirmation method based on machine learning

Publications (2)

Publication Number Publication Date
CN109726120A CN109726120A (en) 2019-05-07
CN109726120B true CN109726120B (en) 2022-03-08

Family

ID=66294788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811477275.9A Active CN109726120B (en) 2018-12-05 2018-12-05 Software defect confirmation method based on machine learning

Country Status (1)

Country Link
CN (1) CN109726120B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274123A (en) * 2019-05-14 2020-06-12 上海戎磐网络科技有限公司 Automatic generation method and framework of safety protection software test set based on software genes
CN110297656B (en) * 2019-05-23 2024-01-26 天航长鹰(江苏)科技有限公司 Method and device for evaluating codes based on configuration model and computer equipment
CN110489348B (en) * 2019-08-23 2023-08-25 山东浪潮科学研究院有限公司 Software functional defect mining method based on migration learning
CN112035345A (en) * 2020-08-20 2020-12-04 国家电网有限公司信息通信分公司 Mixed depth defect prediction method based on code segment analysis
CN112131570B (en) * 2020-09-03 2022-06-24 苏州浪潮智能科技有限公司 PCA-based password hard code detection method, device and medium
CN112131122B (en) * 2020-09-27 2022-09-30 北京智联安行科技有限公司 Method and device for source code defect detection tool misinformation evaluation
CN112579469A (en) * 2020-12-29 2021-03-30 中国信息安全测评中心 Source code defect detection method and device
CN113326198B (en) * 2021-06-15 2024-06-14 深圳前海微众银行股份有限公司 Code defect state determining method and device, electronic equipment and medium
CN113656315B (en) * 2021-08-19 2023-01-24 北京百度网讯科技有限公司 Data testing method and device, electronic equipment and storage medium
CN114743440A (en) * 2022-04-29 2022-07-12 长沙酷得网络科技有限公司 Intelligent programming training environment construction method and device based on application disassembly
CN116702160B (en) * 2023-08-07 2023-11-10 四川大学 Source code vulnerability detection method based on data dependency enhancement program slice

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103369044A (en) * 2013-07-11 2013-10-23 无锡交大联云科技有限公司 Mobile terminal user network perception diagnosis and treatment method based on cloud knowledge base
CN104965787A (en) * 2015-07-06 2015-10-07 南京航空航天大学 Three-decision-based two-stage software defect prediction method
WO2016085272A1 (en) * 2014-11-28 2016-06-02 주식회사 파수닷컴 Method for reducing false alarms in detecting source code error, computer program therefor, recording medium thereof
CN107967208A (en) * 2016-10-20 2018-04-27 南京大学 A kind of Python resource sensitive defect code detection methods based on deep neural network
CN108932192A (en) * 2017-05-22 2018-12-04 南京大学 A kind of Python Program Type defect inspection method based on abstract syntax tree

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"半监督软件缺陷挖掘研究综述";黎铭等;《数据采集与处理》;20160525;第31卷(第1期);第56-64页 *

Also Published As

Publication number Publication date
CN109726120A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN109726120B (en) Software defect confirmation method based on machine learning
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
KR101813683B1 (en) Method for automatic correction of errors in annotated corpus using kernel Ripple-Down Rules
CN109492106B (en) Automatic classification method for defect reasons by combining text codes
CN106202380B (en) Method and system for constructing classified corpus and server with system
CN113127339B (en) Method for acquiring Github open source platform data and source code defect repair system
CN108664512B (en) Text object classification method and device
CN113672931B (en) Software vulnerability automatic detection method and device based on pre-training
CN113779272A (en) Data processing method, device and equipment based on knowledge graph and storage medium
Ciurumelea et al. Suggesting comment completions for python using neural language models
CN113434418A (en) Knowledge-driven software defect detection and analysis method and system
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN113221569A (en) Method for extracting text information of damage test
CN112417893A (en) Software function demand classification method and system based on semantic hierarchical clustering
CN115757695A (en) Log language model training method and system
CN112685374B (en) Log classification method and device and electronic equipment
CN113742396B (en) Mining method and device for object learning behavior mode
Bortnikova et al. Search Query Classification Using Machine Learning for Information Retrieval Systems in Intelligent Manufacturing.
CN115066674A (en) Method for evaluating source code using numeric array representation of source code elements
US6889219B2 (en) Method of tuning a decision network and a decision tree model
Selamat Improved N-grams approach for web page language identification
CN115098389B (en) REST interface test case generation method based on dependency model
CN114202038B (en) Crowdsourcing defect classification method based on DBM deep learning
CN110502669A (en) The unsupervised chart dendrography learning method of lightweight and device based on the side N DFS subgraph
CN112463974A (en) Method and device for establishing knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant