CN117555554A - Metamorphic relation recommendation method and system based on program code and annotation text learning - Google Patents


Info

Publication number
CN117555554A
CN117555554A CN202410034043.5A CN202410034043A CN117555554A CN 117555554 A CN117555554 A CN 117555554A CN 202410034043 A CN202410034043 A CN 202410034043A CN 117555554 A CN117555554 A CN 117555554A
Authority
CN
China
Prior art keywords
program
text
new
similarity
annotation
Prior art date
Legal status
Pending
Application number
CN202410034043.5A
Other languages
Chinese (zh)
Inventor
毛澄映
易小荣
温林林
陈积富
Current Assignee
Jiangxi University of Finance and Economics
Original Assignee
Jiangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/40: Transformation of program code
    • G06F8/41: Compilation
    • G06F8/43: Checking; Contextual analysis
    • G06F8/436: Semantic checking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3668: Software testing
    • G06F11/3672: Test management
    • G06F11/3688: Test management for test execution, e.g. scheduling of test suites
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a metamorphic relation recommendation method and system based on program code and annotation text learning. Annotation text information and program code text information are extracted from each program in a given program set and combined into program feature vectors. A classification prediction model is constructed for each metamorphic relation (MR) of the given program set using its labels, and the similarity information among the programs is combined with the MR labels to train these models, yielding one MR prediction model per MR. For a new program, its feature vector is extracted, its similarity scores against the programs in the given set are calculated, and the resulting similarity score vector is input into a specific MR prediction model to judge whether the corresponding metamorphic relation is applicable to the new program. The invention fully mines the semantic information of the program source code, combines it with the annotation text information of the program, and measures the similarity between programs using both parts, so as to improve the metamorphic relation prediction performance of the model.

Description

Metamorphic relation recommendation method and system based on program code and annotation text learning
Technical Field
The invention relates to the field of software testing, in particular to a metamorphic relation recommendation method and system based on program code and annotation text learning.
Background
Among software testing techniques, metamorphic testing effectively alleviates the test oracle problem (the difficulty of determining expected outputs) and is therefore widely used to test software systems that lack test oracles, such as artificial intelligence systems and search engines. However, metamorphic relations, which are the key to metamorphic testing, usually have to be derived by software testers from their understanding of the program's requirement specification and are difficult to identify fully automatically. How to construct diverse metamorphic relations for a program simply and efficiently has therefore become a hot research topic in metamorphic testing.
Among the many existing methods for identifying and generating metamorphic relations, prediction methods based on machine learning have recently attracted considerable attention from researchers. The metamorphic relation prediction method MRpredT processes program annotations with text mining techniques and measures the similarity between programs using the processed annotation text, thereby reusing metamorphic relations; experimental results show that it performs well when predicting metamorphic relations for matrix operation programs.
Code annotations explain the logic and function of a code segment, but in some cases they cannot explain the functional logic of the code completely and correctly, for example when the programmer misunderstands the code logic or writes incomplete annotations. In such scenarios, the similarity calculation in MRpredT, which uses only the annotation text, is not sufficiently rigorous, so the prediction performance of MRpredT leaves room for improvement.
Disclosure of Invention
In view of the above, the present invention aims to provide a metamorphic relation recommendation method and system based on program code and annotation text learning, so as to solve the above technical problems.
The invention provides a metamorphic relation recommendation method based on program code and annotation text learning, which comprises the following steps:
step 1, giving a program set and a corresponding metamorphic relation set, marking the metamorphic relation in the metamorphic relation set as MR, and extracting annotation text of each program in the program set to obtain program annotation text information;
extracting code key information of each program in the program set to construct program code text information, integrating the program annotation text information with the program code text information, and constructing a program feature vector;
step 2, respectively constructing a corresponding classification prediction model aiming at the label of each MR in the metamorphic relation set, calculating the similarity score between two program feature vectors in the program set to obtain a similarity score matrix, and combining the label of the MR and the similarity score matrix to train the classification prediction model of the corresponding MR to obtain an MR prediction model aiming at each specific MR;
step 3, for a new program, predicting whether a specific MR is suitable for the program, firstly extracting the feature vector of the new program by using the step 1, calculating the similarity between the feature vector of the new program and each program feature vector in the given program set by using a similarity calculation method to obtain a similarity score vector, inputting the similarity score vector into a specific MR prediction model, and judging whether the metamorphic relation corresponding to the MR prediction model is suitable for the new program according to the prediction label.
The invention also provides a metamorphic relation recommendation system based on program code and annotation text learning, which applies the metamorphic relation recommendation method described above and comprises:
the feature extraction module is used for:
giving a program set and a corresponding metamorphic relation set, marking the metamorphic relation in the metamorphic relation set as MR, and extracting annotation text of each program in the program set to obtain program annotation text information;
extracting code key information of each program in the program set to construct program code text information, integrating the program annotation text information with the program code text information, and constructing a program feature vector;
the model building module is used for:
respectively constructing a corresponding classification prediction model aiming at the label of each MR in the metamorphic relation set, calculating the similarity score between two program feature vectors in the program set to obtain a similarity score matrix, and combining the label of the MR and the similarity score matrix to train the classification prediction model of the corresponding MR to obtain the MR prediction model aiming at each specific MR;
the program prediction module is used for:
for a new program, predicting whether a specific MR is suitable for the program: first extracting the program feature vector of the new program with the feature extraction module, then calculating the similarity between the feature vector of the new program and each program feature vector in the given program set using the similarity calculation method to obtain a similarity score vector, inputting the similarity score vector into the specific MR prediction model, and judging according to the prediction label whether the metamorphic relation corresponding to the MR prediction model is suitable for the new program.
Compared with the prior art, the invention has the following beneficial effects:
compared with the metamorphic relation prediction method MRpredT based on program annotation, the method fully mines semantic information of the program source code, combines the semantic information with annotation text information of the program, measures similarity between the programs by utilizing the two parts of information of the program code and the annotation text, and improves metamorphic relation prediction performance of the model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flowchart of a metamorphic relation recommendation method based on program code and annotation text learning according to the present invention;
FIG. 2 is a general architecture diagram of a metamorphic relation recommendation method based on program code and annotation text learning according to the present invention;
FIG. 3 is a block diagram of a metamorphic relation recommendation system based on program code and annotation text learning;
FIG. 4 is a diagram showing the association between programs and metamorphic relations.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
These and other aspects of embodiments of the invention will be apparent from and elucidated with reference to the description and drawings described hereinafter. In the description and drawings, particular implementations of embodiments of the invention are disclosed in detail as being indicative of some of the ways in which the principles of embodiments of the invention may be employed, but it is understood that the scope of the embodiments of the invention is not limited correspondingly.
Referring to FIG. 1 and FIG. 2, an embodiment of the present invention provides a metamorphic relation recommendation method based on program code and annotation text learning, the method including the following steps:
step 1, giving a program set and a corresponding metamorphic relation set, marking the metamorphic relation in the metamorphic relation set as MR, and extracting annotation text of each program in the program set to obtain program annotation text information;
extracting code key information of each program in the program set to construct program code text information, integrating the program annotation text information with the program code text information, and constructing a program feature vector;
in the step 1, for each program in the program set, the method for extracting annotation text from the program to obtain program annotation text information includes the steps of:
for each program in a given program set, parsing the program code annotations with the JavaParser toolkit and extracting their annotation text;
for a program without annotation text, generating annotation text for the program code using the automatic annotation generation technique SeCNN;
for each program in a given program set, performing lemmatization, stemming and stop-word removal on the annotation text with the text mining toolkit NLTK to obtain the program annotation text set.
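The annotation preprocessing above can be sketched as follows. This is a minimal, standard-library-only illustration: the stop-word list and the suffix-stripping `normalize` helper are hypothetical stand-ins for the NLTK stop-word list, lemmatizer and stemmer named in the text, and the sample sentence is made up for illustration.

```python
import re

# Hypothetical stop-word list; NLTK's English list would be used in practice.
STOP_WORDS = {"a", "an", "the", "of", "to", "from", "in", "one", "and", "is"}


def normalize(word: str) -> str:
    """Crude stand-in for lemmatization/stemming: strip a trailing plural or verb 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess_comment(comment: str) -> list:
    """Lower-case, keep alphabetic tokens, drop stop words, normalize the rest."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [normalize(t) for t in tokens if t not in STOP_WORDS]


print(preprocess_comment("Removes one column from the matrix"))
```

A real lemmatizer handles irregular forms that this suffix rule does not, so the sketch only shows the shape of the step, not its quality.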
In the step 1, the method for extracting the key information of the program code to construct the text information of the program code comprises the following steps:
for each program in a given program set, extracting the function name of the program and the names of the functions it calls from the program code using the Understand static code analysis tool;
splitting the function name and the called function names into words using the PyEnchant spell-checking library to obtain the program code function name text;
and performing lemmatization, stemming and stop-word removal on the program code function name text with the text mining toolkit NLTK to obtain the program code text set.
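Splitting a function name into dictionary words can be sketched as below. The text uses PyEnchant's dictionary check for this; the sketch swaps in a small hard-coded vocabulary so that it runs with the standard library alone. `split_identifier` and `VOCABULARY` are hypothetical names, and preferring the segmentation with the fewest words is an assumption, not something the source specifies.

```python
# Hypothetical mini-dictionary standing in for a spell-checker word list.
VOCABULARY = {"blank", "of", "shape", "remove", "column", "insert", "row", "get", "set"}


def split_identifier(name, vocabulary):
    """Split an identifier into dictionary words, preferring fewer (longer) words.

    Returns the lower-cased identifier as a single item when no full split exists.
    """
    text = name.lower()
    # best[i] is the shortest word list that covers text[:i], or None if unreachable.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)] if best[len(text)] is not None else [text]


print(split_identifier("blankOfShape", VOCABULARY))
print(split_identifier("removeColumn", VOCABULARY))
```

Because the identifier is lower-cased first, the same routine works whether or not the analysis tool preserves the original capitalization.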
In the step 1, the method for integrating the program annotation text information and the program code text information to construct the program feature vector comprises the following steps:
for each program in a given program set, combining the preprocessed program annotation text set with the program code text set to obtain the program text information P_wrdvec;
for each program in a given program set, calculating the weight of each word in the program text information P_wrdvec using the TF-IDF technique to generate the program feature vector P_vec, where the weight of each word in the text set satisfies the following relation:
$$\mathrm{TF} = \frac{n}{N}, \qquad \mathrm{IDF} = \log\frac{|D|}{d}, \qquad \mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF}$$
where n represents the number of occurrences of a particular word in the program text, N represents the total number of words in the program text, D represents the set of all program texts, d represents the number of texts containing the word, TF represents the term frequency, IDF represents the inverse document frequency, and TF-IDF represents the weight of the word in the text set.
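The weighting can be sketched in a few lines, assuming the unsmoothed form TF = n/N and IDF = log(|D|/d) with a natural logarithm. The function name is hypothetical, and library implementations differ in detail: scikit-learn's TfidfVectorizer, for instance, smooths the IDF and L2-normalizes each vector by default, so its weights will not match this sketch.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return the sorted vocabulary and one TF-IDF weight vector per document.

    documents is a list of token lists, one list per program text.
    """
    vocabulary = sorted({word for doc in documents for word in doc})
    # d: number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)  # n for every word of this document
        total = len(doc)       # N: number of words in this document
        row = []
        for word in vocabulary:
            tf = counts[word] / total if total else 0.0
            idf = math.log(len(documents) / doc_freq[word])
            row.append(tf * idf)
        vectors.append(row)
    return vocabulary, vectors


docs = [["remove", "column", "matrix"], ["insert", "row", "matrix"]]
vocab, vecs = tf_idf_vectors(docs)
print(vocab)
print(vecs[0])
```

Under this unsmoothed form a word that occurs in every program text (here `matrix`) receives weight 0, which is one reason smoothed variants are common in practice.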
Step 2, respectively constructing a corresponding classification prediction model aiming at the label of each MR in the metamorphic relation set, calculating the similarity score between two program feature vectors in the program set to obtain a similarity score matrix, and combining the label of the MR and the similarity score matrix to train the classification prediction model of the corresponding MR to obtain an MR prediction model aiming at each specific MR;
in this embodiment, the similarity calculation method is cosine similarity, and the classification prediction model is a support vector machine (SVM).
The specific scheme of the step is as follows:
for every two programs in a given program set, calculating the similarity between their program feature vectors using cosine similarity to obtain a similarity score matrix;
and combining the label information of each specific MR with the program similarity score matrix, and training a binary SVM classifier for each specific MR. For example, to train the model for MR_1, the labels of MR_1 are first selected and combined with the similarity score matrix into training data, which is then used to train the classification prediction model for MR_1.
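A minimal sketch of this step, assuming dense feature vectors. The embodiment uses an SVM; to stay free of third-party dependencies, the sketch substitutes a plain linear classifier trained by subgradient descent on the hinge loss (the SVM loss without its regularization term), so it is a simplified stand-in rather than the classifier of the embodiment. All function names, the epoch count and the learning rate are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Row i holds the similarity of program i to every program in the set."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def train_hinge_classifier(rows, labels, epochs=50, lr=0.1):
    """Fit weights w and bias b so that sign(w.x + b) matches the -1/1 labels."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: take a hinge-loss subgradient step
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Return 1 or -1 for one similarity score vector."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def train_all(sim, labels_by_mr):
    """One model per MR: the same similarity rows paired with that MR's labels."""
    return {mr: train_hinge_classifier(sim, labels) for mr, labels in labels_by_mr.items()}


# Hypothetical similarity matrix for four programs forming two similar pairs.
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
models = train_all(sim, {"MR1": [1, 1, -1, -1], "MR2": [-1, -1, 1, 1]})
print([predict(models["MR1"], row) for row in sim])
```

Each row of the similarity matrix serves as the feature vector of one program, so every per-MR model shares the same inputs and differs only in its label column.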
Step 3, for a new program, predicting whether a specific MR is suitable for the program, firstly extracting the feature vector of the new program by using the step 1, calculating the similarity between the feature vector of the new program and each program feature vector in the given program set by using a similarity calculation method to obtain a similarity score vector, inputting the similarity score vector into a specific MR prediction model, and judging whether the metamorphic relation corresponding to the MR prediction model is suitable for the new program according to the prediction label.
The specific scheme of the step is as follows:
for a new program P_new, to predict whether a specific MR is suitable for the program, extracting the program feature vector P_new_vec of the new program P_new by step 1;
calculating the cosine similarity between the program feature vector P_new_vec of the new program P_new and each program feature vector in the given program set to obtain a similarity score vector, and inputting the score vector as input data into each trained MR prediction model;
each MR prediction model outputs a prediction label with only two possible values, -1 or 1, where -1 indicates that the MR is unsuitable for the new program P_new and 1 indicates that it is suitable.
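The prediction step can be sketched as below. The names `similarity_vector` and `recommend` are hypothetical, and the two "models" are toy threshold rules standing in for trained MR prediction models; only the data flow (feature vector, then similarity score vector, then one -1/1 label per MR) follows the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_vector(new_vec, known_vecs):
    """Similarity of the new program to every program in the given set."""
    return [cosine(new_vec, known) for known in known_vecs]


def recommend(new_vec, known_vecs, models):
    """models maps an MR name to a callable taking a score vector and returning -1 or 1."""
    scores = similarity_vector(new_vec, known_vecs)
    return {name: model(scores) for name, model in models.items()}


# Toy stand-ins for trained models: each one keys on a single known program.
toy_models = {
    "MR1": lambda s: 1 if s[0] > 0.5 else -1,
    "MR2": lambda s: 1 if s[1] > 0.5 else -1,
}
labels = recommend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], toy_models)
applicable = [name for name, label in labels.items() if label == 1]
print(labels, applicable)
```

The similarity score vector must list the known programs in the same order that was used when the models were trained.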
Referring to FIG. 3, an embodiment of the present invention provides a metamorphic relation recommendation system based on program code and annotation text learning, where the system applies the metamorphic relation recommendation method described above, and the system includes:
the feature extraction module is used for:
giving a program set and a corresponding metamorphic relation set, marking the metamorphic relation in the metamorphic relation set as MR, and extracting annotation text of each program in the program set to obtain program annotation text information;
extracting code key information of each program in the program set to construct program code text information, integrating the program annotation text information with the program code text information, and constructing a program feature vector;
the model building module is used for:
respectively constructing a corresponding classification prediction model aiming at the label of each MR in the metamorphic relation set, calculating the similarity score between two program feature vectors in the program set to obtain a similarity score matrix, and combining the label of the MR and the similarity score matrix to train the classification prediction model of the corresponding MR to obtain the MR prediction model aiming at each specific MR;
the program prediction module is used for:
for a new program, predicting whether a specific MR is suitable for the program: first extracting the program feature vector of the new program with the feature extraction module, then calculating the similarity between the feature vector of the new program and each program feature vector in the given program set using the similarity calculation method to obtain a similarity score vector, inputting the similarity score vector into the specific MR prediction model, and judging according to the prediction label whether the metamorphic relation corresponding to the MR prediction model is suitable for the new program.
The overall flow of the metamorphic relation recommendation method based on program code and annotation text learning provided by the invention is shown in FIG. 2. In the first step, the annotations of each program in the program set are parsed with the JavaParser toolkit to obtain annotation text information; if any program has no annotation text, the second step is executed, otherwise the third step is executed. In the second step, annotations are generated for the programs without annotation text using the automatic annotation generation technique SeCNN, and then the third step is executed. In the third step, the annotation text is preprocessed with the text mining toolkit NLTK, and then the fourth step is executed. In the fourth step, key information is extracted from the program code with the Understand static code analysis tool, split into words with the PyEnchant tool and processed with NLTK, and then the fifth step is executed. In the fifth step, the preprocessed annotation text information and code text information are integrated to obtain the program text information P_wrdvec, which is processed with the TF-IDF technique to generate the program feature vector P_vec, and then the sixth step is executed. In the sixth step, the similarity between program feature vectors is calculated with the similarity calculation method to obtain a similarity score matrix, and a classification prediction model is trained for each of the t MRs in turn to predict whether that particular MR is suitable for a new program. For example, to train the model for MR_1, the labels of MR_1 are first selected and combined with the similarity score matrix into training data, which is used to train the classification prediction model for MR_1; then the seventh step is executed. In the seventh step, for a new program P_new, the program feature vector P_new_vec is extracted by the method in step 1, and the similarity between P_new_vec and each program feature vector in the given program set is calculated with the similarity calculation method to obtain a similarity score vector. The score vector is input into each trained MR prediction model, and each model outputs a prediction label with only two possible values, -1 or 1, where -1 indicates that the MR is unsuitable for the new program P_new and 1 indicates that it is suitable. The method ends.
FIG. 4 shows the association between known programs and their corresponding metamorphic relations. As shown in FIG. 4, assume that m programs have been collected and that each program has metamorphic relations suitable for its metamorphic testing. For example, two metamorphic relations, MR_1 and MR_3, are suitable for program P_1, so P_1 can use these two relations in metamorphic testing. In total, t different MRs have been constructed for the m programs, forming a metamorphic relation set with t MRs, as shown on the right of the figure. For a particular MR, if a program can use it for metamorphic testing, the MR is suitable for that program; the label 1 indicates suitable and -1 indicates unsuitable. For program P_1, MR_1 and MR_3 are suitable, so their labels are 1, and the labels of the remaining MRs are -1. In the present invention, the program set together with the labels of one MR (e.g., MR_1) is used to construct an SVM classification model for that MR, which predicts whether the MR is suitable for metamorphic testing of a new program; t MR prediction models are therefore constructed, one for each MR.
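The labeling scheme described above can be sketched as follows. The function name and the second program `P2` with its applicable relation are hypothetical; only the association of `P1` with `MR1` and `MR3` comes from the text.

```python
def build_label_vectors(applicable, mr_names):
    """Turn a program-to-MRs association into one -1/1 label list per MR.

    applicable maps each program name to the set of MR names suitable for it;
    every label list follows the iteration order of that mapping.
    """
    programs = list(applicable)
    return {
        mr: [1 if mr in applicable[program] else -1 for program in programs]
        for mr in mr_names
    }


association = {"P1": {"MR1", "MR3"}, "P2": {"MR2"}}  # P2 is made up for illustration
label_vectors = build_label_vectors(association, ["MR1", "MR2", "MR3"])
print(label_vectors)
```

Each resulting list is the label column used to train the prediction model of the corresponding MR.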
The implementation of the method is described below with a specific example. The invention processes both program code and annotation text; the programs removeColumn and insertRow, which can be found in the Apache Commons Math library, are chosen as examples. The removeColumn function deletes a user-specified column from a matrix and returns the resulting matrix. The insertRow function adds a user-specified vector at a user-specified row of a matrix and returns the resulting matrix.
The invention calculates the similarity between programs from two parts of information, the program code and the annotation text. Using removeColumn and insertRow as example programs, this embodiment describes in detail how program feature vectors are extracted and how the similarity between programs is calculated. Table 1 shows the annotation text extracted from the removeColumn function and the insertRow function with the JavaParser tool. Both functions contain original annotations, so subsequent processing can proceed once JavaParser has extracted them. Some programs, however, have no original annotations, for human or other reasons; for such programs, this embodiment automatically generates annotations with the SeCNN model so that they can be processed in the same way. Table 2 compares annotations automatically generated by the SeCNN model with the original annotations. Table 3 lists the original annotations and the annotation text after NLTK processing: in the original annotation of removeColumn, the words one and from are deleted and removes is reduced to its base form remove. Similarly, in the original annotation of insertRow, adds is reduced to its base form add, and the word one and the preposition to are deleted.
TABLE 1 original annotation of the removeColumn function and the insertRow function
This embodiment then uses the Understand tool to statically analyze the program code and obtain the function name of each program and the names of the functions it calls. The naming and meaning of a function name are usually closely tied to the function's main purpose, and the called functions represent the other APIs the function needs; if the APIs called by two functions are mostly the same, the functions and requirements of the two functions are considered to be substantially the same. The static code analysis tool Understand reports the function calls of the analyzed object in a SimpleInvolutionTrees.txt file; this embodiment parses that file programmatically to obtain the functions called by the analyzed program. The analysis results for the removeColumn function and the insertRow function are shown in Table 4. As with the annotation text, the function name text is further processed with NLTK to obtain the preprocessed program code text information. However, the NLTK toolkit cannot analyze function names and called function names directly: Java functions use camel-case naming, which joins two or more words into an identifier that is not a valid word, so NLTK cannot perform operations such as lemmatization on it. This embodiment therefore uses the PyEnchant spell-checking library to split camel-case Java function names; for example, the blankOfShape function is split into the valid words blank, of and shape. Table 5 shows the code text information after NLTK processing. After text mining has been applied to the code and the annotations of the program separately, the two parts are integrated to obtain the complete text information of the program.
Each word in the preprocessed text becomes a feature of the vector in subsequent processing, so this embodiment adopts a simple strategy for integrating text information: the preprocessed annotation text and code text are merged directly. The integrated text is the processed text information of the program. Table 6 shows the text information of the removeColumn and insertRow programs after integration.
TABLE 4 Call information for the removeColumn function and the insertRow function
TABLE 5 removeColumn function and insert Row function code text information after pretreatment
TABLE 6 removeColumn and insertRow function program text information after integration
In order to be able to calculate the similarity between programs using the program text information, the program text information has to be converted into a corresponding numerical vector. The present embodiment uses the Term Frequency inverse text Frequency index (Term Frequency-Inverse Document Frequency, TF-IDF) to vector-convert the program text information. The eigenvectors obtained after the removeColumn function and insert row are processed using the TF-IDF model are given in table 7, whose feature sets are: [ add, blank, column, get, insert, matrix, remove, row, set, shape ], each feature weight was rounded to preserve two-bit decimal, and specific data are shown in Table 7.
TABLE 7 eigenvector weights for the removeColumn function and the insertRow function
After extracting the corresponding feature vectors from the program notes and the code text, a similarity calculation method can be used for calculating similarity scores between the feature vectors to form a similarity score matrix between the two programs, and the score matrix is used for training an metamorphic relation classification prediction model. In this embodiment, cosine similarity is used to perform similarity calculation on the program feature vector. Table 8 gives a concrete example of the score obtained by similarity calculation of the similarity calculation method cosine similarity to the feature vectors of the removeColumn function and the insertRow function, and the score is rounded to preserve three decimal places.
TABLE 8 Cosine similarity score of the removeColumn function and the insertRow function
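For reference, cosine similarity between two feature vectors is their dot product divided by the product of their Euclidean norms. The sketch below uses hypothetical weights, not the values of Table 7, and the choice to return 0.0 for an all-zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical feature weights for two programs.
p1 = [0.3, 0.0, 0.2, 0.1]
p2 = [0.0, 0.3, 0.2, 0.1]

print(round(cosine_similarity(p1, p2), 3))  # 0.357
```

Because TF-IDF weights are non-negative, the score for two program feature vectors lies between 0 (no shared weighted words) and 1 (same direction).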
The preceding paragraphs described in detail how program feature vectors are extracted from program code and annotations and how the similarity between programs is calculated. Cosine similarity is then applied to every pair of feature vectors to construct the similarity score matrix between programs, which is used to train the metamorphic relation classification prediction model.
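A minimal sketch of how a similarity score matrix could feed a classifier is given below, using scikit-learn's `SVC` with default parameters and `sklearn.metrics.pairwise.cosine_similarity`. The feature vectors are random and the labels are arbitrary placeholders, so the prediction itself carries no meaning; only the data flow is illustrated.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical data: 20 programs with 10 non-negative feature weights each.
features = rng.random((20, 10))
# Placeholder labels for one MR: 1 = suitable, -1 = unsuitable.
labels = np.array([1, -1] * 10)

# Row i holds the similarity of program i to every program in the set.
score_matrix = cosine_similarity(features)

# One classifier per MR, trained on the rows of the score matrix.
model = SVC()
model.fit(score_matrix, labels)

# A new program is described by its similarities to the same program set.
new_features = rng.random((1, 10))
score_vector = cosine_similarity(new_features, features)

prediction = model.predict(score_vector)[0]
print(prediction)  # either -1 or 1
```

The point of the design is that the classifier never sees TF-IDF weights directly. Both training rows and the new program's input have one entry per program in the given set, so a new program can be scored without retraining.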
To verify the effectiveness of the invention, this embodiment reproduces MRpredT, a method that recommends metamorphic relations using machine learning and text mining, and runs comparative experiments between MRpredT and the invention. In the experimental stage, this embodiment first constructs a data set consisting of a program set and a metamorphic relation set. The program set contains 114 programs written in Java and is described in Table 9; the metamorphic relation set is described in Table 10.
For a specific metamorphic relation, a program under test has only two possible states, suitable or unsuitable (1 or -1). In this embodiment, each of the 10 metamorphic relations is manually labeled for every program in the collected program set, where "1" indicates that the metamorphic relation is suitable for the program under test and "-1" indicates that it is unsuitable. In total, the 10 MRs are labeled for all 114 programs in the 5 program libraries.
In the experimental setup phase, the present embodiment makes the following settings:
(1) Each classification prediction model is implemented with scikit-learn, the open-source machine learning library for the Python programming language, and all model parameters use the library defaults.
(2) Model performance is evaluated with the AUC and F1-score metrics.
The AUC metric is computed from the model's true positive rate and false positive rate. A higher AUC indicates better model performance, and an AUC above 0.9 indicates that the model predicts well. F1-score is the harmonic mean of Precision and Recall, that is, the F-beta score with beta equal to 1; because it combines the Precision and Recall of the model, F1-score is used as a model evaluation metric.
(3) k-fold cross-validation is used to validate the model. For the prediction model constructed for a specific metamorphic relation, this embodiment runs the model with 10 repetitions of 10-fold cross-validation, yielding 100 AUC values and 100 F1-scores.
(4) This embodiment uses the means of the 100 AUC values and F1-scores to analyze the performance gap between prediction models more intuitively. In addition, to measure statistically the size of the difference in prediction effect between prediction models, this embodiment applies the Wilcoxon rank-sum test to the AUC values and F1-scores of the prediction models and analyzes the significance of the difference. The confidence level of the Wilcoxon rank-sum test is set to 95%: a p-value above 0.05 means that the AUC values or F1-scores do not differ significantly in the statistical sense, and a p-value below 0.05 means that they do. The effect size measures the magnitude of a significant difference: an effect size below 0.2 is considered negligible, between 0.2 and 0.5 small, between 0.5 and 0.8 medium, and above 0.8 large.
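The experimental settings above can be sketched end to end with scikit-learn and SciPy. Several details are assumptions rather than facts from the embodiment: the data is synthetic, the folds are stratified (`RepeatedStratifiedKFold`), the second "model" is simply the same classifier on noisier features, and the effect size is computed as Cohen's d, since the text does not name the measure it uses.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.svm import SVC


def evaluate(X, y):
    """10 repetitions of 10-fold cross-validation: 100 AUC and 100 F1 values."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    scores = cross_validate(SVC(), X, y, cv=cv, scoring=["roc_auc", "f1"])
    return scores["test_roc_auc"], scores["test_f1"]


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed effect-size measure)."""
    pooled_var = ((len(a) - 1) * np.var(a, ddof=1)
                  + (len(b) - 1) * np.var(b, ddof=1)) / (len(a) + len(b) - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


# Synthetic stand-in data: 114 samples, 20 features, labels in {-1, 1}.
X, y01 = make_classification(n_samples=114, n_features=20, random_state=0)
y = np.where(y01 == 1, 1, -1)

# A deliberately degraded feature set plays the role of the second method.
rng = np.random.default_rng(0)
X_noisy = X + rng.normal(scale=3.0, size=X.shape)

auc_a, f1_a = evaluate(X, y)
auc_b, f1_b = evaluate(X_noisy, y)

statistic, p_value = ranksums(auc_a, auc_b)
print(f"mean AUC: {auc_a.mean():.2f} vs {auc_b.mean():.2f}")
print(f"mean F1:  {f1_a.mean():.2f} vs {f1_b.mean():.2f}")
print(f"rank-sum p-value: {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
print(f"effect size (Cohen's d): {cohens_d(auc_a, auc_b):.2f}")
```

Each call to `evaluate` returns exactly 100 values per metric (10 folds × 10 repetitions), which are the samples compared by the rank-sum test. Fixing `random_state` makes both methods see the same splits.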
TABLE 9 Program set information
TABLE 10 Metamorphic relation set information
The MRpredT method was reproduced in Python. The invention uses an SVM classifier and cosine similarity, and uses the SeCNN model to automatically generate annotations for program code that has none. Table 11 shows the AUC means of the invention and MRpredT, their differences, and the rank-sum test results. For the metamorphic relations MR1, MR3, MR5, MR6, MR8, MR9, and MR10, the AUC means of the invention are all 0.89 or higher, with a maximum of 0.96; for the remaining three MRs the AUC means range from 0.84 to 0.87. This shows that the metamorphic relation prediction model constructed by the invention predicts well. The AUC means of MRpredT are lower than those of the invention: only for MR9 and MR10 does its AUC mean exceed 0.9, and in all other cases it is below 0.88. The differences in AUC means show that the invention scores higher than MRpredT for every MR. Combined with the rank-sum test results, the AUC scores differ statistically significantly for every MR except MR8, where the difference is not significant, and according to the effect sizes the differences are large.
Table 12 shows the F1-score means of the invention and MRpredT, their differences, and the rank-sum test results. The F1-score means of the invention are above 0.9 for MR5, MR8, and MR9, below 0.55 for MR4, MR6, and MR7, and between 0.8 and 0.85 for the others. MRpredT achieves an F1-score mean above 0.9 for the two metamorphic relations MR5 and MR8, but its F1-score mean is 0 for MR4 and MR7. Further analysis of the differences in F1-score means shows that MRpredT scores higher than the invention for three metamorphic relations, MR6, MR8, and MR10, with differences of 0.05, 0.01, and 0.01 respectively. However, the p-values for the invention versus MRpredT on MR1, MR5, MR6, MR8, MR9, and MR10 are greater than 0.05, and for the remaining MRs the F1-scores of the invention are statistically significantly better than those of MRpredT.
TABLE 11 AUC means, differences, and rank-sum test results of the invention compared with MRpredT
TABLE 12 F1-score means, differences, and rank-sum test results of the invention compared with MRpredT
On the AUC metric, the AUC means of the invention are higher than those of MRpredT, and the rank-sum test results show that the invention is not significantly different from MRpredT only on MR8 and is significantly better than MRpredT on the remaining MRs. On the F1-score metric, although the F1-score mean of the invention is lower than that of MRpredT on some MRs, the F1-scores there do not differ significantly, and on the other MRs the metamorphic relation prediction performance of the invention is no lower than that of MRpredT. Taken together, the invention predicts metamorphic relations better than the MRpredT method.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (7)

1. A metamorphic relation recommendation method based on program code and annotation text learning, characterized by comprising the following steps:
step 1, giving a program set and a corresponding metamorphic relation set, marking the metamorphic relation in the metamorphic relation set as MR, and extracting annotation text of each program in the program set to obtain program annotation text information;
extracting code key information of each program in the program set to construct program code text information, integrating the program annotation text information with the program code text information, and constructing a program feature vector;
step 2, respectively constructing a corresponding classification prediction model aiming at the label of each MR in the metamorphic relation set, calculating the similarity score between two program feature vectors in the program set to obtain a similarity score matrix, and combining the label of the MR and the similarity score matrix to train the classification prediction model of the corresponding MR to obtain an MR prediction model aiming at each specific MR;
step 3, for a new program, predicting whether a specific MR is suitable for the program, firstly extracting the feature vector of the new program by using the step 1, calculating the similarity between the feature vector of the new program and each program feature vector in the given program set by using a similarity calculation method to obtain a similarity score vector, inputting the similarity score vector into a specific MR prediction model, and judging whether the metamorphic relation corresponding to the MR prediction model is suitable for the new program according to the prediction label.
2. The metamorphic relation recommendation method based on program code and annotation text learning according to claim 1, wherein in step 1, extracting the annotation text of each program in the program set to obtain the program annotation text information comprises the following steps:
for each program in the given program set, parsing the program code annotations using the JavaParser toolkit and extracting their annotation text;
for a program without annotation text, generating annotation text for the program code using the automatic annotation generation technique SeCNN;
for each program in the given program set, performing lemmatization, stemming, and stop-word removal on the annotation text using the NLTK text mining toolkit to obtain the program annotation text set.
3. The metamorphic relation recommendation method based on program code and annotation text learning according to claim 2, wherein in step 1, extracting the code key information of each program to construct the program code text information comprises the following steps:
for each program in the given program set, extracting from the program code, using the Understand static code analysis tool, the function name of the program code and the function names of the functions it calls;
splitting the function name of the program code and the function names of the called functions into words using the spell-checking library PyEnchant to obtain the program code function name text;
performing lemmatization, stemming, and stop-word removal on the program code function name text using the NLTK text mining toolkit to obtain the program code text set.
4. The metamorphic relation recommendation method based on program code and annotation text learning according to claim 3, wherein in step 1, integrating the program annotation text information with the program code text information to construct the program feature vector comprises the following steps:
for each program in the given program set, merging the preprocessed program annotation text set with the program code text set to obtain the program text information P_wrdvec;
for each program in the given program set, calculating the weight of each word of the program text information P_wrdvec in the text set using the TF-IDF technique to generate the program feature vector P_vec, where the weight of each word in the text set is calculated according to the following relations:
$TF = \frac{n}{N}$; $IDF = \log\frac{|D|}{d}$; $TF\text{-}IDF = TF \times IDF$
where n represents the number of occurrences of a particular word in the program text, N represents the total number of words in the program text, D represents the set of all program texts, d represents the number of texts containing the word, TF represents the term frequency, IDF represents the inverse document frequency, and TF-IDF represents the weight of the word in the text set.
5. The method for recommending metamorphic relation based on learning of program codes and annotated text according to claim 4, wherein in the step 2, the similarity calculation method uses cosine similarity, and the classification prediction model uses SVM classifier.
6. The method according to claim 2, wherein in step 3, predicting for a new program whether a specific MR is suitable for the new program, by first extracting the feature vector of the new program using step 1, calculating the similarity between the feature vector of the new program and each program feature vector in the given program set using a similarity calculation method to obtain a similarity score vector, inputting the similarity score vector into a specific MR prediction model, and determining according to the prediction label whether the metamorphic relation corresponding to the MR prediction model is suitable for the new program, specifically comprises the following steps:
for a new program P_new, to predict whether a specific MR is suitable for the program, extracting the program feature vector P_new_vec of the new program P_new using step 1;
calculating, using cosine similarity, the similarity between the program feature vector P_new_vec of the new program P_new and each program feature vector in the given program set to obtain a similarity score vector, and inputting the score vector as input data into each trained MR prediction model;
each MR prediction model outputs a prediction label with only two possible values, -1 or 1, where -1 indicates that the MR is not suitable for the new program P_new and 1 indicates that it is suitable.
7. A metamorphic relation recommendation system based on program code and annotation text learning, characterized in that the system applies the metamorphic relation recommendation method based on program code and annotation text learning according to any one of claims 1 to 6, the system comprising:
the feature extraction module is used for:
giving a program set and a corresponding metamorphic relation set, marking the metamorphic relation in the metamorphic relation set as MR, and extracting annotation text of each program in the program set to obtain program annotation text information;
extracting code key information of each program in the program set to construct program code text information, integrating the program annotation text information with the program code text information, and constructing a program feature vector;
the model building module is used for:
respectively constructing a corresponding classification prediction model aiming at the label of each MR in the metamorphic relation set, calculating the similarity score between two program feature vectors in the program set to obtain a similarity score matrix, and combining the label of the MR and the similarity score matrix to train the classification prediction model of the corresponding MR to obtain the MR prediction model aiming at each specific MR;
program prediction module for:
for a new program, whether a specific MR is applicable to the program is predicted, firstly, a feature extraction module is utilized to extract program feature vectors of the new program, a similarity calculation method is utilized to calculate similarity between the feature vectors of the new program and each program feature vector in a given program set, a similarity score vector is obtained, the similarity score vector is input into a specific MR prediction model, and whether the metamorphic relation corresponding to the MR prediction model is applicable to the new program is judged according to a prediction label.
CN202410034043.5A 2024-01-10 2024-01-10 Metamorphic relation recommendation method and system based on program code and annotation text learning Pending CN117555554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410034043.5A CN117555554A (en) 2024-01-10 2024-01-10 Metamorphic relation recommendation method and system based on program code and annotation text learning

Publications (1)

Publication Number Publication Date
CN117555554A true CN117555554A (en) 2024-02-13

Family

ID=89823492

Country Status (1)

Country Link
CN (1) CN117555554A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233668A1 (en) * 2006-04-03 2007-10-04 International Business Machines Corporation Method, system, and computer program product for semantic annotation of data in a software system
CN112835797A (en) * 2021-02-03 2021-05-25 杭州电子科技大学 Metamorphic relation prediction method based on program intermediate structure characteristics
CN114881053A (en) * 2022-06-17 2022-08-09 西安邮电大学 Sentence granularity disintegration test method for neural machine translation system
US20230259359A1 (en) * 2022-02-14 2023-08-17 Jpmorgan Chase Bank , N.A. Systems and method for automated code analysis and tagging
CN117215935A (en) * 2023-09-12 2023-12-12 南京工业大学 Software defect prediction method based on multidimensional code joint graph representation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘宏志: "《推荐系统》", 31 May 2020, 北京:机械工业出版社, pages: 57 - 58 *
易小荣: "基于程序代码与注释文本学习的蜕变关系推荐方法研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, no. 2, 15 February 2023 (2023-02-15), pages 2 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination