CN115712576A - Software test type recommendation method and system based on open source software defect problem - Google Patents

Info

Publication number
CN115712576A
Authority
CN
China
Prior art keywords
software
defect
type
test
orthogonal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211510958.6A
Other languages
Chinese (zh)
Inventor
吴俊爽
王嬴超
白云
李皓宇
安鹏伟
曲天润
宋志强
陈俊英
闫宇航
赵菲
康建涛
刘博�
张榕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinghang Computing Communication Research Institute
Original Assignee
Beijing Jinghang Computing Communication Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jinghang Computing Communication Research Institute
Priority to CN202211510958.6A
Publication of CN115712576A
Legal status: Pending

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention belongs to the field of software testing and relates to a software test type recommendation method and system based on open source software defect problems. The method comprises: obtaining a defect list of open source software; constructing an orthogonal defect classification model to determine the orthogonal defect category of each software defect; extracting software defect keywords based on statistical features; constructing a heterogeneous association graph of the open source software and its software defects, orthogonal defect categories, software defect keywords, and software test types; calculating, with a random walk algorithm, the probability of walking from a software node to each test type node; and taking that probability as the importance weight of the software test type. Software test types are thereby recommended automatically, which improves the efficiency and quality of the software testing process.

Description

Software test type recommendation method and system based on open source software defect problem
Technical Field
The invention belongs to the field of software testing, and particularly relates to a software test type recommendation method and system based on open source software defect problems.
Background
With the rapid development of information technology, information system software makes wide use of new technologies such as cloud computing and big data, greatly raising the level of automation, visualization, and intelligence of such systems. However, because cloud computing and big data technologies are complex and layered, existing information systems are generally built by integrating existing open source cloud computing and big data software or frameworks.
As open source software has grown in popularity and developed rapidly, the number of software defects in it has increased sharply. To improve the efficiency and quality of the software testing process, recommending test types for open source software is a problem that needs to be solved.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a software test type recommendation method and system based on open source software defect problems. A heterogeneous association graph is constructed over the open source software and its software defects, defect categories, keywords, and candidate test types; a random walk algorithm calculates the probability of walking from a software node to each test type node; and that probability is used as the importance weight of the software test type, thereby implementing automatic software test type recommendation.
In one aspect, the invention provides a software test type recommendation method based on open source software defect problems, which specifically comprises the following steps:
acquiring the defect records of given open source software and cleaning them to obtain cleaned defect categories and cleaned software defects;
labeling the cleaned software defects with an orthogonal defect classification model, based on the cleaned defect categories and the cleaned software defects, to obtain labeled software defects, and determining the orthogonal defect categories and the corresponding initial test types;
extracting keywords from the labeled software defects;
constructing a software heterogeneous association graph based on the open source software, the labeled software defects, the orthogonal defect categories, the keywords, and the initial test types;
and obtaining the test type recommendation for the given open source software by applying a random walk algorithm to the heterogeneous association graph.
Further, constructing the software heterogeneous association graph includes:
constructing the nodes of the graph: software, defects, defect categories, keywords, and test types;
constructing a {software, defect} edge based on the open source software and the labeled software defects;
constructing a {defect, defect category} edge based on the labeled software defects and the orthogonal defect categories;
constructing {defect, keyword} and {keyword, defect} edges based on the labeled software defects and the keywords;
and constructing a {defect category, test type} edge based on the mapping between the orthogonal defect categories and the initial test types.
Further, obtaining the test type recommendation for the given open source software by applying a random walk algorithm to the heterogeneous association graph includes:
deleting the branches of the heterogeneous association graph along which a test type node cannot be reached from the software node;
finding the paths from the software node to the test type nodes and calculating the probability of each walk path;
calculating a recommendation confidence for each test type corresponding to the open source software based on the paths and their probabilities;
and sorting the test types corresponding to the open source software by recommendation confidence to obtain the test type recommendation for the given open source software.
Further, obtaining the labeled software defects with the orthogonal defect classification model, based on the cleaned defect categories and the cleaned software defects, and determining the orthogonal defect categories and the initial test types includes:
constructing an orthogonal defect classification model and training it to obtain a trained orthogonal defect classification model;
determining, with the trained orthogonal defect classification model and based on the cleaned defect categories and the cleaned software defects, the orthogonal defect category of each software defect, and labeling the defect with it to obtain the labeled software defects;
and determining the initial test type of each software defect based on the orthogonal defect category of the labeled software defect.
Further, extracting keywords from the labeled software defects includes:
extracting domain-specific terms;
obtaining all word segments from the labeled software defects;
calculating the IDF value of each word segment;
calculating the TF value of each word segment for each defect;
calculating the TF-IDF value of each word segment for each defect;
and determining the keywords based on the TF-IDF values.
Further, acquiring and cleaning the defect records of the given open source software to obtain the cleaned defect categories and the cleaned software defects comprises software defect category cleaning and software defect cleaning.
Further, the software defect cleaning comprises cleaning the defect description, which comprises deleting special symbols, links, and large code segments from the defect description to obtain the cleaned defect.
Further, the orthogonal defect classification model comprises an input layer, a BERT encoding layer, a Softmax layer, and a loss function layer.
Further, the orthogonal defect categories include assignment, checking, interface, algorithm, function, timing, software configuration management, and documentation.
In another aspect, the invention also provides a software test type recommendation system based on open source software defect problems, which comprises the following modules:
a defect cleaning module, used for acquiring and cleaning the defect records of the given open source software to obtain the cleaned defect categories and the cleaned software defects;
an orthogonal defect classification module, used for obtaining the labeled software defects with an orthogonal defect classification model, based on the cleaned defect categories and the cleaned software defects, and determining the orthogonal defect categories and the initial test types of the labeled software defects;
a keyword extraction module, used for extracting keywords from the labeled software defects;
a heterogeneous association graph module, used for constructing a software heterogeneous association graph based on the open source software, the labeled software defects, the orthogonal defect categories, the keywords, and the initial test types;
and a test type recommendation module, used for obtaining the test type recommendation for the given open source software by applying a random walk algorithm to the heterogeneous association graph.
The invention can realize at least one of the following beneficial effects:
An orthogonal defect classification model is constructed and used to label the defect list of the open source software with orthogonal defect categories; once the orthogonal defect category of each defect is determined, the software test types corresponding to the open source software follow from the correspondence between orthogonal defect categories and software test types.
Extracting the keywords of open source software defects makes it possible to infer similarity between pieces of software, which provides richer inference paths for software test type recommendation.
The method obtains the defect list of open source software, constructs an orthogonal defect classification model to determine the orthogonal defect category of each software defect, extracts software defect keywords based on statistical features, constructs a heterogeneous association graph of the software, defects, defect categories, keywords, and test types, calculates with a random walk algorithm the probability of walking from a software node to each test type node, and takes that probability as the importance weight of the software test type. Software test types are thereby recommended automatically, which improves the efficiency and quality of the software testing process.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The drawings, in which like reference numerals refer to like parts throughout, are for the purpose of illustrating particular embodiments only and are not to be considered limiting of the invention.
FIG. 1 is a flowchart of a method for recommending software test types according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a self-training framework according to embodiment 1 of the present invention;
FIG. 3 is a schematic diagram illustrating why keyword extraction is necessary, according to embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of a random walk path according to embodiment 1 of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
Method embodiment
Example 1
This embodiment discloses a software test type recommendation method based on open source software defect problems, which specifically comprises the following steps:
S01. Acquiring the defect records of the given open source software and cleaning them to obtain the cleaned defect categories and the cleaned software defects.
Specifically, for given open source software, a common crawling tool is used to crawl the address where its software defects are hosted, obtaining the defect records of the software.
Specifically, cleaning the software defect records comprises cleaning the software defect categories and cleaning the software defects.
In particular, cleaning the software defect categories reduces the hundreds of raw defect categories, through feature recognition, to a smaller set of representative software defect categories.
Specifically, cleaning the software defect categories comprises:
1. Convert every defect category string to lower case.
2. Check whether the defect category string contains 'kid/': if so, go to step 3; if not, go to step 4.
3. For a defect category string that contains 'kid/', replace 'kid/' with the empty string ''; the resulting string is the new software defect category.
4. For a defect category string that does not contain 'kid/', if the software defect category is not one of the four categories bug, default, extended, and query, replace it with default; that is, the new software defect category is default.
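The four category-cleaning steps above can be sketched as a single function. This is a minimal illustration, not the patented implementation: the function name `clean_category`, the constant `KEPT_CATEGORIES`, and the `prefix` parameter are hypothetical, and the prefix and category names are copied exactly as they appear in the text.

```python
# Categories that are kept unchanged when no prefix is present (as listed in step 4).
KEPT_CATEGORIES = {"bug", "default", "extended", "query"}


def clean_category(raw, prefix="kid/"):
    """Normalize one raw defect category string following steps 1 to 4."""
    label = raw.lower()                    # step 1: lower-case the string
    if prefix in label:                    # step 2: does it contain the prefix?
        return label.replace(prefix, "")   # step 3: strip the prefix
    # step 4: anything outside the four kept categories collapses to "default"
    return label if label in KEPT_CATEGORIES else "default"
```

For instance, `clean_category("Kid/Bug")` yields `"bug"`, while an unrecognized label without the prefix collapses to `"default"`.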
Specifically, software defect cleaning refers to cleaning the defect description: special symbols, links, and large code segments are deleted from the defect description to obtain the cleaned defect.
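A possible way to clean a defect description is sketched below with regular expressions. The text does not say how a "large" code segment or a "special symbol" is recognized, so this sketch assumes that code segments are fenced with three backticks, that a segment is large when it spans more than `max_code_lines` lines, and that special symbols are everything other than word characters, whitespace, and basic punctuation. The function name and parameters are hypothetical.

```python
import re

FENCE = "`" * 3  # assumed delimiter of code segments in issue text


def clean_description(text, max_code_lines=5):
    """Delete large code segments, links, and special symbols from a description."""
    fenced = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)

    def drop_if_large(match):
        # Keep short snippets, delete segments longer than max_code_lines lines.
        block = match.group(0)
        return "" if block.count("\n") > max_code_lines else block

    text = fenced.sub(drop_if_large, text)
    text = re.sub(r"https?://\S+", " ", text)        # delete links
    text = re.sub(r"[^\w\s.,:;!?()-]", " ", text)    # delete special symbols
    return re.sub(r"\s+", " ", text).strip()         # collapse whitespace
```

Because `\w` matches Unicode word characters in Python 3, Chinese text in a defect description is kept.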
Illustratively, for Nacos, a software for dynamic service discovery, configuration, and service management, the hosting address is https:// githu, com/aliba/nas and the hosting address of its software defects is https:// githu, com/aliba/nas/iss. A crawling tool is used to obtain the software defect list and parse the multi-dimensional information of each software defect.
Specifically, the multi-dimensional information of each defect is represented as {id, source, title, type, link, answered, status, author, content, opened_time, latest_time, comment_number}, where id is the defect number, source is the defect source, title is the defect title, type is the category of the defect, link is the access address of the defect, answered indicates whether the defect has been replied to, status is the state of the defect, author is the GitHub account nickname of the reporter, content is the specific content of the defect, opened_time is the time the defect was first raised, latest_time is the time the defect was last modified, and comment_number is the number of replies.
Illustratively, category cleaning reduces 735 categories to 14, as shown in Table 1:
Table 1 Software defect categories
Figure SMS_1
Figure SMS_2
Figure SMS_3
S02. Based on the cleaned defect categories and the cleaned software defects, labeling the software defects with an orthogonal defect classification model, and determining the orthogonal defect categories and the initial test types of the labeled software defects.
Specifically, an orthogonal defect classification model is constructed and trained; the trained model then assigns an orthogonal defect category label to each cleaned software defect, based on the cleaned defect categories and the cleaned software defects, yielding the labeled software defects.
Specifically, the software defect categories of the orthogonal defect classification system are assignment, checking, interface, algorithm, function, timing, software configuration management, and documentation: eight non-overlapping, mutually orthogonal categories in total.
Specifically, the orthogonal defect classification model comprises an input layer, a BERT encoding layer, a Softmax layer, and a loss function layer. The input of the model is a cleaned software defect together with its cleaned defect category, and the output is the defect's category in the orthogonal defect classification system.
Specifically, a self-training framework is used to train the orthogonal defect classification model with a small number of labeled samples and a large number of unlabeled samples. The labeled samples are denoted $D_{train} = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})\}$, where $D_{train}$ is the labeled sample set, $x$ is a labeled sample, $y$ is the class label of that sample, and $N$ is the number of labeled samples. The unlabeled samples are denoted
Figure SMS_4
where
Figure SMS_5
denotes an unlabeled sample and N denotes the number of unlabeled samples. The self-training framework is shown in FIG. 2: the classification model is first trained on the small number of labeled samples; the trained model then predicts the defect category of each unlabeled sample; the predicted category is taken as the label of that sample; and the original labeled samples and the newly labeled samples together form a new training set on which the model is trained again.
Specifically, the iterative self-training process includes the following steps:
1. Set the initial training samples
Figure SMS_6
and the initial round number Round = 0, where
Figure SMS_7
indicates that the initial training samples are the labeled samples, and Round = 0 indicates that the self-training iteration count is initialized to 0.
2. Using the training samples
Figure SMS_8
minimize the loss function
Figure SMS_9
to train the model.
3. Classify the unlabeled samples with the model trained in step 2. For each class, the confidence of the class is the average of the predicted probabilities of the samples assigned to that class; if the predicted probability of a newly labeled sample is greater than the confidence, the sample is added to
Figure SMS_10
Here the confidence is a preset threshold in the range (0, 1); preferably, 0.5 is used.
4. If step 3 updated
Figure SMS_11
repeat from step 2; otherwise, terminate the iteration.
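The loop in steps 1 to 4 can be sketched independently of the underlying classifier. In the sketch below, `self_train`, `fit`, `predict_proba`, and `max_rounds` are hypothetical names; the fixed threshold of 0.5 follows the preferred value in the text; and a toy one-dimensional nearest-centroid classifier stands in for the BERT-based model purely so that the example runs on its own.

```python
import math


def self_train(labeled, unlabeled, fit, predict_proba, threshold=0.5, max_rounds=10):
    """Self-training: pseudo-label confident samples, retrain, stop when nothing is added."""
    train = list(labeled)               # step 1: start from the labeled samples
    pool = list(unlabeled)
    model = fit(train)                  # step 2: train on the current training set
    for _ in range(max_rounds):         # max_rounds is a safety cap added for this sketch
        remaining, added = [], False
        for x in pool:                  # step 3: classify the unlabeled samples
            probs = predict_proba(model, x)
            label = max(probs, key=probs.get)
            if probs[label] > threshold:
                train.append((x, label))
                added = True
            else:
                remaining.append(x)
        pool = remaining
        if not added:                   # step 4: no update, so terminate
            break
        model = fit(train)              # otherwise repeat from step 2
    return model, train, pool


def fit_centroids(train):
    """Toy stand-in model: the mean of the 1-D samples of each class."""
    sums = {}
    for x, y in train:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def centroid_proba(model, x):
    """Toy stand-in probabilities: normalized exp(-distance to each centroid)."""
    weights = {y: math.exp(-abs(x - c)) for y, c in model.items()}
    norm = sum(weights.values())
    return {y: w / norm for y, w in weights.items()}
```

With labeled samples `[(0.0, "a"), (10.0, "b")]` and unlabeled samples `[1.0, 9.0, 5.0]`, the first two unlabeled samples are pseudo-labeled, while the equidistant sample 5.0 never exceeds the threshold and stays unlabeled.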
Specifically, given a defect x = {id, source, title, type, link, answered, status, author, new_content, opened_time, latest_time, comment_number}, its input vector is obtained by extracting three parts of the original features of x (title, type, and new_content) and converting them into the input required by BERT for training the orthogonal defect classification model. The input vector is $[\,[\mathrm{CLS}], t_1, t_2, \ldots, t_{|T|}, \mathrm{one\_hot}(type), c_1, c_2, \ldots, c_{|C|}, [\mathrm{SEP}]\,]$, where $t_1, t_2, \ldots, t_{|T|}$ is the character string of the title, one_hot(type) is a 14-dimensional 0-1 vector, and $c_1, c_2, \ldots, c_{|C|}$ is the character string of the defect description. The dimension of the entire input vector is therefore $|T| + 14 + |C| + 1$. The sequence is encoded with a multilingual BERT model, and the output vector of BERT is taken as the final feature vector $x \in \mathbb{R}^{d_{out} \times 1}$.
The class of the sample is then predicted as:
$$\hat{y} = \mathrm{softmax}(Wx + b)$$
where $W \in \mathbb{R}^{8 \times d_{out}}$ is the classification weight matrix, $d_{out}$ is the feature dimension of the BERT output, $b \in \mathbb{R}^{8 \times 1}$ is the bias vector, $\mathbb{R}$ denotes the real numbers, and $Wx$ is the product of $W$ and $x$.
The class with the highest probability in the prediction vector $\hat{y}$ is taken as the class of the defect. Given the labeled class $y$ of sample $x$, the classification loss is:
$$\mathcal{L}(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k$$
where $\mathcal{L}$ denotes the loss function, $K = 8$ is the number of classes, and the lowercase $k$ is the class index.
Specifically, the loss function is the cross entropy between the true class vector and the predicted class vector. Its value lies in [0, +∞) and equals 0 if and only if the prediction vector is identical to the true class vector, which indicates that the model predicts the sample's class correctly.
Specifically, model training is complete when the loss of the model on the training set no longer decreases and the accuracy of the model on the test set no longer increases. 20% of the training set is held out as the test set.
Specifically, after the model is trained, it is used to label the cleaned software defects with orthogonal defect categories, yielding the labeled software defects.
Specifically, once the orthogonal defect category of a software defect is determined, the corresponding initial test type is determined as well. Generally, the defects of a given open source software correspond to multiple orthogonal defect categories, so there are also multiple corresponding initial test types.
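The classification head and its loss can be illustrated in a few lines of plain Python. This is only a sketch of a softmax layer followed by cross entropy with a one-hot label; the function names are hypothetical, and the feature vector `x` stands in for the BERT output, which is not computed here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(W, b, x):
    """Class probabilities softmax(Wx + b); W has one row per class."""
    logits = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(logits)


def cross_entropy(y_onehot, y_hat):
    """Cross entropy between a one-hot label vector and predicted probabilities."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, y_hat) if y > 0)
```

A uniform prediction over two classes gives a loss of log 2, and a prediction that puts only 0.1 on the true class gives a loss above 1, so the loss is bounded below by 0 but not above by 1.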
Specifically, the mapping between orthogonal defect categories and test types is shown in Table 2:
Table 2 Mapping between orthogonal defect categories and test types
Orthogonal defect category          Test type
Assignment, algorithm               Static analysis
Checking                            Logic test
Interface                           Interface testing
Function                            Functional testing
Timing                              Timing test
Software configuration management   Data review
Documentation                       Document review
Specifically, an initial test type corresponding to the labeled software defect is determined based on a mapping relation between the orthogonal defect classification and the test type.
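The mapping from orthogonal defect categories to test types is a fixed lookup, which could be written as follows. The dictionary name, the function name, and the lower-case keys are illustrative choices; the pairs themselves follow the mapping table above.

```python
# Orthogonal defect category -> initial test type, following the mapping table.
ODC_TO_TEST_TYPE = {
    "assignment": "static analysis",
    "algorithm": "static analysis",
    "checking": "logic test",
    "interface": "interface testing",
    "function": "functional testing",
    "timing": "timing test",
    "software configuration management": "data review",
    "documentation": "document review",
}


def initial_test_types(categories):
    """Return the distinct test types for a list of categories, in first-seen order."""
    result = []
    for category in categories:
        test_type = ODC_TO_TEST_TYPE[category]
        if test_type not in result:
            result.append(test_type)
    return result
```

Assignment and algorithm defects share one test type, so a software whose defects fall in both categories still receives a single "static analysis" entry.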
Illustratively, as shown in FIG. 3, defect 1 and defect 2 in software 1 both correspond to defect category 1, so test type 1 is recommended. In software 2, defect 3 and defect 4 both correspond to defect category 1 and defect 5 corresponds to defect category 7, so test type 1 and test type 7 are recommended.
This exposes a problem: software 1 and software 2 both have defects of defect category 1, so the two can be presumed to be highly similar, and defect category 7, determined by defect 5 in software 2, should also be recommended to software 1. Step S02 alone cannot achieve this.
For example, a large software A has a software defect in its configuration item software B, but the user reports the defect only under software B and does not also report it under software A; or the user notes under software A that software B has a defect but does not report it under software B.
Therefore, recommending test types solely from the defect categories determined in step S02 cannot propagate the differing defect categories of two similar pieces of software back to each other. Keywords are extracted in step S03 to construct a defect-keyword-defect path that solves this problem.
S03. Extracting keywords from the labeled software defects.
Specifically, keyword extraction refers to extracting keywords from the description of a software defect problem; the keywords express the topic of the software defect.
Specifically, the keywords are determined based on TF-IDF, a common method for evaluating how important a word is to a document within a collection.
Specifically, extracting keywords from the labeled software defects includes the following steps:
1. Extract domain-specific terms.
Specifically, professional terms can be extracted from national standards such as GB/T 11457-2006 (Information technology, Software engineering terminology), GB/T 32400-2015 (Information technology, Cloud computing, Overview and vocabulary), and GB/T 35295-2017 (Information technology, Big data, Terminology) and added to the word segmentation dictionary to improve segmentation accuracy.
2. Obtain all word segments from all the labeled software defects.
Specifically, a word segmentation tool segments the title text and description text of every labeled software defect, and stop words are deleted, giving the full word set $V = \{v_1, v_2, \ldots, v_{|V|}\}$, where $V$ is the set, $|V|$ is the number of elements in the set, and $v$ is an element of the set, i.e., a word.
3. Calculate the IDF value of each word.
Specifically, the inverse defect frequency of each word is calculated:
Figure SMS_19
where $f_v$ is the number of software defects in which the word $v$ occurs, and $|X|$ is the total number of software defects.
4. Calculate the TF value of each word for each defect.
Specifically, given a word $v$ and a defect $x$, the term frequency is the number of times the word appears in the defect, normalized as:
$$TF(v, x) = \frac{f_x(v)}{\sum_{i} f_x(v_i)}$$
where $f_x(v)$ is the number of occurrences of $v$ in $x$, $v_i$ ranges over the words of the defect, and $\sum_{i} f_x(v_i)$ is the total count of all words in the defect.
5. Calculate the TF-IDF value of each word segment for each defect, and determine the keywords based on the TF-IDF values.
Specifically, for a given word $v$ and defect $x$, if $v$ appears in the description of the defect, the TF-IDF value of $v$ for $x$ is TF-IDF(v, x) = TF(v, x) × IDF(v); if $v$ appears in the title of the defect, the TF-IDF value is TF-IDF(v, x) = λ × TF(v, x) × IDF(v), where λ is a weighting factor with the empirical value 1.2.
Specifically, the higher the TF-IDF score, the more important the word is to the defect. Words appearing in the title of a software defect are given the weighting factor λ because the title expresses the meaning of the entire defect.
Specifically, the TF-IDF scores of all word segments of each defect are sorted in descending order, and the word segments with the top 5 scores are selected as keywords; if a defect has fewer than 5 word segments, all of them are used as its keywords.
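Steps 2 to 5 can be sketched on already-segmented text as follows. Several points are assumptions of this sketch rather than statements of the source: the IDF is taken as log(|X| / f_v), since the original formula image is not reproduced here; word counts combine title and description; ties are broken alphabetically; and the function and field names (`extract_keywords`, `"title"`, `"body"`) are hypothetical.

```python
import math
from collections import Counter


def extract_keywords(defects, top_k=5, title_weight=1.2):
    """Return up to top_k keywords per defect, ranked by title-weighted TF-IDF.

    Each defect is a dict with "title" and "body" lists of word segments
    (segmentation and stop-word removal are assumed to be done already).
    """
    n_defects = len(defects)
    defect_freq = Counter()                      # f_v: defects containing word v
    for d in defects:
        defect_freq.update(set(d["title"]) | set(d["body"]))
    idf = {w: math.log(n_defects / f) for w, f in defect_freq.items()}  # assumed form

    keywords = []
    for d in defects:
        counts = Counter(d["title"]) + Counter(d["body"])
        total = sum(counts.values())
        title_words = set(d["title"])
        scores = {}
        for word, count in counts.items():
            score = (count / total) * idf[word]  # TF(v, x) * IDF(v)
            if word in title_words:
                score *= title_weight            # lambda = 1.2 for title words
            scores[word] = score
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords.append(ranked[:top_k])          # fewer than top_k words: keep all
    return keywords
```

A word that occurs in every defect gets an IDF of 0 under this form and therefore ranks last, which matches the intent of down-weighting words that do not distinguish one defect from another.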
S04. Constructing a software heterogeneous association graph based on the open source software, the labeled software defects, the orthogonal defect categories, the keywords, and the test types mapped to the orthogonal defect categories.
Specifically, constructing the heterogeneous association graph includes constructing the nodes of the graph and the edges of the graph.
Specifically, the heterogeneous association graph comprises five types of nodes: software, defects, defect categories, keywords, and test types.
Specifically:
the software nodes correspond to the open source software given in S01;
the defect nodes correspond to the labeled software defects obtained for the given open source software in step S02;
the defect category nodes correspond to the orthogonal defect categories determined in step S02;
the keyword nodes correspond to the keywords obtained for the given open source software in step S03;
the test type nodes correspond to the test types listed in the mapping between orthogonal defect categories and test types in step S02.
Specifically, the edge construction process comprises the following steps:
constructing a {software, defect} edge based on the open source software and the labeled software defects;
constructing a {defect, defect category} edge based on the labeled software defects and the orthogonal defect categories;
constructing {defect, keyword} and {keyword, defect} edges based on the labeled software defects and the keywords;
and constructing a {defect category, test type} edge based on the mapping between the orthogonal defect categories and the test types.
In particular, the nodes and edges together form the complete heterogeneous association graph.
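The graph construction can be sketched with a plain adjacency map in which every node is a (node type, name) pair. The function name, argument names, and node-type strings below are illustrative assumptions; only the five node types and the edge directions come from the text.

```python
from collections import defaultdict


def build_graph(software, defects, category_of, keywords_of, test_type_of):
    """Build the directed heterogeneous association graph as node -> set of successors."""
    edges = defaultdict(set)
    for defect in defects:
        category = category_of[defect]
        # {software, defect} edge
        edges[("software", software)].add(("defect", defect))
        # {defect, defect category} edge
        edges[("defect", defect)].add(("category", category))
        # {defect, keyword} and {keyword, defect} edges
        for keyword in keywords_of.get(defect, []):
            edges[("defect", defect)].add(("keyword", keyword))
            edges[("keyword", keyword)].add(("defect", defect))
        # {defect category, test type} edge
        edges[("category", category)].add(("test_type", test_type_of[category]))
    return edges
```

Only the keyword edges are bidirectional; this is what later allows a walk to pass from one defect through a shared keyword to a defect of another software.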
S05. Obtaining the test type recommendation for the given open source software by applying a random walk algorithm to the heterogeneous association graph.
Specifically, a random walk is performed on the heterogeneous association graph constructed in S04 to obtain the walk paths to the test types corresponding to the open source software and their probabilities; the confidence of each test type is obtained from the paths and probabilities, and the test type recommendation for the open source software is obtained from the confidences.
Specifically, the walk starts from a software node and follows the edges of the heterogeneous association graph; it cannot walk backward, and it stops if and only if it reaches a test type node.
Illustratively, as shown in FIG. 4, obtaining the test type recommendation for software 2 specifically includes the following steps:
1. Pruning. Specifically, delete the branches of the heterogeneous association graph along which a test type node cannot be reached from the software node.
Illustratively, since keyword 2 is linked only to defect 5 in FIG. 4, a random walker that reaches this node can never reach a test type node because it cannot walk backward; this keyword therefore needs to be pruned.
2. Walk from the "software" node to a "defect" node.
Specifically, starting from a software node s whose set of defect nodes is Def, the probability of walking to any particular defect d is 1/|Def|, where |Def| is the number of defects in the set.
Illustratively, as shown in FIG. 4, starting from software 2, the walker reaches defect 3, defect 4, and defect 5 with equal probability; these steps are recorded as (software 2, defect 3, 1/3), (software 2, defect 4, 1/3), and (software 2, defect 5, 1/3).
3. Walk from the "defect" node to a "keyword" node or a "defect category" node.
Specifically, when the random walker reaches the second layer of the heterogeneous association graph, i.e., a defect node, it has two options: it walks to the defect category t with probability 1/2; or it randomly selects a keyword node, in which case, if the keyword set of the defect is Key, it walks to a given keyword g with probability 1/2 × 1/|Key|, where |Key| denotes the number of keywords.
Illustratively, as shown in fig. 4, when the random walker is at defect 3 (or defect 4), it must walk to defect category 1 (or defect category 2), recorded as (software 2, defect 3, defect category 1, 1/3, 1) or (software 2, defect 4, defect category 2, 1/3, 1).
Illustratively, when the random walker is at defect 5, it either walks to the category node with probability 1/2, recorded as (software 2, defect 5, defect category 7, 1/3, 1/2), or walks to a keyword node with probability 1/2, where the specific keyword node is determined by the normalized TF-IDF value; in fig. 4 there is only one legal keyword node, giving (software 2, defect 5, keyword 1, 1/3, 1/2).
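The transition out of a defect node can be sketched as a probability table. The sketch below follows the uniform 1/2 × 1/|Key| rule; it does not implement the TF-IDF weighted keyword choice mentioned for defect 5, and the function name `defect_step_distribution` is an assumption made for illustration.

```python
from fractions import Fraction


def defect_step_distribution(category, keywords):
    """Map each possible next node to its probability when leaving a defect.

    Without legal keywords the walker must go to the category (probability 1).
    Otherwise the category gets 1/2 and the keywords share the other 1/2 equally.
    """
    if not keywords:
        return {category: Fraction(1)}
    half = Fraction(1, 2)
    distribution = {category: half}
    for keyword in keywords:
        distribution[keyword] = half / len(keywords)
    return distribution


print(defect_step_distribution("defect category 1", []))             # defect 3
print(defect_step_distribution("defect category 7", ["keyword 1"]))  # defect 5
```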
4. Walk from "defect category" to "test type", or from "keyword" to "defect".
Specifically, from a software defect category t the walker can only reach one particular test type c; that is, the probability of walking from a defect category to its test type is 1.
Illustratively, from (software 2, defect 3, category 1, 1/3, 1), (software 2, defect 4, category 2, 1/3, 1) and (software 2, defect 5, category 7, 1/3, 1/2), since the walker cannot walk backward, it next reaches test type 1, test type 2 and test type 7 respectively, recorded as (software 2, defect 3, category 1, test type 1, 1/3, 1, 1), (software 2, defect 4, category 2, test type 2, 1/3, 1, 1) and (software 2, defect 5, category 7, test type 7, 1/3, 1/2, 1).
Specifically, a walker that has reached a keyword node must next walk to a defect associated with that keyword. Denoting the set of such defects as Def⁻, the probability of walking to a given defect is 1/|Def⁻|. Here, to simplify the calculation, the software test type corresponding to that defect node is directly recommended to the software s.
Illustratively, at (software 2, defect 5, keyword 1, 1/3, 1/2), since the walker cannot walk backward, it next walks with equal probability to one of the other connected defects. In fig. 4, apart from defect 5, keyword 1 is associated only with defect 1. To simplify the calculation, it is specified that on entering defect 1 the walker can only take the route through category 1 to test type 1, giving (software 2, defect 5, keyword 1, defect 1, category 1, test type 1, 1/3, 1/2, 1).
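The keyword-to-defect step can be sketched in the same style. The sketch assumes that the set of candidate defects excludes the defect the walker has just left (which is how the no-backward rule is read here); the function name `keyword_to_defect_distribution` is hypothetical.

```python
from fractions import Fraction


def keyword_to_defect_distribution(linked_defects, came_from):
    """Spread probability equally over the linked defects other than `came_from`."""
    candidates = [defect for defect in linked_defects if defect != came_from]
    if not candidates:
        return {}  # dead end; such keywords are expected to be pruned beforehand
    probability = Fraction(1, len(candidates))
    return {defect: probability for defect in candidates}


# Keyword 1 in fig. 4 links defect 1 and defect 5; the walker arrived from defect 5.
print(keyword_to_defect_distribution(["defect 1", "defect 5"], "defect 5"))
```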
5. Calculate the recommendation confidence of each test type for the software.
Specifically, for a given path, the probability of reaching the test type from the software is the product of the probabilities of the steps along the path. The results of different paths that reach the same test type are summed, and the sum is the recommendation confidence of that test type for the software.
For example, as shown in fig. 4, the confidence of each test type for software 2 is calculated as follows:
test type 1 = (software 2, defect 3, category 1, test type 1, 1/3, 1, 1) + (software 2, defect 5, keyword 1, defect 1, category 1, test type 1, 1/3, 1/2, 1) = 1/3 + 1/6 = 1/2;
test type 2 = (software 2, defect 4, category 2, test type 2, 1/3, 1, 1) = 1/3;
test type 7 = (software 2, defect 5, category 7, test type 7, 1/3, 1/2, 1) = 1/6.
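The calculation above can be reproduced with a short Python sketch that enumerates the walk paths and sums the path products per test type. The dictionaries below are an assumed reconstruction of the part of fig. 4 reachable from software 2 (with keyword 2 already pruned), and all names are illustrative rather than part of the disclosed method.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed reconstruction of the fig. 4 sub-graph reachable from software 2.
SOFTWARE_DEFECTS = {"software 2": ["defect 3", "defect 4", "defect 5"]}
DEFECT_CATEGORY = {"defect 1": "category 1", "defect 3": "category 1",
                   "defect 4": "category 2", "defect 5": "category 7"}
DEFECT_KEYWORDS = {"defect 5": ["keyword 1"]}
KEYWORD_DEFECTS = {"keyword 1": ["defect 1", "defect 5"]}
CATEGORY_TEST = {"category 1": "test type 1", "category 2": "test type 2",
                 "category 7": "test type 7"}


def enumerate_paths(software):
    """Yield (path, probability) for every walk from software to a test type."""
    defects = SOFTWARE_DEFECTS[software]
    for defect in defects:
        p_defect = Fraction(1, len(defects))
        keywords = DEFECT_KEYWORDS.get(defect, [])
        p_category = Fraction(1, 2) if keywords else Fraction(1)
        category = DEFECT_CATEGORY[defect]
        yield (software, defect, category, CATEGORY_TEST[category]), p_defect * p_category
        for keyword in keywords:
            p_keyword = Fraction(1, 2) / len(keywords)
            others = [d for d in KEYWORD_DEFECTS[keyword] if d != defect]
            for other in others:
                # Simplification from the text: the second defect goes straight
                # to its category and test type with probability 1.
                other_category = DEFECT_CATEGORY[other]
                path = (software, defect, keyword, other,
                        other_category, CATEGORY_TEST[other_category])
                yield path, p_defect * p_keyword * Fraction(1, len(others))


def confidences(software):
    """Sum the path probabilities per test type."""
    totals = defaultdict(Fraction)
    for path, probability in enumerate_paths(software):
        totals[path[-1]] += probability
    return dict(totals)


result = confidences("software 2")
for test_type, confidence in sorted(result.items()):
    print(test_type, confidence)
```

Exact fractions are used so that the sums match the hand calculation without rounding.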
6. Rank the test types by confidence to obtain the test type recommendation for the given software.
Specifically, in the calculation result, the confidence of a test type is proportional to the probability that this test type uncovered the historical software defects.
Specifically, the test types are ranked by confidence: a high confidence indicates that the test type has uncovered more defects and deserves more attention; a low confidence indicates that the test type has exposed fewer defects, but does not mean that the test type should not be performed.
Preferably, test types whose confidence is greater than 0 should be tested with emphasis, and cases in which software testers carried out testing with these test types but found no defects should be strictly reviewed, thereby improving software test quality.
Illustratively, in fig. 4, based on the result calculated in step 5, the test types are ranked by confidence as (test type 1, 1/2), (test type 2, 1/3), (test type 7, 1/6), so software 2 should preferentially undergo testing of test types 1, 2 and 7.
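The ranking step can be sketched as a sort over the confidence table, keeping only test types with confidence greater than 0. The function name `rank_test_types` is hypothetical, and "test type 3" with zero confidence is an invented entry used only to show the filter.

```python
from fractions import Fraction


def rank_test_types(confidence):
    """Return (test type, confidence) pairs with confidence > 0, highest first."""
    positive = [(name, value) for name, value in confidence.items() if value > 0]
    return sorted(positive, key=lambda item: item[1], reverse=True)


ranking = rank_test_types({
    "test type 7": Fraction(1, 6),
    "test type 1": Fraction(1, 2),
    "test type 3": Fraction(0),  # hypothetical type with no supporting path
    "test type 2": Fraction(1, 3),
})
for name, value in ranking:
    print(name, value)
```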
Based on the recommendation, the quality management personnel of the test project need to strictly review cases in which software testers carried out testing with these test types but found no defects, thereby improving software test quality.
Optionally, testing of other test types may also be performed on software 2.
The method mines and analyzes the defect issues of open source software, constructs an orthogonal defect classification model, extracts software defect keywords based on statistical features, constructs a heterogeneous association graph of software, defects, keywords, defect categories and test types, and uses a random walk algorithm to calculate the probability of walking from software nodes to test type nodes. It thereby realizes automatic software test type recommendation, provides a basis for test defect prediction and test type recommendation for information systems, and improves the efficiency and quality of the software testing process.
System embodiment
The invention discloses a software test type recommendation system based on open source software defect problems, which comprises a defect cleaning module, an orthogonal defect classification module, a keyword extraction module, a heterogeneous association graph module and a test type recommendation module;
the defect cleaning module is used for acquiring and cleaning the defect record of the given open source software to obtain cleaned defect types and cleaned software defects;
the orthogonal defect classification module is used for obtaining the labeled software defects by using an orthogonal defect classification model based on the cleaned defect types and the cleaned software defects, and determining the orthogonal defect types and the initial test types of the labeled software defects;
the keyword extraction module is used for extracting keywords from the labeled software defects to obtain keywords;
the heterogeneous association graph module is used for constructing a software heterogeneous association graph based on the open source software, the labeled software defects, the orthogonal defect types, the keywords and the initial test types;
and the test type recommendation module is used for obtaining the test type recommendation for the given open source software by using a random walk algorithm based on the heterogeneous association graph.
It should be noted that the above embodiments are based on the same inventive concept, so repeated descriptions are omitted and the embodiments may be referred to in conjunction with each other.
Compared with the prior art, the beneficial effects of the software test type recommendation system provided by this embodiment are substantially the same as those of the method embodiment and are not repeated here.
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims (10)

1. A software test type recommendation method based on open source software defect problems is characterized by comprising the following steps:
acquiring a defect record of given open source software and cleaning the defect record to obtain a cleaned defect type and a cleaned software defect;
based on the cleaned defect type and the cleaned software defect, obtaining the labeled software defect by using an orthogonal defect classification model, and determining the orthogonal defect type and the corresponding initial test type;
extracting keywords from the labeled software defects to obtain keywords;
constructing a software heterogeneous association graph based on the open source software, the labeled software defects, the orthogonal defect types, the keywords and the initial test types;
and obtaining a test type recommendation for the given open source software by using a random walk algorithm based on the heterogeneous association graph.
2. The method for recommending software test types according to claim 1, wherein the building of the software heterogeneous association graph comprises:
constructing the nodes of the graph: software, defects, defect categories, keywords, and test types;
constructing a { software, defect } edge based on the open source software and the labeled software defect;
constructing a { defect, defect type } edge based on the labeled software defect and the orthogonal defect type;
based on the labeled software defects and the keywords, establishing { defects, keywords } and { keywords, defects } edges;
and constructing a { defect type, test type } edge based on the mapping relation between the orthogonal defect type and the initial test type.
3. The method for recommending software test types according to claim 2, wherein said obtaining a test type recommendation for the given open source software by using a random walk algorithm based on the heterogeneous association graph comprises:
deleting the branches of the heterogeneous association graph from which the test type nodes cannot be reached from the software node;
finding the paths from the software node to the test type nodes and calculating the probability of each walk path;
calculating the recommendation confidence of each test type corresponding to the open source software based on the paths and the probabilities;
and ranking the test types corresponding to the open source software based on the recommendation confidence to obtain the test type recommendation for the given open source software.
4. The software test type recommendation method according to any one of claims 1-3, wherein obtaining the labeled software defects by using an orthogonal defect classification model based on the cleaned defect types and the cleaned software defects, and determining the orthogonal defect types and the corresponding initial test types, comprises:
constructing an orthogonal defect classification model and training to obtain a trained orthogonal defect classification model;
based on the cleaned defect types and the cleaned software defects, determining the orthogonal defect type corresponding to each software defect by using the trained orthogonal defect classification model and labeling the defect with it, to obtain the labeled software defects;
and determining the initial test type of the software defect based on the orthogonal defect type corresponding to the labeled software defect.
5. The software test type recommendation method according to any one of claims 1 to 3, wherein the extracting keywords from the labeled software defects to obtain the keywords comprises:
extracting keywords of the professional field;
obtaining all word segments based on the labeled software defects;
calculating the IDF value of each word segment;
calculating the TF value of each word segment for each defect;
calculating the TF-IDF value of each word segment for each defect;
and determining the keywords based on the TF-IDF values.
6. The software test type recommendation method according to any one of claims 1 to 3, wherein acquiring and cleaning the defect record of the given open source software to obtain the cleaned defect types and the cleaned software defects comprises software defect type cleaning and software defect cleaning.
7. The software test type recommendation method according to claim 6, wherein said software defect cleaning comprises cleaning a defect description; cleaning the defect description comprises deleting special symbols, links and large code segments from the defect description to obtain the cleaned defect.
8. The software test type recommendation method of claim 4, wherein the orthogonal defect classification model comprises an input layer, a BEAT coding layer, a Softmax layer and a loss function layer.
9. The software test type recommendation method of claim 8, wherein the orthogonal defect classes comprise assignments, checks, interfaces, algorithms, functions, timing, software configuration management, and documentation.
10. A software test type recommendation system based on open source software defect problems is characterized by comprising:
the defect cleaning module is used for acquiring and cleaning the defect record of the given open source software to obtain cleaned defect types and cleaned software defects;
the orthogonal defect classification module is used for obtaining the labeled software defects by using an orthogonal defect classification model based on the cleaned defect types and the cleaned software defects, and determining the orthogonal defect types and the initial test types of the labeled software defects;
the keyword extraction module is used for extracting keywords from the labeled software defects to obtain keywords;
the heterogeneous association graph module is used for constructing a software heterogeneous association graph based on the open source software, the labeled software defects, the orthogonal defect types, the keywords and the initial test types;
and the test type recommendation module is used for obtaining the test type recommendation for the given open source software by using a random walk algorithm based on the heterogeneous association graph.
CN202211510958.6A 2022-11-29 2022-11-29 Software test type recommendation method and system based on open source software defect problem Pending CN115712576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211510958.6A CN115712576A (en) 2022-11-29 2022-11-29 Software test type recommendation method and system based on open source software defect problem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211510958.6A CN115712576A (en) 2022-11-29 2022-11-29 Software test type recommendation method and system based on open source software defect problem

Publications (1)

Publication Number Publication Date
CN115712576A true CN115712576A (en) 2023-02-24

Family

ID=85235277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211510958.6A Pending CN115712576A (en) 2022-11-29 2022-11-29 Software test type recommendation method and system based on open source software defect problem

Country Status (1)

Country Link
CN (1) CN115712576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610592A (en) * 2023-07-20 2023-08-18 青岛大学 Customizable software test evaluation method and system based on natural language processing technology
CN116610592B (en) * 2023-07-20 2023-09-19 青岛大学 Customizable software test evaluation method and system based on natural language processing technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination