A software bug assignment method based on recurrent neural networks and cost-sensitive learning
Technical field
The present invention relates to the field of software testing technology, and more particularly to a software bug assignment method based on recurrent neural networks and cost-sensitive learning.
Background
A software bug is a problem that arises while a software program is running. Bugs can cause software to fail, so repairing them promptly is fundamental to ensuring software quality. A bug repository is an efficient tool for managing bug reports: the bug reports submitted by users are stored in the repository, and triagers read these reports and assign suitable developers to fix them. However, as software projects grow, both the number of bugs and the number of developers participating in bug fixing increase greatly, and assigning bugs manually becomes time-consuming, costly and inefficient, so a more efficient way of performing bug assignment is needed. Automated bug triage techniques emerged in response: researchers used machine learning to convert bug assignment into a text classification problem and proposed numerous automatic bug triage techniques. However, many current automatic techniques only consider dispatching a bug report to the most suitable developer without considering developer workload. This easily leaves a single developer with a large backlog of bug reports awaiting repair, which slows down the overall repair process of the project.
Summary of the invention
To address the problems of the prior art, the invention discloses a software bug assignment method based on recurrent neural networks and cost-sensitive learning, which specifically includes the following steps:
S1: Acquire the raw data set from a historical bug report repository and preprocess it; the preprocessing includes screening bug reports, extracting the text information of the retained bug reports, and extracting developer activity information;
S2: Pre-train the CSDBT (Cost Sensitive Deep Bug Triage) model on the training set;
S3: Solve for the optimal misclassification cost matrix with an adaptive differential evolution algorithm, using the validation set data to obtain the optimal value of the matrix;
S4: Combine the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally input the test set to test the new CSDBT model.
Further, the raw data set is preprocessed as follows:
S11: Screen the bug reports, retaining bugs whose status is fixed and deleting invalid bugs and bugs that have not been fixed;
S12: Extract text information from the retained bug reports; tokenize the selected text, remove stop words, and delete words that occur too frequently or too rarely;
S13: Extract developer activity information by collecting the activity sequences of developers whose product and component (module) information matches that of the current bug report;
S14: Divide the preprocessed data set into a training set, a validation set and a test set.
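The preprocessing in S11 to S14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record keys `status`, `text` and `assignee`, the tiny stop-word list, the document-frequency thresholds `min_df` and `max_df_ratio`, and the 8:1:1 chronological split are all hypothetical choices that the source does not specify.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "when"}


def tokenize(text):
    """S12: lower-case, split into alphanumeric tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def preprocess(reports, min_df=2, max_df_ratio=0.5):
    """S11 + S12 on a list of dicts with hypothetical keys 'status', 'text', 'assignee'."""
    # S11: keep only reports whose status is FIXED
    fixed = [r for r in reports if r["status"] == "FIXED"]
    docs = [tokenize(r["text"]) for r in fixed]
    # S12: drop words that appear in too few or too many reports (document frequency)
    df = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, c in df.items() if c >= min_df and c / len(docs) <= max_df_ratio}
    return [([w for w in doc if w in keep], r["assignee"]) for doc, r in zip(docs, fixed)]


def split(samples, train=0.8, val=0.1):
    """S14: split samples (assumed sorted by report date) into train/validation/test."""
    a = int(len(samples) * train)
    b = a + int(len(samples) * val)
    return samples[:a], samples[a:b], samples[b:]
```

A chronological split is used here because triage models are normally evaluated on reports that arrive after the training data; the source itself only says the data are divided into three sets.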
Further, S2 is carried out as follows:
S21: Encode the text information and the developer information with one-hot encoding;
S22: Extract high-level features of the text information with a bidirectional recurrent neural network and high-level features of the activity information with a unidirectional recurrent neural network, then fuse the two kinds of features by element-wise multiplication;
S23: Convert the fused high-level features into developer probabilities;
S24: On the training set, optimize the neural network model with the objective of minimizing cross-entropy until convergence.
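A forward pass through S21 to S23 can be sketched in NumPy. The source does not specify the recurrent cell, hidden sizes or output layer, so the plain tanh RNN, the final-hidden-state readout and the single softmax layer below are assumptions (DeepTriage-style models typically use LSTM cells and learned embeddings instead). Because the fusion is element-wise, the activity RNN's hidden size is set to twice the text RNN's so that the concatenated bidirectional text features and the activity features have the same length.

```python
import numpy as np


def one_hot(ids, size):
    """S21: one-hot encode a sequence of integer ids into a (len(ids), size) array."""
    out = np.zeros((len(ids), size))
    out[np.arange(len(ids)), ids] = 1.0
    return out


def rnn(xs, W, U, b):
    """Plain tanh RNN (assumed cell type); returns the final hidden state."""
    h = np.zeros(U.shape[0])
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
    return h


def make_params(vocab, n_devs, hid, seed=0):
    """Random initial weights with shapes chosen so the fused vectors match."""
    rng = np.random.default_rng(seed)

    def cell(n_in, n_hid):
        return (rng.normal(0, 0.1, (n_hid, n_in)),
                rng.normal(0, 0.1, (n_hid, n_hid)),
                np.zeros(n_hid))

    return {
        "fwd": cell(vocab, hid), "bwd": cell(vocab, hid),  # bidirectional text RNN
        "act": cell(n_devs, 2 * hid),                      # unidirectional activity RNN
        "out_W": rng.normal(0, 0.1, (n_devs, 2 * hid)),
        "out_b": np.zeros(n_devs),
    }


def forward(word_ids, dev_ids, p, vocab, n_devs):
    """Return a probability for every developer given word ids and an activity sequence."""
    x, a = one_hot(word_ids, vocab), one_hot(dev_ids, n_devs)
    # S22: bidirectional = a pass over the sequence plus a pass over its reverse
    h_text = np.concatenate([rnn(x, *p["fwd"]), rnn(x[::-1], *p["bwd"])])
    h_act = rnn(a, *p["act"])
    fused = h_text * h_act                      # element-wise multiplication
    # S23: map the fused features to developer probabilities with a softmax
    z = p["out_W"] @ fused + p["out_b"]
    e = np.exp(z - z.max())
    return e / e.sum()
```

For example, `forward([3, 17, 42], [1, 4, 1], make_params(50, 5, 8), 50, 5)` returns five probabilities that sum to 1; the weights are untrained here, so the values themselves carry no meaning.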
Further, S3 is carried out as follows:
S31: Initialize a misclassification cost population of individuals, each of which represents a feasible solution for the misclassification cost matrix;
S32: Generate a new misclassification cost population using adaptively generated mutation and crossover operators;
S33: Compute the fitness value of every individual in the population: substitute the misclassification cost matrix represented by each individual into the pre-trained CSDBT model, run the resulting new CSDBT model on the validation set to obtain an accuracy score, and use this accuracy score as the fitness value of that feasible solution;
S34: Compare the fitness value of each current individual with that of the corresponding individual of the previous generation, keep the one with the higher fitness, and discard the worse-performing one;
S35: Adaptively adjust the mutation and crossover operators of the population;
S36: Return to step S32 and continue iterating until the set maximum number of iterations is reached, then select the individual with the highest fitness in the last generation as the optimal solution.
Further, S4 is carried out as follows:
S41: Combine the pre-trained CSDBT model with the optimized misclassification cost matrix, and input the preprocessed test set;
S42: For each sample in the test set, the CSDBT model returns a list of K recommended assignees.
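The K-length list of S42 can be produced by sorting per-developer scores. The evaluation metric below is an assumption: the source only mentions an accuracy score, and top-K accuracy (whether the developer who actually fixed the bug appears in the list) is one common way to score ranked triage output.

```python
import numpy as np


def top_k_assignees(scores, developers, k):
    """Return the k developers with the highest scores, best first (ties keep input order)."""
    order = np.argsort(-np.asarray(scores), kind="stable")[:k]
    return [developers[i] for i in order]


def top_k_accuracy(score_matrix, true_idx, k):
    """Fraction of samples whose true fixer appears among their top-k developers."""
    topk = np.argsort(-np.asarray(score_matrix), axis=1, kind="stable")[:, :k]
    return float(np.mean([t in row for t, row in zip(true_idx, topk)]))


devs = ["alice", "bob", "carol", "dave"]
print(top_k_assignees([0.1, 0.5, 0.3, 0.1], devs, k=2))  # ['bob', 'carol']
```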
By adopting the above technical solution, the software bug assignment method based on recurrent neural networks and cost-sensitive learning provided by the invention uses a differential evolution algorithm to find, without any prior knowledge, the most suitable misclassification cost matrix for each data set. Building on the existing DeepTriage model, the method takes into account both the word order of the text and the activity information of developers, and it also addresses class imbalance in the data through the misclassification cost matrix. For these reasons, the invention effectively prevents large numbers of bug reports from piling up on the same developer and accelerates the repair process of a project.
Description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments described in this application, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a flow chart of the testing part of the method of the present invention.
Detailed description of embodiments
To make the technical solution and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings:
As shown in Fig. 1 and Fig. 2, the software bug assignment method based on recurrent neural networks and cost-sensitive learning (Cost Sensitive Deep Bug Triage, CSDBT) specifically includes the following steps:
S1: Acquire the raw data set from a historical bug report repository and preprocess it; the preprocessing includes screening bug reports, extracting the text information of the retained bug reports, and extracting developer activity information.
Further, the raw data set is preprocessed using the following steps:
S11: Screen the bug reports, retaining bugs whose status is fixed and deleting invalid bugs and bugs that have not been fixed;
S12: Extract text information from the retained bug reports; tokenize the selected text, remove stop words, and delete words that occur too frequently or too rarely;
S13: Extract developer activity information by collecting the activity sequences of developers whose product and component (module) information matches that of the current bug report;
S14: Divide the preprocessed data set into a training set, a validation set and a test set.
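Step S13 can be illustrated with a small helper that builds the developer activity sequence for one report. The `Report` fields, the use of timestamps to order reports, and the cap of `max_len` most recent entries are hypothetical choices for illustration; the source states only that the sequence is collected from developers whose product and component (module) match the current report.

```python
from dataclasses import dataclass


@dataclass
class Report:
    bug_id: int
    product: str
    component: str
    fixer: str
    time: int  # hypothetical creation timestamp used for ordering


def activity_sequence(history, current, max_len=5):
    """S13 sketch: fixers of earlier reports sharing the current report's product and
    component, in chronological order, keeping only the `max_len` most recent."""
    same = [r for r in history
            if r.product == current.product
            and r.component == current.component
            and r.time < current.time]
    same.sort(key=lambda r: r.time)
    return [r.fixer for r in same[max(0, len(same) - max_len):]]


history = [
    Report(1, "Firefox", "UI", "alice", 1),
    Report(2, "Firefox", "UI", "bob", 2),
    Report(3, "Firefox", "Networking", "carol", 3),
    Report(4, "Firefox", "UI", "dave", 4),
]
current = Report(5, "Firefox", "UI", "?", 5)
print(activity_sequence(history, current, max_len=2))  # ['bob', 'dave']
```

Only reports created before the current one are used, so the sequence never leaks information from the future into training or evaluation.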
S2: Pre-train the CSDBT model on the training set, as follows:
S21: Input: encode the text information and the developer information with one-hot encoding;
S22: Feature extraction: extract high-level features of the text information with a bidirectional recurrent neural network and high-level features of the activity information with a unidirectional recurrent neural network, then fuse the two kinds of features by element-wise multiplication;
S23: Convert the fused high-level features into developer probabilities;
S24: On the training set, optimize the neural network model with the objective of minimizing cross-entropy until convergence. The parameters of the network are then kept fixed; when a new bug report from the validation set is input, the network returns a list of developers ranked by probability.
S3: Solve for the optimal misclassification cost matrix with an adaptive differential evolution algorithm, using the validation set data to obtain the optimal value of the matrix. Because each individual in the population represents a feasible solution for the cost matrix, all feasible solutions must be evaluated so that the best one, i.e., the individual with the highest fitness, can be selected.
Cost-sensitive learning assigns different weights to samples of different classes. Its core is the misclassification cost matrix, which is made up of the misclassification costs between the different classes. Assuming there are M classes, the misclassification cost matrix of the data set is the M × M matrix
$$C = \begin{bmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,M} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ C_{M,1} & C_{M,2} & \cdots & C_{M,M} \end{bmatrix},$$
where $C_{i,j}$ denotes the cost incurred when a sample that actually belongs to class i is misclassified into class j.
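To make the definition concrete, the helper below totals the misclassification cost of a set of predictions under a given matrix C, indexed from 0 as in NumPy. The 3-class matrices are made up for illustration: under the uniform matrix every error costs 1, while raising `C[1, 2]` makes the same predictions considerably more expensive.

```python
import numpy as np


def total_cost(C, y_true, y_pred):
    """Sum of C[i, j] over all samples whose true class i was predicted as class j."""
    return float(C[np.asarray(y_true), np.asarray(y_pred)].sum())


M = 3
uniform = np.ones((M, M)) - np.eye(M)      # every error costs 1, correct predictions cost 0
skewed = uniform.copy()
skewed[1, 2] = 5.0                         # true class 1 predicted as class 2 is 5x worse

y_true, y_pred = [0, 1, 2, 2], [0, 2, 2, 1]
print(total_cost(uniform, y_true, y_pred))  # 2.0
print(total_cost(skewed, y_true, y_pred))   # 6.0
```

Both prediction sets contain exactly two errors, yet their costs differ once the matrix is no longer uniform; this is the distinction cost-sensitive learning exploits.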
Traditional training algorithms usually assume that all misclassifications have the same cost, i.e., $C_{i,j}$ takes the same value for every pair $i \neq j$.
In practical applications, however, different kinds of errors often carry different misclassification costs. In medical diagnosis, for example, the cost of misdiagnosing a patient as healthy is clearly different from the cost of misdiagnosing a healthy person as a patient. Cost-sensitive learning takes these differences into account and assigns different costs to different types of error, so that classification minimizes both the number of high-cost errors and the total misclassification cost. The present invention combines cost-sensitive learning with a neural network. In a traditional neural network model, the posterior probability that a sample x belongs to class i after classification can be expressed as
$$P(y = i \mid x) = \mathrm{softmax}_i(\mathbf{z}) = \frac{e^{z_i}}{\sum_{k=1}^{M} e^{z_k}},$$
where $\mathbf{z} = (z_1, \dots, z_M)$ is the output vector of the network's final layer for x.
A traditional network then uses the posterior probabilities obtained in this way to recommend a list of candidate developers.
Here, however, an additional step introduces a cost-sensitive mechanism: based on Bayesian risk theory, the posterior probabilities are penalized using the misclassification cost matrix.
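The source does not spell out the penalty formula, so the sketch below is an assumption based on the standard Bayes conditional risk: the expected cost of assigning a report to developer j is $R(j \mid x) = \sum_{i} P(y = i \mid x)\, C_{i,j}$, and candidates are ranked from lowest to highest risk.

```python
import numpy as np


def conditional_risk(P, C):
    """R[n, j] = sum_i P[n, i] * C[i, j]: expected cost of assigning sample n to developer j."""
    return np.asarray(P) @ np.asarray(C)


def rank_by_risk(P, C):
    """Candidate developers per sample, ordered from lowest to highest expected cost."""
    return np.argsort(conditional_risk(P, C), axis=1, kind="stable")


P = np.array([[0.6, 0.4]])               # posterior over 2 developers for one report
C = np.array([[0.0, 1.0],
              [3.0, 0.0]])               # sending developer 1's bugs to developer 0 costs 3
print(conditional_risk(P, C))            # expected costs: 1.2 for developer 0, 0.6 for developer 1
print(rank_by_risk(P, C))                # [[1 0]]
```

Plain probability ranking would recommend developer 0 first, but the cost matrix moves developer 1 ahead. With a uniform matrix (zero diagonal, ones elsewhere) the risk is $1 - P(y = j \mid x)$, so the ranking reduces to ordinary probability ranking, which matches the equal-cost assumption of traditional training.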
The list of candidate developers is then recommended using the new probabilities obtained after this penalty by the misclassification cost matrix. Because prior knowledge is lacking, however, the specific values of the misclassification cost matrix for a given data set cannot be known in advance, so a differential evolution algorithm is used to search for an optimal misclassification cost matrix. The misclassification cost matrix represented by a feasible solution is combined with the pre-trained neural network in the manner described above, and the model's accuracy on the validation set is taken as the fitness value of that feasible solution. This is how a single feasible solution is evaluated: if the population contains 100 individuals, the process must be executed independently 100 times to obtain the fitness value of every individual.
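The per-individual evaluation can be sketched as follows, assuming the penalty is the Bayes conditional risk $\sum_i P(y = i \mid x)\, C_{i,j}$ and that fitness is top-1 validation accuracy; the source confirms neither exact form. Because the network parameters stay fixed, the validation posteriors `P_val` can be computed once and reused for every individual, so each of the independent evaluations only repeats the cheap re-ranking step.

```python
import numpy as np


def fitness(cost_vec, P_val, y_val):
    """Validation accuracy of one individual (a flattened M x M cost matrix)."""
    M = P_val.shape[1]
    C = np.asarray(cost_vec).reshape(M, M)
    pred = np.argmin(P_val @ C, axis=1)      # assumed penalty: lowest Bayes risk wins
    return float(np.mean(pred == y_val))


def evaluate_population(pop, P_val, y_val):
    """S33: evaluate every individual independently (100 individuals -> 100 evaluations)."""
    return np.array([fitness(ind, P_val, y_val) for ind in pop])
```

For instance, with `P_val = [[0.6, 0.4], [0.6, 0.4], [0.9, 0.1]]` and true labels `[1, 1, 0]`, the uniform matrix scores 1/3 while the matrix `[[0, 1], [3, 0]]` scores 1.0, so the evolutionary search would prefer the latter.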
S4: Combine the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally input the test set to test the new CSDBT model.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the present invention, according to the technical solution and inventive concept of the present invention, shall be covered by the protection scope of the present invention.