CN109615242A - A Recurrent Neural Network-Based and Cost-Sensitive Software Bug Assignment Method

Info

Publication number: CN109615242A
Application number: CN201811528909.9A
Authority: CN
Inventors: 陈荣; 王林辉; 王芝; 张德成; 李辉; 郭世凯
Original assignee: Dalian Maritime University
Current assignee: Dalian Maritime University
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2019-04-12

Abstract

The invention discloses a software bug assignment method based on a recurrent neural network and cost-sensitive learning, comprising the following steps: S1: preprocessing the raw data set collected from a historical bug report repository, where the preprocessing includes screening the bug reports, extracting the text information of the screened bug reports, and extracting developer activity information; S2: pre-training the CSDBT model on the training set; S3: solving for the optimal misclassification cost matrix using an adaptive differential evolution algorithm, where the optimization is run on the validation set data to obtain the optimal value of the misclassification cost matrix; S4: combining the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally inputting the test set to test the new CSDBT model.

Description

A software bug assignment method based on a recurrent neural network and cost-sensitive learning

Technical field

The present invention relates to the field of software testing technology, and in particular to a software bug assignment method based on a recurrent neural network and cost-sensitive learning.

Background art

Software bugs are problems that arise in a program while it is running. Their presence can cause the software to fail, so repairing bugs promptly is the basis for guaranteeing software quality. A bug repository is an efficient tool for managing bug reports: the reports submitted by users are stored in the repository, and administrators read them in order to assign suitable developers to fix them. However, as software projects grow, both the number of bugs and the number of developers taking part in bug fixing increase greatly. Manual bug assignment is time-consuming, costly and inefficient, and a more efficient way of completing the assignment work is needed. Automatic bug assignment techniques emerged in response: researchers use machine learning to convert the bug assignment problem into a text classification problem and have proposed numerous automatic techniques. Many current techniques, however, only consider dispatching a bug report to the most suitable developer and ignore the developer's workload. This easily leaves one developer with a large backlog of reports awaiting repair, which slows the repair progress of the whole project.

Summary of the invention

In view of the problems of the prior art, the invention discloses a software bug assignment method based on a recurrent neural network and cost-sensitive learning, which specifically includes the following steps:

S1: preprocessing the raw data set collected from a historical bug report repository, where the preprocessing includes screening the bug reports, extracting the text information of the screened bug reports, and extracting developer activity information;

S2: pre-training the CSDBT model on the training set;

S3: solving for the optimal misclassification cost matrix using an adaptive differential evolution algorithm: the optimization is run on the validation set data to obtain the optimal value of the misclassification cost matrix;

S4: combining the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally inputting the test set to test the new CSDBT model.

Further, the raw data set is preprocessed in the following way:

S11: screening the bug reports, retaining those whose status is "fixed" and deleting invalid bug reports and those that were fixed inefficiently;

S12: extracting text information from the screened bug reports; the selected text is tokenized, stop words are removed, and words that occur too frequently or too rarely are deleted;

S13: extracting developer activity information by collecting the activity sequence of the developers whose product and module information is the same as that of the current bug report;

S14: dividing the preprocessed data set into a training set, a validation set and a test set.
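Steps S11 to S14 can be sketched roughly as follows. This is only an illustration under stated assumptions: the report fields (`status`, `text`, `developer`, `product`, `module`), the stop-word list, the frequency thresholds and the random 80/10/10 split are all hypothetical, since the text does not specify them.

```python
from collections import Counter
import random

# Hypothetical stop-word list; the patent does not name one.
STOP_WORDS = {"the", "a", "is", "in", "to", "and", "of", "when"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def preprocess(reports, min_count=2, max_ratio=0.9):
    """S11-S12: keep fixed reports, tokenize, drop rare and overly common words."""
    fixed = [r for r in reports if r["status"] == "FIXED"]
    docs = [tokenize(r["text"]) for r in fixed]
    # Document frequency: in how many reports does each word occur?
    doc_freq = Counter(word for doc in docs for word in set(doc))
    max_count = max_ratio * len(docs)  # assumed cut-off for "too frequent"
    vocab = {w for w, c in doc_freq.items() if min_count <= c <= max_count}
    return [
        {"tokens": [w for w in doc if w in vocab], "developer": r["developer"]}
        for doc, r in zip(docs, fixed)
    ]

def activity_sequence(report, history):
    """S13: developers of the reports in `history` that share the product and module."""
    return [h["developer"] for h in history
            if h["product"] == report["product"] and h["module"] == report["module"]]

def split(samples, train=0.8, valid=0.1, seed=0):
    """S14: shuffle, then split into training, validation and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(train * len(shuffled))
    n_valid = int(valid * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])
```

A chronological split may be more appropriate than a random one for bug repositories; the shuffle here is simply the shortest choice for a sketch.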

Further, S2 is specifically carried out in the following way:

S21: encoding the text information and the developer information using One-Hot encoding;

S22: extracting the high-level features of the text information with a bidirectional recurrent neural network, extracting the high-level features of the activity information with a unidirectional recurrent neural network, and fusing the two kinds of high-level features by element-wise multiplication;

S23: converting the high-level features into developer probabilities;

S24: on the training set data, optimizing the neural network model until convergence, with minimizing the cross entropy as the objective.
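The forward pass described in S21 to S23, together with the cross-entropy objective of S24, might look like the following minimal sketch. Several things here are assumptions rather than the patented architecture: a plain tanh recurrent cell stands in for whatever cell the model actually uses, the forward and backward text states are concatenated, the activity network is sized to match that concatenation so the element-wise product is defined, and the weights are random rather than trained. The gradient-based optimization of S24 is not shown.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_params(vocab_size, n_devs, hidden, seed=0):
    """Hypothetical parameter layout for the sketch."""
    rng = random.Random(seed)
    return {
        # forward and backward text RNNs, `hidden` units each
        "fwd_in": rand_matrix(hidden, vocab_size, rng),
        "fwd_rec": rand_matrix(hidden, hidden, rng),
        "bwd_in": rand_matrix(hidden, vocab_size, rng),
        "bwd_rec": rand_matrix(hidden, hidden, rng),
        # activity RNN sized 2 * hidden to match the concatenated text feature
        "act_in": rand_matrix(2 * hidden, n_devs, rng),
        "act_rec": rand_matrix(2 * hidden, 2 * hidden, rng),
        "out": rand_matrix(n_devs, 2 * hidden, rng),
    }

def rnn(indices, w_in, w_rec, reverse=False):
    """Plain tanh RNN that returns its last hidden state.

    Each input is the index of the 1 in a one-hot vector (S21), so multiplying
    w_in by that vector simply selects column `idx`.
    """
    size = len(w_rec)
    hidden = [0.0] * size
    for idx in (reversed(indices) if reverse else indices):
        hidden = [math.tanh(w_in[k][idx]
                            + sum(w_rec[k][j] * hidden[j] for j in range(size)))
                  for k in range(size)]
    return hidden

def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(word_ids, dev_ids, p):
    # S22: bidirectional RNN over the text, unidirectional RNN over the activity
    text_feat = (rnn(word_ids, p["fwd_in"], p["fwd_rec"])
                 + rnn(word_ids, p["bwd_in"], p["bwd_rec"], reverse=True))
    act_feat = rnn(dev_ids, p["act_in"], p["act_rec"])
    fused = [t * a for t, a in zip(text_feat, act_feat)]  # element-wise product
    # S23: linear layer followed by softmax gives one probability per developer
    scores = [sum(w * f for w, f in zip(row, fused)) for row in p["out"]]
    return softmax(scores)

def cross_entropy(probs, target):
    """S24 objective for one sample whose true developer index is `target`."""
    return -math.log(probs[target])
```

For example, `forward([0, 2, 4], [1, 1, 2], init_params(5, 3, 4))` returns three probabilities that sum to 1, one per developer.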

Further, S3 is specifically carried out in the following way:

S31: initializing the misclassification cost population; the population contains a number of individuals, each of which represents a feasible solution for the misclassification cost matrix;

S32: generating a new misclassification cost population using the adaptively generated mutation operator and crossover operator;

S33: calculating the fitness values of all individuals in the population: the misclassification cost matrix represented by each individual is plugged into the pre-trained CSDBT model, the resulting new CSDBT model is run on the validation set to obtain an accuracy value, and this accuracy value is the fitness value of that feasible solution;

S34: comparing the fitness value of each current individual with that of the corresponding individual of the previous generation, selecting the individual with the higher fitness and discarding the one that performs worse;

S35: adaptively adjusting the mutation operator and crossover operator of the population;

S36: returning to step S32 and continuing to iterate until the set maximum number of iterations is reached, then selecting the individual with the highest fitness in the last generation as the optimal solution.
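The loop S31 to S36 can be sketched as below. The text does not state how the mutation and crossover operators are adapted, so this sketch swaps in a jDE-style rule (each individual carries its own scale factor F and crossover rate CR, which are occasionally resampled and kept only if the trial survives). The flat vector encoding, the bounds, and every default value are likewise assumptions. In the method itself, `fitness` would be the validation accuracy of the model combined with the cost matrix encoded by the individual.

```python
import random

def adaptive_de(fitness, dim, pop_size=20, generations=50,
                low=0.0, high=1.0, tau=0.1, seed=0):
    """Maximise `fitness` over vectors in [low, high]^dim (needs pop_size >= 4)."""
    rng = random.Random(seed)
    # S31: random initial population; each individual has its own F and CR
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    scale = [0.5] * pop_size
    cross = [0.9] * pop_size
    fit = [fitness(ind) for ind in pop]
    for _ in range(generations):  # S36: iterate up to the maximum count
        new_pop, new_fit, new_scale, new_cross = [], [], [], []
        for i in range(pop_size):
            # S35: occasionally resample this individual's control parameters
            f_i = rng.uniform(0.1, 1.0) if rng.random() < tau else scale[i]
            cr_i = rng.random() if rng.random() < tau else cross[i]
            # S32: DE/rand/1 mutation followed by binomial crossover
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr_i:
                    value = pop[a][j] + f_i * (pop[b][j] - pop[c][j])
                    trial.append(min(high, max(low, value)))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            # S33-S34: evaluate the trial, keep the fitter of parent and trial
            trial_fit = fitness(trial)
            if trial_fit >= fit[i]:
                new_pop.append(trial); new_fit.append(trial_fit)
                new_scale.append(f_i); new_cross.append(cr_i)
            else:
                new_pop.append(pop[i]); new_fit.append(fit[i])
                new_scale.append(scale[i]); new_cross.append(cross[i])
        pop, fit, scale, cross = new_pop, new_fit, new_scale, new_cross
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]
```

With a stand-in objective such as `lambda v: -sum((x - 0.3) ** 2 for x in v)`, the call `adaptive_de(objective, dim=4)` returns a vector close to 0.3 in every coordinate. For an M-class problem, `dim` could be set to M × M and the vector reshaped into the cost matrix before evaluation.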

Further, S4 is specifically carried out in the following way:

S41: on the pre-trained CSDBT model, combined with the misclassification cost matrix obtained by optimization, inputting the preprocessed test set;

S42: for each sample in the test set, the CSDBT model returns a recommendation list of length K.
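Producing the length-K list in S42 amounts to ranking developers by their score and keeping the first K. A minimal sketch, with hypothetical function names and developer names:

```python
def recommend_top_k(probabilities, developers, k):
    """Return the k developers with the highest probability, best first."""
    ranked = sorted(zip(probabilities, developers),
                    key=lambda pair: pair[0], reverse=True)
    return [dev for _, dev in ranked[:k]]

def hit_at_k(recommended, actual_fixer):
    """True if the developer who actually fixed the bug is in the list."""
    return actual_fixer in recommended
```

For instance, `recommend_top_k([0.1, 0.6, 0.3], ["alice", "bob", "carol"], 2)` yields `["bob", "carol"]`.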

By adopting the above technical solution, the software bug assignment method based on a recurrent neural network and cost-sensitive learning provided by the invention uses a differential evolution algorithm to solve for the most suitable misclassification cost matrix for each data set without any prior knowledge. Building on the original DeepTriage model, the method takes into account both the word order of the text and the activity information of developers, and uses the misclassification cost matrix to address the problem of imbalanced classes. For these reasons, the invention effectively prevents a single developer from accumulating a large backlog of bug reports and speeds up the repair progress of the project.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the method of the present invention.

Fig. 2 is a flow chart of the testing part of the method of the present invention.

Detailed description of the embodiments

To make the technical solution and advantages of the present invention clearer, the technical solution in the embodiments of the invention is described clearly and completely below with reference to the drawings in the embodiments:

As shown in Fig. 1 and Fig. 2, a software bug assignment method based on a recurrent neural network and cost-sensitive learning (Cost Sensitive Deep Bug Triage, CSDBT) specifically includes the following steps:

S1: preprocessing the raw data set collected from a historical bug report repository, where the preprocessing includes screening the bug reports, extracting the text information of the screened bug reports, and extracting developer activity information.

Further, the raw data set is preprocessed using the following steps:

S2: pre-training the CSDBT model on the training set, specifically in the following way:

S21: input: the text information and the developer information are encoded using One-Hot encoding;

S22: feature extraction: the high-level features of the text information are extracted with a bidirectional recurrent neural network, the high-level features of the activity information are extracted with a unidirectional recurrent neural network, and the two kinds of high-level features are fused by element-wise multiplication;

S24: on the training set data, optimizing the neural network model until convergence, with minimizing the cross entropy as the objective. With the parameters of the neural network then held fixed, inputting a new bug report from the validation set returns a list of developers ranked by probability.

S3: solving for the optimal misclassification cost matrix using an adaptive differential evolution algorithm: the optimization is run on the validation set data to obtain the optimal value of the misclassification cost matrix. The purpose of this step is to obtain an optimal misclassification cost matrix. Because each individual in the population represents a feasible solution for the cost matrix, all feasible solutions must be evaluated in order to select the best one, that is, the individual with the highest fitness.

Cost-sensitive learning assigns different weights to samples of different classes; its core is the misclassification cost matrix, which is made up of the misclassification costs of the different classes of samples. Assuming the number of classes is M, the misclassification cost matrix corresponding to the data set is an M × M matrix:

\[
C = \begin{bmatrix}
C_{1,1} & C_{1,2} & \cdots & C_{1,M} \\
C_{2,1} & C_{2,2} & \cdots & C_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
C_{M,1} & C_{M,2} & \cdots & C_{M,M}
\end{bmatrix}
\]

where C_{i,j} denotes the cost incurred when a sample that actually belongs to class i is misclassified as class j.
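To make the indexing convention concrete, the following small sketch sums the cost of a batch of predictions, reading the row as the true class and the column as the predicted class. The matrix values and the function name are made up for illustration.

```python
def total_misclassification_cost(cost, y_true, y_pred):
    """Sum cost[i][j] over samples with true class i and predicted class j."""
    return sum(cost[i][j] for i, j in zip(y_true, y_pred))

# Hypothetical 3-class matrix: mistaking class 0 for class 2 is the costliest error.
example_cost = [
    [0, 1, 5],
    [1, 0, 1],
    [2, 1, 0],
]
```

With true classes `[0, 0, 2]` and predictions `[0, 2, 0]`, the total is 0 + 5 + 2 = 7: the first sample is correct, the second pays C_{0,2} and the third pays C_{2,0} (zero-based indices in the code).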

Traditional training algorithms usually assume that all misclassifications have the same cost, that is, C_{i,j} takes the same value for every pair of classes with i ≠ j.

In practical applications, however, different kinds of errors often correspond to different misclassification costs. In medical diagnosis, for example, the cost of misdiagnosing a patient as healthy clearly differs from the cost of misdiagnosing a healthy person as a patient. Cost-sensitive learning takes these different misclassification costs into account and assigns different costs to different types of errors, so that classification minimizes both the number of high-cost errors and the total misclassification cost. The present invention combines cost-sensitive learning with a neural network. In a conventional neural network model, after classification, the posterior probability that a sample x belongs to class i can be expressed as:

\[
P(y = i \mid x) = \mathrm{softmax}_i
\]

The neural network would then use the posterior probabilities obtained in this way to recommend a list of suitable developers.

Here, however, an additional step is added: a cost-sensitive mechanism is introduced and, based on Bayes risk theory, the posterior probabilities are penalized:

The new probabilities, penalized by the misclassification cost matrix above, are then used to recommend the list of suitable developers. Because prior knowledge is lacking, the specific values of the misclassification cost matrix for a given data set cannot be known in advance, so a differential evolution algorithm is used to search for an optimal misclassification cost matrix. To evaluate a feasible solution, the misclassification cost matrix it represents is combined with the pre-trained neural network in the manner described above, and the accuracy of the resulting model on the validation set is taken as the fitness value of that feasible solution. If the population contains 100 individuals, this procedure must be executed independently 100 times to obtain the fitness value of each individual.
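The penalty formula itself is not reproduced in this text, so the sketch below uses the textbook Bayes risk, R(i | x) = Σ_j P(j | x) · C_{j,i}, as a stand-in and ranks classes from lowest to highest expected cost. It should be read as one plausible instance of a cost-sensitive penalty, not as the patented formula. All function names are hypothetical, and the fitness is taken to be top-1 accuracy on the validation set.

```python
def bayes_risk(posterior, cost):
    """Expected cost of predicting each class i: sum_j P(j|x) * cost[j][i]."""
    m = len(posterior)
    return [sum(posterior[j] * cost[j][i] for j in range(m)) for i in range(m)]

def rank_by_risk(posterior, cost):
    """Class indices ordered from lowest to highest expected cost."""
    risk = bayes_risk(posterior, cost)
    return sorted(range(len(risk)), key=lambda i: risk[i])

def fitness(cost, validation_posteriors, validation_labels):
    """Top-1 accuracy on the validation set under the given cost matrix."""
    hits = sum(rank_by_risk(p, cost)[0] == y
               for p, y in zip(validation_posteriors, validation_labels))
    return hits / len(validation_labels)
```

With posterior `[0.6, 0.4]` and cost matrix `[[0, 1], [5, 0]]`, the risks are 2.0 and 0.6, so class 1 is ranked first even though class 0 has the higher posterior. Because the network parameters are held fixed during this stage, the validation posteriors could be computed once and reused when evaluating every individual in the population.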

The above is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims

1. A software bug assignment method based on a recurrent neural network and cost-sensitive learning, comprising the following steps:

S1: preprocessing the raw data set collected from a historical bug report repository, where the preprocessing includes screening the bug reports, extracting the text information of the screened bug reports, and extracting developer activity information;

S2: pre-training the CSDBT model on the training set;

S3: solving for the optimal misclassification cost matrix using an adaptive differential evolution algorithm: the optimization is run on the validation set data to obtain the optimal value of the misclassification cost matrix;

S4: combining the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally inputting the test set to test the new CSDBT model.

2. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S1 is specifically carried out in the following way:

S12: extracting text information from the screened bug reports; the selected text is tokenized, stop words are removed, and words that occur too frequently or too rarely are deleted;

S13: extracting developer activity information by collecting the activity sequence of the developers whose product and module information is the same as that of the current bug report;

3. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S2 is specifically carried out in the following way:

S22: extracting the high-level features of the text information with a bidirectional recurrent neural network, extracting the high-level features of the activity information with a unidirectional recurrent neural network, and fusing the two kinds of high-level features by element-wise multiplication;

4. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S3 is specifically carried out in the following way:

S31: initializing the misclassification cost population; the population contains a number of individuals, each of which represents a feasible solution for the misclassification cost matrix;

S33: calculating the fitness values of all individuals in the population: the misclassification cost matrix represented by each individual is plugged into the pre-trained CSDBT model, the resulting new CSDBT model is run on the validation set to obtain an accuracy value, and this accuracy value is the fitness value of that feasible solution;

S34: comparing the fitness value of each current individual with that of the corresponding individual of the previous generation, selecting the individual with the higher fitness and discarding the one that performs worse;

S36: returning to step S32 and continuing to iterate until the set maximum number of iterations is reached, then selecting the individual with the highest fitness in the last generation as the optimal solution.

5. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S4 is specifically carried out in the following way:

S41: on the pre-trained CSDBT model, combined with the misclassification cost matrix obtained by optimization, inputting the preprocessed test set;