CN109615241A

CN109615241A - A software bug assignment method based on convolutional and recurrent neural networks

Info

Publication number: CN109615241A
Application number: CN201811528908.4A
Authority: CN
Inventors: 陈荣; 王林辉; 王芝; 李辉; 郭世凯
Original assignee: Dalian Maritime University
Current assignee: Dalian Maritime University
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2019-04-12

Abstract

The invention discloses a software bug assignment method based on convolutional and recurrent neural networks, comprising the following steps. S1: obtain an original bug report data set from selected open source projects and preprocess it into a training set and a test set. S2: feed the samples of the training set into the CLBT model in sequence and train all parameters of the CLBT model until they converge, which completes the training of the model. S3: feed the samples of the test set in sequence into the trained CLBT model; for each sample the model returns a recommendation probability for every developer, and the sample is assigned to the developer with the highest probability. The method first performs quantitative feature extraction on the hierarchical structure of the whole sentence and on the long-range dependencies between words, so that word order information is taken into account. It then further extracts the semantic and contextual features of words for use in bug report assignment, so that the text information is mined and used more fully and effectively.

Description

A software bug assignment method based on convolutional and recurrent neural networks

Technical field

The present invention relates to the technical field of software testing, and in particular to a software bug assignment method based on convolutional and recurrent neural networks.

Background art

Software bugs, i.e. software faults, are an inevitable product of the software development process. Repairing bugs promptly is a prerequisite for guaranteeing software quality and the correctness of the maintained system. To make bugs easier to collect and manage, software developers created bug repositories to store and maintain them; the administrators of a bug repository read each bug report and assign a suitable developer to repair the bug. As software development technology has matured, the number of software bugs has grown greatly, and traditional manual bug assignment, being time-consuming and inefficient, falls far short of current needs. Researchers have therefore proposed automating bug assignment with machine learning, which turns bug assignment into a text classification problem and has become a current research hotspot. However, many studies do not mine the text information adequately and often ignore the word order and contextual features of the text. In addition, the related techniques perform poorly when distinguishing between similar developers.

Summary of the invention

In view of the problems in the prior art, the invention discloses a software bug assignment method based on convolutional and recurrent neural networks, comprising the following steps:

S1: obtain an original bug report data set from selected open source projects and preprocess it into a training set and a test set;

S2: feed the samples of the training set into the CLBT model in sequence and train all parameters of the CLBT model until they converge, which completes the training of the model;

S3: feed the samples of the test set in sequence into the trained CLBT model; for each sample the model returns a recommendation probability for every developer, and the sample is assigned to the developer with the highest probability.

Further, S1 is specifically carried out as follows:

S11: screen the bug reports: retain bug reports that have been confirmed and fixed, and delete developers who have fixed too few bug reports together with the reports they fixed;

S12: extract the text information: tokenize the text of each bug report, stem the words and remove stop words, then delete words whose frequency of occurrence is too high or too low;

S13: extract the developer activity information: look back over a period of time from the timestamp of each bug report, collect the historical bug reports in that period that belong to the same category as the current report, and extract their fixers in order to form the developer repair sequence of the current bug report;

S14: split the preprocessed data set into a training set and a test set.
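As a non-limiting illustration, steps S11 to S13 might be sketched in Python as follows. The field names (`status`, `fixer`, `component`, `created`), the thresholds, the 30-day window, the tiny stop word set, and the crude suffix stripper are all assumptions made for this example; the source does not fix any of them, and "same category" is read here as "same component".

```python
import re
from collections import Counter
from datetime import datetime, timedelta

STOP_WORDS = {"the", "a", "an", "is", "in", "on", "to", "of", "and"}  # illustrative only

def filter_reports(reports, min_fixed=2):
    """S11: keep fixed reports whose fixer repaired at least min_fixed of them."""
    fixed = [r for r in reports if r["status"] == "FIXED"]
    counts = Counter(r["fixer"] for r in fixed)
    return [r for r in fixed if counts[r["fixer"]] >= min_fixed]

def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """S12: lowercase, split into words, drop stop words, stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]

def prune_vocabulary(token_lists, min_count=2, max_ratio=0.9):
    """S12: delete words that occur in too few or too many reports."""
    doc_freq = Counter(w for tokens in token_lists for w in set(tokens))
    total = len(token_lists)
    keep = {w for w, c in doc_freq.items()
            if c >= min_count and c / total <= max_ratio}
    return [[w for w in tokens if w in keep] for tokens in token_lists]

def repair_sequence(report, history, window_days=30):
    """S13: fixers of same-category reports in the window before this report,
    ordered by time."""
    start = report["created"] - timedelta(days=window_days)
    past = [h for h in history
            if h["component"] == report["component"]
            and start <= h["created"] < report["created"]]
    return [h["fixer"] for h in sorted(past, key=lambda h: h["created"])]
```

The repair sequence produced by `repair_sequence` is what S24 later encodes and feeds to the unidirectional recurrent network.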

Further, the CLBT model is built and trained in S2 specifically as follows:

S21: encode the text information and the developer activity information: use one-hot encoding to turn every word into a vector of equal length, and turn every developer into an equal-length vector in the same way;

S22: feed the encoded text vectors into a bidirectional recurrent neural network to extract the word order features between words;

S23: extract the semantic and contextual features of words: feed the encoded text vectors into a convolutional neural network, slide several convolution kernels of different sizes over the word sequence to obtain a feature map for each kernel, apply max pooling to each feature map to reduce the output dimension while retaining the salient features, and take the retained salient features as the extracted high-level features;

S24: feed the encoded developer activity information into a unidirectional recurrent neural network to extract the high-level features of developer activity;

S25: fuse the word order features, the semantic and contextual features, and the high-level developer activity features produced in S22 to S24 by element-wise multiplication, and feed the fused features to the output layer;

S26: the output layer applies the softmax function to obtain the recommendation probability for each developer.
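The data flow of S21 to S26 may be easier to follow as a small forward pass. The sketch below is an untrained toy, not the claimed model: it uses a plain tanh recurrent cell in place of an LSTM, random weights, toy sizes, one filter per kernel width, and a hypothetical `merge` projection so that the three feature vectors have equal length before element-wise multiplication. None of these choices are specified in the source.

```python
import math
import random

random.seed(0)
VOCAB, DEVS, H = 6, 3, 3   # toy vocabulary size, developer count, feature size

def one_hot(index, size):
    """S21: equal-length one-hot vector."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def rnn_last_state(seq, w_in, w_rec):
    """Plain tanh recurrent cell (stand-in for LSTM); returns the final state."""
    h = [0.0] * len(w_rec)
    for x in seq:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
    return h

def conv_max_pool(seq, kernels):
    """S23: slide each kernel of width k over the sequence, then max-pool.
    The sequence must be at least as long as the widest kernel."""
    features = []
    for k, weights in kernels:
        windows = [sum(seq[i:i + k], []) for i in range(len(seq) - k + 1)]
        features.append(max(sum(a * b for a, b in zip(weights, win))
                            for win in windows))
    return features

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

params = {
    "fwd": (rand_matrix(H, VOCAB), rand_matrix(H, H)),
    "bwd": (rand_matrix(H, VOCAB), rand_matrix(H, H)),
    "merge": rand_matrix(H, 2 * H),          # assumption: project 2H -> H
    "kernels": [(k, [random.uniform(-0.5, 0.5) for _ in range(k * VOCAB)])
                for k in (1, 2, 3)],         # H kernels of unequal width
    "act": (rand_matrix(H, DEVS), rand_matrix(H, H)),
    "out": rand_matrix(DEVS, H),
}

def clbt_forward(word_ids, fixer_ids, p):
    words = [one_hot(i, VOCAB) for i in word_ids]            # S21
    fixers = [one_hot(i, DEVS) for i in fixer_ids]           # S21
    forward = rnn_last_state(words, *p["fwd"])               # S22, left to right
    backward = rnn_last_state(words[::-1], *p["bwd"])        # S22, right to left
    order = matvec(p["merge"], forward + backward)
    context = conv_max_pool(words, p["kernels"])             # S23
    activity = rnn_last_state(fixers, *p["act"])             # S24
    fused = [a * b * c for a, b, c in zip(order, context, activity)]  # S25
    return softmax(matvec(p["out"], fused))                  # S26

probabilities = clbt_forward([0, 3, 2, 5, 1], [2, 0, 0], params)
```

`probabilities` holds one value per developer and sums to 1; with random weights the values carry no meaning until the parameters are trained as in S2.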

Further, S3 is specifically carried out as follows:

S31: load the trained neural network model, keep all parameters fixed, and input the preprocessed test set;

S32: for each sample in the test set, return a developer recommendation list.
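One possible way to turn the per-developer probabilities into the recommendation list of S32, and into the single assignment of S3, is shown below. The developer names, the probability values, and the list length `top_k` are hypothetical.

```python
def recommend_developers(probabilities, developers, top_k=3):
    """Return the top_k (developer, probability) pairs, most probable first."""
    ranked = sorted(zip(developers, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

def assign(probabilities, developers):
    """Assign the report to the developer with the highest probability (S3)."""
    return recommend_developers(probabilities, developers, top_k=1)[0][0]

developers = ["alice", "bob", "carol", "dave"]   # hypothetical names
probabilities = [0.10, 0.55, 0.30, 0.05]         # e.g. softmax output for one report

recommendation = recommend_developers(probabilities, developers)
assignee = assign(probabilities, developers)
```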

With the above technical solution, the software bug assignment method based on convolutional and recurrent neural networks provided by the invention first performs quantitative feature extraction on the hierarchical structure of the whole sentence and on the long-range dependencies between words, so that word order information is taken into account. At the same time, it uses convolutional neural networks (CNN) to further extract the semantic and contextual features of words for use in bug report assignment, so that the text information is mined and used more fully and effectively.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the method of the present invention.

Detailed description of the embodiments

To make the technical solution and advantages of the present invention clearer, the technical solution in the embodiments of the present invention is described clearly and completely below with reference to the drawings in the embodiments:

As shown in Fig. 1, a software bug assignment method based on convolutional and recurrent neural networks (Convolution LSTM Bug Triage, CLBT) comprises the following steps:

S2: feed the samples of the training set into the CLBT model in sequence and train all parameters of the model until they converge, which completes the training of the model;

Further, S1 is specifically carried out as follows:

Further, S3 is specifically carried out as follows:

S32: for each sample in the test set, return a developer recommendation list.
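The phrase "train all parameters until they converge" in S2 can be illustrated with a minimal training loop. The source does not name a loss function, an optimizer, or a convergence criterion, so this sketch assumes cross-entropy loss, per-sample gradient descent, and a stop condition on the change in mean loss between epochs. For brevity it trains only a softmax output layer on fixed feature vectors rather than the full CLBT model.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_until_converged(samples, n_features, n_classes,
                          lr=0.5, tol=1e-6, max_epochs=5000):
    """Train a softmax layer; stop when the mean loss changes by less than tol."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    previous = float("inf")
    loss = previous
    for _ in range(max_epochs):
        loss = 0.0
        for features, label in samples:          # samples are fed in sequence
            probs = softmax([sum(w * x for w, x in zip(row, features))
                             for row in weights])
            loss -= math.log(probs[label])        # cross-entropy (assumption)
            for c in range(n_classes):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * grad * features[j]
        loss /= len(samples)
        if abs(previous - loss) < tol:            # convergence test (assumption)
            break
        previous = loss
    return weights, loss

def predict(weights, features):
    """Index of the class (developer) with the highest score."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

# Two toy fused-feature vectors, each labelled with a developer index.
samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
weights, final_loss = train_until_converged(samples, n_features=2, n_classes=2)
```

After training, the weights are kept fixed and reused for prediction, which matches the instruction in S31 to keep all parameters unchanged at test time.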

The above is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims

1. A software bug assignment method based on convolutional and recurrent neural networks, characterized by comprising the following steps:

S1: obtain an original bug report data set from selected open source projects and preprocess it into a training set and a test set;

S2: feed the samples of the training set into the CLBT model in sequence and train all parameters of the CLBT model until they converge, which completes the training of the model;

S3: feed the samples of the test set in sequence into the trained CLBT model; for each sample the model returns a recommendation probability for every developer, and the sample is assigned to the developer with the highest probability.

2. The software bug assignment method based on convolutional and recurrent neural networks according to claim 1, further characterized in that S1 is specifically carried out as follows:

S11: screen the bug reports: retain bug reports that have been confirmed and fixed, and delete developers who have fixed too few bug reports together with the reports they fixed;

S12: extract the text information: tokenize the text of each bug report, stem the words and remove stop words, then delete words whose frequency of occurrence is too high or too low;

S13: extract the developer activity information: look back over a period of time from the timestamp of each bug report, collect the historical bug reports in that period that belong to the same category as the current report, and extract their fixers in order to form the developer repair sequence of the current bug report;

S14: split the preprocessed data set into a training set and a test set.

3. The software bug assignment method based on convolutional and recurrent neural networks according to claim 1, further characterized in that the CLBT model is built and trained in S2 specifically as follows:

S21: encode the text information and the developer activity information: use one-hot encoding to turn every word into a vector of equal length, and turn every developer into an equal-length vector in the same way;

S23: extract the semantic and contextual features of words: feed the encoded text vectors into a convolutional neural network, slide several convolution kernels of different sizes over the word sequence to obtain a feature map for each kernel, apply max pooling to each feature map to reduce the output dimension while retaining the salient features, and take the retained salient features as the extracted high-level features;

S24: feed the encoded developer activity information into a unidirectional recurrent neural network to extract the high-level features of developer activity;

S25: fuse the word order features, the semantic and contextual features, and the high-level developer activity features produced in S22 to S24 by element-wise multiplication, and feed the fused features to the output layer;

S26: the output layer applies the softmax function to obtain the recommendation probability for each developer.

4. The software bug assignment method based on convolutional and recurrent neural networks according to claim 1, further characterized in that S3 is specifically carried out as follows:

S31: load the trained neural network model, keep all parameters fixed, and input the preprocessed test set;