CN109615242A - Software bug assignment method based on a recurrent neural network and cost-sensitive learning - Google Patents


Info

Publication number
CN109615242A
Authority
CN
China
Prior art keywords
bug
model
csdbt
neural network
cost
Prior art date
Legal status
Pending
Application number
CN201811528909.9A
Other languages
Chinese (zh)
Inventor
Chen Rong (陈荣)
Wang Linhui (王林辉)
Wang Zhi (王芝)
Zhang Decheng (张德成)
Li Hui (李辉)
Guo Shikai (郭世凯)
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date
2018-12-13
Filing date
2018-12-13
Publication date
2019-04-12
Application filed by Dalian Maritime University
Priority to CN201811528909.9A
Publication of CN109615242A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a software bug assignment method based on a recurrent neural network and cost-sensitive learning, comprising the following steps: S1: preprocess the raw data set collected from a historical bug report repository, where preprocessing includes screening bug reports, extracting the text information of the retained bug reports, and extracting developer activity information; S2: pre-train the CSDBT model on the training set; S3: solve for the optimal misclassification cost matrix with an adaptive differential evolution algorithm, using the validation set data to obtain the optimal values of the misclassification cost matrix; S4: combine the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally input the test set to test the new CSDBT model.

Description

Software bug assignment method based on a recurrent neural network and cost-sensitive learning
Technical field
The present invention relates to the field of software testing technology, and in particular to a software bug assignment method based on a recurrent neural network and cost-sensitive learning.
Background art
Software bugs are problems that arise while a software program is running. Their presence can cause the software to fail, so fixing bugs promptly is fundamental to ensuring software quality. A software bug repository is an efficient tool for managing bug reports: reports submitted by users are stored in the repository, and administrators read them in order to assign suitable developers to fix them. However, as software projects grow in scale, both the number of bugs and the number of developers involved in fixing them increase greatly, and manual bug assignment becomes time-consuming, costly and inefficient, so a more efficient way of assigning bugs is needed. Automated bug assignment (triage) techniques emerged in response: researchers have used machine learning to recast bug assignment as a text classification problem and have proposed numerous automatic bug assignment techniques. However, many current automatic techniques only consider assigning a bug report to the most suitable developer, without considering developer workload. This easily leads to a large backlog of bug reports awaiting repair on one particular developer, which slows down the overall repair process of the project.
Summary of the invention
To address the problems of the prior art, the invention discloses a software bug assignment method based on a recurrent neural network and cost-sensitive learning, which specifically comprises the following steps:
S1: Preprocess the raw data set collected from a historical bug report repository, where preprocessing includes screening bug reports, extracting the text information of the retained bug reports, and extracting developer activity information;
S2: Pre-train the CSDBT model on the training set;
S3: Solve for the optimal misclassification cost matrix with an adaptive differential evolution algorithm, using the validation set data to obtain the optimal values of the misclassification cost matrix;
S4: Combine the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally input the test set to test the new CSDBT model.
Further, the raw data set is preprocessed as follows:
S11: Screen the bug reports, retaining bugs whose status is fixed and deleting invalid and unfixed bugs;
S12: Extract text information from the retained bug reports; tokenize the selected text, remove stop words, and delete words that occur too frequently or too rarely;
S13: Extract developer activity information by collecting the activity sequence of developers associated with bug reports that have the same product and module information as the current bug report;
S14: Split the preprocessed data set into a training set, a validation set and a test set.
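The preprocessing steps S11 to S14 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the report field names (`status`, `text`, `assignee`, `product`, `component`, `time`), the `"FIXED"` status value, the small stop-word list, the frequency thresholds, the activity-sequence length and the 80/10/10 chronological split are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical report fields for this sketch: "status", "text", "assignee",
# "product", "component" (the "module"), "time" (a sortable timestamp).
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "when", "it"}


def screen_reports(reports):
    """S11: keep fixed bugs; drop invalid or unfixed ones (assumed "FIXED" status)."""
    return [r for r in reports if r["status"] == "FIXED"]


def tokenize(text):
    """S12: lowercase, split into alphanumeric tokens, remove stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def filter_vocabulary(token_lists, min_count=2, max_count=10_000):
    """S12: delete words that occur too rarely or too often across the corpus."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    keep = {t for t, c in counts.items() if min_count <= c <= max_count}
    return [[t for t in tokens if t in keep] for tokens in token_lists]


def activity_sequence(report, history, length=5):
    """S13: most recent assignees of earlier bugs in the same product and component."""
    same = [h for h in history
            if h["product"] == report["product"]
            and h["component"] == report["component"]
            and h["time"] < report["time"]]
    same.sort(key=lambda h: h["time"])
    return [h["assignee"] for h in same[-length:]]


def split(reports, train=0.8, valid=0.1):
    """S14: chronological split into training, validation and test sets."""
    ordered = sorted(reports, key=lambda r: r["time"])
    n_train = int(len(ordered) * train)
    n_valid = int(len(ordered) * valid)
    return (ordered[:n_train],
            ordered[n_train:n_train + n_valid],
            ordered[n_train + n_valid:])
```

A chronological split is assumed here because it mimics assigning future bugs from past history; a random split would also satisfy S14 as worded.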
Further, step S2 is carried out as follows:
S21: Encode the text information and developer information with one-hot encoding;
S22: Extract high-level features of the text information with a bidirectional recurrent neural network and high-level features of the activity information with a unidirectional recurrent neural network, then fuse the two kinds of high-level features by element-wise multiplication;
S23: Convert the high-level features into developer probabilities;
S24: On the training set, optimize the neural network model with the objective of minimizing cross-entropy until it converges.
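A minimal forward-pass sketch of S21 to S24 in plain Python. Assumptions, all hypothetical: plain tanh (Elman) recurrent cells stand in for whatever recurrent unit the model actually uses, the final forward and backward text states are concatenated, the activity RNN is sized to match so that the element-wise product is defined, and a single linear layer plus softmax performs S23. Only the forward pass and the cross-entropy loss are shown; the optimization loop of S24 (for example gradient descent in a deep-learning framework) is omitted.

```python
import math
import random


def one_hot(index, size):
    """S21: one-hot vector for a word id or developer id."""
    v = [0.0] * size
    v[index] = 1.0
    return v


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ElmanRNN:
    """h_t = tanh(W x_t + U h_{t-1}); returns the final hidden state."""

    def __init__(self, in_size, hidden, rng):
        self.W = rand_matrix(hidden, in_size, rng)
        self.U = rand_matrix(hidden, hidden, rng)
        self.hidden = hidden

    def run(self, xs):
        h = [0.0] * self.hidden
        for x in xs:
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W, x), matvec(self.U, h))]
        return h


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class CSDBTSketch:
    """Hypothetical forward pass: BiRNN(text) * RNN(activity) -> softmax."""

    def __init__(self, vocab_size, n_devs, hidden=8, seed=0):
        rng = random.Random(seed)
        self.vocab_size, self.n_devs = vocab_size, n_devs
        self.fwd = ElmanRNN(vocab_size, hidden, rng)     # text, left to right
        self.bwd = ElmanRNN(vocab_size, hidden, rng)     # text, right to left
        self.act = ElmanRNN(n_devs, 2 * hidden, rng)     # activity, one direction
        self.out = rand_matrix(n_devs, 2 * hidden, rng)  # S23 projection

    def probabilities(self, word_ids, dev_ids):
        words = [one_hot(i, self.vocab_size) for i in word_ids]
        devs = [one_hot(i, self.n_devs) for i in dev_ids]
        # S22: concatenate final forward and backward states (2 * hidden values).
        text = self.fwd.run(words) + self.bwd.run(words[::-1])
        activity = self.act.run(devs)
        fused = [a * b for a, b in zip(text, activity)]  # element-wise product
        return softmax(matvec(self.out, fused))           # S23: developer probabilities


def cross_entropy(probs, target):
    """S24: per-sample loss to minimize (training loop not shown)."""
    return -math.log(probs[target])
```

Multiplying a one-hot vector by a weight matrix simply selects one column, which is why practical implementations usually replace the explicit one-hot step with an embedding lookup.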
Further, step S3 is carried out as follows:
S31: Initialize a misclassification cost population containing a number of individuals, where each individual in the population represents a feasible solution for the misclassification cost matrix;
S32: Generate a new misclassification cost population using adaptively generated mutation and crossover operators;
S33: Compute the fitness value of every individual in the population: plug the misclassification cost matrix represented by each individual into the pre-trained CSDBT model, and run the resulting new CSDBT model on the validation set to obtain an accuracy score; this accuracy score is the fitness value of that feasible solution;
S34: Compare each current individual with the corresponding individual of the previous generation by fitness value, keep the individual with the higher fitness, and discard the worse-performing one;
S35: Adaptively adjust the mutation and crossover operators of the population;
S36: Return to step S32 and continue iterating until the preset maximum number of iterations is reached, then select the individual with the highest fitness in the last generation as the optimal solution.
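Steps S31 to S36 describe an adaptive differential evolution loop. The text does not specify the adaptation rule, so the sketch below assumes the well-known self-adaptive jDE scheme (each individual carries its own scale factor F and crossover rate CR, resampled with probabilities `tau1` and `tau2`) combined with DE/rand/1/bin mutation and crossover. Population size, generation count, bounds and the initial F = 0.5, CR = 0.9 are illustrative values only. `fitness` is any function to maximize; in this method it would be the validation accuracy of the cost-adjusted model.

```python
import random


def adaptive_de(fitness, dim, pop_size=20, generations=50, low=0.0, high=1.0,
                tau1=0.1, tau2=0.1, seed=0):
    """Maximize `fitness` over [low, high]^dim with jDE-style self-adaptive DE."""
    rng = random.Random(seed)
    # S31: random initial population; each individual carries its own F and CR.
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    F = [0.5] * pop_size
    CR = [0.9] * pop_size
    fit = [fitness(x) for x in pop]

    for _ in range(generations):
        new_pop, new_fit, new_F, new_CR = pop[:], fit[:], F[:], CR[:]
        for i in range(pop_size):
            # S35 (jDE rule): with small probability, resample F and CR for this trial.
            Fi = 0.1 + 0.9 * rng.random() if rng.random() < tau1 else F[i]
            CRi = rng.random() if rng.random() < tau2 else CR[i]
            # S32: DE/rand/1 mutation from three distinct other individuals...
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ...and binomial crossover (>= 1 mutated gene)
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < CRi:
                    v = pop[a][j] + Fi * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, low), high))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            # S33: evaluate; S34: keep the better of parent and trial.
            f_trial = fitness(trial)
            if f_trial >= fit[i]:
                new_pop[i], new_fit[i], new_F[i], new_CR[i] = trial, f_trial, Fi, CRi
        pop, fit, F, CR = new_pop, new_fit, new_F, new_CR

    # S36: best individual of the last generation.
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]
```

jDE adapts the operators at the start of each trial rather than after selection, which is equivalent to performing S35 at the end of the previous generation; successful F and CR values survive along with the individual that used them.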
Further, step S4 is carried out as follows:
S41: Take the pre-trained CSDBT model, combine it with the misclassification cost matrix obtained by optimization, and input the preprocessed test set;
S42: For each sample in the test set, the CSDBT model returns a list of K recommended assignees.
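Step S42 amounts to ranking developers by score and keeping the first K. A minimal sketch, assuming the model's output for one report is a list of per-developer scores (higher is better) aligned with a hypothetical `developers` list; if the scores are risks (lower is better), pass their negations instead. The default K = 5 is illustrative.

```python
import heapq


def top_k_assignees(scores, developers, k):
    """S42: the k developers with the highest scores, best first."""
    best = heapq.nlargest(k, range(len(scores)), key=lambda j: scores[j])
    return [developers[j] for j in best]


def recommend(test_scores, developers, k=5):
    """Apply S42 to every sample in the test set."""
    return [top_k_assignees(s, developers, k) for s in test_scores]
```

`heapq.nlargest` avoids sorting the full developer list, which matters little for a handful of developers but helps when a project has thousands of candidate assignees.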
By adopting the above technical solution, the software bug assignment method based on a recurrent neural network and cost-sensitive learning provided by the invention uses a differential evolution algorithm to optimize the most suitable misclassification cost matrix for each data set without any prior knowledge. Building on the original DeepTriage model, it accounts for both the word order of the text and developer activity information, and additionally addresses class imbalance in the data through the misclassification cost matrix. For these reasons, the invention effectively prevents a large backlog of bug reports from accumulating on a single developer and speeds up the project's repair process.
Description of the drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a flow chart of the testing part of the method of the present invention.
Specific embodiments
To make the technical solution and advantages of the invention clearer, the technical solution in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings:
As shown in Fig. 1 and Fig. 2, a software bug assignment method based on a recurrent neural network and cost-sensitive learning (Cost Sensitive Deep Bug Triage, CSDBT) specifically comprises the following steps:
S1: Preprocess the raw data set collected from a historical bug report repository, where preprocessing includes screening bug reports, extracting the text information of the retained bug reports, and extracting developer activity information.
Further, the raw data set is preprocessed using the following steps:
S11: Screen the bug reports, retaining bugs whose status is fixed and deleting invalid and unfixed bugs;
S12: Extract text information from the retained bug reports; tokenize the selected text, remove stop words, and delete words that occur too frequently or too rarely;
S13: Extract developer activity information by collecting the activity sequence of developers associated with bug reports that have the same product and module information as the current bug report;
S14: Split the preprocessed data set into a training set, a validation set and a test set.
S2: Pre-train the CSDBT model on the training set, specifically as follows:
S21: Input: encode the text information and developer information with one-hot encoding;
S22: Feature extraction: extract high-level features of the text information with a bidirectional recurrent neural network and high-level features of the activity information with a unidirectional recurrent neural network, then fuse the two kinds of high-level features by element-wise multiplication;
S23: Convert the high-level features into developer probabilities;
S24: On the training set, optimize the neural network model with the objective of minimizing cross-entropy until it converges. The parameters of the neural network are then held fixed; when a new bug report from the validation set is input, the model returns a list of developers sorted by probability.
S3: Solve for the optimal misclassification cost matrix with an adaptive differential evolution algorithm, using the validation set data to obtain the optimal values of the misclassification cost matrix. The purpose of this step is to obtain an optimal misclassification cost matrix. Because each individual in the population represents a feasible solution for the cost matrix, all feasible solutions must be evaluated in order to select the best one, that is, the individual with the highest fitness.
Cost-sensitive learning assigns different weights to samples of different classes, and its core is the misclassification cost matrix, which is formed by the misclassification costs between the classes. Assuming there are M classes, the misclassification cost matrix of the data set is an M × M matrix:

$$
C = \begin{pmatrix}
C_{1,1} & C_{1,2} & \cdots & C_{1,M} \\
C_{2,1} & C_{2,2} & \cdots & C_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
C_{M,1} & C_{M,2} & \cdots & C_{M,M}
\end{pmatrix}
$$

where $C_{i,j}$ denotes the cost incurred when a sample that actually belongs to class i is misclassified as class j.
Traditional training algorithms usually assume that all misclassifications have equal cost, i.e.

$$
C_{i,j} = \begin{cases} 0, & i = j \\ 1, & i \neq j \end{cases}
$$
In practical applications, however, different kinds of errors often correspond to different misclassification costs. In medicine, for example, the cost of misdiagnosing a patient as healthy is clearly different from the cost of misdiagnosing a healthy person as a patient. Cost-sensitive learning takes these differing costs into account and assigns different costs to different types of errors, so that classification minimizes both the number of high-cost errors and the total misclassification cost. The present invention combines cost-sensitive learning with a neural network. In a traditional neural network model, after classification, the posterior probability that a sample x belongs to class i can be expressed as:
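$$
P(y = i \mid x) = \mathrm{softmax}_i(\mathbf{z}) = \frac{\exp(z_i)}{\sum_{j=1}^{M} \exp(z_j)}
$$

where $\mathbf{z}$ is the network's output vector for x.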
The neural network then uses the posterior probabilities obtained in this way to recommend a list of suitable developers.
Here, however, we add an extra step that introduces a cost-sensitive mechanism: based on Bayes risk theory, the posterior probabilities are penalized with the misclassification cost matrix.
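The exact penalty formula is not reproduced in this text. A common way to apply Bayes risk theory with a cost matrix, assumed here, is to compute the conditional risk $R(j \mid x) = \sum_{i=1}^{M} P(y = i \mid x)\, C_{i,j}$ of assigning the report to developer j and to rank developers by ascending risk. The sketch below follows that assumption (it ranks by risk rather than producing re-normalized probabilities). With the equal-cost matrix the risk is $1 - P(y = j \mid x)$, so the ranking reduces to ordinary probability ranking.

```python
def conditional_risk(probs, cost):
    """R(j | x) = sum_i P(y = i | x) * C[i][j]; rows of C are true classes."""
    m = len(probs)
    return [sum(probs[i] * cost[i][j] for i in range(m)) for j in range(m)]


def rank_by_risk(probs, cost):
    """Class indices ordered from lowest to highest expected misclassification cost."""
    risk = conditional_risk(probs, cost)
    return sorted(range(len(probs)), key=lambda j: risk[j])


def equal_cost_matrix(m):
    """The traditional assumption: C[i][j] = 0 if i == j else 1."""
    return [[0.0 if i == j else 1.0 for j in range(m)] for i in range(m)]
```

For example, with posteriors `[0.5, 0.3, 0.2]` the equal-cost ranking is `[0, 1, 2]`. Raising the cost of wrongly assigning reports to developer 0 (zero-based entries `C[1][0]` and `C[2][0]` set to 3) moves that developer to the end, giving `[1, 2, 0]`, which is how a cost matrix can steer work away from an overloaded developer.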
The list of suitable developers is then recommended using the new probabilities penalized by the misclassification cost matrix. However, because no prior knowledge is available, the specific values of the misclassification cost matrix for a given data set cannot be known in advance, so a differential evolution algorithm is used to optimize for an optimal misclassification cost matrix. The misclassification cost matrix represented by a feasible solution is combined with the pre-trained neural network in the manner described above, and the accuracy score of the resulting model on the validation set is taken as the fitness value of that feasible solution. This is the evaluation procedure for one feasible solution; if the population contains 100 individuals, the procedure must be executed independently 100 times to obtain the fitness value of every individual.
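A sketch of the fitness evaluation, under these assumptions: each DE individual is a flat vector holding the M(M-1) off-diagonal costs (the diagonal is fixed at 0), the penalty is the conditional-risk ranking $R(j \mid x) = \sum_i P(y = i \mid x)\, C_{i,j}$, and the "accuracy score" is top-k accuracy. The names `decode`, `val_probs` and `val_labels` are hypothetical. Because S24 freezes the network parameters, the validation posteriors can be computed once and reused, so each of the, say, 100 evaluations per generation is cheap arithmetic rather than a new forward pass.

```python
def decode(individual, m):
    """Flat vector of M*(M-1) off-diagonal costs -> M x M matrix with zero diagonal."""
    values = iter(individual)
    return [[0.0 if i == j else next(values) for j in range(m)] for i in range(m)]


def fitness(individual, val_probs, val_labels, k=1):
    """Top-k accuracy on the validation set after the cost-sensitive re-ranking."""
    m = len(val_probs[0])
    cost = decode(individual, m)
    hits = 0
    for probs, label in zip(val_probs, val_labels):
        # Expected cost of assigning this report to each developer j.
        risk = [sum(probs[i] * cost[i][j] for i in range(m)) for j in range(m)]
        ranked = sorted(range(m), key=lambda j: risk[j])
        hits += label in ranked[:k]
    return hits / len(val_labels)
```

An adaptive DE loop would then maximize `lambda ind: fitness(ind, val_probs, val_labels)` over vectors of length M(M-1).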
S4: Combine the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new CSDBT model, and finally input the test set to test the new CSDBT model.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the invention, according to the technical solution and inventive concept of the invention, shall fall within the scope of protection of the invention.

Claims (5)

1. A software bug assignment method based on a recurrent neural network and cost-sensitive learning, comprising the following steps:
S1: preprocessing the raw data set collected from a historical bug report repository, wherein the preprocessing comprises screening bug reports, extracting the text information of the retained bug reports, and extracting developer activity information;
S2: pre-training a CSDBT model on a training set;
S3: solving for an optimal misclassification cost matrix using an adaptive differential evolution algorithm, using the validation set data to obtain the optimal values of the misclassification cost matrix;
S4: combining the pre-trained CSDBT model with the optimal misclassification cost matrix obtained in the previous step to obtain a new DeepTriage model, and finally inputting a test set to test the new CSDBT model.
2. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S1 is carried out as follows:
S11: screening the bug reports, retaining bugs whose status is fixed and deleting invalid and unfixed bugs;
S12: extracting text information from the retained bug reports; tokenizing the selected text, removing stop words, and deleting words that occur too frequently or too rarely;
S13: extracting developer activity information by collecting the activity sequence of developers associated with bug reports that have the same product and module information as the current bug report;
S14: splitting the preprocessed data set into a training set, a validation set and a test set.
3. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S2 is carried out as follows:
S21: encoding the text information and developer information with one-hot encoding;
S22: extracting high-level features of the text information with a bidirectional recurrent neural network and high-level features of the activity information with a unidirectional recurrent neural network, and fusing the two kinds of high-level features by element-wise multiplication;
S23: converting the high-level features into developer probabilities;
S24: on the training set, optimizing the neural network model with the objective of minimizing cross-entropy until it converges.
4. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S3 is carried out as follows:
S31: initializing a misclassification cost population containing a number of individuals, wherein each individual in the population represents a feasible solution for the misclassification cost matrix;
S32: generating a new misclassification cost population using adaptively generated mutation and crossover operators;
S33: computing the fitness value of every individual in the population by plugging the misclassification cost matrix represented by each individual into the pre-trained CSDBT model and running the resulting new CSDBT model on the validation set to obtain an accuracy score, the accuracy score being the fitness value of that feasible solution;
S34: comparing each current individual with the corresponding individual of the previous generation by fitness value, keeping the individual with the higher fitness, and discarding the worse-performing one;
S35: adaptively adjusting the mutation and crossover operators of the population;
S36: returning to step S32 and continuing to iterate until the preset maximum number of iterations is reached, and selecting the individual with the highest fitness in the last generation as the optimal solution.
5. The software bug assignment method based on a recurrent neural network and cost-sensitive learning according to claim 1, further characterized in that S4 is carried out as follows:
S41: taking the pre-trained CSDBT model, combining it with the misclassification cost matrix obtained by optimization, and inputting the preprocessed test set;
S42: the CSDBT model returning, for each sample in the test set, a list of K recommended assignees.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811528909.9A CN109615242A (en) 2018-12-13 2018-12-13 Software bug assignment method based on a recurrent neural network and cost-sensitive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811528909.9A CN109615242A (en) 2018-12-13 2018-12-13 Software bug assignment method based on a recurrent neural network and cost-sensitive learning

Publications (1)

Publication Number Publication Date
CN109615242A (en) 2019-04-12

Family

ID=66008134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811528909.9A Pending CN109615242A (en) 2018-12-13 2018-12-13 Software bug assignment method based on a recurrent neural network and cost-sensitive learning

Country Status (1)

Country Link
CN (1) CN109615242A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471854A (en) * 2019-08-20 2019-11-19 Dalian Maritime University Defect report assignment method based on hybrid reduction of high-dimensional data
CN111309907A (en) * 2020-02-10 2020-06-19 Dalian Maritime University Real-time Bug assignment method based on deep reinforcement learning
US11714743B2 (en) 2021-05-24 2023-08-01 Red Hat, Inc. Automated classification of defective code from bug tracking tool data

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104009824A (en) * 2014-06-01 2014-08-27 Zhang Zhe (张喆) Pilot-assisted data fusion method based on differential evolution in a base station cooperative uplink system
CN107480141A (en) * 2017-08-29 2017-12-15 Nanjing University Assisted software defect assignment method based on text and developer activity


Non-Patent Citations (1)

Title
Jiang Liangxiao, Li Chaoqun (eds.): "Bayesian Network Classifiers: Algorithms and Applications" (《贝叶斯网络分类器 算法与应用》), 31 December 2015 *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN110471854A (en) * 2019-08-20 2019-11-19 Dalian Maritime University Defect report assignment method based on hybrid reduction of high-dimensional data
CN110471854B (en) * 2019-08-20 2023-02-03 Dalian Maritime University Defect report assignment method based on high-dimensional data hybrid reduction
CN111309907A (en) * 2020-02-10 2020-06-19 Dalian Maritime University Real-time Bug assignment method based on deep reinforcement learning
US11714743B2 (en) 2021-05-24 2023-08-01 Red Hat, Inc. Automated classification of defective code from bug tracking tool data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20190412