CN107957929B - Software defect report repair personnel distribution method based on topic model - Google Patents


Info

Publication number
CN107957929B
CN107957929B CN201711160414.0A CN201711160414A CN107957929B CN 107957929 B CN107957929 B CN 107957929B CN 201711160414 A CN201711160414 A CN 201711160414A CN 107957929 B CN107957929 B CN 107957929B
Authority
CN
China
Prior art keywords
defect
defect report
developers
developer
report
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711160414.0A
Other languages
Chinese (zh)
Other versions
CN107957929A (en
Inventor
吴芳芳
顾庆
陈道蓄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201711160414.0A priority Critical patent/CN107957929B/en
Publication of CN107957929A publication Critical patent/CN107957929A/en
Application granted granted Critical
Publication of CN107957929B publication Critical patent/CN107957929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q10/063112 Skill-based matching of a person or a group to a task
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The invention discloses a topic-model-based method for assigning repair personnel to software defect reports. The method uses a topic model to mine the latent semantic information of defect reports, measures developer experience from the defect reports each developer has repaired and the time of each repair, takes the balance of developers' workload into account, and calculates the matching degree between each developer and a target defect report in order to recommend suitable developers. The method is computationally simple, general, and extensible; it assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and suits the development and maintenance of large-scale software products.

Description

Software defect report repair personnel distribution method based on topic model
Technical Field
The invention relates to the assignment of repair personnel to software defect reports in the field of software engineering, and in particular to a topic-model-based method for assigning repair personnel to software defect reports.
Background
Software defects are inevitable during software development and maintenance, and repairing them is difficult and consumes substantial manpower and material resources. Large software projects use a defect tracking tool and database to collect, organize, and monitor the status of defect reports. Users, developers, and testers of the software system can submit defect reports to the defect tracking database, and quality management personnel classify and assign defects according to the submitted reports. Assigning the repair task for a defect report to a suitable developer, based on the content and domain of the report and on the developers' expertise, is called defect report assignment. Accurate and timely defect report assignment plays a key role in software quality assurance and defect repair.
With the explosive growth of software scale, the number of developers has also increased dramatically, making it increasingly difficult to keep track of the status, workload, and expertise of every developer. Assigning defect reports manually becomes a complex, error-prone, and time-consuming process. Automatic defect report assignment methods based on machine learning or information retrieval are therefore needed. Machine-learning methods treat defect report assignment as a classification problem: the domain knowledge and text content of a defect report are the features, the developer who handled it is the label, the historically repaired defect reports are the training data, and the model predicts the most suitable developer for a new defect report. Information-retrieval methods convert defect reports into keyword vectors. Their main idea is that developers with similar expertise and experience handle a specific type of defect better, so keyword retrieval is used to assign a new defect report to developers who have repaired similar historical defects.
A topic model is a statistical model used to discover abstract topics in a large collection of documents. By relating the words of a document to topics, it represents each document as a probability distribution over a set of topics. Topic models overcome the shortcomings of the document similarity calculations used in traditional information retrieval. A topic represents a concept or aspect and appears as a set of highly related words; the words in the set define the topic. For example, a document introducing a country often covers several aspects, such as history, geography, politics, and culture, each of which can be regarded as a topic. Words such as mountains and rivers appear more frequently when geography is introduced, and words such as music, novels, and drama appear more frequently when culture is introduced. The probability distribution of a topic is a conditional probability distribution over the words of the vocabulary: the more closely a word is related to the topic, the greater its conditional probability, and vice versa. According to the training method, topic models can be divided into two types: PLSA (Probabilistic Latent Semantic Analysis), which uses the expectation-maximization (EM) algorithm, and LDA (Latent Dirichlet Allocation), which uses Gibbs sampling.
Existing automatic assignment methods for defect reports usually ignore the influence of time, do not consider the developers' current workload during assignment, and have high computational complexity, so they do not adapt well to real software development and maintenance.
Disclosure of Invention
To address the problems in the prior art, the invention provides a topic-model-based method for automatically assigning repair personnel to defect reports. The method is general and extensible, assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and suits the development and maintenance of large-scale software products.
The invention adopts the following technical scheme for solving the technical problems:
1) organizing the defect reports and developer data of the software project, as follows: first, the historical defect reports of the software project are collected from the defect tracking database, including the text describing each defect and the developers who handled it; then the developer data are organized, counting the defect reports each developer has repaired and the defect reports each developer has been assigned;
2) training the topic probability distribution vectors of the defect reports with a sampling method;
3) calculating each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates; calculating each developer's workload function from the defect reports assigned to the developer;
4) given a target defect report, calculating the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload;
5) sorting the developers by matching degree in descending order and recommending those with high matching degrees: once the matching degree between every developer and the target defect report has been calculated, the developers are ranked from highest to lowest, and the top-ranked developers are recommended first as repairers of the current defect report.
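Step 5) amounts to a descending sort followed by a cut-off. A minimal Python sketch, assuming the matching degrees have already been calculated and stored in a dictionary; the function name `recommend_developers`, the developer names, and the scores are hypothetical and serve only as illustration:

```python
from typing import Dict, List, Tuple


def recommend_developers(match: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (developer, matching degree) pairs, highest degree first."""
    ranked = sorted(match.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical matching degrees for three developers.
scores = {"alice": 0.42, "bob": 0.77, "carol": 0.15}
top = recommend_developers(scores, top_k=2)
```

The cut-off `top_k` is a free parameter here; the text only states that the top-ranked developers are recommended first.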
The process of training the topic probability distribution vectors of the defect reports with a sampling method in step 2) is as follows. First, a topic is defined as a function or technical point of the software system, and the number of topics is set to $K$; the recommended value is $K = |V| \times 11\%$, where $|V|$ is the total number of distinct words in all defect reports. Next, the historical defect reports are collected into a set $B$, and the words of all defect reports are gathered into a vocabulary $V = \{w_1, w_2, \ldots, w_N\}$; the number of elements (words) in $V$ is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The topic index vector $z_b$ records the topic numbers associated with the words of defect report $b$; its dimension is $n_b$, the length of defect report $b$, i.e., the total number of words in $b$. If the $i$-th element of $z_b$ is $k$, the word $w_{b,i}$ at position $i$ of defect report $b$ is associated with topic $k$, where $k$ is the topic number and $1 \le k \le K$. The topic distribution vector $\theta_b$ of defect report $b$ is a $K$-dimensional vector calculated from $z_b$, whose $k$-th element is the proportion of words in $b$ associated with topic $k$. Finally, a sampling method is applied to calculate the topic distribution vectors of all defect reports;
the process of calculating the topic probability distribution vectors of all defect reports with the sampling method in step 2) is as follows. First, for each topic $k$, a word distribution vector $\varphi_k$ of dimension $|V|$ is defined, which represents the probability distribution of the words of vocabulary $V$ on topic $k$, where $|V|$ is the length of $V$. Then, for the topic probability distribution vectors $\theta_b$ and the word distribution vectors $\varphi_k$, prior parameter vectors $\alpha$ and $\beta$ are defined, where $\alpha$ is a $K$-dimensional real vector, $\beta$ is a $|V|$-dimensional real vector, $K$ is the number of topics, and $V$ is the vocabulary; all elements of $\alpha$ and $\beta$ are set to 1;
then, the topic index vectors $z_b$ ($b \in B$) of all defect reports are updated iteratively, where $B$ is the set of historical defect reports, until the index vectors reach a converged state: comparing $z_b$ after the previous iteration with $z_b'$ after the current iteration, the proportion of elements whose value has changed is less than a threshold $\sigma$, with recommended value $\sigma = 0.1\%$;
after the index vectors $z_b$ ($b \in B$) of the historical defect report set $B$ have converged, the topic probability distribution vector $\theta_b$ of each defect report is calculated with the following formula:
$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}} \tag{2}$$
where $n_b[k]$ is the number of words in defect report $b$ associated with topic $k$, $n_b$ is the length of defect report $b$, $K$ is the total number of topics, and $\alpha_k$ is the $k$-th component of the prior parameter vector $\alpha$;
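The calculation of $\theta_b$ from a converged index vector can be sketched in a few lines of Python. The sketch assumes the smoothed-proportion form $\theta_b[k] = (n_b[k] + \alpha_k) / (n_b + \sum_{k'} \alpha_{k'})$, which uses exactly the symbols defined above; that form, the function name `topic_distribution`, and the sample data are assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence


def topic_distribution(z_b: Sequence[int], alpha: Sequence[float]) -> List[float]:
    """Compute theta_b from the topic index vector z_b (topics numbered 1..K).

    Assumed form: theta_b[k] = (n_b[k] + alpha_k) / (n_b + sum(alpha)).
    """
    K = len(alpha)
    counts = Counter(z_b)          # n_b[k]: words of b associated with topic k
    n_b = len(z_b)                 # length of defect report b
    denom = n_b + sum(alpha)
    return [(counts.get(k, 0) + alpha[k - 1]) / denom for k in range(1, K + 1)]


# Hypothetical report with six words and K = 3 topics, all alpha_k = 1.
theta = topic_distribution([1, 2, 2, 3, 2, 1], alpha=[1.0, 1.0, 1.0])
```

With all $\alpha_k = 1$ the result is a normalized vector, so its elements sum to 1 even for topics that no word of the report is associated with.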
the process of updating the topic index vector of a defect report in step 2) is as follows. Given a defect report $b$, the probability that its $i$-th word $w_{b,i}$ is associated with each of the $K$ topics is calculated in turn, where $b = 1, \ldots, |B|$, $i = 1, \ldots, n_b$, $B$ is the set of historical defect reports, and $n_b$ is the length of defect report $b$. The calculation formula is as follows:
$$P(z_b[i] = k \mid z_{\neg i}) \propto \frac{n_{b,\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \times \frac{n_{k,\neg i}[w_{b,i}] + \beta_j}{n_{k,\neg i} + \sum_{j'=1}^{|V|} \beta_{j'}} \tag{1}$$
where $\neg i$ indicates that the word with subscript $i$ is excluded, $n_{b,\neg i}[k]$ is the number of other words in defect report $b$ associated with topic $k$, $n_{k,\neg i}[w_{b,i}]$ is the total number of times the word $w_{b,i}$ is associated with topic $k$ at other positions in the historical defect report set $B$, $n_b$ is the length of defect report $b$, $n_{k,\neg i}$ is the number of words in $B$ associated with topic $k$, $K$ is the total number of topics, $|V|$ is the length of the vocabulary $V$, $\alpha_k$ and $\beta_j$ are the $k$-th and $j$-th components of the vectors $\alpha$ and $\beta$ respectively, and $j$ is the subscript of $w_{b,i}$ in the vocabulary $V$;
based on the probability distribution calculated with this formula, one topic $k$ is selected from the $K$ topics according to its probability and used to update $z_b[i]$, i.e., the $i$-th component of the topic index vector $z_b$ of defect report $b$.
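A single update of $z_b[i]$ can be sketched in Python as follows. The sketch assumes the standard collapsed Gibbs conditional for LDA, built from the counts described above: a report-topic factor times a word-topic factor, normalized over the $K$ topics. The exact formula, the function names, and the parameter names are assumptions for illustration, and all counts are taken to exclude the word at position $i$.

```python
import random
from typing import List, Sequence


def topic_probabilities(
    doc_topic: Sequence[int],    # per topic: other words of report b on that topic
    n_b: int,                    # length of defect report b
    word_topic: Sequence[int],   # per topic: occurrences of this word on that topic elsewhere in B
    topic_total: Sequence[int],  # per topic: words in B on that topic (current word excluded)
    alpha: Sequence[float],
    beta_j: float,               # beta component of the current word
    beta_sum: float,             # sum of all beta components
) -> List[float]:
    """Normalized probability of assigning the current word to each topic."""
    alpha_sum = sum(alpha)
    weights = [
        (doc_topic[k] + alpha[k]) / (n_b - 1 + alpha_sum)
        * (word_topic[k] + beta_j) / (topic_total[k] + beta_sum)
        for k in range(len(alpha))
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_topic(probs: Sequence[float], rng: random.Random) -> int:
    """Draw a topic number in 1..K according to probs."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


# Hypothetical counts for K = 2 topics and a vocabulary of 4 words (all beta_j = 1).
probs = topic_probabilities([2, 0], 3, [1, 0], [5, 4], [1.0, 1.0], 1.0, 4.0)
new_topic = sample_topic(probs, random.Random(0))
```

Drawing the topic at random according to `probs`, rather than taking the most probable topic, is what makes this a sampling step.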
In step 3), the experience distribution vector of a developer is calculated from the defect reports the developer has repaired and their repair dates, as follows. First, a memory squeeze function is defined to describe the influence of time on developer experience. Given a defect report $b$, the memory squeeze function $msd(b)$ is calculated as follows:
[Equation (3): memory squeeze function $msd(b)$, defined in terms of $T_b$ and $\lambda$]
$$T_b = \frac{1}{ct - bt} \tag{4}$$
where $T_b$ is the reciprocal of the time span, in days, between the repair date $bt$ of defect report $b$ and the current date $ct$; $\lambda$ is the developer's memory factor, which describes the developer's memory strength;
then the experience distribution vector of the developer is calculated. Given a developer $d$, the experience distribution vector $exp(d)$ is calculated as follows:
$$exp(d) = \sum_{b \in HB_d} msd(b) \times \theta_b \tag{5}$$
where $HB_d$ is the set of defect reports that developer $d$ has repaired, $\theta_b$ is the topic probability distribution vector of defect report $b$, and $msd(b)$ is the memory squeeze function of defect report $b$; $exp(d)$ reflects the developer's accumulated experience on each topic, with the time factor included.
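The accumulation of $exp(d)$ can be sketched in Python, assuming it is the sum of the topic vectors $\theta_b$ of the repaired reports, each weighted by its $msd(b)$ value. Because the closed form of $msd(b)$ is not reproduced in this text, the sketch takes the weights as precomputed inputs; the function name `experience_vector` and the numbers are hypothetical.

```python
from typing import List, Sequence


def experience_vector(
    thetas: Sequence[Sequence[float]],  # theta_b for each report b in HB_d
    msd_weights: Sequence[float],       # msd(b) for the same reports, precomputed
    num_topics: int,
) -> List[float]:
    """Time-weighted sum of topic distributions over a developer's repaired reports."""
    exp_d = [0.0] * num_topics
    for theta_b, msd_b in zip(thetas, msd_weights):
        for k in range(num_topics):
            exp_d[k] += msd_b * theta_b[k]
    return exp_d


# Two repaired reports over K = 2 topics; the older report carries less weight.
exp_d = experience_vector([[0.5, 0.5], [0.2, 0.8]], [1.0, 0.5], num_topics=2)
```

A developer with no repaired reports gets the zero vector, which a later similarity calculation has to handle explicitly.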
The memory factor of a developer, used when calculating the memory squeeze function in step 3), is determined as follows. The memory factor $\lambda$ reflects the developer's accumulated working time and represents how strongly accumulated development experience reinforces memory. The values of $\lambda$ are listed in the following table, where $Y_{exp}$ is the developer's working time in years:

Development experience $Y_{exp}$ (years)    $\lambda$ value
$Y_{exp} < 1$                               1
$1 \le Y_{exp} < 4$                         2
$4 \le Y_{exp} < 7$                         3
$Y_{exp} \ge 7$                             5
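The table maps directly to a small lookup. A minimal Python sketch; the function name `memory_factor` is hypothetical, while the thresholds and values are those of the table:

```python
def memory_factor(years_experience: float) -> int:
    """Return the memory factor lambda for a developer's experience in years."""
    if years_experience < 1:
        return 1
    if years_experience < 4:
        return 2
    if years_experience < 7:
        return 3
    return 5


# A developer with 5.5 years of experience falls in the 4 <= Y_exp < 7 band.
lam = memory_factor(5.5)
```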
The workload function of a developer in step 3) is calculated from the defect reports assigned to the developer, as follows. Let $B_d$ be the set of defect reports assigned to developer $d$. First, the number of assigned defect reports is normalized to obtain $N(B_d)$ with the following formula, where $|B_{d'}|_{min}$ and $|B_{d'}|_{max}$ are the minimum and maximum numbers of defect reports assigned over all developers:
$$N(B_d) = \frac{|B_d| - |B_{d'}|_{min}}{|B_{d'}|_{max} - |B_{d'}|_{min}} \tag{6}$$
then a work efficiency factor $\mu$ is defined to distinguish the work efficiency of developers at different experience levels, as shown in the following table:

Development experience $Y_{exp}$ (years)    $\mu$ value
$Y_{exp} < 1$                               0.8
$1 \le Y_{exp} < 4$                         1
$4 \le Y_{exp} < 7$                         1.2
$Y_{exp} \ge 7$                             1.5

finally, based on the normalized number of defect reports $N(B_d)$ assigned to developer $d$ and the work efficiency factor $\mu$, the workload function $Wlod(d)$ is calculated with the following formula:
[Equation (7): workload function $Wlod(d)$, defined in terms of $N(B_d)$ and $\mu$]
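The normalization of the assigned-report counts can be sketched in Python, assuming standard min-max scaling over all developers, which is consistent with the minimum and maximum defined above. The function name `normalize_assignments`, the developer names, and the fallback value 0.0 when all developers hold the same number of reports are assumptions added for illustration. The workload function itself is not sketched because its closed form is not reproduced in this text.

```python
from typing import Dict


def normalize_assignments(assigned: Dict[str, int]) -> Dict[str, float]:
    """Min-max normalize each developer's number of assigned defect reports."""
    lo, hi = min(assigned.values()), max(assigned.values())
    if hi == lo:
        # Assumption: with identical counts nobody is relatively overloaded.
        return {dev: 0.0 for dev in assigned}
    return {dev: (n - lo) / (hi - lo) for dev, n in assigned.items()}


# Hypothetical numbers of currently assigned defect reports.
normalized = normalize_assignments({"alice": 2, "bob": 6, "carol": 10})
```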
The matching degree between a developer and the target defect report in step 4) is calculated as follows. First, given a target defect report $tb$, its topic index vector $z_{tb}$ and topic distribution vector $\theta_{tb}$ are calculated following the procedure of step 2).
Then, the correlation $Cspd(tb, d)$ between the target defect report $tb$ and developer $d$ is calculated by cosine similarity with the following formula:
$$Cspd(tb, d) = \frac{\theta_{tb} \cdot exp(d)}{|\theta_{tb}| \times |exp(d)|} \tag{8}$$
where $exp(d)$ is the experience distribution vector of developer $d$, $d \in D$, $D$ is the set of all developers, $\theta_{tb}$ is the topic distribution vector of the target defect report $tb$, and $|\theta_{tb}|$ and $|exp(d)|$ are the Euclidean norms of the two vectors, i.e., the square root of the sum of the squares of their elements.
Finally, the developer workload function $Wlod()$ is introduced, and the matching degree $Match(tb, d)$ between defect report $tb$ and developer $d$ is calculated with the following formula:
$$Match(tb, d) = Wlod(d) \times Cspd(tb, d) \tag{9}$$
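The correlation and matching degree can be sketched in Python. The cosine similarity follows the description in the text (dot product divided by the product of the Euclidean norms), and the matching degree follows formula (9). The value of $Wlod(d)$ is passed in as a number because its closed form is not reproduced here; the function names and the zero-vector fallback are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a developer without experience matches nothing
    return dot / (norm_u * norm_v)


def match_degree(theta_tb: Sequence[float], exp_d: Sequence[float], wlod_d: float) -> float:
    """Match(tb, d) = Wlod(d) * Cspd(tb, d)."""
    return wlod_d * cosine_similarity(theta_tb, exp_d)


# Hypothetical target report and developer over K = 2 topics.
score = match_degree([0.6, 0.8], [0.6, 0.8], wlod_d=0.5)
```

Because cosine similarity ignores vector length, a developer's experience vector does not need to be normalized before the comparison.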
Compared with the prior art, the invention, by adopting the above technical scheme, has the following technical effects:
The method uses a topic model to mine the latent semantic information of defect reports, measures developer experience from the repaired defect reports and their repair times, and takes the balance of developers' workload into account when calculating the matching degree between each developer and the target defect report, so as to recommend suitable developers. The method is computationally simple, general, and extensible; it assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and suits the development and maintenance of large-scale software products.
Drawings
FIG. 1 is the overall framework of the topic-model-based method for assigning repair personnel to software defect reports;
FIG. 2 is a schematic diagram of a defect report of the Eclipse Plug-in Development Environment (PDE) software;
FIG. 3 is a flow diagram of topic model training based on historical defect report data.
Detailed Description
FIG. 1 shows the overall framework of the topic-model-based method for assigning repair personnel to software defect reports. The inputs of the invention are the historical defect reports and repair information of the software project, the developer data, the data on assigned defect reports, and the current target defect report to be assigned; the output is the top-k developers recommended for the target defect report. The method comprises five steps: 1) organizing the defect reports and developer data of the software project; 2) training the topic probability distribution vectors of the defect reports with a sampling method; 3) calculating each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and calculating each developer's workload function from the defect reports assigned to the developer; 4) given a target defect report, calculating the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload; 5) sorting the developers by matching degree in descending order and recommending those with high matching degrees.
The first step of the present invention is to collate the defect reports and developer data for the software project. Historical defect reports for a software project are first collected from a defect tracking database, which contains textual data describing the defects and developer data for processing the defect reports. Fig. 2 is a screenshot of a repaired defect report, where the defect report is generally divided into an abstract and a detailed description, and the abstract includes: the detailed description part is the detailed description of the defect by the submitter of the defect report.
The work information of the developers is then organized. This mainly consists of counting the defect reports each developer has repaired and the defect reports each developer has been assigned, and collecting the documents the developers have written during the development of the software project.
The second step of the invention is to train the topic probability distribution vectors of the defect reports with a sampling method. Defect reports are generally written in natural language, where synonymy and polysemy are common, and different submitters may use different words to describe defects of a similar type. The method therefore uses LDA (Latent Dirichlet Allocation), a topic model, to mine the latent semantic information of the historical defect reports. A software system comprises many functions or technical points, such as connecting to a database or loading a file, and a defect report is generated whenever one of them is found not to operate normally. A function or technical point of the software system can therefore be regarded as an abstract topic: the probability distribution over topics can be calculated for each defect report, and the experience distribution over the corresponding topics can be derived for the developer who repaired it. The invention uses the LDA topic model to represent each defect report as a probability distribution vector over topics.
Given a software system, the functions or technical points it involves constitute $K$ topics; the suggested value is $K = |V| \times 11\%$, where $|V|$ is the total number of distinct words in all defect reports. All collected historical defect reports form a set $B$, and their words form a vocabulary $V = \{w_1, w_2, \ldots, w_N\}$; the number of elements (words) in $V$ is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The index vector $z_b$ records the topic numbers associated with the words of defect report $b$; its dimension is $n_b$, the length of defect report $b$, i.e., the total number of words in $b$. If the $i$-th element of $z_b$ is $k$, the word $w_{b,i}$ at position $i$ of defect report $b$ is associated with topic $k$, where $k$ is an integer topic number with $1 \le k \le K$. The topics of a given defect report $b$ are represented by a $K$-dimensional topic probability distribution vector $\theta_b$ whose elements are normalized probability values, i.e., they sum to 1. For example, $\theta_b = [0.3, 0.5, 0.1, \ldots]$ indicates that 30% of the words in defect report $b$ are associated with the first topic, 50% with the second topic, and so on. $\varphi_k$ is a word distribution vector of dimension $|V|$ that represents the probability distribution of the words of vocabulary $V$ on topic $k$, where $k$ is an integer topic number with $1 \le k \le K$. The prior parameter vectors of the topic probability distribution vectors $\theta_b$ and the word distribution vectors $\varphi_k$ are set to $\alpha$ and $\beta$ respectively, where $\alpha$ is a $K$-dimensional real vector and $\beta$ is a $|V|$-dimensional real vector. Topics are assumed to be uniformly distributed over defect reports, and words uniformly distributed over topics, so all elements of $\alpha$ and $\beta$ can be set to 1.
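The set-up described above (vocabulary, number of topics, and priors) can be sketched in Python. The sketch assumes the defect reports have already been tokenized into word lists; rounding $|V| \times 11\%$ to the nearest integer with a minimum of one topic is an assumption, since the text gives only the ratio, and the function names are hypothetical.

```python
from typing import List, Sequence


def build_vocabulary(reports: Sequence[Sequence[str]]) -> List[str]:
    """Collect the distinct words of all defect reports into a sorted vocabulary V."""
    return sorted({word for report in reports for word in report})


def suggested_topic_count(vocab_size: int, ratio: float = 0.11) -> int:
    """Suggested K = |V| * 11%, rounded, and at least 1 (rounding is an assumption)."""
    return max(1, round(vocab_size * ratio))


# Two tiny, already tokenized, hypothetical defect reports.
reports = [["null", "pointer", "crash"], ["crash", "on", "load"]]
vocab = build_vocabulary(reports)
K = suggested_topic_count(len(vocab))
alpha = [1.0] * K            # prior for theta_b: all elements set to 1
beta = [1.0] * len(vocab)    # prior for the word distributions: all elements set to 1
```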
Gibbs sampling is a stochastic simulation sampling algorithm that offers a simpler, approximate way to infer the parameters of a high-dimensional probability model. It draws approximate samples from a given high-dimensional joint probability distribution by rotating through the dimensions: one dimension is selected at random and a transition is made according to its conditional probability, and this is repeated until the probability distribution converges.
Training the LDA model consists of sampling the words of the defect reports and their associated topics with Gibbs sampling, calculating and updating the topic of each word, and iterating this sampling process until the topic distribution of the defect reports converges; the topic probability distribution vector $\theta_b$ of each defect report $b$ is then calculated from the samples of the final sampling round. The specific steps are as follows. First, the index vectors $z_b$ ($b \in B$) of all defect reports are randomly initialized. Then, based on the Gibbs sampling formula and the index vector $z_b$, the probability that the $i$-th word $w_{b,i}$ of defect report $b$ is associated with each of the $K$ topics is calculated in turn, where $b = 1, \ldots, |B|$, $i = 1, \ldots, n_b$, $B$ is the set of historical defect reports, and $n_b$ is the length of defect report $b$. The probability formula is as follows:
$$P(z_b[i] = k \mid z_{\neg i}) \propto \frac{n_{b,\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \times \frac{n_{k,\neg i}[w_{b,i}] + \beta_j}{n_{k,\neg i} + \sum_{j'=1}^{|V|} \beta_{j'}} \tag{1}$$
In the above formula, $\neg i$ indicates that the word with subscript $i$ is excluded, $n_{b,\neg i}[k]$ is the number of other words in defect report $b$ associated with topic $k$, $n_{k,\neg i}[w_{b,i}]$ is the total number of times the word $w_{b,i}$ is associated with topic $k$ at other positions in the historical defect report set $B$, $n_b$ is the length of defect report $b$, $n_{k,\neg i}$ is the number of words in the historical defect report set $B$ associated with topic $k$, $K$ is the total number of topics, $|V|$ is the total number of distinct words in the historical defect report set $B$, $\alpha_k$ and $\beta_j$ are the $k$-th and $j$-th components of the vectors $\alpha$ and $\beta$ respectively, and $j$ is the subscript of $w_{b,i}$ in the vocabulary $V$.
Based on the probability distribution calculated with formula (1), one topic $k$ is selected from the $K$ topics according to its probability and used to update $z_b[i]$, where $b = 1, \ldots, |B|$ and $i = 1, \ldots, n_b$. This process is iterated until the index vectors $z_b$ reach a converged state: comparing $z_b$ after the previous iteration with $z_b'$ after the current iteration, the proportion of elements whose value has changed is less than a threshold $\sigma$, with suggested value $\sigma = 0.1\%$.
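The convergence test can be sketched in Python as a comparison of the index vectors before and after one iteration. Measuring the changed proportion over all defect reports together, rather than per report, is an assumption, as are the function names.

```python
from typing import Sequence


def changed_fraction(old: Sequence[Sequence[int]], new: Sequence[Sequence[int]]) -> float:
    """Fraction of topic assignments that differ between two iterations."""
    total = sum(len(z_b) for z_b in old)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for z_old, z_new in zip(old, new)
        for a, b in zip(z_old, z_new)
        if a != b
    )
    return changed / total


def has_converged(old: Sequence[Sequence[int]], new: Sequence[Sequence[int]],
                  sigma: float = 0.001) -> bool:
    """True if fewer than a fraction sigma (0.1% by default) of assignments changed."""
    return changed_fraction(old, new) < sigma


# Hypothetical index vectors of two reports before and after one iteration.
before = [[1, 2, 3], [1, 1]]
after = [[1, 2, 1], [1, 1]]
done = has_converged(before, after)
```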
After the index vectors $z_b$ ($b \in B$) of the defect report set $B$ have converged, the topic probability distribution vector $\theta_b$ of each defect report is calculated from the final sample statistics with the following formula:
$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}} \tag{2}$$
where $n_b[k]$ is the number of words in defect report $b$ associated with topic $k$, $n_b$ is the length of defect report $b$, $K$ is the total number of topics, and $\alpha_k$ is the $k$-th component of the prior parameter vector $\alpha$.
The third step of the invention is to calculate each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and to calculate each developer's workload function from the defect reports assigned to the developer. The number of defect reports a developer has repaired reflects the developer's level of experience in repairing defects: the more defect reports a developer has handled, the more experience the developer has and the more confidently the developer can repair a new defect. However, if a long time has passed since a defect report was repaired, the developer gradually forgets how it was repaired. A memory squeeze function is therefore used first to describe the influence of time on developer experience. Its value lies between 0 and 1; the further the repair date of a defect report lies in the past, the smaller the function value and the smaller the contribution of that repair to the developer's current experience level. The memory squeeze function is defined as follows:
[Equation (3): memory squeeze function $msd(b)$, defined in terms of $T_b$ and $\lambda$]
$$T_b = \frac{1}{ct - bt} \tag{4}$$
where $T_b$ is the reciprocal of the time span, in days, between the repair date $bt$ of defect report $b$ and the current date $ct$. $\lambda$ is the developer's memory factor, which characterizes the developer's memory strength. Senior developers with a long accumulated time in development work have a high memory factor, because repairing a defect reinforces their past experience, whereas novice developers are accumulating new experience and therefore have a relatively low memory factor. The values of $\lambda$ are defined in the following table:
Development experience $Y_{exp}$ (years)    $\lambda$ value
$Y_{exp} < 1$                               1
$1 \le Y_{exp} < 4$                         2
$4 \le Y_{exp} < 7$                         3
$Y_{exp} \ge 7$                             5
TABLE 1 Values of the developer memory factor $\lambda$
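The quantity $T_b$ that enters the memory squeeze function can be computed from two calendar dates. A minimal Python sketch, assuming both dates are available as `datetime.date` values and that the repair date lies strictly before the current date; the function name `reciprocal_elapsed_days` and the error handling are assumptions for illustration:

```python
from datetime import date


def reciprocal_elapsed_days(repair_date: date, current_date: date) -> float:
    """T_b: reciprocal of the number of days between repair date bt and current date ct."""
    days = (current_date - repair_date).days
    if days <= 0:
        # Assumption: a same-day or future repair date is rejected rather than guessed at.
        raise ValueError("repair_date must be earlier than current_date")
    return 1.0 / days


# A report repaired 10 days before the current date gives T_b = 0.1.
t_b = reciprocal_elapsed_days(date(2017, 11, 1), date(2017, 11, 11))
```

The older the repair, the smaller $T_b$ becomes, which matches the statement that older repairs contribute less to current experience.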
The definition of the experience distribution vector of the developer is based on the time weight accumulation of LDA topic probability distributions of all the repaired defect reports on K topics, so the formula of the experience distribution vector of the developer d is defined as follows:
[Formula image not reproduced: definition of the experience distribution vector of developer d (Figure BDA0001474824560000084)]
where HB_d denotes the set of historical defect reports that developer d has repaired and θ_b is the topic probability distribution vector of defect report b. The experience distribution vector calculated by this formula therefore reflects the developer's accumulated, time-weighted experience on each topic.
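The accumulation described above can be sketched as a weighted sum of topic vectors. This is a minimal sketch, assuming each repaired report is supplied as a pair of its topic probability vector θ_b and an already computed time weight; the function name and the input layout are hypothetical and not taken from the patent.

```python
from typing import List, Sequence, Tuple


def experience_vector(
    repaired: Sequence[Tuple[Sequence[float], float]],
    num_topics: int,
) -> List[float]:
    """Accumulate time-weighted topic distributions over repaired reports.

    repaired: (theta_b, weight_b) pairs, one per report in HB_d, where
    theta_b has num_topics entries and weight_b is the time weight of b.
    """
    exp_d = [0.0] * num_topics
    for theta_b, weight_b in repaired:
        if len(theta_b) != num_topics:
            raise ValueError("topic vector length does not match num_topics")
        for k, probability in enumerate(theta_b):
            exp_d[k] += weight_b * probability
    return exp_d
```

For example, two repaired reports with topic vectors (0.5, 0.5) and (1.0, 0.0) and weights 1.0 and 0.5 give the experience vector (1.0, 0.5): recent repairs dominate, and older ones still contribute on the topics they covered.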
If developers' current workloads are ignored, some highly experienced developers may be assigned too many defect reports while less experienced developers sit idle. This not only lengthens defect repair cycles but may even lead some overwhelmed developers to push defect reports aside. Therefore, to avoid assigning an excessive number of defect reports to a few developers, a workload function is defined for each developer from the defect reports already assigned to that developer. Let B_d denote the set of defect reports assigned to developer d. The number of assigned defect reports is first normalized, where |B_d′|min and |B_d′|max denote the minimum and maximum numbers of assigned defect reports over all developers, as follows:
$$N(B_d) = \frac{|B_d| - |B_{d'}|_{\min}}{|B_{d'}|_{\max} - |B_{d'}|_{\min}}$$
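The normalization step can be sketched as follows, assuming ordinary min-max scaling of the assigned counts over all developers. The function name is hypothetical, and the handling of the case where every developer has the same count is an assumption, since the text does not address it.

```python
from typing import Dict


def normalize_assigned_counts(assigned: Dict[str, int]) -> Dict[str, float]:
    """Min-max normalize the number of assigned defect reports per developer.

    assigned maps a developer identifier to |B_d|.
    """
    lowest = min(assigned.values())
    highest = max(assigned.values())
    if highest == lowest:
        # ASSUMPTION: identical loads are all mapped to 0.0 to avoid dividing by zero.
        return {developer: 0.0 for developer in assigned}
    span = highest - lowest
    return {
        developer: (count - lowest) / span
        for developer, count in assigned.items()
    }
```

With assigned counts of 2, 6, and 10, the normalized values are 0.0, 0.5, and 1.0, so the least loaded developer maps to 0 and the most loaded to 1.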
In addition, a senior developer who has worked in development for a long time is generally more efficient than a novice, so a work efficiency factor μ is defined to distinguish developers of different experience levels. μ is defined in the following table:
Development experience Y_exp (years)    μ value
Y_exp < 1                               0.8
1 ≤ Y_exp < 4                           1
4 ≤ Y_exp < 7                           1.2
Y_exp ≥ 7                               1.5
Table 2. Developer work efficiency factor μ values
Finally, the workload function of developer d is defined from the number of defect reports |B_d| assigned to developer d and the work efficiency factor μ:
[Formula image not reproduced: definition of the workload function Wlod(d) (Figure BDA0001474824560000092)]
The fourth step of the invention is, given a defect report, to calculate the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload. First, based on the LDA model trained on the historical defect reports in step two, the final sample data are used to calculate the index vector z_tb and the topic probability distribution vector θ_tb of the target defect report tb that currently needs to be assigned.
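For illustration, a topic probability vector can be estimated from an index vector with the standard smoothed LDA estimate θ_k = (n_k + α) / (N + K·α), where n_k is the number of words assigned to topic k and N is the number of words in the report. This is a sketch under assumptions: the symmetric hyperparameter α, its default value, and the function name are not taken from this text, and the formula used in step two of the patent may differ.

```python
from collections import Counter
from typing import List, Sequence


def topic_distribution(index_vector: Sequence[int], num_topics: int,
                       alpha: float = 0.1) -> List[float]:
    """Estimate theta for one report from its per-word topic assignments.

    index_vector[i] is the topic (0 .. num_topics - 1) assigned to word i.
    alpha is an assumed symmetric Dirichlet smoothing parameter.
    """
    for topic in index_vector:
        if not 0 <= topic < num_topics:
            raise ValueError("topic index out of range")
    counts = Counter(index_vector)
    denominator = len(index_vector) + num_topics * alpha
    return [(counts.get(k, 0) + alpha) / denominator for k in range(num_topics)]
```

The smoothing term keeps every topic probability positive even for topics that no word in the report was assigned to, and the resulting vector sums to 1.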
The topic probability distribution vector θ_tb reflects how the target defect report tb is distributed over the K topics, and the developer experience distribution vector calculated in step three reflects the developer's experience on those K topics. Cosine similarity is therefore used to measure the correlation between the target defect report tb and developer d, calculated as follows:
$$\mathrm{Cspd}(tb, d) = \frac{\theta_{tb} \cdot \mathrm{Exp}(d)}{|\theta_{tb}| \, |\mathrm{Exp}(d)|}$$
where Exp(d) is the experience distribution vector of developer d, d ∈ D, with D the set of all developers; θ_tb is the topic probability distribution vector of the target defect report tb; and |θ_tb| and |Exp(d)| denote the Euclidean norms of the two vectors, that is, the square root of the sum of the squares of their elements. To avoid assigning an excessive number of defect reports to highly experienced developers, workload balance among developers must also be considered, which gives the following formula for the matching degree between the current defect report tb and developer d:
$$\mathrm{Match}(tb, d) = \mathrm{Wlod}(d) \times \mathrm{Cspd}(tb, d) \tag{9}$$
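Formula (9) can be sketched in Python as the product of a workload value and the cosine similarity between the two topic-space vectors. In this sketch the workload value Wlod(d) is passed in as a precomputed number, because its defining formula is not reproduced in this text; the function names and the zero-vector convention are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # ASSUMPTION: a developer with no experience vector gets similarity 0.
        return 0.0
    return dot / (norm_u * norm_v)


def match_degree(theta_tb: Sequence[float], exp_d: Sequence[float],
                 workload_value: float) -> float:
    """Match(tb, d) = Wlod(d) x Cspd(tb, d), with Wlod(d) supplied by the caller."""
    return workload_value * cosine_similarity(theta_tb, exp_d)
```

Because cosine similarity ignores vector length, a developer's experience vector is compared by its direction in topic space only; the workload value then scales the result so that load balance influences the final score.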
The fifth step of the invention sorts the developers in descending order of their matching degree with the target defect report to complete the developer recommendation. The matching degrees between all developers and the target defect report are calculated according to formula (9), the developers are sorted from the highest to the lowest matching degree, and the top-ranked developers are recommended preferentially for assignment of the current defect report.
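The ranking step amounts to a descending sort of the matching degrees. A minimal sketch follows; the function name, the size of the recommendation list, and the alphabetical tie-breaking rule are assumptions, since the text specifies only the descending order.

```python
from typing import Dict, List, Tuple


def rank_developers(match_scores: Dict[str, float],
                    top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n developers sorted by matching degree, highest first.

    Ties are broken by developer identifier so the result is deterministic
    (an assumption; the source does not say how ties are resolved).
    """
    ranked = sorted(match_scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

For example, the scores {"alice": 0.42, "bob": 0.77, "carol": 0.42, "dave": 0.10} yield bob first, followed by alice and carol.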
The method uses a topic model to fully mine the implicit semantic information of defect reports, measures developer experience from repaired defect report data and repair times, and calculates the matching degree between developers and the target defect report while taking workload balance into account, so as to recommend suitable developers. The invention is computationally simple, general, and extensible; it can assign personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.
There are many specific ways to apply the method of the invention, and the above description is only a preferred embodiment. It should be noted that those skilled in the art can make modifications without departing from the principle of the invention, and such modifications should also be regarded as falling within the scope of protection of the invention.

Claims (6)

1. A software defect report repair personnel assignment method based on a topic model, characterized in that the method uses the topic model to mine implicit semantic information of defect reports, then measures developer experience based on repaired defect report data and repair times, and calculates the matching degree between developers and a target defect report while considering workload balance among developers, so as to recommend suitable developers;
the workload function Wlod(d) is given by the following formula:
[Formula image not reproduced: workload function Wlod(d) (Figure FDA0002820570350000011)]
wherein N(B_d) is the value obtained by normalizing the number of assigned defect reports, and μ is the work efficiency factor of the developer;
the method comprises the following five steps: 1) organizing the defect report and developer data of the software project; 2) training topic probability distribution vectors of the defect reports using a sampling method; 3) calculating the experience distribution vector of each developer from the defect reports the developer has repaired and their repair dates, and calculating a developer workload function from the defect report data assigned to the developer; 4) given a defect report, calculating the matching degree between each developer and the target defect report by combining the developer's experience distribution and workload; 5) sorting the developers in descending order of matching degree and recommending the developers with high matching degrees.
2. The software defect report repair personnel assignment method based on a topic model according to claim 1, wherein step 1) is specifically:
collecting historical defect reports of the software project from a defect tracking database, wherein the historical defect reports comprise text data describing the defects and data on the developers who handled the defect reports;
organizing the work information of the developers, wherein the work information comprises each developer's repaired defect reports and assigned defect reports; and summarizing the various documents written by the developers during development of the software project.
3. The software defect report repair personnel assignment method based on a topic model according to claim 1, wherein step 2) is specifically:
first, randomly initializing the index vectors of all defect reports, and then, based on the Gibbs sampling formula and the index vectors, sequentially calculating the probability that the i-th word in a defect report is associated with each of the K topics;
selecting one of the K topics according to these probabilities to update the index vector, and iterating this process until the index vector reaches a convergence state, that is, until the proportion of elements whose values differ between the index vector after the previous iteration and the index vector after the current iteration is smaller than a set threshold;
and finally, after the index vectors of the defect report set have reached the convergence state, calculating the topic probability distribution vectors from the final sample statistics.
4. The software defect report repair personnel assignment method based on a topic model according to claim 1, wherein step 3) is specifically:
first, describing the influence of time on developer experience using a memory squeeze function; second, defining the experience distribution vector of a developer based on the time-weighted accumulation, over the K topics, of the LDA topic probability distributions of all defect reports the developer has repaired; and finally, defining the workload function from the defect report data assigned to the developer, based on the number of defect reports assigned to the developer and the developer's work efficiency;
the memory squeeze function is defined as follows:
[Formula images not reproduced: definition of the memory squeeze function (Figures FDA0002820570350000021 and FDA0002820570350000022)]
wherein T_b is the reciprocal of the time span, in days, between the repair date bt of defect report b and the current date ct, and λ is the memory factor of the developer, which characterizes the developer's memory strength.
5. The software defect report repair personnel assignment method based on a topic model according to claim 1, wherein step 4) is specifically: first, based on the LDA model trained on the historical defect reports in step 2), calculating the index vector and the topic probability distribution vector of the target defect report that currently needs to be assigned, using the final sample data; then measuring the correlation between the target defect report and each developer using cosine similarity; and finally, taking workload balance among developers into account to obtain the matching degree between the current defect report and each developer.
6. The software defect report repair personnel assignment method based on a topic model according to claim 1, wherein step 5) is specifically: sorting the developers from the highest to the lowest matching degree obtained in step 4), wherein the top-ranked developers are recommended preferentially for assignment of the current defect report.
CN201711160414.0A 2017-11-20 2017-11-20 Software defect report repair personnel distribution method based on topic model Active CN107957929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711160414.0A CN107957929B (en) 2017-11-20 2017-11-20 Software defect report repair personnel distribution method based on topic model

Publications (2)

Publication Number Publication Date
CN107957929A CN107957929A (en) 2018-04-24
CN107957929B true CN107957929B (en) 2021-02-26

Family

ID=61963905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711160414.0A Active CN107957929B (en) 2017-11-20 2017-11-20 Software defect report repair personnel distribution method based on topic model

Country Status (1)

Country Link
CN (1) CN107957929B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165382B (en) * 2018-08-03 2022-08-23 南京工业大学 Similar defect report recommendation method combining weighted word vector and potential semantic analysis
CN109299007A (en) * 2018-09-18 2019-02-01 哈尔滨工程大学 A kind of defect repair person's auto recommending method
WO2020210947A1 (en) * 2019-04-15 2020-10-22 Entit Software Llc Using machine learning to assign developers to software defects
CN110348712A (en) * 2019-06-28 2019-10-18 北京银企融合技术开发有限公司 Software developer's configuration method, system, electronic equipment and storage medium
CN110597490A (en) * 2019-08-26 2019-12-20 珠海格力电器股份有限公司 Software development demand distribution method and device
CN113094095B (en) * 2021-03-26 2024-03-22 海信集团控股股份有限公司 Agile development progress determining method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639829A (en) * 2009-08-28 2010-02-03 中国科学院软件研究所 Software bug report and distribution method and system
CN103246603A (en) * 2013-03-21 2013-08-14 中国科学院软件研究所 Automatic distribution method for software bug reports of bug tracking system
CN103970667A (en) * 2014-05-30 2014-08-06 深圳市茁壮网络股份有限公司 Defect management platform based defect assigning method and system
CN105446734A (en) * 2015-10-14 2016-03-30 扬州大学 Software development history-based developer network relation construction method
CN106126736A (en) * 2016-06-30 2016-11-16 扬州大学 Software developer's personalized recommendation method that software-oriented safety bug repairs

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8352904B2 (en) * 2008-06-24 2013-01-08 International Business Machines Corporation Early defect removal model
US10067855B2 (en) * 2013-01-31 2018-09-04 Entit Software Llc Error developer association

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Software defect assignment method based on the LDA topic model; Huang Xiaoliang et al.; Computer Engineering; 2011-11-30; Vol. 37, No. 21; full text *

Also Published As

Publication number Publication date
CN107957929A (en) 2018-04-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant