CN107957929B

CN107957929B - Software defect report repair personnel distribution method based on topic model

Info

Publication number: CN107957929B
Application number: CN201711160414.0A
Authority: CN
Inventors: 吴芳芳; 顾庆; 陈道蓄
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2017-11-20
Filing date: 2017-11-20
Publication date: 2021-02-26
Anticipated expiration: 2037-11-20
Also published as: CN107957929A

Abstract

The invention discloses a method for assigning repair personnel to software defect reports based on a topic model. The method uses the topic model to mine the latent semantic information of defect reports, measures developer experience from the defect reports each developer has repaired and their repair times, takes the workload balance among developers into account, and calculates the degree of match between each developer and a target defect report in order to recommend suitable developers. The method is computationally simple, general, and extensible; it assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.

Description

Software defect report repair personnel distribution method based on topic model

Technical Field

The invention relates to the assignment of repair personnel to software defect reports in the field of software engineering, and in particular to a method for assigning repair personnel to software defect reports based on a topic model.

Background

Software defects are inevitable during software development and maintenance, and repairing them is difficult and consumes considerable manpower and material resources. Large software projects use a defect tracking tool and database to collect, organize, and monitor the state of defect reports. Users, developers, and testers of the software system can submit defect reports to the defect tracking database, and quality management personnel classify and assign defects according to the submitted reports. Assigning the repair task of a defect report to a suitable developer, based on the content and domain of the report and the expertise of the developers, is called defect report assignment. Accurate and timely defect report assignment plays a key role in software quality assurance and defect repair.

As software has grown explosively in scale, the number of developers has also increased sharply, making it increasingly difficult to keep track of developers' status, workload, and expertise. Assigning defect reports manually becomes a complex, error-prone, and time-consuming process, so automatic assignment methods based on machine learning or information retrieval are needed. Machine learning methods treat defect report assignment as a classification problem: the domain knowledge and text content of defect reports are the features, the developers who handled them are the labels, and previously repaired defect reports are the training data used to predict the most suitable developer for a new defect report. Information retrieval methods convert defect reports into keyword vectors. Their main idea is that developers with similar expertise and experience handle a given type of defect better, so keyword retrieval is used to assign a new defect report to developers who have repaired similar historical defects.

A topic model is a statistical model for discovering abstract topics in a large collection of documents. By relating the words in a document to topics, it represents each document as a probability distribution over a set of topics. Topic models overcome shortcomings of the document similarity calculations used in traditional information retrieval. A topic represents a concept or aspect and appears as a set of highly related words that together define it. For example, a document introducing a country often covers several aspects, such as history, geography, politics, and culture, each of which can be regarded as a topic; words such as "mountains" and "rivers" appear more often when geography is discussed, and words such as "music", "novels", and "drama" when culture is discussed. The probability distribution of a topic is a conditional probability distribution over the words in the vocabulary: the more closely a word is related to the topic, the greater its conditional probability, and vice versa. Topic models are divided into two types by training method: PLSA (Probabilistic Latent Semantic Analysis), trained with the expectation-maximization (EM) algorithm, and LDA (Latent Dirichlet Allocation), trained with Gibbs sampling.

Existing automatic assignment methods for defect reports usually ignore the influence of time, do not consider the developers' current workload during assignment, have high computational complexity, and therefore do not adapt well to real software development and maintenance processes.

Disclosure of Invention

To address these problems in the prior art, the invention provides a topic-model-based method for automatically assigning repair personnel to defect reports. The method is general and extensible, assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.

The invention adopts the following technical scheme for solving the technical problems:

1) collating the defect reports and developer data of the software project, as follows: first, historical defect reports of the software project are collected from the defect tracking database, each comprising the text describing the defect and the developer who handled the report; then the developer data are collated, including, for each developer, the defect reports already repaired and the defect reports currently assigned;

2) training the topic probability distribution vectors of the defect reports using a sampling method;

3) calculating each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and calculating each developer's workload function from the defect reports currently assigned to that developer;

4) given a target defect report, calculating the degree of match between each developer and the target defect report by combining the developer's experience distribution and workload;

5) sorting the developers by degree of match in descending order and recommending those with the highest values: once the degree of match with the target defect report has been calculated for all developers, the developers are ranked from highest to lowest, and the top-ranked developers are recommended first as repairers of the current defect report.

The process in step 2) of training the topic probability distribution vectors of the defect reports using a sampling method is as follows. First, topics are defined, each representing a function or technical point of the software system, and the number of topics is set to K; the recommended value is K = |V| × 11%, where |V| is the total number of distinct words in all defect reports. Then the historical defect reports are collected to form a set B, and the words in all defect reports are gathered into a vocabulary V = {w₁, w₂, ..., w_N}; the number of elements (words) in V is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The topic index vector z_b records the numbers of the topics associated with the words in defect report b; its dimension is n_b, the length of defect report b, that is, the total number of words in b. If the i-th element of z_b is k, the word at the i-th position in defect report b is associated with topic k, where k is a topic number and 1 ≤ k ≤ K. The topic distribution vector θ_b of defect report b is a K-dimensional vector calculated from z_b, whose k-th element is the proportion of words in b associated with topic k. Finally, the topic distribution vectors of all defect reports are calculated using the sampling method;

the process in step 2) of calculating the topic probability distribution vectors of all defect reports using the sampling method is as follows: first, for each topic k a word distribution vector of dimension |V| is defined, representing the probability distribution over the words of the vocabulary V for topic k, where |V| is the length of the vocabulary V; then prior parameter vectors α and β are defined for the topic probability distribution vector θ_b of defect report b and for the word distribution vector, respectively, where α is a K-dimensional real vector, β is a |V|-dimensional real vector, K is the number of topics, and V is the vocabulary, and all elements of α and β are set to 1;

then the topic index vector z_b of each defect report b in B is updated iteratively, where B is the historical defect report set, until the index vectors reach a converged state, that is, until the proportion of elements whose values differ between the index vectors after the previous iteration and those after the current iteration is below a threshold σ; the recommended value is σ = 0.1%;

after the index vectors z_b of the defect reports in the historical defect report set B reach the converged state, the topic probability distribution vector θ_b of each defect report b is calculated by the following formula:

$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}}$$

where n_b[k] is the number of words in defect report b associated with topic k, n_b is the length of defect report b, K is the total number of topics, and α_k is the k-th component of the prior parameter vector α;

the process in step 2) of updating the topic index vector of a defect report is as follows: given a defect report b, the probability that the i-th word in b is associated with each of the K topics is calculated in turn, where b = 1, ..., |B| and i = 1, ..., n_b, B is the historical defect report set, and n_b is the length of defect report b; the calculation formula is:

$$P(z_b[i] = k \mid z_{\neg i}) \propto \frac{n_{b,\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \times \frac{n_{\neg i}[k, j] + \beta_j}{n_{\neg i}[k] + \sum_{j'=1}^{|V|} \beta_{j'}} \qquad (1)$$

where ¬i indicates that the word at position i is excluded, n_{b,¬i}[k] is the number of other words in defect report b associated with topic k, n_{¬i}[k, j] is the total number of times the same word is associated with topic k at other positions in the historical defect report set B, n_b is the length of defect report b, n_{¬i}[k] is the number of words in B associated with topic k, K is the total number of topics, |V| is the length of the vocabulary V, α_k and β_j are the k-th and j-th components of the vectors α and β, respectively, and j is the index of the word in the vocabulary V;

based on the probabilities calculated by the above formula, one topic k is selected from the K topics according to its probability and used to update z_b[i], the i-th component of the topic index vector z_b of defect report b.
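The update described above can be sketched in Python as one collapsed Gibbs sweep over a single defect report. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function name `gibbs_sweep` and the count tables `doc_topic`, `topic_word`, and `topic_total` are hypothetical, topics are numbered from 0 for convenience, the priors are symmetric scalars (the text sets every element of α and β to 1), and the per-report denominator is omitted because it is the same for every topic.

```python
import random

def gibbs_sweep(words, z, doc_topic, topic_word, topic_total,
                vocab_size, alpha=1.0, beta=1.0, rng=random):
    """Resample the topic of every word position in one defect report.

    words:       vocabulary indices j, one per position i
    z:           topic index vector z_b (0-based topics), updated in place
    doc_topic:   doc_topic[k] = words of this report assigned to topic k
    topic_word:  topic_word[k][j] = times word j is assigned to topic k in B
    topic_total: topic_total[k] = words in B assigned to topic k
    """
    num_topics = len(doc_topic)
    for i, j in enumerate(words):
        old = z[i]
        # Exclude position i from all counts (the "not i" counts).
        doc_topic[old] -= 1
        topic_word[old][j] -= 1
        topic_total[old] -= 1
        # Unnormalised probability of each topic for this position.
        weights = [
            (doc_topic[k] + alpha)
            * (topic_word[k][j] + beta)
            / (topic_total[k] + vocab_size * beta)
            for k in range(num_topics)
        ]
        # Select one topic in proportion to the weights.
        new = rng.choices(range(num_topics), weights=weights)[0]
        z[i] = new
        doc_topic[new] += 1
        topic_word[new][j] += 1
        topic_total[new] += 1
    return z
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when comparing index vectors between iterations.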

In step 3), the experience distribution vector of a developer is calculated from the defect reports the developer has repaired and their repair dates as follows. First, a memory decay function is defined to describe the influence of time on developer experience. Given a defect report b, the memory decay function msd(b) is calculated from T_b and λ, where T_b is the reciprocal of the time period, in days, between the repair date bt of defect report b and the current date ct, and λ is the developer's memory factor, which describes the developer's memory strength;

then the developer's experience distribution vector is accumulated: given a developer d, the experience distribution vector exp(d) is calculated as:

$$\mathrm{exp}(d) = \sum_{b \in HB_d} msd(b) \times \theta_b$$

where HB_d is the set of defect reports that developer d has repaired, θ_b is the topic probability distribution vector of defect report b, and msd(b) is the memory decay function of report b; exp(d) reflects the developer's time-weighted cumulative experience on each topic.

The memory factor of the developer used in the memory decay function in step 3) is determined as follows: the memory factor λ reflects the developer's accumulated working time and represents the way experience is reinforced in experienced developers; the values of λ are given in the following table, where Y_exp is the developer's working time in years:

Development experience Y_exp (years)	λ value
Y_exp < 1	1
1 ≤ Y_exp < 4	2
4 ≤ Y_exp < 7	3
Y_exp ≥ 7	5
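The table can be read as a simple threshold lookup. The sketch below is illustrative only; the function name `memory_factor` is hypothetical, and the thresholds are copied from the table.

```python
def memory_factor(years_experience):
    """Return the memory factor lambda for a developer's experience in years."""
    if years_experience < 1:
        return 1
    if years_experience < 4:
        return 2
    if years_experience < 7:
        return 3
    return 5
```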

The process in step 3) of calculating the developer's workload function from the defect reports assigned to the developer is as follows. Let B_d denote the set of defect reports currently assigned to developer d. First, the number of assigned defect reports is normalized to obtain N(B_d) by the following formula, where |B_d′|_min and |B_d′|_max are the minimum and maximum numbers of assigned defect reports over all developers:

$$N(B_d) = \frac{|B_d| - |B_{d'}|_{\min}}{|B_{d'}|_{\max} - |B_{d'}|_{\min}}$$

then a work efficiency factor μ is defined to distinguish the work efficiency of developers with different levels of experience, as shown in the following table:

Development experience Y_exp (years)	μ value
Y_exp < 1	0.8
1 ≤ Y_exp < 4	1
4 ≤ Y_exp < 7	1.2
Y_exp ≥ 7	1.5

finally, the workload function Wlod(d) of developer d is calculated from the normalized number of assigned defect reports N(B_d) and the work efficiency factor μ;

the process in step 4) of calculating the degree of match between a developer and the target defect report is as follows. First, given a target defect report tb, its topic index vector z_tb and topic distribution vector θ_tb are calculated following the procedure of step 2);

Then the correlation Cspd(tb, d) between the target defect report tb and developer d is calculated as the cosine similarity:

$$Cspd(tb, d) = \frac{\theta_{tb} \cdot \mathrm{exp}(d)}{|\theta_{tb}| \times |\mathrm{exp}(d)|}$$

where exp(d) is the experience distribution vector of developer d, d ∈ D, D is the set of all developers, θ_tb is the topic distribution vector of the target defect report tb, and |θ_tb| and |exp(d)| are the Euclidean norms of the two vectors, that is, the square root of the sum of the squares of their elements.
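As an illustration, the cosine similarity between a report's topic vector and a developer's experience vector could be computed as below. This is a minimal sketch; the function name `cspd` mirrors the symbol in the text, and returning 0 when either vector is all zeros is an assumption the text does not address.

```python
import math

def cspd(theta_tb, exp_d):
    """Cosine similarity between a topic vector and an experience vector."""
    if len(theta_tb) != len(exp_d):
        raise ValueError("both vectors must have dimension K")
    dot = sum(t * e for t, e in zip(theta_tb, exp_d))
    norm_t = math.sqrt(sum(t * t for t in theta_tb))
    norm_e = math.sqrt(sum(e * e for e in exp_d))
    if norm_t == 0.0 or norm_e == 0.0:
        # Assumption: a developer with no recorded experience gets 0.
        return 0.0
    return dot / (norm_t * norm_e)
```

Because cosine similarity ignores vector length, a developer with many repaired reports is not favoured over one with few; only the shape of the experience distribution matters at this stage.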

Finally, the developer's workload function Wlod(d) is introduced, and the degree of match Match(tb, d) between defect report tb and developer d is calculated by the following formula:

Match(tb, d) = Wlod(d) × Cspd(tb, d) (9)

Compared with the prior art, the invention with the above technical solution has the following technical effects:

The method uses a topic model to mine the latent semantic information of defect reports, measures developer experience from repaired defect report data and repair times, and takes workload balance among developers into account when calculating the degree of match between developers and the target defect report, so as to recommend suitable developers. The invention is computationally simple, general, and extensible; it assigns personnel to defect reports quickly and effectively, improves defect repair efficiency, and is suitable for the development and maintenance of large-scale software products.

Drawings

FIG. 1 is the overall framework of the topic-model-based method for assigning repair personnel to software defect reports;

FIG. 2 is a schematic diagram of a defect report for the Eclipse Plug-in Development Environment (PDE) software;

FIG. 3 is a flow chart of topic model training based on historical defect report data.

Detailed Description

FIG. 1 shows the overall framework of the topic-model-based method for assigning repair personnel to software defect reports. The inputs of the invention are the historical defect reports and repair information of the software project, the developer data, the data on currently assigned defect reports, and the target defect report to be assigned; the output is the top-k recommended developers for the target defect report. The method comprises five steps: 1) collating the defect reports and developer data of the software project; 2) training the topic probability distribution vectors of the defect reports using a sampling method; 3) calculating each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and calculating each developer's workload function from the defect reports currently assigned to that developer; 4) given a target defect report, calculating the degree of match between each developer and the target defect report by combining the developer's experience distribution and workload; 5) sorting the developers by degree of match in descending order and recommending those with the highest values.

The first step of the invention is to collate the defect reports and developer data of the software project. Historical defect reports of the software project are first collected from the defect tracking database; they contain the text describing the defects and data on the developers who handled the defect reports. FIG. 2 is a screenshot of a repaired defect report. A defect report generally consists of a summary and a detailed description; the detailed description is the report submitter's detailed account of the defect.

The work information of the developers is then collated, mainly by compiling statistics on the defect reports each developer has repaired and the defect reports currently assigned to each developer, and by gathering the various documents the developers wrote during the development of the software project.

The second step of the invention is to train the topic probability distribution vectors of the defect reports using a sampling method. Defect reports are generally written in natural language, where synonymy and polysemy are common, and submitters may use different words to describe defects of similar types. The method therefore uses LDA (Latent Dirichlet Allocation), a topic model, to mine the latent semantic information of historical defect reports. A software system comprises many functions or technical points, such as connecting to a database or loading a file, and a defect report is generated whenever a function or technical point is found not to work properly. Each function or technical point of the software system can therefore be regarded as an abstract topic; the probability distribution over topics can be calculated for each defect report, and the experience distribution over the corresponding topics can be derived for the developers who repaired the reports. The invention uses the LDA topic model to represent each defect report as a probability distribution vector over topics.

Given a software system, the functions or technical points involved constitute K topics; the suggested value is K = |V| × 11%, where |V| is the total number of distinct words in all defect reports. All collected historical defect reports form a set B, and the words in them form a vocabulary V = {w₁, w₂, ..., w_N}; the number of elements (words) in V is determined by the collected defect reports and equals the total number of distinct words in them. Each word in a defect report is associated with one topic. The index vector z_b records the numbers of the topics associated with the words in defect report b; its dimension is n_b, the length of defect report b, that is, the total number of words in b. If the i-th element of z_b is k, the word at the i-th position in defect report b is associated with topic k, where k is an integer topic number and 1 ≤ k ≤ K. The topics of a given defect report b are represented by a K-dimensional topic probability distribution vector θ_b whose elements are normalized probability values, that is, they sum to 1. For example, θ_b = [0.3, 0.5, 0.1, …] indicates that 30% of the words in defect report b are associated with the first topic, 50% with the second topic, and so on.
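The quantities introduced in this paragraph can be illustrated with a short Python sketch. It assumes the defect reports have already been split into word lists; the function names are hypothetical, and rounding K to the nearest integer with a floor of 1 is an assumption, since the text only gives the 11% ratio.

```python
def build_vocabulary(reports):
    """Map each distinct word in the tokenised reports to an index in V."""
    vocab = {}
    for tokens in reports:
        for word in tokens:
            vocab.setdefault(word, len(vocab))
    return vocab

def suggested_topic_count(vocab_size, ratio=0.11):
    """Suggested K = |V| x 11%, rounded, never below 1 (assumption)."""
    return max(1, round(vocab_size * ratio))

def topic_proportions(z_b, num_topics):
    """Share of the words in one report associated with each topic (1..K)."""
    counts = [0] * num_topics
    for k in z_b:
        counts[k - 1] += 1
    return [c / len(z_b) for c in counts]
```

For instance, an index vector of ten words with four words on topic 1, five on topic 2, and one on topic 3 yields the proportions 0.4, 0.5, and 0.1.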

For each topic k, a word distribution vector of dimension |V| represents the probability distribution over the words of the vocabulary V for topic k, where k is an integer topic number and 1 ≤ k ≤ K. The prior parameter vectors of the topic probability distribution vector θ_b and of the word distribution vector are α and β, respectively, where α is a K-dimensional real vector and β is a |V|-dimensional real vector. Topics are assumed to be uniformly distributed over defect reports, and words uniformly distributed over topics, so all elements of α and β can be set to 1.

Gibbs sampling is a stochastic simulation sampling algorithm that offers a simple approximate method for parameter inference in high-dimensional probability models. It draws approximate samples from a given high-dimensional joint probability distribution by rotating through the dimensions: one dimension is selected at random and its value is resampled according to its conditional probability given the other dimensions, and this is repeated until the probability distribution reaches a converged state.

The LDA model is trained by using Gibbs sampling to sample the topic associated with each word in the defect reports, calculating and updating the topics of the words, and iterating the sampling process until the topic assignments in the defect reports reach a final converged state; the topic probability distribution vector θ_b of defect report b is then calculated from the samples obtained in the final iteration. The specific steps are as follows. First, the index vectors z_b of all defect reports are randomly initialized. Then, based on the Gibbs sampling formula and the index vector z_b, the probability that the i-th word in defect report b is associated with each of the K topics is calculated in turn, where b = 1, ..., |B| and i = 1, ..., n_b, B is the historical defect report set, and n_b is the length of defect report b. The probability is calculated as follows:

$$P(z_b[i] = k \mid z_{\neg i}) \propto \frac{n_{b,\neg i}[k] + \alpha_k}{n_b - 1 + \sum_{k'=1}^{K} \alpha_{k'}} \times \frac{n_{\neg i}[k, j] + \beta_j}{n_{\neg i}[k] + \sum_{j'=1}^{|V|} \beta_{j'}} \qquad (1)$$

In the above formula, ¬i indicates that the word at position i is excluded, n_{b,¬i}[k] is the number of other words in defect report b associated with topic k, n_{¬i}[k, j] is the total number of times the same word is associated with topic k at other positions in the historical defect report set B, n_{¬i}[k] is the number of words in the historical defect report set B associated with topic k, K is the total number of topics, |V| is the total number of distinct words in the historical defect report set B, α_k and β_j are the k-th and j-th components of the vectors α and β, respectively, and j is the index of the word in the vocabulary V.

Based on the probabilities calculated by equation (1), one topic k is selected from the K topics according to its probability and used to update z_b[i], where b = 1, ..., |B| and i = 1, ..., n_b. This process is iterated until the index vectors z_b reach a converged state, that is, until the proportion of elements whose values differ between the index vectors after the previous iteration and those after the current iteration is below a threshold σ; the suggested value is σ = 0.1%.
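The stopping rule can be sketched as a comparison of the index vectors from two consecutive iterations. The function names below are hypothetical, and treating an empty report set as converged is an assumption.

```python
def changed_fraction(previous, current):
    """Fraction of index-vector elements that differ between two iterations.

    previous, current: lists of topic index vectors, one per defect report.
    """
    total = 0
    changed = 0
    for z_prev, z_cur in zip(previous, current):
        total += len(z_cur)
        changed += sum(1 for a, b in zip(z_prev, z_cur) if a != b)
    return changed / total if total else 0.0

def has_converged(previous, current, sigma=0.001):
    """True when the changed fraction is below sigma (0.1% is suggested)."""
    return changed_fraction(previous, current) < sigma
```

With σ = 0.1%, a corpus of 100,000 word positions is considered converged once fewer than 100 positions change topic in an iteration.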

After the index vectors z_b of the defect reports in set B reach the converged state, the topic probability distribution vector θ_b of each defect report is calculated from the final sample statistics by the following formula:

$$\theta_b[k] = \frac{n_b[k] + \alpha_k}{n_b + \sum_{k'=1}^{K} \alpha_{k'}}$$

where n_b[k] is the number of words in defect report b associated with topic k, n_b is the length of defect report b, K is the total number of topics, and α_k is the k-th component of the prior parameter vector α.
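A possible way to compute θ_b from a converged index vector is sketched below. It assumes the usual smoothed LDA estimate θ_b[k] = (n_b[k] + α_k) / (n_b + Σ α), which uses exactly the symbols defined above; the function name `topic_distribution` is hypothetical.

```python
def topic_distribution(z_b, num_topics, alpha=None):
    """Smoothed topic distribution of one report; topics are numbered 1..K."""
    if alpha is None:
        # The text sets every element of the prior vector alpha to 1.
        alpha = [1.0] * num_topics
    counts = [0] * num_topics
    for k in z_b:
        counts[k - 1] += 1
    denom = len(z_b) + sum(alpha)
    return [(counts[k] + alpha[k]) / denom for k in range(num_topics)]
```

The prior keeps every component strictly positive, so a topic that never occurs in a short report still receives a small probability.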

The third step of the invention is to calculate each developer's experience distribution vector from the defect reports the developer has repaired and their repair dates, and to calculate each developer's workload function from the defect reports currently assigned to that developer. The number of defect reports a developer has repaired reflects the developer's level of experience in repairing defects: the more defect reports handled, the more experience the developer has and the more confidently the developer can repair a new defect. However, if a long time has passed since a defect report was repaired, the developer gradually forgets how it was repaired. A memory decay function is therefore used to describe the influence of time on developer experience. Its value lies between 0 and 1: the further the repair date of a defect report is from the current time, the smaller the function value, and the smaller the contribution of that repair to the developer's current level of experience. The memory decay function msd(b) is defined in terms of T_b and λ.

T_b is the reciprocal of the time period, in days, between the repair date bt of defect report b and the current date ct. λ is the developer's memory factor and characterizes the developer's memory strength. Senior developers who have accumulated a long time in development work have a high memory factor, because repairing a defect reinforces their past experience, whereas novice developers are accumulating new experience and so have a relatively low memory factor. The values of λ are defined in the following table:

Development experience Y_exp (years)	λ value
Y_exp < 1	1
1 ≤ Y_exp < 4	2
4 ≤ Y_exp < 7	3
Y_exp ≥ 7	5

TABLE 1 Developer memory factor λ values

The developer's experience distribution vector is defined as the time-weighted accumulation, over the K topics, of the LDA topic probability distributions of all defect reports the developer has repaired, so the experience distribution vector of developer d is defined as:

$$\mathrm{exp}(d) = \sum_{b \in HB_d} msd(b) \times \theta_b$$

where HB_d is the set of historical defect reports that developer d has repaired and θ_b is the topic probability distribution vector of defect report b. The experience distribution vector calculated by this formula thus reflects the developer's accumulated, time-weighted experience on each topic.
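The time-weighted accumulation can be sketched as a weighted sum of topic vectors. This is an illustrative sketch only: the function name `experience_vector` is hypothetical, a plain weighted sum is assumed, and the weight of each report stands in for msd(b) and is supplied by the caller.

```python
def experience_vector(repaired, num_topics):
    """Accumulate time-weighted topic vectors of one developer's repairs.

    repaired: iterable of (theta_b, weight) pairs; weight stands in for
              msd(b) and is supplied by the caller (assumption).
    """
    exp_d = [0.0] * num_topics
    for theta_b, weight in repaired:
        for k in range(num_topics):
            exp_d[k] += weight * theta_b[k]
    return exp_d
```

A report repaired recently (weight near 1) therefore contributes almost its full topic distribution, while an old repair (weight near 0) contributes little.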

If the developers' current workload is not considered, some highly experienced developers may be assigned too many defect reports while less experienced developers sit idle. This not only lengthens defect repair cycles but may even cause overloaded developers to set some defect reports aside. Therefore, to avoid assigning excessive defect reports to a few developers, a workload function is defined from the defect reports currently assigned to each developer. Let B_d denote the set of defect reports assigned to developer d. The number of assigned defect reports is first normalized by the following formula, where |B_d′|_min and |B_d′|_max are the minimum and maximum numbers of assigned defect reports over all developers:

$$N(B_d) = \frac{|B_d| - |B_{d'}|_{\min}}{|B_{d'}|_{\max} - |B_{d'}|_{\min}}$$
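The normalization of assigned report counts can be illustrated with a min-max scaling over all developers. The sketch assumes the standard min-max form, which uses only the minimum and maximum named in the text; the function name `normalized_load` is hypothetical, and mapping every developer to 0 when all loads are equal is an assumption.

```python
def normalized_load(assigned_counts):
    """Min-max normalise the number of assigned reports per developer.

    assigned_counts: dict mapping a developer id to |B_d|.
    """
    low = min(assigned_counts.values())
    high = max(assigned_counts.values())
    if high == low:
        # Assumption: identical loads all normalise to 0.
        return {d: 0.0 for d in assigned_counts}
    return {d: (n - low) / (high - low) for d, n in assigned_counts.items()}
```

The developer with the fewest assigned reports maps to 0 and the one with the most maps to 1, so loads are comparable across projects of different sizes.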

In addition, senior developers who have accumulated a long time in development work are generally more efficient than novice developers, so a work efficiency factor μ is defined to distinguish the work efficiency of developers with different levels of experience. The values of μ are defined in the following table:

Development experience Y_exp (years)	μ value
Y_exp < 1	0.8
1 ≤ Y_exp < 4	1
4 ≤ Y_exp < 7	1.2
Y_exp ≥ 7	1.5

TABLE 2 Developer work efficiency factor μ values

Finally, the workload function Wlod(d) of developer d is defined in terms of the number of defect reports |B_d| assigned to developer d and the work efficiency factor μ.

The fourth step of the invention is, given a target defect report, to calculate the degree of match between each developer and the target defect report by combining the developer's experience distribution and workload. First, using the final sample data from the LDA training on historical defect reports in the second step, the index vector z_tb and the topic probability distribution vector θ_tb of the target defect report tb to be assigned are calculated.

The topic probability distribution vector θ_tb reflects how the target defect report tb is distributed over the K topics, and the developer's experience distribution vector calculated in the third step reflects the developer's experience on the K topics. Cosine similarity is therefore used to measure the correlation between the target defect report tb and developer d:

$$Cspd(tb, d) = \frac{\theta_{tb} \cdot \mathrm{exp}(d)}{|\theta_{tb}| \times |\mathrm{exp}(d)|}$$

where exp(d) is the experience distribution vector of developer d, d ∈ D, D is the set of all developers, θ_tb is the topic probability distribution vector of the target defect report tb, and |θ_tb| and |exp(d)| are the Euclidean norms of the two vectors, that is, the square root of the sum of the squares of their elements. To avoid assigning excessive defect reports to highly experienced developers, workload balance among developers must be taken into account, so the degree of match between the current defect report tb and developer d is calculated as:

Match(tb, d) = Wlod(d) × Cspd(tb, d) (9)

The fifth step of the invention is to sort the developers in descending order of their degree of match with the target defect report and complete the recommendation. The degree of match between every developer and the target defect report is calculated according to formula (9), the developers are sorted from highest to lowest, and the top-ranked developers are recommended first for assignment to the current defect report.
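Formula (9) and the ranking step can be sketched together as follows. The function name `recommend` and its parameters are hypothetical, and the per-developer values of Wlod(d) and Cspd(tb, d) are taken as already computed inputs.

```python
def recommend(cspd_by_dev, wlod_by_dev, top_k=3):
    """Rank developers by Match(tb, d) = Wlod(d) x Cspd(tb, d), highest first."""
    match = {d: wlod_by_dev[d] * cspd_by_dev[d] for d in cspd_by_dev}
    ranked = sorted(match.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

In the example inputs `{"a": 0.9, "b": 0.8, "c": 0.2}` for Cspd and `{"a": 0.5, "b": 1.0, "c": 1.0}` for Wlod, developer "a" has the highest correlation but is ranked below "b" once the workload term is applied.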

The method of the invention can be applied in many specific ways, and the above is only a preferred embodiment. It should be noted that those skilled in the art can make modifications without departing from the principle of the invention, and such modifications also fall within the scope of protection of the invention.

Claims

1. A method for assigning repair personnel to software defect reports based on a topic model, characterized in that the method uses the topic model to mine the latent semantic information of defect reports, measures developer experience from repaired defect report data and repair times, and takes workload balance among developers into account when calculating the degree of match between developers and a target defect report, so as to recommend suitable developers;

the workload function Wlod(d) is calculated from N(B_d) and μ, where N(B_d) is the normalized number of assigned defect reports and μ is the developer's work efficiency factor;

the method comprises the following five steps: 1) collating the defect report and developer data of the software project; 2) training the topic probability distribution vectors of the defect reports by using a sampling method; 3) calculating the experience distribution vector of each developer by combining the defect reports repaired by the developer with their repair dates, and calculating the developer workload function based on the data of the defect reports assigned to the developer; 4) given a target defect report, calculating the matching degree between each developer and the target defect report by combining the experience distribution and the workload of the developer; 5) sorting the developers in descending order of matching degree, and recommending the developers with high matching degrees.

2. The software defect report repairer allocation method based on a topic model according to claim 1, wherein step 1) is specifically as follows:

collecting the historical defect reports of the software project from a defect tracking database, wherein the historical defect reports comprise text data describing the defects and data on the developers who handled the defect reports;

collating the work information of the developers, wherein the work information comprises the defect reports repaired by each developer and the defect reports assigned to each developer; and collecting the various documents written by the developers during the development of the software project.

3. The software defect report repairer allocation method based on a topic model according to claim 1, wherein step 2) is specifically as follows:

first, randomly initializing the index vectors of all defect reports, and then, based on the Gibbs sampling formula and the index vectors, sequentially calculating the probability that the i-th word in a defect report is associated with each of the K topics;

selecting one of the K topics according to these probabilities to update the index vector, and iterating this process until the index vector reaches a convergence state, namely until the proportion of elements whose values differ between the index vector after the previous iteration and the index vector after the current iteration is smaller than a set threshold;

and finally, after the index vectors of the defect report set reach the convergence state, calculating the topic probability distribution vectors based on the final sample statistics.
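
For illustration only, the sampling procedure of this step can be sketched in Python as below. The sketch assumes the standard collapsed Gibbs update for LDA, because the exact Gibbs sampling formula is not restated here; the hyperparameters alpha and beta, the max_iters cap and the function name are likewise assumptions. Each defect report is given as a list of integer word ids.

```python
import random
from typing import List


def gibbs_lda(docs: List[List[int]], vocab_size: int, K: int,
              alpha: float = 0.1, beta: float = 0.01,
              threshold: float = 0.05, max_iters: int = 200,
              seed: int = 0) -> List[List[float]]:
    """Return one topic probability distribution vector per defect report."""
    rng = random.Random(seed)
    # Randomly initialize the index (topic assignment) vectors.
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    n_dk = [[0] * K for _ in docs]               # topic counts per report
    n_kw = [[0] * vocab_size for _ in range(K)]  # word counts per topic
    n_k = [0] * K                                # total words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    total = sum(len(doc) for doc in docs)
    for _ in range(max_iters):
        changed = 0
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                old = z[d][i]
                # Remove the current assignment from the counts.
                n_dk[d][old] -= 1
                n_kw[old][w] -= 1
                n_k[old] -= 1
                # Unnormalized probability of each of the K topics.
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                           / (n_k[k] + vocab_size * beta) for k in range(K)]
                new = rng.choices(range(K), weights=weights)[0]
                z[d][i] = new
                n_dk[d][new] += 1
                n_kw[new][w] += 1
                n_k[new] += 1
                changed += (new != old)
        # Converged when the share of changed index elements is below the threshold.
        if total and changed / total < threshold:
            break
    # Topic distribution of each report from the final sample statistics.
    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]
```

Because sampling is random, the share of changed elements may never drop below a very small threshold, which is why the sketch also caps the number of iterations.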

4. The software defect report repairer allocation method based on a topic model according to claim 1, wherein step 3) is specifically as follows:

first, describing the influence of the time factor on the experience of a developer by using a memory compression function; second, defining the experience distribution vector of the developer as the time-weighted accumulation, over each of the K topics, of the LDA topic probability distributions of all defect reports repaired by the developer; and finally, defining the workload function based on the number of defect reports assigned to the developer and the work efficiency of the developer;

the memory compression function is defined by the following formula:

wherein

T_b denotes the reciprocal of the time period, measured in days, between the repair date bt of defect report b and the current date ct, and λ is the memory factor of the developer, which describes the memory strength of the developer.
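
As a hedged illustration of the time-weighted accumulation, the Python sketch below builds an experience distribution vector from the topic distributions of a developer's repaired defect reports. The formula of the memory compression function is not reproduced in this text, so the sketch takes it as a caller-supplied function weight_fn of T_b. The function names, the use of integer day numbers and the clamping of the elapsed time to at least one day are assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def experience_vector(
    repaired: Sequence[Tuple[Sequence[float], int]],
    current_day: int,
    weight_fn: Callable[[float], float],
) -> List[float]:
    """Accumulate time-weighted topic distributions of repaired reports.

    repaired: (theta_b, bt) pairs, where theta_b is the K-dimensional topic
    distribution of report b and bt is its repair date as a day number.
    weight_fn: stand-in for the memory compression function, applied to T_b.
    """
    if not repaired:
        return []
    K = len(repaired[0][0])
    exp_d = [0.0] * K
    for theta_b, repair_day in repaired:
        # Assumption: clamp to one day so that T_b is always defined.
        elapsed_days = max(current_day - repair_day, 1)
        t_b = 1.0 / elapsed_days  # reciprocal of the elapsed time in days
        weight = weight_fn(t_b)
        for k in range(K):
            exp_d[k] += weight * theta_b[k]
    return exp_d


# Identity weighting is used purely for illustration.
print(experience_vector([([0.5, 0.5], 8), ([1.0, 0.0], 9)], 10, lambda t: t))
```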

5. The software defect report repairer allocation method based on a topic model according to claim 1, wherein step 4) is specifically as follows: first, based on the LDA model training process for the historical defect reports in step 2), calculating the index vector and the topic probability distribution vector of the target defect report currently to be assigned by using the final sample data; then measuring the correlation between the target defect report and each developer by using cosine similarity; and finally, taking the workload balance among the developers into account to obtain the matching degree between the current defect report and each developer.

6. The software defect report repairer allocation method based on a topic model according to claim 1, wherein step 5) is specifically as follows: sorting the developers from the largest to the smallest according to the matching degrees obtained in step 4), wherein the top-ranked developers are regarded as those preferentially recommended for assignment to the current defect report.