CN108647863B - Project popularity analysis method based on mixed effect linear regression model - Google Patents



Publication number
CN108647863B
Authority
CN
China
Prior art keywords
issue
feature
project
bug
popularity
Legal status
Active
Application number
CN201810377403.6A
Other languages
Chinese (zh)
Other versions
CN108647863A (en)
Inventor
常俊胜
胡东阳
王涛
余跃
王怀民
尹刚
李耀宗
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date 2018-04-23
Filing date 2018-04-25
Publication date 2020-10-27
Application filed by National University of Defense Technology
Publication of CN108647863A
Application granted
Publication of CN108647863B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Stored Programmes (AREA)

Abstract

Existing research studies defect (bug) reports and feature reports separately and therefore evaluates project popularity one-sidedly. To address this, the invention provides a project popularity analysis method based on a mixed-effect linear regression model. Project data are collected from GitHub, and statistical analysis and regression modeling are then used to determine how the number of defect reports and the number of feature reports in a project influence its popularity; comparing the two influence factors shows how each kind of report relates to improvements in project popularity. Further, a four-dimensional analysis of the description diversity of defect reports and feature reports identifies how the two differ in description diversity. By comparing the effects of the number of defect reports and the number of feature reports within projects, the method studies project popularity comprehensively, so that it can be evaluated as a whole.

Description

Project popularity analysis method based on mixed effect linear regression model
Technical Field
The invention belongs to the field of open-source software analysis, and in particular relates to a method for analyzing how defect reports (bug issues) and feature reports (feature issues) influence project popularity during project development.
Background
Software development is a complex process involving many steps and many developers. During development, code defects (bugs) arise and new functions (features) are proposed, so bug reports (bug issues) and feature reports (feature issues) are two very important elements of software project development.
Projects with different goals and requirements can contain different numbers of bug issues and feature issues, and these differences can affect project development, for example the popularity of the project. Existing research mainly studies bug issues and feature issues separately, so its judgment of project popularity is one-sided.
GitHub is an open-source code hosting website with millions of users. It allows developers to create and manage projects, and it hosts information on a very large number of software projects. GitHub provides features such as followers, feeds, network graphs, and issues, which help developers manage their code repositories. GitHub issues can carry a wide variety of labels, and GitHub itself neither defines nor distinguishes defect labels and feature labels; by checking the labels of an issue, however, bug issues and feature issues can be identified automatically. GitHub projects can be retrieved through REST APIs, and they span many application domains (such as game software, web applications, and operating systems). Projects also vary in programming language and in the number of developers. These characteristics make GitHub a very attractive open-source platform from which to collect data for empirical studies. In previous work, GitHub data have been used to study programming languages, project popularity at large scale, and software testing.
Unlike an ordinary linear model, a mixed-effect linear regression model contains random effects in addition to fixed effects, which makes the regression more comprehensive and more robust to noise. Mixed-effect linear regression models are also known as multilevel linear models or hierarchical linear models.
The mixed-effect linear regression model is:
$$Y = X\beta + ZU + \varepsilon$$
where Y is the dependent-variable vector, X is the independent-variable matrix, β is the fixed-effect parameter vector corresponding to X, Z is the random-effect variable matrix, structured in the same way as X, U is the random-effect parameter vector corresponding to Z, and ε is the noise vector.
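As an illustration only (not part of the claimed method), the sketch below evaluates the formula for toy data. It assumes Z is a 0/1 matrix that assigns each observation to one group, which yields a random intercept per group (for example, per project). The function name and the data are hypothetical.

```python
from typing import List, Sequence


def mixed_model_response(
    X: Sequence[Sequence[float]],  # n x p fixed-effect design matrix
    beta: Sequence[float],         # p fixed-effect coefficients
    Z: Sequence[Sequence[float]],  # n x q random-effect design matrix
    U: Sequence[float],            # q random-effect values
    eps: Sequence[float],          # n noise terms
) -> List[float]:
    """Evaluate Y = X*beta + Z*U + eps one observation (row) at a time."""
    y = []
    for i, x_row in enumerate(X):
        fixed = sum(x * b for x, b in zip(x_row, beta))
        random_part = sum(z * u for z, u in zip(Z[i], U))
        y.append(fixed + random_part + eps[i])
    return y


# Hypothetical toy data: 4 observations, an intercept plus one covariate,
# and Z assigning observations 0-1 to group 0 and observations 2-3 to group 1.
X = [[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0]]
beta = [2.0, 1.0]
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]
U = [0.3, -0.3]
eps = [0.0, 0.0, 0.0, 0.0]
y = mixed_model_response(X, beta, Z, U, eps)
```

Observations in the same group share one entry of U, which is what distinguishes the random-effect part from the fixed-effect part Xβ.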
At present, there is no method that analyzes project popularity using a mixed-effect linear regression model.
Disclosure of Invention
Most existing research studies bug issues and feature issues separately, and little of it examines project popularity by comparing the numbers of bug issues and feature issues within a project, so project popularity cannot be evaluated comprehensively. To address this, the invention provides a project popularity analysis method based on a mixed-effect linear regression model. Further, by analyzing the description diversity of bug issues and feature issues along four dimensions, the method identifies how the two differ in description diversity.
The technical scheme is as follows:
The project popularity analysis method based on the mixed-effect linear regression model comprises the following steps:
Step one: collect project data from GitHub to build a data set, as follows:
1.1 Randomly select F projects from GitHub, where F is a natural number whose value is set according to the required accuracy of the results;
1.2 Select data from the projects: select all issues in the F projects, denote their number by S (a natural number), and compute statistics for the S issues;
Step two: construct the mixed-effect linear regression model, as follows:
2.1 Define the dependent and independent variables of the mixed-effect linear regression model:
nStars: the total number of stars of a project, an indicator of the project's popularity.
AVG.timeLatency_bug: the average resolution time of bug issues in the project, reflecting how quickly bug issues are resolved, in minutes.
AVG.timeLatency_feature: the average resolution time of feature issues in the project, reflecting how quickly feature issues are resolved, in minutes.
AVG.comments_bug: the average number of comments on bug issues in the project.
AVG.comments_feature: the average number of comments on feature issues in the project.
nIssueBef: the number of issues created in the project during the 3 months before the current issue was opened, representing the project's workload.
nMembers: the total number of project members, representing the project's team size.
hasAssignee: binary; 1 if the issue has at least one assignee, otherwise 0.
textLen: the total number of words in the issue text, representing the complexity of the issue.
issueType=bug: the issue is a bug issue.
issueType=feature: the issue is a feature issue.
Here nStars is the dependent variable, and the remaining variables are independent variables.
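The project-level variables above have to be derived from per-issue records. A minimal sketch, assuming each issue is a dict with the hypothetical keys `type`, `latency_min`, `comments` and `created`, and approximating "3 months" as 90 days (the text does not say how months are counted):

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List

THREE_MONTHS = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def project_averages(issues: List[dict]) -> Dict[str, float]:
    """Average resolution time (minutes) and comment count per issue type."""
    out: Dict[str, float] = {}
    for kind in ("bug", "feature"):
        subset = [i for i in issues if i["type"] == kind]
        if subset:  # skip a type with no issues instead of dividing by zero
            out[f"AVG.timeLatency_{kind}"] = mean(i["latency_min"] for i in subset)
            out[f"AVG.comments_{kind}"] = mean(i["comments"] for i in subset)
    return out


def n_issue_bef(issues: List[dict], current: dict) -> int:
    """Issues of the same project opened in the 90 days before `current` was opened."""
    start = current["created"] - THREE_MONTHS
    return sum(1 for i in issues if start <= i["created"] < current["created"])


# Hypothetical issues of a single project.
issues = [
    {"type": "bug", "latency_min": 60, "comments": 2, "created": datetime(2018, 1, 1)},
    {"type": "bug", "latency_min": 120, "comments": 4, "created": datetime(2018, 2, 1)},
    {"type": "feature", "latency_min": 300, "comments": 1, "created": datetime(2018, 3, 1)},
    {"type": "other", "latency_min": 10, "comments": 0, "created": datetime(2017, 6, 1)},
]
averages = project_averages(issues)
```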
2.2 Obtain the data for the dependent and independent variables defined in step 2.1 through the official application programming interface (API) provided by GitHub;
2.3 Using the data obtained in step 2.2, construct the mixed-effect linear regression model with the lmer function of the R package lme4; the model is specified as follows:
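library(lme4)  # provides lmer()
model <- lmer(scale(log(nStars + 0.5)) ~
                scale(log(AVG.timeLatency_bug. + 0.5))
              + scale(log(AVG.timeLatency_feature. + 0.5))
              + scale(log(AVG.comments_bug. + 0.5))
              + scale(log(AVG.comments_feature. + 0.5))
              + scale(log(nIssueBef + 0.5))
              + scale(log(nMembers + 0.5))
              + scale(log(hasAssignee + 0.5))
              + scale(log(textLen + 0.5))
              + I(issueType == "bug")      # indicator: the issue is a bug issue
              + I(issueType == "feature")  # indicator: the issue is a feature issue
              # Assumption: random intercept per project. lmer() requires at least one
              # random-effects term; the grouping column name `project` is hypothetical.
              + (1 | project),
              data = data, REML = FALSE)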
The lmer function of the R package lme4 is common general knowledge in the art.
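The `+0.5` offset keeps `log()` defined for zero values (for example, a project with no stars, or hasAssignee = 0), and `scale()` then centers each term and divides it by its standard deviation so that the coefficients are comparable. A sketch of the same transform in Python, assuming R's default `scale(center = TRUE, scale = TRUE)` behavior, which divides by the sample (n - 1) standard deviation; the function name is hypothetical:

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def scale_log(values: Sequence[float], offset: float = 0.5) -> List[float]:
    """Mirror R's scale(log(x + 0.5)): log-transform, center, divide by sample sd."""
    logged = [math.log(v + offset) for v in values]
    mu = mean(logged)
    sd = stdev(logged)  # n - 1 denominator, matching R's scale()
    return [(v - mu) / sd for v in logged]


# Hypothetical star counts spanning several orders of magnitude, including zero.
z = scale_log([0, 1, 10, 100, 1000])
```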
Step three: perform an analysis of variance on the model to obtain the multiple regression results, including the estimates with their standard errors and the sum of squares of each term, and from these compute the variance contribution rates of the number of bug issues and the number of feature issues in the project, i.e., their degree of influence on project popularity. The variance contribution rate of the number of bug issues is the sum of squares of the term "issueType=bug" divided by the total of the sums of squares of all independent variables;
the variance contribution rate of the number of feature issues is the sum of squares of the term "issueType=feature" divided by the total of the sums of squares of all independent variables.
If the variance contribution rate of the number of bug issues is greater than that of the number of feature issues, the number of bug issues in the project has the greater influence on project popularity; otherwise, the number of feature issues has the greater influence.
As a further improvement of the technical scheme of the invention, project developers continue by analyzing the description diversity of bug issues and feature issues to find how the two differ in description diversity. The process is as follows:
4.1 Randomly draw M bug issues and N feature issues from the S issues, where M and N are natural numbers and M + N does not exceed S.
4.2 Project developers read the web page of each issue and mark its keywords and sentences.
4.3 Compare the description diversity of bug issues and feature issues using four attributes: code segment (code), link (https), @, and picture (picture), as follows:
4.3.1 Open the web page of each sampled issue and record the four attributes: if the issue page contains a code segment, set the code segment label to 1, otherwise 0; if it contains an https link, set the https link label to 1, otherwise 0; if it contains @, set the @ label to 1, otherwise 0; if it contains picture content, set the picture label to 1, otherwise 0.
4.3.2 Calculate, for each of the four attributes, the proportion of issues labeled 1 among the M bug issues and among the N feature issues;
4.4 If the proportions of the four attributes labeled 1 are higher among bug issues than among feature issues, bug issues have higher description diversity than feature issues, and bug issues have the greater influence on project popularity; otherwise, if the proportions are higher among feature issues than among bug issues, feature issues have higher description diversity than bug issues, and feature issues have the greater influence on project popularity; otherwise, bug issues and feature issues have a similar influence on project popularity.
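Steps 4.3.2 and 4.4 can be sketched as follows. The text does not say whether "higher" must hold on every attribute or only in aggregate; this sketch assumes all four attributes must agree and otherwise reports a similar influence. Attribute keys and function names are hypothetical.

```python
from typing import Dict, List

ATTRIBUTES = ("code", "https", "@", "picture")


def attribute_proportions(samples: List[Dict[str, int]]) -> Dict[str, float]:
    """Step 4.3.2: share of sampled issues whose 0/1 label is 1, per attribute."""
    n = len(samples)
    return {a: sum(s[a] for s in samples) / n for a in ATTRIBUTES}


def compare_diversity(bug: Dict[str, float], feature: Dict[str, float]) -> str:
    """Step 4.4, assuming 'higher' must hold on all four attributes."""
    if all(bug[a] > feature[a] for a in ATTRIBUTES):
        return "bug"      # bug issues show higher description diversity
    if all(feature[a] > bug[a] for a in ATTRIBUTES):
        return "feature"  # feature issues show higher description diversity
    return "similar"


# Proportions reported in Table 4 of the detailed description.
bug_ratios = {"code": 0.34, "https": 0.52, "@": 0.16, "picture": 0.14}
feature_ratios = {"code": 0.13, "https": 0.24, "@": 0.06, "picture": 0.02}
print(compare_diversity(bug_ratios, feature_ratios))  # prints: bug
```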
As a further improvement of the technical solution of the present invention, in step 1.1, to ensure the generality of the experimental data, projects are selected subject to the following restriction: each selected project contains at least 10 bug issues and at least 10 feature issues.
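The selection restriction amounts to a simple filter over per-project issue counts. A sketch with hypothetical names, reading the threshold as "at least 10" issues of each type:

```python
from typing import Dict, List

MIN_PER_TYPE = 10  # at least 10 bug issues and at least 10 feature issues


def eligible_projects(counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Return, sorted by name, projects whose bug and feature counts both reach the threshold."""
    return sorted(
        name
        for name, c in counts.items()
        if c.get("bug", 0) >= MIN_PER_TYPE and c.get("feature", 0) >= MIN_PER_TYPE
    )


# Hypothetical per-project counts of labeled issues.
counts = {
    "proj-a": {"bug": 12, "feature": 10},
    "proj-b": {"bug": 50, "feature": 3},
    "proj-c": {"bug": 10, "feature": 11},
}
selected = eligible_projects(counts)
```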
As a further improvement of the technical solution of the present invention, the issue statistics in step 1.2 include:
(1) key indicators of the project to which each issue belongs, including the project language, the number of project forks, the number of project stars, the number of project members, and the number of issues created in the project in the three months before the current issue;
(2) key indicators of each issue, including the length of its title and body, the number of comments, the creation time, the closing time, and whether the issue is assigned;
(3) information about the developer who submitted each issue, including a flag set to 1 if the developer had submitted an issue before this one, and 0 otherwise.
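Item (3) requires knowing, for each issue, whether its author had opened an issue before. One way to compute that flag is a single pass in creation order. This sketch assumes the check is made within the collected data set (the text does not specify the scope) and uses the hypothetical dict keys `id`, `author` and `created`:

```python
from typing import Dict, Hashable, List


def prior_submission_flags(issues: List[dict]) -> Dict[Hashable, int]:
    """Map issue id -> 1 if its author opened an earlier issue in `issues`, else 0.

    'created' may be any mutually comparable timestamps (datetimes, epoch numbers)."""
    seen_authors = set()
    flags: Dict[Hashable, int] = {}
    for issue in sorted(issues, key=lambda i: i["created"]):
        flags[issue["id"]] = 1 if issue["author"] in seen_authors else 0
        seen_authors.add(issue["author"])
    return flags


# Hypothetical issues, deliberately not in creation order.
issues = [
    {"id": 1, "author": "alice", "created": 3},
    {"id": 2, "author": "bob", "created": 1},
    {"id": 3, "author": "alice", "created": 2},
]
flags = prior_submission_flags(issues)
```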
As a further improvement of the technical solution of the present invention, in step 1.2, to ensure reliable results, the following restriction applies to issue data: the processing time of an issue is measured only as the time from its creation to its first closing.
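Because an issue can be closed, reopened and closed again, this restriction needs the earliest closing time rather than just any close timestamp. A sketch, assuming the close timestamps have already been retrieved (for example from the issue's event history) as ISO 8601 strings such as "2018-04-23T10:00:00Z"; the function names are hypothetical:

```python
from datetime import datetime
from typing import Iterable


def parse_github_time(ts: str) -> datetime:
    """Parse a UTC ISO 8601 timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latency_to_first_close(created_at: str, close_times: Iterable[str]) -> float:
    """Minutes from creation to the earliest close; later reopen/close cycles are ignored."""
    created = parse_github_time(created_at)
    first_close = min(parse_github_time(t) for t in close_times)
    return (first_close - created).total_seconds() / 60


# Hypothetical issue closed after 2.5 hours, then reopened and closed again two days later.
minutes = latency_to_first_close(
    "2018-04-23T10:00:00Z", ["2018-04-25T10:00:00Z", "2018-04-23T12:30:00Z"]
)
```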
Compared with the prior art, the invention has the following beneficial effects:
● The influence of bug issues and feature issues on project popularity is analyzed with a mixed-effect linear regression model, and project popularity is studied by comparing the numbers of bug issues and feature issues in the project, so that it can be evaluated comprehensively.
● Analyzing the description diversity of bug issues and feature issues along four dimensions reveals how the two differ; on this basis, project developers can be advised to improve project popularity by increasing the description diversity of bug issues or feature issues.
Drawings
FIG. 1 is a general flow diagram of the present invention.
Detailed Description
Step one: collect project data from GitHub to build a data set, as follows:
1.1 Select projects
The data for this study come from GitHub: 272 GitHub projects were randomly selected. To ensure reliable experimental results and the generality of the experimental data, every selected project contains at least 10 bug issues and at least 10 feature issues. Table 1 lists example labels for bug issues and feature issues; issues carrying these labels are treated as bug issues or feature issues, respectively.
TABLE 1 Labels used to identify bug issues and feature issues
Issue type       Example labels
bug issue        bug; defect; type: bug; Browser Bug; bugfix; etc.
feature issue    feature; request; proposal; featreq; feautre; etc.
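Issues can be sorted into the two classes by matching their label names against lists like those in Table 1. A minimal sketch, assuming case-insensitive exact matching and treating issues that carry both kinds of label as unclassified; the label sets below are illustrative, not the complete lists used in the study, and the function name is hypothetical.

```python
from typing import Iterable, Optional

# Illustrative label sets modeled on the Table 1 examples (lower-cased).
BUG_LABELS = {"bug", "defect", "type: bug", "browser bug", "bugfix"}
FEATURE_LABELS = {"feature", "request", "proposal", "featreq", "feautre"}


def classify_issue(labels: Iterable[str]) -> Optional[str]:
    """Return 'bug', 'feature', or None from an issue's label names."""
    names = {name.strip().lower() for name in labels}
    is_bug = bool(names & BUG_LABELS)
    is_feature = bool(names & FEATURE_LABELS)
    if is_bug and not is_feature:
        return "bug"
    if is_feature and not is_bug:
        return "feature"
    return None  # no matching label, or ambiguous (both kinds of label)
```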
1.2 Select data
To ensure reliable experimental results, the processing time of an issue is taken only as the time difference between its creation and its first closing. On this basis, 287,703 issues were selected from the 272 projects. For the project to which each issue belongs, we recorded key indicators including the project language, the number of forks, the number of stars, and the number of members. For each of the 287,703 issues, we recorded the length of the title and body, the number of comments, the creation time, the closing time, and whether the issue is assigned. We also recorded information about the developer who submitted each issue, including a flag set to 1 if the developer had submitted an issue before this one, and 0 otherwise. Table 2 summarizes the statistics of the 287,703 issues.
TABLE 2 Summary statistics of the 287,703 issues
Statistic           Mean       Std. dev.   Minimum   Median    Maximum
Project members     13.4       30.3        0.0       4.0       175.0
Project forks       1,609      3,709.0     0.0       463.0     49,657.0
Project stars       5,839.0    10,446.4    0.0       1,251.0   69,834.0
Total issues        1,062.0    1,038.3     1.0       764.0     7,910.0
Bug issues          962.8      986.2       1.0       704.0     7,910.0
Feature issues      239.2      309.5       1.0       160.0     2,139.0
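Each row of Table 2 is a standard summary (mean, standard deviation, minimum, median, maximum). A sketch with a hypothetical function name, assuming the standard deviation is the sample (n - 1) version, which the table does not state:

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, minimum, median and maximum of one variable."""
    return {
        "mean": mean(values),
        "sd": stdev(values),  # assumption: n - 1 denominator
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }


# Hypothetical member counts of four projects.
row = summarize([1, 2, 3, 4])
```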
Step two: construct the mixed-effect linear regression model, as follows:
2.1 Define the dependent and independent variables of the regression model. The independent variables come from three levels: the project level, the developer level, and the issue level:
nStars: the total number of stars of a project, an indicator of the project's popularity.
AVG.timeLatency_bug: the average resolution time of bug issues in the project, reflecting how quickly bug issues are resolved, in minutes.
AVG.timeLatency_feature: the average resolution time of feature issues in the project, reflecting how quickly feature issues are resolved, in minutes.
AVG.comments_bug: the average number of comments on bug issues in the project.
AVG.comments_feature: the average number of comments on feature issues in the project.
nIssueBef: the number of issues created in the project during the 3 months before the current issue was opened, representing the project's workload.
nMembers: the total number of project members, representing the project's team size.
hasAssignee: binary; 1 if the issue has at least one assignee, otherwise 0.
textLen: the total number of words in the issue text, representing the complexity of the issue.
issueType=bug: the issue is a bug issue.
issueType=feature: the issue is a feature issue.
Here nStars is the dependent variable, and the remaining variables are independent variables.
2.2 Because GitHub provides an official API, experimenters can conveniently obtain historical data on project development activity on GitHub. All variable data in the experiment, i.e., the dependent and independent variables defined in step 2.1, were therefore obtained through the official GitHub API.
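The text does not name the endpoints used. As one possibility, the GitHub REST API's repository issues endpoint (`GET /repos/{owner}/{repo}/issues`) returns issues page by page; it also returns pull requests, which carry a `pull_request` key and can be dropped. A sketch using only the standard library; the function names are hypothetical, and walking through all pages is left to the caller:

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def issues_page_url(owner: str, repo: str, page: int, per_page: int = 100) -> str:
    """URL for one page of a repository's issues, both open and closed."""
    query = urlencode({"state": "all", "per_page": per_page, "page": page})
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def drop_pull_requests(items: List[dict]) -> List[dict]:
    """Keep only real issues; pull requests returned by the endpoint have 'pull_request'."""
    return [item for item in items if "pull_request" not in item]


def fetch_issues_page(owner: str, repo: str, page: int, token: str = "") -> List[dict]:
    """Fetch one page over the network (unauthenticated requests are heavily rate-limited)."""
    req = Request(issues_page_url(owner, repo, page),
                  headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urlopen(req) as resp:
        return drop_pull_requests(json.load(resp))
```

Each returned issue carries its label names under `labels`, which is where a label-based bug/feature classification would start.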
2.3 Using the data obtained in step 2.2, construct the mixed-effect linear regression model with the lmer function of the R package lme4; the model is specified as follows:
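library(lme4)  # provides lmer()
model <- lmer(scale(log(nStars + 0.5)) ~
                scale(log(AVG.timeLatency_bug. + 0.5))
              + scale(log(AVG.timeLatency_feature. + 0.5))
              + scale(log(AVG.comments_bug. + 0.5))
              + scale(log(AVG.comments_feature. + 0.5))
              + scale(log(nIssueBef + 0.5))
              + scale(log(nMembers + 0.5))
              + scale(log(hasAssignee + 0.5))
              + scale(log(textLen + 0.5))
              + I(issueType == "bug")      # indicator: the issue is a bug issue
              + I(issueType == "feature")  # indicator: the issue is a feature issue
              # Assumption: random intercept per project. lmer() requires at least one
              # random-effects term; the grouping column name `project` is hypothetical.
              + (1 | project),
              data = data, REML = FALSE)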
Here nStars is the dependent variable, issueType=bug and issueType=feature are the main independent variables of interest, and the other variables are all treated as random-effect variables.
Step three: perform an analysis of variance on the model to obtain the multiple regression results, including the estimates with their standard errors and the sum of squares of each term. Table 3 shows the multiple regression results:
TABLE 3 Multiple regression analysis results
Variable                             Estimate (standard error)   Sum of squares
log(AVG.timeLatency_bug.+0.5)        -0.2173 (0.0020)            4,827.8
log(AVG.timeLatency_feature.+0.5)    -0.1838 (0.0019)            3,443.4
log(AVG.comments_bug.+0.5)           0.1747 (0.0025)             3,112.2
log(AVG.comments_feature.+0.5)       0.1025 (0.0037)             2,413.4
log(nIssueBef+0.5)                   0.0403 (0.0075)             23.6
log(nMembers+0.5)                    0.0633 (0.0052)             121.2
log(hasAssignee+0.5)                 0.0690 (0.0023)             765.6
log(textLen+0.5)                     0.0149 (0.0019)             51.8
issueType=bug                        0.4292 (0.0063)             12,784.5
issueType=feature                    0.1107 (0.0024)             2,023.8
The statistical analysis shows that nStars is positively associated with both the bug and the feature issue types: the more bug issues and feature issues a project has, the higher its nStars and thus its popularity. The variance contribution rate of issueType=bug is 12,784.5 / (4,827.8 + 3,443.4 + … + 12,784.5 + 2,023.8) = 12,784.5 / 29,567.3 ≈ 43.2%, and that of issueType=feature is 2,023.8 / 29,567.3 ≈ 6.8%. Since 43.2% is much higher than 6.8%, the number of bug issues in a project has a greater influence on project popularity than the number of feature issues.
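The variance contribution rates follow directly from the sums of squares in Table 3. The sketch below reproduces only that arithmetic (it does not refit the model); the function name is hypothetical.

```python
from typing import Dict

# Sum of squares per term, copied from Table 3.
SUM_SQ: Dict[str, float] = {
    "log(AVG.timeLatency_bug.+0.5)": 4827.8,
    "log(AVG.timeLatency_feature.+0.5)": 3443.4,
    "log(AVG.comments_bug.+0.5)": 3112.2,
    "log(AVG.comments_feature.+0.5)": 2413.4,
    "log(nIssueBef+0.5)": 23.6,
    "log(nMembers+0.5)": 121.2,
    "log(hasAssignee+0.5)": 765.6,
    "log(textLen+0.5)": 51.8,
    "issueType=bug": 12784.5,
    "issueType=feature": 2023.8,
}


def contribution_rates(sum_sq: Dict[str, float]) -> Dict[str, float]:
    """Each term's sum of squares divided by the total over all terms."""
    total = sum(sum_sq.values())
    return {term: ss / total for term, ss in sum_sq.items()}


rates = contribution_rates(SUM_SQ)
print(f"bug: {rates['issueType=bug']:.1%}, feature: {rates['issueType=feature']:.1%}")
# prints: bug: 43.2%, feature: 6.8%
```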
Conclusion: the study of bug issues and feature issues based on the mixed-effect linear regression model shows that the more bug issues and feature issues a project has, the more stars (nStars) it has and the more popular it is, and that the number of bug issues in a project has a greater influence on project popularity than the number of feature issues.
Step four: building on the conclusion of the mixed-effect linear regression analysis in step three, project developers continue by analyzing the description diversity of bug issues and feature issues to find how the two differ in description diversity. The process is as follows:
4.1 First, 10,000 issues were randomly drawn from the data set: 5,000 bug issues and 5,000 feature issues.
4.2 Project developers read the web page of each issue and mark its keywords and sentences.
4.3 Compare the description diversity of bug issues and feature issues using four attributes: code segment (code), link (https), @, and picture (picture), as follows:
4.3.1 Open the web page of each sampled issue and record the related information. For example, if the issue page contains a code segment, set the code segment label to 1, otherwise 0; if it contains an https link, set the https link label to 1, otherwise 0; if it contains @, set the @ label to 1, otherwise 0; if it contains picture content, set the picture label to 1, otherwise 0.
4.3.2 Calculate, for each of the four attributes, the proportion of issues labeled 1 among the 5,000 bug issue samples and among the 5,000 feature issue samples; the results are shown in Table 4.
TABLE 4 Description diversity of bug issues and feature issues
Attribute               Ratio in bug issues   Ratio in feature issues
Code segment (code)     34.0%                 13.0%
Link (https)            52.0%                 24.0%
@                       16.0%                 6.0%
Picture (picture)       14.0%                 2.0%
4.4 Table 4 compares bug issues and feature issues on the four dimensions of description diversity and shows that bug issues have higher description diversity than feature issues on every dimension. The conclusion is that one of the reasons why bug issues influence project popularity more than feature issues is that bug issues are described with more diversity. Accordingly, the regression-model-based project popularity analysis method of the invention suggests that project developers increase the description diversity of feature issues to improve project popularity.
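In the method the four attributes are marked by reading each issue page. If issue bodies are available as GitHub-flavored Markdown, a rough automated approximation of step 4.3.1 could use patterns such as the ones below. These regular expressions are an assumption, not the procedure of the invention, and are only heuristics (for instance, an image hosted at an https URL also sets the https flag, and an indented list continuation may be mistaken for code).

```python
import re
from typing import Dict

# Hypothetical heuristics for Markdown issue bodies.
PATTERNS = {
    "code": re.compile(r"```|^( {4}|\t)\S", re.MULTILINE),          # fenced or indented code
    "https": re.compile(r"https://\S+"),                             # any https link
    "@": re.compile(r"(?<![\w.])@[A-Za-z0-9-]+"),                   # @mention, not e-mail
    "picture": re.compile(r"!\[[^\]]*\]\([^)]+\)|<img\b", re.IGNORECASE),  # image
}


def attribute_flags(body: str) -> Dict[str, int]:
    """Return a 0/1 flag per attribute for one issue body."""
    return {name: int(bool(p.search(body))) for name, p in PATTERNS.items()}


# Hypothetical issue bodies.
bug_body = (
    "Steps:\n```\nnpm start\n```\n"
    "See https://example.com/log cc @alice\n"
    "![screenshot](https://example.com/a.png)"
)
feature_body = "Please add a dark mode. Contact me at user@example.com"
```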
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (5)

1. A project popularity analysis method based on a mixed-effect linear regression model, characterized by comprising the following steps:
step one, collecting project data from GitHub to build a data set; the specific process is as follows:
1.1 randomly selecting F projects from GitHub, wherein F is a natural number whose value is set according to the required accuracy of the results;
1.2 selecting data from the projects: selecting all issues in the F projects, denoting their number by S, wherein S is a natural number, and then computing statistics for the S issues;
step two, constructing a mixed-effect linear regression model, the construction method comprising the following steps:
2.1 defining the dependent and independent variables of the mixed-effect linear regression model:
nStars: the total number of stars of a project;
AVG.timeLatency_bug: the average resolution time of bug issues (defect reports) in the project, in minutes;
AVG.timeLatency_feature: the average resolution time of feature issues (feature reports) in the project, in minutes;
AVG.comments_bug: the average number of comments on bug issues in the project;
AVG.comments_feature: the average number of comments on feature issues in the project;
nIssueBef: the number of issues created in the project during the 3 months before the current issue;
nMembers: the total number of project members;
hasAssignee: binary; 1 if the issue has at least one assignee;
textLen: the total number of words in the issue text;
issueType=bug: the issue is a bug issue;
issueType=feature: the issue is a feature issue;
2.2 acquiring the data for the dependent and independent variables defined in step 2.1 through the official application programming interface (API) provided by GitHub;
2.3 constructing the mixed-effect linear regression model from the data obtained in step 2.2 using the lmer function of the R package lme4 to obtain a model;
step three, performing an analysis of variance on the model to obtain multiple regression results, and calculating the variance contribution rates of the number of bug issues and the number of feature issues in the project, i.e., their degree of influence on project popularity; if the variance contribution rate of the number of bug issues is greater than that of the number of feature issues, the number of bug issues in the project has the greater influence on project popularity; otherwise, the number of feature issues has the greater influence.
2. The project popularity analysis method based on a mixed-effect linear regression model according to claim 1, wherein the selection of projects in step 1.1 is restricted as follows: each selected project contains at least 10 bug issues and at least 10 feature issues.
3. The project popularity analysis method based on a mixed-effect linear regression model according to claim 1, wherein the issue statistics in step 1.2 include:
(1) key indicators of the project to which each issue belongs;
(2) key indicators of each issue;
(3) for each issue, information about the developer who submitted it.
4. The project popularity analysis method based on a mixed-effect linear regression model according to claim 1, wherein in step 1.2 the following restriction applies to issue data: the processing time of an issue is measured only as the time from its creation to its first closing.
5. The project popularity analysis method based on a mixed-effect linear regression model according to any one of claims 1 to 4, wherein the project developer further analyzes the description diversity of bug issues and feature issues to identify how they differ, as follows:
4.1 randomly sampling M bug issues and N feature issues from the S issues, where M and N are natural numbers and M + N ≤ S;
4.2 the project developer reads the web page of each sampled issue and marks keywords and sentences in its content;
4.3 comparing the description diversity of bug issues and feature issues along four attributes: code segment, link, @-mention, and picture; this comprises the following steps:
4.3.1 opening the web page of each sampled issue and recording the four attributes as binary labels: the code segment label is 1 if the issue page contains a code segment and 0 otherwise; the https link label is 1 if the page contains an https link and 0 otherwise; the @ label is 1 if the page contains an @-mention and 0 otherwise; the picture label is 1 if the page contains a picture and 0 otherwise;
4.3.2 for each of the four attributes, calculating the proportion of issues labeled 1 among the M bug issues and among the N feature issues, respectively;
4.4 if the proportions of the four attributes labeled 1 are higher among bug issues than among feature issues, bug issues have higher description diversity than feature issues, and bug issues have the greater influence on project popularity; conversely, if these proportions are higher among feature issues than among bug issues, feature issues have higher description diversity than bug issues, and feature issues have the greater influence on project popularity; otherwise, bug issues and feature issues have a similar influence on project popularity.
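The steps of claim 5 can be sketched end to end as follows. The claim relies on a developer reading and labeling each issue page by hand; this sketch swaps in regular-expression heuristics over the issue's Markdown body as a stand-in, so the patterns are assumptions, as is the reading of step 4.4 as requiring a higher proportion on all four attributes before declaring one issue type more diverse.

```python
import random
import re

# Hypothetical heuristics replacing the manual labeling of step 4.3.1.
ATTRIBUTE_PATTERNS = {
    "code": re.compile(r"```|`[^`\n]+`"),               # fenced or inline code
    "link": re.compile(r"https://\S+"),                  # https links
    "at": re.compile(r"(?<![\w.])@[A-Za-z0-9-]+"),      # @-mentions, skipping e-mail addresses
    "picture": re.compile(r"!\[[^\]]*\]\([^)]+\)|<img\b", re.IGNORECASE),  # Markdown or HTML images
}


def sample_issues(bug_issues, feature_issues, m, n, seed=None):
    """Step 4.1: draw M bug issues and N feature issues without replacement."""
    rng = random.Random(seed)
    return rng.sample(bug_issues, m), rng.sample(feature_issues, n)


def label_issue(body):
    """Step 4.3.1: a 0/1 label per attribute for one issue body."""
    return {name: int(bool(p.search(body))) for name, p in ATTRIBUTE_PATTERNS.items()}


def attribute_proportions(bodies):
    """Step 4.3.2: fraction of issues labeled 1 for each attribute."""
    labels = [label_issue(body) for body in bodies]
    return {name: sum(lab[name] for lab in labels) / len(labels) for name in ATTRIBUTE_PATTERNS}


def compare_diversity(bug_bodies, feature_bodies):
    """Step 4.4, assuming "higher" must hold on all four attributes."""
    bug = attribute_proportions(bug_bodies)
    feature = attribute_proportions(feature_bodies)
    if all(bug[a] > feature[a] for a in ATTRIBUTE_PATTERNS):
        return "bug"
    if all(feature[a] > bug[a] for a in ATTRIBUTE_PATTERNS):
        return "feature"
    return "similar"


bugs = [
    "Crash in `parse()`, log at https://example.com/log, cc @alice ![trace](t.png)",
    "Fails on start:\n```\nTraceback ...\n```\nCI: https://example.com/ci @bob <img src=x>",
    "It just hangs.",
]
features = ["Please add dark mode.", "Support YAML config, like https://example.com/spec"]

bug_sample, feature_sample = sample_issues(bugs, features, m=2, n=1, seed=42)
print(len(bug_sample), len(feature_sample))  # 2 1
print(compare_diversity(bugs, features))     # bug
```

Requiring dominance on every attribute is what leaves room for the "similar" outcome of step 4.4; comparing a single pooled proportion instead would almost never produce a tie.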

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018103690882 2018-04-23
CN201810369088 2018-04-23

Publications (2)

Publication Number Publication Date
CN108647863A 2018-10-12
CN108647863B 2020-10-27

Family

ID=63747595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810377403.6A Active CN108647863B (en) 2018-04-23 2018-04-25 Project popularity analysis method based on mixed effect linear regression model

Country Status (1)

Country Link
CN (1) CN108647863B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096633A (en) * 2010-12-10 2011-06-15 东华大学 (Donghua University) Application field oriented software quality standard evaluating method
CN106528679A (en) * 2016-10-24 2017-03-22 天津大学 (Tianjin University) Time series analysis method based on multilinear autoregression model

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7092895B2 (en) * 2001-01-12 2006-08-15 Perot Systems Corporation Method and system for assessing stability of a project application by performing regression analysis of requirements document metrics

Non-Patent Citations (2)

Title
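GitHub Projects. Quality Analysis of Open-Source Software; Oskar Jarczyk; Social Informatics; 2014-12-31; full text *
面向开源社区的群体化协同开发机理实证研究 (An empirical study on the mechanisms of crowd-based collaborative development in open-source communities); 余跃 (Yu Yue); China Doctoral Dissertations Full-text Database, Information Science and Technology series; 2017-12-15 (No. 12); pp. 26-66 *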

Also Published As

Publication number Publication date
CN108647863A (en) 2018-10-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant