CN108876031B - Software developer contribution value prediction method
- Publication number
- CN108876031B (application CN201810598339.4A)
- Authority
- CN
- China
- Prior art keywords
- developer
- text
- emotion
- social
- developers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/101—Collaborative creation, e.g. joint development of products or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
A software developer contribution value prediction method comprises: step one, constructing a directed graph network G(N, E) formed by social relations among developers, wherein each node in N is a developer in an open source community and each edge in E is a social relation between developers. Step one comprises two sub-steps: step 1.1, mining and constructing a developer emotion network from issue tracking system text, obtaining (emotion, entity) pairs and then forming (expressor, emotion, target) triples; step 1.2, mining and constructing a developer social network from developers' reaction behaviors in the issue tracking system, forming (expressor, reaction, text author) triples, wherein the triples from step 1.1 and step 1.2 together form the directed graph network G(N, E); and step two, calculating the potential contribution value of each developer by simplifying the developer social network.
Description
Technical Field
The invention relates to a contribution value prediction method, in particular to a software developer contribution value prediction method.
Background
An open source community serves as a hosting and collaborative development platform for open source code, so that developers from all over the world can contribute code to the same project at the same time. GitHub is the most popular open source community today, and its number of developers has reached 24 million. Because open source development is collaborative by nature, it is crucial for an individual project to discover and attract the participation and contribution of new developers. Researchers have therefore investigated automated methods for predicting potential developers, which predict and recommend developers who have not yet contributed to a project but are highly likely to do so. In the prior art, potential contributor prediction mainly follows two approaches: modeling the developer's potential contribution state, and modeling the developer's technical interest relationships.
The potential contribution state modeling method specifies a set of indicators and uses them to model each developer's social relations and technical features. There are 7 indicators, such as project age, whether the developer is a new developer, and the number of messages sent and received by the developer. The method analyzes the developers who appear on a project's mailing list during the project's first 3 months and builds a model for each of them. For each developer, the indicators serve as features of a logistic regression classifier that assigns the developer to one of two classes according to whether the developer contributes after the 3 months, and 2/3 of the data is used as the training set for fitting the logistic regression model.
The technical interest relationship modeling method comprises two parts. First, an improved collaborative filtering method, WCF (weighted collaborative filtering), is built on the commit network among developers. Second, a project similarity algorithm based on a recommendation algorithm is used to solve the cold start problem. The WCF algorithm relates potential developers to existing developers through other projects, computes the similarity between each potential developer and each existing developer using collaborative filtering, and ranks the potential developers to obtain a potential developer sequence. For projects that lack existing developers, the second part uses IKAnalyzer to extract technical nouns for each project, finds developers who match those technical nouns using TF-IDF, and ranks them to obtain a potential developer sequence.
However, the potential contribution state modeling method can only classify the developers who appear in the first 3 months of a project, i.e., decide whether or not each will become a contributor; it cannot produce a ranking of developers by their likelihood of becoming contributors. The technical interest relationship modeling method considers only a developer's technical contribution, namely the developer's commits in a project. A developer's social relations and social contributions, which are among the core measures of a developer in an open source community, are not taken into account when analyzing and predicting potential contribution.
Moreover, the social relations of developers in an open source community significantly influence development efficiency and team cooperation, and neither of the two methods considers this influence. As a result, the accuracy and reliability of developer social relation analysis and contributor prediction in the prior art are insufficient.
Disclosure of Invention
To address the problems in the prior art, the invention provides a method for predicting the contribution value of a software developer. The method predicts potential contributors in an open source community based on the results of developer social relation analysis: it constructs a network expressing the social relations of developers and predicts potential contributors from the perspective of those relations within an open source project. Compared with the prior art, the method takes into account the influence of developers' social relations on development efficiency and team cooperation, mines the social relations and social contributions among developers in depth, associates them with the developers' technical contributions, and predicts and ranks the potential developers of an open source project.
Drawings
FIG. 1 is a flow chart of developer contribution value prediction in a social network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 1 is a flow chart of developer contribution value prediction in a social network according to the present invention. The invention defines the developer social network as a directed graph network G(N, E) formed by the social relationships between developers, where each node in N is a developer in the open source community and each edge in E is a social relationship between two developers.
Social relationships among developers are mined from the issue tracking system of GitHub. Two kinds of social relationship, positive and negative, are mined from the text of issues and issue comments. Six kinds of social relationship are mined from developers' reaction behaviors (reactions): approval (+1), disapproval (-1), cheering (hooray), laugh, love (heart) and confusion (confused).
Sentiment analysis is performed on all issue texts and issue comment texts in the project's issue tracking system using an entity-level sentiment analysis tool for the software domain, yielding an (emotion, entity) pair for each text. The emotion is one of three types: positive, negative or neutral. The entity is one of two types: person or other entity. From the sentiment analysis results, all pairs classified as (positive, person) or (negative, person) are selected, and for each the emotion expressor and the specific target of the emotion are identified. The emotion expressor is the developer who wrote the text. If the text contains a developer mentioned by the expressor with @, that developer is taken as the target of the emotion; otherwise, the preceding text in the conversation that was written by someone other than the expressor is located, and its author is taken as the target. Each text classified as (positive, person) or (negative, person) thus yields one (expressor, emotion, target) triple. Once all such triples have been obtained, the developer social network mined from the issue tracking system text can be constructed.
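The target-resolution rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: the `Comment` record, the `MENTION` pattern, and the function names are hypothetical, the sentiment and entity labels are assumed to come from an external analysis tool, and taking the first @-mention when several are present is an assumption the text does not settle.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Comment:
    """Hypothetical record for one issue text or issue comment."""
    author: str     # developer who wrote the text
    body: str       # raw text
    sentiment: str  # "positive", "negative" or "neutral" (from the analysis tool)
    entity: str     # "person" or "other" (from the analysis tool)


# Assumed pattern for @-mentions of user names.
MENTION = re.compile(r"@([A-Za-z0-9-]+)")


def resolve_target(thread: List[Comment], i: int) -> Optional[str]:
    """Return the developer that comment i is directed at, or None."""
    comment = thread[i]
    # Rule 1: an explicit @-mention of another developer (first one wins, by assumption).
    for name in MENTION.findall(comment.body):
        if name != comment.author:
            return name
    # Rule 2: author of the nearest earlier text written by someone else.
    for previous in reversed(thread[:i]):
        if previous.author != comment.author:
            return previous.author
    return None


def emotion_triples(thread: List[Comment]) -> List[Tuple[str, str, str]]:
    """Build (expressor, emotion, target) triples for person-directed texts."""
    triples = []
    for i, comment in enumerate(thread):
        if comment.entity != "person":
            continue
        if comment.sentiment not in ("positive", "negative"):
            continue
        target = resolve_target(thread, i)
        if target is not None:
            triples.append((comment.author, comment.sentiment, target))
    return triples
```

Texts for which no target can be found (for example, the first text in a conversation with no mention) are skipped in this sketch; the source does not say how such cases are handled.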
In one embodiment of the invention, social relationships between developers are mined from the issue tracking system of GitHub.
In the GitHub issue tracking system, a developer can react to each issue text or issue comment text. There are six reaction types: approval, disapproval, cheering, laugh, love and confusion. For each reaction, the expressor of the reaction and the author of the text being reacted to are obtained, forming an (expressor, reaction, text author) triple. Once all such triples have been obtained, the developer social network mined from developers' reaction behaviors in the issue tracking system can be constructed.
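As a rough sketch of this step, the snippet below turns reaction records into (expressor, reaction, text author) triples. The input shape (a dict with `author` and `reactions` keys) and the function name are hypothetical, and skipping a developer's reactions to their own text is an assumption made here, not something the source states. The reaction names follow the identifiers used in the weight formula.

```python
from typing import Dict, Iterable, List, Tuple

# Reaction identifiers as they appear in the weight formula.
REACTIONS = {"+1", "-1", "hooray", "laugh", "heart", "confused"}


def reaction_triples(texts: Iterable[Dict]) -> List[Tuple[str, str, str]]:
    """Build (expressor, reaction, text author) triples.

    Each item in `texts` is assumed to look like:
        {"author": "alice", "reactions": [("bob", "+1"), ("carol", "heart")]}
    """
    triples = []
    for text in texts:
        author = text["author"]
        for user, content in text["reactions"]:
            if content not in REACTIONS:
                continue  # ignore reaction types outside the six listed
            if user == author:
                continue  # assumption: self-reactions carry no social signal
            triples.append((user, content, author))
    return triples
```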
In a developer social network, a developer node that receives more positive social feedback has a higher reputation in the open source project. Because reputation is a social attribute, the social feedback received by one developer affects other developers through social interaction. For example, if developer C receives positive social feedback from both developer A and developer B, and developer A has itself received more positive feedback than developer B, then developer A has the greater positive influence on C; receiving such feedback means that developer C is more likely to contribute.
Based on this idea, the invention simplifies the developer social network and calculates a potential contribution value for each developer with a weighted PageRank algorithm. Sorting the potential contribution values in descending order gives the prediction list of potential developers.
The developer social network is simplified by merging all social relations between a pair of developers into a single edge weight:
weight = α × (C(+1) + C(hooray) + C(heart) + C(laugh) - C(-1) - C(confused)) + β × (C(Pos) - C(Neg))
wherein the function C(emotion) is the algebraic sum of the social relations of the given type between the two developers, and α and β are parameters whose values may be 0.1 and 1, respectively.
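The weight formula can be illustrated with a short Python sketch. The function and variable names are hypothetical, and C(·) is interpreted here as the number of triples of a given type from one developer to another, which is one reading of "algebraic sum". Emotion triples are assumed to be labeled `"positive"` and `"negative"`.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (expressor, relation type, target)

POSITIVE_REACTIONS = ("+1", "hooray", "heart", "laugh")
NEGATIVE_REACTIONS = ("-1", "confused")


def edge_weights(
    reactions: Iterable[Triple],
    emotions: Iterable[Triple],
    alpha: float = 0.1,
    beta: float = 1.0,
) -> Dict[Tuple[str, str], float]:
    """Collapse all triples between a pair of developers into one weight."""
    counts = defaultdict(Counter)
    for source, kind, target in list(reactions) + list(emotions):
        counts[(source, target)][kind] += 1

    weights = {}
    for edge, c in counts.items():
        # alpha term: positive reactions minus negative reactions
        reaction_score = sum(c[k] for k in POSITIVE_REACTIONS) - sum(
            c[k] for k in NEGATIVE_REACTIONS
        )
        # beta term: C(Pos) - C(Neg) from text sentiment
        emotion_score = c["positive"] - c["negative"]
        weights[edge] = alpha * reaction_score + beta * emotion_score
    return weights
```

With the suggested values α = 0.1 and β = 1, one sentiment-bearing text counts as much as ten reactions, so text sentiment dominates the weight. A weight can also be negative when negative signals outnumber positive ones.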
The potential contribution value is calculated by the formula:
wherein u, v and w are developer nodes in the developer social network; SS(u) is the potential contribution value of developer node u; df is the damping coefficient, which may take the value 0.85; B_u is the set of all nodes pointing to node u; N_v is the set of all nodes to which node v points; and SS(u)' is the newly calculated potential contribution value. The process is iterated until the difference between SS(u)' and SS(u) is less than a given threshold, i.e., SS(u) converges, which yields the final potential contribution value of each developer. The potential developer prediction list is then obtained by filtering out the developers who have already contributed.
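The iteration can be sketched as follows. The update rule used here is the common weighted PageRank form, SS(u)' = (1 - df) + df × Σ over v in B_u of SS(v) × weight(v, u) / Σ over w in N_v of weight(v, w); this form is an assumption chosen to match the symbols defined above, not a formula taken from the patent. Dropping edges whose weight is not positive, the initial value of 1.0, and the `tol` and `max_iter` parameters are further assumptions, and all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (v, u): v expressed feedback toward u


def potential_contribution(
    weights: Dict[Edge, float],
    df: float = 0.85,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Iterate an assumed weighted PageRank update until SS converges."""
    nodes = {n for edge in weights for n in edge}
    if not nodes:
        return {}
    # Assumption: edges with non-positive weight pass on no reputation.
    edges = {e: w for e, w in weights.items() if w > 0}

    out_total = {n: 0.0 for n in nodes}  # sum of weight(v, w) for w in N_v
    incoming = {n: [] for n in nodes}    # B_u with the corresponding weights
    for (v, u), w in edges.items():
        out_total[v] += w
        incoming[u].append((v, w))

    ss = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        new = {
            u: (1 - df)
            + df * sum(ss[v] * w / out_total[v] for v, w in incoming[u])
            for u in nodes
        }
        delta = max(abs(new[n] - ss[n]) for n in nodes)
        ss = new
        if delta < tol:
            break
    return ss


def predict(weights: Dict[Edge, float], contributors: Iterable[str]) -> List[str]:
    """Rank developers who have not yet contributed, highest SS first."""
    done = set(contributors)
    ss = potential_contribution(weights)
    return sorted((n for n in ss if n not in done), key=lambda n: ss[n], reverse=True)
```

In this sketch a developer with no incoming positive edges settles at the baseline value 1 - df, and feedback from a well-regarded developer raises the target's value more than feedback from a developer at the baseline.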
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (3)
1. A software developer contribution value prediction method, characterized in that: step one, a directed graph network G(N, E) formed by social relations among developers is constructed, wherein N is the set of developers in an open source community and E is the set of social relations among the developers; step one comprises two sub-steps: step 1.1, a developer emotion network is mined and constructed from issue tracking system text, (emotion, entity) pairs are obtained, and (expressor, emotion, target) triples are then formed; step 1.2, a developer social network is mined and constructed from developers' reaction behaviors in the issue tracking system, forming (expressor, reaction, text author) triples, wherein the triples from step 1.1 and step 1.2 together form the directed graph network G(N, E); step two, the potential contribution value of each developer is calculated by simplifying the developer social network; the developer social network is simplified according to the following formula:
weight(u,v)=α×(C(+1)+C(hooray)+C(heart)+C(laugh)-C(-1)-C(confused))+β(C(Pos)-C(Neg))
where u and v are developer nodes in the developer social network, the function C is the algebraic sum of the social relations of the given type between the two developers u and v, and α and β are parameters.
The potential contribution value is calculated by the formula:
wherein w is a developer node in the developer social network; SS(u) is the potential contribution value of developer node u; df is the damping coefficient; B_u is the set of all nodes pointing to node u; N_v is the set of all nodes to which node v points; and SS(u)' is the newly calculated potential contribution value; the process is iterated until the difference between SS(u)' and SS(u) is less than a given threshold, i.e., SS(u) converges, whereby the final potential contribution value SS(u)' of each developer is obtained.
2. The method of claim 1, wherein in step 1.1, the emotional relations of developers are mined from the issue tracking system text as follows: an entity-level sentiment analysis tool is used to perform sentiment analysis on all issue texts and issue comment texts in the project's issue tracking system to obtain (emotion, entity) pairs, wherein the emotion is one of three types, positive, negative and neutral, and the entity is one of two types, person and other entity; all pairs classified as (positive, person) or (negative, person) are selected from the sentiment analysis results, and the emotion expressor and the specific target of the emotion are identified, the emotion expressor being the developer who wrote the text; if the text contains a developer mentioned by the emotion expressor with the @ identifier, that developer is taken as the specific target of the emotion; otherwise, the preceding text in the conversation written by someone other than the emotion expressor is located, and its author is taken as the specific target of the emotion; each text classified as (positive, person) or (negative, person) yields one (expressor, emotion, target) triple; once all the triples have been obtained, the developer social network mined from the issue tracking system text can be constructed.
3. The method of claim 1, wherein in step 1.2, the developer social network is mined and constructed from developers' reaction behaviors in the issue tracking system as follows: in the issue tracking system, a developer can react to each issue text or issue comment text, the reaction behaviors comprising approval, disapproval, cheering, laugh, love and confusion; for each reaction behavior, the expressor of the reaction and the author of the text being reacted to are obtained, forming an (expressor, reaction, text author) triple; once all the triples have been obtained, the developer social network mined from developers' reaction behaviors in the issue tracking system can be constructed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810598339.4A CN108876031B (en) | 2018-06-12 | 2018-06-12 | Software developer contribution value prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108876031A (en) | 2018-11-23 |
CN108876031B (en) | 2022-06-28 |
Family
ID=64337927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810598339.4A Active CN108876031B (en) | 2018-06-12 | 2018-06-12 | Software developer contribution value prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108876031B (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254250A (en) * | 2011-07-13 | 2011-11-23 | 武汉大学 | Method for measuring contribution degree of developer during development of open source software |
US9645817B1 (en) * | 2016-09-27 | 2017-05-09 | Semmle Limited | Contextual developer ranking |
Also Published As
Publication number | Publication date |
---|---|
CN108876031A (en) | 2018-11-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||