CN112347373B - Role recommendation method based on open source software mail network - Google Patents

Role recommendation method based on open source software mail network

Info

Publication number
CN112347373B
Authority
CN
China
Prior art keywords
network
node
edge
role
wandering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011265544.2A
Other languages
Chinese (zh)
Other versions
CN112347373A (en
Inventor
宣琦
谢昀苡
张剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202011265544.2A priority Critical patent/CN112347373B/en
Publication of CN112347373A publication Critical patent/CN112347373A/en
Application granted granted Critical
Publication of CN112347373B publication Critical patent/CN112347373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention provides a role recommendation method based on an open source software mail network, comprising the following steps: S1: constructing an undirected weighted network from the mail data of an open source software project; S2: randomly deleting a portion of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network as training samples, and constructing a dynamic sequence slicing network from them; S3: generating the features of each node on the dynamic sequence slicing network with a time-sequence biased walk algorithm, and then obtaining the features of each edge by averaging the features of its two end nodes; S4: training a logistic regression classifier on the training samples and predicting the test samples. The invention can effectively recommend roles in an open source software project, and its recommendation accuracy is significantly higher than that of algorithms that ignore the time-sequence information and role information of the mails in the project.

Description

Role recommendation method based on open source software mail network
Technical Field
The invention relates to the field of link prediction in a complex network, in particular to a role recommendation method based on an open source software mail network.
Background
Open source software has developed rapidly in the past few years and has attracted a large number of users to open source software communities. Active participation by developers and users is critical to the success of an open source software project. To promote the sustainable development of such projects, developers need to maintain the project code, and it is equally vital to motivate, attract and retain users and developers. However, most previous research has focused on the maintenance of project code and neglected the importance of users in the development of open source software projects. To preserve the quality of project code, many code-repository-based methods generate ranked lists of developers and recommend the top-ranked ones to help perform code changes. It is not difficult to imagine that the recommended developers can maintain the stability of the project code. Developers contribute to the sustainable development of the project, but attention must also be paid to the users of the software, because they provide feedback to developers, help sustain the development of the project, and are potential developers who may one day contribute to the software by submitting code.
Users and developers who participate in open source software projects must overcome a number of obstacles that hinder their further contribution. Since the mailing list is a public communication channel in the open source software community, users and developers often interact through it: people who lack understanding or guidance post problems and request help, or try to resolve their confusion using existing information in the mailing list. However, the large amount of information makes the relevant content hard to find. In addition, responses that offer no guidance, or requests that go unanswered, may leave them without useful assistance. These obstacles can cause users and developers to give up further contributions to the project. Recommending experienced people to the developers and users who need help can therefore prevent this.
Chinese patent application No. CN202010338549.7 discloses a reviewer recommendation method for Pull Requests in open source software development. It considers four factors relating a reviewer to the content of a Pull Request, namely interest correlation, activeness, degree of social-relationship influence, and file path correlation, and weights the four factors individually with a Bayesian personalized ranking method, so that suitable code reviewers are recommended for the Pull Request. That method relies on manually extracted features of the developers in the open source software. The present application instead focuses on the mail information of the open source software project rather than the code repository, and its scope is wider: it concerns not only the developers of the open source software but also the users who use it. In addition, the present application models the mail data of the project at the network level and considers the embeddings of the nodes in the network, so that more of the important interactions between users and developers can be discovered and roles can be recommended to participants who need help.
Very little literature addresses role recommendation specifically for open source software. Canfora et al. proposed an unsupervised approach that mines data from the mailing lists and code repositories of open source software projects to make role recommendations. They focus on the code repository of the project and compute a score between developers and users so that appropriate people can be recommended to help them. However, this is only an empirical study rather than a generally applicable method.
A currently popular approach is to model the data as a network, convert the nodes of the network into low-dimensional vector representations (the vectors represent the features of the network nodes) with a graph embedding method, and thereby turn the role recommendation problem into a link prediction task in machine learning. The Node2vec method proposed by Grover et al. is an easily applied random walk method that combines depth-first and breadth-first walks and represents the nodes of a network with low-dimensional vectors, so that the structural features of the nodes are extracted and role recommendation can be performed more accurately.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a role recommendation method based on an open source software mail network, which helps the participants (users and developers) of an open source software project who need help by recommending other participants to them, thereby benefiting the sustainable development of the project.
The invention studies the recommendation of developers and users to participants who need help in an open source software community. Such recommendations can give participants support when they encounter difficulties, which is critical to the sustainable development of the project. Further, the invention models the mail data of the open source software as a dynamic sequence slicing network, a new temporal network that captures the evolution of the interactions between users and developers. In addition, an interaction-based time-sequence biased walk algorithm is provided; it integrates the time information and structure information of the open source software mail network with the identity information of the participants, and uses an embedding algorithm to represent developers and users for role recommendation.
In order to achieve the above purpose, the invention provides the following scheme:
The invention provides a role recommendation method based on an open source software mail network, comprising the following steps:
S1: constructing an undirected weighted network from the mail data of an open source software project;
S2: randomly deleting a portion of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network as training samples, and constructing a dynamic sequence slicing network G' from them;
S3: generating the features of each node on the dynamic sequence slicing network G' with a time-sequence biased walk algorithm, and then obtaining the features of each edge by averaging the features of its two end nodes;
S4: training a logistic regression classifier on the training samples and predicting the test samples.
Preferably, in the undirected weighted network constructed in step S1:
the roles in the mail data are the nodes of the network, a mail interaction between two roles is an edge between the two corresponding nodes, and the number of mail interactions is the weight of that edge;
the undirected weighted network is denoted by G(V, E, W), where V is the set of n nodes in the network, E is the set of edges between the nodes, W is the weight matrix of the edges, and the element $W_{ij}$ of W is the weight of the edge between node i and node j, i.e., the number of mails exchanged between the two nodes.
Preferably, the dynamic sequence slicing network G' in step S2 is constructed as follows:
the undirected weighted network G is divided according to a given time interval; taking one month as the time interval, it is divided into a plurality of subgraphs $\{G_1, G_2, G_3, \ldots, G_i, \ldots\}$, which are numbered and arranged in ascending order of time, and the same nodes in adjacent subgraphs are connected in order.
Further, each edge in the dynamic sequence slicing network G' in S2 is denoted by $e = (u, v, w, t)$, where u is the start node of the edge, i.e., $src(e) = u$; v is the end node of the edge, i.e., $dst(e) = v$; w is the weight of the edge, i.e., $W(e) = w$; and t is the temporal reachability of the edge, i.e., $T(e) = t$.
Preferably, the time-sequence biased walk algorithm in S3 is a second-order neighbor sampling strategy that selects reachable edges to generate an edge sequence. The strategy comprises static edge weight information, a structure transition probability $P_S$, a timing transition probability $P_T$ and a role-based transition probability $P_R$. The time-sequence biased walk algorithm specifically comprises the following steps:
Step 1: setting the maximum number of walks and the walk length;
Step 2: randomly selecting a node in the dynamic sequence slicing network G' as the initial node;
Step 3: walking according to the calculated transition probability P(e), thereby obtaining a series of walk sequences;
Step 4: applying the Skip-Gram model from natural language processing to the walk sequences to obtain the node features;
Step 5: obtaining the features of each edge by averaging the features of its two end nodes.
Further, a reachable edge is defined as follows:
for a node u in subgraph $G_i$, define $\eta(u) = i$; the temporal reachability of an edge can then be defined as $T(e) = \eta(v) - \eta(u) \in \{-1, 0, 1\}$, where u is the start node of the edge and v is its end node. For the dynamic sequence slicing network G', the reachable edge set of a node v is defined as $L_t(v) = \{e \mid src(e) = v,\ T(e) \geq 0\}$, that is, the start node of the edge is v and the temporal reachability of the edge must be greater than or equal to 0.
Further, the structure transition probability $P_S$ is calculated as follows:
if the walk currently stays at node c and the previously visited node is t, then for any reachable edge $e \in L_t(c)$ with $dst(e) = x$, the structure transition probability $P_S$ is:
Figure GDA0003556506390000061
$P_S(e) = \psi_S(e) \cdot W(e)$
where $d_{tx} \in \{0, 1, 2\}$ is the shortest distance between node t and node x, $\psi_S(e)$ is the structure search bias of edge e, r is the return parameter and q is the in-out parameter. The parameters q and r jointly determine the search direction along the edges and control how fast the walk explores and leaves the neighborhood of the starting vertex; $W(e)$ is the weight of edge e.
Further, the timing transition probability $P_T$ is calculated as follows:
Figure GDA0003556506390000062
Figure GDA0003556506390000063
where $\psi_T(e)$ is the timing search bias of edge e and $\alpha$ is the timing bias parameter, with $0.1 \leq \alpha \leq 0.9$. The parameter $\alpha$ determines whether the walk stays in the current subgraph: the smaller $\alpha$ is, the more the walk tends to stay in the current subgraph; the larger $\alpha$ is, the more the walk tends to move to the next subgraph. e' is an edge in the reachable edge set $L_t(c)$ of the current node c, and $\psi_T(e')$ is the timing search bias of edge e'.
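The two formula images for the timing transition probability are not reproduced in this text. The block below is only a plausible reconstruction inferred from the symbol definitions above, not the verified formulas of the patent. The two-case form of $\psi_T$ is an assumption chosen to match the described effect of $\alpha$; the normalization over $L_t(c)$ is inferred from the description of e'.

```latex
% Assumed form: alpha weights edges into the next subgraph, 1 - alpha edges inside the current one.
\psi_T(e) =
\begin{cases}
\alpha,     & T(e) = 1 \\
1 - \alpha, & T(e) = 0
\end{cases}
\qquad
P_T(e) = \frac{\psi_T(e)}{\sum_{e' \in L_t(c)} \psi_T(e')}
```

Under this assumed form, a small $\alpha$ gives intra-subgraph edges the larger share of the probability mass, which agrees with the behaviour described in the text.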
Further, the role-based transition probability $P_R$ is divided into unbiased transfer and biased transfer. If the walk currently stays at node c and the previously visited node is t, then for any reachable edge $e \in L_t(c)$ with $dst(e) = x$, the role-based transition probability $P_R$ is calculated as follows:
1) unbiased transfer:
Figure GDA0003556506390000071
unbiased transfer means that every reachable edge has an equal probability of being selected;
2) biased transfer:
Figure GDA0003556506390000072
Figure GDA0003556506390000073
where $\omega(x)$ is the true identity of node x (e.g., user or developer), $\psi_R(e)$ is the role search bias of edge e, and $\beta$ is the role bias parameter, with $0.1 \leq \beta \leq 0.9$. The parameter $\beta$ controls the communication tendency of the nodes, i.e., whether the walk tends to move toward nodes of the same type or of a different type: the larger $\beta$ is, the more the walk tends to move toward nodes of the same type; the smaller $\beta$ is, the more the walk tends to move toward nodes of a different type. e is the edge to be traversed in the next transition, e' is an edge in the reachable edge set $L_t(c)$ of the current node c, and $\psi_R(e')$ is the role search bias of edge e'.
Further, the transition probability P(e) is calculated as follows:
the structure transition probability $P_S$, the timing transition probability $P_T$ and the role-based transition probability $P_R$ above are each normalized, and the final transition probability is obtained as:
$P(e) = P_S(e)\,P_T(e)\,P_R(e)$
The invention has the following advantages. First, the time-sequence information of the mail data in the open source software project is fully utilized, and the mail data is modeled as a dynamic sequence slicing network. The dynamic sequence slicing network reflects the evolution of the network structure and is better suited to dynamic data sets than an ordinary static network. Second, on the basis of the dynamic sequence slicing network, a time-sequence biased walk algorithm is provided that makes full use of the topological characteristics and time-sequence information of the mail network and the identity information of the project participants. Compared with the prior art, the method can effectively recommend roles in an open source software project, and its recommendation accuracy is significantly higher than that of algorithms that ignore the time-sequence information and role information of the mails.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a dynamic sequence slicing network G' of the present invention;
FIG. 2 is a flow chart of the present invention.
Detailed Description
Reference will now be made in detail to various exemplary embodiments of the invention, the detailed description should not be construed as limiting the invention but as a more detailed description of certain aspects, features and embodiments of the invention.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that each intervening value, between the upper and lower limit of that range, is also specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control.
It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
The "parts" in the present invention are all parts by mass unless otherwise specified.
Example 1
This technical scheme provides the definition of the dynamic sequence slicing network, the definition of reachable edges, and a time-sequence biased walk algorithm designed specifically for open source software projects. The structure transition probability is consistent with the Node2vec algorithm; the main innovations of the algorithm are the timing transition probability and the role-based transition probability.
The invention provides a role recommendation method based on an open source software mail network, comprising the following steps:
S1: constructing an undirected weighted network from the mail data of an open source software project;
S2: randomly deleting a portion of the edges of the network constructed in S1 to serve as test samples, using the edges remaining in the network as training samples, and constructing a dynamic sequence slicing network G' from them;
S3: generating the features of each node on the dynamic sequence slicing network G' with a time-sequence biased walk algorithm, and then obtaining the features of each edge by averaging the features of its two end nodes;
S4: training a logistic regression classifier on the training samples and predicting the test samples.
Further, in step S1, the roles in the mail data are the nodes of the network, a mail interaction between two roles is an edge between the two corresponding nodes, and the number of mail interactions is the weight of that edge.
The undirected weighted network is denoted by G(V, E, W), where V is the set of n nodes in the network, E is the set of edges between the nodes, W is the weight matrix of the edges, and the element $W_{ij}$ of W is the weight of the edge between node i and node j, i.e., the number of mails exchanged between the two nodes.
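The construction of G(V, E, W) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the record layout `(sender, recipient, timestamp)`, the choice to skip self-addressed mail, and the function name `build_weighted_network` are all hypothetical.

```python
from collections import defaultdict


def build_weighted_network(mails):
    """Build an undirected weighted network from mail records.

    `mails` is an iterable of (sender, recipient, timestamp) tuples
    (a hypothetical layout). Returns (nodes, weights), where `weights`
    maps an unordered node pair to the number of mails exchanged.
    """
    weights = defaultdict(int)
    nodes = set()
    for sender, recipient, _timestamp in mails:
        if sender == recipient:
            continue  # assumption: self-addressed mail carries no interaction
        nodes.update((sender, recipient))
        # frozenset makes (a, b) and (b, a) the same undirected edge
        weights[frozenset((sender, recipient))] += 1
    return nodes, dict(weights)


mails = [
    ("alice", "bob", "2020-01-03"),
    ("bob", "alice", "2020-01-04"),
    ("alice", "carol", "2020-02-11"),
]
nodes, weights = build_weighted_network(mails)
```

Here the weight of the edge between `alice` and `bob` counts mails in both directions, which corresponds to the symmetric entry $W_{ij}$ of an undirected network.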
In step S2, a certain proportion of the edges in the original network are hidden as test samples, the edges in the remaining network are used as training samples, and the remaining network is constructed into a dynamic sequence slicing network G'. The mail data in the open source software project carries time information, so the undirected weighted network G can be divided according to a given time interval: taking one month as the time interval, it is divided into a plurality of subgraphs $\{G_1, G_2, G_3, \ldots, G_i, \ldots\}$, which are numbered and arranged in ascending order of time, and the same nodes in adjacent subgraphs are connected in order. FIG. 1 shows an example of a dynamic sequence slicing network G'.
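The split and the monthly slicing can be sketched as below. This is an illustrative sketch only. The timestamp format `YYYY-MM-DD`, the rule that months without mail are skipped when numbering subgraphs, the rule that a node is linked across two adjacent subgraphs only when it appears in both, and all function names are assumptions that the text does not specify.

```python
import random
from collections import defaultdict


def split_edges(edges, test_ratio, seed=0):
    """Hide a random fraction of edges as test samples; the rest are training samples."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def slice_by_month(timed_edges):
    """Group (u, v, 'YYYY-MM-DD') interactions into monthly subgraphs G_1, G_2, ...

    Returns {slice index (1-based, ascending in time): {(u, v): weight}}.
    """
    months = sorted({ts[:7] for _, _, ts in timed_edges})
    index = {month: i + 1 for i, month in enumerate(months)}
    slices = defaultdict(lambda: defaultdict(int))
    for u, v, ts in timed_edges:
        slices[index[ts[:7]]][tuple(sorted((u, v)))] += 1
    return {i: dict(g) for i, g in slices.items()}


def inter_slice_links(slices):
    """Connect each node to its own copy in the next subgraph (assumed rule)."""
    links = []
    ordered = sorted(slices)
    for i, j in zip(ordered, ordered[1:]):
        nodes_i = {n for edge in slices[i] for n in edge}
        nodes_j = {n for edge in slices[j] for n in edge}
        links.extend(((n, i), (n, j)) for n in sorted(nodes_i & nodes_j))
    return links


train_edges, test_edges = split_edges(range(10), test_ratio=0.2)

training_mails = [
    ("alice", "bob", "2020-01-03"),
    ("bob", "alice", "2020-01-20"),
    ("alice", "carol", "2020-02-11"),
    ("bob", "carol", "2020-02-15"),
]
slices = slice_by_month(training_mails)
links = inter_slice_links(slices)
```

In this toy data, `alice` and `bob` appear in both months, so each is linked to its own copy in the second subgraph, while `carol` appears only in the second.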
Further, in step S2, each edge in the dynamic sequence slicing network G' is denoted by $e = (u, v, w, t)$, where u is the start node of the edge, i.e., $src(e) = u$; v is the end node of the edge, i.e., $dst(e) = v$; w is the weight of the edge, i.e., $W(e) = w$; and t is the temporal reachability of the edge, i.e., $T(e) = t$.
Further, in step S3, the time-sequence biased walk algorithm is designed on the basis of the above definitions. It is a second-order neighbor sampling strategy that selects reachable edges to generate an edge sequence. The strategy comprises static edge weight information, a structure transition probability $P_S$, a timing transition probability $P_T$ and a role-based transition probability $P_R$. The time-sequence biased walk algorithm specifically comprises the following steps:
Step 1: setting the maximum number of walks and the walk length;
Step 2: randomly selecting a node in the dynamic sequence slicing network G' as the initial node;
Step 3: walking according to the calculated transition probability P(e), thereby obtaining a series of walk sequences;
Step 4: applying the Skip-Gram model from natural language processing to the walk sequences to obtain the node features;
Step 5: obtaining the features of each edge by averaging the features of its two end nodes.
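The walk loop (steps 1 to 3) and the edge-feature averaging (step 5) can be sketched as follows. This is a hedged outline, not the patented procedure: the callables `reachable` and `transition_prob` are hypothetical placeholders for the reachable-edge set and the transition probability P(e), and the toy graph uses uniform probabilities. Step 4 (Skip-Gram) is omitted; a library such as gensim's `Word2Vec` with `sg=1` is one common option for it.

```python
import random


def generate_walks(nodes, reachable, transition_prob, num_walks, walk_length, seed=0):
    """Generate walk sequences.

    reachable(node) -> list of (edge, next_node) pairs available from node.
    transition_prob(prev, cur, edge) -> unnormalized weight of taking edge.
    Both callables are placeholders supplied by the caller.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        cur, prev = rng.choice(nodes), None  # step 2: random initial node
        walk = [cur]
        while len(walk) < walk_length:
            options = reachable(cur)
            if not options:
                break  # dead end: no reachable edge from this node
            probs = [transition_prob(prev, cur, edge) for edge, _ in options]
            _, nxt = rng.choices(options, weights=probs, k=1)[0]
            prev, cur = cur, nxt
            walk.append(cur)
        walks.append(walk)
    return walks


def edge_feature(node_vectors, u, v):
    """Step 5: average the feature vectors of the two end nodes element-wise."""
    return [(a + b) / 2.0 for a, b in zip(node_vectors[u], node_vectors[v])]


adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
walks = generate_walks(
    sorted(adjacency),
    reachable=lambda n: [((n, m), m) for m in adjacency[n]],
    transition_prob=lambda prev, cur, edge: 1.0,  # uniform, for illustration only
    num_walks=4,
    walk_length=5,
)
feature = edge_feature({"alice": [0.25, 0.5], "bob": [0.75, 0.0]}, "alice", "bob")
```

The second-order character of the walk is carried by the `prev` argument, which lets a real `transition_prob` depend on the previously visited node.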
Further, a reachable edge is defined as follows:
for a node u in subgraph $G_i$, define $\eta(u) = i$. The temporal reachability of an edge can then be defined as $T(e) = \eta(v) - \eta(u) \in \{-1, 0, 1\}$, where u is the start node of the edge and v is its end node. Further, for the dynamic sequence slicing network G', the reachable edge set of a node v can be defined as $L_t(v) = \{e \mid src(e) = v,\ T(e) \geq 0\}$, that is, the start node of the edge is v and the temporal reachability of the edge must be greater than or equal to 0.
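The definitions of $\eta$, $T(e)$ and $L_t(v)$ translate directly into code. The sketch below assumes, for illustration only, that each node copy is stored as a `(name, slice_index)` tuple and each edge as a small named tuple; these representations are not taken from the patent.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: tuple     # (node name, slice index), a hypothetical representation
    dst: tuple
    weight: float


def eta(node):
    """Slice index of a node copy, i.e. eta(u) = i for u in G_i."""
    return node[1]


def temporal_reachability(edge):
    """T(e) = eta(dst) - eta(src), expected to lie in {-1, 0, 1}."""
    return eta(edge.dst) - eta(edge.src)


def reachable_edges(edges, v):
    """L_t(v): edges that start at v and do not go back in time."""
    return [e for e in edges if e.src == v and temporal_reachability(e) >= 0]


edges = [
    Edge(("alice", 1), ("bob", 1), 2.0),    # inside slice 1, T = 0
    Edge(("alice", 1), ("alice", 2), 1.0),  # forward in time, T = 1
    Edge(("alice", 2), ("alice", 1), 1.0),  # backward in time, T = -1
    Edge(("alice", 2), ("carol", 2), 1.0),  # inside slice 2, T = 0
]
from_slice_1 = reachable_edges(edges, ("alice", 1))
from_slice_2 = reachable_edges(edges, ("alice", 2))
```

The backward edge from slice 2 to slice 1 is filtered out, so a walk can stay in its subgraph or advance in time but never return to an earlier subgraph.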
Further, the structure transition probability $P_S$ is calculated as follows:
if the walk currently stays at node c and the previously visited node is t, then for any reachable edge $e \in L_t(c)$ with $dst(e) = x$, the structure transition probability $P_S$ is:
Figure GDA0003556506390000121
$P_S(e) = \psi_S(e) \cdot W(e)$
where $d_{tx} \in \{0, 1, 2\}$ is the shortest distance between node t and node x, $\psi_S(e)$ is the structure search bias of edge e, r is the return parameter and q is the in-out parameter. The parameters q and r jointly determine the search direction along the edges and control how fast the walk explores and leaves the neighborhood of the starting vertex; $W(e)$ is the weight of edge e.
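The formula image for $\psi_S$ is not reproduced in this text. Because the description states that the structure transition probability is consistent with Node2vec, the sketch below assumes the Node2vec search bias with the return parameter named r instead of p. This is an assumption about the missing formula, not a confirmed reading of it, and the function names are hypothetical.

```python
def structural_bias(d_tx, r, q):
    """Assumed Node2vec-style search bias psi_S for an edge leading to x.

    d_tx is the shortest distance between the previous node t and candidate x.
    """
    if d_tx == 0:
        return 1.0 / r  # step back to the previous node
    if d_tx == 1:
        return 1.0      # x is also a neighbour of t
    if d_tx == 2:
        return 1.0 / q  # move away from t
    raise ValueError("d_tx must be 0, 1 or 2")


def structure_transition(d_tx, weight, r, q):
    """P_S(e) = psi_S(e) * W(e), before normalization."""
    return structural_bias(d_tx, r, q) * weight


back = structure_transition(0, weight=3.0, r=2.0, q=4.0)
stay = structure_transition(1, weight=3.0, r=2.0, q=4.0)
away = structure_transition(2, weight=3.0, r=2.0, q=4.0)
```

With r = 2 and q = 4 the walk favours edges to common neighbours of t over both returning to t and moving farther away from it.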
Further, the timing transition probability $P_T$ is calculated as follows:
Figure GDA0003556506390000122
Figure GDA0003556506390000123
where $\psi_T(e)$ is the timing search bias of edge e and $\alpha$ is the timing bias parameter, with $0.1 \leq \alpha \leq 0.9$, which determines the temporal search direction: whether the walk stays in the current subgraph or moves to the next subgraph. If $\alpha$ is small, the walk tends to stay in the current subgraph; otherwise it tends toward edges that appear in future subgraphs. e' is an edge in the reachable edge set $L_t(c)$ of the current node c, and $\psi_T(e')$ is the timing search bias of edge e'. The timing transition probability helps to explore how node interactions change across time periods as the network evolves.
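The effect of $\alpha$ can be illustrated with a small sketch. Since the formula images are not available, the two-case bias used here ($\alpha$ for an edge into the next subgraph, $1 - \alpha$ for an edge inside the current one) is an assumption chosen to match the described behaviour, and the function names are hypothetical.

```python
def temporal_bias(t_e, alpha):
    """Assumed psi_T: alpha for T(e) = 1 (next slice), 1 - alpha for T(e) = 0."""
    if not 0.1 <= alpha <= 0.9:
        raise ValueError("alpha must lie in [0.1, 0.9]")
    if t_e == 1:
        return alpha
    if t_e == 0:
        return 1.0 - alpha
    raise ValueError("reachable edges have T(e) in {0, 1}")


def temporal_transition(reach_t, alpha):
    """Normalize psi_T over the reachable edge set.

    reach_t lists T(e') for every e' in L_t(c); the result is P_T per edge.
    """
    biases = [temporal_bias(t, alpha) for t in reach_t]
    total = sum(biases)
    return [b / total for b in biases]


# Two edges inside the current subgraph and one edge into the next subgraph.
probs = temporal_transition([0, 0, 1], alpha=0.25)
```

With $\alpha = 0.25$ the edge into the next subgraph receives the smallest probability, so the walk mostly stays in the current subgraph, as the text describes for small $\alpha$.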
Further, the role-based transition probability $P_R$ can be divided into unbiased transfer and biased transfer. There are two types of roles in open source software: users and developers. Unbiased transfer is used when the true identity of the role is unknown, and biased transfer is used when the true identity of the role is known. Experimental results with biased transfer tend to be better than those with unbiased transfer.
The unbiased transfer is:
Figure GDA0003556506390000131
Unbiased transfer means that every reachable edge has an equal probability of being selected, i.e., each edge e in $L_t(c)$ has the same probability of being sampled.
The biased transfer is:
Figure GDA0003556506390000132
Figure GDA0003556506390000133
For each edge e in $L_t(c)$, the information about its end node $dst(e) = x$ must be considered, namely the true identity of node x, where $\omega(x)$ is the true identity of node x (e.g., user or developer). $\psi_R(e)$ is the role search bias of edge e, and $\beta$ is the role bias parameter, with $0.1 \leq \beta \leq 0.9$. The parameter $\beta$ controls the communication tendency of the nodes, i.e., whether the walk tends toward nodes of the same type or of a different type: if $\beta$ is large, the walk is more likely to traverse nodes of the same type as the initial node; otherwise the walk is encouraged to explore nodes of a different type. e is the edge to be traversed in the next transition, e' is an edge in the reachable edge set $L_t(c)$ of the current node c, and $\psi_R(e')$ is the role search bias of edge e'.
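Both transfer modes can be sketched in one function. The formula images are not available, so the biased form below ($\beta$ when the candidate node has the same role as the reference node, $1 - \beta$ otherwise) is an assumption that matches the described effect of $\beta$. Comparing against the role of the current node is also an assumption; the text could equally be read as comparing against the initial node of the walk.

```python
def role_transition(reference_role, candidate_roles, beta=None):
    """P_R over the reachable edges of the current node.

    candidate_roles lists omega(x) for the end node x of each reachable edge.
    beta=None selects the unbiased transfer (true identities unknown).
    """
    if beta is None:
        n = len(candidate_roles)
        return [1.0 / n] * n  # every reachable edge is equally likely
    if not 0.1 <= beta <= 0.9:
        raise ValueError("beta must lie in [0.1, 0.9]")
    # Assumed psi_R: beta for a same-role target, 1 - beta for a different role.
    biases = [beta if role == reference_role else 1.0 - beta
              for role in candidate_roles]
    total = sum(biases)
    return [b / total for b in biases]


unbiased = role_transition("user", ["user", "developer", "user", "developer"])
biased = role_transition("developer", ["developer", "user"], beta=0.75)
```

In the biased example a developer is three times as likely to step to another developer as to a user, which is the same-type preference the text associates with a large $\beta$.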
Further, the above transition probabilities are each normalized, and the final transition probability is obtained as:
$P(e) = P_S(e)\,P_T(e)\,P_R(e)$
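Combining the three factors and drawing the next edge can be sketched as follows. The helper names are hypothetical. Note that the product of three separately normalized distributions does not itself sum to 1; the sketch relies on `random.choices`, which accepts relative weights, rather than renormalizing. Whether the patent renormalizes the product is not stated.

```python
import random


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def final_transition(p_s, p_t, p_r):
    """P(e) = P_S(e) * P_T(e) * P_R(e), each factor normalized separately."""
    p_s, p_t, p_r = normalize(p_s), normalize(p_t), normalize(p_r)
    return [s * t * r for s, t, r in zip(p_s, p_t, p_r)]


def sample_edge(edges, probs, rng=random):
    """Draw one edge with probability proportional to P(e)."""
    return rng.choices(edges, weights=probs, k=1)[0]


# Two candidate edges: the first is structurally weaker but role-preferred.
p = final_transition(p_s=[1.0, 3.0], p_t=[0.5, 0.5], p_r=[0.75, 0.25])
chosen = sample_edge(["e1", "e2"], [1.0, 0.0], rng=random.Random(0))
```

In this example the structural advantage of the second edge and the role advantage of the first cancel exactly, so both edges end up with the same final weight.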
Further, in step S4, a logistic regression classifier is trained on the training samples and then used to predict the test samples. FIG. 2 shows the overall flow chart.
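Step S4 can be illustrated with a small self-contained logistic regression trained by batch gradient descent. This is a teaching sketch, not the classifier configuration used in the patent. The learning rate, epoch count, toy feature values, and the use of non-edges as negative samples (label 0) are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the edge with feature vector x exists."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy edge features: label 1 = existing edge, label 0 = sampled non-edge (assumed).
X = [[0.9, 0.8], [0.8, 1.0], [0.1, 0.2], [0.0, 0.1]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
```

In practice the rows of `X` would be the averaged edge features from step S3, and the predicted probabilities on the hidden test edges would be scored with AUC.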
The invention uses the mail data of open source software projects to recommend roles. Table 1 lists the main data of the open source software projects used in the tests, including the items Project, Users, Developers, Email exchanges and Timespan (months).
TABLE 1
Figure GDA0003556506390000141
Experiments were carried out with four algorithms: LINE, DeepWalk, Node2vec and the time-sequence biased walk. AUC is used as the evaluation index of the recommendation results; an algorithm with a better recommendation effect has a larger AUC value. As Table 2 shows, the algorithm of the invention achieves the best AUC values.
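For reference, AUC in link prediction is commonly computed as the probability that a hidden true edge receives a higher score than a non-existent edge, with ties counted as one half. The sketch below implements that pairwise definition. The scores are made-up toy values, not results from the patent, and the function name is hypothetical.

```python
def auc_score(pos_scores, neg_scores):
    """AUC as the fraction of (true edge, non-edge) pairs ranked correctly.

    A tie between a positive and a negative score counts as 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy classifier scores for three hidden edges and two non-edges.
auc = auc_score([0.9, 0.8, 0.4], [0.5, 0.3])
```

Five of the six pairs are ordered correctly here, so the toy AUC is 5/6; a value of 0.5 would correspond to random ranking.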
TABLE 2
Figure GDA0003556506390000142
Figure GDA0003556506390000151
The embodiments described above only illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solution of the present invention by those skilled in the art, without departing from the spirit of the present invention, shall fall within the protection scope defined by the claims of the present invention.

Claims (7)

1. A role recommendation method based on an open source software mail network, characterized in that the method comprises the following steps:
S1: constructing an undirected weighted network according to mail data of an open source software project; in the undirected weighted network:
the roles in the mail data are the nodes of the network, a mail interaction between two roles is an edge between the two corresponding nodes, and the number of mail interactions is the weight of that edge;
the undirected weighted network is denoted G(V, E, W), where V represents the n nodes in the network, E represents the set of edges between nodes, W is the weight matrix of the edges, and W_ij is an element of the matrix W representing the weight between node i and node j, namely the number of mails exchanged between the two nodes;
s2: connecting edges of randomly deleted parts of the network constructed in the S1 are used as test samples, the remaining connecting edges in the network after the connecting edges are deleted are used as training samples, and a dynamic sequence slicing network G' is constructed; the method comprises the following specific steps:
the undirected entitled network G is divided according to given time intervals, and the undirected entitled network G is divided into a plurality of directed entitled networks by taking one month as a time intervalSubfigure { G1,G2,G3,…Gi…, numbering, arranging each subgraph in ascending order according to time numbering, and connecting the same nodes in the adjacent subgraphs in sequence;
S3: generating the features of each node on the dynamic sequence slicing network G' using a time sequence biased walk algorithm, and then obtaining the features of each edge by averaging the features of its two nodes; the time sequence biased walk algorithm is a second-order neighbor sampling strategy used to select a reachable edge and thereby generate an edge sequence; the strategy comprises static edge weight information and a structure transition probability P_S, a timing transition probability P_T, and a role-based transition probability P_R; the time sequence biased walk algorithm specifically comprises the following steps:
step 1, setting the maximum number of walks and the walk length;
step 2, randomly selecting a node in the dynamic sequence slicing network G' as the initial node;
step 3, walking according to the calculated transition probability P(e), thereby obtaining a series of walk sequences;
step 4, applying the Skip-Gram model from natural language processing to the walk sequences to obtain node features;
step 5, averaging the features of every two nodes to obtain the features of an edge;
S4: learning from the training samples with a logistic regression classifier, and predicting the test samples.
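Steps S1 and S2 of claim 1 can be sketched as follows. The record layout (sender, receiver, month index), the use of (name, slice index) pairs as node identifiers, the unit weight on inter-slice edges, and the treatment of consecutive non-empty months as adjacent subgraphs are all assumptions made for this illustration; the random deletion of test edges is omitted.

```python
from collections import defaultdict


def build_weighted_network(mails):
    """S1: count mail exchanges per unordered pair of roles.

    mails is an iterable of (sender, receiver, month_index) records.
    Returns {frozenset({i, j}): W_ij}.
    """
    weights = defaultdict(int)
    for sender, receiver, _month in mails:
        if sender != receiver:
            weights[frozenset((sender, receiver))] += 1
    return dict(weights)


def build_sliced_network(mails):
    """S2: one subgraph per month, with identical nodes in adjacent slices linked.

    Returns a list of edges (src, dst, w, t), where t = slice(dst) - slice(src).
    """
    per_slice = defaultdict(lambda: defaultdict(int))
    for sender, receiver, month in mails:
        if sender != receiver:
            per_slice[month][frozenset((sender, receiver))] += 1
    months = sorted(per_slice)
    edges = []
    nodes_in = {}
    for i, month in enumerate(months):
        nodes_in[i] = set()
        for pair, w in per_slice[month].items():
            a, b = sorted(pair)
            nodes_in[i].update((a, b))
            # Within a slice the edge is stored in both directions (t = 0).
            edges.append(((a, i), (b, i), w, 0))
            edges.append(((b, i), (a, i), w, 0))
    for i in range(len(months) - 1):
        for name in nodes_in[i] & nodes_in[i + 1]:
            # Assumed unit weight for links between copies of the same node.
            edges.append(((name, i), (name, i + 1), 1, 1))
            edges.append(((name, i + 1), (name, i), 1, -1))
    return edges


# Hypothetical mail records.
mails = [("alice", "bob", 0), ("bob", "alice", 0), ("alice", "carol", 1)]
G = build_weighted_network(mails)
G_sliced = build_sliced_network(mails)
```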
2. The role recommendation method based on the open source software mail network according to claim 1, characterized in that: each edge in the dynamic sequence slicing network G' is denoted e = (u, v, w, t), where u is the start node of the edge, namely src(e) = u; v is the end node of the edge, namely dst(e) = v; w is the weight of the edge, namely w(e) = w; and t denotes the temporal reachability of the edge, namely t(e) = t.
3. The role recommendation method based on the open source software mail network according to claim 2, wherein the reachable edges are defined as follows:
for a node u in subgraph G_i, define η(u) = i; the temporal reachability of an edge can then be defined as t(e) = η(v) − η(u) ∈ {−1, 0, 1}, where u is the start node of the edge and v is its end node; for the dynamic sequence slicing network G', the reachable edge set of a node v is defined as L_t(v) = {e | src(e) = v, t(e) ≥ 0}, that is, the start node of the edge is v and the temporal reachability of the edge must be greater than or equal to 0.
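The definitions of claims 2 and 3 translate directly into a small data structure. In the sketch below, the modeling of a node as a (name, slice index) pair and the helper names are assumptions for illustration; the filter itself follows the definition L_t(v) = {e | src(e) = v, t(e) ≥ 0}.

```python
from collections import namedtuple

# e = (u, v, w, t): start node, end node, weight, temporal reachability.
Edge = namedtuple("Edge", ["src", "dst", "w", "t"])


def slice_index(node):
    """eta(u): index of the subgraph containing the node (node = (name, index))."""
    return node[1]


def make_edge(u, v, w):
    """Build an edge with t(e) = eta(v) - eta(u)."""
    return Edge(u, v, w, slice_index(v) - slice_index(u))


def reachable_edges(edges, v):
    """L_t(v): edges that start at v and do not go back in time."""
    return [e for e in edges if e.src == v and e.t >= 0]


edges = [
    make_edge(("a", 1), ("b", 1), 3),  # same slice, t = 0
    make_edge(("a", 1), ("a", 2), 1),  # next slice, t = 1
    make_edge(("a", 1), ("a", 0), 1),  # previous slice, t = -1 (not reachable)
    make_edge(("b", 1), ("a", 1), 3),  # starts at another node
]
reachable = reachable_edges(edges, ("a", 1))
```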
4. The role recommendation method based on the open source software mail network according to claim 3, characterized in that: the structure transition probability P_S is calculated as follows:
if the walk currently stays at node c and the previously visited node is t, then for any reachable edge e ∈ L_t(c) with dst(e) = x, the structure transition probability P_S is:
Figure FDA0003565234550000031
P_S(e) = ψ_S(e) · W(e)
where d_tx ∈ {0, 1, 2} is the shortest distance between node t and node x; ψ_S(e) is the structure search bias of edge e; r is the return parameter and q is the in-out parameter, which together determine the search direction over the edges and also control how quickly the walk explores and leaves the neighborhood of the initial vertex; and W(e) is the weight of edge e.
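The formula for ψ_S(e) is given only as an image in the source. The sketch below therefore assumes the familiar Node2vec-style form (1/r when d_tx = 0, 1 when d_tx = 1, 1/q when d_tx = 2), which matches the description of a return parameter and an in-out parameter but is not confirmed by the text. The function names and the adjacency layout are likewise hypothetical.

```python
def structure_bias(d_tx, r, q):
    """Assumed Node2vec-style search bias as a function of the distance d_tx."""
    if d_tx == 0:
        return 1.0 / r   # step back to the previous node t
    if d_tx == 1:
        return 1.0       # x is also a neighbor of t
    if d_tx == 2:
        return 1.0 / q   # x moves away from t
    raise ValueError("d_tx must be 0, 1 or 2")


def structure_scores(prev, candidates, neighbors, r, q):
    """Return {x: psi_S(e) * W(e)} for each reachable edge e ending at x.

    prev is the previously visited node t, candidates is a list of
    (x, W(e)) pairs, and neighbors maps a node to its set of neighbors.
    """
    scores = {}
    for x, weight in candidates:
        if x == prev:
            d_tx = 0
        elif x in neighbors.get(prev, set()):
            d_tx = 1
        else:
            d_tx = 2
        scores[x] = structure_bias(d_tx, r, q) * weight
    return scores


# Hypothetical neighborhood: the walk came from "t" and now sits at "c".
scores = structure_scores(
    prev="t",
    candidates=[("t", 2.0), ("a", 1.0), ("b", 4.0)],
    neighbors={"t": {"c", "a"}},
    r=2.0,
    q=4.0,
)
```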
5. The role recommendation method based on the open source software mail network according to claim 4, characterized in that: the timing transition probability P_T is calculated as follows:
Figure FDA0003565234550000032
Figure FDA0003565234550000033
where ψ_T(e) is the timing search bias of edge e, and α is the timing bias parameter, with 0.1 ≤ α ≤ 0.9, which determines whether the walk stays in the current subgraph: the smaller α is, the more the walk tends to stay in the current subgraph; the larger α is, the more the walk tends to move to the next subgraph; e' is any edge in the reachable edge set L_t(c), and ψ_T(e') denotes the timing search bias of edge e'.
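The two formulas of this claim survive only as images. As a sketch under stated assumptions, the code below takes ψ_T(e) = 1 − α for an edge that stays in the current subgraph (t(e) = 0) and ψ_T(e) = α for an edge that moves to the next subgraph (t(e) = 1), and normalizes over the reachable edges e'. This choice reproduces the described behavior of α but is not the confirmed formula.

```python
def timing_bias(t, alpha):
    """Assumed timing search bias: 1 - alpha to stay, alpha to move forward."""
    if not 0.1 <= alpha <= 0.9:
        raise ValueError("alpha must lie in [0.1, 0.9]")
    if t == 0:
        return 1.0 - alpha
    if t == 1:
        return alpha
    raise ValueError("reachable edges have t(e) in {0, 1}")


def timing_probabilities(edge_times, alpha):
    """Normalize psi_T over the reachable edges.

    edge_times maps an edge identifier to its temporal reachability t(e).
    """
    bias = {e: timing_bias(t, alpha) for e, t in edge_times.items()}
    total = sum(bias.values())
    return {e: b / total for e, b in bias.items()}


# With a small alpha the walk prefers edges inside the current subgraph.
p_t = timing_probabilities({"stay1": 0, "stay2": 0, "next": 1}, alpha=0.2)
```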
6. The method as claimed in claim 5, wherein the role-based transition probability P_R is divided into unbiased transition and biased transition; if the walk currently stays at node c and the previously visited node is t, then for any reachable edge e ∈ L_t(c) with dst(e) = x, the role-based transition probability P_R is calculated as follows:
1) unbiased transition:
Figure FDA0003565234550000041
unbiased transition means that every reachable edge has an equal probability of being selected;
2) biased transition:
Figure FDA0003565234550000042
Figure FDA0003565234550000043
where ω(x) represents the real identity of node x, namely user or developer; ψ_R(e) is the role search bias of edge e; β is the role bias parameter, with 0.1 ≤ β ≤ 0.9, which determines whether the walk tends toward nodes of the same type or of different types and thus controls the communication tendency of the nodes: the larger β is, the more the walk tends toward nodes of the same type; the smaller β is, the more the walk tends toward nodes of different types; e denotes the edge of the next transition, and ψ_R(e) denotes the role search bias of edge e; e' is any edge in the reachable edge set L_t(c), and ψ_R(e') denotes the role search bias of edge e'.
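Both cases of this claim can be sketched in a few lines. The unbiased case follows the text directly (every reachable edge is equally likely). For the biased case the formula images are not available, so the sketch assumes ψ_R(e) = β when ω(x) matches a reference role and 1 − β otherwise, normalized over the reachable edges; the reference role argument and the function name are hypothetical.

```python
def role_probabilities(dst_roles, ref_role=None, beta=None):
    """Role-based transition probabilities over the reachable edges.

    dst_roles maps an edge identifier to omega(x), the role of its end node.
    With beta=None the transition is unbiased (uniform). Otherwise the
    assumed bias is beta for the same role as ref_role and 1 - beta otherwise.
    """
    if not dst_roles:
        raise ValueError("no reachable edges")
    if beta is None:
        return {e: 1.0 / len(dst_roles) for e in dst_roles}
    if not 0.1 <= beta <= 0.9:
        raise ValueError("beta must lie in [0.1, 0.9]")
    bias = {
        e: (beta if role == ref_role else 1.0 - beta)
        for e, role in dst_roles.items()
    }
    total = sum(bias.values())
    return {e: b / total for e, b in bias.items()}


roles = {"e1": "developer", "e2": "user"}
p_uniform = role_probabilities(roles)
p_biased = role_probabilities(roles, ref_role="developer", beta=0.8)
```

With β = 0.8 the edge leading to a node of the same role receives most of the probability mass, which is the qualitative behavior the claim describes for a large β.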
7. The role recommendation method based on the open source software mail network as claimed in claim 6, wherein the transition probability P(e) is calculated as follows:
the structure transition probability P_S, the timing transition probability P_T, and the role-based transition probability P_R are each normalized, and the final transition probability is obtained as:
P(e) = P_S(e) · P_T(e) · P_R(e).
CN202011265544.2A 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network Active CN112347373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011265544.2A CN112347373B (en) 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011265544.2A CN112347373B (en) 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network

Publications (2)

Publication Number Publication Date
CN112347373A CN112347373A (en) 2021-02-09
CN112347373B true CN112347373B (en) 2022-06-17

Family

ID=74363592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011265544.2A Active CN112347373B (en) 2020-11-13 2020-11-13 Role recommendation method based on open source software mail network

Country Status (1)

Country Link
CN (1) CN112347373B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644268A (en) * 2017-09-11 2018-01-30 浙江工业大学 A kind of open source software project hatching trend prediction method based on multiple features

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7778945B2 (en) * 2007-06-26 2010-08-17 Microsoft Corporation Training random walks over absorbing graphs
US8433670B2 (en) * 2011-03-03 2013-04-30 Xerox Corporation System and method for recommending items in multi-relational environments
CN106529562A (en) * 2016-09-09 2017-03-22 浙江工业大学 OSS (Open Source software) project developer prediction method based on Email networks
CN107391542B (en) * 2017-05-16 2021-01-01 浙江工业大学 Open source software community expert recommendation method based on file knowledge graph
CN111431755B (en) * 2020-04-21 2023-02-03 太原理工大学 Multi-layer time sequence network model construction and key node identification method based on complex network
CN111523037B (en) * 2020-04-26 2023-08-04 上海理工大学 Reviewer recommendation method of Pull Request in open source software development

Also Published As

Publication number Publication date
CN112347373A (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant