CN107527295B - Academic team dynamic community discovery method based on a temporal co-authorship network and quality evaluation method thereof
- Publication number
- CN107527295B CN201710737012.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention provides an academic team dynamic community discovery algorithm based on a temporal co-authorship network. A temporal co-authorship network model is established; the importance of author nodes and the strength of relation edges in the co-authorship network are monitored in real time, and on that basis the evolution of academic teams in the academic network is analyzed and tracked by creating, expanding, contracting, splitting and eliminating communities, thereby achieving dynamic community discovery with high accuracy. The invention also provides a character-feature-based method for evaluating the quality of academic team community division. Experiments on public literature record data sets verify the effectiveness of the algorithm.
Description
Technical Field
The invention relates to an academic team dynamic community discovery method based on a temporal co-authorship network.
Background
A national innovation system needs a body of high-level scientific research groups with independent innovation capability as an important component. How to effectively observe and analyze the evolution of research groups, and how to evaluate the overall performance of academic teams comprehensively and objectively, are new questions in the selection and evaluation of scientific and technological innovation groups. A scientific research team is a whole formed naturally through long-term cooperation. Its complex evolution is hidden in the historical records of academic activities such as co-authored articles and joint research projects, and public scientific literature resources record the complete history of the articles that researchers have co-authored. Constructing an academic network from large-scale scientific literature databases and mining the evolution of academic teams from it with dynamic community discovery theory and methods is therefore an important research topic in complex network analysis.
A community in an academic collaboration network is a whole formed through long-term academic cooperation among its members. Such a community generally has stable core members, and changes among the core members are an important factor driving community evolution. Early dynamic community discovery algorithms mostly applied the ideas of static community discovery to dynamic networks with a two-step strategy: a static community detection algorithm first divides the network snapshots at different moments into communities, and the community divisions of adjacent moments are then matched. In 2007, Palla et al. [1] used the CPM algorithm [2] to measure the node overlap between communities and thereby construct the evolutionary relationships between communities at different times. Hopcroft et al. (2004) [3] and Greene et al. (2010) [4] used overlap similarity to analyze community evolution. Wang et al. (2008) [5] used a subset of important nodes to track communities. However, static community partitioning based on network snapshots cannot accurately describe how communities evolve. To address this problem, Yang et al. (2005) [6], Miller et al. (2010) [7], Caravelli et al. (2013) [8] and others adopted incremental dynamic community discovery: an initial community division is obtained with a static algorithm, and the division at the current moment is then adjusted from the previous one according to changes in the network topology. This approach makes community evolution easy to track, but how to define the network change at each moment is the key problem it must solve.
In 2010, Cazabet et al. [9] proposed treating the change of edges between nodes in a network as the change of interaction between those nodes over time, and performing dynamic community detection according to the intrinsic attributes of the community and the interaction history of the nodes. In 2016, Rossetti et al. [10] proposed the TILES (Temporal Interactions a Local Edge Strategy) algorithm, which recognizes that over time not only are relation edges added to a network but existing edges may also expire, and takes the influence of both factors on community evolution into account. This makes dynamic community discovery more effective, but the algorithm does not consider how node importance and interaction closeness affect community structure. As the status of team members and the relationships between them change over time, the importance of member nodes in the academic network changes and the organizational structure of the academic team evolves, an evolution that is guided by the core members of the team. In academic team community discovery, therefore, changes in members' roles and the closeness of their relationships are important factors affecting the evolution of academic teams.
In addition, a large number of literature records are continuously added to the literature database over time and the academic relationships of scholars change. The topological structure of the scholars' academic network changes accordingly, as does the importance of the nodes in the network, and both affect how academic communities change.
Therefore, a new dynamic community discovery method for academic teams is needed to address these technical problems in the prior art.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides an Academic Team Dynamic community Discovery algorithm (ATDD) based on a temporal co-authorship network. By detecting how the importance of author nodes and the strength (weight) and persistence of relation edges change over time, it analyzes and tracks the evolution of academic teams in the academic network, and creates, expands, contracts, splits and eliminates communities to achieve dynamic community discovery. The invention also provides a character-feature-based method for evaluating the quality of academic team community division; experiments on public literature record data sets verify the effectiveness of the algorithm.
Within the organizational structure, academic team members act as team leader, important members and general members, which correspond in the academic network to core nodes, important nodes and slave nodes respectively; together these form an academic team community. Over time, the academic relationships among team members change with their cooperative behavior: relationships strengthen, weaken, appear or disappear. As a result, the importance of member nodes in the academic network changes, the roles they play change, and the organizational structure of the academic team evolves. This evolution is usually dominated by the behavior of the core members, so the team structure evolves under their guidance, and it is reflected in the continuing cooperation of researchers who co-author academic articles. In a public literature resource database, literature records accumulate continuously as a time-ordered stream, forming streaming data of author co-authorship relationship pairs with time attributes. The academic relationship network of researchers is driven by this stream of relationship data: edges and nodes are continuously added to the network, and important nodes and edges become prominent. The invention therefore evaluates how core a node is through temporal measures of node importance and relationship weight, and discovers teams iteratively, centered on the core nodes.
Fig. 1 is a schematic diagram of academic team evolution. When new data arrives, the nodes and edges of the network are updated and a new core node may emerge, which can lead to the formation of a new community: in Fig. 1(a), adding node 4 and edge (1, 4) makes node 1 a core node, and a new community forms. As relationship pairs are added, a node outside a community may become closely connected to it: in Fig. 1(c), node 5 and edge (1, 5) are added, node 5 becomes closely connected to the community and is added to it, which increases the size of the community. Some cooperative relationships disappear over time, which removes nodes from the community, and a community dies once it no longer contains a core node, as shown in Fig. 1(b). Removing expired edges can also leave some nodes less closely connected to the community, so that they need to be removed from it, which reduces the size of the community, as shown in Fig. 1(d). Because the add operation iteratively adds nodes that are closely connected to the community, the induced subgraph of a community is a connected graph. When edges are added between two communities so that most nodes of one community join the other, the two communities merge into one, as shown in Fig. 1(e). Removing edges within a community may destroy this connectivity: if, after removal, the induced subgraph of the community contains several connected components with core nodes, the community splits into several communities, as in Fig. 1(f), where an edge within the community disappears and the community splits in two.
Based on the above principle, the technical scheme provided by the invention is as follows:
A dynamic community discovery method for academic teams based on a temporal co-authorship network, characterized in that a temporal co-authorship network model is established and, on the basis of detecting changes in node importance and relationship strength in the co-authorship network in real time, dynamic community division is realized through community creation, expansion, contraction, splitting and extinction strategies; the method specifically comprises the following steps:
Step 1, define the co-authorship network at time t as $G_t = (V_t, E_t)$, where $V_t$ is the set of author nodes in the co-authorship network at time t and $I_i^t$ denotes the importance of author node $v_i$ at time t. Importance accumulates over time: each article published by an author adds 1 to that author's importance, $I_i^t = I_i^{t-1} + 1$, so the more articles an author has published, the higher the importance. $E_t$ is the set of edges in the co-authorship network at time t; once author nodes $v_i$ and $v_j$ have co-authored an article, an edge $e_{i,j}$ connects them. $w_{i,j}^t$ denotes the weight of edge $e_{i,j}$ at time t; the larger the weight, the closer the relationship between $v_i$ and $v_j$. Each time $v_i$ and $v_j$ co-author an article, the value of $w_{i,j}^t$ is updated once, as shown in formula (1):
$$w_{i,j}^t = w_{i,j}^{t-1} + \frac{2}{n(n-1)} \quad (1)$$
where n is the number of authors of the article co-written by $v_i$ and $v_j$. Formula (1) states that each time $v_i$ and $v_j$ co-author an article, the weight of the corresponding edge $e_{i,j}$ increases by the reciprocal of the number of edges among all authors of that article, $n(n-1)/2$, so the more collaborators an article has, the smaller the weight increment;
$l_{i,j}^t$ is the remaining life of edge $e_{i,j}$ at time t. Each time $v_i$ and $v_j$ co-author an article, $l_{i,j}^t$ is assigned anew. Counting from the date of their most recent co-authored publication, if $v_i$ and $v_j$ have gone longer than $T_1$ without co-authoring another article, $l_{i,j}^t$ is reassigned; if they have gone longer than $T_2$, it is reassigned again; and if $v_i$ and $v_j$ co-author another article before then, $l_{i,j}^t$ is reset. $T_1$ and $T_2$ are time thresholds chosen empirically;
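The importance and weight updates of step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `apply_record` and the plain-dictionary storage are assumptions made for the example, and the remaining-life bookkeeping is left out.

```python
from itertools import combinations

def apply_record(importance, weight, authors):
    """Update node importance and edge weights for one publication record.

    `importance` maps author -> accumulated importance, `weight` maps a
    sorted author pair -> accumulated edge weight (both hypothetical stores).
    """
    authors = sorted(set(authors))
    n = len(authors)
    # Each published article adds 1 to the importance of every author on it.
    for a in authors:
        importance[a] = importance.get(a, 0) + 1
    if n < 2:
        return  # a single-author article creates no co-authorship edge
    # Formula (1): the increment is the reciprocal of the number of
    # author pairs of the article, n(n-1)/2.
    delta = 2.0 / (n * (n - 1))
    for u, v in combinations(authors, 2):
        weight[(u, v)] = weight.get((u, v), 0.0) + delta

importance, weight = {}, {}
apply_record(importance, weight, ["A", "B", "C"])  # three authors, three pairs
apply_record(importance, weight, ["A", "B"])       # two authors, one pair
```

After these two records, the edge between A and B has accumulated 1/3 + 1, while the edges touching C stay at 1/3, which shows how articles with many co-authors contribute less to each pairwise relationship.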
Step 2, initialize the co-authorship network $G_0 = (V_0, E_0)$ with $V_0 = \emptyset$, $E_0 = \emptyset$ and $C_0 = \emptyset$;
Step 3, input the literature information records $R = \{r_1, r_2, \ldots, r_t, \ldots\}$, where $r_t$ is a publication record containing information such as the title, authors, keywords and publication time; the records in R are sorted by publication time, and t, the index of $r_t$ in R, corresponds to a moment in time;
Step 4, process the publication records $r_t$ in R one by one, updating the co-authorship network $G_t$ and performing dynamic community division to obtain $C_t = \{c_0, c_1, \ldots, c_k, \ldots\}$:
Step 4.1, set t = 1;
Step 4.2, take a publication record $r_t$ from R; form co-authorship node pairs from all the authors of $r_t$ and add them to the co-authorship network, then update the importance of each node and the weight of each edge in the network, and update the remaining life of the edges according to $r_t$:
If $r_t$ contains author node $v_i$, update the importance of $v_i$ at time t to $I_i^t = I_i^{t-1} + 1$. If $r_t$ contains author nodes $v_i$ and $v_j$ and no edge connects them, add an edge $e_{i,j}$ between $v_i$ and $v_j$, set its weight at time t to $w_{i,j}^t = 2/(n(n-1))$, where n is the number of author nodes contained in $r_t$, and assign its remaining life $l_{i,j}^t$ at time t. If $r_t$ contains author nodes $v_i$ and $v_j$ and an edge $e_{i,j}$ already connects them, update the weight and remaining life of $e_{i,j}$ at time t directly in the same way. If $r_t$ does not contain both $v_i$ and $v_j$ and an edge $e_{i,j}$ connects them, determine from the literature records how long it has been, at time t, since $v_i$ and $v_j$ last co-authored an article, and assign $l_{i,j}^t$ according to whether that length of time is less than $T_1$, at least $T_1$ but less than $T_2$, or at least $T_2$. $T_1$ and $T_2$ are time thresholds chosen empirically;
As literature records are added to the literature database in time order, the importance of the author nodes in the co-authorship network and the closeness of the cooperative relationships among authors change, so the topological structure of the academic team communities changes as well. These changes cause the community structure to evolve through creation, expansion, contraction, splitting and extinction.
Step 4.3, according to the publication time of the article in $r_t$, remove the expired author cooperative relationships from the current co-authorship network, examine the influence of the removed edges on the communities according to the academic team community division standard, and perform the corresponding community contraction, splitting and extinction;
Step 4.4, examine the influence of the added edges on the communities according to the academic team community division standard, and perform the corresponding community creation and expansion;
Step 4.5, obtain the academic team community division of the co-authorship network at time t, denoted $C_t = \{c_0, c_1, \ldots, c_k, \ldots\}$;
Step 4.6, set t = t + 1 and repeat steps 4.2 to 4.5 until the last publication record $r_t$ has been taken from R and processed, yielding the final academic team community division;
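The control flow of steps 3 to 4.6 can be summarized with a small driver loop. This is only a sketch of the ordering of the steps: the callback names `update_network`, `remove_expired` and `grow`, the record layout with `"date"` and `"authors"` keys, and the ISO date strings are all assumptions introduced for illustration.

```python
def run_stream(records, update_network, remove_expired, grow):
    """Process records in publication order and collect C_t after each one."""
    communities = []  # current division; each community is a set of authors
    history = []
    # Step 3: records are sorted by publication time; t is the index in R.
    ordered = sorted(records, key=lambda r: r["date"])
    for t, record in enumerate(ordered, start=1):
        update_network(record)                       # step 4.2
        remove_expired(record["date"], communities)  # step 4.3
        grow(record, communities)                    # step 4.4
        history.append((t, [set(c) for c in communities]))  # step 4.5
    return history

# Tiny demonstration with stand-in callbacks that only log the call order.
calls = []
records = [
    {"date": "2001-05-01", "authors": ["A", "B"]},
    {"date": "2000-03-01", "authors": ["A", "C"]},
]
history = run_stream(
    records,
    update_network=lambda r: calls.append(("update", r["date"])),
    remove_expired=lambda d, c: calls.append(("shrink", d)),
    grow=lambda r, c: calls.append(("grow", r["date"])),
)
```

The point of the sketch is that edge removal and its consequences (step 4.3) are handled before edge addition and its consequences (step 4.4) for every record, and that no matching between snapshots is needed because the division is carried forward from one record to the next.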
In steps 4.3 and 4.4, the important nodes, strong edges, slave nodes and core nodes of the co-authorship network $G_t$ are defined as follows:
1) Important node: a node is an important node when its importance is not less than the average importance of all non-isolated nodes in the network, where a non-isolated node is a node connected to another node by an edge;
2) Strong edge: an edge is a strong edge when it connects two important nodes and its weight is not less than the average weight of all edges in the network;
3) Slave node and core node: if two nodes $v_i$ and $v_j$ are connected by a strong edge, their importance values satisfy formula (2),
and the weight of the edge between the two nodes satisfies formula (3),
then $v_j$ is a slave node of $v_i$;
where M is the minimum number of people forming an academic team, taken as 4 empirically; x ranges over all neighbor nodes $v_x$ connected to $v_j$ by strong edges, and $\sum_x w_{j,x}^t$ is the sum of the weights of all strong edges that have $v_j$ as an endpoint;
Formulas (2) and (3) state that the importance of the slave node $v_j$ is much less than the importance of node $v_i$, and that the weight of the edge between $v_i$ and $v_j$ is greater than the median of the sum of the weights of all strong edges that have $v_j$ as an endpoint;
A node that has M-1 slave nodes is a core node;
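Definitions 1) and 2) translate directly into two averages. The sketch below covers only important nodes and strong edges; the slave-node and core-node tests depend on formulas (2) and (3) and are not attempted here. The function names and the dictionary layout (`importance` keyed by node, `weight` keyed by node pair) are assumptions for illustration.

```python
def important_nodes(importance, weight):
    """Nodes whose importance is at least the mean over non-isolated nodes."""
    # Non-isolated nodes are those that appear as an endpoint of some edge.
    non_isolated = {node for edge in weight for node in edge}
    if not non_isolated:
        return set()
    avg = sum(importance[n] for n in non_isolated) / len(non_isolated)
    return {n for n, value in importance.items() if value >= avg}

def strong_edges(importance, weight):
    """Edges between two important nodes with at least the mean edge weight."""
    if not weight:
        return set()
    key_nodes = important_nodes(importance, weight)
    avg_w = sum(weight.values()) / len(weight)
    return {(u, v) for (u, v), w in weight.items()
            if u in key_nodes and v in key_nodes and w >= avg_w}

importance = {"A": 5, "B": 4, "C": 1, "D": 1, "E": 9}  # E has no edges
weight = {("A", "B"): 2.0, ("A", "C"): 0.5, ("C", "D"): 0.5}
key_nodes = important_nodes(importance, weight)
strong = strong_edges(importance, weight)
```

Note that the isolated node E does not enter the average (2.75 over A, B, C and D) but still counts as important because its own importance exceeds that average; it cannot be an endpoint of a strong edge since it has no edges.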
The academic team community division standard is as follows:
An academic team community is formed by a group of nodes that are connected through strong edges. The community must contain a core node, and every non-core node in the community must satisfy formula (4), i.e. its connection to nodes inside the community is closer than its connection to nodes outside the community:
$$W_{in}^t(v_i) > W_{all}^t(v_i) - W_{in}^t(v_i) \quad (4)$$
where $v_i$ is a node within the community, $W_{in}^t(v_i)$ is the sum of the weights of all strong edges inside the community that have $v_i$ as an endpoint, and $W_{all}^t(v_i)$ is the sum of the weights of all strong edges in the entire co-authorship network that have $v_i$ as an endpoint.
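The membership criterion of formula (4) compares two sums of strong-edge weights for a node. A minimal sketch follows, reading the criterion as "weight toward the community exceeds weight toward the rest of the network"; the function name and the `strong_weight` dictionary keyed by node pairs are hypothetical.

```python
def satisfies_formula_4(node, community, strong_weight):
    """True if the node's strong-edge weight into the community exceeds
    its strong-edge weight toward nodes outside the community."""
    inside = outside = 0.0
    for (u, v), w in strong_weight.items():
        if node not in (u, v):
            continue  # edge does not touch this node
        other = v if u == node else u
        if other in community:
            inside += w
        else:
            outside += w
    return inside > outside

strong_weight = {
    ("A", "B"): 2.0, ("B", "C"): 1.0, ("B", "X"): 0.5,
    ("Y", "A"): 1.0, ("Y", "Z"): 1.5,
}
community = {"A", "B", "C"}
b_ok = satisfies_formula_4("B", community, strong_weight)
y_ok = satisfies_formula_4("Y", community, strong_weight)
```

Here B is tied to the community with total weight 3.0 against 0.5 outside, so it stays; Y has weight 1.0 toward the community and 1.5 toward Z, so it does not qualify.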
Further, step 4.3 specifically includes the following:
For each edge in the co-authorship network at time t, if the edge has expired, remove it from the network; after an edge is removed, the affected community is handled by contraction, splitting or extinction in the following three cases:
(a) Contraction: the author nodes $v_u$ and $v_v$ belong to community $c_k$, and the induced subgraph of $c_k$ is a connected graph that contains a core node. In this case, whichever of $v_u$ and $v_v$ does not satisfy formula (4) is removed from the community. After any node is removed from the community, the closeness of the connection between its neighbor nodes in the community and the community also changes, so it is necessary to check iteratively whether the neighbors of each removed node still satisfy formula (4) and to remove those that do not, until the number of nodes in the community (the size of the community) is less than M or no node needs to be removed;
(b) Splitting: if the author nodes $v_u$ and $v_v$ belong to community $c_k$ and the induced subgraph of $c_k$ contains several connected subgraphs, some of which contain core nodes and some of which do not, the connected subgraphs containing core nodes form new communities (each connected subgraph containing a core node becomes one new community), and all nodes in the connected subgraphs without core nodes are removed from $c_k$;
(c) Extinction: if the author nodes $v_u$ and $v_v$ belong to community $c_k$, the induced subgraph of $c_k$ contains several connected subgraphs, and none of them contains a core node, then community $c_k$ dies;
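Cases (b) and (c) both reduce to finding the connected components of the community's induced subgraph and keeping the components that still hold a core node. The sketch below uses a breadth-first search over strong edges; the set of core nodes is passed in as given, since it depends on formulas (2) and (3), and the function names are assumptions for illustration.

```python
from collections import deque

def components(nodes, strong):
    """Connected components of the subgraph induced by `nodes` on strong edges."""
    adj = {n: set() for n in nodes}
    for u, v in strong:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    part.add(nxt)
                    queue.append(nxt)
        parts.append(part)
    return parts

def split_or_die(community, strong, core_nodes):
    """Surviving communities: components that still contain a core node.

    An empty result corresponds to extinction; several results to a split.
    """
    return [p for p in components(community, strong) if p & core_nodes]

community = {1, 2, 3, 4, 5, 6, 7}
strong = {(1, 2), (2, 3), (4, 5), (5, 6)}  # the edge linking 3 and 4 was removed
survivors = split_or_die(community, strong, core_nodes={1, 4})
```

In this example node 7 ends up in a component of its own without a core node and is dropped, while the two components around core nodes 1 and 4 become separate communities.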
Further, in step 4.4, the community creation and expansion operations are as follows:
Creation: for any node pair (u, v) formed from the authors of $r_t$, if the author nodes $v_u$ and $v_v$ do not belong to any community and, after an edge is added between them, $v_u$ becomes a slave node of $v_v$ and $v_v$ becomes a core node, a new community with $v_v$ as its core node is created, and the neighbor nodes that satisfy formula (4) are added to the community iteratively;
Expansion: for the two endpoints $v_u$ and $v_v$ of an added edge, if node $v_u$ belongs to community $c_k$ and its neighbor $v_v$ satisfies formula (4), then $v_v$ is added to the community $c_k$ to which $v_u$ belongs; whether the neighbors of each newly added node satisfy formula (4) is then checked iteratively, and those that do are added to the community. Such expansion of one community can cause another community to contract, which in effect merges the communities.
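The iterative nature of expansion can be sketched as a frontier search that starts from the candidate node and keeps admitting neighbors while they qualify. This is an illustrative reading, not the patented procedure: the membership test is the "inside weight exceeds outside weight" interpretation of formula (4), and the function name and the adjacency layout (`adj` mapping each node to its strong-edge neighbors and weights) are assumptions.

```python
def expand(community, candidate, adj):
    """Add `candidate` and, iteratively, qualifying neighbors to `community`.

    `adj` maps node -> {neighbor: strong-edge weight}.
    """
    def qualifies(node):
        inside = sum(w for nb, w in adj[node].items() if nb in community)
        outside = sum(w for nb, w in adj[node].items() if nb not in community)
        return inside > outside

    frontier = [candidate]
    while frontier:
        node = frontier.pop()
        if node in community or not qualifies(node):
            continue
        community.add(node)
        # Neighbors of a newly added node become candidates in turn.
        frontier.extend(adj[node])
    return community

adj = {
    1: {2: 1.0, 3: 1.0, 5: 2.0},
    2: {1: 1.0},
    3: {1: 1.0},
    5: {1: 2.0, 6: 0.5},
    6: {5: 0.5, 9: 2.0},
    9: {6: 2.0},
}
grown = expand({1, 2, 3}, 5, adj)
```

Node 5 joins because its tie to node 1 outweighs its tie to node 6; node 6 is then examined but rejected, because its tie to node 9 outside the community is stronger.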
Further, in step 4.2, $T_1$ is set to one year and $T_2$ is set to two years.
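With these thresholds, the time since the last joint publication places every edge in one of three ranges. The sketch below only classifies an edge into those ranges; the stage labels `"fresh"`, `"aging"` and `"expired"` are hypothetical names introduced here, a year is approximated as 365 days, and treating edges beyond $T_2$ as expired is an assumption based on step 4.3 rather than a stated rule.

```python
from datetime import date

T1_DAYS = 365      # one year, as set in the text (365-day approximation)
T2_DAYS = 2 * 365  # two years, as set in the text

def life_stage(last_coauthored, now):
    """Classify an edge by the time elapsed since the last joint article."""
    elapsed = (now - last_coauthored).days
    if elapsed < T1_DAYS:
        return "fresh"    # less than T1 since the last joint publication
    if elapsed < T2_DAYS:
        return "aging"    # at least T1 but less than T2
    return "expired"      # at least T2 (assumed to be removed in step 4.3)

last = date(2015, 1, 1)
stages = [life_stage(last, d) for d in
          (date(2015, 6, 1), date(2016, 6, 1), date(2017, 6, 1))]
```

A new joint article simply moves `last_coauthored` forward, which returns the edge to the first range.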
The invention also provides a character-feature-based method for evaluating the quality of academic team community division;
Generally, an academic team consists of researchers with common research interests, so the quality of an academic team community division can be reflected by the similarity of the nodes within each community. The keywords of the articles in the literature records reflect the research interests of their authors, so a set of research-interest keywords can be formed for each author, and the similarity of community members' keywords can be used to measure the similarity of research interests in an academic team. However, because the keywords of academic articles are highly distinctive, measuring a team's research-interest similarity with whole-word features is not very effective; the invention therefore provides a character-feature-based method for evaluating academic team community division quality. A character vector is established for each author from the characters of the keywords of the articles that author has published.
$$Key\_character(v_i) = \{ch_1: count_1,\ ch_2: count_2,\ \ldots,\ ch_n: count_n\} \quad (5)$$
Formula (5) states that, for author node $v_i$, the characters of the keywords of the published articles form a set of n characters, where character $ch_n$ occurs $count_n$ times in total in the keywords of all articles published by the author. The interest similarity between two authors is measured by the proportion of their shared character counts to their total character counts, calculated as shown in formula (6).
Here, $count_i^{ch}$ and $count_j^{ch}$ denote the number of times character ch appears in the keywords of the articles published by author nodes $v_i$ and $v_j$ respectively;
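The character vector of formula (5) is a per-author character count, which `collections.Counter` expresses directly. The similarity function below is one possible reading of "shared character counts over total character counts" (counts of characters that both authors use, divided by all counts of both authors); it is an assumption for illustration and may differ from the exact form of formula (6). The function names are hypothetical.

```python
from collections import Counter

def key_character(keyword_lists):
    """Character counts over the keywords of all of an author's articles."""
    counts = Counter()
    for keywords in keyword_lists:      # one keyword list per article
        for keyword in keywords:
            counts.update(keyword)      # a string is counted per character
    return counts

def similarity(ci, cj):
    """Share of the two authors' character counts that falls on shared characters."""
    total = sum(ci.values()) + sum(cj.values())
    if total == 0:
        return 0.0
    shared = set(ci) & set(cj)
    return sum(ci[ch] + cj[ch] for ch in shared) / total

author_a = key_character([["社区发现", "复杂网络"]])
author_b = key_character([["社区演化"]])
sim_ab = similarity(author_a, author_b)
```

The two example authors share no whole keyword, so a keyword-level comparison would score them as unrelated, whereas the character-level comparison picks up the shared characters of "社区" (community).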
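The interest correlation $Correlation(c_k)$ of community $c_k$ is the average interest similarity over all pairs of authors in the community, as shown in formula (7):
$$Correlation(c_k) = \frac{2}{|c_k|(|c_k|-1)} \sum_{v_i, v_j \in c_k,\ i<j} Sim(v_i, v_j) \quad (7)$$
where $Sim(v_i, v_j)$ denotes the interest similarity of formula (6) and $|c_k|$ denotes the number of authors in community $c_k$;
The community partition quality $PartitionQuality(C_t)$ is the weighted sum of the interest correlations of all communities, as shown in formula (8), where the weight of each community is the number of authors in the community divided by the total number of authors in all communities:
$$PartitionQuality(C_t) = \sum_{c_k \in C_t} \frac{|c_k|}{\sum_{c_m \in C_t} |c_m|}\, Correlation(c_k) \quad (8)$$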
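The two aggregation steps can be sketched as follows, with the pairwise similarity passed in as a function so that the sketch does not commit to a particular form of formula (6). Averaging over unordered author pairs is an assumption about how "between all authors" is meant, and the function names are hypothetical.

```python
from itertools import combinations

def correlation(members, sim):
    """Mean pairwise interest similarity inside one community."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0  # a single-member community has no pairs to average
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def partition_quality(communities, sim):
    """Size-weighted sum of community correlations."""
    total = sum(len(c) for c in communities)
    if total == 0:
        return 0.0
    return sum(len(c) / total * correlation(c, sim) for c in communities)

# Made-up similarity values, purely for the demonstration.
table = {("A", "B"): 0.5, ("A", "C"): 0.3, ("B", "C"): 0.1, ("D", "E"): 0.8}
sim = lambda a, b: table[(a, b)]
quality = partition_quality([{"A", "B", "C"}, {"D", "E"}], sim)
```

With these values the first community averages 0.3 and the second 0.8, and the size weights 3/5 and 2/5 give an overall quality of 0.5.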
Beneficial effects:
The dynamic community discovery algorithm provided by the invention is used to discover academic teams. It takes time-ordered streaming data as input and realizes dynamic community discovery by addressing how changes over time in the importance of team members and in the closeness of interaction among members affect the team community structure. It does not need to match community partitions from different moments, and it can follow the evolution of communities continuously in time order. In addition, by introducing temporal node importance and temporal relation-edge weights, the influence of unimportant nodes on community division is eliminated, and because each edge has its own remaining life, the fact that relationships between different members last for different lengths of time is described more reasonably, which makes the community discovery algorithm more accurate. The invention also provides a character-feature-based method for evaluating the quality of academic team community division; experiments on public literature record data sets verify the effectiveness of the algorithm.
Drawings
FIG. 1 is a schematic diagram of the community evolution of an academic team; FIGS. 1(a) - (f) are respectively creation, extinction, expansion, contraction, merger and split of communities;
FIG. 2 shows the cumulative trend of graph nodes and edges; FIGS. 2(a) - (d) show, for the four data sets respectively, the cumulative number of author nodes and the cumulative number of co-authorship edges between them over time;
FIG. 3 is a graph of the community partition quality of ATDD and TILES algorithms over time; FIGS. 3(a) - (d) are the time-dependent changes in the community partition quality for the four data sets, respectively;
FIG. 4 is a graph of the number of communities for the ATDD and TILES algorithms over time; fig. 4(a) to (d) show the time-dependent changes in the number of communities in each of the four data sets.
Detailed Description
The present invention will be described in more detail with reference to the accompanying drawings and embodiments.
The invention provides an academic team dynamic community discovery algorithm based on a temporal co-authorship network. A temporal co-authorship network model is established; the importance of author nodes and the strength of relation edges in the co-authorship network are monitored in real time, and on that basis the evolution of academic teams in the academic network is analyzed and tracked by creating, expanding, contracting, splitting and eliminating communities, thereby achieving dynamic community discovery with high accuracy. The invention also provides a character-feature-based method for evaluating the quality of academic team community division, and the following experiments on public literature record data sets verify the effectiveness of the algorithm.
The experiments use literature records published in the China National Knowledge Infrastructure (CNKI) literature resource database as experimental data. The data span the years 2000 to 2016, and each literature record contains information such as the article title, authors, institution, keywords, abstract and publication time. The records are arranged into an input data stream by publication time to verify the validity of the algorithm. The experimental environment is a PC with an i5-3337U CPU and 8 GB RAM, running Windows 10, Python 3.5 and networkx 1.1.
1 Experimental data
The literature records are divided into 4 data sets by institution, containing 5925 (a), 5892 (b), 3792 (c) and 4586 (d) records respectively. The four subgraphs of FIG. 2 show, for the four data sets, how the number of author nodes and the number of co-authorship edges between them accumulate over time.
2 Comparative experiment
For comparison with the TILES (Temporal Interactions a Local Edge Strategy) method of document [10], each edge is given a fixed lifetime of 2 years. The ATDD and TILES algorithms are run on the four data sets, and the results are evaluated with the character-feature-based academic team community quality evaluation method, as shown in FIG. 3. Because TILES discovers communities on the basis of triangle structures, its community division quality is higher than that of ATDD when the data volume is small. As the data volume grows over time, the community partition quality of the ATDD algorithm quickly exceeds that of TILES and remains high.
Table 1 compares the community division quality of the ATDD and TILES algorithms on the four test data sets. It lists, for each data set, the division quality averaged over all time steps, together with the overall average across the four data sets. On every data set the division quality of ATDD is better than that of TILES.
TABLE 1 comparison of community partition quality for ATDD and TILES algorithms
FIG. 4 shows how the number of communities found by the ATDD and TILES algorithms changes over time. For ATDD, the number of communities in the four data sets stays relatively stable as nodes and edges accumulate. For TILES, the number of communities rises with the number of nodes and edges and increases sharply once the network is large. Table 2 compares the number of communities found by the two algorithms, where the value for each data set is the average over all time steps. TILES finds far more communities than ATDD. Because the data sets are organized by research institution and each institution has a limited number of research teams, the number of communities found by ATDD is the more reasonable.
TABLE 2 Comparison of the number of community partitions for the ATDD and TILES algorithms
Number of communities | Data set (a) | Data set (b) | Data set (c) | Data set (d) | Total |
---|---|---|---|---|---|
ATDD | 11 | 15 | 13 | 11 | 12 |
TILES | 144 | 152 | 230 | 119 | 161 |
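A quick arithmetic check of Table 2 suggests that the Total column is the mean of the four per-data-set values, reported as a whole number. This reading is an assumption based on the numbers themselves, and the variable names below are illustrative.

```python
# Per-data-set community counts (a) to (d), as listed in Table 2.
table2 = {
    "ATDD": [11, 15, 13, 11],
    "TILES": [144, 152, 230, 119],
}

# Mean over the four data sets for each algorithm.
means = {name: sum(values) / len(values) for name, values in table2.items()}

# How many times more communities TILES reports than ATDD, on average.
ratio = means["TILES"] / means["ATDD"]
```

The means come out at 12.5 for ATDD and 161.25 for TILES, which matches the reported 12 and 161 if fractions are dropped, and puts the TILES count at roughly thirteen times the ATDD count.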
References
1. Palla G, Barabási A L, Vicsek T. Quantifying social group evolution[J]. Nature, 2007, 446(7136): 664-667.
2. Palla G, Derényi I, Farkas I, et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435(7043): 814-818.
3. Hopcroft J, Khan O, Kulis B, et al. Tracking evolving communities in large linked networks[J]. Proceedings of the National Academy of Sciences, 2004, 101(suppl 1): 5249-5253.
4. Greene D, Doyle D, Cunningham P. Tracking the Evolution of Communities in Dynamic Social Networks[C]//International Conference on Advances in Social Networks Analysis and Mining, Asonam 2010, Odense, Denmark, August. 2010: 176-183.
5. Wang Y, Wu B, Du N. Community Evolution of Social Network: Feature, Algorithm and Model[J]. arXiv:0804.4356v1 [physics.soc-ph], 28 Apr 2008.
6. Yang B, Liu D Y. Incremental algorithm for detecting community structure in dynamic networks[C]//Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on. IEEE, 2005, 4: 2284-2290.
7. Miller K, Eliassi-Rad T. Continuous time group discovery in dynamic graphs[R]. Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2010.
8. Caravelli P, Wei Y, Subak D, et al. Understanding evolving group structures in time-varying networks[C]//Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM, 2013: 142-148.
9. Cazabet R, Amblard F, Hanachi C. Detection of overlapping communities in dynamical social networks[C]//Social Computing (SocialCom), 2010 IEEE Second International Conference on. IEEE, 2010: 309-314.
10. Rossetti G, Pappalardo L, Pedreschi D, et al. Tiles: an online algorithm for community discovery in dynamic social networks[J]. Machine Learning, 2016: 1-29.
Claims (3)
1. An academic team dynamic community discovery method based on a temporal co-occurrence network, characterized by comprising the following steps:
step 1, defining the co-authoring network at time $t$ as $G_t=(V_t,E_t)$, where $V_t$ is the set of author nodes in the co-authoring network at time $t$ and each author node $v_i$ has an importance at time $t$; $E_t$ is the set of edges in the co-authoring network at time $t$, and once author nodes $v_i$ and $v_j$ have published an article together they are connected by an edge $e_{i,j}$; each edge $e_{i,j}$ has a weight at time $t$ and a remaining life at time $t$; the remaining life is assigned each time $v_i$ and $v_j$ publish an article together and is counted from the date of their last co-publication: it takes one value while less than $T_1$ has passed without a joint publication, a second value once $T_1$ has been exceeded, and a third value once $T_2$ has been exceeded; if $v_i$ and $v_j$ publish together again before the edge expires, the remaining life is reset; $T_1$ and $T_2$ are time thresholds whose values are chosen empirically;
step 2, initializing the co-authoring network $G_0=(V_0,E_0)$, with $V_0=\Phi$, $E_0=\Phi$ and community division $C_0=\Phi$;
step 3, inputting the literature record stream $R=\{r_1,r_2,\ldots,r_t,\ldots\}$, where each $r_t$ is one article publication record comprising the title, authors, keywords and publication time; the records in $R$ are sorted by publication time, and the index $t$ of $r_t$ in $R$ corresponds to a time step;
step 4, processing the publication records $r_t$ in $R$ one by one, updating the co-authoring network $G_t$ and performing dynamic community division to obtain $C_t=\{c_0,c_1,\ldots,c_k,\ldots\}$:
step 4.1, setting $t=1$;
step 4.2, taking a publication record $r_t$ from $R$, forming co-authorship node pairs from all authors in $r_t$ and adding them to the co-authoring network, then updating the importance of each node, the weight of each edge, and the remaining life of each edge according to $r_t$:
if $r_t$ contains author node $v_i$, the importance of $v_i$ at time $t$ is updated; if $r_t$ contains author nodes $v_i$ and $v_j$ and there is no edge between them, an edge $e_{i,j}$ is added between $v_i$ and $v_j$, its weight at time $t$ is updated by a formula in which $n$ denotes the number of author nodes contained in $r_t$, and its remaining life at time $t$ is updated as well; if $r_t$ contains $v_i$ and $v_j$ and an edge $e_{i,j}$ already connects them, the weight and remaining life of $e_{i,j}$ at time $t$ are updated directly in the same way; if $r_t$ does not contain both $v_i$ and $v_j$ but an edge $e_{i,j}$ connects them, the time elapsed at time $t$ since $v_i$ and $v_j$ last published an article together is determined from the literature records, and the remaining life of $e_{i,j}$ is set according to whether that time is less than $T_1$, at least $T_1$ but less than $T_2$, or at least $T_2$;
step 4.3, according to the publication time of the article in $r_t$, removing expired co-authorship relations from the current co-authoring network, examining the effect of each removed edge on the communities according to the academic team community division standard, and performing the corresponding community contraction, splitting and extinction;
step 4.4, examining the effect of each added edge on the communities according to the academic team community division standard, and performing the corresponding community creation and expansion;
step 4.5, obtaining the academic team community division of the co-authoring network at time $t$, denoted $C_t=\{c_0,c_1,\ldots,c_k,\ldots\}$;
step 4.6, setting $t=t+1$ and repeating steps 4.2 to 4.5 until the last publication record in $R$ has been taken out and processed, which gives the final academic team community division result;
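The bookkeeping of step 4.2 inside the loop of steps 4.1 to 4.6 could be sketched as follows. This is an illustration under stated assumptions, not the claimed formulas: the importance increment of 1 per publication, the weight increment of 1/(n-1) per co-authored article, and the staged life values 2, 1 and 0 are all assumed here. Plain dictionaries stand in for the graph, and the function names and sample records are hypothetical. The thresholds of one and two years follow the values given for T1 and T2 in claim 2.

```python
from datetime import date
from itertools import combinations

T1_DAYS, T2_DAYS = 365, 730  # T1 = one year, T2 = two years


def remaining_life(last_joint, now):
    """Stage the remaining life by the time since the last joint publication.

    The stage values 2, 1 and 0 are assumptions made for this sketch.
    """
    gap = (now - last_joint).days
    if gap < T1_DAYS:
        return 2
    if gap < T2_DAYS:
        return 1
    return 0


def apply_record(importance, edges, authors, published):
    """Step 4.2 for one record: update importance, edge weights and remaining life."""
    authors = sorted(set(authors))
    n = len(authors)
    for author in authors:
        # Assumed rule: each publication adds 1 to the author's importance.
        importance[author] = importance.get(author, 0.0) + 1.0
    for pair in combinations(authors, 2):
        edge = edges.setdefault(pair, {"weight": 0.0})
        # Assumed rule: each co-authored article adds 1/(n-1) to the edge weight.
        edge["weight"] += 1.0 / (n - 1)
        edge["last_joint"] = published
    # Every edge, touched or not, gets its remaining life refreshed.
    for edge in edges.values():
        edge["life"] = remaining_life(edge["last_joint"], published)


importance, edges = {}, {}
stream = [
    (["Li", "Wang", "Zhao"], date(2001, 1, 1)),
    (["Li", "Wang"], date(2002, 6, 1)),
]
for t, (authors, published) in enumerate(stream, start=1):  # steps 4.1 and 4.6
    apply_record(importance, edges, authors, published)
    # Steps 4.3 to 4.5 (pruning, growth, recording C_t) would follow here.
```

After the second record, the edge between Li and Wang has been refreshed, while the edges to Zhao are more than one year old and have moved to the lower life stage.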
in steps 4.3 and 4.4, the important nodes, strong edges, slave nodes and core nodes of the co-authoring network $G_t$ are defined as follows:
1) important node: a node whose importance is not less than the average importance of all non-isolated nodes in the network, where a non-isolated node is a node connected to another node by an edge;
2) strong edge: an edge that connects two important nodes and whose weight is not less than the average weight of all edges in the network;
3) slave node and core node: if the importances of two nodes $v_i$ and $v_j$ connected by a strong edge satisfy the importance condition,
and the edge weights of the two nodes satisfy the weight condition,
then $v_j$ is a slave node of $v_i$;
in these conditions, $M$ is the minimum number of people required to form an academic team and is chosen empirically; $x$ ranges over all neighbor nodes $v_x$ connected to $v_j$ by strong edges, and the summed term is the total weight of all strong edges having $v_j$ as an endpoint;
a node that has $M-1$ slave nodes is a core node;
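The definitions of important nodes and strong edges can be sketched directly from the wording above. The slave-node conditions depend on formulas not reproduced in this text, so the sketch takes the slave relation as a given input. The function names, the dictionary layout and the reading of "has M-1 slave nodes" as "at least M-1" are assumptions.

```python
def important_nodes(importance, edge_weights):
    """Nodes whose importance is not less than the mean over non-isolated nodes."""
    non_isolated = {node for pair in edge_weights for node in pair}
    if not non_isolated:
        return set()
    mean = sum(importance[n] for n in non_isolated) / len(non_isolated)
    return {n for n, value in importance.items() if value >= mean}


def strong_edges(importance, edge_weights):
    """Edges joining two important nodes with weight not less than the mean edge weight."""
    if not edge_weights:
        return set()
    important = important_nodes(importance, edge_weights)
    mean = sum(edge_weights.values()) / len(edge_weights)
    return {
        (u, v)
        for (u, v), w in edge_weights.items()
        if u in important and v in important and w >= mean
    }


def core_nodes(slaves_of, M):
    """Nodes with at least M-1 slave nodes (slave relation supplied by the caller)."""
    return {node for node, slaves in slaves_of.items() if len(slaves) >= M - 1}


# Made-up example values.
importance = {"Li": 5, "Wang": 4, "Zhao": 1, "Chen": 2}
edge_weights = {("Li", "Wang"): 3.0, ("Li", "Zhao"): 0.5, ("Chen", "Wang"): 1.0}
important = important_nodes(importance, edge_weights)
strong = strong_edges(importance, edge_weights)
cores = core_nodes({"Li": {"Wang", "Zhao"}, "Wang": {"Chen"}}, M=3)
```

In this example the mean importance is 3 and the mean edge weight is 1.5, so only Li and Wang are important and only the edge between them is strong.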
the academic team community division standard is as follows:
a group of nodes connected to one another through strong edges forms an academic team community; the community must contain a core node, and every non-core node in the community must satisfy formula (4), that is, its connection with nodes inside the community is closer than its connection with nodes outside the community;
in formula (4), $v_i$ is a node within the community, and the two quantities compared are the sum of the weights of all strong edges inside the community that have $v_i$ as an endpoint and the sum of the weights of all strong edges in the entire co-authoring network that have $v_i$ as an endpoint;
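One possible reading of the membership test in formula (4) is that the strong-edge weight a node has inside the community must exceed the strong-edge weight it has outside, where the outside weight is the network-wide total minus the inside weight. The sketch below implements that reading only. The exact form of formula (4) is not reproduced here, and the function name and data layout are hypothetical.

```python
def satisfies_membership(node, community, strong_weights):
    """Assumed reading of formula (4): inside strong weight > outside strong weight."""
    inside = 0.0
    total = 0.0
    for (u, v), weight in strong_weights.items():
        if node not in (u, v):
            continue
        other = v if node == u else u
        total += weight  # all strong edges with `node` as an endpoint
        if other in community:
            inside += weight  # strong edges that stay within the community
    return inside > total - inside


# Made-up strong edges around node "a".
strong_weights = {("a", "b"): 2.0, ("a", "c"): 1.0, ("a", "x"): 2.5}
in_large = satisfies_membership("a", {"a", "b", "c"}, strong_weights)
in_small = satisfies_membership("a", {"a", "b"}, strong_weights)
```

With the community {a, b, c} the inside weight is 3.0 against 2.5 outside, so the test passes. With the community {a, b} it is 2.0 against 3.5, so it fails.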
step 4.3 specifically comprises: for each edge in the co-authoring network at time $t$, if the edge has expired, removing it from the network; after an edge between author nodes $v_u$ and $v_v$ is removed, performing the corresponding community contraction, splitting or extinction according to three cases:
(a) contraction: author nodes $v_u$ and $v_v$ belong to community $c_k$, and $c_k$ is still a connected graph and contains a core node; in this case, if $v_u$ or $v_v$ does not satisfy formula (4), that node is removed from the community, and the neighbor nodes of each removed node are iteratively checked against formula (4) and removed if they do not satisfy it, until the community has fewer than $M$ nodes or no node remains to be removed;
(b) splitting: author nodes $v_u$ and $v_v$ belong to community $c_k$, and the induced subgraph of $c_k$ consists of several connected subgraphs, some of which contain a core node and some of which do not; each connected subgraph containing a core node forms a new community, and all nodes in the connected subgraphs without a core node are removed from $c_k$;
(c) extinction: author nodes $v_u$ and $v_v$ belong to community $c_k$, and $c_k$ consists of several connected subgraphs none of which contains a core node; community $c_k$ is then dissolved;
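The three cases of step 4.3 can be sketched with plain sets. Several assumptions are made here: an edge counts as expired when its remaining life is 0 or less, the formula (4) test is passed in as a caller-supplied predicate `qualifies`, and all function names are hypothetical.

```python
from collections import deque


def drop_expired(edges):
    """Remove edges whose remaining life has run out (assumed: life <= 0)."""
    expired = [pair for pair, edge in edges.items() if edge["life"] <= 0]
    for pair in expired:
        del edges[pair]
    return expired


def shrink(community, endpoints, neighbors, qualifies, M):
    """Case (a): iteratively drop nodes failing the membership test."""
    members = set(community)
    queue = deque(endpoints)
    while queue and len(members) >= M:
        node = queue.popleft()
        if node in members and not qualifies(node, members):
            members.discard(node)
            # Neighbors of a removed node must be re-checked.
            queue.extend(nb for nb in neighbors.get(node, ()) if nb in members)
    return members


def connected_parts(nodes, strong_edges):
    """Connected subgraphs of the community, using strong edges only."""
    adjacency = {n: set() for n in nodes}
    for u, v in strong_edges:
        if u in adjacency and v in adjacency:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, parts = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts


def split_or_dissolve(community, strong_edges, cores):
    """Cases (b) and (c): keep only connected parts that contain a core node."""
    return [p for p in connected_parts(community, strong_edges) if p & cores]


edges = {("a", "b"): {"life": 1}, ("b", "c"): {"life": 0}}
expired = drop_expired(edges)
team = {"a", "b", "c", "d", "e"}
survivors = split_or_dissolve(team, {("a", "b"), ("c", "d")}, cores={"a"})
dissolved = split_or_dissolve(team, {("a", "b"), ("c", "d")}, cores=set())
```

An empty result from `split_or_dissolve` corresponds to extinction. A result with one part per core-bearing subgraph corresponds to splitting.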
in step 4.4, the community creation and expansion operations are as follows:
creation: for any node pair $(u, v)$ formed from the authors of $r_t$, if author nodes $v_u$ and $v_v$ do not belong to any community, and adding an edge between them makes $v_u$ a slave node of $v_v$ and $v_v$ a core node, a new community with $v_v$ as its core node is created, and neighbor nodes satisfying formula (4) are iteratively added to the community;
expansion: for the two nodes $v_u$ and $v_v$ of an added edge, if node $v_u$ belongs to community $c_k$ and its neighbor node $v_v$ satisfies formula (4), $v_v$ is added to the community $c_k$ to which $v_u$ belongs, and the neighbor nodes of each newly added node are iteratively checked against formula (4) and added to the community if they satisfy it.
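The iterative growth shared by creation and expansion can be sketched as a breadth-first sweep from a seed node. The formula (4) test is again passed in as a caller-supplied predicate `qualifies`. The function name, the adjacency dictionary and the toy predicate are hypothetical.

```python
from collections import deque


def expand_community(community, seed, neighbors, qualifies):
    """Add neighbors that pass the membership test, then check their neighbors in turn."""
    members = set(community)
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nb in neighbors.get(node, ()):
            if nb not in members and qualifies(nb, members):
                members.add(nb)
                queue.append(nb)  # newly added nodes may pull in their own neighbors
    return members


# Toy network: "x" never qualifies, everyone else does.
neighbors = {"a": ["b", "x"], "b": ["a", "c"], "c": ["b"], "x": ["a"]}
grown = expand_community({"a"}, "a", neighbors, lambda node, members: node != "x")
```

For creation, the initial community would hold the new core node and its slave nodes. For expansion, it would be the existing community $c_k$ with the endpoint of the added edge as the seed.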
2. The academic team dynamic community discovery method based on a temporal co-occurrence network as claimed in claim 1, wherein in step 4.2, $T_1$ is set to one year and $T_2$ is set to two years.
3. A word-feature-based quality evaluation method for academic team community division, used to evaluate the quality of the academic team community division result $C_t=\{c_0,c_1,\ldots,c_k,\ldots\}$ obtained by the method of any one of claims 1-2:
first, the keywords of the articles published by each author are used as word features to establish a word vector for each author:
$\mathrm{Key\_charater}(v_i)=\{ch_1: count_1,\ ch_2: count_2,\ \ldots,\ ch_n: count_n\}$ (5)
equation (5) states that the keyword set of the articles published by author node $v_i$ consists of $n$ keywords, where keyword $ch_n$ has appeared $count_n$ times in total across the keywords of all articles published by that author;
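The word vector of equation (5) maps naturally onto a counter of keyword occurrences. The following is a small sketch in which the article dictionaries and their field names are made up for illustration.

```python
from collections import Counter


def keyword_vector(articles, author):
    """Cumulative keyword counts over all articles the author has published."""
    vector = Counter()
    for article in articles:
        if author in article["authors"]:
            vector.update(article["keywords"])
    return vector


# Made-up articles.
articles = [
    {"authors": ["Li", "Wang"], "keywords": ["graph", "community"]},
    {"authors": ["Li"], "keywords": ["graph"]},
    {"authors": ["Wang"], "keywords": ["mining"]},
]
li_vector = keyword_vector(articles, "Li")
```

Here `li_vector` records that "graph" appeared twice and "community" once among Li's article keywords.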
then, the interest similarity between two authors is measured by the ratio of the count of their shared keywords to the total count of their keywords; the calculation formula is shown in (6).
In formula (6), $\mathrm{Similarity}(v_i,v_j)$ is the interest similarity of author nodes $v_i$ and $v_j$, and the two counts are the numbers of occurrences of keyword $ch$ in the articles published by $v_i$ and by $v_j$ respectively;
next, the average interest similarity between all authors in community $c_k$ is calculated as the interest correlation $\mathrm{Correlation}(c_k)$ of the community; the calculation formula is shown in (7).
In formula (7), $|c_k|$ denotes the number of authors in community $c_k$;
finally, the weighted sum of the interest correlations of all communities is calculated as the community division quality $\mathrm{PartitionQuality}(C_t)$, where the weight of each community is the number of authors in the community divided by the total number of authors in all communities; the calculation formula is shown in (8):
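The three quantities can be sketched from their verbal descriptions. The similarity below is one plausible reading of formula (6) and is an assumption: the counts of the shared keywords in both authors' vectors, divided by the sum of all counts in both vectors. The correlation is taken as the mean over all author pairs, and the quality as the size-weighted sum. Function names and sample vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations


def similarity(vec_a, vec_b):
    """Assumed reading of (6): shared-keyword counts over total keyword counts."""
    total = sum(vec_a.values()) + sum(vec_b.values())
    if total == 0:
        return 0.0
    shared = vec_a.keys() & vec_b.keys()
    return sum(vec_a[ch] + vec_b[ch] for ch in shared) / total


def correlation(community, vectors):
    """Mean pairwise similarity of the authors in one community, as described for (7)."""
    pairs = list(combinations(sorted(community), 2))
    if not pairs:
        return 0.0
    return sum(similarity(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def partition_quality(communities, vectors):
    """Size-weighted sum of community correlations, as described for (8)."""
    total_authors = sum(len(c) for c in communities)
    if total_authors == 0:
        return 0.0
    return sum(len(c) / total_authors * correlation(c, vectors) for c in communities)


# Made-up keyword vectors.
vectors = {
    "Li": Counter({"graph": 2, "community": 1}),
    "Wang": Counter({"graph": 1, "mining": 1}),
    "Zhao": Counter({"optics": 3}),
    "Chen": Counter({"optics": 1}),
}
quality = partition_quality([{"Li", "Wang"}, {"Zhao", "Chen"}], vectors)
```

In this example Li and Wang share only "graph", giving a similarity of 3/5, while Zhao and Chen share all their keywords, giving 1. Both communities have equal weight, so the quality is about 0.8.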
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710737012.6A CN107527295B (en) | 2017-08-24 | 2017-08-24 | Academic team dynamic community discovery method based on temporal co-occurrence network and quality evaluation method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107527295A | 2017-12-29 |
CN107527295B | 2021-04-30 |
Family ID: 60682220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710737012.6A (CN107527295B, Expired - Fee Related) | Academic team dynamic community discovery method based on temporal co-occurrence network and quality evaluation method thereof | 2017-08-24 | 2017-08-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107527295B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070364A (en) * | 2019-03-27 | 2019-07-30 | 北京三快在线科技有限公司 | Method and apparatus, storage medium based on the fraud of graph model detection clique |
CN110110074A (en) * | 2019-05-10 | 2019-08-09 | 齐鲁工业大学 | A kind of timing data in literature analysis method and device based on Dynamic Network Analysis |
CN110334264B (en) * | 2019-06-27 | 2021-04-09 | 北京邮电大学 | Community detection method and device for heterogeneous dynamic information network |
CN111047453A (en) * | 2019-12-04 | 2020-04-21 | 兰州交通大学 | Detection method and device for decomposing large-scale social network community based on high-order tensor |
CN111428056A (en) * | 2020-04-26 | 2020-07-17 | 中国烟草总公司郑州烟草研究院 | Method and device for constructing scientific research personnel cooperative community |
CN112100452B (en) * | 2020-09-17 | 2024-02-06 | 京东科技控股股份有限公司 | Method, apparatus, device and computer readable storage medium for data processing |
CN112463977A (en) * | 2020-10-22 | 2021-03-09 | 三盟科技股份有限公司 | Community mining method, system, computer and storage medium based on knowledge graph |
CN113035366B (en) * | 2021-03-24 | 2023-01-13 | 南方科技大学 | Close contact person identification method, close contact person identification device, electronic device and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101075942A (en) * | 2007-06-22 | 2007-11-21 | 清华大学 | Method and system for processing social network expert information based on expert value progation algorithm |
CN103744846A (en) * | 2013-08-13 | 2014-04-23 | 北京航空航天大学 | Multidimensional dynamic local knowledge map and constructing method thereof |
Non-Patent Citations (1)
Title |
---|
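Zou Zhike. Research on temporal association rule mining algorithms and their application to mining academic collaboration relationships (时态关联规则挖掘算法研究及其在学术合作关系挖掘中的应用). China Master's Theses Full-text Database, Information Science and Technology; 2011-12-15; full text * |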
Also Published As
Publication number | Publication date |
---|---|
CN107527295A (en) | 2017-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107527295B (en) | Academic team dynamic community discovery method based on temporal co-occurrence network and quality evaluation method thereof | |
Berahmand et al. | A preference random walk algorithm for link prediction through mutual influence nodes in complex networks | |
Gui et al. | A community discovery algorithm based on boundary nodes and label propagation | |
Yu et al. | Citation prediction in heterogeneous bibliographic networks | |
Guo et al. | Evolutionary community structure discovery in dynamic weighted networks | |
Ilhan et al. | Feature identification for predicting community evolution in dynamic social networks | |
Zhou et al. | Collaborator recommendation in heterogeneous bibliographic networks using random walks | |
Sattari et al. | A spreading activation-based label propagation algorithm for overlapping community detection in dynamic social networks | |
Xu et al. | Finding overlapping community from social networks based on community forest model | |
Kaya et al. | Development of multidimensional academic information networks with a novel data cube based modeling method | |
Wang et al. | Detecting shilling groups in online recommender systems based on graph convolutional network | |
Sohrabi et al. | Systematic method for finding emergence research areas as data quality | |
Zhu et al. | Path prediction of information diffusion based on a topic-oriented relationship strength network | |
Liu et al. | Detecting community structure for undirected big graphs based on random walks | |
Liu et al. | Fast community discovery and its evolution tracking in time-evolving social networks | |
He et al. | A comparative study of different approaches for tracking communities in evolving social networks | |
Li et al. | Dynamic heterogeneous attributed network embedding | |
Sun et al. | Game theoretical approach for non-overlapping community detection | |
Huang et al. | Social Network Link Prediction Algorithm Based on Node Similarity | |
Pulipati et al. | Topological and attribute link prediction using firefly algorithm | |
Guan et al. | Discovering pattern-based subspace clusters by pattern tree | |
Sikdar et al. | Compas: Community preserving sampling for streaming graphs | |
Chen et al. | InDNI: An Infection Time Independent Method for Diffusion Network Inference | |
Liu et al. | A novel approach of discovering local community using node vector model | |
Gui et al. | A new method for overlapping community detection based on complete subgraph and label propagation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210430 |