CN107527295B

CN107527295B - Academic team dynamic community discovery method based on temporal co-authorship network, and quality evaluation method thereof

Info

Publication number: CN107527295B
Application number: CN201710737012.6A
Authority: CN
Inventors: 黄芳; 万文聪; 王向前; 张予琛; 章成源
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2017-08-24
Filing date: 2017-08-24
Publication date: 2021-04-30
Anticipated expiration: 2037-08-24
Also published as: CN107527295A

Abstract

The invention provides a dynamic community discovery algorithm for academic teams based on a temporal co-authorship network. A temporal co-authorship network model is established, and by detecting in real time the changes in the importance of author nodes and in the strength of relationship edges, the algorithm analyzes and tracks the evolution of academic teams in the academic network and creates, expands, contracts, splits and eliminates communities, thereby achieving dynamic community discovery with high accuracy. The invention also provides a character-feature-based method for evaluating the quality of academic team community partitions; experiments on public bibliographic record data sets demonstrate the effectiveness of the algorithm.

Description

Academic team dynamic community discovery method based on temporal co-authorship network, and quality evaluation method thereof

Technical Field

The invention relates to a dynamic community discovery method for academic teams based on a temporal co-authorship network.

Background

A national innovation system needs high-level scientific research groups with independent innovation capability as key components. How to observe and analyze the evolution of research groups effectively, and how to evaluate the overall performance of academic teams comprehensively and objectively, are new problems in the selection and evaluation of scientific and technological innovation groups. A research team is a whole formed naturally through long-term cooperation. Its complex evolution is hidden in the historical records of academic activities such as co-authored articles and joint research projects, and public scientific literature databases record the full history of the articles that researchers have published together. Constructing an academic network from a large-scale scientific literature database, and mining the evolution of academic teams from these records with dynamic community discovery theory and methods, is therefore an important research topic in complex network analysis.

A community in an academic collaboration network is a whole formed by long-term academic cooperation among its members. Such a community generally has stable core members, and changes among the core members are an important factor driving community evolution. Early dynamic community discovery algorithms mostly applied static community discovery to dynamic networks using a two-step strategy: first, a static community detection algorithm partitions the network snapshots at different times; then the partitions at adjacent times are matched. In 2007, Palla et al. [1] used the CPM algorithm [2] and the node overlap between communities to construct the evolutionary relationships among communities at different times. Hopcroft et al. (2004) [3] and Greene et al. (2010) [4] used overlap similarity to analyze community evolution. Wang et al. (2008) [5] used a subset of important nodes to track communities. However, static community partitioning based on network snapshots cannot accurately describe how a community evolves. To address this, Yang et al. (2005) [6], Miller et al. (2010) [7], Caravelli et al. (2013) [8] and others adopted incremental dynamic community discovery: an initial partition is obtained with a static algorithm, and the partition at the current time is then adjusted from the previous partition according to changes in the network topology. This approach makes community evolution easy to track, but how to define the network change at each time step remains its key problem.

In 2010, Cazabet et al. [9] proposed treating changes of the edges between nodes as changes over time in the interaction between those nodes, and performing dynamic community detection based on the inherent attributes of communities and the interaction history of nodes. In 2016, Rossetti et al. [10] proposed the TILES (Temporal Interactions a Local Edge Strategy) algorithm, which considers that relationship edges are not only added to a network over time but that existing edges may also expire. Accounting for both effects on community evolution makes dynamic community discovery more effective, but TILES does not consider the influence of node importance and interaction closeness on community structure. In an academic team, however, the status of members and the relationships between them change over time, so the importance of member nodes in the academic network changes and the organizational structure of the team evolves, with the evolution led by the core members of the team. Changes in members' roles and in the closeness of their relationships are therefore important factors in the evolution of academic teams and should be considered in academic team community discovery.

In addition, a large number of literature records are continuously added into the literature database over time, and the academic relationship of the scholars changes, so that the topological structure of the academic network of the scholars changes, the importance of the nodes in the network also changes, and the change of the academic community is influenced.

Therefore, there is a need to design a new dynamic community discovery method for academic teams aiming at the technical problems in the prior art.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides an Academic Team Dynamic community Discovery algorithm (ATDD) based on a temporal co-authorship network. By detecting changes over time in the importance of author nodes and in the strength (weight) and persistence of the relationships between them, the algorithm analyzes and tracks the evolution of academic teams in the academic network, and creates, expands, contracts, splits and eliminates communities to achieve dynamic community discovery. The invention also provides a character-feature-based method for evaluating the quality of academic team community partitions; experiments on public bibliographic record data sets demonstrate the effectiveness of the algorithm.

By the roles they play in the organizational structure, the members of an academic team can be divided into the team leader, important members and general members. In the academic network these correspond to core nodes, important nodes and subordinate (slave) nodes respectively, which together form an academic team community. Over time, the academic relationships among team members strengthen, weaken, appear or disappear as their cooperative behavior changes. The importance of member nodes in the academic network changes accordingly, as do the roles they play, so the organizational structure of the team evolves. This evolution is usually dominated by the behavior of the core members, and it is reflected in the cooperative behavior of researchers who keep co-authoring academic articles. In a public literature database, bibliographic records accumulate as a time-ordered stream, forming streaming data of author co-authorship pairs with time attributes. The academic relationship network of researchers is driven by this stream of co-authorship relations: edges and nodes are continuously added to the network, and important nodes and edges stand out. The invention therefore evaluates how core a node is through temporal measures of node importance and relationship weight, and discovers teams iteratively, centered on core nodes.

Fig. 1 is a schematic diagram of academic team evolution. When new data arrives, the nodes and edges in the network are updated and a new core node may emerge, which may cause a new community to form: in Fig. 1(a), when node 4 and edge (1, 4) are added, node 1 becomes a core node and a new community is formed. As relationship pairs are added, a node outside a community may become closely connected to it: in Fig. 1(c), node 5 and edge (1, 5) are added, node 5 becomes closely connected to the community and is added to it, which increases the size of the community. Some cooperative relationships disappear over time, so nodes are removed from the community, and the community dies if it no longer contains a core node, as shown in Fig. 1(b). Removing expired edges may also leave some nodes less closely connected to the community, so that they must be removed from it, which reduces the size of the community, as shown in Fig. 1(d). Because the add operation iteratively adds nodes that are closely connected to the community, the induced subgraph of a community is a connected graph. In Fig. 1(e), an edge is added between two communities, most nodes of one community are added to the other, and the two communities merge into one. Removing edges within a community may destroy this connectivity: if removing an edge leaves the induced subgraph of the community with several connected components that contain core nodes, the community is split into several communities, as shown in Fig. 1(f), where an edge inside the community disappears and the community splits in two.

Based on the above principle, the technical scheme provided by the invention is as follows:

A dynamic community discovery method for academic teams based on a temporal co-authorship network, in which a temporal co-authorship network model is established and, by detecting changes in node importance and relationship strength in the co-authorship network in real time, dynamic community partition is achieved through community creation, expansion, contraction, splitting and extinction strategies; the method specifically comprises the following steps:

Step 1, define the co-authorship network at time t as G_t = (V_t, E_t), where:

V_t is the set of author nodes in the co-authorship network at time t. Each author node v_i has an importance at time t; each time the author publishes an article, the importance increases by 1, so the more articles an author has published, the higher the importance;

E_t is the set of edges in the co-authorship network at time t. Once author nodes v_i and v_j have co-authored an article, an edge e_{i,j} connects them;

each edge e_{i,j} has a weight at time t, written here as $w_{i,j}^{t}$; the larger the weight, the closer the relationship between v_i and v_j. Each time v_i and v_j co-author an article, the weight is updated once, as shown in formula (1):

$$w_{i,j}^{t} = w_{i,j}^{t-1} + \frac{2}{n(n-1)} \qquad (1)$$

where n is the number of authors of the article co-authored by v_i and v_j. Formula (1) states that each time v_i and v_j co-author an article, the weight of edge e_{i,j} increases by the reciprocal of the number of edges among all authors of that article, n(n-1)/2, so the more collaborators an article has, the smaller the weight increment;

each edge e_{i,j} also has a remaining life at time t. Each time v_i and v_j co-author an article, the remaining life is assigned its initial value. Counting from the date of their most recent co-authored article, if v_i and v_j have gone longer than T₁ without co-authoring again, the remaining life is reduced; if they have gone longer than T₂, the remaining life is exhausted. If v_i and v_j co-author another article before the remaining life is exhausted, it is reset to its initial value;

T₁ and T₂ are time thresholds whose values are chosen empirically;
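As a minimal sketch of the bookkeeping in Step 1 (not the patented implementation), the following Python fragment applies the importance rule and the weight increment of formula (1) to individual articles. The names `importance`, `weight` and `add_article` are hypothetical, and the remaining life is left out here because its concrete values are not stated in the text.

```python
from collections import defaultdict
from itertools import combinations

importance = defaultdict(int)   # author -> number of articles published so far
weight = defaultdict(float)     # frozenset({a, b}) -> accumulated edge weight


def add_article(authors):
    """Update importance and edge weights for one article (illustrative only)."""
    authors = sorted(set(authors))
    n = len(authors)
    for a in authors:
        importance[a] += 1                   # each article adds 1 to importance
    if n < 2:
        return                               # a single-author article creates no edge
    increment = 2.0 / (n * (n - 1))          # reciprocal of the n(n-1)/2 author pairs
    for a, b in combinations(authors, 2):
        weight[frozenset((a, b))] += increment


add_article(["A", "B"])        # 1 pair: each edge gains 1
add_article(["A", "B", "C"])   # 3 pairs: each edge gains 1/3
```

After these two calls, edge A-B has received both increments, while A-C and B-C have only the smaller increment from the three-author article, which is the dilution effect the text describes.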

Step 2, initialize the co-authorship network G₀ = (V₀, E₀) with V₀ = ∅ and E₀ = ∅, and the community set C₀ = ∅;

Step 3, input the bibliographic records R = {r₁, r₂, ..., r_t, ...}, where each r_t is a publication record containing information such as the title, authors, keywords and publication time; the records in R are sorted by publication time, and t, the index of r_t in R, corresponds to a time step;

Step 4, for each publication record r_t in R in turn, update the co-authorship network G_t and perform dynamic community partition to obtain C_t = {c₀, c₁, ..., c_k, ...}:

Step 4.1, set t = 1;

Step 4.2, take a publication record r_t from R; form co-authorship node pairs from all authors of r_t and add them to the co-authorship network; then update the importance of each node and the weight of each edge in the co-authorship network, and update the remaining life of each edge according to r_t:

If r_t contains author node v_i, increase the importance of v_i at time t by 1.

If r_t contains author nodes v_i and v_j and there is no edge between them, add an edge e_{i,j} between v_i and v_j, update the weight of e_{i,j} at time t according to formula (1), where n is the number of author nodes contained in r_t, and set the remaining life of e_{i,j} at time t to its initial value.

If r_t contains author nodes v_i and v_j and an edge e_{i,j} already connects them, update the weight and remaining life of e_{i,j} at time t directly in the same way. If r_t does not contain both v_i and v_j but an edge e_{i,j} exists between them, determine from the bibliographic records how long before time t v_i and v_j last co-authored an article: if the elapsed time is less than T₁, the remaining life keeps its initial value; if it is at least T₁ but less than T₂, the remaining life is reduced; if it is at least T₂, the remaining life is exhausted.

As records are added to the literature database in chronological order, the importance of author nodes in the co-authorship network and the closeness of the cooperation between authors change, so the topology of academic team communities changes as well. These changes drive the evolution of community structure, including creation, expansion, splitting and extinction.

Step 4.3, according to the publication time of the article in r_t, remove the expired co-authorship relations from the current co-authorship network, examine the effect of the removed edges on communities according to the academic team community partition criterion, and perform the corresponding community contraction, splitting and extinction;

Step 4.4, examine the effect of the added edges on communities according to the academic team community partition criterion, and perform the corresponding community creation and expansion;

Step 4.5, obtain the academic team community partition of the co-authorship network at time t, denoted C_t = {c₀, c₁, ..., c_k, ...};

Step 4.6, set t = t + 1 and repeat steps 4.2 to 4.5 until the last publication record in R has been taken out and processed, giving the final academic team community partition;

In steps 4.3 and 4.4, the important nodes, strong edges, slave nodes and core nodes of the co-authorship network G_t are defined as follows:

1) Important node: a node is an important node when its importance is not less than the average importance of all non-isolated nodes in the network, where a non-isolated node is a node connected to other nodes by an edge;

2) Strong edge: an edge connecting two important nodes is a strong edge when its weight is not less than the average weight of all edges in the network;

3) Slave node and core node: if the importances of two nodes v_i and v_j connected by a strong edge satisfy formula (2), and the weight of the edge between the two nodes satisfies formula (3), then v_j is a slave node of v_i;

where M is the minimum number of people forming an academic team, taken as 4 empirically; x ranges over all neighbor nodes v_x connected to v_j by strong edges, and the sum over x is the sum of the weights of all strong edges that have v_j as an endpoint;

formulas (2) and (3) express that the slave node v_j is much less important than node v_i, and that the weight of the edge between v_i and v_j is greater than the median of the sum of the weights of all strong edges that have v_j as an endpoint;

when a node has M-1 slave nodes, it is a core node;
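The first two definitions above translate directly into code. The following is a minimal Python sketch under stated assumptions: `importance` maps each author to an importance value, `weight` maps each edge (a `frozenset` of two authors) to its weight, and both names are hypothetical. Slave and core nodes are not covered, because formulas (2) and (3) are not reproduced in the text.

```python
def important_nodes(importance, weight):
    """Nodes whose importance is at least the mean over non-isolated nodes."""
    non_isolated = {v for edge in weight for v in edge}
    if not non_isolated:
        return set()
    mean = sum(importance[v] for v in non_isolated) / len(non_isolated)
    return {v for v in importance if importance[v] >= mean}


def strong_edges(importance, weight):
    """Edges between two important nodes whose weight is at least the mean edge weight."""
    if not weight:
        return set()
    important = important_nodes(importance, weight)
    mean_weight = sum(weight.values()) / len(weight)
    return {e for e, w in weight.items() if w >= mean_weight and e <= important}


# Toy data: E has published a lot but has no co-authors in the network.
importance = {"A": 10, "B": 6, "C": 2, "D": 2, "E": 9}
weight = {frozenset("AB"): 3.0, frozenset("AC"): 1.0, frozenset("CD"): 0.5}

important = important_nodes(importance, weight)
strong = strong_edges(importance, weight)
```

In this toy network the mean importance of the non-isolated nodes is 5 and the mean edge weight is 1.5, so only the edge between A and B qualifies as strong.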

The academic team community partition criterion is as follows:

A group of nodes connected through strong edges forms an academic team community. The community must contain a core node, and every non-core node in the community must satisfy formula (4), that is, its connection to nodes inside the community must be closer than its connection to nodes outside the community;

where v_i is a node in the community, and the two quantities compared in formula (4) are the sum of the weights of all strong edges inside the community that have v_i as an endpoint, and the sum of the weights of all strong edges in the entire co-authorship network that have v_i as an endpoint.
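The membership test can be illustrated with a short Python sketch. Formula (4) itself is not reproduced in the text, so the comparison below is an assumption derived from the verbal description: a node passes when the strong-edge weight tying it to community members exceeds the strong-edge weight tying it to everyone else. The function name `satisfies_membership` and the `strong_weight` mapping are hypothetical.

```python
def satisfies_membership(node, community, strong_weight):
    """True if `node` is tied more closely to `community` than to the rest of the network.

    strong_weight maps frozenset({u, v}) -> weight, for strong edges only.
    Assumed reading of formula (4): inside weight > outside weight.
    """
    inside = total = 0.0
    for edge, w in strong_weight.items():
        if node in edge:
            (other,) = edge - {node}     # the endpoint that is not `node`
            total += w
            if other in community:
                inside += w
    return inside > total - inside


strong_weight = {
    frozenset("AB"): 2.0,
    frozenset("BC"): 0.5,
    frozenset("CD"): 1.5,
}
community = {"A", "B"}

b_belongs = satisfies_membership("B", community, strong_weight)   # 2.0 inside vs 0.5 outside
c_belongs = satisfies_membership("C", community, strong_weight)   # 0.5 inside vs 1.5 outside
```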

Further, step 4.3 specifically comprises the following:

For each edge in the co-authorship network at time t, if its remaining life has expired, remove the edge from the network; after an edge between author nodes v_u and v_v is removed, the affected community is contracted, split or eliminated according to the following three cases:

(a) Contraction: author nodes v_u and v_v belong to community c_k, and the induced subgraph of c_k is still a connected graph that contains a core node. In this case, if v_u or v_v no longer satisfies formula (4), it is removed from the community. Removing any node from the community also changes how closely its neighbors inside the community are connected to the community, so whether the neighbors of a removed node satisfy formula (4) must be checked iteratively, and those that do not are removed in turn, until the number of nodes in the community (the community size) is less than M or no further node needs to be removed;

(b) Splitting: author nodes v_u and v_v belong to community c_k, and the induced subgraph of c_k consists of several connected subgraphs, some of which contain core nodes and some of which do not. Each connected subgraph that contains a core node forms a new community, and all nodes in connected subgraphs without a core node are removed from c_k;

(c) Extinction: author nodes v_u and v_v belong to community c_k, and the induced subgraph of c_k consists of several connected subgraphs, none of which contains a core node. Community c_k is then eliminated;
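Cases (b) and (c) both come down to computing the connected components of the community's induced subgraph and keeping those that still hold a core node. The sketch below shows that step in plain Python. It assumes the set of core nodes is supplied by the caller (formulas (2) and (3) are not reproduced in the text), and the function names are hypothetical.

```python
def connected_components(nodes, edges):
    """Connected components of the subgraph induced by `nodes` (iterative DFS)."""
    nodes = set(nodes)
    adjacency = {v: set() for v in nodes}
    for edge in edges:
        a, b = tuple(edge)
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(component)
    return components


def split_or_eliminate(community, edges, core_nodes):
    """Keep each component that still holds a core node; an empty list means extinction."""
    return [c for c in connected_components(community, edges) if c & core_nodes]


# Community after an expired edge between C and D has been removed.
community = {"A", "B", "C", "D", "E", "F"}
edges = [frozenset("AB"), frozenset("BC"), frozenset("DE")]
parts = split_or_eliminate(community, edges, core_nodes={"A", "D"})
```

Here the community splits into {A, B, C} and {D, E}, while F, which is no longer connected to any core node, is dropped. Passing an empty set of core nodes returns an empty list, which corresponds to extinction.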

Further, in step 4.4, the community creation and expansion operations are as follows:

Creation: for any node pair (u, v) formed from the authors of r_t, if author nodes v_u and v_v do not belong to any community, and adding an edge between them makes v_u a slave node of v_v and makes v_v a core node, then a new community is created with v_v as its core node, and neighboring nodes that satisfy formula (4) are added to the community iteratively;

Expansion: for the two endpoints v_u and v_v of an added edge, if node v_u belongs to community c_k and its neighbor node v_v satisfies formula (4), then v_v is added to the community c_k to which v_u belongs; whether the neighbors of each newly added node satisfy formula (4) is then checked iteratively, and those that do are added to the community. Such expansion of one community can cause another community to contract, which in effect merges the communities.
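The iterative part of creation and expansion can be sketched as a simple work queue. This is an illustration only: `expand_community` and its parameters are hypothetical names, and the membership test is passed in as a function so that any reading of formula (4) can be plugged in. The toy predicate used in the example (more than half of a node's neighbors are already members) is a stand-in, not the patent's formula.

```python
from collections import deque


def expand_community(community, seed, neighbors, belongs):
    """Add `seed`, then iteratively every neighbor of a new member that passes `belongs`."""
    community = set(community)
    queue = deque([seed])
    while queue:
        candidate = queue.popleft()
        if candidate in community or not belongs(candidate, community):
            continue
        community.add(candidate)
        # Only neighbors of newly added members need to be re-examined.
        queue.extend(n for n in neighbors.get(candidate, ()) if n not in community)
    return community


neighbors = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "G"],
    "D": ["B", "E", "F"], "E": ["D", "F"], "F": ["D", "E"], "G": ["C"],
}


def majority_inside(node, community):
    """Stand-in for formula (4): more than half of the node's neighbors are members."""
    ns = neighbors[node]
    return 2 * sum(n in community for n in ns) > len(ns)


grown = expand_community({"A", "B"}, "C", neighbors, majority_inside)
unchanged = expand_community({"A", "B"}, "D", neighbors, majority_inside)
```

Adding C pulls in G as well, because G's only neighbor is now a member; D is rejected because most of its neighbors lie outside the community.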

Further, in step 4.2, T₁ is set to one year and T₂ to two years.
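With these thresholds, the remaining-life rule of step 4.2 can be sketched as follows. The thresholds come from the text; the three life values `FULL`, `REDUCED` and `EXPIRED` are placeholders assumed for illustration, since the concrete values are not stated, and `remaining_life` and `refresh_lifetimes` are hypothetical names.

```python
from datetime import date, timedelta

T1 = timedelta(days=365)            # one year
T2 = timedelta(days=730)            # two years
FULL, REDUCED, EXPIRED = 2, 1, 0    # assumed placeholder values, not from the source


def remaining_life(last_coauthored, now):
    """Remaining life of an edge, given the date of the last co-authored article."""
    elapsed = now - last_coauthored
    if elapsed < T1:
        return FULL
    if elapsed < T2:
        return REDUCED
    return EXPIRED


def refresh_lifetimes(last_seen, now):
    """Return the remaining life of every edge and the list of expired edges."""
    life = {edge: remaining_life(d, now) for edge, d in last_seen.items()}
    expired = [edge for edge, value in life.items() if value == EXPIRED]
    return life, expired


last_seen = {
    ("A", "B"): date(2014, 1, 10),
    ("A", "C"): date(2015, 6, 1),
    ("B", "C"): date(2016, 3, 1),
}
life, expired = refresh_lifetimes(last_seen, date(2016, 6, 1))
```

The expired edges returned here are the ones step 4.3 would remove before checking for contraction, splitting or extinction.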

The invention also provides a character-feature-based method for evaluating the quality of academic team community partitions.

Generally, an academic team consists of researchers with common research interests, so the quality of an academic team community partition can be reflected by the similarity of the nodes within each community. The keywords of the articles in the bibliographic records reflect the research interests of their authors, so a set of research-interest keywords can be formed for each author, and the similarity of research interests within a team can be measured by the similarity of its members' keywords. However, the keywords of academic articles are highly discriminative, so measuring research-interest similarity with whole-word features works poorly. The invention therefore proposes an evaluation method based on character features: a character vector is built for each author from the characters of the keywords of the articles the author has published.

Key_charater(v_i)＝{ch₁:count1，ch₂:count2，…，ch_n:countn} (5)

Formula (5) states that the keywords of the articles published by author node v_i contain n distinct characters, where character ch_n occurs countn times in total in the keywords of all articles published by the author. The interest similarity between two authors is measured by the proportion of the counts of their shared characters to their total character counts, and is calculated by formula (6),

where the two counts in formula (6) are the numbers of times character ch appears in the keywords of the articles published by author nodes v_i and v_j respectively.

The interest relevance Correlation(c_k) of community c_k is the average interest similarity between all authors in the community, calculated by formula (7),

where |c_k| denotes the number of authors in community c_k.

The community partition quality PartitionQuality(C_t) is the weighted sum of the interest relevance of all communities, as shown in formula (8); the weight of each community is the number of authors in that community divided by the total number of authors in all communities:

$$\mathrm{PartitionQuality}(C_t) = \sum_{c_k \in C_t} \frac{|c_k|}{\sum_{c_m \in C_t} |c_m|}\,\mathrm{Correlation}(c_k) \qquad (8)$$
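The evaluation pipeline of formulas (5) to (8) can be sketched in a few lines of Python. Formulas (6) and (7) are not reproduced in the text, so `similarity` below is one assumed reading of "shared character counts over total character counts", and `correlation` takes the mean over all unordered author pairs. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def char_vector(keywords):
    """Formula (5): count every character across an author's keywords."""
    return Counter(ch for kw in keywords for ch in kw if not ch.isspace())


def similarity(a, b):
    """Assumed form of formula (6): counts of shared characters over all counts."""
    shared = set(a) & set(b)
    total = sum(a.values()) + sum(b.values())
    return sum(a[ch] + b[ch] for ch in shared) / total if total else 0.0


def correlation(community, vectors):
    """Formula (7): mean pairwise similarity of the authors in one community."""
    pairs = list(combinations(sorted(community), 2))
    if not pairs:
        return 0.0
    return sum(similarity(vectors[u], vectors[v]) for u, v in pairs) / len(pairs)


def partition_quality(communities, vectors):
    """Formula (8): community correlations weighted by community size."""
    total = sum(len(c) for c in communities)
    if total == 0:
        return 0.0
    return sum(len(c) / total * correlation(c, vectors) for c in communities)


vectors = {
    "A": char_vector(["ab"]), "B": char_vector(["ab"]),
    "C": char_vector(["cd"]), "D": char_vector(["ce"]),
}
quality = partition_quality([{"A", "B"}, {"C", "D"}], vectors)
```

In this toy example A and B share every character (similarity 1.0) while C and D share one of two (similarity 0.5), so the two equally sized communities give a partition quality of 0.75.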

Advantageous Effects:

The dynamic community discovery algorithm of the invention is designed for discovering academic teams. It takes time-ordered streaming data as input and performs dynamic community discovery by responding to the effect on team community structure of changes over time in the importance of team members and in the closeness of interaction among them. It does not need to match community partitions at different times, and it can trace the evolution of communities continuously in time order. Moreover, introducing temporal node importance and temporal edge weights eliminates the influence of unimportant nodes on community partition, and giving each edge its own remaining life describes more reasonably the fact that relationships between different members last for different lengths of time, which makes the community discovery algorithm more accurate. The invention also provides a character-feature-based method for evaluating the quality of academic team community partitions; experiments on public bibliographic record data sets demonstrate the effectiveness of the algorithm.

Drawings

FIG. 1 is a schematic diagram of the community evolution of an academic team; FIGS. 1(a) to (f) show the creation, extinction, expansion, contraction, merger and splitting of communities respectively;

FIG. 2 shows the cumulative numbers of nodes and edges over time; FIGS. 2(a) to (d) show, for each of the four data sets, the cumulative number of author nodes and of co-authorship edges between them over time;

FIG. 3 shows the community partition quality of the ATDD and TILES algorithms over time; FIGS. 3(a) to (d) correspond to the four data sets respectively;

FIG. 4 shows the number of communities found by the ATDD and TILES algorithms over time; FIGS. 4(a) to (d) correspond to the four data sets respectively.

Detailed Description

The present invention will be described in more detail with reference to the accompanying drawings and embodiments.

The invention provides a dynamic community discovery algorithm for academic teams based on a temporal co-authorship network. A temporal co-authorship network model is established, and by detecting in real time the changes in the importance of author nodes and in the strength of relationship edges, the algorithm analyzes and tracks the evolution of academic teams in the academic network and creates, expands, contracts, splits and eliminates communities, thereby achieving dynamic community discovery with high accuracy. The invention also provides a character-feature-based method for evaluating the quality of academic team community partitions, and the following experiments on public bibliographic record data sets verify the effectiveness of the algorithm.

The experiments use bibliographic records published in the literature database of CNKI (China National Knowledge Infrastructure), covering 2000 to 2016. Each record contains the article title, authors, institution, keywords, abstract, publication time and other information. The records are ordered by publication time to form the input data stream used to verify the algorithm. The experimental environment is a PC with an i5-3337U CPU and 8 GB RAM, running Windows 10, Python 3.5 and networkx 1.1.

1 Experimental data

The bibliographic records are divided by institution into four data sets: data set (a) contains 5925 records, data set (b) 5892, data set (c) 3792 and data set (d) 4586. The four subgraphs of FIG. 2 show, for each data set, how the number of author nodes and the number of co-authorship edges between them accumulate over time.

2 Comparative experiment

For comparison with the TILES method of document [10], each edge is given a fixed lifetime of 2 years. The ATDD and TILES algorithms are run on each of the four data sets and evaluated with the character-feature-based academic team community quality evaluation method; the results are shown in FIG. 3. Because TILES discovers communities from triangle structures, its community partition quality is higher than that of ATDD when the data volume is small. As the data volume grows over time, the partition quality of ATDD quickly exceeds that of TILES and remains high.

Table 1 compares the community partition quality of the ATDD and TILES algorithms on the four test data sets. It lists, for each data set, the partition quality averaged over all time steps, together with the overall mean over the four data sets; on every data set the partition quality of ATDD is better than that of TILES.

TABLE 1 comparison of community partition quality for ATDD and TILES algorithms

FIG. 4 shows how the number of communities found by the ATDD and TILES algorithms changes over time. For ATDD, the number of communities in all four data sets stays relatively stable as nodes and edges accumulate; for TILES, the number of communities rises with the number of nodes and edges and increases sharply once the network is large. Table 2 compares the numbers of communities found by the two algorithms, where the value for each data set is the average over all time steps. TILES clearly finds far more communities than ATDD. Because the data sets are organized by research institution and each institution has a limited number of research teams, the number of communities found by ATDD is more reasonable.

TABLE 2 comparison of the number of community partitions for ATDD and TILES algorithms

Number of communities	Data set (a)	Data set (b)	Data set (c)	Data set (d)	Total mean value
ATDD	11	15	13	11	12
TILES	144	152	230	119	161

References

1. Palla G, Barabási A L, Vicsek T. Quantifying social group evolution[J]. Nature, 2007, 446(7136): 664-667.

2. Palla G, Derényi I, Farkas I, et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435(7043): 814-818.

3. Hopcroft J, Khan O, Kulis B, et al. Tracking evolving communities in large linked networks[J]. Proceedings of the National Academy of Sciences, 2004, 101(suppl 1): 5249-5253.

4. Greene D, Doyle D, Cunningham P. Tracking the Evolution of Communities in Dynamic Social Networks[C]//International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2010, Odense, Denmark, August. 2010: 176-183.

5. Wang Y, Wu B, Du N. Community Evolution of Social Network: Feature, Algorithm and Model[J]. arXiv:0804.4356v1 [physics.soc-ph], 28 Apr 2008.

6. Yang B, Liu D Y. Incremental algorithm for detecting community structure in dynamic networks[C]//Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on. IEEE, 2005, 4: 2284-2290.

7. Miller K, Eliassi-Rad T. Continuous time group discovery in dynamic graphs[R]. Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2010.

8. Caravelli P, Wei Y, Subak D, et al. Understanding evolving group structures in time-varying networks[C]//Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. ACM, 2013: 142-148.

9. Cazabet R, Amblard F, Hanachi C. Detection of overlapping communities in dynamical social networks[C]//Social Computing (SocialCom), 2010 IEEE Second International Conference on. IEEE, 2010: 309-314.

10. Rossetti G, Pappalardo L, Pedreschi D, et al. Tiles: an online algorithm for community discovery in dynamic social networks[J]. Machine Learning, 2016: 1-29.

Claims

1. An academic team dynamic community discovery method based on a temporal co-occurrence network, characterized by comprising the following steps:

Step 1, defining a co-authorship network G_t = (V_t, E_t) at time t, wherein:

V_t is the set of author nodes in the co-authorship network at time t, each author node v_i carrying a value that represents its importance at time t;

E_t is the edge set of the co-authorship network at time t: once author nodes v_i and v_j have co-published an article, an edge e_{i,j} connects them; each edge e_{i,j} carries its weight at time t and its remaining life at time t;

the remaining life of edge e_{i,j} is maintained as follows: each time v_i and v_j co-publish an article, it is assigned the value [formula missing]; counting from the date of the last co-publication of v_i and v_j, if they have gone longer than T₁ without co-publishing, it takes the value [formula missing]; if they have gone longer than T₂ without co-publishing, it takes the value [formula missing]; and if v_i and v_j co-publish again before [formula missing], it is reset to [formula missing];

Step 3, inputting the literature information records R = {r₁, r₂, ..., r_t, ...}, wherein r_t is an article publication record comprising the title, authors, keywords and publication time; the records in R are sorted by publication time, and t is the sequence number of r_t in R, corresponding to a moment;

Step 4.1, making t equal to 1;

Step 4.2, taking a publication record r_t out of R, forming co-authorship node pairs from all the authors in r_t, adding them to the co-authorship network, and updating the importance of each node, the weight of each edge and the remaining life of each edge in the co-authorship network according to r_t:

if r_t contains author nodes v_i and v_j and there is no edge between v_i and v_j, an edge e_{i,j} is added between them, and the weight of edge e_{i,j} at time t is updated to:

[formula missing]

wherein n represents the number of author nodes contained in r_t, and the remaining life of edge e_{i,j} at time t is updated to [formula missing];

if r_t contains author nodes v_i and v_j and an edge e_{i,j} already connects them, the weight and remaining life of edge e_{i,j} at time t are updated directly by the above method;

if r_t does not contain both v_i and v_j and an edge e_{i,j} connects them, the time elapsed at time t since the last co-publication of v_i and v_j is determined from the literature information records: if it is less than T₁, then [formula missing]; if it is greater than or equal to T₁ and less than T₂, then [formula missing]; if it is greater than or equal to T₂, then [formula missing];
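The bookkeeping of step 4.2 can be illustrated with a minimal Python sketch. The claim's weight and remaining-life formulas are not reproduced in this text, so the sketch makes several assumptions: `weight_increment` is a hypothetical caller-supplied function of n, weights are assumed to accumulate across records, time is counted in days, and the function only reports which of the three inactivity bands an edge falls into rather than computing a remaining life. All names are illustrative.

```python
from itertools import combinations

T1_DAYS = 365  # claim 2 sets T1 to one year
T2_DAYS = 730  # claim 2 sets T2 to two years


def add_record(edges, authors, pub_day, weight_increment):
    """Add every co-author pair of one publication record to the edge table.

    edges maps a sorted author pair to {"weight": float, "last_pub": int}.
    weight_increment(n) is a hypothetical stand-in for the claim's weight
    formula, which depends on n, the number of authors in the record.
    Accumulating the weight across records is an assumption.
    """
    unique_authors = sorted(set(authors))
    n = len(unique_authors)
    for a, b in combinations(unique_authors, 2):
        edge = edges.setdefault((a, b), {"weight": 0.0, "last_pub": pub_day})
        edge["weight"] += weight_increment(n)
        edge["last_pub"] = pub_day  # the pair has just co-published
    return edges


def inactivity_band(edge, today):
    """Classify an edge by the time since its authors last co-published."""
    idle = today - edge["last_pub"]
    if idle < T1_DAYS:
        return "below_T1"
    if idle < T2_DAYS:
        return "T1_to_T2"
    return "T2_or_more"
```

A record with three authors yields three node pairs; the band returned by `inactivity_band` selects which of the three remaining-life cases of the claim applies to an edge whose authors do not both appear in the current record.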

Step 4.3, removing expired author cooperative relationships from the current co-authorship network according to the publication time of the article in r_t, examining the influence of the removed edges on the communities according to the academic team community division standard, and executing the corresponding community shrinking, splitting and extinction;

and the weights of the edges of the two nodes satisfy:

[formula missing]

then v_j is a slave node of v_i;

wherein M is the minimum number of people required to form an academic team and is chosen empirically; x ranges over all neighbor nodes v_x connected to v_j by strong edges, and [symbol missing] is the sum of the weights of all strong edges having v_j as an endpoint;

when a node has M-1 slave nodes, it is a core node;

the academic team community division standard is as follows:

[formula missing]

wherein v_i is a node within the community, and [symbol missing] represents the sum of the weights of all strong edges in the entire co-authorship network having v_i as an endpoint;

Step 4.3 specifically comprises: for each edge in the co-authorship network at time t, if [formula missing], removing the edge from the network; after removing edges, performing the corresponding community shrinking, splitting and extinction in three cases:

(a) shrinking: author nodes v_u and v_v belong to community c_k, and c_k is a connected graph containing core nodes; in this case, if v_u and v_v do not satisfy formula (4), they are removed from the community, and the neighbor nodes of each removed node are iteratively checked against formula (4) and removed if they do not satisfy it, until the number of nodes in the community is less than M or no node remains to be removed;

(b) splitting: if author nodes v_u and v_v belong to community c_k, and the induced subgraph of c_k contains a plurality of connected subgraphs, some of which contain core nodes and some of which do not, the connected subgraphs containing core nodes form new communities, and all nodes in the connected subgraphs not containing core nodes are removed from c_k;

(c) extinction: if author nodes v_u and v_v belong to community c_k, and c_k has a plurality of connected subgraphs, none of which contains a core node, community c_k is eliminated;
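Cases (b) and (c) both reduce to finding the connected subgraphs of a community after edge removal and keeping only those that contain a core node. The following Python sketch illustrates this under stated assumptions: the adjacency dictionary is assumed to already reflect the removed edges, `is_core` is a hypothetical caller-supplied test for core nodes, and the shrinking of case (a) is not covered. The function names are illustrative.

```python
from collections import deque


def connected_components(nodes, adjacency):
    """Return the connected components of the subgraph induced by nodes."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency.get(node, ()):
                # Only follow edges that stay inside the community.
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def split_or_dissolve(community, adjacency, is_core):
    """Apply cases (b) and (c) to one community after edge removal.

    Every connected component that contains a core node survives as a
    community; every other component is dropped. An empty result means
    the community is extinct.
    """
    components = connected_components(community, adjacency)
    return [c for c in components if any(is_core(v) for v in c)]
```

When the community is still connected and contains a core node, the function returns it unchanged as a single community, which is the situation in which case (a) applies instead.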

in step 4.4, the community creation and expansion operations are as follows:

expanding: for the two nodes v_u and v_v of an added edge, if node v_u belongs to community c_k and its neighbor node v_v satisfies formula (4), v_v is added to the community c_k to which v_u belongs, and the neighbor nodes of each newly added node are iteratively checked against formula (4) and added to the community if they satisfy it.
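The iterative expansion can be pictured as a breadth-first traversal that starts from the node at the far end of the added edge. The sketch below is only an illustration: formula (4) is not reproduced in this text, so `satisfies_standard` is a hypothetical caller-supplied predicate standing in for it, and the function name and data layout are assumptions.

```python
from collections import deque


def expand_community(community, new_member, adjacency, satisfies_standard):
    """Grow a community outward from a newly linked node.

    satisfies_standard(node, community) is a hypothetical stand-in for
    the membership test of formula (4).
    """
    community = set(community)
    queue = deque([new_member])
    while queue:
        candidate = queue.popleft()
        if candidate in community:
            continue
        if not satisfies_standard(candidate, community):
            continue
        community.add(candidate)
        # Neighbors of a newly added node become candidates in turn.
        queue.extend(n for n in adjacency.get(candidate, ()) if n not in community)
    return community
```

A candidate that fails the test is skipped but can be queued again later, when another of its neighbors joins the community.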

2. The academic team dynamic community discovery method based on a temporal co-occurrence network as claimed in claim 1, wherein in step 4.2, T₁ is set to one year and T₂ is set to two years.

3. A character feature-based academic team community division quality evaluation method, which performs quality evaluation on an academic team community division result C_t = {c₀, c₁, ..., c_k, ...} obtained by the method of any one of claims 1-2:

first, taking the keywords of the articles published by each author as character features, a word vector is established for each author:

Key_charater(v_i)＝{ch₁:count1,ch₂:count2,…,ch_n:countn} (5)

equation (5) indicates that the keyword set of the articles published by author node v_i consists of n keywords, where keyword ch_n occurs countn times in total among the keywords of all articles published by the author;
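As an illustration of equation (5), the keyword vector of one author can be built by counting keyword occurrences over all of that author's articles. This is a minimal Python sketch; the function name and the input layout (one keyword list per article) are assumptions.

```python
from collections import Counter


def keyword_vector(articles):
    """Count keyword occurrences over all articles of one author.

    articles is a list of keyword lists, one list per published article.
    """
    counts = Counter()
    for keywords in articles:
        counts.update(keywords)
    return counts
```

For an author with two articles tagged `["graph", "community"]` and `["graph", "dynamic"]`, the keyword "graph" receives a count of 2 and the other two keywords a count of 1 each.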

then, the interest similarity between authors is measured by the ratio of the count of the keywords shared by the authors to the total count of their keywords, as shown in formula (6):

[formula missing] (6)

wherein Similarity(v_i, v_j) represents the interest similarity of author nodes v_i and v_j, and [symbols missing] respectively represent the number of occurrences of keyword ch in the articles published by author nodes v_i and v_j;

next, the average interest similarity between all authors in community c_k is calculated as the interest correlation Correlation(c_k) of community c_k, as shown in formula (7):

[formula missing] (7)

wherein |c_k| denotes the number of authors in community c_k;
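Formulas (6) and (7) are not reproduced in this text, so the following Python sketch is only one possible reading of the verbal description, not the patent's own formulas. It assumes that the "count of shared keywords" is the sum of both authors' counts over the keywords they have in common, that the "total count" is the sum of all counts of both authors, and that the community average is taken over unordered author pairs. Keyword vectors are plain dictionaries in the form of equation (5).

```python
from itertools import combinations


def similarity(vec_i, vec_j):
    """Assumed reading of formula (6): shared-keyword counts over all counts."""
    shared = set(vec_i) & set(vec_j)
    shared_count = sum(vec_i[ch] + vec_j[ch] for ch in shared)
    total_count = sum(vec_i.values()) + sum(vec_j.values())
    return shared_count / total_count if total_count else 0.0


def correlation(vectors):
    """Assumed reading of formula (7): mean similarity over author pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # a single-author community has no pair to compare
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, two authors with identical keyword sets have similarity 1 and two authors with no keyword in common have similarity 0, so the correlation of a community also lies between 0 and 1.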

finally, the weighted sum of the interest correlations of all communities is calculated as the community partition quality PartitionQuality(C_t), wherein the weight of each community is the number of authors in the community divided by the total number of authors in all communities, and the calculation formula is shown in formula (8):
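The weighted sum described in this step can be illustrated with a short Python sketch. It follows the verbal description only: each community contributes its interest correlation multiplied by its share of the total number of authors. The function name and the input layout (one pair of community size and correlation per community) are assumptions.

```python
def partition_quality(communities):
    """Size-weighted sum of community interest correlations.

    communities is a list of (size, correlation) pairs, one per community.
    """
    total_authors = sum(size for size, _ in communities)
    if total_authors == 0:
        return 0.0
    return sum(size / total_authors * corr for size, corr in communities)
```

With a community of three authors and correlation 0.5 and a community of one author and correlation 0.1, the weights are 0.75 and 0.25, giving a quality of 0.75 × 0.5 + 0.25 × 0.1 = 0.4.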