CN106227835A - Team research direction mining method based on hierarchical clustering of a bipartite network graph - Google Patents


Info

Publication number
CN106227835A
Authority
CN
China
Prior art keywords
author
group
keyword
team
research
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610595145.XA
Other languages
Chinese (zh)
Other versions
CN106227835B (en)
Inventor
黄芳
彭孟亚
蔡颖
龙军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201610595145.XA
Publication of CN106227835A
Application granted
Publication of CN106227835B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a team research direction mining method based on hierarchical clustering of a bipartite network graph, comprising the following steps. Step 1: build an author research interest representation model based on the author-keyword bipartite network. Step 2: perform graph clustering on the author research interest representation model, assigning authors whose degrees of attention to each keyword differ little to the same author group, to obtain a set of author groups. Step 3: perform global hierarchical clustering to obtain the research interest of each author group: each group in the author group set that contains only one author is merged into another author group with similar research interests, so that the number of authors in each author group is more than 2, and the research interest of each author group, i.e. the team research direction, is calculated and updated. The invention can effectively mine the academic research directions of a team and provides favorable conditions for analyzing and evaluating the development of the team's research directions.

Description

Team research direction mining method based on hierarchical clustering of a bipartite network graph
Technical field
The present invention relates to a team research direction mining method based on hierarchical clustering of a bipartite network graph.
Background art
Today, as the trend of globalization becomes increasingly pronounced, teamwork is a very common phenomenon. As disciplines blend and science, technology and society interpenetrate, higher demands are placed on R&D management and on the organization of scientific research. Through resource sharing, cooperation and division of labor, science and technology innovation teams greatly improve the efficiency of scientific and technological innovation. As an effective organizational form for R&D, such teams have gradually become an important mode of scientific research and innovation activity [1]. To ensure that frontier scientific research continues and to cultivate individuals and teams with a spirit of scientific and technological innovation, the National Natural Science Foundation of China began in 2000 to set up, on a trial basis, the "Science Fund for Creative Research Groups", which supports outstanding domestic creative research teams in carrying out basic research and applied basic research in an important academic research direction [2].
A team's research direction is an important indicator for evaluating the development of the whole team. The "Trial Measures for the National Natural Science Foundation of China Science Fund for Creative Research Groups" [3] state explicitly that a creative research group must be a whole formed through long-term collaboration, must have a relatively concentrated research direction, and must remain active at the frontier of its research field. A scientific paper is generally a scientific record or summary of research on some problem in a scientific field. Papers that study related academic problems are highly similar, while papers that address different problems differ greatly. In the past, this property of papers was used to analyze how closely the research directions of an academic team are related. That approach, however, does not consider the relationships among team members, and the academic relationship network [4] has a complex structure. For a team's academic direction, one must consider on the one hand where the research interests of the team members lie, and on the other hand the relationships among the research directions within the team.
Therefore, it is necessary to design a team research direction mining method that combines team members' research interests with the features of the relationship network.
References
[1] Wang Xinxin. Research on the construction and development strategy of scientific and technological innovation teams [J]. Science & Technology and Economy, 2014, 27(3): 66-69.
[2] Feng Changgen. Creative research groups of the National Natural Science Foundation of China [J]. Science & Technology Review, 2010, 28(7): 125.
[3] "Trial Measures for the National Natural Science Foundation of China Science Fund for Creative Research Groups", 2001, 2.
[4] Fang Huang, Jing Liu, Xinmin Liu, et al. Academic Relation Classification Rules Extraction with Correlation Feature Weight Selection [C]. The 3rd Global Congress on Intelligent Systems (GCIS 2012), Nov. 6-8, 2012: 160-165.
[5] Tian Y, Hankins R A, Patel J M. Efficient aggregation for graph summarization [C]. ACM SIGMOD International Conference on Management of Data. 2008: 567-580.
[6] Chen Kehan, Han Panpan, Wu Jian. Heterogeneous social network recommendation algorithm based on user clustering [J]. Chinese Journal of Computers, 2013, 38(2): 349-359.
Summary of the invention
The technical problem solved by the invention is to provide, in view of the deficiencies of the prior art, a team research direction mining method based on hierarchical clustering of a bipartite network graph. Building on research into graph clustering methods, it mines a team's research directions and provides favorable conditions for analyzing and evaluating the development of those directions.
The technical solution provided by the invention is:
A team research direction mining method based on hierarchical clustering of a bipartite network graph, comprising the following steps:
Step 1: build an author research interest representation model based on the author-keyword bipartite network;
Step 2: perform graph clustering on the author research interest representation model:
Authors whose degrees of attention to each keyword differ little are assigned to the same author group, yielding a set of author groups;
Step 3: perform global hierarchical clustering to obtain the research interest of each author group:
Each group in the author group set that contains only one author is merged into another author group with similar research interests, so that the number of authors in each author group is more than 2; the research interest of each author group, i.e. the team research direction, is then calculated and updated.
Described step 1 particularly as follows:
The scientific paper collection of author from team, extraction author information and key word information, obtain preprocessed data, Wherein, author's collection is designated as VA={ A1,A2,…,AN, keyword set is designated as VK=K={k1,k2,…,kM, by author AiScience In collection of thesis, lists of keywords and keyword set K compare, therefore, for each author An, this author A of obtainingnResearch Interest representation mode is An={ (k1,wn1),(k2,wn2),…,(kM,wnM)};
Research interest representation mode based on author, constructs the author interests matrix m of N × M, wherein, collects for author In each author, define this author AnResearch interest vector be vn=(wn1,wn2,…,wnM);
Author investigation interest representation mode is expressed as G=G (V, E);
The set that wherein V is formed by author node and key word node, i.e. V={VAUVK, wherein VAGather for author VA={ A1,A2,…,An,…,AN, VKFor keyword set VK=K={k1,k2,…,kj,…,kM, N and M is respectively in team Author sum and team in all authors scientific paper concentrate key word sum;E is author node and key word node Between the set that constituted of company limit, i.e. E={e (An,kj)|An∈VA,kj∈K,wnj>0};If author is AnScientific paper in Lists of keywords comprises certain key word k in keyword setj, then weight wnj> 0, at author AnWith key word kjBetween exist Even limit e (An,kj), otherwise wnj=0, at author AnWith key word kjBetween there is not even limit.
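The model of step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `build_bipartite`, the input format, and in particular the choice of w_nj as the number of an author's papers that list keyword k_j are assumptions, since the text only requires w_nj > 0 when the keyword occurs and w_nj = 0 otherwise.

```python
from collections import Counter


def build_bipartite(papers_by_author):
    """Build the author-keyword bipartite model G(V, E).

    papers_by_author maps an author name to a list of papers, each paper
    being a list of keywords (hypothetical input format).
    """
    authors = sorted(papers_by_author)                        # V_A
    keywords = sorted({kw for papers in papers_by_author.values()
                       for paper in papers for kw in paper})  # V_K = K
    column = {kw: j for j, kw in enumerate(keywords)}

    # N x M author interest matrix m; row n is the interest vector v_n.
    m = [[0] * len(keywords) for _ in authors]
    for n, author in enumerate(authors):
        # Assumption: w_nj = number of the author's papers listing k_j.
        counts = Counter(kw for paper in papers_by_author[author]
                         for kw in set(paper))
        for kw, count in counts.items():
            m[n][column[kw]] = count

    # E contains an edge e(A_n, k_j) exactly where w_nj > 0.
    edges = {(authors[n], keywords[j])
             for n in range(len(authors))
             for j in range(len(keywords)) if m[n][j] > 0}
    return authors, keywords, m, edges
```

Any other positive weighting (for example a normalized frequency) would produce the same edge set E, because the edges depend only on whether w_nj is positive.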
In step 2, graph clustering is performed on the author research interest representation model G = G(V, E), comprising the following steps:
2.1) Initialize the author group set Groups = {G_0}, where G_0 is an author group containing all authors in the team;
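2.2) For each author group G_i ∈ Groups, define the focus set of G_i on keyword k_j as the set of authors in G_i that attend to k_j:
$$Focus_{k_j}(G_i) = \{A \mid A \in G_i,\ e(A, k_j) \in E\} \qquad (1)$$
where A is an author in author group G_i.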
2.3) Compute by formula (2) the attention focus_ij of author group G_i to each keyword k_j (k_j ∈ K):
$$focus_{ij} = \frac{|Focus_{k_j}(G_i)|}{|G_i|} \qquad (2)$$
where |Focus_{k_j}(G_i)| is the number of authors in author group G_i that attend to keyword k_j, and |G_i| is the total number of authors in G_i. If focus_ij ≥ α, author group G_i is said to pay "strong attention" to keyword k_j; otherwise G_i is said to pay "weak attention" to k_j. Here α > 0 is the attention intensity threshold;
Within an author group, the more concentrated the authors' attention to keywords, the higher the group's degree of aggregation. Fuzziness is defined to describe how much the authors inside an author group differ in their attention to a keyword.
2.4) Compute by formula (3) the fuzziness fuzzy_ij of each author group G_i on each keyword k_j (k_j ∈ K):
$$fuzzy_{ij} = \delta_{k_j}(G_i) = \begin{cases} |Focus_{k_j}(G_i)| & \text{if } focus_{ij} < \alpha \\ \bigl|\,|G_i| - |Focus_{k_j}(G_i)|\,\bigr| & \text{if } focus_{ij} \ge \alpha \end{cases} \qquad (3)$$
In formula (3), when author group G_i pays "strong attention" to keyword k_j, fuzzy_ij equals the number of authors in G_i that do not attend to k_j; when G_i pays "weak attention" to k_j, fuzzy_ij equals the number of authors in G_i that do attend to k_j.
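Formulas (2) and (3) can be illustrated with a short Python sketch. The names `focus_set`, `fuzziness` and `group_fuzziness`, and the representation of a group as a collection of row indices into the interest matrix m, are assumptions made for illustration only.

```python
def focus_set(group, j, m):
    """Authors of the group (row indices into m) that attend to keyword j."""
    return {n for n in group if m[n][j] > 0}


def fuzziness(group, j, m, alpha):
    """fuzzy_ij of formula (3) for one group and one keyword."""
    attending = len(focus_set(group, j, m))
    focus = attending / len(group)          # formula (2)
    if focus >= alpha:                      # "strong attention"
        return len(group) - attending       # authors that do not attend
    return attending                        # "weak attention"


def group_fuzziness(group, m, alpha):
    """Sum of fuzzy_ij over all keywords for one group."""
    return sum(fuzziness(group, j, m, alpha) for j in range(len(m[0])))
```

For a group of four authors in which three attend to a keyword, a threshold of α = 0.5 gives strong attention and a fuzziness of 1 (the one non-attending author), whereas α = 0.8 gives weak attention and a fuzziness of 3.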
2.5) From fuzzy_ij, compute the fuzziness fuzzy_i of each author group G_i over the keyword set K:
$$fuzzy_i = \sum_{j=1}^{|K|} fuzzy_{ij} \qquad (4)$$
where |K| is the total number of keywords in the keyword set K, i.e. M;
2.6) Compute the overall fuzziness Fuzzy of the current Groups:
$$Fuzzy = \sum_{G_i \in Groups} fuzzy_i \qquad (5)$$
2.7) Find the maximum of fuzzy_ij and take the corresponding keyword k_j as the locked keyword k_j′;
Find the maximum of fuzzy_i and take the corresponding author group G_i as the group to be split, G_i′;
Split the group to be split into two new author groups G_i1 and G_i2, and update the author group set Groups:
G_i1 = {A_n ∈ G_i′ | w_nj′ > 0}
G_i2 = G_i′ − G_i1
2.8) Repeat steps 2.2) to 2.7) until the number of author groups in the author group set Groups is k;
The clustering result is denoted Groups = {G_1, G_2, …, G_k}, where k is the number of classes in the clustering result, and satisfies:
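(1) $\bigcup_{i=1}^{k} G_i = V_A$ and $G_i \neq \emptyset$;
(2) for all $G_i, G_j \in Groups$ with $i \neq j$, $G_i \cap G_j = \emptyset$.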
2.9) Compare the overall fuzziness Fuzzy of the clustering result Groups obtained in step 2.6) at each stage, and take the Groups with the minimum Fuzzy as the final clustering result, denoted summaryGroups;
The algorithm corresponding to step 2 runs as follows:
The algorithm starts from a single group containing all authors and then, in each iteration, splits an existing group until k groups have been obtained. Rather than choosing a group to split at random, it chooses the group based on attention to keywords. By the definition of fuzziness, when an author group has a "weak attention" relationship to a keyword, we want to separate out of the group the focus set for that keyword; in the "strong attention" case, we instead want to separate out the authors that do not attend to it. Both operations leave groups whose attention to that keyword is more concentrated. We therefore select the group to be split and perform the split operation on it. In each iteration, the clustering result of that stage is saved. Finally, an optimal clustering result is selected.
The pseudo-code of the algorithm is described as follows:
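Steps 2.1) to 2.9) can be sketched in Python as follows. This is an illustrative reading rather than the patent's own pseudo-code. The names `fuzzy_ij` and `graph_cluster` are hypothetical, groups are sets of row indices into the interest matrix m, and two points are assumptions: the locked keyword is taken as the keyword with maximum fuzzy_ij within the group chosen for splitting (which guarantees a non-trivial split when 0 < α ≤ 1), and ties are broken by taking the first maximum.

```python
def fuzzy_ij(group, j, m, alpha):
    """Formula (3): fuzziness of one group on keyword j."""
    attending = sum(1 for n in group if m[n][j] > 0)
    if attending / len(group) >= alpha:      # strong attention
        return len(group) - attending
    return attending                         # weak attention


def graph_cluster(m, k, alpha=0.5):
    """Return (Fuzzy, groups) for the stage with minimum overall fuzziness."""
    n_keywords = len(m[0])
    groups = [frozenset(range(len(m)))]      # step 2.1: one group, all authors
    stages = []
    while True:
        table = [[fuzzy_ij(g, j, m, alpha) for j in range(n_keywords)]
                 for g in groups]
        per_group = [sum(row) for row in table]          # formula (4)
        stages.append((sum(per_group), list(groups)))    # formula (5)
        if len(groups) >= k or max(per_group) == 0:
            break                            # k groups reached, or nothing to split
        i = per_group.index(max(per_group))  # group to be split
        j = table[i].index(max(table[i]))    # locked keyword (assumption: within group i)
        g1 = frozenset(n for n in groups[i] if m[n][j] > 0)
        g2 = groups[i] - g1
        groups = groups[:i] + [g1, g2] + groups[i + 1:]
    # Step 2.9: keep the stage whose overall fuzziness is smallest.
    return min(stages, key=lambda stage: stage[0])
```

Saving every stage and selecting the minimum afterwards mirrors step 2.9); if the overall fuzziness happens to decrease with every split, the selected stage is simply the last one.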
Because the author-keyword bipartite network data are sparse, after graph clustering based on the author-keyword bipartite network some individual authors end up in a class of their own. These discrete author nodes may be precisely the new research directions that the academic team is opening up. It is therefore necessary to process the discrete author nodes, which makes it easier to grasp the academic research directions of the whole team.
Step 3 specifically comprises the following steps:
3.1) Divide the author groups in the clustering result summaryGroups obtained in step 2 into non-discrete author groups and discrete author groups, where a discrete author group is an author group containing only one author; take the non-discrete author groups as the initial clusters;
3.2) For each non-discrete author group G_i, compute its class research interest vector GMI_i (Group Major Interests) over the keyword set K as the center of the corresponding initial cluster:
$$GMI_i = (GW_{i1}, GW_{i2}, \dots, GW_{ij}, \dots, GW_{iM}) \qquad (6)$$
where GW_ij (j = 1, 2, …, M) represents the attention of G_i to keyword k_j, quantified as:
$$GW_{ij} = \frac{\sum_{A_n \in G_i} w_{nj}}{|G_i|} \qquad (7)$$
3.3) Traverse each author A_n in the discrete author groups and compute the Euclidean distance between A_n and the center of each initial cluster, as follows:
Let the research interest vector of author A_n be v_n = (w_n1, w_n2, …, w_nj, …, w_nM); then
$$d_{ni} = \sqrt{\sum_{j=1}^{M} (GW_{ij} - w_{nj})^2}$$
3.4) Compare the Euclidean distances between A_n and the centers of the initial clusters, select the non-discrete author group with the minimum Euclidean distance, and assign A_n to it; that is, merge the discrete author group containing only A_n with that non-discrete author group to form a new author group;
3.5) Iterate steps 3.1) to 3.4) until the resulting author groups no longer change;
3.6) Calculate and update the class research interest vector of each author group.
The pseudo-code of the algorithm corresponding to step 3 is described as follows:
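As an illustrative reading of steps 3.1) to 3.6) (not the patent's own pseudo-code), the merge of single-author groups can be sketched in Python. The function name `merge_discrete` is hypothetical, groups are sets of row indices into the interest matrix m, and recomputing the cluster centers before each assignment is an assumption. Since every single-author group is merged in one pass, no single-author group remains afterwards and the iteration of step 3.5) ends.

```python
import math


def merge_discrete(groups, m):
    """Merge single-author groups into the nearest multi-author group.

    Returns the merged groups and their class research interest
    vectors GMI_i (formulas (6) and (7)).
    """
    clusters = [set(g) for g in groups if len(g) > 1]         # initial clusters
    discrete = [n for g in groups if len(g) == 1 for n in g]  # lone authors
    n_keywords = len(m[0])

    def center(group):
        # GW_ij: mean weight of keyword j over the authors of the group.
        return [sum(m[n][j] for n in group) / len(group)
                for j in range(n_keywords)]

    if clusters:
        for n in discrete:
            # Assumption: centers are recomputed before each assignment.
            centers = [center(g) for g in clusters]
            distances = [math.dist(c, m[n]) for c in centers]  # Euclidean
            clusters[distances.index(min(distances))].add(n)
    else:
        # No multi-author group exists to merge into; keep the groups as-is.
        clusters = [{n} for n in discrete]
    return clusters, [center(g) for g in clusters]
```

Taking the minimum of the squared distances would select the same group, so the square root matters only if the distance values themselves are reported.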
Beneficial effects:
Starting from a team's members and their scientific paper information (paper titles, lists of participating authors, keyword lists, etc.), the invention preprocesses the resulting dataset and uses the author and keyword information of the academic team's papers to characterize and quantify the authors' research interests. Drawing on the distinctive topology of bipartite networks and the inherent relationship between authors and keywords in a team, it builds an author research interest representation model based on the author-keyword bipartite network. Graph clustering is then performed on the author-keyword bipartite network to mine the team's main-body features, giving a preliminary understanding of the network. Finally, network global hierarchical clustering is performed on the main-body features of the author-keyword bipartite network to obtain the global academic research directions of the team members, laying a foundation for future analysis of the team's academic directions.
The invention combines a graph summarization algorithm, a hierarchical clustering algorithm and the k-means algorithm to propose a graph clustering algorithm based on the author-keyword bipartite network and a network global hierarchical clustering algorithm. According to how much the authors in the team differ in their attention to different keywords, the authors are clustered into k author groups; within an author group, the authors' attention to keywords is relatively concentrated. The invention can effectively mine the academic research directions of a team and provides favorable conditions for analyzing and evaluating the development of the team's research directions.
Description of the drawings
Fig. 1 is the flow of the team research direction mining method;
Fig. 2 is the author-keyword bipartite network;
Fig. 3 is the result of graph clustering on the author-keyword bipartite network when k = 5;
Fig. 4 is the result of graph clustering on the author-keyword bipartite network when k = 7;
Fig. 5 shows the 4 author groups produced after network global hierarchical clustering;
Fig. 6 shows the 5 author groups produced after network global hierarchical clustering.
Detailed description of the embodiments
The invention is described in more detail below with reference to the drawings and specific embodiments.
The invention provides a team research direction mining method based on hierarchical clustering of a bipartite network graph, comprising the following steps:
Step 1: build an author research interest representation model G = G(V, E) based on the author-keyword bipartite network;
Here V is the set formed by the author nodes and the keyword nodes, i.e. V = V_A ∪ V_K, with author set V_A = {A_1, A_2, …, A_n, …, A_N} and keyword set V_K = K = {k_1, k_2, …, k_j, …, k_M}; N and M are respectively the total number of authors in the team and the total number of keywords in the paper collections of all authors in the team. E is the set of edges between author nodes and keyword nodes, i.e. E = {e(A_n, k_j) | A_n ∈ V_A, k_j ∈ K, w_nj > 0}. If the keyword lists of author A_n's papers contain keyword k_j of the keyword set, then the weight w_nj > 0 and an edge e(A_n, k_j) exists between author A_n and keyword k_j; otherwise w_nj = 0 and no edge exists between A_n and k_j.
Step 2: perform graph clustering on the author research interest representation model: authors whose degrees of attention to each keyword differ little are assigned to the same author group, yielding a set of author groups;
2.1) Initialize the author group set Groups = {G_0}, where G_0 is an author group containing all authors in the team;
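2.2) For each author group G_i ∈ Groups, define the focus set of G_i on keyword k_j as the set of authors in G_i that attend to k_j:
$$Focus_{k_j}(G_i) = \{A \mid A \in G_i,\ e(A, k_j) \in E\} \qquad (1)$$
where A is an author in author group G_i.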
2.3) Compute by formula (2) the attention focus_ij of author group G_i to each keyword k_j (k_j ∈ K):
$$focus_{ij} = \frac{|Focus_{k_j}(G_i)|}{|G_i|} \qquad (2)$$
where |Focus_{k_j}(G_i)| is the number of authors in author group G_i that attend to keyword k_j, and |G_i| is the total number of authors in G_i. If focus_ij ≥ α, author group G_i is said to pay "strong attention" to keyword k_j; otherwise G_i is said to pay "weak attention" to k_j. Here α > 0 is the attention intensity threshold;
2.4) Compute by formula (3) the fuzziness fuzzy_ij of each author group G_i on each keyword k_j (k_j ∈ K):
$$fuzzy_{ij} = \delta_{k_j}(G_i) = \begin{cases} |Focus_{k_j}(G_i)| & \text{if } focus_{ij} < \alpha \\ \bigl|\,|G_i| - |Focus_{k_j}(G_i)|\,\bigr| & \text{if } focus_{ij} \ge \alpha \end{cases} \qquad (3)$$
2.5) From fuzzy_ij, compute the fuzziness fuzzy_i of each author group G_i over the keyword set K:
$$fuzzy_i = \sum_{j=1}^{|K|} fuzzy_{ij} \qquad (4)$$
where |K| is the total number of keywords in the keyword set K, i.e. M;
2.6) Compute the overall fuzziness Fuzzy of the current Groups:
$$Fuzzy = \sum_{G_i \in Groups} fuzzy_i \qquad (5)$$
2.7) Find the maximum of fuzzy_ij and take the corresponding keyword k_j as the locked keyword k_j′;
Find the maximum of fuzzy_i and take the corresponding author group G_i as the group to be split, G_i′;
Split the group to be split into two new author groups G_i1 and G_i2, and update the author group set Groups:
G_i1 = {A_n ∈ G_i′ | w_nj′ > 0}
G_i2 = G_i′ − G_i1
2.8) Repeat steps 2.2) to 2.7) until the number of author groups in the author group set Groups is k;
2.9) Compare the overall fuzziness Fuzzy of the clustering result Groups obtained in step 2.6) at each stage, and take the Groups with the minimum Fuzzy as the final clustering result, denoted summaryGroups.
Step 3: perform global hierarchical clustering to obtain the research interest of each author group:
Each group in the author group set that contains only one author is merged into another author group with similar research interests, so that the number of authors in each author group is more than 2; the research interest of each author group, i.e. the team research direction, is then calculated and updated;
3.1) Divide the author groups in the clustering result summaryGroups obtained in step 2 into non-discrete author groups and discrete author groups, where a discrete author group is an author group containing only one author; take the non-discrete author groups as the initial clusters;
3.2) For each non-discrete author group G_i, compute its class research interest vector GMI_i over the keyword set K as the center of the corresponding initial cluster:
$$GMI_i = (GW_{i1}, GW_{i2}, \dots, GW_{ij}, \dots, GW_{iM}) \qquad (6)$$
where GW_ij (j = 1, 2, …, M) represents the attention of G_i to keyword k_j, quantified as:
$$GW_{ij} = \frac{\sum_{A_n \in G_i} w_{nj}}{|G_i|} \qquad (7)$$
3.3) Traverse each author A_n in the discrete author groups and compute the Euclidean distance between A_n and the center of each initial cluster, as follows:
Let the research interest vector of author A_n be v_n = (w_n1, w_n2, …, w_nj, …, w_nM); then
$$d_{ni} = \sqrt{\sum_{j=1}^{M} (GW_{ij} - w_{nj})^2}$$
3.4) Compare the Euclidean distances between A_n and the centers of the initial clusters, select the non-discrete author group with the minimum Euclidean distance, and assign A_n to it; that is, merge the discrete author group containing only A_n with that non-discrete author group to form a new author group;
3.5) Iterate steps 3.1) to 3.4) until the resulting author groups no longer change;
3.6) Calculate and update the class research interest vector of each author group, i.e. the team research direction vector.
The main flow of the invention is shown in Fig. 1:
Fig. 1 is the flow of the team research direction mining method. Starting from the team's scientific paper data, an author research interest representation model based on the author-keyword bipartite network is built. Graph clustering is then performed on this model to obtain the main-body features of the team's research interests. Finally, the mined main-body features are analyzed and network global hierarchical clustering is performed to obtain the team's global research directions.
Experimental analysis
1 Data source
The data in this part come from a computer science and technology research team taken as the object of study. The method was verified experimentally on the team's paper dataset, and the results are presented both visually and in tables. The author research interest representation model and the author-group research interest representation model in this part are both visualized with the Gephi software.
2 Author-keyword bipartite network
By analyzing the team's composition and its scientific paper dataset, the team's research interest model was found to contain 23 team members and 547 paper keywords. The initial author-keyword bipartite network is shown in Fig. 2. Fig. 2 shows only the names of the authors in the team; the research interest areas of each author are scattered around that author, and because there are so many nodes the keyword labels are not displayed.
3 Main-body features of the team's research interests
1. Graph clustering result based on the author-keyword bipartite network when k = 5
Fig. 3 depicts the result of graph clustering on the author-keyword bipartite network when k = 5. It consists of 552 nodes (the author groups after clustering plus the keywords) and 714 edges. The group nodes differ in size: group 1 has relatively many members, while group 4 contains only one team member. This information can also be presented in a table; Table 1 lists the attention of the 5 author groups to part of the keyword set.
Table 1. Partial example of the main-body features of the team's research interests (k = 5)
2. Graph clustering result based on the author-keyword bipartite network when k = 7
Fig. 4 shows the result of graph clustering on the author-keyword bipartite network when k = 7: 554 nodes and 721 edges. Compared with Fig. 3, because the number of clusters is larger, more small groups appear, such as groups 4, 5 and 7 in Fig. 4. Similarly, Table 2 lists part of the research fields attended to by the 7 author groups:
Table 2. Partial example of the main-body features of the team's research interests (k = 7)
After the original team's bipartite-network-based author research interest representation model is mined with the graph clustering algorithm based on the author-keyword bipartite network, the information in the original model is simplified and attention is focused on the model's main-body structure, which helps in grasping the team's main research directions.
4 Analysis of the results of network global hierarchical clustering
1. Processing the main-body features of the team's research interests for k = 5
First, the network global hierarchical clustering method is applied to the main-body features of the team's research interests for k = 5, giving 551 nodes and 693 edges. The resulting global research interests of the team are shown in Fig. 5; the number of clusters is now 4. Compared with Fig. 3, the research interests of the team members are clearer and more concentrated.
Table 3 lists the attention of the 4 author groups to part of the keyword set. Compared with Table 1, the attention of groups 3 and 4 to some keywords (e.g. lens distortion, Chinese character components) has changed.
Table 3. Partial example of the team's global research directions
2. Processing the main-body features of the team's research interests for k = 7
Next, network global hierarchical clustering is applied to the main-body features of the team's research interests for k = 7. Because the main-body feature information in Fig. 4 contains two discrete author nodes, the number of author groups after global clustering is 5. The global clustering result is shown in Fig. 6 and contains 552 nodes and 679 edges.
Table 4 lists the 5 author groups and part of the research fields they attend to. Compared with Table 2, the reassignment of the discrete author nodes has changed the research interest areas attended to by each group after network global hierarchical clustering.
Table 4. Partial example of the team's global research directions
The invention first builds an author research interest representation model based on a bipartite network. Then, in view of the characteristics of this model, it introduces a graph clustering algorithm based on the author-keyword bipartite network to mine the main-body features of the network and thereby obtain the team's main research directions. Finally, borrowing the basic idea of the k-means algorithm, it performs network global hierarchical mining on the basis of the network's main-body features to obtain the team's global research interest areas. The invention can effectively mine the academic research directions of a team and provides favorable conditions for analyzing and evaluating the development of the team's research directions.

Claims (4)

1. A team research direction mining method based on hierarchical clustering of a bipartite network graph, characterized in that it comprises the following steps:
Step 1: build an author research interest representation model based on the author-keyword bipartite network;
Step 2: perform graph clustering on the author research interest representation model:
Authors whose degrees of attention to each keyword differ little are assigned to the same author group, yielding a set of author groups;
Step 3: perform global hierarchical clustering to obtain the research interest of each author group:
Each group in the author group set that contains only one author is merged into another author group with similar research interests, so that the number of authors in each author group is more than 2; the research interest of each author group, i.e. the team research direction, is then calculated and updated.
2. The team research direction mining method based on hierarchical clustering of a bipartite network graph according to claim 1, characterized in that, in step 1, the author research interest representation model is expressed as G = G(V, E);
Here V is the set formed by the author nodes and the keyword nodes, i.e. V = V_A ∪ V_K, with author set V_A = {A_1, A_2, …, A_n, …, A_N} and keyword set V_K = K = {k_1, k_2, …, k_j, …, k_M}; N and M are respectively the total number of authors in the team and the total number of keywords in the paper collections of all authors in the team. E is the set of edges between author nodes and keyword nodes, i.e. E = {e(A_n, k_j) | A_n ∈ V_A, k_j ∈ K, w_nj > 0}. If the keyword lists of author A_n's papers contain keyword k_j of the keyword set, then the weight w_nj > 0 and an edge e(A_n, k_j) exists between author A_n and keyword k_j; otherwise w_nj = 0 and no edge exists between A_n and k_j.
3. The team research direction mining method based on hierarchical clustering of a bipartite network graph according to claim 1, characterized in that step 2 comprises the following steps:
2.1) Initialize the author group set Groups = {G_0}, where G_0 is an author group containing all authors in the team;
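2.2) For each author group G_i ∈ Groups, define the focus set of G_i on keyword k_j as the set of authors in G_i that attend to k_j:
$$Focus_{k_j}(G_i) = \{A \mid A \in G_i,\ e(A, k_j) \in E\} \qquad (1)$$
where A is an author in author group G_i;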
2.3) author group G is calculated by formula (2)iTo each key word kj(kj∈ K) concern situation focusij:
focus i j = | Focus k j ( G i ) | | G i | - - - ( 2 )
where $|Focus_{k_j}(G_i)|$ denotes the number of authors in author group $G_i$ who pay attention to keyword $k_j$, and $|G_i|$ denotes the total number of authors in author group $G_i$; if the attention degree $focus_{ij} \geq \alpha$, author group $G_i$ is said to pay "strong attention" to keyword $k_j$; otherwise author group $G_i$ is said to pay "weak attention" to keyword $k_j$; here $\alpha > 0$ is the attention intensity threshold;
2.4) Calculate by formula (3) the fuzziness $fuzzy_{ij}$ of each author group $G_i$ on each keyword $k_j$ ($k_j \in K$):
$$fuzzy_{ij} = \delta_{k_j}(G_i) = \begin{cases} |Focus_{k_j}(G_i)| & \text{if } focus_{ij} < \alpha \\ \big|\,|G_i| - |Focus_{k_j}(G_i)|\,\big| & \text{if } focus_{ij} \geq \alpha \end{cases} \qquad (3)$$
2.5) From $fuzzy_{ij}$, calculate the fuzziness $fuzzy_i$ of each author group $G_i$ over the keyword set $K$:
$$fuzzy_i = \sum_{j=1}^{|K|} fuzzy_{ij} \qquad (4)$$
where $|K|$ is the total number of keywords in the keyword set $K$, i.e. $M$;
2.6) Calculate the overall fuzziness $Fuzzy$ of the current Groups:
$$Fuzzy = \sum_{G_i \in Groups} fuzzy_i \qquad (5)$$
2.7) Find the maximum of $fuzzy_{ij}$ and take the corresponding keyword $k_j$ as the locking keyword $k_{j'}$;
Find the maximum of $fuzzy_i$ and take the corresponding author group $G_i$ as the group to be split, $G_{i'}$;
Split the group to be split into two new author groups $G_{i1}$ and $G_{i2}$, and update the author cluster set Groups:
$$G_{i1} = \{A_n \in G_{i'} \mid w_{nj'} > 0\}$$
$$G_{i2} = G_{i'} - G_{i1}$$
2.8) Repeat steps 2.2) to 2.7) until the number of author groups in the author cluster set Groups is k;
2.9) Compare the overall fuzziness $Fuzzy$ of the clustering results Groups obtained in step 2.6) at each stage, and take the Groups with the smallest $Fuzzy$ value as the final clustering result, denoted summaryGroups.
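Steps 2.1) to 2.9) can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. It assumes that an author pays attention to a keyword exactly when the weight is positive, that the locking keyword is chosen inside the group selected for splitting, that 0 < alpha <= 1, and that the overall fuzziness of the final state is also recorded for the comparison in step 2.9). The function names and the sample matrix are hypothetical.

```python
def focus_count(group, j, w):
    """|Focus_kj(G_i)|: authors of the group with w[n][j] > 0 (assumed meaning of attention)."""
    return sum(1 for n in group if w[n][j] > 0)

def fuzzy_ij(group, j, w, alpha):
    """Formula (3): fuzziness of one group on one keyword."""
    c = focus_count(group, j, w)
    if c / len(group) < alpha:       # weak attention, formula (2) below the threshold
        return c
    return len(group) - c            # strong attention

def fuzzy_i(group, w, alpha):
    """Formula (4): sum of fuzzy_ij over all keywords."""
    return sum(fuzzy_ij(group, j, w, alpha) for j in range(len(w[0])))

def divisive_clustering(w, k, alpha=0.5):
    """Steps 2.1-2.9 on an N x M weight matrix; returns (summaryGroups, Fuzzy)."""
    groups = [list(range(len(w)))]                      # 2.1: one group with every author
    history = []
    while True:
        scores = [fuzzy_i(g, w, alpha) for g in groups]
        history.append((sum(scores), [list(g) for g in groups]))  # 2.6, formula (5)
        if len(groups) >= k or max(scores) == 0:        # 2.8, plus a guard when no split is possible
            break
        i = scores.index(max(scores))                   # 2.7: group to be split
        target = groups[i]
        j = max(range(len(w[0])),
                key=lambda c: fuzzy_ij(target, c, w, alpha))  # locking keyword
        g1 = [n for n in target if w[n][j] > 0]
        g2 = [n for n in target if not w[n][j] > 0]
        groups[i:i + 1] = [g1, g2]
    best = min(history, key=lambda item: item[0])       # 2.9: smallest overall fuzziness
    return best[1], best[0]

# Rows are authors, columns are keywords (hypothetical data).
w = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 1],
]
summary_groups, total = divisive_clustering(w, k=3, alpha=0.5)
```

With this sample matrix the overall fuzziness drops from 8 to 2 to 0 across the three stages, and the last stage leaves author 4 alone in a group, which is the situation step 3 is meant to resolve. Ties in the maxima are broken by taking the first candidate, a choice the claim does not specify.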
4. The team research direction mining method based on bipartite network graph hierarchical clustering according to claim 3, characterized in that step 3 comprises the following steps:
3.1) Divide the author groups in the clustering result summaryGroups obtained in step 2 into discrete author groups and non-discrete author groups; a discrete author group is an author group that contains only one author; take the non-discrete author groups as the initial clusters;
3.2) Calculate the class research interest vector $GMI_i$ of each non-discrete author group $G_i$ over the keyword set $K$, and use it as the center of the corresponding initial cluster;
$$GMI_i = (GW_{i1}, GW_{i2}, \ldots, GW_{ij}, \ldots, GW_{iM}) \qquad (6)$$
where $GW_{ij}$ ($j = 1, 2, \ldots, M$) denotes the attention of $G_i$ to keyword $k_j$, quantified as:
$$GW_{ij} = \frac{\sum_{A_n \in G_i} w_{nj}}{|G_i|} \qquad (7)$$
3.3) Traverse each author $A_n$ in the discrete author groups and calculate the Euclidean distance between it and the center of each initial cluster; the calculation is as follows:
Let the research interest vector of author $A_n$ be $v_n = (w_{n1}, w_{n2}, \ldots, w_{nj}, \ldots, w_{nM})$; then
$$d_{ni} = \sqrt{\sum_{j=1}^{M} (GW_{ij} - w_{nj})^2}$$ ;
3.4) Compare the Euclidean distances between $A_n$ and the centers of the initial clusters, select the non-discrete author group with the smallest Euclidean distance, assign $A_n$ to it, and merge the discrete author group that contains only author $A_n$ with that non-discrete author group to form a new author group;
3.5) Iterate steps 3.1) to 3.4) until the resulting author groups no longer change;
3.6) Calculate and update the class research interest vector of each author group.
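Steps 3.1) to 3.6) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: cluster centers are assumed to be recomputed after every merge, ties in distance go to the first cluster, and at least one multi-author group is assumed to exist. The function names and the sample data are hypothetical.

```python
import math

def class_interest(group, w):
    """Formulas (6)-(7): mean weight per keyword over the authors of the group."""
    return [sum(w[n][j] for n in group) / len(group) for j in range(len(w[0]))]

def merge_singletons(groups, w):
    """Steps 3.1-3.6: fold single-author groups into the nearest multi-author group."""
    clusters = [list(g) for g in groups if len(g) > 1]       # 3.1: initial clusters
    singles = [g[0] for g in groups if len(g) == 1]          # discrete author groups
    if not clusters:
        raise ValueError("no multi-author group to merge into")
    for n in singles:                                        # 3.3: traverse discrete authors
        centers = [class_interest(c, w) for c in clusters]   # 3.2: cluster centers
        dists = [math.dist(center, w[n]) for center in centers]  # Euclidean distance d_ni
        clusters[dists.index(min(dists))].append(n)          # 3.4: merge into nearest group
    # 3.5: one pass removes every singleton, so a further pass changes nothing
    return clusters, [class_interest(c, w) for c in clusters]   # 3.6

# Rows are authors, columns are keywords (hypothetical data).
w = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 1],
]
groups = [[0, 1], [4], [2, 3]]
merged, interests = merge_singletons(groups, w)
```

In this sample, author 4 lies at distance sqrt(3) from the center of the first group and sqrt(5) from the second, so it joins the first group, and the updated class research interest vectors serve as the team research directions.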
CN201610595145.XA 2016-07-25 2016-07-25 Team research direction mining method based on bipartite network graph hierarchical clustering Expired - Fee Related CN106227835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610595145.XA CN106227835B (en) 2016-07-25 2016-07-25 Team research direction mining method based on bipartite network graph hierarchical clustering

Publications (2)

Publication Number Publication Date
CN106227835A (en) 2016-12-14
CN106227835B (en) 2018-01-19

Family

ID=57533613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610595145.XA Expired - Fee Related CN106227835B (en) 2016-07-25 2016-07-25 Team research direction mining method based on bipartite network graph hierarchical clustering

Country Status (1)

Country Link
CN (1) CN106227835B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254028A (en) * 2011-07-22 2011-11-23 青岛理工大学 Personalized commodity recommending method and system which integrate attributes and structural similarity
CN102609546A (en) * 2011-12-08 2012-07-25 清华大学 Method and system for excavating information of academic journal paper authors
CN103020302A (en) * 2012-12-31 2013-04-03 中国科学院自动化研究所 Academic core author excavation and related information extraction method and system based on complex network
CN103559262A (en) * 2013-11-04 2014-02-05 北京邮电大学 Community-based author and academic paper recommending system and recommending method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘非凡 等: "基于2-模网络和G-N 社群聚类算法的潜在合作者研究", 《情报理论与实践》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256231A (en) * 2017-05-04 2017-10-17 腾讯科技(深圳)有限公司 A kind of Team Member's identification equipment, method and system
CN107256231B (en) * 2017-05-04 2022-04-22 腾讯科技(深圳)有限公司 Team member identification device, method and system
WO2019079971A1 (en) * 2017-10-24 2019-05-02 深圳市云中飞网络科技有限公司 Method for group communication, and apparatus, computer storage medium, and computer device
CN108491409A (en) * 2018-01-29 2018-09-04 浙江工业大学 A kind of city medical system clustering method based on hospital's related network structure feature
CN108491409B (en) * 2018-01-29 2022-06-17 浙江工业大学 Urban medical system clustering method based on hospital associated network structural features
CN109376236A (en) * 2018-07-27 2019-02-22 中山大学 A kind of academic paper author's weight analysis method based on clustering
CN109376236B (en) * 2018-07-27 2021-10-26 中山大学 Academic paper author weight analysis method based on cluster analysis
CN109741791A (en) * 2018-12-29 2019-05-10 人和未来生物科技(长沙)有限公司 A kind of author's subject bearing data method for digging and system towards PubMed paper library
CN109829634A (en) * 2019-01-18 2019-05-31 北京工业大学 A kind of adaptive patent Research Team, colleges and universities recognition methods
CN109829634B (en) * 2019-01-18 2021-02-26 北京工业大学 Self-adaptive college patent and scientific research team identification method
CN110941662A (en) * 2019-06-24 2020-03-31 上海市研发公共服务平台管理中心 Graphical method, system, storage medium and terminal for scientific research cooperative relationship

Also Published As

Publication number Publication date
CN106227835B (en) 2018-01-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20180119; termination date: 20210725)