CN111078859B - Author recommendation method based on reference times - Google Patents
- Publication number
- CN111078859B (CN201911154792.7A)
- Authority
- CN
- China
- Prior art keywords
- author
- document
- citation
- authors
- network
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/35—Clustering; Classification
Abstract
The invention discloses an author recommendation method based on citation counts, comprising the following steps: first, selecting a document population within a document database; second, constructing a citation network model from the citation relations among the documents in the selected population, mapping it to an author citation network, and counting the golden citation count of each document author; then, clustering the authors into groups based on the author citation network; and finally, recommending document authors to the user according to the golden citation counts and the author clusters. Defining a golden citation count for document authors eliminates the interference of author self-citation and weakens the influence of low-quality, low-impact citations. Clustering the authors by the citation relations among them lets a user quickly and accurately identify experts in a specific research field.
Description
Technical Field
The invention belongs to the technical field of document retrieval, and particularly relates to an author recommendation method based on citation counts.
Background
Experts who meet specific technical requirements are usually sought either through a social relationship network or from the author information of scientific and technological achievements. Searching through a social relationship network depends too heavily on the social relationships of the requesting party and is therefore very limited. Searching by author information requires a large amount of manpower and time to investigate the achievements and their authors, so it is inefficient and labor-intensive. Both manual approaches are overly subjective and lack accuracy and fairness. Intelligent expert recommendation technology overcomes these limitations of traditional manual expert search.
Chinese patent application No. 201410680306.6 describes an expert recommendation method and system based on group matching: the system acquires the webpage information of each expert in an expert list through a web crawler, extracts each expert's academic information from it, calculates the matching degree between each expert and the project to be matched, and finally determines the experts recommended for the project according to the matching degree and a group matching model. However, the method uses keywords of the scientific research field as the basis for the matching degree, so biased results are inevitable for interdisciplinary or emerging disciplines.
Chinese patent application No. 201811228086.8 discloses a collaborative recommendation method based on expert field similarity and association relationships. Taking batch paper data as a training set, the method constructs a collaboration network, calculates the shortest path between authors with the Dijkstra algorithm as the expert correlation COR, builds an expert word-vector model with the word2vec algorithm, calculates the cosine similarity between the expert word vector and the field word vector as the expert field similarity SIM, and selects the experts whose SIM and COR satisfy a threshold as the recommended experts. The expert correlation in this method is calculated from the collaboration between experts, so the recommended experts are closely associated with the given experts through collaboration. However, collaboration is influenced by subjective factors, collaboration unrelated to the research field interferes with the recommendation results, and the collaboration between authors cannot reflect the inheritance of knowledge or the implicit relevance between research topics.
In view of the above, the present invention is proposed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an author recommendation method based on citation counts.
To solve the above technical problems, the invention adopts the following technical scheme:
The invention provides an author recommendation method based on citation counts, comprising the following steps: first, selecting a document population within a document database; second, constructing a document citation network model from the citation relations among the documents in the selected population, mapping it to an author citation network, and counting the golden citation count of each document author; then, clustering the authors into groups based on the author citation network; and finally, recommending document authors to the user according to the golden citation counts and the author clusters.
In the above scheme, the documents include scientific journals, patents, conference papers, research reports and academic papers.
In a further scheme of the invention, the document citation network model is $G = (V, E)$, a directed network consisting of $|V| = N$ document nodes and $|E| = M$ edges, where $G$ represents the documents in the document population together with the citation relations among them, $V$ is the set of documents in $G$, and $E$ is the set of citation relations among the documents in $G$. The number of citations by others (non-self citations) is calculated for the documents in $G$, a document citation network model excluding self-citations, $G' = (V, E')$, is generated, and $G'$ is converted by linear mapping into the author citation network $G_{auth.}$.
In a further scheme of the invention, clustering the authors into groups based on the author citation network comprises dividing the author citation network into communities, each resulting author community being regarded as a relatively independent research field. The steps are as follows:
S11. Take the author citation network $G_{auth.}$ as the initial network and set it as the current network;
S12. Randomly divide the nodes of the current network into two communities, then execute step S13;
S13. Calculate the contribution degree of each node to the modularity, calculate the network modularity from the contribution degrees, then execute step S14;
S14. Move a node with a lower contribution degree from one community to the other, then execute step S15;
S15. Recalculate the contribution degree of each node to the modularity and the network modularity, then execute step S16;
S16. Judge whether the network modularity has increased and whether it has reached its maximum. If the modularity has increased but has not reached its maximum, keep the node move and return to step S14. If the modularity has not increased, withdraw the move, move a different node with a lower contribution degree from one community to the other, and return to step S15. If the modularity has reached its maximum, execute step S17;
S17. Record and store the network modularity and the community structure of the initial network at this point, then execute step S18;
S18. Treat each community obtained in step S17 as a separate network and recursively execute steps S12 to S18 on it, until no larger modularity can be obtained for the initial network; the result is the community division of the network.
In a further scheme of the invention, the contribution degree $\lambda_i$ of each node to the modularity is calculated according to the following formula:
[formula not reproduced in the source text]
where $\kappa_{r(i)}$ is the sum of the values of the edges, representing citation relations, between node $v_i$ of community $r$ and the other nodes of that community; the formula also uses the citing count and the cited count of node $v_i$ (their symbols are not reproduced in the source text); and $a_{r(i)}$ is the proportion of edge values, whether citing or cited, that belong to the nodes of community $r$.
In a further scheme of the invention, the modularity $Q$ is calculated according to the following formula:
[formula not reproduced in the source text]
where $m$ is the sum of the values of all edges representing citation relations in the network.
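Because the patent's own formulas for the contribution degree and the modularity are not available in this text, the following is only a minimal Python sketch of steps S12 to S16 under stated assumptions. It swaps in the standard directed modularity of Leicht and Newman, $Q = \frac{1}{m}\sum_{ij}\left[A_{ij} - \frac{k_i^{out} k_j^{in}}{m}\right]\delta(c_i, c_j)$, and it tries nodes in a fixed order instead of ranking them by contribution degree. The function names and the `(citing, cited, weight)` edge format are hypothetical.

```python
from collections import defaultdict

def directed_modularity(edges, community):
    """Standard directed modularity (an assumption, not the patent's formula).

    edges: list of (citing, cited, weight); community: dict node -> label.
    """
    m = sum(w for _, _, w in edges)  # total edge value
    if m == 0:
        return 0.0
    out_by_c = defaultdict(float)  # summed citing weight per community
    in_by_c = defaultdict(float)   # summed cited weight per community
    inside = 0.0                   # weight of edges inside a community
    for u, v, w in edges:
        out_by_c[community[u]] += w
        in_by_c[community[v]] += w
        if community[u] == community[v]:
            inside += w
    expected = sum(out_by_c[c] * in_by_c[c] for c in out_by_c)
    return inside / m - expected / (m * m)

def refine_bisection(edges, community):
    """Move one node at a time between communities 0 and 1; keep a move only
    if the modularity increases, otherwise withdraw it (cf. S14 to S16)."""
    community = dict(community)
    best = directed_modularity(edges, community)
    improved = True
    while improved:
        improved = False
        for node in sorted(community):
            community[node] = 1 - community[node]      # tentative move
            q = directed_modularity(edges, community)
            if q > best + 1e-12:
                best, improved = q, True               # keep the move
            else:
                community[node] = 1 - community[node]  # withdraw the move
    return community, best
```

The loop stops at a local maximum of the modularity, which is not necessarily the global maximum; the recursive subdivision of step S18 would call the same routine on each resulting community.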
In a further scheme of the invention, recommending document authors to the user according to the golden citation counts and the author clusters comprises combining the result of clustering the document authors by research field with the ranking of the document authors by golden citation count, and recommending document authors to the user on that basis. Preferably, the combining comprises setting a first threshold, taking the author nodes of the author group whose golden citation count is larger than the first threshold, sorting them in descending order of golden citation count, and then taking the intersection of the author clustering result and the sorted result to form the author list recommended to the user, where the first threshold is not larger than the maximum golden citation count. Preferably, the combining comprises sorting all author nodes of the author group in descending order of golden citation count, then setting a second threshold for the author community of each research field in the author clustering result, and recommending to the user the author list composed of the author nodes of each author community whose golden citation count lies between the second threshold and the maximum golden citation count, where the second threshold is not larger than the maximum golden citation count.
In the above scheme, the main role of clustering the document authors is to divide a technical topic into the research fields it contains; the result is a set of author communities, one for each sub-field of the technical topic. However, some technical topics contain many complex research fields, so the document authors of every research field cannot all be expected to fall within the group of highly ranked golden citation authors. A threshold is therefore introduced for the golden citation author group and/or the clustered author groups, and the intersection of the two within a certain threshold range gives the author list recommended to the user.
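The two combination options described above can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: the function names, the dictionary-based data layout, and the tie-breaking by author name are hypothetical. The first option uses a strict comparison because the text says "larger than the first threshold"; the second treats the range between the second threshold and the maximum as inclusive, which is an assumption.

```python
def recommend_global(gold, communities, first_threshold):
    """Option 1: one global threshold, then intersect with each community.

    gold: dict author -> golden citation count
    communities: dict community name -> list of authors
    """
    result = {}
    for name, members in communities.items():
        picked = [a for a in members if gold.get(a, 0) > first_threshold]
        # Descending golden citation count; ties broken by name (assumption).
        picked.sort(key=lambda a: (-gold.get(a, 0), a))
        if picked:
            result[name] = picked
    return result

def recommend_per_community(gold, communities, second_thresholds):
    """Option 2: a separate threshold for each author community."""
    result = {}
    for name, members in communities.items():
        limit = second_thresholds[name]
        picked = [a for a in members if gold.get(a, 0) >= limit]
        picked.sort(key=lambda a: (-gold.get(a, 0), a))
        result[name] = picked
    return result
```

With a single global threshold, a community whose authors all have low golden citation counts drops out of the result entirely, which is the situation the per-community threshold is meant to avoid.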
In a further scheme of the invention, counting the golden citation counts comprises the following steps:
S21. Construct the document citation network model, count the citation count and the non-self citation count (citations by others) of each document, then execute step S22;
S22. From the non-self citation counts, generate a document citation network model that excludes self-citations, map it to an author citation network model, then execute step S23;
S23. Calculate the golden citation counts of the document authors.
In a further scheme of the invention, in step S21 the citation count of document $v_i$ in the document population $G$ is obtained by summing the citation variable $e_{ij}$, that is, the citation count of the $i$-th document in $G$ is $\sum_{j} e_{ij}$. The non-self citation count of document $v_i$ in $G$ is obtained by summing the product of the citation variable $e_{ij}$ and the self-citation coefficient $\lambda_{ij}$, that is, the non-self citation count of the $i$-th document in $G$ is $\sum_{j} e_{ij} \cdot \lambda_{ij}$. If document $v_i$ is cited by document $v_j$, then $e_{ij} = 1$; if document $v_i$ is not cited by document $v_j$, then $e_{ij} = 0$. If documents $v_i$ and $v_j$ share at least one author, the citation is a self-citation and $\lambda_{ij} = 0$; if they share no author, the citation is a citation by others and $\lambda_{ij} = 1$. Here $i \ge 1$ and $j \ge 1$.
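As a minimal Python sketch of the two sums in step S21, the function below counts, for each document, the total citations ($\sum_j e_{ij}$) and the citations by others ($\sum_j e_{ij} \cdot \lambda_{ij}$). The function name, the `(cited, citing)` pair format, and the sample data are hypothetical.

```python
def citation_counts(citations, authors):
    """Return (total, non_self) citation counts per document.

    citations: iterable of (cited_doc, citing_doc) pairs
    authors: dict doc -> set of author names
    """
    total = {doc: 0 for doc in authors}
    non_self = {doc: 0 for doc in authors}
    # set() keeps e_ij in {0, 1}: a repeated pair is counted once.
    for cited, citing in set(citations):
        e = 1
        # lambda_ij is 0 when the two documents share at least one author.
        lam = 0 if authors[cited] & authors[citing] else 1
        total[cited] += e
        non_self[cited] += e * lam
    return total, non_self

authors = {"d1": {"Li", "Wang"}, "d2": {"Li"}, "d3": {"Zhao"}}
citations = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
total, non_self = citation_counts(citations, authors)
# d1 is cited twice, but the citation from d2 is a self-citation (shared author Li).
```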
In a further scheme of the invention, in step S22 a document citation network model excluding self-citations, $G' = (V, E')$, is generated according to the non-self citation counts, and $G'$ is converted by linear mapping into the author citation network $G_{auth.}$, in which authors are the vertices and the citation relations between authors are the edges. The author group $G_{auth.} = (V_{auth.}, E_{auth.})$ is a directed network consisting of $|V_{auth.}| = N_{auth.}$ nodes and $|E_{auth.}| = M_{auth.}$ edges, where $G_{auth.}$ represents the authors in the author group together with the citation relations among them, $V_{auth.}$ is the set of authors in $G_{auth.}$, and $E_{auth.}$ is the set of citation relations among the authors in $G_{auth.}$. The author citation variable describes the citation relation between author $v_i$ and author $v_j$ in the citation network: each citation of a document of author $v_i$ by a document of author $v_j$ is recorded as 1, and the variable is the sum of the directed edges between the two authors, that is, the total number of times author $v_i$ is cited by author $v_j$, recorded as $n$. If author $v_i$ is not cited by author $v_j$, the variable is 0. Here $i \ge 1$ and $j \ge 1$.
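The mapping from the self-citation-free document network to the author citation network can be sketched in Python as below. The patent does not spell out the linear mapping, so this sketch assumes that every author of the cited document receives one citation from every author of the citing document; the function name and data layout are hypothetical.

```python
from collections import Counter
from itertools import product

def author_citation_network(citations, authors):
    """Map document citations to weighted author-to-author edges.

    Returns a Counter keyed by (cited_author, citing_author); the value is n,
    the total number of times the first author is cited by the second.
    """
    weights = Counter()
    for cited, citing in set(citations):
        if authors[cited] & authors[citing]:
            continue  # self-citation: excluded, as in G' = (V, E')
        for a, b in product(sorted(authors[cited]), sorted(authors[citing])):
            weights[(a, b)] += 1
    return weights

authors = {"d1": {"Li", "Wang"}, "d2": {"Li"}, "d3": {"Zhao"}}
citations = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
weights = author_citation_network(citations, authors)
# Li is cited by Zhao through d1 and d2; Wang is cited by Zhao through d1.
```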
In a further scheme of the invention, step S23 calculates the golden citation count of each author node as follows. A set value $k$ is introduced, where $k$ is an integer, $k \ge 0$, and $k$ takes values in increasing order. Authors are extracted layer by layer in a recursive manner, and the authors extracted at each layer form a new author group. The layer-$k$ author group is $G_{auth.k}$, which contains a certain number of nodes (authors) and citation relations; each author node has a citation variable and a cited count within $G_{auth.k}$. The range of $G_{auth.k}$ is reduced by recursive extraction until the number of nodes in $G_{auth.k+1}$ is 0. The author nodes in the author group extracted at layer $k$ have a golden citation count of $k$.
In a further scheme of the invention, "reducing the range of $G_{auth.k}$ by recursive extraction until the number of nodes in $G_{auth.k+1}$ is 0" is carried out as follows. The author nodes of $G_{auth.k}$ that satisfy the layer-$k$ extraction condition [condition not reproduced in the source text] are extracted to form the layer-$k$ extracted author group. The remaining author nodes and their citation relations become the initial author group $G_{auth.(k+1)}$ of layer $k+1$. Each remaining author node $v_i$ has a citation variable and a cited count within the new author group $G_{auth.(k+1)}$, which contains the remaining nodes (authors) and edges, where $k$ is an integer and $k \ge 0$. The author nodes in the author group extracted at layer $k$ all have the same golden citation count $k$; the golden citation count of an author node in $G_{auth.}$ is thus determined by the extracted author group to which the node belongs.
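The layer-by-layer extraction resembles a k-shell decomposition on the cited count. The Python sketch below illustrates it under one explicit assumption, since the extraction condition is not available in this text: at layer $k$, an author is extracted when their cited count within the current author group is at most $k$, and the check is repeated at the same $k$ until no author qualifies. The function name and the edge format are hypothetical.

```python
def golden_citation_counts(weights):
    """weights: dict (cited_author, citing_author) -> number of citations.

    Returns dict author -> golden citation count (the layer k at which the
    author was extracted). Extraction rule (assumption): cited count within
    the current group <= k.
    """
    remaining = {a for pair in weights for a in pair}
    gold = {}
    k = 0
    while remaining:
        while True:
            # Cited count of each remaining author, counting only citations
            # that come from authors still in the current group.
            cited = {a: 0 for a in remaining}
            for (a, b), n in weights.items():
                if a in remaining and b in remaining:
                    cited[a] += n
            layer = {a for a, c in cited.items() if c <= k}
            if not layer:
                break
            for a in layer:
                gold[a] = k
            remaining -= layer
        k += 1
    return gold

weights = {("A", "B"): 1, ("A", "C"): 1, ("B", "A"): 1, ("B", "C"): 1,
           ("C", "A"): 1, ("C", "B"): 1, ("A", "D"): 1}
gold = golden_citation_counts(weights)
# D is never cited and leaves at layer 0; A, B and C cite one another and leave at layer 2.
```

Under this rule a citation only raises an author's golden citation count while the citing author is still in the group, so citations from authors who are themselves rarely cited stop counting once those authors have been extracted.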
In a further scheme of the invention, the result of clustering the document authors by research field is combined with the ranking of the document authors by golden citation count, and document authors are recommended to the user. Preferably, a threshold $K_{min}$ is set, all nodes of the author group $G_{auth.}$ whose golden citation count is larger than the threshold are taken and sorted in descending order of golden citation count, and an author list is recommended to the user according to the author clustering result, where $K_{max} \ge K_{min} \ge 0$ and $K_{max}$ is the maximum golden citation count. Preferably, all nodes of the author group $G_{auth.}$ are sorted in descending order of golden citation count, a threshold is set for each author community according to the author clustering result, and the author group composed of the author nodes of each community whose golden citation count lies between that community's threshold and the maximum golden citation count is recommended to the user.
The above scheme specifies how the ranking of authors by golden citation count is combined with the result of clustering document authors by research field. The values of $K_{min}$ and of the per-community thresholds can be generated and set by the system or set manually.
With the above technical scheme, the invention has the following beneficial effects compared with the prior art:
1. The author recommendation method provided by the invention reflects the intrinsic relations among the authors' research topics. Because it is based on the association between the research content of authors, it can recommend field experts more conveniently, intelligently and accurately, and it is also suitable for expert recommendation in interdisciplinary or emerging disciplines;
2. The author recommendation method provided by the invention uses the golden citation count in place of the author's citation count, which eliminates the interference of self-citation in author evaluation and weakens the influence of low-quality, low-impact citations by others, so the author ranking is more reasonable and the recommended experts are more authoritative;
3. The author recommendation method provided by the invention divides the author groups according to the citation network among authors, which accords with the objective laws of scientific and technological development and involves few human interference factors.
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention without unduly limiting it. The drawings in the following description show only some embodiments; a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of an author recommendation method provided in the present invention;
FIG. 2 is a flow chart of an author recommendation method provided by the present invention;
FIG. 3 is a schematic flow chart of counting the number of gold references in the author recommendation method of the present invention;
FIG. 4 is a flow chart illustrating clustering and grouping of authors based on an author reference network in the present invention.
It should be noted that the drawings and the description are not intended to limit the scope of the inventive concept in any way, but to illustrate the inventive concept to those skilled in the art with reference to specific embodiments.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The following embodiments illustrate the present invention and are not intended to limit its scope.
As shown in FIGS. 1 to 4, the invention provides an author recommendation method based on citation counts, which quickly and accurately recommends field experts to users by calculating the golden citation counts of authors and dividing the authors' research fields with a community detection algorithm.
Examples
As shown in FIG. 1, the present embodiment comprises the following four steps A to D:
A. Selecting a document population within a document database; the documents include scientific journals, patents, conference papers, research reports and academic papers;
B. Constructing a document citation network model from the citation relations among the documents in the selected population, mapping it to an author citation network, and counting the golden citation count of each document author;
In this embodiment, as shown in FIG. 3, step B includes constructing a document citation network, where the document citation network model is $G = (V, E)$, a directed network consisting of $|V| = N$ document nodes and $|E| = M$ edges; $G$ represents the documents in the document population together with the citation relations among them, $V$ is the set of documents in $G$, and $E$ is the set of citation relations among the documents in $G$. The number of citations by others (non-self citations) is calculated for the documents in $G$, a document citation network model excluding self-citations, $G' = (V, E')$, is generated, and $G'$ is converted by linear mapping into the author citation network $G_{auth.}$.
In this embodiment, as shown in FIG. 2, counting the golden citation counts in step B comprises the following steps:
S21. Construct the document citation network model, count the citation count and the non-self citation count (citations by others) of each document, then execute step S22;
S22. From the non-self citation counts, generate a document citation network model that excludes self-citations, map it to an author citation network model, then execute step S23;
S23. Calculate the golden citation count of each author.
In this embodiment, in step S21 the citation count of document $v_i$ in the document population $G$ is obtained by summing the citation variable $e_{ij}$, that is, the citation count of the $i$-th document in $G$ is $\sum_{j} e_{ij}$. The non-self citation count of document $v_i$ in $G$ is obtained by summing the product of the citation variable $e_{ij}$ and the self-citation coefficient $\lambda_{ij}$, that is, the non-self citation count of the $i$-th document in $G$ is $\sum_{j} e_{ij} \cdot \lambda_{ij}$. If document $v_i$ is cited by document $v_j$, then $e_{ij} = 1$; if document $v_i$ is not cited by document $v_j$, then $e_{ij} = 0$. If documents $v_i$ and $v_j$ share at least one author, the citation is a self-citation and $\lambda_{ij} = 0$; if they share no author, the citation is a citation by others and $\lambda_{ij} = 1$. Here $i \ge 1$ and $j \ge 1$.
In this embodiment, in step S22 a document citation network model excluding self-citations, $G' = (V, E')$, is generated according to the non-self citation counts, and $G'$ is converted by linear mapping into the author citation network $G_{auth.}$, in which authors are the vertices and the citation relations between authors are the edges. The author group $G_{auth.} = (V_{auth.}, E_{auth.})$ is a directed network consisting of $|V_{auth.}| = N_{auth.}$ nodes and $|E_{auth.}| = M_{auth.}$ edges, where $G_{auth.}$ represents the authors in the author group together with the citation relations among them, $V_{auth.}$ is the set of authors in $G_{auth.}$, and $E_{auth.}$ is the set of citation relations among the authors in $G_{auth.}$. The author citation variable describes the citation relation between author $v_i$ and author $v_j$ in the citation network: each citation of a document of author $v_i$ by a document of author $v_j$ is recorded as 1, and the variable is the sum of the directed edges between the two authors, that is, the total number of times author $v_i$ is cited by author $v_j$, recorded as $n$. If author $v_i$ is not cited by author $v_j$, the variable is 0. Here $i \ge 1$ and $j \ge 1$.
In this embodiment, step S23 calculates the golden citation count of each author node as follows. A set value $k$ is introduced, where $k$ is an integer, $k \ge 0$, and $k$ takes values in increasing order. Authors are extracted layer by layer in a recursive manner, and the authors extracted at each layer form a new author group. The layer-$k$ author group is $G_{auth.k}$, which contains a certain number of nodes (authors) and citation relations; each author node has a citation variable and a cited count within $G_{auth.k}$. The range of $G_{auth.k}$ is reduced by recursive extraction until the number of nodes in $G_{auth.k+1}$ is 0. The author nodes in the author group extracted at layer $k$ have a golden citation count of $k$.
In this embodiment, "reducing the range of $G_{auth.k}$ by recursive extraction until the number of nodes in $G_{auth.k+1}$ is 0" is carried out as follows. The author nodes of $G_{auth.k}$ that satisfy the layer-$k$ extraction condition [condition not reproduced in the source text] are extracted to form the layer-$k$ extracted author group. The remaining author nodes and their citation relations become the initial author group $G_{auth.(k+1)}$ of layer $k+1$. Each remaining author node $v_i$ has a citation variable and a cited count within the new author group $G_{auth.(k+1)}$, which contains the remaining nodes (authors) and edges, where $k$ is an integer and $k \ge 0$. The author nodes in the author group extracted at layer $k$ all have the same golden citation count $k$; the golden citation count of an author node in $G_{auth.}$ is thus determined by the extracted author group to which the node belongs.
C. Clustering the authors into groups based on the author citation network;
In this embodiment, as shown in FIG. 4, "clustering the authors into groups based on the author citation network" in step C comprises dividing the author citation network into communities, each resulting author community being regarded as a relatively independent research field. The steps are as follows:
S11. Take the author citation network $G_{auth.}$ as the initial network and set it as the current network;
S12. Randomly divide the nodes of the current network into two communities, then execute step S13;
S13. Calculate the contribution degree of each node to the modularity, calculate the network modularity from the contribution degrees, then execute step S14;
S14. Move a node with a lower contribution degree from one community to the other, then execute step S15;
S15. Recalculate the contribution degree of each node to the modularity and the network modularity, then execute step S16;
S16. Judge whether the network modularity has increased and whether it has reached its maximum. If the modularity has increased but has not reached its maximum, keep the node move and return to step S14. If the modularity has not increased, withdraw the move, move a different node with a lower contribution degree from one community to the other, and return to step S15. If the modularity has reached its maximum, execute step S17;
S17. Record and store the network modularity and the community structure of the initial network at this point, then execute step S18;
S18. Treat each community obtained in step S17 as a separate network and recursively execute steps S12 to S18 on it, until no larger modularity can be obtained for the initial network; the result is the community division of the network.
In this embodiment, the contribution degree $\lambda_i$ of each node to the modularity in steps S13, S15 and S17 is calculated according to the following formula:
[formula not reproduced in the source text]
where $\kappa_{r(i)}$ is the sum of the values of the edges, representing citation relations, between node $v_i$ of community $r$ and the other nodes of that community; the formula also uses the citing count and the cited count of node $v_i$ (their symbols are not reproduced in the source text); and $a_{r(i)}$ is the proportion of edge values, whether citing or cited, that belong to the nodes of community $r$.
In this embodiment, the modularity $Q$ in steps S13, S15 and S17 is calculated according to the following formula:
[formula not reproduced in the source text]
where $m$ is the sum of the values of the edges representing citation relations in the network.
D. And recommending the document authors to the user according to the research field divided by the author clusters and the gold citation times.
In the embodiment, the step D comprises the steps of combining the clustering division results of the document authors according to the research field with the ranking of the document authors according to the golden citation times, and recommending the document authors to the user;
in one embodiment of this embodiment, a threshold K is setminIn the author group Gauth.Get allThe nodes are sorted in descending order according to the number of gold references, i.e.Recommending an author list to a user according to an author clustering resultWherein, K ismax≥Kmin≥0;
In another embodiment of this embodimentMiddle, author group Gauth.All nodes in the tree are sorted in descending order according to the number of golden references, that isSetting a threshold value for each author community according to the author clustering resultRecommending each community to usersThe author node of (2) is composed of an author group, i.e.Wherein the content of the first and second substances,
In this embodiment, the document group in step A may be defined by discipline, field, topic, and/or year, may be defined by a set search strategy, or may be all the documents in the citation database.
In this embodiment, taking documents whose keywords include "automatic driving" as an example, 14,260 documents in the document group are first selected, and 156,398 documents are selected; a document citation network model is established and mapped to generate an author citation network, and the citation times and gold citation times of the authors are counted; author clustering analysis is then performed, dividing the authors into 14 author groups. Because the technical topic clustering yields a large number of research fields, a threshold is set for each author group so that document authors are recommended more accurately and objectively, and the author group formed by the author nodes in each community that meet the community's threshold is recommended to the user. In the end, 2,122 authors are recommended to the user according to the gold citation times and the author groups.
In this embodiment, taking documents whose keywords include "ultrafine fiber" as an example, 1,239 documents in the document group are first selected, and 6,572 documents are selected; a document citation network model is established and mapped to generate an author citation network, and the citation times and gold citation times of the authors are counted; author clustering analysis is then performed, dividing the authors into 9 author groups. Because the technical topic clustering yields few research fields, a single threshold K_min on the gold citation times is set for the whole author group so that document authors are recommended more accurately and objectively: the intersection of the author clustering result and the descending gold citation ranking forms the author list recommended to the user, and 162 experts are obtained and recommended to the user.
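The first stages of the pipeline used in both examples (document citation network, self-citation exclusion, author citation network) can be sketched as below. The self-citation rule follows the method's definition: a citation counts as a self-citation when the two documents share at least one author. The "linear mapping" to authors is not spelled out in this text, so the sketch assumes that each non-self citation adds one directed edge from every citing author to every cited author; the function names and input layout are likewise hypothetical.

```python
from collections import Counter


def non_self_citation_counts(doc_authors, citations):
    """doc_authors: {doc: set of authors};
    citations: list of (cited_doc, citing_doc) pairs, i.e. e_ij = 1.

    Returns the non-self citation times per document and the kept citations.
    """
    counts = Counter({doc: 0 for doc in doc_authors})
    kept = []
    for cited, citing in citations:
        # Self-citation coefficient: 0 if the documents share an author.
        lam = 0 if doc_authors[cited] & doc_authors[citing] else 1
        counts[cited] += lam
        if lam:
            kept.append((cited, citing))
    return dict(counts), kept


def author_citation_network(doc_authors, kept):
    """Map kept document citations to weighted directed author edges.

    Assumption: every citing author cites every cited author once per
    document citation. Edge keys are (citing_author, cited_author).
    """
    edges = Counter()
    for cited, citing in kept:
        for cited_author in doc_authors[cited]:
            for citing_author in doc_authors[citing]:
                edges[(citing_author, cited_author)] += 1
    return dict(edges)


# Hypothetical three-document group.
doc_authors = {"d1": {"Ann", "Bob"}, "d2": {"Bob"}, "d3": {"Cy"}}
citations = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]

counts, kept = non_self_citation_counts(doc_authors, citations)
author_edges = author_citation_network(doc_authors, kept)
print(counts)
print(author_edges)
```

Here the citation of d1 by d2 is dropped because Bob is an author of both documents, so only the two citations made by Cy reach the author network.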
In this embodiment, the user manually checks the documents in order to obtain high-quality documents.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (9)
1. An author recommendation method based on citation times is characterized by comprising the following steps:
firstly, selecting a literature population range in a literature database;
secondly, constructing a document citation network model according to mutual citation relations among documents in the selected document group, mapping to generate an author citation network, and counting gold citation times of document authors;
then, clustering and grouping the authors based on the author citation network;
finally, recommending document authors to the user according to the gold citation times and author clustering grouping ordering;
the step of recommending document authors to the user according to the gold citation times and the author clustering grouping comprises: combining the result of clustering the document authors by research field with the ranking of the document authors by gold citation times, and recommending document authors to the user;
the step of clustering and grouping the authors based on the author citation network comprises dividing the author citation network into communities, each resulting author community being regarded as a relatively independent research field, and comprises the following steps:
S11, taking the author citation network G_auth as the initial network and setting it as the current network;
S12, randomly dividing the nodes in the current network into two communities, and then executing step S13;
S13, calculating the contribution of each node to the modularity, calculating the network modularity from the contributions, and then executing step S14;
S14, moving a node with a lower contribution from one community to the other community, and then executing step S15;
S15, recalculating the contribution of each node to the modularity and the network modularity, and then executing step S16;
S16, judging whether the network modularity has increased and whether it has reached its maximum value: if the network modularity has increased, that is, the maximum value has not been reached, keeping the result of the node move and returning to step S14; if the network modularity has not increased, withdrawing the moved node, moving a different node with a lower contribution from one community to the other community, and returning to step S15; if the modularity has reached the maximum value, executing step S17;
S17, recording and storing the network modularity and community structure of the initial network at this point, and then executing step S18;
S18, continuing to divide each community obtained in step S17 as an independent network, and recursively executing steps S12 to S18 on each independent network until no larger modularity is generated in the initial network, thereby obtaining the network community division result;
the contribution λ_i of each node to the modularity is calculated according to the following formula:
wherein κ_r(i) represents the sum of the edge values representing citation relationships between node v_i, which belongs to community r, and the other nodes in that community; the formula also uses the citing count of node v_i (the citations it makes) and the cited count of node v_i (the citations it receives); and a_r(i) represents the proportion of edge values accounted for by the nodes in community r, whether citing or cited;
the modularity Q is calculated according to the following equation:
where m is the sum of the edge values representing citation relationships in the author network.
2. The author recommendation method based on citation times as claimed in claim 1, wherein the document citation network model is G = (V, E), composed of |V| = N document nodes and |E| citation edges; wherein G represents the set of documents in the document group and the citation relationships among them, V represents the set of documents in the document group G, and E represents the citation relationships among the documents in the document group G; the non-self citation times of the documents in the document citation network model G are calculated, a self-citation-excluded document citation network model G' = (V, E') is generated, and the author citation network G_auth is generated from the document citation network G' through linear mapping.
3. The author recommendation method based on citation times as claimed in claim 1, wherein the "combining the results of clustering division of document authors according to research field and ranking of document authors according to gold citation times" includes setting a first threshold, taking author nodes in author population whose gold citation times are greater than the first threshold, ranking the author nodes in descending order according to gold citation times, and then taking the intersection of the author clustering division results and the ranking results to form an author list recommended to the user, wherein the first threshold is not greater than the maximum gold citation times.
4. The author recommendation method based on citation times as claimed in claim 1, wherein the "combining the results of clustering division of document authors according to research field and ranking of document authors according to gold citation times" includes arranging all author nodes in the author group in descending order of gold citation times, then setting a second threshold for the author group of each research field in the author clustering result, and recommending to the user an author list consisting of the author nodes in each author group whose gold citation times lie between the second threshold and the maximum gold citation times, wherein the second threshold is not greater than the maximum gold citation times.
5. The author recommendation method based on citation times as claimed in claim 1, wherein counting the gold citation times comprises the following steps:
S21, constructing a document citation network model, counting the citation times and the non-self citation times of the documents, and then executing step S22;
S22, generating a self-citation-excluded document citation network model according to the non-self citation times of the documents, generating the author citation network model through mapping, and then executing step S23;
S23, calculating the gold citation times.
6. The author recommendation method based on citation times as claimed in claim 5, wherein the citation times in step S21 are the number of times document v_i is cited in the document group G, obtained by summing the citation variable e_ij, that is, the citation times of the i-th document in the document group G are Σ_j e_ij; the non-self citation times in step S21 are the number of times document v_i is cited by others in the document group G, obtained by summing the product (e_ij · λ_ij) of the citation variable e_ij and the self-citation coefficient λ_ij, that is, the non-self citation times of the i-th document in the document group G are Σ_j (e_ij · λ_ij); if document v_i is cited by document v_j, then e_ij equals 1; if document v_i is not cited by document v_j, then e_ij equals 0; if document v_i and document v_j share at least one author, the citation is a self-citation and λ_ij equals 0; if document v_i and document v_j share no author, the citation is a citation by others and λ_ij equals 1; wherein i ≥ 1 and j ≥ 1;
in step S22, a self-citation-excluded document citation network model G' = (V, E') is generated according to the non-self citation times, and the author citation network G_auth is generated from the document citation network G' by linear mapping, with authors as vertices and the citation relationships between authors as edges; the author group G_auth = (V_auth, E_auth) is a directed network composed of |V_auth| = N_auth nodes and |E_auth| = M_auth edges; wherein G_auth represents the set of authors in the author group and the citation relationships among them, V_auth represents the set of authors in the author group G_auth, and E_auth represents the citation relationships among the authors in the author group G_auth; the author citation variable represents the citation relationship between two authors in the citation network: if one author is cited by the other author in one document, it is recorded as 1; its value is the sum of the directed edges between the two authors, that is, the total number of times the one author is cited by the other, recorded as x; if the one author has not been cited by the other, it is recorded as 0; wherein i ≥ 1 and j ≥ 1;
step S23 calculates the gold citation times of each author node, specifically: a set value k is introduced, where k is an integer, k ≥ 0, and k takes values successively from small to large; authors are extracted layer by layer in a recursive manner, and the authors extracted at each layer form a new author group; the author group of layer k is G_auth,k, which contains the author nodes of that layer and the citation relationships among them; each author node has a citation variable and a corresponding number of citations within the author group G_auth,k; the range of G_auth,k is narrowed by recursive extraction until the number of nodes in G_auth,(k+1) is 0; and the gold citation times of the author nodes contained in the author group extracted at layer k are k.
7. The author recommendation method based on citation times as claimed in claim 6, wherein the specific method of "narrowing the range of G_auth,k by recursive extraction until the number of nodes in G_auth,(k+1) is 0" is as follows: the author nodes in G_auth,k that meet the extraction condition of layer k are extracted to form the author group extracted at layer k, and the remaining author nodes and their citation relationships become the initial author group G_auth,(k+1) of layer k+1; the citation variable and number of citations of each author node are then taken within the new author group G_auth,(k+1), which contains the remaining nodes, that is, the remaining authors, and the remaining edges, wherein k is an integer and k ≥ 0; the author nodes contained in the author group extracted at layer k have the same gold citation times k, so the gold citation times of an author node in G_auth are determined by the extracted author group to which the node belongs; when the new author group G_auth,(k+1) is empty, k takes its maximum value K_max and the calculation of the authors' gold citation times is complete.
8. The author recommendation method based on citation times as claimed in claim 3, wherein said "combining the results of clustering division of document authors according to research field and ranking of document authors according to gold citation times, and recommending document authors to the user" comprises: setting a threshold K_min; taking all nodes in the author group G_auth whose gold citation times are greater than K_min and sorting them in descending order of gold citation times; and recommending an author list to the user according to the author clustering result, wherein K_max ≥ K_min ≥ 0.
9. The author recommendation method based on citation times as claimed in claim 4, wherein said "combining the results of clustering division of document authors according to research field and ranking of document authors according to gold citation times, and recommending document authors to the user" comprises: sorting all nodes in the author group G_auth in descending order of gold citation times; setting a threshold for each author community according to the author clustering result; and recommending to the user the author group formed by the author nodes in each community whose gold citation times lie between that community's threshold and the maximum gold citation times.
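The layer-by-layer extraction of claims 6 and 7 can be illustrated with a peeling procedure in the style of k-shell decomposition. The extraction condition for each layer is not legible in this text, so the sketch assumes that layer k removes every author whose citation count within the current group is at most k, repeating within the layer until no such author remains; the function name and the edge layout `{(citing, cited): weight}` are also assumptions.

```python
def gold_citation_times(authors, edges):
    """Assign each author the layer k at which the author is extracted.

    authors: iterable of author names.
    edges: {(citing_author, cited_author): weight}.
    Assumed rule: at layer k, repeatedly extract authors whose weighted
    cited count inside the remaining group is <= k.
    """
    remaining = set(authors)
    gold = {}
    k = 0
    while remaining:
        while True:
            # Recount citations received inside the remaining group only.
            cited = {a: 0 for a in remaining}
            for (citing, target), w in edges.items():
                if citing in remaining and target in remaining:
                    cited[target] += w
            layer = {a for a, c in cited.items() if c <= k}
            if not layer:
                break
            for a in layer:
                gold[a] = k
            remaining -= layer
        k += 1
    return gold


# Hypothetical author network: A and B cite each other heavily,
# C cites A, and D cites C.
authors = ["A", "B", "C", "D"]
edges = {("B", "A"): 2, ("A", "B"): 2, ("C", "A"): 1, ("D", "C"): 1}

gold = gold_citation_times(authors, edges)
print(gold)
print(max(gold.values()))  # K_max for this toy network: 2
```

In this toy network, C loses its only citation once D is extracted, so both fall into layer 0, while A and B survive until layer 2 because their citations come from each other.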
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911154792.7A CN111078859B (en) | 2019-11-22 | 2019-11-22 | Author recommendation method based on reference times |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111078859A (en) | 2020-04-28 |
CN111078859B (en) | 2021-02-09 |
Family
ID=70311262
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911154792.7A Active CN111078859B (en) | 2019-11-22 | 2019-11-22 | Author recommendation method based on reference times |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111078859B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343069A (en) * | 2021-06-10 | 2021-09-03 | 北京字节跳动网络技术有限公司 | User information processing method, device, medium and electronic equipment |
CN113704412B (en) * | 2021-08-31 | 2023-05-02 | 交通运输部科学研究院 | Early identification method for revolutionary research literature in transportation field |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6633878B1 (en) * | 1999-07-30 | 2003-10-14 | Accenture Llp | Initializing an ecommerce database framework |
CN105718528A (en) * | 2016-01-15 | 2016-06-29 | 上海交通大学 | Academic map display method based on reference relationship among thesises |
CN107832412A (en) * | 2017-11-06 | 2018-03-23 | 浙江工业大学 | A kind of publication clustering method based on reference citation relation |
CN108717425A (en) * | 2018-04-26 | 2018-10-30 | 国家电网公司 | A kind of knowledge mapping people entities alignment schemes based on multi-data source |
CN108763328A (en) * | 2018-05-08 | 2018-11-06 | 北京市科学技术情报研究所 | A kind of paper recommendation method for quoting algorithm based on gold |
CN109002524A (en) * | 2018-07-13 | 2018-12-14 | 北京市科学技术情报研究所 | A kind of gold reference author's sort method based on paper adduction relationship |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8548996B2 (en) * | 2007-06-29 | 2013-10-01 | Pulsepoint, Inc. | Ranking content items related to an event |
CN102222153A (en) * | 2010-01-27 | 2011-10-19 | 洪文学 | Quantitative dialectical diagnostic method for Chinese medicine machine interrogation |
US20120288843A1 (en) * | 2011-05-13 | 2012-11-15 | G2 Collective, Llc | Interactive learning system and method |
US10491454B2 (en) * | 2016-06-03 | 2019-11-26 | Vmware, Inc. | Methods and systems to diagnose anomalies in cloud infrastructures |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111737495B (en) | Middle-high-end talent intelligent recommendation system and method based on domain self-classification | |
CN109492157B (en) | News recommendation method and theme characterization method based on RNN and attention mechanism | |
Gulin et al. | Winning the transfer learning track of yahoo!’s learning to rank challenge with yetirank | |
CN105528437B (en) | A kind of question answering system construction method extracted based on structured text knowledge | |
CN105045875B (en) | Personalized search and device | |
Chen et al. | Combining factorization model and additive forest for collaborative followee recommendation | |
CN110543564B (en) | Domain label acquisition method based on topic model | |
CN104834686A (en) | Video recommendation method based on hybrid semantic matrix | |
Sharara et al. | Active surveying: A probabilistic approach for identifying key opinion leaders | |
CN107577782B (en) | Figure similarity depicting method based on heterogeneous data | |
CN111414461A (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN110737805B (en) | Method and device for processing graph model data and terminal equipment | |
CN105069080A (en) | Document retrieval method and system | |
WO2020135642A1 (en) | Model training method and apparatus employing generative adversarial network | |
CN111078859B (en) | Author recommendation method based on reference times | |
CN111078873A (en) | Domain expert selection method based on citation network and scientific research cooperation network | |
CN113806630A (en) | Attention-based multi-view feature fusion cross-domain recommendation method and device | |
Ng et al. | CrsRecs: a personalized course recommendation system for college students | |
CN106886565A (en) | A kind of basic house type auto-polymerization method | |
CN117333037A (en) | Industrial brain construction method and device for publishing big data | |
Liu et al. | Identifying experts in community question answering website based on graph convolutional neural network | |
CN108681977A (en) | A kind of lawyer's information processing method and system | |
Wu et al. | Collaborative filtering recommendation based on conditional probability and weight adjusting | |
CN110990662B (en) | Domain expert selection method based on citation network and scientific research cooperation network | |
CN109344232A (en) | A kind of public feelings information search method and terminal device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |