CN111078859A - Author recommendation method based on reference times - Google Patents

Author recommendation method based on reference times

Publication number
CN111078859A
Authority
CN
China
Prior art keywords
author
citation
document
network
auth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911154792.7A
Other languages
Chinese (zh)
Other versions
CN111078859B (en)
Inventor
吴晨生
李�荣
刘静
张炜
张惠娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute Of Science And Technology Information
Original Assignee
Beijing Institute Of Science And Technology Information
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute Of Science And Technology Information
Priority to CN201911154792.7A
Publication of CN111078859A
Application granted
Publication of CN111078859B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The invention discloses an author recommendation method based on citation counts, comprising the following steps: first, selecting a document population within a document database; second, constructing a document citation network model from the mutual citation relations among the documents in the selected population, mapping it to an author citation network, and computing the gold citation count of each document author; then, clustering the authors into groups based on the author citation network; and finally, recommending document authors to the user according to the ranking by gold citation count and the author cluster grouping. Defining a gold citation count for document authors eliminates the interference of author self-citation and weakens the influence of low-quality, low-impact citations, while clustering authors by the citation relations among them lets a user quickly and accurately identify experts in a specific research field.

Description

Author recommendation method based on reference times
Technical Field
The invention belongs to the technical field of document retrieval, and particularly relates to an author recommendation method based on citation times.
Background
Experts who meet specific technical requirements are usually found either through a social network or from the author information of scientific and technological achievements. Searching through a social network depends too heavily on the social relationships of the requesting party and is therefore very limited; searching by author information requires a great deal of manpower and time to investigate the achievements and their authors, so it is inefficient and labor-intensive. Both manual approaches are highly subjective and lack accuracy and fairness. Intelligent expert recommendation technology overcomes these limitations of traditional manual expert search.
Chinese patent application No. 201410680306.6 describes an expert recommendation method and system based on group matching: the system acquires the webpage information of each expert in an expert list through a web crawler, extracts each expert's academic information from it, calculates the matching degree between each expert and the project to be matched, and finally determines the experts recommended for the project according to the matching degree and a group matching model. However, the method uses keywords of the scientific research field as the basis for the matching degree, so biased results are inevitable for interdisciplinary or emerging disciplines.
Chinese patent application No. 201811228086.8 discloses a collaborative recommendation method based on expert domain similarity and association relationships. It takes a batch of paper data as a training set, constructs a collaboration network, and uses Dijkstra's algorithm to compute the shortest path between authors as the expert correlation COR; it builds an expert word-vector model with the word2vec algorithm and computes the cosine similarity between the expert word vectors and the domain word vectors as the expert domain similarity SIM; experts whose SIM and COR meet the thresholds are selected as recommended experts. The expert correlation in this method is computed from the collaboration relationships between experts, so the recommended experts are closely tied to the given expert through collaboration. However, collaboration is influenced by subjective factors, collaboration outside the research field interferes with the recommendation results, and collaboration between authors cannot reflect the inheritance of knowledge or the implicit relevance between research topics.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an author recommendation method based on citation times.
To solve the above technical problems, the invention adopts the following technical scheme:
The invention provides an author recommendation method based on citation counts, comprising the following steps: first, selecting a document population within a document database; second, constructing a document citation network model from the mutual citation relations among the documents in the selected population, mapping it to an author citation network, and computing the gold citation count of each document author; then, clustering the authors into groups based on the author citation network; and finally, recommending document authors to the user according to the ranking by gold citation count and the author cluster grouping.
In the above scheme, the literature includes scientific journals, patents, conference papers, research reports and academic papers.
In a further scheme of the invention, the document citation network model is $G = (V, E)$, a directed network consisting of $|V| = N$ document nodes and $|E| = M$ edges, where $G$ represents the documents in the document population together with the citation relations among them, $V$ is the set of documents in $G$, and $E$ is the set of citation relations among the documents in $G$. The number of citations by others (non-self-citations) is computed for the documents in $G$, a document citation network model $G' = (V, E')$ that excludes self-citations is generated, and the author citation network $G_{auth.}$ is generated from $G'$ through a linear mapping.
In a further scheme of the invention, clustering and grouping the authors based on the author citation network comprises dividing the author citation network into communities, each resulting author community being regarded as a relatively independent research field, with the following steps:
S11, take the author citation network $G_{auth.}$ as the initial network and set it as the current network;
S12, randomly divide the nodes of the current network into two communities, then execute step S13;
S13, calculate the contribution of each node to the modularity, calculate the network modularity from the contributions, then execute step S14;
S14, move a node with a lower contribution from one community to the other, then execute step S15;
S15, recalculate each node's contribution to the modularity and the network modularity, then execute step S16;
S16, determine whether the network modularity has increased and whether it has reached its maximum: if the modularity has increased but has not reached its maximum, keep the move and return to step S14; if the modularity has not increased, undo the move, move a different node with a lower contribution from one community to the other, and return to step S15; if the modularity has reached its maximum, execute step S17;
S17, record and store the network modularity and community structure of the initial network at this point, then execute step S18;
S18, treat each community obtained in step S17 as a separate network and recursively apply steps S12 to S18 to it until no larger modularity can be produced in the initial network, which yields the network community division result, namely
Figure BDA0002284520040000031
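The bisection loop of steps S12 to S16 can be illustrated with a minimal Python sketch. It is not the patented procedure: the modularity formula in the source is an unreproduced image, so the sketch assumes the standard directed modularity (Leicht-Newman form), and it replaces the "lowest contribution first" node selection with a plain pass over all nodes, keeping a move only when modularity rises. The names `modularity` and `bisect` and the edge format `{(src, dst): weight}` are hypothetical.

```python
import random

def modularity(weights, label):
    """Directed modularity (assumed Leicht-Newman form) for edges {(src, dst): w}."""
    m = sum(weights.values())
    out_deg, in_deg = {}, {}
    for (u, v), w in weights.items():
        out_deg[u] = out_deg.get(u, 0) + w
        in_deg[v] = in_deg.get(v, 0) + w
    # Fraction of edge weight that falls inside a community
    q = sum(w for (u, v), w in weights.items() if label[u] == label[v]) / m
    # Minus the fraction expected from the degree sequence alone
    for c in set(label.values()):
        k_out = sum(out_deg.get(n, 0) for n in label if label[n] == c)
        k_in = sum(in_deg.get(n, 0) for n in label if label[n] == c)
        q -= k_out * k_in / (m * m)
    return q

def bisect(weights, seed=0):
    """Split the nodes into two communities by greedy single-node moves."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in weights for n in edge})
    label = {n: rng.randint(0, 1) for n in nodes}   # S12: random bisection
    best = modularity(weights, label)               # S13: initial modularity
    improved = True
    while improved:                                 # S14 to S16
        improved = False
        for n in nodes:
            label[n] ^= 1                           # tentatively move node n
            q = modularity(weights, label)
            if q > best + 1e-12:
                best, improved = q, True            # modularity rose: keep the move
            else:
                label[n] ^= 1                       # otherwise undo the move
    return label, best

# Two directed triangles joined by a single edge
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "a"): 1,
         ("x", "y"): 1, ("y", "z"): 1, ("z", "x"): 1, ("c", "x"): 1}
labels, q = bisect(edges, seed=1)
```

Step S18 would then apply `bisect` recursively to each resulting community; because the search is greedy, the result is a local optimum that depends on the random start.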
In a further scheme of the invention, the contribution $\lambda_i$ of each node to the modularity is calculated according to the following formula:
Figure BDA0002284520040000032
where $\kappa_{r(i)}$ is the sum of the values of the edges representing citation relations between node $v_i$, which belongs to community $r$, and the other nodes in that community;
Figure BDA0002284520040000033
is the citing count of node $v_i$;
Figure BDA0002284520040000034
is the cited count of node $v_i$; and $a_{r(i)}$ is the proportion of edge values, whether citing or cited, that belong to the nodes in community $r$.
In a further scheme of the invention, the modularity $Q$ is calculated according to the following equation:
Figure BDA0002284520040000035
where $m$ is the sum of all edge values representing citation relations in the network.
In a further scheme of the invention, recommending document authors to the user according to the gold citation count ranking and the author cluster grouping comprises: combining the result of clustering the document authors by research field with the ranking of the document authors by gold citation count, and recommending document authors to the user accordingly. Preferably, the combining comprises setting a first threshold, taking the author nodes in the author group whose gold citation count is greater than the first threshold, sorting them in descending order of gold citation count, and intersecting the author cluster division result with this sorted result to form the author list recommended to the user, where the first threshold is not greater than the maximum gold citation count. Also preferably, the combining comprises sorting all author nodes in the author group in descending order of gold citation count, setting a second threshold for the author community of each research field in the author cluster division result, and recommending to the user the author list formed by the author nodes of each community whose gold citation count lies between that second threshold and the maximum gold citation count, where the second threshold is not greater than the maximum gold citation count.
In the above scheme, the main role of clustering the document authors is to divide a technical topic into the research fields it contains; the result is a set of author communities, each corresponding to a sub-field of the technical topic. However, because some technical topics contain numerous and complex research fields, the authors of every research field cannot all be expected to fall within the high gold-citation author group. A threshold is therefore introduced for the gold-citation author group and/or the clustered author groups, and the intersection of the two within a given threshold range gives the author list recommended to the user.
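The two preferred ways of combining the ranking with the clustering can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the dictionaries `gold` (author to gold citation count) and `community` (author to community id), and the inclusive lower bound in the second mode are hypothetical choices, not details given in the text.

```python
from collections import defaultdict

def recommend_global(gold, community, first_threshold):
    """Mode 1: one global threshold, then group the surviving authors by community."""
    kept = sorted((a for a in gold if gold[a] > first_threshold),
                  key=lambda a: (-gold[a], a))      # descending gold citation count
    result = defaultdict(list)
    for a in kept:
        result[community[a]].append(a)
    return dict(result)

def recommend_per_community(gold, community, second_thresholds):
    """Mode 2: a separate threshold for each author community."""
    result = defaultdict(list)
    for a in sorted(gold, key=lambda a: (-gold[a], a)):
        c = community[a]
        if gold[a] >= second_thresholds[c]:         # assumed inclusive lower bound
            result[c].append(a)
    return dict(result)

gold = {"a": 3, "b": 1, "c": 2, "d": 0}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
print(recommend_global(gold, community, 1))                 # {0: ['a'], 1: ['c']}
print(recommend_per_community(gold, community, {0: 1, 1: 2}))  # {0: ['a', 'b'], 1: ['c']}
```

The second mode lets a small or young research field keep some recommended authors even when none of them would pass a single global threshold.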
In a further scheme of the invention, computing the gold citation count of document authors comprises the following steps:
S21, construct the document citation network model and count, for each document, its citation count and its count of citations by others, then execute step S22;
S22, generate the self-citation-excluded document citation network model from the counts of citations by others, generate the author citation network model from it by mapping, then execute step S23;
S23, calculate the gold citation count of the authors.
In a further scheme of the invention, the citation count in step S21 is the number of times document $v_i$ is cited within the document population $G$, obtained by summing the citation variable $e_{ij}$; that is, the citation count of the $i$-th document in $G$ is
$$\sum_{j=1}^{N} e_{ij}$$
The count of citations by others in step S21 is the number of times document $v_i$ is cited by others within $G$, obtained by summing the product $e_{ij} \cdot \lambda_{ij}$ of the citation variable $e_{ij}$ and the self-citation coefficient $\lambda_{ij}$; that is, the count of citations by others of the $i$-th document in $G$ is
$$\sum_{j=1}^{N} e_{ij} \cdot \lambda_{ij}$$
If document $v_i$ is cited by document $v_j$, then $e_{ij} = 1$; if document $v_i$ is not cited by document $v_j$, then $e_{ij} = 0$. If documents $v_i$ and $v_j$ share at least one author, the citation is a self-citation and $\lambda_{ij} = 0$; if they share no author, the citation is a citation by others and $\lambda_{ij} = 1$. Here $i \geq 1$ and $j \geq 1$.
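The two counts defined above can be computed directly from the definitions of $e_{ij}$ and $\lambda_{ij}$. The following Python sketch assumes a hypothetical input format: `authors` maps each document to its set of author names, and `cited_by` maps each document to the documents that cite it, each citing document listed once.

```python
def citation_counts(authors, cited_by):
    """Return (citation count, count of citations by others) for every document."""
    total, by_others = {}, {}
    for i in authors:
        citing = list(cited_by.get(i, ()))
        # Sum of e_ij over all citing documents j
        total[i] = len(citing)
        # Sum of e_ij * lambda_ij: lambda_ij is 0 when documents i and j share an author
        by_others[i] = sum(1 for j in citing if not (authors[i] & authors[j]))
    return total, by_others

authors = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"C"}}
cited_by = {"d1": ["d2", "d3"], "d3": ["d1"]}
total, by_others = citation_counts(authors, cited_by)
print(total)      # {'d1': 2, 'd2': 0, 'd3': 1}
print(by_others)  # {'d1': 1, 'd2': 0, 'd3': 1}
```

In the example, d2 citing d1 is a self-citation because author A appears on both documents, so d1 keeps only one citation by others.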
In a further scheme of the invention, in step S22 the self-citation-excluded document citation network model $G' = (V, E')$ is generated from the counts of citations by others, and the author citation network $G_{auth.}$, in which authors are vertices and citation relations between authors are edges, is generated from $G'$ by linear mapping. The author group $G_{auth.} = (V_{auth.}, E_{auth.})$ is a directed network consisting of $|V_{auth.}| = N_{auth.}$ nodes and $|E_{auth.}| = M_{auth.}$ edges, where $G_{auth.}$ represents the authors in the author group together with the citation relations among them, $V_{auth.}$ is the set of authors in $G_{auth.}$, and $E_{auth.}$ is the set of citation relations among the authors in $G_{auth.}$. An author citation variable represents the citation relation between author $v_i$ and author $v_j$ in the citation network: if author $v_i$ is cited by author $v_j$ in one document, the relation is recorded as 1; the weight of the relation is the sum of the directed edges between author $v_i$ and author $v_j$, that is, the total number of times author $v_i$ is cited by author $v_j$, recorded as $n$; if author $v_i$ is not cited by author $v_j$, the relation is recorded as 0. Here $i \geq 1$ and $j \geq 1$.
[Formula images BDA0002284520040000043 and BDA0002284520040000051 through BDA00022845200400000511 of the original publication are not reproduced.]
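A minimal Python sketch of the document-to-author mapping follows. The text only says the mapping is linear, so the rule used here is an assumption: every author of a cited document receives one citation from every author of the citing document, and self-citations (documents sharing an author) are dropped first. The function name and input format are hypothetical.

```python
from collections import Counter

def author_citation_network(authors, cited_by):
    """Return a Counter {(cited_author, citing_author): n} over citations by others."""
    weights = Counter()
    for i, citing_docs in cited_by.items():
        for j in citing_docs:
            if authors[i] & authors[j]:
                continue                      # shared author: self-citation, excluded
            for a in authors[i]:
                for b in authors[j]:
                    weights[(a, b)] += 1      # author a is cited once more by author b
    return weights

authors = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"C"}, "d4": {"C"}}
cited_by = {"d1": ["d2", "d3", "d4"], "d3": ["d1"]}
w = author_citation_network(authors, cited_by)
print(w[("A", "C")])  # 2: author A is cited by author C in two documents
```

The resulting weights play the role of the value $n$ described above, and a pair that never appears corresponds to the value 0.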
In a further scheme of the invention, step S23 calculates the gold citation count of each author node as follows. A set value $k$ is introduced, where $k$ is an integer, $k \geq 0$, and $k$ takes successive values from small to large. Authors are extracted layer by layer in a recursive manner, and the authors extracted at each layer form a new author group. The author group at layer $k$ is $G_{auth.k}$; it has its own number of nodes (authors) and number of citation relations, and each author node has a citation variable and a citation count defined within $G_{auth.k}$. Recursive extraction shrinks $G_{auth.k}$ until $G_{auth.(k+1)}$ contains 0 nodes, and every author node in the author group extracted at layer $k$ has a gold citation count of $k$.
[Formula images BDA00022845200400000512 through BDA00022845200400000520 of the original publication are not reproduced.]
In a further scheme of the invention, "shrinking $G_{auth.k}$ by recursive extraction until $G_{auth.(k+1)}$ contains 0 nodes" is carried out as follows. The author nodes in $G_{auth.k}$ that satisfy the extraction condition for layer $k$ are extracted to form the layer-$k$ author group; the remaining author nodes and their citation relations become the initial author group $G_{auth.(k+1)}$ of layer $k+1$. The citation variable and the citation count of each remaining author node $v_i$ are then evaluated within the new author group $G_{auth.(k+1)}$, which has its own number of nodes (authors) and edges, where $k$ is an integer and $k \geq 0$. All author nodes in the author group extracted at layer $k$ have the same gold citation count $k$; the gold citation count of an author node in $G_{auth.}$ is therefore determined by the extracted author group to which the node belongs.
[Formula images BDA00022845200400000521 through BDA00022845200400000534 of the original publication are not reproduced.]
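The layer-by-layer extraction resembles a k-shell decomposition of the author citation network. The Python sketch below is written under that assumption: because the extraction condition is an unreproduced formula image, the sketch assumes that an author is extracted at layer $k$ when the number of citations it receives from authors still in the group is at most $k$, and that extraction repeats at the same $k$ until no author qualifies. The function name and the edge format `{(cited_author, citing_author): n}` are hypothetical.

```python
def gold_citation_counts(weights):
    """Peel authors layer by layer; return {author: layer k at which it was extracted}."""
    remaining = {a for pair in weights for a in pair}
    gold, k = {}, 0
    while remaining:
        # Citations received, counting only citing authors still in the group
        cited = {a: 0 for a in remaining}
        for (a, b), n in weights.items():
            if a in remaining and b in remaining:
                cited[a] += n
        layer = {a for a, c in cited.items() if c <= k}   # assumed extraction condition
        if layer:
            for a in layer:
                gold[a] = k            # every author extracted at layer k gets value k
            remaining -= layer         # the rest form the next, smaller author group
        else:
            k += 1                     # nobody qualifies: move to the next layer
    return gold

# A, B and C cite one another; D cites A but is never cited
w = {("A", "B"): 1, ("A", "C"): 1, ("B", "A"): 1, ("B", "C"): 1,
     ("C", "A"): 1, ("C", "B"): 1, ("A", "D"): 1}
print(gold_citation_counts(w))  # A, B and C map to 2; D maps to 0
```

Author A receives three citations in total but ends with a gold citation count of 2, because the citation from the uncited author D is discarded once D is extracted at layer 0. This illustrates how citations from weakly cited authors carry less weight.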
In a further scheme of the invention, the result of clustering the document authors by research field is combined with the ranking of the document authors by gold citation count, and document authors are recommended to the user accordingly. Preferably, a threshold $K_{min}$ is set; all nodes in the author group $G_{auth.}$ whose gold citation count exceeds the threshold are taken and sorted in descending order of gold citation count, and an author list is recommended to the user according to the author clustering result, where $K_{max} \geq K_{min} \geq 0$. Also preferably, all nodes in the author group $G_{auth.}$ are sorted in descending order of gold citation count, a threshold is set for each author community according to the author clustering result, and for each community the author nodes whose gold citation count lies between that community's threshold and the maximum gold citation count form the author group recommended to the user.
[Formula images BDA00022845200400000535 through BDA00022845200400000537 and BDA0002284520040000061 through BDA0002284520040000066 of the original publication are not reproduced.]
This scheme specifies how the ranking of authors by gold citation count is combined with the result of clustering the document authors by research field; the values of $K_{min}$ and of the per-community thresholds can be generated and set by the system or set manually.
After adopting the technical scheme, compared with the prior art, the invention has the following beneficial effects:
1. The author recommendation method provided by the invention reflects the intrinsic relations among authors' research topics. Because it is based on the association between authors' research content, it recommends domain experts more conveniently, intelligently and accurately, and is also suitable for expert recommendation in interdisciplinary or emerging disciplines;
2. The author recommendation method provided by the invention uses the gold citation count in place of the author citation count, which eliminates the interference of self-citation in author evaluation and weakens the influence of low-quality, low-impact citations by others, so that the author ranking is more reasonable and the recommended experts are more authoritative;
3. The author recommendation method provided by the invention divides the author groups according to the citation network among the authors, which conforms to the objective laws of scientific and technological development and involves few human interference factors.
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the description, serve to explain the invention without unduly limiting it. The drawings described below show only some embodiments, and a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of an author recommendation method provided in the present invention;
FIG. 2 is a flow chart of an author recommendation method provided by the present invention;
FIG. 3 is a schematic flow chart of counting the number of gold references in the author recommendation method of the present invention;
FIG. 4 is a flow chart illustrating clustering and grouping of authors based on an author reference network in the present invention.
It should be noted that the drawings and the description are not intended to limit the scope of the inventive concept in any way, but to illustrate the inventive concept to a person skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and the following embodiments are used for illustrating the present invention and are not intended to limit the scope of the present invention.
As shown in FIGS. 1 to 4, the invention provides an author recommendation method based on citation counts, which quickly and accurately recommends domain experts to users by calculating the gold citation counts of authors and dividing the authors' research fields with a community detection algorithm.
Examples
As shown in fig. 1, the present embodiment specifically includes the following four steps a to D:
A. selecting a document population within a document database; the literature includes scientific journals, patents, conference papers, research reports and academic papers;
B. constructing a document citation network model according to mutual citation relations among documents in the selected document group, mapping to generate an author citation network, and counting gold citation times of document authors;
In this embodiment, as shown in FIG. 3, step B includes constructing a document citation network whose model is $G = (V, E)$, a directed network consisting of $|V| = N$ document nodes and $|E| = M$ edges, where $G$ represents the documents in the document population together with the citation relations among them, $V$ is the set of documents in $G$, and $E$ is the set of citation relations among the documents in $G$. The number of citations by others (non-self-citations) is computed for the documents in $G$, a document citation network model $G' = (V, E')$ that excludes self-citations is generated, and the author citation network $G_{auth.}$ is generated from $G'$ through a linear mapping.
In this embodiment, as shown in FIG. 2, computing the gold citation count in step B comprises the following steps:
S21, construct the document citation network model and count, for each document, its citation count and its count of citations by others, then execute step S22;
S22, generate the self-citation-excluded document citation network model from the counts of citations by others, generate the author citation network model from it by mapping, then execute step S23;
S23, calculate the gold citation count of the authors.
In this embodiment, the citation count in step S21 is the number of times document $v_i$ is cited within the document population $G$, obtained by summing the citation variable $e_{ij}$; that is, the citation count of the $i$-th document in $G$ is
$$\sum_{j=1}^{N} e_{ij}$$
The count of citations by others in step S21 is the number of times document $v_i$ is cited by others within $G$, obtained by summing the product $e_{ij} \cdot \lambda_{ij}$ of the citation variable $e_{ij}$ and the self-citation coefficient $\lambda_{ij}$; that is, the count of citations by others of the $i$-th document in $G$ is
$$\sum_{j=1}^{N} e_{ij} \cdot \lambda_{ij}$$
If document $v_i$ is cited by document $v_j$, then $e_{ij} = 1$; if document $v_i$ is not cited by document $v_j$, then $e_{ij} = 0$. If documents $v_i$ and $v_j$ share at least one author, the citation is a self-citation and $\lambda_{ij} = 0$; if they share no author, the citation is a citation by others and $\lambda_{ij} = 1$. Here $i \geq 1$ and $j \geq 1$.
In this embodiment, in step S22 the self-citation-excluded document citation network model $G' = (V, E')$ is generated from the counts of citations by others, and the author citation network $G_{auth.}$, in which authors are vertices and citation relations between authors are edges, is generated from $G'$ by linear mapping. The author group $G_{auth.} = (V_{auth.}, E_{auth.})$ is a directed network consisting of $|V_{auth.}| = N_{auth.}$ nodes and $|E_{auth.}| = M_{auth.}$ edges, where $G_{auth.}$ represents the authors in the author group together with the citation relations among them, $V_{auth.}$ is the set of authors in $G_{auth.}$, and $E_{auth.}$ is the set of citation relations among the authors in $G_{auth.}$. An author citation variable represents the citation relation between author $v_i$ and author $v_j$ in the citation network: if author $v_i$ is cited by author $v_j$ in one document, the relation is recorded as 1; the weight of the relation is the sum of the directed edges between author $v_i$ and author $v_j$, that is, the total number of times author $v_i$ is cited by author $v_j$, recorded as $n$; if author $v_i$ is not cited by author $v_j$, the relation is recorded as 0. Here $i \geq 1$ and $j \geq 1$.
[Formula images BDA0002284520040000083 through BDA00022845200400000815 of the original publication are not reproduced.]
In this embodiment, step S23 calculates the gold citation count of each author node as follows. A set value $k$ is introduced, where $k$ is an integer, $k \geq 0$, and $k$ takes successive values from small to large. Authors are extracted layer by layer in a recursive manner, and the authors extracted at each layer form a new author group. The author group at layer $k$ is $G_{auth.k}$; it has its own number of nodes (authors) and number of citation relations, and each author node has a citation variable and a citation count defined within $G_{auth.k}$. Recursive extraction shrinks $G_{auth.k}$ until $G_{auth.(k+1)}$ contains 0 nodes, and every author node in the author group extracted at layer $k$ has a gold citation count of $k$.
[Formula images BDA00022845200400000816 through BDA00022845200400000824 of the original publication are not reproduced.]
In this embodiment, the specific method of "narrowing the range of $G_{auth.k}$ by recursive extraction until the number of nodes in $G_{auth.(k+1)}$ is 0" is as follows: the author nodes in $G_{auth.k}$ that satisfy [formula image] are extracted to form the author group [formula image], and the remaining author nodes and their citation relationships become the initial author group $G_{auth.(k+1)}$ of layer k+1. The citation variable of author node $v_i$ in author group $G_{auth.k}$ is [formula image]; the cited times of author node [formula image] in the new author group $G_{auth.(k+1)}$ is [formula image]; and the new author group $G_{auth.(k+1)}$ contains [formula image] nodes, i.e., [formula image] authors, and [formula image] edges, where k is an integer and k ≥ 0. The author nodes contained in the author group [formula image] extracted at layer k all have the same gold citation times k. The gold citation times of an author node [formula image] in $G_{auth.}$ is determined by the author group [formula image] to which the node [formula image] belongs, i.e., the gold citation times of author node [formula image] is [formula image].
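The layer-by-layer extraction can be illustrated with a short Python sketch. Because the extraction condition survives only as a formula image, the sketch assumes a k-shell-style rule: at layer k, authors whose cited times inside the current group are at most k are extracted, repeatedly, until none remains, and each extracted author receives gold citation times k. The function name `gold_citation_times` and the `(citing, cited, weight)` edge format are illustrative and not taken from the patent.

```python
from collections import defaultdict


def gold_citation_times(edges):
    """Assign a layer index k to every author by layer-by-layer extraction.

    edges: iterable of (citing_author, cited_author, weight) tuples.
    Assumed rule (not confirmed by the source): at layer k, extract authors
    whose cited times within the remaining group are <= k.
    """
    # cited_by[v][u] = number of times author v is cited by author u
    cited_by = defaultdict(dict)
    nodes = set()
    for citing, cited, weight in edges:
        nodes.update((citing, cited))
        if citing != cited:  # ignore author-level self-loops
            cited_by[cited][citing] = cited_by[cited].get(citing, 0) + weight

    remaining = set(nodes)
    gold = {}
    k = 0
    while remaining:  # stop when the next group is empty
        while True:
            # cited times are recounted inside the current (shrinking) group
            layer = {
                v for v in remaining
                if sum(w for u, w in cited_by[v].items() if u in remaining) <= k
            }
            if not layer:
                break
            for v in layer:
                gold[v] = k
            remaining -= layer
        k += 1
    return gold


if __name__ == "__main__":
    demo = [("a", "b", 1), ("b", "c", 2), ("c", "b", 2),
            ("d", "b", 1), ("d", "c", 1)]
    print(gold_citation_times(demo))
```

In the demo, authors a and d are never cited and leave at layer 0, while b and c cite each other twice and only leave at layer 2, so the largest layer index plays the role of $K_{max}$.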
C. Clustering authors based on the author citation network;
In this embodiment, as shown in FIG. 4, the "clustering and grouping authors based on the author citation network" in step C includes performing community division on the author citation network, where each divided author community is regarded as a relatively independent research field. The steps are as follows:
S11, taking the author citation network $G_{auth.}$ as the initial network and setting it as the current network;
S12, randomly dividing the nodes in the current network into two communities, and then executing step S13;
S13, calculating the contribution of each node to the modularity, calculating the network modularity from the contributions, and then executing step S14;
S14, moving a node with a lower contribution from one community to the other community, and then executing step S15;
S15, recalculating the contribution of each node to the modularity and the network modularity, and then executing step S16;
S16, judging whether the network modularity has increased and whether it has reached its maximum value: if the modularity has increased but has not reached the maximum, keeping the node's move and returning to step S14; if the modularity has not increased, withdrawing the move, moving a different node with a lower contribution from one community to the other, and returning to step S15; if the modularity has reached the maximum, executing step S17;
S17, recording and storing the network modularity and community structure of the initial network at this point, and then executing step S18;
S18, further dividing each community obtained in step S17 as an individual network, and executing steps S12 to S18 recursively on each individual network until no larger modularity is produced in the initial network, to obtain the network community division result, that is, [formula image].
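Steps S11 to S18 can be sketched in Python as a recursive bisection driven by modularity. This is only an illustration under stated assumptions: the author citation network is treated as an undirected weighted graph given as a symmetric dict of dicts, the node contribution is taken as $\kappa_{r(i)}/k_i - a_{r(i)}$, and modularity is the standard Newman form; the patent's own formulas are images and may differ. All function names are hypothetical.

```python
import random


def modularity(adj, communities):
    """Newman modularity of a partition of an undirected weighted graph."""
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        inside = sum(w for v in comm for u, w in adj[v].items() if u in comm)
        total = sum(sum(adj[v].values()) for v in comm)
        q += inside / two_m - (total / two_m) ** 2
    return q


def contribution(adj, v, comm, two_m):
    """Assumed node contribution: kappa_r(i) / k_i - a_r(i)."""
    k_v = sum(adj[v].values())
    if k_v == 0:
        return 0.0
    kappa = sum(w for u, w in adj[v].items() if u in comm)
    a_r = sum(sum(adj[u].values()) for u in comm) / two_m
    return kappa / k_v - a_r


def bisect(adj, rng):
    """S12-S16: random split, then greedy single-node moves."""
    nodes = list(adj)
    rng.shuffle(nodes)
    half = len(nodes) // 2
    parts = [set(nodes[:half]), set(nodes[half:])]
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    best_q = modularity(adj, parts)
    improved = True
    while improved:
        improved = False
        side = {v: (0 if v in parts[0] else 1) for v in nodes}
        ranked = sorted(
            nodes, key=lambda v: contribution(adj, v, parts[side[v]], two_m))
        for v in ranked:  # lowest contribution first
            src, dst = side[v], 1 - side[v]
            parts[src].remove(v)
            parts[dst].add(v)
            q = modularity(adj, parts)
            if q > best_q + 1e-12:  # keep the move
                best_q, improved = q, True
                break
            parts[dst].remove(v)  # withdraw the move
            parts[src].add(v)
    return parts


def detect_communities(adj, seed=0):
    """S11, S17, S18: split recursively while overall modularity grows."""
    rng = random.Random(seed)
    communities = [set(adj)]
    best_q = modularity(adj, communities)
    queue = [set(adj)]
    while queue:
        comm = queue.pop()
        sub = {v: {u: w for u, w in adj[v].items() if u in comm} for v in comm}
        if len(comm) < 2 or not any(sub.values()):
            continue
        parts = [p for p in bisect(sub, rng) if p]
        if len(parts) < 2:
            continue
        candidate = [c for c in communities if c != comm] + parts
        q = modularity(adj, candidate)
        if q > best_q + 1e-12:  # accept only if the initial network improves
            communities, best_q = candidate, q
            queue.extend(parts)
    return communities, best_q
```

A split of a sub-network is accepted only when it raises the modularity of the whole initial network, which mirrors the stopping condition in S18. Greedy single-node moves can stop at a local optimum, so the result of this sketch is not guaranteed to be the global maximum.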
In this embodiment, the contribution $\lambda_i$ of each node to the modularity in steps S13, S15 and S17 is calculated according to the following formula: [formula image]
where $\kappa_{r(i)}$ denotes the sum of the edge values of the citation relationships between node $v_i$, which belongs to community r, and the other nodes in that community; [formula image] is the citing count of node $v_i$; [formula image] is the cited count of node $v_i$; and $a_{r(i)}$ denotes the proportion of edge values, whether citing or cited, attached to the nodes in community r.
In this embodiment, the modularity Q in steps S13, S15 and S17 is calculated according to the following formula: [formula image]
where m is the sum of the edge values representing the citation relationships in the author citation network.
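For orientation, community detection methods that rank nodes by their contribution to modularity commonly write the two quantities as below. Whether the formula images in this document take exactly this form is an assumption; in particular, treating $k_i$ as the sum of the citing count and the cited count of node $v_i$ is a guess based on the symbol definitions above.

```latex
\lambda_i = \frac{\kappa_{r(i)}}{k_i} - a_{r(i)}, \qquad
a_{r} = \frac{1}{2m} \sum_{j \in r} k_j, \qquad
Q = \frac{1}{2m} \sum_{i} \lambda_i \, k_i, \qquad
k_i = k_i^{\mathrm{citing}} + k_i^{\mathrm{cited}} \ \text{(assumed)}
```

Under this form, a node whose edges stay mostly inside its own community has $\lambda_i$ close to $1 - a_{r(i)}$, while a node with no edges inside its community has $\lambda_i = -a_{r(i)}$ and is the first candidate to be moved in step S14.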
D. Recommending document authors to the user according to the research fields obtained by author clustering and the gold citation times.
In this embodiment, step D comprises combining the result of clustering the document authors by research field with the ranking of the document authors by gold citation times, and recommending document authors to the user.
In one implementation of this embodiment, a threshold $K_{min}$ is set; all nodes in the author group $G_{auth.}$ that satisfy [formula image] are taken and sorted in descending order of gold citation times, i.e., [formula image]; and an author list [formula image] is recommended to the user according to the author clustering result, where $K_{max} \ge K_{min} \ge 0$.
In another implementation of this embodiment, all nodes in the author group $G_{auth.}$ are sorted in descending order of gold citation times, i.e., [formula image]; a threshold [formula image] is set for each author community according to the author clustering result; and for each community, the author group formed by the author nodes satisfying [formula image], i.e., [formula image], is recommended to the user, where [formula image].
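The two ways of combining clustering with the gold citation ranking can be sketched as follows. The selection conditions themselves are formula images, so the sketch assumes that an author qualifies when the gold citation times is greater than or equal to the threshold; the function names, the dictionary inputs, and the alphabetical tie-break are illustrative choices only.

```python
def recommend_global(gold, community_of, k_min):
    """Variant 1: one threshold K_min for the whole author network.

    gold: author -> gold citation times
    community_of: author -> community id from the clustering step
    Returns community id -> authors in descending gold citation order.
    """
    kept = sorted((a for a in gold if gold[a] >= k_min),
                  key=lambda a: (-gold[a], a))
    result = {}
    for author in kept:
        result.setdefault(community_of[author], []).append(author)
    return result


def recommend_per_community(gold, community_of, k_min_by_community):
    """Variant 2: a separate threshold for each author community."""
    ranked = sorted(gold, key=lambda a: (-gold[a], a))
    result = {}
    for author in ranked:
        community = community_of[author]
        if gold[author] >= k_min_by_community[community]:
            result.setdefault(community, []).append(author)
    return result


if __name__ == "__main__":
    gold = {"a": 3, "b": 1, "c": 2, "d": 3, "e": 0}
    community_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
    print(recommend_global(gold, community_of, k_min=2))
    print(recommend_per_community(gold, community_of, {0: 1, 1: 3}))
```

A single global threshold can leave a small community with no recommended authors, whereas per-community thresholds keep every research field represented, which matches the way the two variants are applied to cases with fewer and with more research fields.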
In this embodiment, the document group in step A may be defined by discipline, field, topic, and/or year, may be defined by a set search strategy, or may be all the documents in the citation database.
In this embodiment, taking documents whose keywords include "automatic driving" as an example, 14,260 documents, involving 156,398 authors, are first selected as the document group; a document citation network model is established and mapped to generate an author citation network, and the cited times and the gold citation times of the authors are counted
Figure BDA0002284520040000112
Author cluster analysis then divides the authors into 14 author groups. Because the clustering by technical topic yields a relatively large number of research fields, a threshold [formula image] is set for each author group in order to recommend document authors more accurately and objectively: for each community, the author group formed by the author nodes satisfying [formula image] is recommended to the user. According to the gold citation times and the author groups, 2,122 authors are finally obtained and recommended to the user.
In this embodiment, taking documents whose keywords include "ultrafine fiber" as an example, 1,239 documents, involving 6,572 authors, are first selected as the document group; a document citation network model is established and mapped to generate an author citation network, and the cited times and the gold citation times of the authors are counted
Figure BDA0002284520040000115
Author cluster analysis then divides the authors into 9 author groups. Because the clustering by technical topic yields relatively few research fields, a threshold $K_{min}$ is set on the gold citation times in order to recommend document authors more accurately and objectively: the intersection of the author clustering result and the descending ranking of authors by gold citation times forms the author list recommended to the user, and 162 experts are obtained and recommended to the user.
In this embodiment, the user manually checks the documents in order to obtain high-quality documents.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An author recommendation method based on citation times is characterized by comprising the following steps:
firstly, selecting a literature population range in a literature database;
secondly, constructing a document citation network model according to mutual citation relations among documents in the selected document group, mapping to generate an author citation network, and counting gold citation times of document authors;
then, clustering and grouping the authors based on the author citation network;
and finally, recommending the document authors to the user according to the gold citation times and the author clustering grouping ordering.
2. The author recommendation method based on citation times as claimed in claim 1, wherein said document citation network model is: G = (V, E), a directed network consisting of |V| = N document nodes and |E| = M edges; wherein G represents the set of documents in the document group and the citation relationships among them, V represents the set of documents in the document group G, and E represents the citation relationships among the documents in the document group G; the other-citation times of the documents in G are calculated in the document citation network model, a self-citation-excluded document citation network model G' = (V, E') is generated, and the author citation network $G_{auth.}$ is generated from the document citation network G' by linear mapping.
3. The author recommendation method based on citation times as claimed in claim 2, wherein said "clustering authors based on said author citation network" includes performing community division on author citation network, and the divided author community is regarded as a relatively independent research field, and the steps are as follows:
S11, taking the author citation network $G_{auth.}$ as the initial network and setting it as the current network;
S12, randomly dividing the nodes in the current network into two communities, and then executing step S13;
S13, calculating the contribution of each node to the modularity, calculating the network modularity from the contributions, and then executing step S14;
S14, moving a node with a lower contribution from one community to the other community, and then executing step S15;
S15, recalculating the contribution of each node to the modularity and the network modularity, and then executing step S16;
S16, judging whether the network modularity has increased and whether it has reached its maximum value: if the modularity has increased but has not reached the maximum, keeping the node's move and returning to step S14; if the modularity has not increased, withdrawing the move, moving a different node with a lower contribution from one community to the other, and returning to step S15; if the modularity has reached the maximum, executing step S17;
S17, recording and storing the network modularity and community structure of the initial network at this point, and then executing step S18;
S18, further dividing each community obtained in step S17 as an individual network, and executing steps S12 to S18 recursively on each individual network until no larger modularity is produced in the initial network, to obtain the network community division result, that is, [formula image].
4. The author recommendation method based on citation times as claimed in claim 3, wherein the contribution $\lambda_i$ of each node to the modularity is calculated according to the following formula: [formula image]
where $\kappa_{r(i)}$ denotes the sum of the edge values of the citation relationships between node $v_i$, which belongs to community r, and the other nodes in that community; [formula image] is the citing count of node $v_i$; [formula image] is the cited count of node $v_i$; and $a_{r(i)}$ denotes the proportion of edge values, whether citing or cited, attached to the nodes in community r.
5. The author recommendation method based on citation times as claimed in claim 4, wherein said modularity Q is calculated according to the following formula:
Figure FDA0002284520030000025
where m is the sum of the edge values representing the reference relationships in the author network.
6. The author recommendation method based on citation times as claimed in claim 1, wherein said "recommending document authors to users according to gold citation times and author clustering grouping ordering" comprises: combining the results of clustering division of the document authors according to the research field with the ranking of the document authors according to the golden citation times, and recommending the document authors to the user;
preferably, the combining includes setting a first threshold, taking author nodes in an author group with golden citation times larger than the first threshold, sorting in a descending order according to the golden citation times, and then taking an intersection of an author clustering division result and the sorting result in the descending order to form an author list recommended to a user, wherein the first threshold is not larger than the maximum golden citation times;
preferably, the combining includes arranging all author nodes in the author group in a descending order according to the golden citation times, then setting a second threshold for the author community of each research field in the author cluster partitioning result, and recommending an author list composed of the author nodes of each author community, of which the golden citation times are between the second threshold and the maximum golden citation times, to the user, where the second threshold is not greater than the maximum golden citation times.
7. The author recommendation method based on citation times as claimed in claim 1 or 2, wherein said counting of the gold citation times of document authors comprises the following steps:
S21, constructing a document citation network model, counting the citation times and the other-citation times of the documents, and then executing step S22;
S22, generating a self-citation-excluded document citation network model according to the other-citation times of the documents, generating an author citation network model through mapping, and then executing step S23;
S23, calculating the gold citation times of the document authors.
8. The author recommendation method based on citation times as claimed in claim 7, wherein the citation times in step S21, i.e., the number of times document $v_i$ is cited in the document group G, is obtained by summing the citation variable $e_{ij}$; that is, the citation times of the i-th document in the document group G is calculated as $\sum_{j} e_{ij}$; the other-citation times in step S21, i.e., the number of times document $v_i$ is cited by others in the document group G, is obtained by summing the product $e_{ij} \cdot \lambda_{ij}$ of the citation variable $e_{ij}$ and the self-citation coefficient $\lambda_{ij}$; that is, the other-citation times of the i-th document in the document group G is calculated as $\sum_{j} e_{ij} \cdot \lambda_{ij}$; if document $v_i$ is cited by document $v_j$, then $e_{ij}$ equals 1; if document $v_i$ is not cited by document $v_j$, then $e_{ij}$ equals 0; if document $v_i$ and document $v_j$ share at least one author, the citation is a self-citation and $\lambda_{ij}$ equals 0; if document $v_i$ and document $v_j$ share no author, the citation is an other-citation and $\lambda_{ij}$ equals 1; wherein i ≥ 1 and j ≥ 1;
in step S22, a self-citation-excluded document citation network model G' = (V, E') is generated according to the other-citation times [formula image], and the author citation network $G_{auth.}$ is generated from the document citation network G' by linear mapping, with authors as vertices and citation relationships between authors as edges; the author group $G_{auth.} = (V_{auth.}, E_{auth.})$ is a directed network consisting of $|V_{auth.}| = N_{auth.}$ nodes and $|E_{auth.}| = M_{auth.}$ edges; wherein $G_{auth.}$ represents the set of authors in the author group and the citation relationships among them, $V_{auth.}$ represents the set of authors in the author group $G_{auth.}$, and $E_{auth.}$ represents the citation relationships among the authors in the author group $G_{auth.}$; [formula image] represents the citation relationship between author [formula image] and author [formula image] in the citation network: if author [formula image] is cited by author [formula image] in a document, it is recorded as 1; [formula image] is the sum of the numbers of directed edges between author [formula image] and author [formula image], i.e., the total number of times author [formula image] is cited by author [formula image], recorded as x; if author [formula image] has not been cited by author [formula image], it is recorded as 0; wherein i ≥ 1 and j ≥ 1;
step S23 calculates the gold citation times [formula image] of author node [formula image], specifically as follows: a set value k is introduced and takes integer values in increasing order, with k ≥ 0; authors are extracted layer by layer in a recursive manner, and the authors extracted at each layer form a new author group; the author group at layer k is $G_{auth.k}$, which contains [formula image] nodes, i.e., [formula image] authors, and [formula image] citation relationships; the citation variable of author node [formula image] in author group $G_{auth.k}$ is [formula image], and the cited times of author node [formula image] in author group $G_{auth.k}$ is [formula image]; the range of $G_{auth.k}$ is narrowed by recursive extraction until the number of nodes in $G_{auth.(k+1)}$ is 0; the gold citation times of the author nodes contained in the author group extracted at layer k is k.
9. The author recommendation method for citation times of documents according to claim 7, wherein the specific method of "narrowing the range of $G_{auth.k}$ by recursive extraction until the number of nodes in $G_{auth.(k+1)}$ is 0" is as follows: the author nodes in $G_{auth.k}$ that satisfy [formula image] are extracted to form the author group [formula image], and the remaining author nodes and their citation relationships become the initial author group $G_{auth.(k+1)}$ of layer k+1; the citation variable of author node [formula image] in author group $G_{auth.k}$ is [formula image]; the cited times of author node [formula image] in the new author group $G_{auth.(k+1)}$ is [formula image]; the new author group $G_{auth.(k+1)}$ contains [formula image] nodes, i.e., [formula image] authors, and [formula image] edges, wherein k is an integer and k ≥ 0; the author nodes contained in the author group [formula image] extracted at layer k all have the same gold citation times k; the gold citation times of an author node [formula image] in $G_{auth.}$ is determined by the author group [formula image] to which the node [formula image] belongs, i.e., the gold citation times of author node [formula image] is [formula image]; when the new author group $G_{auth.(k+1)}$ is empty, k takes its maximum value $K_{max}$, and the calculation of the authors' gold citation times is complete.
10. The author recommendation method based on citation times according to any one of claims 6-9, wherein the recommendation method comprises: combining the result of clustering the document authors by research field with the ranking of the document authors by gold citation times, and recommending document authors to the user;
preferably, a threshold $K_{min}$ is set, all nodes in the author group $G_{auth.}$ that satisfy [formula image] are taken and sorted in descending order of gold citation times, i.e., [formula image], and an author list [formula image] is recommended to the user according to the author clustering result, wherein $K_{max} \ge K_{min} \ge 0$;
preferably, all nodes in the author group $G_{auth.}$ are sorted in descending order of gold citation times, i.e., [formula image]; a threshold [formula image] is set for each author community according to the author clustering result; and for each community, the author group formed by the author nodes satisfying [formula image], i.e., [formula image], is recommended to the user, wherein [formula image].
CN201911154792.7A 2019-11-22 2019-11-22 Author recommendation method based on reference times Active CN111078859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911154792.7A CN111078859B (en) 2019-11-22 2019-11-22 Author recommendation method based on reference times

Publications (2)

Publication Number Publication Date
CN111078859A true CN111078859A (en) 2020-04-28
CN111078859B CN111078859B (en) 2021-02-09

Family

ID=70311262

Country Status (1)

Country Link
CN (1) CN111078859B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343069A (en) * 2021-06-10 2021-09-03 北京字节跳动网络技术有限公司 User information processing method, device, medium and electronic equipment
CN113704412A (en) * 2021-08-31 2021-11-26 交通运输部科学研究院 Early identification method for revolutionary research literature in traffic transportation field

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6633878B1 (en) * 1999-07-30 2003-10-14 Accenture Llp Initializing an ecommerce database framework
US20090049041A1 (en) * 2007-06-29 2009-02-19 Allvoices, Inc. Ranking content items related to an event
CN102222153A (en) * 2010-01-27 2011-10-19 洪文学 Quantitative dialectical diagnostic method for Chinese medicine machine interrogation
US20130325665A1 (en) * 2011-05-13 2013-12-05 Alexandra V. Shaffer System, method and device having teaching and commerce subsystems
CN105718528A (en) * 2016-01-15 2016-06-29 上海交通大学 Academic map display method based on reference relationship among thesises
US20170353345A1 (en) * 2016-06-03 2017-12-07 Vmware, Inc. Methods and systems to diagnose anomalies in cloud infrastructures
CN107832412A (en) * 2017-11-06 2018-03-23 浙江工业大学 A kind of publication clustering method based on reference citation relation
CN108717425A (en) * 2018-04-26 2018-10-30 国家电网公司 A kind of knowledge mapping people entities alignment schemes based on multi-data source
CN108763328A (en) * 2018-05-08 2018-11-06 北京市科学技术情报研究所 A kind of paper recommendation method for quoting algorithm based on gold
CN109002524A (en) * 2018-07-13 2018-12-14 北京市科学技术情报研究所 A kind of gold reference author's sort method based on paper adduction relationship

Also Published As

Publication number Publication date
CN111078859B (en) 2021-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant