CN111949771A - Academic document future influence dynamic ranking method and system based on mutual reinforcement framework and ranking learning - Google Patents


Info

Publication number
CN111949771A
CN111949771A CN202010864916.7A CN202010864916A CN111949771A CN 111949771 A CN111949771 A CN 111949771A CN 202010864916 A CN202010864916 A CN 202010864916A CN 111949771 A CN111949771 A CN 111949771A
Authority
CN
China
Prior art keywords
academic
ranking
paper
learning
hypergraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010864916.7A
Other languages
Chinese (zh)
Inventor
欧俊杰
贾雨葶
傅洛伊
王新兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010864916.7A
Publication of CN111949771A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for dynamically ranking the future influence of academic documents based on a mutual reinforcement framework and learning to rank. The method comprises the following steps: step A: extracting the meta-information of papers based on academic entities and the relationships among those entities; step B: introducing a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph, both defined as extensions of the hypergraph, and constructing a heterogeneous academic hypernetwork; step C: scoring academic entities of different types on the heterogeneous academic hypernetwork using the mutual reinforcement ranking framework HSHMRR; step D: combining the mutual reinforcement ranking framework with the learning-to-rank method MART to learn latent dynamic characteristics from historical time periods and apply the learned knowledge to a target time period, forming an evaluation result. The invention employs a general and efficient method that can adaptively learn the latent dynamic properties of different academic literature data sets and apply the learned knowledge to ranking.

Description

Academic document future influence dynamic ranking method and system based on mutual reinforcement framework and ranking learning
Technical Field
The invention relates to the technical field of mutual reinforcement frameworks and learning to rank, and in particular to a method and a system for dynamically ranking the future influence of academic documents based on a mutual reinforcement framework and learning to rank.
Background
With the rapid growth in the number of academic documents published each year, the ability to rank the future influence of academic documents is often critical to decision making. For example, when faced with a large number of recently published, relevant academic documents, researchers often feel overwhelmed and do not know which papers deserve their attention. In this situation, knowing the future influence ranking of all the academic literature helps them decide, because researchers tend to focus on the papers ranked in the top K by influence (where K is determined by their own time and energy) so that they can follow the research frontier and find promising research directions. Another example is the evaluation of academic rising stars. When a government or business wishes to fund rising stars in different academic research areas, the ranking of the future influence of academic publications in a particular area plays a key role in evaluating the rising stars of that area.
Much effort has been devoted to academic ranking. Traditional methods mainly use metrics tied to citation counts to measure the influence of papers, researchers and journals. However, these metrics do not take into account the structural information of the academic literature network and are therefore generally considered inaccurate. More recently, inspired by the successful application of the PageRank and HITS algorithms to web page ranking, many graph-based approaches have been proposed to rank, for example, the importance of papers. However, these methods do not make full use of the available meta-information related to a paper; they model the academic literature network only as a traditional simple network (homogeneous or heterogeneous), which limits their performance in evaluating academic influence. In fact, because a large number of new documents are published each year and cite earlier documents, the entire academic literature network is constantly evolving. Both the traditional metrics and the graph-based methods nevertheless treat the academic literature network as static and ignore this evolution, so they lack the ability to predict future influence accurately. A few studies have attempted to rank the future influence of academic literature, such as the FutureRank and MRFRank algorithms. However, these ranking algorithms use only a simple network structure and rely heavily on heuristically designed time-aware weights and time scores, which can only roughly approximate the dynamic properties of academic literature networks; these two points limit their performance.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a method and a system for dynamically ranking the future influence of academic documents based on a mutual reinforcement framework and learning to rank.
The method provided by the invention for dynamically ranking the future influence of academic documents based on a mutual reinforcement framework and learning to rank comprises the following steps:
step A: extracting the meta-information of papers based on academic entities and the relationships among those entities;
step B: introducing a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph, both defined as extensions of the hypergraph, and constructing a heterogeneous academic hypernetwork;
step C: scoring academic entities of different types on the heterogeneous academic hypernetwork using the mutual reinforcement ranking framework HSHMRR;
step D: combining the mutual reinforcement ranking framework with the learning-to-rank method MART to learn latent dynamic characteristics from historical time periods and apply the learned knowledge to a target time period, forming an evaluation result;
step E: obtaining the dynamic ranking result information for the future influence of academic documents based on the mutual reinforcement framework and learning to rank.
The meta-information of the papers covers four types of academic entities, namely papers, authors, journals and institutions. The entity relationships are the seven relationships among the four types of academic entities: the citation relationships among papers, among authors, among journals and among institutions; the authorship relationship between papers and authors; the publication relationship between papers and journals; and the affiliation relationship between papers and institutions.
Preferably, step B comprises:
step B1: presenting the basic principle and construction of the homogeneous directed hypergraph;
step B2: presenting the basic principle and construction of the heterogeneous bipartite hypergraph;
step B3: presenting the basic principle and construction of the heterogeneous academic hypernetwork.
The heterogeneous academic hypernetwork is composed of seven different sub-hypernetworks, each of which is an instance of a homogeneous directed hypergraph or a heterogeneous bipartite hypergraph.
Preferably, step C comprises:
step C1: in practice, the importance of a particular paper $p_i$ can be evaluated, without manually analyzing its content, by jointly considering the following aspects:
- the quality of the papers citing $p_i$;
- the quality of the journal in which $p_i$ is published;
- the reputation of the authors of $p_i$;
- the reputation of the institutions with which the authors of $p_i$ were affiliated when writing $p_i$;
step C2: based on the four indicators defined in step C1, the mutual reinforcement ranking framework HSHMRR is proposed to co-rank the academic entities of each type;
the ideas of PageRank and Randomized HITS (RHITS) are combined to capture both the influence among academic entities of the same type and the reinforcement between academic entities of different types;
step C3: updating the authority scores of the papers according to the new hub scores of the papers, authors, journals and institutions calculated in step C2 and the current authority scores of the papers.
Preferably, step D comprises:
step D1: suppose there is a set of papers in some research field that were published during $[t-\Delta t, t]$. HSHMRR is first run on these papers, and the authority scores calculated by HSHMRR are then used to create a feature vector $x_i$ for each paper $p_i$. Let $X$ be the set of feature vectors of the papers, let $R(X)$ be the ranking list of the papers according to their influence over the next $t_0$ years (the period $[t, t+t_0]$), and let $J(A, B)$ be a function that calculates the similarity between two ranking lists $A$ and $B$. The problem of ranking the influence of these papers over the next $t_0$ years then becomes that of finding a ranking function $f$ such that $J(f(X), R(X))$ is as large as possible;
preferably, step D2: further assume that the current year is $t$, so that $R(X)$ is unknown. The Spearman rank correlation coefficient is adopted as $J(A, B)$, and the ranking function is learned from historical data. First, HSHMRR is used to create $d$ sets of paper feature vectors $\{X_i\}$, where each $X_i$ comprises the papers published during $[t-t_0-i-\Delta t,\ t-t_0-i]$; subsequently, $R(X_i)$ is obtained from the information generated during $[t-t_0-i,\ t-i]$; finally, the following problem is solved with the MART learning-to-rank algorithm:
$$f^{*} = \arg\max_{f} \sum_{i} J\big(f(X_i), R(X_i)\big)$$
Compared with the prior art, the invention has the following beneficial effects:
1. The invention employs a general and efficient method that can adaptively learn the latent dynamic properties of different academic literature data sets and apply the learned knowledge to ranking;
2. The method can be applied directly to different data sets, under different target time periods and different definitions of future influence;
3. The invention is reasonably structured and convenient to use, and can overcome the defects of the prior art.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is an exemplary diagram of a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph in an embodiment of the invention.
FIG. 2 is an exemplary diagram of constructing the seven sub-hypernetworks in an embodiment of the present invention.
Fig. 3 is an exemplary diagram of an algorithm flow in an embodiment of the invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
As shown in Figs. 1-3, this embodiment provides a method for dynamically ranking the future influence of academic documents based on a mutual reinforcement ranking framework and learning to rank. Specifically, the present embodiment includes the following steps:
step A: extracting the meta-information of papers based on academic entities and the relationships among those entities;
step B: introducing a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph, both defined as extensions of the hypergraph, and constructing a heterogeneous academic hypernetwork;
step C: scoring academic entities of different types on the heterogeneous academic hypernetwork using the mutual reinforcement ranking framework HSHMRR;
step D: combining the mutual reinforcement ranking framework with the learning-to-rank method MART to learn latent dynamic characteristics from historical time periods and apply the learned knowledge to a target time period, forming a final evaluation result.
The meta-information of the papers in step A covers four types of academic entities, namely papers, authors, journals and institutions. The entity relationships are the seven relationships among the four types of academic entities: the citation relationships among papers, among authors, among journals and among institutions; the authorship relationship between papers and authors; the publication relationship between papers and journals; and the affiliation relationship between papers and institutions.
Fig. 1 is an exemplary diagram of a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph. In (a), each colored dotted line indicates a hyperedge and the arrows indicate its direction. In (b), vertices of different colors represent different types and the dashed lines represent undirected hyperedges.
Step B comprises the following steps:
step B1: the basic principle and construction of the homogeneous directed hypergraph are presented. As shown in Fig. 1(a), a homogeneous directed hypergraph can be represented as $H = (V, E)$, where $V$ is a set of vertices that all belong to the same type and $E$ is a set of directed hyperedges. Unlike a directed edge in a traditional simple directed graph, which connects only two vertices, a directed hyperedge $e \in E$ connects two sets of vertices and is denoted $e = (V^+(e), V^-(e))$, where $V^+(e)$ is the set of vertices at the tail of $e$ and $V^-(e)$ is the set of vertices at the head of $e$. It should be noted that
Figure BDA0002649427050000041
is allowed. Specifically:
step B1.1: mathematical description of the homogeneous directed hypergraph. A homogeneous directed hypergraph can be described by two adjacency matrices, $VE^+$ and $VE^-$, and a vector $W$. For a hyperedge $e$ and a vertex $u$, if $u \in V^+(e)$, then $VE^+(u, e) > 0$ gives the weight of $u$ in $V^+(e)$; otherwise $VE^+(u, e) = 0$. Similarly, if $u \in V^-(e)$, then $VE^-(u, e) > 0$ gives the weight of $u$ in $V^-(e)$; otherwise $VE^-(u, e) = 0$. The weight of hyperedge $e$ is given by $W(e)$.
Step B1.2: based on the mathematical definition given in step B1.1, the transition probability between any two vertices of the homogeneous directed hypergraph can be derived. Suppose a random walker at vertex $u$ wants to move forward. The walker must first randomly choose a hyperedge $e$ leaving $u$ and walk along it. After reaching the head end of $e$, the walker randomly selects a vertex from the head vertex set $V^-(e)$. Based on this idea, the transition probability from vertex $u$ to vertex $v$ is formally defined as follows:
Figure BDA0002649427050000051
where
Figure BDA0002649427050000052
The first factor of the product in equation (1) is the probability that the random walker, when at vertex $u$, selects hyperedge $e$; the second factor is the probability of reaching $v$ after selecting $e$.
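By way of non-limiting illustration, the two-stage random walk described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the exact expression of equation (1): it assumes that a hyperedge leaving u is chosen with probability proportional to W(e)·VE+(u, e) and that a head vertex is then chosen with probability proportional to VE−(v, e); the role of I(·) from formula (2) is not modeled. The function name `transition_probs` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

def transition_probs(u, ve_plus, ve_minus, w):
    """Return {v: P(v|u)} for a random walker standing at vertex u.

    ve_plus[e]  maps the tail vertices of hyperedge e to positive weights.
    ve_minus[e] maps the head vertices of hyperedge e to positive weights.
    w[e]        is the weight of hyperedge e.
    """
    # Hyperedges leaving u, scored by W(e) * VE+(u, e) (assumed form).
    out = {e: w[e] * tails[u] for e, tails in ve_plus.items() if u in tails}
    total_out = sum(out.values())
    probs = defaultdict(float)
    if total_out == 0:
        return dict(probs)  # u has no outgoing hyperedge
    for e, score in out.items():
        head_total = sum(ve_minus[e].values())
        if head_total == 0:
            continue  # skip hyperedges with an empty head set
        for v, weight in ve_minus[e].items():
            # P(choose e | u) * P(choose v | e)
            probs[v] += (score / total_out) * (weight / head_total)
    return dict(probs)

# Toy example: p1 cites p2 and p3 (hyperedge e1); p4 cites p3 (hyperedge e2).
ve_plus = {"e1": {"p1": 1.0}, "e2": {"p4": 1.0}}
ve_minus = {"e1": {"p2": 1.0, "p3": 1.0}, "e2": {"p3": 1.0}}
w = {"e1": 1.0, "e2": 1.0}
p = transition_probs("p1", ve_plus, ve_minus, w)
```

In the toy example the walker at p1 has a single outgoing hyperedge and splits its probability evenly between the two cited papers.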
Step B2: the basic principle and construction of the heterogeneous bipartite hypergraph are presented. As shown in Fig. 1(b), a heterogeneous bipartite hypergraph can be represented as $H_{ab} = (V_a, V_b, E_{ab})$, where $V_a$ is a set of type-a vertices, $V_b$ is a set of type-b vertices and $E_{ab}$ is a set of undirected hyperedges. As in the homogeneous directed hypergraph, every hyperedge in the heterogeneous bipartite hypergraph connects two sets of vertices. Each hyperedge $e_{ab} \in E_{ab}$ has a set of type-a vertices at one end and a set of type-b vertices at the other. Thus $e_{ab}$ can be written as $e_{ab} = (V_a(e_{ab}), V_b(e_{ab}))$. Specifically:
the heterogeneous bipartite hypergraph is described mathematically by two adjacency matrices, $VE_a$ and $VE_b$, and a vector $W_{ab}$. For each hyperedge $e_{ab} \in E_{ab}$, we have $VE_a(u, e_{ab}) > 0$ for each vertex $u \in V_a(e_{ab})$ and $VE_b(v, e_{ab}) > 0$ for each vertex $v \in V_b(e_{ab})$, where $VE_a(u, e_{ab})$ and $VE_b(v, e_{ab})$ are the weights of $u$ and $v$ respectively. For vertices not incident to $e_{ab}$, the corresponding entries of $VE_a$ and $VE_b$ are 0. The weight of $e_{ab}$ is stored in $W_{ab}(e_{ab})$. The transition probability from a type-a vertex $u$ to a type-b vertex $v$ is as follows:
Figure BDA0002649427050000053
where $I(\cdot)$ is defined in formula (2). The first factor of the product models the probability that a random walker at vertex $u$ chooses hyperedge $e_{ab}$; the second factor models the probability that the walker, after reaching the other end of $e_{ab}$, selects $v$ from $V_b(e_{ab})$;
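Because the hyperedges of the heterogeneous bipartite hypergraph are undirected, the same walk can be taken from either side. The following Python sketch illustrates this under the same assumption as before, namely that selection probabilities are proportional to the stored weights; it does not reproduce the exact formula, and the function name `cross_type_probs` is hypothetical.

```python
def cross_type_probs(u, ve_from, ve_to, w):
    """Return {v: P(v|u)} where u and v lie on opposite sides of the hypergraph.

    ve_from[e] maps vertices on u's side of hyperedge e to positive weights.
    ve_to[e]   maps vertices on the other side of hyperedge e to positive weights.
    """
    # Hyperedges incident to u, scored by W(e) * VE(u, e) (assumed form).
    incident = {e: w[e] * side[u] for e, side in ve_from.items() if u in side}
    total = sum(incident.values())
    probs = {}
    for e, score in incident.items():
        other_total = sum(ve_to[e].values())
        for v, weight in ve_to[e].items():
            probs[v] = probs.get(v, 0.0) + (score / total) * (weight / other_total)
    return probs

# Paper-author example: p1 is written by r1 and r2, p2 by r2 alone.
ve_p = {"e1": {"p1": 1.0}, "e2": {"p2": 1.0}}
ve_r = {"e1": {"r1": 1.0, "r2": 1.0}, "e2": {"r2": 1.0}}
w_pr = {"e1": 1.0, "e2": 1.0}

paper_to_author = cross_type_probs("p1", ve_p, ve_r, w_pr)  # walk from a paper
author_to_paper = cross_type_probs("r2", ve_r, ve_p, w_pr)  # same code, sides swapped
```

Swapping the two matrices is all that is needed to reverse the direction of the walk, which is why a single routine serves both type-a-to-type-b and type-b-to-type-a transitions.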
FIG. 2 is an example of constructing the seven sub-hypernetworks. For clarity of illustration, (c) and (e) show only a portion of the hyperedges.
Step B3: the basic principle and construction of the heterogeneous academic hypernetwork are presented. As shown in FIG. 2, a heterogeneous academic hypernetwork is composed of seven different sub-hypernetworks, each of which is an instance of a homogeneous directed hypergraph (HomDHG) or a heterogeneous bipartite hypergraph (HetBHG). Given a paper database, the seven sub-hypernetworks are constructed as follows:
step B3.1: to obtain the paper citation hypernetwork (HomDHG), a directed hyperedge is created for each citing paper, with the citing paper as the tail vertex set and the papers it cites as the head vertex set. Similarly, to construct the author citation hypernetwork (HomDHG), a directed hyperedge is created for each pair of citing paper and cited paper, with the authors of the citing paper as the tail vertex set and the authors of the cited paper as the head vertex set. The institution citation hypernetwork (HomDHG) is built in the same way as the author citation hypernetwork. For the journal citation hypernetwork (HomDHG), a directed hyperedge is created for each citing paper, with the journal of the citing paper as the tail vertex set and the journals of the cited papers as the head vertex set. The paper-author hypernetwork (HetBHG) is constructed by creating an undirected hyperedge for each paper, with the paper alone as the vertex set at one end and the authors of the paper as the vertex set at the other end. The paper-institution hypernetwork (HetBHG) is constructed in the same way. For the paper-journal hypernetwork (HetBHG), an undirected hyperedge is created for each journal, with the journal alone as the vertex set at one end and the papers published in the journal as the vertex set at the other end.
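As a non-limiting illustration of step B3.1, the following Python sketch builds four of the seven sub-hypernetworks from a toy paper database. The record layout (`authors`, `journal`, `refs`) and all function names are hypothetical; each hyperedge is represented simply as a pair of vertex sets, with the paper side listed first for the bipartite cases. The institution-based sub-hypernetworks are omitted and would follow the author pattern.

```python
# Hypothetical toy database: paper id -> metadata.
papers = {
    "p1": {"authors": ["a1", "a2"], "journal": "j1", "refs": ["p2", "p3"]},
    "p2": {"authors": ["a2"], "journal": "j1", "refs": []},
    "p3": {"authors": ["a3"], "journal": "j2", "refs": ["p2"]},
}

def paper_citation_edges(papers):
    """One directed hyperedge per citing paper: ({citing paper}, {cited papers})."""
    return [({pid}, set(rec["refs"])) for pid, rec in papers.items() if rec["refs"]]

def author_citation_edges(papers):
    """One directed hyperedge per (citing, cited) pair: (citing authors, cited authors)."""
    return [(set(rec["authors"]), set(papers[ref]["authors"]))
            for rec in papers.values() for ref in rec["refs"] if ref in papers]

def paper_author_edges(papers):
    """One undirected hyperedge per paper: ({paper}, {its authors})."""
    return [({pid}, set(rec["authors"])) for pid, rec in papers.items()]

def paper_journal_edges(papers):
    """One undirected hyperedge per journal: ({papers published in it}, {journal})."""
    by_journal = {}
    for pid, rec in papers.items():
        by_journal.setdefault(rec["journal"], set()).add(pid)
    return [(pids, {journal}) for journal, pids in by_journal.items()]

citation = paper_citation_edges(papers)
author_citation = author_citation_edges(papers)
paper_author = paper_author_edges(papers)
paper_journal = paper_journal_edges(papers)
```

In this toy data the first author citation hyperedge has author a2 in both its tail and head vertex sets, because a2 co-authored the citing paper p1 and wrote the cited paper p2.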
Step B3.2: mathematical description of the sub-hypernetworks. The paper citation hypernetwork and the paper-author hypernetwork are taken as examples; the other sub-hypernetworks can be described in the same way. The paper citation hypernetwork is denoted $H_p = (V_p, E_p)$, where $V_p$ is the set of paper vertices and $E_p$ is the set of directed hyperedges. It is described by two matrices, $VE_p^+$ and $VE_p^-$, and a vector $W_p$. For a hyperedge $e_p \in E_p$ whose tail and head vertex sets are denoted $V_p^+(e_p)$ and $V_p^-(e_p)$ respectively, if $u \in V_p^+(e_p)$, then $VE_p^+(u, e_p) > 0$ represents the weight of paper $u$; otherwise $VE_p^+(u, e_p) = 0$. Similarly, if paper $u \in V_p^-(e_p)$, then $VE_p^-(u, e_p) > 0$ represents the weight of $u$; otherwise $VE_p^-(u, e_p) = 0$. The weight of hyperedge $e_p$ is $W_p(e_p)$.
Similarly, the paper-author hypernetwork is described by two matrices, $VE_p$ and $VE_r$, and a vector $W_{pr}$, and is denoted $H_{pr} = (V_p, V_r, E_{pr})$, where $V_p$ and $V_r$ are the vertex set of papers and the vertex set of researchers respectively, and $E_{pr}$ is the set of undirected hyperedges. For each hyperedge $e_{pr} \in E_{pr}$, $V_p(e_{pr})$ and $V_r(e_{pr})$ denote its two vertex sets. For a paper $u \in V_p(e_{pr})$ we have $VE_p(u, e_{pr}) > 0$, and for a researcher $v \in V_r(e_{pr})$ we have $VE_r(v, e_{pr}) > 0$, where $VE_p(u, e_{pr})$ and $VE_r(v, e_{pr})$ are the weights of $u$ and $v$ respectively. For vertices not incident to $e_{pr}$, the corresponding entries of $VE_p$ and $VE_r$ are 0. The weight of $e_{pr}$ is stored in $W_{pr}(e_{pr})$.
In every sub-hypernetwork, the weight of each hyperedge is set to 1, and the weight of a vertex within a vertex set equals the number of entities it represents. For example, if several authors of a paper belong to the same institution, the weight of that institution's vertex in the corresponding vertex set of the paper-institution hypernetwork equals the number of those authors.
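The weighting rule in the example above may be illustrated, without limitation, by the following short Python sketch. The author identifiers, institution identifiers and the function name `institution_weights` are hypothetical.

```python
from collections import Counter

def institution_weights(affiliations):
    """Map each institution to the number of the paper's authors affiliated with it."""
    return dict(Counter(affiliations.values()))

# Hypothetical paper with three authors, two of them from the same institution.
weights = institution_weights({"a1": "inst_A", "a2": "inst_A", "a3": "inst_B"})
hyperedge_weight = 1  # every hyperedge has weight 1 in this embodiment
```

Here the vertex of `inst_A` receives weight 2 in the institution-side vertex set of the paper's hyperedge, and the vertex of `inst_B` receives weight 1.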
Step C comprises the following steps:
step C1: in practice, the importance of a specific paper $p_i$ can be evaluated by jointly considering the following aspects, without manually analyzing its content.
· The quality of the papers citing $p_i$. An important paper is usually cited by many other important papers. Thus, if many important papers cite $p_i$, then $p_i$ itself is likely to be important. Furthermore, if $p_i$ is cited by many excellent hub papers, where a hub paper is one that cites many important papers, then $p_i$ may also be important.
· The quality of the journal in which $p_i$ is published. A journal is considered to be of high quality when most of the papers it publishes are of high quality. Conversely, if $p_i$ is published in a high-quality journal, then $p_i$ is very likely to be important.
· The reputation of the authors of $p_i$. Authoritative researchers are typically those who have published many influential papers. Therefore, if the authors of $p_i$ include an authoritative researcher, then $p_i$ is likely to be important.
· The reputation of the institutions with which the authors of $p_i$ were affiliated when writing $p_i$. Publishing many high-quality papers makes an institution more authoritative; in turn, if $p_i$ was produced at an authoritative institution, it is likely to be important.
Step C2: based on the four indicators defined in step C1, the mutual reinforcement ranking framework HSHMRR is proposed to co-rank the academic entities of each type. It combines the ideas of PageRank and Randomized HITS (RHITS) to capture both the influence among academic entities of the same type and the reinforcement between academic entities of different types. The specific implementation comprises the following steps:
step C2.1: score variables. For each journal $v_i$, each researcher $r_i$ and each institution $s_i$, HSHMRR defines a hub score, $H_v(i)$, $H_r(i)$ and $H_s(i)$ respectively, reflecting the reputation of $v_i$, $r_i$ and $s_i$. To each paper $p_i$, HSHMRR assigns two scores: a hub score $H_p(i)$ and an authority score $A_p(i)$. $H_p(i)$ depends on the importance of the papers cited by $p_i$, whereas $A_p(i)$ represents the importance of $p_i$ itself. $H_p$, $A_p$, $H_r$, $H_v$ and $H_s$ denote the corresponding score vectors. Once the HSHMRR computation is complete, the papers are ranked according to $A_p$ to obtain the final paper ranking.
Step C2.2: HSHMRR is computed iteratively; FIG. 3 shows the flowchart of the HSHMRR algorithm. Initially (algorithm lines 1-5), the hub scores of the papers, authors, journals and institutions and the authority scores of the papers are initialized to
Figure BDA0002649427050000071
and
Figure BDA0002649427050000072
where $N_p$, $N_r$, $N_v$ and $N_s$ denote the numbers of papers, authors, journals and institutions respectively. Iterations are then performed until convergence (algorithm lines 6-14). In each iteration the algorithm performs two calculations: it computes a new hub score for each type of academic entity, and it computes a new authority score for each paper.
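The overall shape of this iterative computation may be sketched in Python as follows, purely by way of illustration. Two points are assumptions and are not taken from the text: the uniform 1/N initialization (the initial values are given only as formula images) and the L1-norm stopping rule with tolerance `tol`. The function `toy_step` is a hypothetical stand-in for one HSHMRR iteration, which in the actual method updates several score vectors at once.

```python
def uniform_scores(entity_ids):
    """Assumed initialization: every entity starts with score 1/N."""
    n = len(entity_ids)
    return {eid: 1.0 / n for eid in entity_ids}

def iterate_until_convergence(scores, step, tol=1e-9, max_iter=1000):
    """Apply `step` repeatedly until the L1 change between iterations drops below `tol`."""
    for _ in range(max_iter):
        new_scores = step(scores)
        change = sum(abs(new_scores[k] - scores[k]) for k in scores)
        scores = new_scores
        if change < tol:
            break
    return scores

def toy_step(s):
    """Toy update: each vertex receives half of the other's score plus a constant."""
    return {"a": 0.5 * s["b"] + 0.25, "b": 0.5 * s["a"] + 0.25}

start = uniform_scores(["a", "b"])
result = iterate_until_convergence({"a": 1.0, "b": 0.0}, toy_step)
```

The toy update is a contraction with fixed point a = b = 0.5, so the loop terminates well before `max_iter` regardless of the starting scores.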
Step C2.3: for a particular paper, the new hub score is computed in the RHITS manner on the paper citation hypernetwork:
Figure BDA0002649427050000081
where $P_r(p_i \mid p_j)$ is the transition probability from $p_j$ to $p_i$ along the reverse direction of the hyperedges, which can be calculated in the same way as equation (1):
Figure BDA0002649427050000082
where
Figure BDA0002649427050000083
The first factor of the product in equation (5) is the probability that a random walker at vertex $p_j$ selects the incoming hyperedge $e_p$, and the second factor is the probability that the walker follows $e_p$ in the reverse direction to reach vertex $p_i$. The entire product therefore represents the probability that a random walker travels from $p_j$ to $p_i$ through $e_p$.
The new hub score of a specific author $r_i$ can be calculated as follows:
Figure BDA0002649427050000084
where $\alpha_{11} + \alpha_{12} = 1$. The first term of the sum is the propagated score derived, in the PageRank manner, from the current hub scores of the authors on the author citation hypernetwork; the second term is the reinforcement derived, in the RHITS manner, from the current authority scores of the papers on the paper-author hypernetwork. $P(r_i \mid r_j)$ and $P(r_i \mid p_k)$ are calculated as follows:
Figure BDA0002649427050000085
Figure BDA0002649427050000086
$P(r_i \mid r_j)$ and $P(r_i \mid p_k)$ follow the same idea as equations (1) and (3): to move from one vertex to another, a random walker must first select a hyperedge and then, after reaching its other end, select the target vertex. The new hub scores of journals and institutions are calculated in the same way as those of authors, with parameters $\alpha_{21}$, $\alpha_{22}$ and $\alpha_{31}$, $\alpha_{32}$ respectively.
Step C3: the authority scores of the papers are updated according to the new hub scores of the papers, authors, journals and institutions calculated in step C2 and the current authority scores of the papers, as in line 11 of the algorithm, where $\beta_1 + \beta_2 + \beta_3 + \beta_4 + \beta_5 + \beta_6 = 1$. The new authority score of a particular paper is a weighted sum of six parts. The first part is derived, in the PageRank manner, from the current authority scores of the papers through the paper citation hypernetwork. The next four parts are the reinforcement derived, in the RHITS manner, from the new hub scores of the papers, authors, journals and institutions through the corresponding sub-hypernetworks. The last part is the personalized ranking vector. The formula for updating $A_p(i)$ is as follows:
Figure BDA0002649427050000091
All transition probabilities in equation (10) are calculated according to the same random-walk idea that underlies equation (5).
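The six-part weighted sum described in step C3 may be illustrated by the following Python sketch. It is offered under stated assumptions and does not reproduce equation (10) itself: each part is assumed to be a single random-walk propagation step over a precomputed transition table, and the names `propagate`, `update_authority`, `t_cite` and `t_hub` are hypothetical.

```python
def propagate(scores, trans):
    """One random-walk step: out[i] = sum over j of trans[j][i] * scores[j]."""
    out = {}
    for j, row in trans.items():
        for i, p in row.items():
            out[i] = out.get(i, 0.0) + p * scores.get(j, 0.0)
    return out

SOURCES = ("paper", "author", "journal", "institution")

def update_authority(a_p, hubs, t_cite, t_hub, betas, personal):
    """Weighted sum of six parts; the six betas must sum to 1."""
    if abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("betas must sum to 1")
    parts = [propagate(a_p, t_cite)]                          # PageRank-style part
    parts += [propagate(hubs[k], t_hub[k]) for k in SOURCES]  # four RHITS-style parts
    parts.append(personal)                                    # personalization part
    return {i: sum(b * part.get(i, 0.0) for b, part in zip(betas, parts))
            for i in a_p}

# Toy run: p1 cites p2; only the citation and personalization parts are active.
a_p = {"p1": 0.5, "p2": 0.5}
t_cite = {"p1": {"p2": 1.0}}
empty = {k: {} for k in SOURCES}
new_a = update_authority(a_p, empty, t_cite, empty,
                         [0.5, 0.0, 0.0, 0.0, 0.0, 0.5], {"p1": 0.5, "p2": 0.5})
```

In the toy run the cited paper p2 ends with a higher authority score than p1, since it alone receives score through the citation hyperedge.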
Step D comprises the following steps:
step D1: suppose there is a set of papers in some research field that were published during $[t-\Delta t, t]$. HSHMRR is first run on these papers, and the authority scores calculated by HSHMRR are then used to create a feature vector $x_i$ for each paper $p_i$. Let $X$ be the set of feature vectors of the papers, let $R(X)$ be the ranking list of the papers according to their influence over the next $t_0$ years (the period $[t, t+t_0]$), and let $J(A, B)$ be a function that calculates the similarity between two ranking lists $A$ and $B$. The problem of ranking the influence of these papers over the next $t_0$ years then becomes that of finding a ranking function $f$ such that $J(f(X), R(X))$ is as large as possible;
step D2: further assume that the current year is $t$, so that $R(X)$ is unknown. The Spearman rank correlation coefficient is adopted as $J(A, B)$, and the ranking function is learned from historical data. First, HSHMRR is used to create $d$ sets of paper feature vectors $\{X_i\}$, where each $X_i$ comprises the papers published during $[t-t_0-i-\Delta t,\ t-t_0-i]$. Then, $R(X_i)$ is obtained from the information generated during $[t-t_0-i,\ t-i]$. Finally, the following problem is solved with the MART learning-to-rank algorithm:
$$f^{*} = \arg\max_{f} \sum_{i} J\big(f(X_i), R(X_i)\big)$$
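By way of non-limiting illustration, the historical time windows of step D2 and the Spearman similarity J may be sketched in Python as follows. Several points are assumptions: the index i is taken to run from 0 to d-1 (the text does not state its range), rankings are assumed to contain no ties so that the classical closed form of the Spearman coefficient applies, and all function names are hypothetical. Fitting the ranking function itself with MART, that is, gradient-boosted regression trees, is not shown.

```python
def training_windows(t, t0, dt, d):
    """For each i in 0..d-1, return (publication window, label window).

    Papers published in [t-t0-i-dt, t-t0-i] form X_i; their ranking R(X_i)
    is derived from information generated in [t-t0-i, t-i].
    """
    return [((t - t0 - i - dt, t - t0 - i), (t - t0 - i, t - i)) for i in range(d)]

def ranking(scores):
    """Rank positions (1 = highest score); ties are broken by paper id."""
    ordered = sorted(scores, key=lambda p: (-scores[p], p))
    return {p: pos for pos, p in enumerate(ordered, start=1)}

def spearman(rank_a, rank_b):
    """Spearman rank correlation of two tie-free rankings over the same papers."""
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    d2 = sum((rank_a[p] - rank_b[p]) ** 2 for p in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

windows = training_windows(t=2020, t0=5, dt=3, d=2)
predicted = ranking({"p1": 0.9, "p2": 0.5, "p3": 0.1})  # e.g. scores from a learned model
actual = ranking({"p1": 10, "p2": 30, "p3": 20})        # e.g. citations in the label window
j = spearman(predicted, actual)
```

Each publication window ends exactly where its label window begins, so every training label uses only information that was already available by year t.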
The method for dynamically ranking the future influence of academic literature based on the mutual reinforcement ranking framework and learning to rank (HSHMRR-MART) provided by this embodiment mainly addresses the problem of ranking the future influence of academic literature. It provides a general and effective method that can adaptively learn the latent dynamic properties of different academic literature data sets and apply the learned knowledge to ranking. With the help of HSHMRR, the original problem is converted into a learning-to-rank problem; HSHMRR is a novel mutual reinforcement ranking framework for accurately measuring the importance of different types of academic entities (papers, researchers, journals and institutions) in different periods. This embodiment can be applied directly to different data sets, under different target time periods and different definitions of future influence. Experimental results on three data sets extracted from the Microsoft Academic Graph confirm the effectiveness of the proposed method, whose performance in terms of the Spearman rank correlation coefficient is up to 29% higher than that of the current state-of-the-art methods.
Those skilled in the art will appreciate that, in addition to implementing the system and its devices, modules and units provided by the present invention purely as computer-readable program code, the same functions can be implemented entirely in hardware by logically programming the method steps, in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and its devices, modules and units provided by the present invention can be regarded as a hardware component, and the devices, modules and units included in it for realizing various functions can be regarded as structures within the hardware component; the devices, modules and units for realizing the various functions can also be regarded both as software modules for performing the method and as structures within the hardware component.
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A method for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank, characterized by comprising the following steps:
step A: extracting meta-information of papers based on academic entities and the relationships among the entities;
step B: introducing a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph, both defined as extensions of the hypergraph, and constructing a heterogeneous academic hypernetwork;
step C: scoring the different types of academic entities on the heterogeneous academic hypernetwork based on the mutual reinforcement ranking framework HSHMRR;
step D: on the basis of the mutual reinforcement ranking framework, learning latent dynamic characteristics from historical time periods and applying the learned knowledge to the target time period to form an evaluation result;
step E: obtaining dynamic ranking result information on the future influence of academic literature based on the mutual reinforcement framework and learning to rank.
2. The method for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 1, wherein step B comprises:
step B1: providing the basic principle and construction of the homogeneous directed hypergraph;
step B2: providing the basic principle and construction of the heterogeneous bipartite hypergraph;
step B3: providing the basic principle and construction of the heterogeneous academic hypernetwork.
3. The method for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 1, wherein step C comprises:
step C1: evaluating the importance of a specific paper p_i by jointly considering the following aspects:
- the quality of the papers that cite p_i;
- the quality of the journal in which p_i is published;
- the reputation of the authors of p_i;
- the reputation of the institutions with which the authors of p_i were affiliated when writing p_i;
step C2: based on the four paper indicators defined in step C1, providing the mutual reinforcement ranking framework HSHMRR and co-ranking the academic entities of each type;
wherein the ideas of PageRank and Randomized HITS are combined to capture both the influence among academic entities of the same type and the mutual reinforcement among academic entities of different types;
step C3: updating the authority scores of the papers according to the new hub scores of the papers, authors, journals and institutions calculated in step C2 and the current authority scores of the papers.
4. The method for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 1, wherein step D comprises:
step D1: supposing that a set of papers in some research field was published during [t - Δt, t], first running HSHMRR on these papers, and then using the authority score calculated by HSHMRR to create a feature vector x_i for each paper p_i; wherein X is the set of paper feature vectors, R(X) is the ranking list of the papers according to their influence over the next t_0 years, i.e. during [t, t + t_0], and J(A, B) is a function that calculates the similarity between two ranking lists.
5. The method of claim 4, wherein step D further comprises:
step D2: introducing the Spearman rank correlation coefficient as J(A, B), and learning a ranking function from historical data; first, using HSHMRR to create d sets of paper feature vectors
Figure FDA0002649427040000021
wherein each X_i comprises the research papers published during [t - t_0 - i - Δt, t - t_0 - i]; then obtaining R(X_i) from the information generated during [t - t_0 - i, t - i]; finally, solving the following problem with the MART learning-to-rank algorithm:
Figure FDA0002649427040000022
6. A system for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank, characterized by comprising:
module A: extracting meta-information of papers based on academic entities and the relationships among the entities;
module B: introducing a homogeneous directed hypergraph and a heterogeneous bipartite hypergraph, both defined as extensions of the hypergraph, and constructing a heterogeneous academic hypernetwork;
module C: scoring the different types of academic entities on the heterogeneous academic hypernetwork based on the mutual reinforcement ranking framework HSHMRR;
module D: on the basis of the mutual reinforcement ranking framework, learning latent dynamic characteristics from historical time periods and applying the learned knowledge to the target time period to form an evaluation result;
module E: obtaining dynamic ranking result information on the future influence of academic literature based on the mutual reinforcement framework and learning to rank.
7. The system for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 6, wherein module B comprises:
module B1: providing the basic principle and construction of the homogeneous directed hypergraph;
module B2: providing the basic principle and construction of the heterogeneous bipartite hypergraph;
module B3: providing the basic principle and construction of the heterogeneous academic hypernetwork.
8. The system for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 6, wherein module C comprises:
module C1: evaluating the importance of a specific paper p_i by jointly considering the following aspects:
- the quality of the papers that cite p_i;
- the quality of the journal in which p_i is published;
- the reputation of the authors of p_i;
- the reputation of the institutions with which the authors of p_i were affiliated when writing p_i;
module C2: based on the four paper indicators defined by module C1, providing the mutual reinforcement ranking framework HSHMRR and co-ranking the academic entities of each type;
wherein the ideas of PageRank and Randomized HITS are combined to capture both the influence among academic entities of the same type and the mutual reinforcement among academic entities of different types;
module C3: updating the authority scores of the papers according to the new hub scores of the papers, authors, journals and institutions calculated by module C2 and the current authority scores of the papers.
9. The system for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 6, wherein module D comprises:
module D1: supposing that a set of papers in some research field was published during [t - Δt, t], first running HSHMRR on these papers, and then using the authority score calculated by HSHMRR to create a feature vector x_i for each paper p_i; wherein X is the set of paper feature vectors, R(X) is the ranking list of the papers according to their influence over the next t_0 years, i.e. during [t, t + t_0], and J(A, B) is a function that calculates the similarity between two ranking lists.
10. The system for dynamically ranking the future influence of academic literature based on a mutual reinforcement framework and learning to rank according to claim 9, wherein module D further comprises:
module D2: introducing the Spearman rank correlation coefficient as J(A, B), and learning a ranking function from historical data; first, using HSHMRR to create d sets of paper feature vectors
Figure FDA0002649427040000031
wherein each X_i comprises the research papers published during [t - t_0 - i - Δt, t - t_0 - i]; then obtaining R(X_i) from the information generated during [t - t_0 - i, t - i]; finally, solving the following problem with the MART learning-to-rank algorithm:
Figure FDA0002649427040000032

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010864916.7A CN111949771A (en) 2020-08-25 2020-08-25 Academic document future influence dynamic ranking method and system based on mutual reinforcement framework and ranking learning

Publications (1)

Publication Number Publication Date
CN111949771A true CN111949771A (en) 2020-11-17

Family

ID=73367953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010864916.7A Pending CN111949771A (en) 2020-08-25 2020-08-25 Academic document future influence dynamic ranking method and system based on mutual reinforcement framework and ranking learning

Country Status (1)

Country Link
CN (1) CN111949771A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133843A (en) * 2014-06-25 2014-11-05 福州大学 Academic influence cooperative sequencing method of nodes in scientific and technical literature heterogeneous network
CN107391659A (en) * 2017-07-18 2017-11-24 北京工业大学 A kind of citation network academic evaluation sort method based on credit worthiness

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XI ZHANG;LUOYI FU;XINBING WANG: "Ranking the Future Influence of Scientific Literatures", 《2018 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC)》 *
姚宇航, 欧俊杰, et al.: "Turing Index: Evaluating Scholar Influence across Fields and Eras with Academic Big Data", 《大数据》 (Big Data) *

Similar Documents

Publication Publication Date Title
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
Liao et al. A continuous interval-valued linguistic ORESTE method for multi-criteria group decision making
CN109977232B (en) Graph neural network visual analysis method based on force guide graph
CN107562812A (en) A kind of cross-module state similarity-based learning method based on the modeling of modality-specific semantic space
CN106598950B (en) A kind of name entity recognition method based on hybrid laminated model
Yang et al. Optimal granularity selection based on cost-sensitive sequential three-way decisions with rough fuzzy sets
CN114048331A (en) Knowledge graph recommendation method and system based on improved KGAT model
CN107038184B (en) A kind of news recommended method based on layering latent variable model
CN112667877A (en) Scenic spot recommendation method and equipment based on tourist knowledge map
El Mohadab et al. Predicting rank for scientific research papers using supervised learning
CN109189926A (en) A kind of construction method of technical paper corpus
CN112905801A (en) Event map-based travel prediction method, system, device and storage medium
CN110737805B (en) Method and device for processing graph model data and terminal equipment
Fu et al. The academic social network
Xu et al. Effective community division based on improved spectral clustering
Li et al. Intelligent medical heterogeneous big data set balanced clustering using deep learning
CN109740106A (en) Large-scale network betweenness approximation method based on graph convolution neural network, storage device and storage medium
CN115270007A (en) POI recommendation method and system based on mixed graph neural network
CN115860880B (en) Personalized commodity recommendation method and system based on multi-layer heterogeneous graph convolution model
CN110347791A (en) A kind of topic recommended method based on multi-tag classification convolutional neural networks
Zhang et al. FM-based: algorithm research on rural tourism recommendation combining seasonal and distribution features
Zhou et al. Use of artificial neural networks for selective omission in updating road networks
CN109086463A (en) A kind of Ask-Answer Community label recommendation method based on region convolutional neural networks
CN110457706B (en) Point-of-interest name selection model training method, using method, device and storage medium
CN112131261A (en) Community query method and device based on community network and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201117