CN102609546B - Method and system for excavating information of academic journal paper authors - Google Patents

Method and system for excavating information of academic journal paper authors

Info

Publication number
CN102609546B
CN102609546B CN201210072645.7A CN201210072645A CN102609546B CN 102609546 B CN102609546 B CN 102609546B CN 201210072645 A CN201210072645 A CN 201210072645A CN 102609546 B CN102609546 B CN 102609546B
Authority
CN
China
Prior art keywords
author
research direction
paper
information
academic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210072645.7A
Other languages
Chinese (zh)
Other versions
CN102609546A (en)
Inventor
朝乐门
张勇
邢春晓
孙一钢
朱先忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Library
Tsinghua University
Original Assignee
National Library
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Library, Tsinghua University
Priority to CN201210072645.7A
Publication of CN102609546A
Application granted
Publication of CN102609546B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for mining author information from academic journal papers. The method comprises: first, selecting a target subject field and establishing an OWL (Web Ontology Language) domain ontology; second, extracting author information from academic journal papers in the target subject field; third, converting the format of the extracted author information, storing it in an author information base, and calculating a unique author ID (identifier); finally, using this information to obtain an author-paper incidence matrix, an academic growth roadmap for each author, an author collaboration network graph, the academic collaboration distances among authors, a hot research direction map, and an author academic reputation map. The invention changes the data source used for author information mining and introduces OWL domain ontology technology into the calculation of academic collaboration distances among authors and of hot research directions, which improves the effect of semantic computation.

Description

A method and system for mining author information from academic journal papers
Technical field
The present invention relates to the field of knowledge engineering, and in particular to a method and system for mining author information from academic journal papers.
Background art
Author information in academic journal papers refers to the basic information about an author, such as name, sex, year of birth, native place, professional title and research direction, that is provided in a paper formally published in a journal. It generally appears in a footnote on the first page of the paper or in an endnote at the end of the paper, as shown in Fig. 1. Compared with books, author information in academic journal papers is brief in content, fixed in format and standardized in wording.
Analysis of the quantitative relationship between authors and documents is an information analysis method that aims to reveal the relationship between authors and the number of documents and to describe the scientific productivity of authors. The most representative result in this area is Lotka's law: the number of authors and the number of papers follow an inverse-square law, $F(x) = C/x^{2}$, where $x$ is the number of papers, $F(x)$ is the proportion of all authors who wrote $x$ papers, and $C$ is a constant. Building on Lotka's law, scholars such as Fei Laqi proposed two factors that affect the Lotka distribution: first, the era or environment in which researchers work directly affects the result; second, the number of authors in the statistical sample is related to the result. The advantage of this kind of analysis is that it reveals the relationship between author frequency and the number of papers fairly well. Its shortcoming is that it does not analyze other information about the author, such as year of birth, native place, professional title and research direction.
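As a small numerical illustration of Lotka's law, the sketch below evaluates $F(x) = C/x^{2}$ for a few values of $x$. The function name `lotka_fraction` and the default constant $C = 6/\pi^{2}$ (the value that makes the fractions sum to 1 over all positive integers) are assumptions made for this example; the patent text only says that $C$ is a constant.

```python
import math


def lotka_fraction(x: int, c: float = 6 / math.pi**2) -> float:
    """Fraction of authors who wrote exactly x papers under F(x) = C / x**2.

    The default C = 6 / pi**2 is an assumption: it normalizes the
    fractions so that they sum to 1 over x = 1, 2, 3, ...
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    return c / x**2


if __name__ == "__main__":
    for papers in (1, 2, 3):
        print(papers, round(lotka_fraction(papers), 4))
```

Under this law the share of authors with two papers is one quarter of the share with one paper, whatever value of $C$ is chosen.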
Price studied the problem of collaboration using the distribution of each author's number of collaborations and derived the following equation:
$$\sum_{m=1}^{I} n(x) = N$$
where $n(x)$ is the number of authors who wrote $x$ papers, $I = n_{\max}$ is the number of papers by the most productive author in the field, $N$ is the total number of authors, and $m = 0.749\,(n_{\max})^{0.5}$. Building on Price's work, scholars have proposed formulas for the degree of collaboration and the collaboration rate, specifically as follows:
Although each of the above methods has its own strengths and weaknesses, and each has been applied successfully in different settings, none of them meets the particular requirements of mining author profile information from academic papers. First, the content of the author profile information in academic journal papers is distinctive. Second, its position in the paper is distinctive. Third, its format is distinctive. Finally, its wording is distinctive.
Summary of the invention
In view of the above problems in the prior art, the invention provides a method and system for mining author information from academic journal papers.
The invention provides a method for mining author information from academic journal papers, comprising:
Step 1: selecting a target subject field and establishing an OWL domain ontology;
Step 2: extracting author information from the academic journal papers in the target subject field;
Step 3: converting the format of the extracted author information, storing it in an author information base, and calculating a unique author ID;
Step 4: calculating an author-paper incidence matrix from the author IDs and paper IDs. The author-paper incidence matrix is denoted $S_{m\times n}=(s_{ij})_{m\times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of paper records and the number of authors respectively, and $s_{ij}$ is the author weight. The author weight is calculated as follows:
$$S(i,j)=\begin{cases}0, & n=0\\[4pt] \dfrac{1}{n}, & n>0\end{cases}$$
where $S(i,j)$ is the author weight of the $i$-th author in the $j$-th paper and $n$ is the rank of the $i$-th author in the byline of the $j$-th paper, $n = 1, 2, 3, \ldots, N$;
Step 5: calculating, from the author-paper incidence matrix, the research directions and the time, each author's cumulative absolute number of published papers in the same research direction, and generating the author academic growth roadmap, where the cumulative absolute number $y$ of papers published by the $i$-th author in research direction $z$ is calculated as follows:
$$y=\sum_{j=0}^{N} S(i,j,z)$$
where $N$ is the total number of papers published by the $i$-th author in research direction $z$ and $S(i,j,z)$ is the author weight of the $i$-th author in the $j$-th paper; two research directions are judged to be the same research direction if an inheritance relation, an equivalence relation or a set-operation relation exists between them;
Step 6: obtaining an author collaboration network graph from the author-paper incidence matrix. The author collaboration network graph comprises an author set and a paper set; authors are nodes and papers are ties, and the weight between two nodes is calculated as follows:
$$D(i,j,k)=\lvert S(i,k)-S(j,k)\rvert$$
where $D(i,j,k)$ is the difference between the weights of the $i$-th author and the $j$-th author in the $k$-th paper, and $S(i,k)$ and $S(j,k)$ are the weights of the $i$-th author and the $j$-th author in the $k$-th paper respectively;
Step 7: calculating the academic collaboration distance between authors from the author collaboration network graph, using the following formula:
$$L(i,j)=\sum_{k=0}^{N}\bigl(k\times S(k,k+1)\bigr)$$
where $L(i,j)$ is the academic collaboration distance between the authors corresponding to node $i$ and node $j$, $k$ is an intermediate node on the shortest path between node $i$ and node $j$ in the author collaboration network graph, and $N$ is the number of intermediate nodes;
Step 8: generating a hot research direction map from the OWL domain ontology, the author IDs, the research directions and their hotness. The hot research direction map is generated according to the following formula:
$$H(i)=\pi\times\sum_{k=0}^{n}\bigl(H(k)\times D(i,k)\bigr)^{2}$$
where $n$ is the number of authors working in the subclass research directions of the $i$-th research direction, $H(k)$ is the hotness of the $k$-th subclass research direction, $D(i,k)$ is the number of intermediate nodes on the shortest path between research direction $i$ and research direction $k$, $H(i)$ represents the number of authors in the $i$-th research direction, $D(i,0)=1$, and the research directions corresponding to the leaf nodes of the OWL ontology are the subclass research directions;
Step 9: generating an author academic reputation map from the author IDs and the author collaboration network graph. The author academic reputation map is a directed graph in which the first author is the disseminator node and the collaborators are recipient nodes. The author academic reputation is calculated as follows:
$$I(i)=\sum_{k=0}^{n}\bigl(I(k)\times A(i,k)\bigr)$$
where $I(i)$ is the reputation of the $i$-th author, $n$ is the number of collaborating authors of the $i$-th author, $k$ is the $k$-th collaborator of the $i$-th author, $A(i,k)$ is the distance between the $i$-th author and the $k$-th author, $I(0)$ represents the number of authors who collaborate directly with the $i$-th author, and $A(i,0)=1$.
In one example, the OWL domain ontology in step 1 comprises the inheritance, equivalence and set-operation relations between the terms of the field.
In one example, the author information in step 2 comprises the author's name, sex, year of birth, native place, professional title, research direction, paper title, journal title, publication time and affiliation; in step 3, the unique author ID comprises the author's name, year of birth, sex, native place, affiliation name and a random code.
The invention also provides a system that implements the above method, comprising an ETL module, a domain ontology, a unique identification module, an author-paper incidence matrix calculation module, an author academic growth roadmap generation module, an author collaboration network graph generation module, an academic collaboration distance generation module, a hot research direction map generation module and an author academic reputation map generation module;
the ETL module is used to extract author information from the academic journal papers in the target subject field, convert the format of the extracted author information and store it in the author information base;
the domain ontology is an OWL domain ontology established for the selected target subject field;
the unique identification module is used to calculate the unique author ID;
the author-paper incidence matrix calculation module is used to calculate the author-paper incidence matrix from the author IDs and paper IDs;
the author academic growth roadmap generation module is used to calculate, from the author-paper incidence matrix, the research directions and the time, each author's cumulative absolute number of published papers in the same research direction and to generate the author academic growth roadmap;
the author collaboration network graph generation module is used to obtain the author collaboration network graph from the author-paper incidence matrix;
the academic collaboration distance generation module is used to calculate the academic collaboration distance between authors from the author collaboration network graph;
the hot research direction map generation module is used to generate the hot research direction map from the OWL domain ontology, the author IDs, the research directions and their hotness;
the author academic reputation map generation module is used to generate the author academic reputation map from the author IDs and the author collaboration network graph.
In summary, the main advantages of the method are: 1) it overcomes the insufficient attention that traditional bibliometric and informetric methods pay to author profile information, proposes a method for mining the author profile information contained in academic papers, and thereby changes the data source of author information mining; 2) it introduces OWL domain ontology technology into the calculation of the academic collaboration distance between authors and of hot research directions, which improves the effect of semantic computation; 3) it proposes methods, based on author profile information, for calculating an author's unique identification code, a scholar's growth route, the academic collaboration distance and hot research directions, which broadens the research perspective of author information mining. Compared with the bibliometric and informetric methods described above, the method is therefore better suited to the needs of mining author information from academic papers.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the author profile information in an academic paper according to the present invention;
Fig. 2 is a schematic diagram of the basic steps of academic paper author information mining according to the present invention;
Fig. 3 is the E-R diagram of the academic paper author information mining system of the present invention;
Fig. 4 is a schematic diagram of the "author-paper incidence matrix" of the present invention;
Fig. 5 is a schematic diagram of the "author academic growth roadmap" of the present invention;
Fig. 6 is a schematic diagram of the "author collaboration network graph" of the present invention;
Fig. 7 is a schematic diagram of the "author academic collaboration distance matrix" of the present invention;
Fig. 8 is a schematic diagram of the "hot research direction map" of the present invention;
Fig. 9 is a schematic diagram of the "author academic reputation map" of the present invention;
Fig. 10 is a schematic diagram of the "academic paper author information mining system" of the present invention.
Detailed description of the embodiments
The method for mining author profile information from academic journal papers proposed by the present invention is shown in Fig. 2 and comprises the following steps:
Step (1): select a specific subject field as required and build a domain ontology using OWL. When building the domain ontology, the terms corresponding to the research directions of the field and the relations among them must be considered. The formal representation of the domain ontology must indicate the inheritance, equivalence and disjointness relations between classes; the inheritance, equivalence, transitive, symmetric, functional and inverse-functional relations between properties; the correspondence between properties and classes; the membership relation between classes and instances; and the set-operation relations between classes.
Step (2): extract author profile information from the academic journal papers of the specific field, including the author's name, sex, year of birth, native place, professional title, research direction, paper title, journal title, publication time and affiliation. Different items may be extracted from different positions: the author's name, sex, year of birth, native place, professional title and research direction are extracted from the author profile section of the paper, while the paper title, journal title, publication time and affiliation are each extracted from their corresponding positions.
Step (3): convert the format of the extracted author profile information and store it in the author information base. One or more information tables are designed to hold the author information. The extracted author name, sex, native place, professional title, research direction, paper title, journal title and affiliation are converted to the string type; the extracted year of birth and publication time are converted to the date type. After format conversion, the author information is placed in the corresponding information table.
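The format conversion in step (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names, the `convert_record` helper, the `YYYY-MM` input format for the publication time, and the choice of 1 January as the day for a bare year of birth are all assumptions introduced for the example.

```python
from datetime import date

# Hypothetical field names for the items that the text converts to strings.
STRING_FIELDS = (
    "name", "sex", "native_place", "title", "research_direction",
    "paper_title", "journal", "affiliation",
)


def convert_record(raw: dict) -> dict:
    """Normalize one extracted author record before loading it into a table."""
    # Text fields become trimmed strings; missing fields become "".
    record = {field: str(raw.get(field, "")).strip() for field in STRING_FIELDS}
    # Year of birth becomes a date (assumption: 1 January of that year).
    record["birth_year"] = date(int(raw["birth_year"]), 1, 1)
    # Publication time becomes a date (assumption: input looks like "2011-06").
    year, month = (int(part) for part in str(raw["published"]).split("-")[:2])
    record["published"] = date(year, month, 1)
    return record


if __name__ == "__main__":
    print(convert_record({"name": " Zhang San ", "birth_year": "1975",
                          "published": "2011-06"}))
```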
Step (4): calculate each author's unique identification code, which identifies the same author and distinguishes different authors. A function is applied to the name, year of birth, sex, native place, professional title, research direction and affiliation to obtain each author's unique identification code, which is stored in the author information table.
Step (5): with paper IDs as rows and author IDs as columns, calculate the "author-paper incidence matrix" $S_{m\times n}=(s_{ij})_{m\times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of paper records and the number of authors respectively, and $s_{ij}$ is the "author weight". The "author weight" $s_{ij}$ is determined by the author's rank in the byline of the corresponding paper. In what follows, unless stated otherwise, an author mentioned in a calculation means the author ID.
Step (6): from the "author-paper incidence matrix", with time on the x axis and the "cumulative absolute number of published papers" of the $i$-th author in research direction $z$ on the y axis, generate the "author academic growth roadmap" using the function $y=f_{subj}(x,z,i)$. The "cumulative absolute number of published papers" in research direction $z$ is determined by the number of published papers and the author's rank in each paper. Whether two research directions are the same is determined as follows: first, read the research direction at the time of publication from the database and map it to the OWL domain ontology; second, determine whether an inheritance relation (<rdfs:subClassOf>), an equivalence relation (<owl:equivalentClass>), a set-operation relation (<owl:disjointWith>, <owl:unionOf>, <owl:intersectionOf>, <owl:complementOf>) or an instance relation (<rdf:Description>, <rdf:type>) exists between the research directions; finally, if any of these relations exists, the two are regarded as the same research direction, and otherwise as different research directions.
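The same-direction test in step (6) could be sketched as below. This is an illustrative simplification: the ontology is represented as a plain list of `(subject, relation, object)` triples rather than being loaded from an OWL file, only direct relations are checked, and the names `SAME_DIRECTION_RELATIONS` and `same_direction` are hypothetical.

```python
# Relations that the text treats as linking two labels into one research direction.
SAME_DIRECTION_RELATIONS = {
    "rdfs:subClassOf", "owl:equivalentClass", "owl:disjointWith",
    "owl:unionOf", "owl:intersectionOf", "owl:complementOf", "rdf:type",
}


def same_direction(a: str, b: str, triples: list[tuple[str, str, str]]) -> bool:
    """Return True if a and b are identical or directly related in the ontology."""
    if a == b:
        return True
    return any(
        relation in SAME_DIRECTION_RELATIONS and {subject, obj} == {a, b}
        for subject, relation, obj in triples
    )


if __name__ == "__main__":
    ontology = [
        ("text mining", "rdfs:subClassOf", "data mining"),
        ("data mining", "rdfs:seeAlso", "databases"),
    ]
    print(same_direction("data mining", "text mining", ontology))
    print(same_direction("data mining", "databases", ontology))
```

Treating each relation as symmetric and ignoring indirect (multi-hop) relations is a design choice of this sketch; the text does not say whether transitive chains should also count.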
Step (7): generate the "author collaboration network graph". The "author collaboration network graph" is a weighted graph in which authors are actor nodes and papers are ties. It therefore contains two groups of information: one is the author set $N=\{n_1, n_2, \ldots, n_n\}$, where $N$ is the number of authors; the other is the paper set $L=\{l_1, l_2, \ldots, l_n\}$, where $L$ is the number of papers. The weight of each tie in the author collaboration network graph is the absolute value of the difference between the weights, in the paper represented by the tie, of the authors represented by the two nodes.
Step (8): calculate the "author academic collaboration distance" between authors. Based on the "author collaboration network graph", calculate the academic collaboration distance between authors and generate the "author academic collaboration distance matrix". The academic collaboration distance between two authors is determined by the number of nodes on the shortest path connecting them and by the weights of the edges.
Step (9): generate the "hot research direction map". Based on the OWL domain ontology, with researchers as nodes and research directions as ties, generate the "hot research direction map". The "hotness of a research direction" is determined by two variables: first, the number of authors working in the research direction and its subclass research directions; second, the distance between each subclass research direction and the research direction represented by the node. Whether a direction is the same research direction or one of its subclass directions is determined by mapping the research direction to the OWL domain ontology and checking whether <rdfs:subClassOf> or <owl:equivalentClass> exists in the domain ontology.
Step (10): calculate the author's academic reputation. With the first author as the disseminator node and the other collaborating authors of the same paper as recipient nodes, generate the "author academic reputation map". An author's academic reputation is determined by the number of authors who collaborate directly with that author and by the reputation of each collaborator.
The specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but do not limit its scope.
As shown in Fig. 2, academic paper author information mining requires the support of OWL domain ontology technology. A domain ontology must therefore be prepared before the author information of academic papers is analyzed. When building the OWL domain ontology, the tags <rdfs:subClassOf>, <owl:equivalentClass> and <owl:disjointWith> denote the inheritance, equivalence and disjointness relations between classes respectively; the tags <rdfs:subPropertyOf>, <owl:equivalentProperty> and <owl:inverseOf> denote the inheritance, equivalence and inverse relations between properties respectively; the tags <rdfs:domain> and <rdfs:range> denote the relations between properties and classes; the tags <rdf:Description> and <rdf:type> denote the relation between classes and instances; the markers owl:TransitiveProperty, owl:SymmetricProperty, owl:FunctionalProperty and owl:InverseFunctionalProperty denote the transitive, symmetric, functional and inverse-functional relations between properties respectively; and the tags <owl:unionOf>, <owl:intersectionOf> and <owl:complementOf> denote the set-operation relations.
As shown in Fig. 3, the extracted and converted author name, sex, year of birth, native place, professional title, research direction, paper title, journal title, publication time and affiliation are stored in ten relational tables: the author table, the paper table, the paper-author mapping table, the professional title table, the author-title mapping table, the department table, the author-department mapping table, the research direction table, the author-research direction mapping table and the journal table. The schemas of these ten tables are:
  • author (author ID, author name, date of birth, native place)
  • paper (paper ID, paper title, journal ID, publication date)
  • author-paper mapping (author ID, paper ID, author rank)
  • professional title (title ID, title name)
  • author-title mapping (title ID, author ID, paper ID)
  • department (department ID, department name, city, postcode)
  • author-department mapping (author ID, department ID, paper ID)
  • research direction (research direction ID, research direction name, paper ID, author ID, ontology URI)
  • author-research direction mapping (research direction ID, author ID, paper ID)
  • journal (journal title, ISBN, founding date)
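To make the table layout more concrete, the sketch below creates three of the ten tables (author, paper and the author-paper mapping) in an in-memory SQLite database. The English table and column names, the column types and the `create_store` helper are assumptions made for this illustration; the patent does not prescribe a particular database product or naming scheme.

```python
import sqlite3

# Hypothetical DDL for three of the ten tables described in the text.
SCHEMA = """
CREATE TABLE author (
    author_id    TEXT PRIMARY KEY,
    name         TEXT,
    birth_date   TEXT,
    native_place TEXT
);
CREATE TABLE paper (
    paper_id   TEXT PRIMARY KEY,
    title      TEXT,
    journal_id TEXT,
    published  TEXT
);
CREATE TABLE author_paper (
    author_id   TEXT REFERENCES author(author_id),
    paper_id    TEXT REFERENCES paper(paper_id),
    author_rank INTEGER,
    PRIMARY KEY (author_id, paper_id)
);
"""


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the example tables."""
    connection = sqlite3.connect(path)
    connection.executescript(SCHEMA)
    return connection


if __name__ == "__main__":
    store = create_store()
    store.execute("INSERT INTO author VALUES (?, ?, ?, ?)",
                  ("a1", "Zhang San", "1975-01-01", "Beijing"))
    print(store.execute("SELECT name FROM author").fetchall())
```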
The author's unique identification code is determined by the character strings of the name, year of birth, sex, native place and affiliation name, and is calculated as follows:
$$\mathrm{AID}(i)=\mathrm{StrConn}\bigl(\mathrm{NameStr}(N(i)),\ \mathrm{BirthStr}(Y(i)),\ \mathrm{SexStr}(S(i)),\ \mathrm{AffStr}(A(i)),\ \mathrm{Ram}(i)\bigr)$$
where $\mathrm{AID}(i)$ is the unique identification code of the $i$-th author; $N(i)$, $Y(i)$, $S(i)$ and $A(i)$ represent the $i$-th author's name, year of birth, sex, native place and affiliation name; the functions NameStr(), BirthStr(), SexStr() and AffStr() are hash functions of the author's name, date of birth, sex and affiliation respectively; and Ram() is a five-digit random code used to distinguish authors with the same name in the same institution.
As shown in Fig. 4, with paper IDs as rows and author IDs as columns, the "author-paper incidence matrix" $S_{m\times n}=(s_{ij})_{m\times n}$ is calculated, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of paper records and the number of authors respectively, and $s_{ij}$ is the "author weight". The "author weight" $s_{ij}$ is determined by the author's rank in the byline of the corresponding paper and is calculated as follows:
$$S(i,j)=\begin{cases}0, & n=0\\[4pt] \dfrac{1}{n}, & n>0\end{cases}$$
where $S(i,j)$ is the author weight of the $i$-th author in the $j$-th paper and $n$ is the rank of the $i$-th author in the byline of the $j$-th paper, $n = 1, 2, 3, \ldots, N$.
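The author weight and the incidence matrix can be illustrated with a short sketch. It assumes the input is a mapping from paper ID to the ordered list of author IDs in the byline, and it follows the "papers as rows, authors as columns" layout described in the text; the function names `author_weight` and `incidence_matrix` are hypothetical.

```python
def author_weight(rank: int) -> float:
    """Weight 1/n for an author at byline position n, and 0 for a non-author (n = 0)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return 0.0 if rank == 0 else 1.0 / rank


def incidence_matrix(
    bylines: dict[str, list[str]],
) -> tuple[list[str], list[str], list[list[float]]]:
    """Build the paper-by-author weight matrix from ordered bylines."""
    papers = sorted(bylines)
    authors = sorted({name for names in bylines.values() for name in names})
    column = {name: index for index, name in enumerate(authors)}
    matrix = [[0.0] * len(authors) for _ in papers]
    for row, paper in enumerate(papers):
        for rank, name in enumerate(bylines[paper], start=1):
            matrix[row][column[name]] = author_weight(rank)
    return papers, authors, matrix


if __name__ == "__main__":
    print(incidence_matrix({"p1": ["a", "b"], "p2": ["b"]}))
```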
As shown in Fig. 5, the "author academic growth roadmap" is a two-dimensional curve in which the x axis is time and the y axis is the "cumulative absolute number of published papers" of the $i$-th author in research direction $z$; the roadmap is generated automatically using the function $y=f_{subj}(x,z,i)$. The cumulative absolute number $y$ of papers published by the $i$-th author in research direction $z$ is calculated as follows:
$$y=f_{subj}(x,z,i)=\sum_{j=0}^{N} S(i,j,z)$$
where $N$ is the total number of papers published by the $i$-th author in research direction $z$ and $S(i,j,z)$ is the "author weight" of the $i$-th author in the $j$-th paper. Whether two research directions are the same is determined as follows: first, read the research direction at the time of publication from the database and map it to the OWL domain ontology; second, determine whether an inheritance relation (<rdfs:subClassOf>), an equivalence relation (<owl:equivalentClass>), a set-operation relation (<owl:disjointWith>, <owl:unionOf>, <owl:intersectionOf>, <owl:complementOf>) or an instance relation (<rdf:Description>, <rdf:type>) exists between the research directions; finally, if any of these relations exists, the two are regarded as the same research direction, and otherwise as different research directions.
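A minimal sketch of the growth curve follows. It assumes that one author's papers are given as `(year, direction, rank)` tuples and that the set of direction labels judged to be the same research direction has already been determined from the ontology; `growth_curve` and its parameter names are hypothetical.

```python
def growth_curve(
    papers: list[tuple[int, str, int]],
    same_direction: set[str],
) -> list[tuple[int, float]]:
    """Cumulative sum of author weights (1 / rank) per year for one research direction."""
    per_year: dict[int, float] = {}
    for year, direction, rank in papers:
        # Only papers in the chosen direction where the author appears (rank > 0) count.
        if direction in same_direction and rank > 0:
            per_year[year] = per_year.get(year, 0.0) + 1.0 / rank
    total = 0.0
    curve = []
    for year in sorted(per_year):
        total += per_year[year]
        curve.append((year, total))
    return curve


if __name__ == "__main__":
    history = [
        (2009, "data mining", 1),
        (2010, "text mining", 2),
        (2010, "databases", 1),
        (2011, "data mining", 4),
    ]
    print(growth_curve(history, {"data mining", "text mining"}))
```

Each returned pair is one point of the curve: the year on the x axis and the cumulative weighted paper count on the y axis.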
As shown in Fig. 6, the "author collaboration network graph" is a weighted graph in which authors are actor nodes and papers are ties. It contains two groups of information: one is the author set $N=\{n_1, n_2, \ldots, n_n\}$, where $N$ is the number of authors; the other is the paper set $L=\{l_1, l_2, \ldots, l_n\}$, where $L$ is the number of papers. The weight in this weighted graph is the absolute value of the difference between the weights, in the paper represented by the tie, of the authors represented by the two nodes, and is calculated as follows:
$$D(i,j,k)=\lvert S(i,k)-S(j,k)\rvert$$
where $D(i,j,k)$ is the difference between the weights of the $i$-th author and the $j$-th author in the $k$-th paper, and $S(i,k)$ and $S(j,k)$ are the weights of the $i$-th author and the $j$-th author in the $k$-th paper respectively.
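The tie weights can be computed directly from the bylines, as in the following sketch. It assumes the same input as before (paper ID mapped to an ordered list of distinct author IDs); the function name `tie_weights` and the `(author, author, paper)` key layout are choices made for this example.

```python
from itertools import combinations


def tie_weights(bylines: dict[str, list[str]]) -> dict[tuple[str, str, str], float]:
    """Return D(i, j, k) = |S(i, k) - S(j, k)| for every co-author pair of every paper."""
    edges = {}
    for paper, names in bylines.items():
        # Author weight is 1 / rank, with rank counted from 1 in byline order.
        weight = {name: 1.0 / rank for rank, name in enumerate(names, start=1)}
        for first, second in combinations(sorted(weight), 2):
            edges[(first, second, paper)] = abs(weight[first] - weight[second])
    return edges


if __name__ == "__main__":
    for edge, value in tie_weights({"p1": ["a", "b", "c"]}).items():
        print(edge, round(value, 4))
```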
As shown in Fig. 7, the rows and columns of the "author academic collaboration distance matrix" are author IDs, and each element is an academic collaboration distance. The academic collaboration distance between two authors is determined by the number of nodes on the shortest path connecting them and by the weights of the edges. It is calculated as follows:
$$L(i,j)=\sum_{k=0}^{N}\bigl(k\times S(k,k+1)\bigr)$$
where $k$ is an intermediate node on the shortest path between node $i$ and node $j$, and $N$ is the number of intermediate nodes.
As shown in Fig. 8, the "hot research direction map" is based on the OWL domain ontology, with research directions as nodes and the semantic relations between research directions as ties. The "hotness of a research direction" is determined by two variables: first, the number of authors working in the research direction and its subclass research directions; second, the distance between each subclass research direction and the research direction represented by the node. Once the hotness has been calculated, it is used as the independent variable that sets the node size, and the hot research direction map is generated. The hotness of a research direction is calculated as follows:
$$H(i)=\pi\times\sum_{k=0}^{n}\bigl(H(k)\times D(i,k)\bigr)^{2}$$
where $n$ is the number of authors working in the subclass research directions of the $i$-th research direction, $H(k)$ is the "hotness of the research direction" of the $k$-th subclass, $D(i,k)$ is the number of intermediate nodes on the shortest path between research direction $i$ and research direction $k$, $H(i)$ represents the number of authors in the $i$-th research direction, and $D(i,0)=1$. Whether a direction is a subclass research direction is determined as follows: first, read the research direction at the time of publication from the database and map it to the OWL domain ontology; second, determine whether an inheritance relation (<rdfs:subClassOf>) exists between the research directions; third, if an inheritance relation exists, the direction is regarded as a subclass research direction, and otherwise the relation is not a subclass relation; then, if a subclass research direction exists, further determine whether it in turn has narrower subclass directions, and so on, down to the research directions corresponding to the leaf nodes of the OWL ontology.
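One way to evaluate the hotness formula for a single direction is sketched below. It rests on an interpretation that the text does not spell out: the $k=0$ term is taken to be the direction's own author count with $D(i,0)=1$, and the remaining terms are supplied by the caller as `(hotness, distance)` pairs for the subclass directions. The function name `hotness` and this input layout are assumptions.

```python
import math


def hotness(own_authors: int, subclasses: list[tuple[float, int]]) -> float:
    """Evaluate pi * sum((H(k) * D(i, k)) ** 2) for one research direction.

    own_authors -- author count of the direction itself (the k = 0 term, D = 1)
    subclasses  -- (H(k), D(i, k)) pairs for each subclass research direction
    """
    total = float((own_authors * 1) ** 2)
    total += sum((h_k * d_ik) ** 2 for h_k, d_ik in subclasses)
    return math.pi * total


if __name__ == "__main__":
    leaf = hotness(2, [])               # a leaf direction with two authors
    parent = hotness(1, [(leaf, 1)])    # a parent that has one such subclass
    print(round(leaf, 3), round(parent, 3))
```

Evaluating leaves first and passing their values upward mirrors the leaf-to-root traversal of the ontology that the text describes.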
As shown in Fig. 9, the "author academic reputation map" is a directed graph in which the first author is the disseminator node and the other collaborators are recipient nodes. The "author reputation" is determined by the number of authors who collaborate directly with the author and by the reputation of each collaborator, and is calculated as follows:
$$I(i)=\sum_{k=0}^{n}\bigl(I(k)\times A(i,k)\bigr)$$
where $I(i)$ is the reputation of the $i$-th author, $n$ is the number of collaborating authors of the $i$-th author, $k$ is the $k$-th collaborator of the $i$-th author, and $A(i,k)$ is the distance between the $i$-th author and the $k$-th author. $I(0)$ represents the number of authors who collaborate directly with the $i$-th author, and $A(i,0)=1$.
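The reputation formula can be evaluated for one author as in the sketch below, given the collaborators' reputation values and distances. Because every author's reputation depends on the reputation of others, a full computation needs some evaluation order or iteration scheme that the text does not specify; this example therefore takes the collaborators' values as inputs. The function name `reputation` and the pair layout are assumptions.

```python
def reputation(direct_collaborators: int,
               collaborators: list[tuple[float, float]]) -> float:
    """Evaluate I(i) = sum(I(k) * A(i, k)) for one author.

    direct_collaborators -- I(0), the number of direct collaborators (A(i, 0) = 1)
    collaborators        -- (I(k), A(i, k)) pairs for each collaborator k >= 1
    """
    base = direct_collaborators * 1.0
    return base + sum(i_k * a_ik for i_k, a_ik in collaborators)


if __name__ == "__main__":
    # Two direct collaborators with reputations 3 and 1 at distances 1 and 2.
    print(reputation(2, [(3.0, 1.0), (1.0, 2.0)]))
```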
The system of the present invention, shown in Fig. 10, comprises an ETL module, a domain ontology, a unique identification module, an author-paper incidence matrix calculation module, an author academic growth roadmap generation module, an author collaboration network graph generation module, an academic collaboration distance generation module, a hot research direction map generation module and an author academic reputation map generation module;
the data extraction, transformation and loading (ETL) module is used to extract author information from the academic journal papers in the target subject field, convert the format of the extracted author information and store it in the author information base;
the domain ontology is an OWL domain ontology established for the selected target subject field;
the unique identification module is used to calculate the unique author ID;
the author-paper incidence matrix calculation module is used to calculate the author-paper incidence matrix from the author IDs and paper IDs;
the author academic growth roadmap generation module is used to calculate, from the author-paper incidence matrix, the research directions and the time, each author's cumulative absolute number of published papers in the same research direction and to generate the author academic growth roadmap;
the author collaboration network graph generation module is used to obtain the author collaboration network graph from the author-paper incidence matrix;
the academic collaboration distance generation module is used to calculate the academic collaboration distance between authors from the author collaboration network graph;
the hot research direction map generation module is used to generate the hot research direction map from the OWL domain ontology, the author IDs, the research directions and their hotness;
the author academic reputation map generation module is used to generate the author academic reputation map from the author IDs and the author collaboration network graph.
The foregoing is merely a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or variation that a person skilled in the art could readily make within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (3)

1. An academic journal paper author information mining method, characterized by comprising:
Step 1: selecting a target subject field and establishing an OWL domain ontology;
Step 2: extracting author information from the academic journal papers in the target subject field;
Step 3: performing format conversion on the extracted author information, storing it in an author information database, and calculating a unique author ID;
Step 4: calculating the author-paper incidence matrix from the author IDs and paper IDs, the author-paper incidence matrix being represented as $S_{m \times n} = (s_{ij})_{m \times n}$, where i and j are the paper ID and the author ID respectively, m and n are the number of paper records and the number of authors respectively, and $s_{ij}$ is the author weight, calculated as follows:
$$S(i,j) = \begin{cases} 0, & n = 0 \\ \dfrac{1}{n}, & n > 0 \end{cases}$$ where S(i, j) is the author weight of the i-th author in the j-th paper and n is the rank of the i-th author in the author order of the j-th paper, n = 1, 2, 3, ..., N;
Step 5: calculating, from the author-paper incidence matrix, the research direction, and time, the author's cumulative absolute number of published papers in the same research direction, and generating the author academic growth route map, wherein the cumulative absolute number of published papers y of the i-th author in research direction z is calculated as follows:
in the formula, N is the total number of papers published by the i-th author in research direction z, and S(i, j, z) is the author weight of the i-th author in the j-th paper; two research directions are judged to be the same research direction if an inheritance relation, an equivalence relation, or a set operation relation exists between them;
Step 6: obtaining the author cooperation network graph from the author-paper incidence matrix, the author cooperation network graph comprising an author set and a paper set, with authors as nodes and papers as links, the weight between two nodes being calculated as follows:
D(i, j, k) = |S(i, k) - S(j, k)|;
where D(i, j, k) is the difference between the weights of the i-th author and the j-th author in the k-th paper, and S(i, k) and S(j, k) are the weights of the i-th author and the j-th author in the k-th paper respectively;
Step 7: calculating the academic cooperation distance between authors from the author cooperation network graph, the academic cooperation distance between authors being calculated as follows:
$$L(i,j) = \sum_{k=0}^{N} \bigl( k \times S(k, k+1) \bigr)$$ where L(i, j) is the academic cooperation distance between the authors corresponding to node i and node j, k is an intermediate node on the shortest path between node i and node j in the author cooperation network graph, and N is the number of intermediate nodes;
Step 8: generating the hot research direction map from the OWL domain ontology, the author IDs, and the research directions, the hot research direction map being generated according to the following formula:
$$H(i) = \pi \times \sum_{k=0}^{n} \bigl( H(k) \times D(i,k) \bigr)^{2}$$
where n is the number of authors working in the subclass research directions of the i-th research direction, H(k) is the hotness of the k-th subclass research direction, D(i, k) is the number of intermediate nodes on the shortest path between research direction i and research direction k, H(i) represents the number of authors in the i-th research direction, and D(i, 0) = 1; the research directions corresponding to leaf nodes in the OWL ontology are subclass research directions;
Step 9: generating the author academic reputation map from the author IDs and the author cooperation network graph, the author academic reputation map being a directed graph in which the first author is the disseminator node and the collaborators are recipient nodes, the author's academic reputation being calculated as follows:
$$I(i) = \sum_{k=0}^{n} \bigl( I(k) \times A(i,k) \bigr)$$
where I(i) is the popularity of the i-th author, n is the number of collaborators of the i-th author, k denotes the k-th collaborator of the i-th author, A(i, k) is the distance between the i-th author and the k-th author, I(0) represents the number of authors who cooperate directly with the i-th author, and A(i, 0) = 1.
2. the method for claim 1, is characterized in that, in step 1, OWL domain body comprises inheritance, identity relation and the set operation relation between field term.
3. method as claimed in claim 2, is characterized in that, in step 2, author information comprises author's name, sex, year of birth, native place, academic title, research direction, Article Titles, periodical title, delivers time and author unit one belongs to; In step 3, unique author ID comprises author's name, year of birth, sex, native place, unit one belongs to's title and random code.
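
By way of non-limiting illustration only, the author weight, incidence matrix, and pairwise weight difference recited in steps 4 to 6 can be sketched in Python as follows. All function names and the dictionary-based data layout are hypothetical and are not part of the claims. The formula for the cumulative count y in step 5 is not reproduced in this text, so the sketch assumes that y is the sum of the author's weights over the papers in one research direction; this is an assumption, not a statement of the claimed formula. The sketch also assumes that an author appears at most once in a paper's author list.

```python
from typing import Dict, Iterable, List


def author_weight(rank: int) -> float:
    """Step 4 weight: 0 when rank is 0 (not an author), otherwise 1/rank."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return 0.0 if rank == 0 else 1.0 / rank


def incidence_matrix(papers: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Build S[paper_id][author_id] from ordered author lists.

    `papers` maps a paper ID to its author IDs in byline order, so the
    first author has rank 1, the second rank 2, and so on.
    """
    return {
        paper_id: {author: author_weight(rank)
                   for rank, author in enumerate(authors, start=1)}
        for paper_id, authors in papers.items()
    }


def weight(S: Dict[str, Dict[str, float]], paper_id: str, author: str) -> float:
    """Look up one matrix entry; authors absent from a paper have weight 0."""
    return S.get(paper_id, {}).get(author, 0.0)


def cumulative_output(S: Dict[str, Dict[str, float]], author: str,
                      papers_in_direction: Iterable[str]) -> float:
    """ASSUMED form of the step 5 count: sum of the author's weights over
    the papers that belong to one research direction."""
    return sum(weight(S, paper_id, author) for paper_id in papers_in_direction)


def weight_difference(S: Dict[str, Dict[str, float]],
                      author_a: str, author_b: str, paper_id: str) -> float:
    """Step 6: D(i, j, k) = |S(i, k) - S(j, k)| for one paper k."""
    return abs(weight(S, paper_id, author_a) - weight(S, paper_id, author_b))


# Hypothetical data: two papers with ordered author lists.
papers = {"p1": ["a", "b"], "p2": ["b", "a", "c"]}
S = incidence_matrix(papers)
```

A nested dictionary is used here instead of a dense m × n array because most authors contribute to only a few papers; missing entries are treated as weight 0, which corresponds to the n = 0 case of the weight formula.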
CN201210072645.7A 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors Active CN102609546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210072645.7A CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201110408020 2011-12-08
CN201110408020.9 2011-12-08
CN201210072645.7A CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Publications (2)

Publication Number Publication Date
CN102609546A CN102609546A (en) 2012-07-25
CN102609546B true CN102609546B (en) 2014-11-05

Family

ID=46526918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210072645.7A Active CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Country Status (1)

Country Link
CN (1) CN102609546B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359249A (en) * 2018-09-29 2019-02-19 清华大学 The scholar's precise positioning method and device excavated based on scholar's scientific achievement

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020302B (en) * 2012-12-31 2016-03-02 中国科学院自动化研究所 Academic Core Authors based on complex network excavates and relevant information abstracting method and system
CN104156437A (en) * 2014-08-13 2014-11-19 中科嘉速(北京)并行软件有限公司 Academic relationship network construction method based on paper author information extraction and relationship weight model
CN105653590B (en) * 2015-12-21 2019-03-26 青岛智能产业技术研究院 A kind of method that Chinese literature author duplication of name disambiguates
CN105701258A (en) * 2016-03-31 2016-06-22 比美特医护在线(北京)科技有限公司 Information processing method and device
CN106227835B (en) * 2016-07-25 2018-01-19 中南大学 Team's research direction method for digging based on two subnetwork figure hierarchical clusterings
CN106886571A (en) * 2017-01-18 2017-06-23 大连理工大学 A kind of Forecasting Methodology of the scientific cooperation sustainability based on social network analysis
WO2019070925A1 (en) * 2017-10-06 2019-04-11 Elsevier, Inc. Systems and methods for providing recommendations for academic and research entities
CN108510205B (en) * 2018-04-08 2021-07-16 大连理工大学 Author skill evaluation method based on hypergraph
CN108959543A (en) * 2018-07-02 2018-12-07 吉林大学 A kind of scientific cooperation author network partitioning method
CN109376236B (en) * 2018-07-27 2021-10-26 中山大学 Academic paper author weight analysis method based on cluster analysis
CN109741791B (en) * 2018-12-29 2020-10-23 人和未来生物科技(长沙)有限公司 Author subject direction data mining method and system for PubMed theory library
CN110941662A (en) * 2019-06-24 2020-03-31 上海市研发公共服务平台管理中心 Graphical method, system, storage medium and terminal for scientific research cooperative relationship
CN110704643B (en) * 2019-08-23 2022-07-26 上海科技发展有限公司 Method and device for automatically identifying same author of different documents and storage medium terminal
CN111488424A (en) * 2020-03-27 2020-08-04 中国科学院计算技术研究所 Method and system for discovering and tracking people in specific academic field
CN111538917B (en) * 2020-04-20 2022-08-26 清华大学 Learner migration route construction method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320370A (en) * 2008-05-16 2008-12-10 崔志明 Deep layer web page data source sort management method based on query interface connection drawing
CN101685455A (en) * 2008-09-28 2010-03-31 华为技术有限公司 Method and system of data retrieval

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101126028B1 (en) * 2004-05-04 2012-07-12 더 보스턴 컨설팅 그룹, 인코포레이티드 Method and apparatus for selecting, analyzing and visualizing related database records as a network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359249A (en) * 2018-09-29 2019-02-19 清华大学 The scholar's precise positioning method and device excavated based on scholar's scientific achievement
CN109359249B (en) * 2018-09-29 2020-07-10 清华大学 Precise student positioning method and device based on student scientific research result mining

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant