CN102609546B - Method and system for excavating information of academic journal paper authors - Google Patents

Method and system for excavating information of academic journal paper authors

Publication number
CN102609546B
Authority
CN
Grant status
Grant
Application number
CN 201210072645
Other languages
Chinese (zh)
Other versions
CN102609546A (en )
Inventor
朝乐门
张勇
邢春晓
孙一钢
朱先忠
Original Assignee
清华大学
国家图书馆

Abstract

The present invention discloses a method and system for mining author information from academic journal papers. In the method, a target subject area is first selected and an OWL domain ontology is established. Second, author information is extracted from academic journal papers within the target subject area. Third, the extracted author information is format-converted and stored in an author information database, and a unique author ID is computed. Finally, this information is used to obtain the author-paper association matrix, the author academic growth roadmap, the author's collaborator network graph, the academic collaboration distance between authors, the hot research direction map, and the author academic reputation map. The invention changes the data source used for author information mining and introduces OWL domain ontology technology into the computation of academic collaboration distance and hot research directions, which improves the effect of semantic computation.

Description

A method and system for mining author information from academic journal papers

Technical Field

[0001] The present invention relates to the field of knowledge engineering, and in particular to a method and system for mining author information from academic journal papers.

Background

[0002] Author information in an academic journal paper refers to the basic information about an author given in a formally published journal paper, such as name, gender, year of birth, place of origin, professional title, and research direction. It generally appears in a footnote on the first page of the paper or in an endnote at the end of the paper, as shown in Figure 1. Compared with books, author information in academic journal papers is brief in content, fixed in format, and standardized in wording.

[0003] Analysis of the quantitative relationship between authors and documents is an information analysis method whose purpose is to reveal the relationship between authors and the number of documents and to describe the scientific productivity of authors. A representative result is Lotka's Law: the relationship between the number of authors and the number of papers follows an inverse-square law, $F(x) = C/x^2$, where $x$ is the number of papers, $F(x)$ is the proportion of all authors who wrote $x$ papers, and $C$ is a constant. Building on Lotka's Law, 非拉奇 and other scholars proposed two factors that affect the Lotka distribution: first, the era or environment in which a researcher works directly affects the results; second, the number of authors in the statistical sample is related to the results. The advantage of this kind of analysis is that it reveals the relationship between author frequency and number of papers well. The disadvantage is that it does not analyze other author information, such as year of birth, place of origin, professional title, and research direction.
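As an illustration of the inverse-square relationship in Lotka's Law, the following minimal Python sketch evaluates $F(x) = C/x^2$ for small $x$. The function name `lotka_proportion` and the constant value 0.6 are assumptions chosen only for illustration; the source does not give a value for $C$.

```python
def lotka_proportion(x: int, c: float) -> float:
    """Proportion of authors who wrote exactly x papers: F(x) = C / x^2."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return c / (x * x)


# Illustrative constant only; the patent text does not specify C.
C = 0.6

# Proportion of authors with 1, 2, 3, and 4 papers under this assumption.
proportions = {x: lotka_proportion(x, C) for x in range(1, 5)}
```

Under this assumed constant, the share of authors with two papers is one quarter of the share with one paper, which is the inverse-square behavior the paragraph describes.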

[0004] Price used the distribution of the number of collaborations per author to study collaboration and obtained the following equation:

Figure CN102609546BD00041

[0006] where $n(x)$ is the number of authors who wrote $x$ papers; $I = n_{max}$ is the total number of papers by the most prolific author in the field; $N$ is the total number of authors; and $M = 0.749\,(n_{max})^{0.5}$. Building on Price's work, scholars proposed formulas for the degree of collaboration and the collaboration rate, as follows:

[0007]

Figure CN102609546BD00042

[0009] Although each of the above methods has its own strengths and weaknesses, and each has been applied successfully in different situations, they cannot meet the special needs of mining author-profile information from academic papers. First, the content of author-profile information in academic journal papers is distinctive. Second, its position in the paper is distinctive. Third, its format is distinctive. Finally, its wording is distinctive.

Summary of the Invention

[0010] In view of the above problems in the prior art, the present invention provides a method and system for mining author information from academic journal papers.

[0011] The present invention provides a method for mining author information from academic journal papers, comprising:

[0012] Step 1: select a target subject area and establish an OWL domain ontology;

[0013] Step 2: extract author information from academic journal papers within the target subject area;

[0014] Step 3: convert the format of the extracted author information, store it in an author information database, and compute a unique author ID;

[0015] Step 4: compute the author-paper association matrix from the author IDs and paper IDs. The matrix is written $S_{m \times n} = (s_{ij})_{m \times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of papers and the number of authors respectively, and $s_{ij}$ is the author weight, computed as follows:

[0016] $$S(i,j) = \begin{cases} 0, & n = 0 \\ \dfrac{1}{n}, & n > 0 \end{cases}$$

where $S(i,j)$ is the author weight of the $i$-th author in the $j$-th paper, and $n$ is the rank of the $i$-th author in the author list of the $j$-th paper, $n = 1, 2, 3, \ldots, N$;
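The author weight and the association matrix of Step 4 can be sketched as follows. This is a minimal illustration, assuming the weight is 0 for someone who is not an author of the paper and $1/n$ for the author ranked $n$ in the byline. The names `author_weight` and `association_matrix` and the sample bylines are hypothetical and do not come from the patent.

```python
def author_weight(rank: int) -> float:
    """Return 0 for rank 0 (not an author of the paper), else 1 / rank."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return 0.0 if rank == 0 else 1.0 / rank


def association_matrix(bylines, author_ids):
    """Build a matrix with one row per paper and one column per author.

    bylines: list of author-ID lists, each in byline order.
    author_ids: column order of the matrix.
    """
    matrix = []
    for byline in bylines:
        # Map each author on this paper to a 1-based rank.
        ranks = {author: pos for pos, author in enumerate(byline, start=1)}
        matrix.append([author_weight(ranks.get(a, 0)) for a in author_ids])
    return matrix


# Hypothetical data: two papers, three authors.
bylines = [["A1", "A2", "A3"], ["A2", "A1"]]
authors = ["A1", "A2", "A3"]
S = association_matrix(bylines, authors)
```

In this sketch a first author always receives weight 1 and later authors receive progressively smaller weights, so the matrix encodes both co-authorship and byline order.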

[0017] Step 5: from the author-paper association matrix, the research directions, and the years, compute each author's cumulative absolute number of published papers in the same research direction and generate the author academic growth roadmap. The cumulative absolute number $y$ of papers published by the $i$-th author in research direction $z$ is computed as follows:

[0018] where $N$ is the total number of papers published by the $i$-th author in research direction $z$, and $S(i,j,z)$ is the author weight of the $i$-th author in the $j$-th paper. Two research directions are judged to be the same research direction if an inheritance relation, an equivalence relation, or a set-operation relation exists between them;
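The formula for $y$ is not reproduced in this text, so the following Python sketch rests on an assumption: that the cumulative value for a given year is the running sum of the author weights $S(i,j,z)$ over the author's papers in direction $z$ published up to that year. The function name `growth_roadmap` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def growth_roadmap(records):
    """Return [(year, cumulative_weight)] sorted by year.

    records: iterable of (year, author_weight) pairs for ONE author in ONE
    research direction. Summing the weights is an assumption, not the
    patent's stated formula.
    """
    per_year = defaultdict(float)
    for year, weight in records:
        per_year[year] += weight
    years = sorted(per_year)
    totals = accumulate(per_year[y] for y in years)
    return list(zip(years, totals))


# Hypothetical records: (publication year, author weight in that paper).
records = [(2009, 1.0), (2010, 0.5), (2010, 1.0), (2012, 0.5)]
roadmap = growth_roadmap(records)
```

Plotting the resulting pairs with the year on the x-axis gives a curve of the kind the growth roadmap describes.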

[0019] Step 6: obtain the author's collaborator network graph from the author-paper association matrix. The author collaboration network graph comprises an author set and a paper set, with authors as nodes and papers as ties. The weight between two nodes is computed as follows:

[0020] $D(i,j,k) = |S(i,k) - S(j,k)|$;

[0021] where $D(i,j,k)$ is the difference between the weights of the $i$-th author and the $j$-th author in the $k$-th paper, and $S(i,k)$ and $S(j,k)$ are the weights of the $i$-th author and the $j$-th author in the $k$-th paper respectively;
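The tie weight $D(i,j,k)$ can be sketched for a single paper as follows. This is an illustration only, assuming the author weight is the reciprocal of the byline rank. The function name `tie_weights` and the sample byline are hypothetical.

```python
from itertools import combinations


def tie_weights(byline):
    """Return {(author_i, author_j): |S_i - S_j|} for one paper.

    byline: list of author IDs in byline order. The weight 1 / rank is an
    assumption about how author weights are derived from rank.
    """
    weights = {author: 1.0 / rank for rank, author in enumerate(byline, start=1)}
    return {
        (a, b): abs(weights[a] - weights[b])
        for a, b in combinations(byline, 2)
    }


# Hypothetical paper with three authors.
ties = tie_weights(["A1", "A2", "A3"])
```

Note that under this definition two authors with adjacent late ranks are tied by a small weight, while a first author and a last author are tied by a large one.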

[0022] Step 7: compute the academic collaboration distance between authors from the author's collaborator network graph, using the following formula:

$L(i,j) = \sum_{k=0}^{N} \bigl(k \times S(k, k+1)\bigr)$, where $L(i,j)$ is the academic collaboration distance between the authors corresponding to node $i$ and node $j$, $k$ is an intermediate node on the shortest path between node $i$ and node $j$ in the author collaboration network graph, and $N$ is the number of intermediate nodes;
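One possible reading of the distance formula is sketched below. It assumes that the shortest path is the one with the fewest hops, that positions along the path are numbered from 0 at node $i$, and that $S(k, k+1)$ is the tie weight between the nodes at positions $k$ and $k+1$. The names `shortest_path` and `collaboration_distance` and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; graph maps node -> {neighbor: tie weight}."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no path between the two authors


def collaboration_distance(graph, start, goal):
    """Sum k * S(k, k+1) over consecutive positions k on the shortest path."""
    path = shortest_path(graph, start, goal)
    if path is None:
        return None
    return sum(k * graph[path[k]][path[k + 1]] for k in range(len(path) - 1))


# Hypothetical collaboration graph with symmetric tie weights.
graph = {
    "A": {"B": 0.5},
    "B": {"A": 0.5, "C": 0.25},
    "C": {"B": 0.25},
}
distance_a_c = collaboration_distance(graph, "A", "C")
```

Under this literal reading the term for $k = 0$ vanishes, so two directly connected authors have distance 0 and only ties further along the path contribute.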

[0024] Step 8: generate the hot research direction map from the OWL domain ontology, the author IDs, the research directions, and their degree of hotness. The hot research direction map is generated according to the following formula:

[0025] $H(i) = \ldots$ [equation not legible in the source text]

[0026] where $n$ is the number of authors working in the subclass research directions of the $i$-th research direction, $H(k)$ is the degree of hotness of the $k$-th subclass research direction, $D(i,k)$ is the number of intermediate nodes on the shortest path between research direction $i$ and research direction $k$, $H(i)$ is the number of authors in the $i$-th research direction, $D(i,0) = 1$, and the research directions corresponding to leaf nodes in the OWL ontology are subclass research directions;

[0027] Step 9: generate the author academic reputation map from the author IDs and the author's collaborator network graph. The author academic reputation map is a directed graph with first authors as disseminator nodes and collaborators as recipient nodes. Author academic reputation is computed as follows:

[0028] $I(i) = \ldots$ [equation not legible in the source text]

[0029] where $I(i)$ is the reputation of the $i$-th author, $n$ is the number of co-authors of the $i$-th author, $k$ denotes the $k$-th collaborator of the $i$-th author, $A(i,k)$ is the distance between the $i$-th author and the $k$-th author, $I(0)$ is the number of authors who collaborate directly with the $i$-th author, and $A(i,0) = 1$.

[0030] In one example, in Step 1, the OWL domain ontology contains the inheritance relations, equivalence relations, and set-operation relations between domain terms.

[0031] In one example, in Step 2, the author information includes the author's name, gender, year of birth, place of origin, professional title, research direction, paper title, journal name, publication time, and affiliation; in Step 3, the unique author ID is composed of the author's name, year of birth, gender, place of origin, affiliation name, and a random code.

[0032] The present invention provides a system implementing the above method, comprising an ETL module, a domain ontology, a unique identification module, an author-paper association matrix computation module, an author academic growth roadmap generation module, an author collaboration network graph generation module, an academic collaboration distance generation module, a hot research direction map generation module, and an author academic reputation map generation module;

[0033] the ETL module is configured to extract author information from academic journal papers within the target subject area, convert the format of the extracted author information, and store it in the author information database;

[0034] the domain ontology is the OWL domain ontology established for the selected target subject area;

[0035] the unique identification module is configured to compute the unique author ID;

[0036] the author-paper association matrix computation module is configured to compute the author-paper association matrix from the author IDs and paper IDs;

[0037] the author academic growth roadmap generation module is configured to compute, from the author-paper association matrix, the research directions, and the years, each author's cumulative absolute number of published papers in the same research direction and to generate the author academic growth roadmap;

[0038] the author collaboration network graph generation module is configured to obtain the author's collaborator network graph from the author-paper association matrix;

[0039] the academic collaboration distance generation module is configured to compute the academic collaboration distance between authors from the author's collaborator network graph;

[0040] the hot research direction map generation module is configured to generate the hot research direction map from the OWL domain ontology, the author IDs, the research directions, and their degree of hotness;

[0041] the author academic reputation map generation module is configured to generate the author academic reputation map from the author IDs and the author's collaborator network graph.

[0042] In summary, the main advantages of the method are as follows. 1) It moves beyond the insufficient attention that traditional bibliometric and informetric methods pay to author-profile information, proposes an information mining method oriented to the author profiles in academic papers, and changes the data source used for author information mining. 2) It introduces OWL domain ontology technology into the computation of author academic collaboration distance and hot research directions, which improves the effect of semantic computation. 3) It proposes methods, based on author-profile information, for computing the unique author identifier, the scholar growth roadmap, the academic collaboration distance, and hot research directions, which broadens the research perspective of author information mining. Compared with the bibliometric and informetric methods described above, the method can therefore better meet the needs of mining author information from academic papers.

Brief Description of the Drawings

[0043] The present invention is described in further detail below with reference to the accompanying drawings, in which:

[0044] Figure 1 is a schematic diagram of the author-profile information in an academic paper according to the invention;

[0045] Figure 2 is a schematic diagram of the basic steps of academic paper author information mining according to the invention;

[0046] Figure 3 is an ER diagram of the academic paper author information mining system according to the invention;

[0047] Figure 4 is a schematic diagram of the "author-paper association matrix" according to the invention;

[0048] Figure 5 is a schematic diagram of the "author academic growth roadmap" according to the invention;

[0049] Figure 6 is a schematic diagram of the "author collaboration network graph" according to the invention;

[0050] Figure 7 is a schematic diagram of the "author academic collaboration distance matrix" according to the invention;

[0051] Figure 8 is a schematic diagram of the "hot research direction map" according to the invention;

[0052] Figure 9 is a schematic diagram of the "author academic reputation map" according to the invention;

[0053] Figure 10 is a schematic diagram of the "academic paper author information mining system" according to the invention.

Detailed Description

[0054] The present invention proposes a method for mining author-profile information from academic journal papers. As shown in Figure 2, the method comprises the following steps:

[0055] Step (1): select a specific subject area as required and build a domain ontology using OWL. When building the domain ontology, the terms corresponding to the research directions of the field and their interrelationships must be considered. The formal representation of the domain ontology must specify the inheritance, equivalence, and cross relations between classes (or properties); the membership relations between properties and classes; the correspondence between classes and instances; the transitive, symmetric, functional, and inverse-functional relations among properties; and the set-operation relations among classes.

[0056] Step (2): extract author-profile information from academic journal papers in the specific field, including the author's name, gender, year of birth, place of origin, professional title, research direction, paper title, journal name, publication time, and affiliation. Different items may be extracted from different positions. The author's name, gender, year of birth, place of origin, professional title, and research direction are extracted from the author-profile section of the paper; the paper title, journal name, publication time, and affiliation are each extracted from their own positions.

[0057] Step (3): convert the format of the extracted author-profile information and store it in the author information database. Design one or more information tables to hold the author information; convert the extracted author name, gender, place of origin, professional title, research direction, paper title, journal name, and affiliation to string type; convert the extracted year of birth and publication time to date type; after conversion, place the author information into the corresponding information tables.

[0058] Step (4): compute the unique author identifier, which identifies the same author and distinguishes different authors. The unique identifier of each author is obtained by applying a function to the name, year of birth, gender, place of origin, professional title, research direction, and affiliation; the unique identifier is then stored in the author information table.

[0059] Step (5): with paper IDs as rows and author IDs as columns, compute the "author-paper association matrix", $S_{m \times n} = (s_{ij})_{m \times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of papers and the number of authors respectively, and $s_{ij}$ is the "author weight". The "author weight" is determined by the author's rank in the corresponding paper. In what follows, unless explicitly stated otherwise, an author mentioned in a computation means an author ID.

[0060] Step (6): based on the "author-paper association matrix", with the x-axis as the year and the y-axis as the "cumulative absolute number of published papers" of the $i$-th author in research direction $z$, generate the "author academic growth roadmap" using the function $y = f_{Sum}(x, z, i)$. The "cumulative absolute number of published papers" in research direction $z$ is determined by the number of papers published and the author's rank in those papers. Whether two research directions are the same is determined as follows: first, read from the database the research direction at the time the paper was published and map it to the domain OWL ontology; second, check whether the research directions are related by inheritance (<rdfs:subClassOf>), equivalence (<owl:equivalentClass>), a set operation (<owl:disjointWith>, <owl:unionOf>, <owl:intersectionOf>, <owl:complementOf>), or an instance relation (<rdf:Description>, <rdf:type>); finally, if any of these relations exists, the two are considered the same research direction, and otherwise they are considered different research directions.

[0061] Step (7): generate the "author collaboration network graph". The "author collaboration network graph" is a weighted graph with authors as actor nodes and papers as ties. It therefore contains two sets of information: the author set $N = \{n_1, n_2, \ldots, n_N\}$, where $N$ is the number of authors, and the paper set $L = \{l_1, l_2, \ldots, l_L\}$, where $L$ is the number of papers. The weight of each tie in the graph is determined by the absolute value of the difference between the weights, in the paper represented by the tie, of the authors represented by its two nodes.

[0062] Step (8): compute the "author academic collaboration distance" between authors. Based on the "author collaboration network graph", compute the academic collaboration distance between authors and generate the "author academic collaboration distance matrix". The academic collaboration distance between authors is determined by the number of nodes on the shortest path connecting them and the weights on its edges.

[0063] Step (9): generate the "hot research direction map". Based on the OWL domain ontology, with researchers as nodes and research directions as ties, generate the "hot research direction map". The "degree of hotness of a research direction" is determined by two variables: first, the number of authors working in that research direction and its subclass research directions; second, the distance between each subclass research direction and the research direction represented by the node. Whether a direction is the same research direction or one of its subclass directions is determined by mapping the research direction to the OWL domain ontology and checking whether <rdfs:subClassOf> or <owl:equivalentClass> exists in the domain ontology.

[0064] Step (10): compute author academic reputation. With the first author as the disseminator node and the other co-authors of the same paper as recipient nodes, generate the "author academic reputation map". An author's academic reputation value is determined by the number of authors who collaborate directly with that author and by the reputation of each collaborator.

[0065] Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the invention but are not intended to limit its scope.

[0066] As shown in Figure 2, mining author information from academic papers requires the support of OWL domain ontology technology. The domain ontology must therefore be prepared before the author information is analyzed. When building the OWL domain ontology, the tags <rdfs:subClassOf>, <owl:equivalentClass>, and <owl:disjointWith> identify the inheritance, equivalence, and cross relations between classes; <rdfs:subPropertyOf>, <owl:equivalentProperty>, and <owl:inverseOf> denote the inheritance, equivalence, and inverse relations between properties; <rdfs:domain> and <rdfs:range> denote the relations between properties and classes; <rdf:Description> and <rdf:type> denote the relations between classes and instances; owl:TransitiveProperty, owl:SymmetricProperty, owl:FunctionalProperty, and owl:InverseFunctionalProperty denote the transitive, symmetric, functional, and inverse-functional relations among properties; and <owl:unionOf>, <owl:intersectionOf>, and <owl:complementOf> denote set-operation relations.

[0067] As shown in Figure 3, the extracted and converted author name, gender, year of birth, place of origin, professional title, research direction, paper title, journal name, publication time, and affiliation are stored in ten relational tables: the author table, the paper table, the paper-author mapping table, the professional title table, the author-title mapping table, the department table, the author-department mapping table, the research direction table, the author-research direction mapping table, and the journal table. The schemas of the ten tables are: Author (author ID, author name, date of birth, place of origin); Paper (paper ID, paper title, journal ID, publication date); Author-Paper mapping (author ID, paper ID, author rank); Title (title ID, title name); Author-Title mapping (title ID, author ID, paper ID); Department (department ID, department name, city, postal code); Author-Department mapping (author ID, department ID, paper ID); Research direction (research direction ID, research direction name, paper ID, author ID, ontology URI); Author-Research direction mapping (research direction ID, author ID, paper ID); Journal (journal name, ISBN, founding date).
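Three of the ten tables can be sketched with Python's built-in `sqlite3` module as follows. This is an illustration under assumptions: the English table and column names, the column types, and the sample rows are hypothetical, and the patent does not name a database engine.

```python
import sqlite3

# Hypothetical English names for three of the ten tables described above.
SCHEMA = """
CREATE TABLE author (
    author_id    TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    birth_date   TEXT,
    native_place TEXT
);
CREATE TABLE paper (
    paper_id     TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    journal_id   TEXT,
    publish_date TEXT
);
CREATE TABLE author_paper (
    author_id   TEXT REFERENCES author(author_id),
    paper_id    TEXT REFERENCES paper(paper_id),
    author_rank INTEGER NOT NULL,
    PRIMARY KEY (author_id, paper_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows, inserted out of byline order on purpose.
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?, ?)",
    [("a1", "Author One", "1975-03", "Beijing"),
     ("a2", "Author Two", "1980-11", "Shanghai")],
)
conn.execute(
    "INSERT INTO paper VALUES (?, ?, ?, ?)",
    ("p1", "Sample paper", "j1", "2011-06-01"),
)
conn.executemany(
    "INSERT INTO author_paper VALUES (?, ?, ?)",
    [("a2", "p1", 2), ("a1", "p1", 1)],
)

# Recover the byline of paper p1 in rank order.
byline = conn.execute(
    "SELECT a.name, ap.author_rank FROM author_paper ap "
    "JOIN author a ON a.author_id = ap.author_id "
    "WHERE ap.paper_id = ? ORDER BY ap.author_rank",
    ("p1",),
).fetchall()
```

Keeping the author rank in the mapping table, rather than in the author or paper table, is what lets the same author hold different ranks on different papers.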

[0068] The unique author identifier is determined by the strings for the name, year of birth, gender, place of origin, and affiliation name, and is computed as follows:

[0069] AID(i) = StrConn(NameStr(N(i)), BirthStr(Y(i)), SexStr(S(i)), AffStr(A(i)), Ram(i)), where AID(i) is the unique identifier of the i-th author; N(i), Y(i), S(i), and A(i) represent the i-th author's name, year of birth, gender, place of origin, and affiliation name; the functions NameStr(), BirthStr(), SexStr(), and AffStr() are hash functions of the author's name, date of birth, gender, and affiliation respectively; and Ram() is a five-digit random code used to distinguish authors with the same name in the same institution.
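The identifier construction can be sketched as follows. The patent does not specify the hash functions or the separator used by StrConn, so the truncated SHA-256 digests, the hyphen separator, and the names `short_hash` and `author_id` are all assumptions made for illustration.

```python
import hashlib
import random


def short_hash(value: str, length: int = 8) -> str:
    """Stand-in for NameStr/BirthStr/SexStr/AffStr: truncated SHA-256 hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def author_id(name, birth_year, sex, affiliation, rng=random):
    """Concatenate four field hashes and a five-digit random code."""
    ram = f"{rng.randrange(100000):05d}"  # five-digit code, zero padded
    parts = [
        short_hash(name),
        short_hash(str(birth_year)),
        short_hash(sex),
        short_hash(affiliation),
        ram,
    ]
    return "-".join(parts)


# A seeded generator makes this example repeatable.
aid = author_id("张三", 1975, "M", "清华大学", random.Random(0))
```

Because the last component is random, an identifier has to be stored once it has been assigned; recomputing it from the same fields would not in general give the same value.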

[0070] As shown in Figure 4, with paper IDs as rows and author IDs as columns, the "author-paper association matrix" is computed as $S_{m \times n} = (s_{ij})_{m \times n}$, where $i$ and $j$ are the paper ID and the author ID respectively, $m$ and $n$ are the number of papers and the number of authors respectively, and $s_{ij}$ is the "author weight". The "author weight" is determined by the author's rank in the corresponding paper and is computed as follows:

[0071]

Figure CN102609546BD00091

(where $S(i,j)$ is the author weight of the $i$-th author in the $j$-th paper, and $n$ is the rank of the $i$-th author in the $j$-th paper, $n = 1, 2, 3, \ldots, N$).

[0072] As shown in Figure 5, the "author academic growth roadmap" is a two-dimensional curve. The x-axis is the year and the y-axis is the "cumulative absolute number of published papers" of the $i$-th author in research direction $z$; the roadmap is generated automatically using the function $y = f_{Sum}(x, z, i)$. The cumulative absolute number $y$ of papers published by the $i$-th author in research direction $z$ is computed as follows:

[0073]

Figure CN102609546BD00092

where $N$ is the total number of papers published by the $i$-th author in research direction $z$, and $S(i,j,z)$ is the "author weight" of the $i$-th author in the $j$-th paper. Whether two research directions are the same is determined as follows: first, read from the database the research direction at the time the paper was published and map it to the domain OWL ontology; second, check whether the research directions are related by inheritance (<rdfs:subClassOf>), equivalence (<owl:equivalentClass>), a set operation (<owl:disjointWith>, <owl:unionOf>, <owl:intersectionOf>, <owl:complementOf>), or an instance relation (<rdf:Description>, <rdf:type>); finally, if any of these relations exists, the two are considered the same research direction, and otherwise they are considered different research directions.
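The same-direction test can be sketched over a toy set of triples as follows. This is a simplification under assumptions: the ontology is reduced to a list of (subject, predicate, object) tuples, only direct relations are checked, and the direction names and the `ex:relatedTo` predicate are hypothetical. A real OWL ontology would be loaded with an RDF library and would represent set operations with class lists.

```python
# Predicates that, per the description, make two directions count as the same.
SAME_DIRECTION_PREDICATES = {
    "rdfs:subClassOf",
    "owl:equivalentClass",
    "owl:disjointWith",
    "owl:unionOf",
    "owl:intersectionOf",
    "owl:complementOf",
    "rdf:type",
}


def same_direction(triples, a, b):
    """True if a and b are identical or directly linked by a listed predicate."""
    if a == b:
        return True
    return any(
        predicate in SAME_DIRECTION_PREDICATES and {subject, obj} == {a, b}
        for subject, predicate, obj in triples
    )


# Hypothetical ontology fragment.
triples = [
    ("DataMining", "rdfs:subClassOf", "KnowledgeEngineering"),
    ("TextMining", "owl:equivalentClass", "TextAnalytics"),
    ("DataMining", "ex:relatedTo", "Statistics"),
]
```

For example, `same_direction(triples, "KnowledgeEngineering", "DataMining")` holds because of the subclass link, while `"DataMining"` and `"Statistics"` are treated as different directions because their only link is not one of the listed relations.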

[0074] 如图6所示,“作者合作网络图”是以作者为行动者结点,论文为纽带的加权图。 [0074] As shown in Figure 6, "the authors cooperative network map" is authored by actors nodes, weighted graph paper as a link. “作者合作网络图”包括两组信息:一组是作者集合Ν={ηι,η2,....%},其中N为作者数;另一组是论文集合L={11; I2,….,ln},其中L为论文数。 "OF cooperative network map" includes two sets of information: the set of one group of Ν = {ηι, η2, ....%}, where N is the number of; the other group is a collection of papers L = {11; I2, ... ., ln}, where L is the number of papers. 在此加权图中的权数为两个结点代表的作者在纽带代表的论文中的权重之差的绝对值,计算方法如下: The weights in the weighting FIG absolute value of the difference between the weight of paper in the two bonds represented by nodes representative of, calculated as follows:

[0075] $D(i, j, k) = \left| S(i, k) - S(j, k) \right|$

[0076] where D(i, j, k) is the difference between the weights of the i-th author and the j-th author in the k-th paper, and S(i, k) and S(j, k) are the weights of the i-th author and the j-th author in the k-th paper, respectively.
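As a minimal sketch of the edge weighting in [0075], the following Python builds one weighted edge per pair of co-authors per paper. The input layout and the name `collaboration_edges` are assumptions made for illustration, and the sample weights are invented.

```python
from itertools import combinations


def collaboration_edges(paper_weights):
    """Map (author_a, author_b, paper_id) to D = |S(a, k) - S(b, k)|.

    paper_weights: {paper_id: {author_id: author weight in that paper}}
    """
    edges = {}
    for paper_id, weights in paper_weights.items():
        # Every pair of co-authors of the same paper is joined by that paper.
        for a, b in combinations(sorted(weights), 2):
            edges[(a, b, paper_id)] = abs(weights[a] - weights[b])
    return edges


# Made-up sample: one paper with three co-authors.
edges = collaboration_edges({"p1": {"A": 1.0, "B": 0.5, "C": 0.25}})
```

Keeping the paper ID in the key preserves the paper as the tie, so two authors who share several papers get several parallel edges.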

[0077] As shown in Figure 7, the rows and columns of the "author academic cooperation distance matrix" are both author IDs, and each element is an academic cooperation distance value. The academic cooperation distance between two authors is determined by the number of nodes on the shortest path connecting them and by the weights on the edges of that path. It is computed as follows:

[0078] $L(i, j) = \sum_{k=0}^{N} \left( k \times S(k, k+1) \right)$, where k is an intermediate node on the shortest path between nodes i and j, and N is the number of intermediate nodes.
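One possible reading of the formula in [0078] is sketched below in Python. It rests on several assumptions not stated in the text: the shortest path is the one with the fewest hops (found by breadth-first search), the nodes on that path are indexed 0, 1, …, N+1, and S(k, k+1) is the weight of the edge between consecutive path nodes. All function names and sample values are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list of a fewest-hops path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def cooperation_distance(edge_weights, start, goal):
    """Sum k * (weight of the k-th edge) along the path, with k starting at 0."""
    adjacency = {}
    for a, b in edge_weights:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    path = shortest_path(adjacency, start, goal)
    if path is None:
        return None  # the two authors are not connected
    total = 0.0
    for k in range(len(path) - 1):
        a, b = path[k], path[k + 1]
        weight = edge_weights.get((a, b), edge_weights.get((b, a)))
        total += k * weight
    return total


# Made-up undirected edge weights between authors.
weights = {("A", "B"): 0.5, ("B", "C"): 0.25, ("C", "D"): 0.75}
distance_ad = cooperation_distance(weights, "A", "D")
```

Under this reading the first edge is multiplied by k = 0, so direct collaborators are at distance 0 and each further hop counts more heavily.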

[0079] As shown in Figure 8, the "hot research direction map" is built on the OWL domain ontology, with research directions as nodes and the semantic relations between research directions as ties. The "hotness of a research direction" is determined by two variables: first, the number of authors working in that research direction and in its subclass research directions; second, the distance between each subclass research direction and the research direction represented by the node. Once the hotness has been computed, it is used as the independent variable that sets the area of each node, and the hot research direction map is generated. The hotness of a research direction is computed as follows:

[0080] H(i) = [formula image not recoverable from the text], where n is the number of authors working in the subclass research directions of the i-th research direction, H(k) is the "hotness" of the k-th subclass research direction, D(i, k) is the number of intermediate nodes on the shortest path between research direction i and research direction k, H(i) denotes the number of authors in the i-th research direction, and D(i, 0) = 1. Whether a direction is a subclass research direction is determined as follows. First, the research direction recorded when the paper was published is read from the database and mapped onto the domain OWL ontology. Second, the research directions are checked for an inheritance relation (<rdfs:subClassOf>). Third, if an inheritance relation exists, the direction is treated as a subclass research direction; otherwise it is not. Next, if a subclass research direction exists, it is checked in turn for still smaller subclass directions. The process repeats until the research directions corresponding to the leaf nodes of the OWL ontology are reached.
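The repeated subclass check described in [0080] amounts to walking the inheritance hierarchy down to its leaves. The Python below is a rough sketch of that walk only, not of the hotness formula. It assumes the inheritance relations have already been extracted from the ontology into a plain mapping from each direction to its direct subclasses; the mapping, the function names, and the sample directions are hypothetical.

```python
from collections import deque


def subclass_depths(children, root):
    """Breadth-first walk: {direction: number of subclass steps below root}."""
    depths = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in children.get(node, ()):
            if child not in depths:
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


def leaf_directions(children, root):
    """Subclass directions below root that have no subclasses of their own."""
    return sorted(d for d in subclass_depths(children, root) if not children.get(d))


# Made-up hierarchy: parent direction -> direct subclass directions.
children = {
    "artificial intelligence": ["machine learning", "knowledge representation"],
    "machine learning": ["deep learning"],
}
depths = subclass_depths(children, "artificial intelligence")
leaves = leaf_directions(children, "artificial intelligence")
```

The depth recorded for each subclass counts inheritance steps from the starting direction, which is one way to obtain the path lengths that the distance D(i, k) refers to.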

[0081] As shown in Figure 9, the "author academic reputation map" is a directed graph in which the first author is the disseminator node and the other co-authors are receiver nodes. An author's reputation is determined by the number of authors who collaborate directly with that author and by the reputation of each collaborator, computed as follows:

[0082] $I(i) = \sum_{k=0}^{n} \left( I(k) \times A(i, k) \right)$

[0083] where I(i) is the reputation of the i-th author, n is the number of co-authors of the i-th author, k is the k-th collaborator of the i-th author, and A(i, k) is the distance between the i-th author and the k-th author. I(0) denotes the number of authors who collaborate directly with the i-th author, and A(i, 0) = 1.
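The reputation sum in [0082] can be illustrated with a short Python sketch. The text does not say how each collaborator's own reputation I(k) is obtained, so this sketch simply takes those values as inputs; that choice, the function name, and the sample numbers are assumptions.

```python
def author_reputation(collaborators):
    """Evaluate I(i) = sum over k = 0..n of I(k) * A(i, k).

    collaborators: list of (reputation_k, distance_k) pairs, one per direct
    collaborator of author i. The k = 0 term uses I(0) = number of direct
    collaborators and A(i, 0) = 1, as defined in [0083].
    """
    total = float(len(collaborators)) * 1.0  # k = 0 term
    for reputation_k, distance_k in collaborators:
        total += reputation_k * distance_k
    return total


# Made-up sample: two direct collaborators.
reputation = author_reputation([(2.0, 0.5), (4.0, 0.25)])
```

An author with no collaborators gets a reputation of 0 under this reading, since both the count term and the sum vanish.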

[0084] The system of the present invention, shown in Figure 10, comprises an ETL module, a domain ontology, a unique identification module, an author-paper association matrix calculation module, an author academic growth roadmap generation module, an author collaboration network graph generation module, an academic cooperation distance generation module, a hot research direction map generation module, and an author academic reputation map generation module.

[0085] The data extraction, transformation and loading (ETL) module extracts author information from academic journal papers in the target subject area, converts the format of the extracted author information, and stores it in the author information database.

[0086] The domain ontology is an OWL domain ontology built for the selected target subject area.

[0087] The unique identification module computes a unique author ID.

[0088] The author-paper association matrix calculation module computes the author-paper association matrix from the author IDs and paper IDs.

[0089] The author academic growth roadmap generation module uses the author-paper association matrix, the research direction and the year to compute the cumulative absolute number of papers an author has published in the same research direction, and generates the author academic growth roadmap.

[0090] The author collaboration network graph generation module derives each author's collaborator network graph from the author-paper association matrix.

[0091] The academic cooperation distance generation module computes the academic cooperation distance between authors from the collaborator network graph.

[0092] The hot research direction map generation module generates the hot research direction map from the OWL domain ontology, the author IDs, the research directions and their hotness.

[0093] The author academic reputation map generation module generates the author academic reputation map from the author IDs and the collaborator network graph.

[0094] The above is only a preferred embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any person skilled in the art may make appropriate changes or variations within the technical scope disclosed by the invention, and all such changes or variations fall within the scope of protection of the invention.

Claims (3)

  1. A method for mining author information from academic journal papers, characterized by comprising:
  Step 1: selecting a target subject area and building an OWL domain ontology;
  Step 2: extracting author information from academic journal papers in the target subject area;
  Step 3: converting the format of the extracted author information, storing it in an author information database, and computing a unique author ID;
  Step 4: computing an author-paper association matrix from the author IDs and paper IDs, the matrix being denoted $S_{m \times n} = (S_{ij})_{m \times n}$, where i and j are the paper ID and the author ID respectively, m and n are the number of papers and the number of authors respectively, and $S_{ij}$ is the author weight, computed as
  $S(i, j) = \begin{cases} 0, & n = 0 \\ \dfrac{1}{n}, & n > 0 \end{cases}$
  where S(i, j) is the author weight of the i-th author in the j-th paper, and n is the rank order of the i-th author in the j-th paper, n = 1, 2, 3, …, N;
  Step 5: computing, from the author-paper association matrix, the research direction and the year, the cumulative absolute number of papers an author has published in the same research direction, and generating an author academic growth roadmap, wherein the cumulative absolute number y of papers published by the i-th author in research direction z is computed as
  $y = \sum_{j=0}^{N} S(i, j, z)$
  where N is the total number of papers published by the i-th author in research direction z, S(i, j, z) is the author weight of the i-th author in the j-th paper, and two research directions are judged to be the same research direction if an inheritance relation, an equivalence relation or a set-operation relation exists between them;
  Step 6: deriving each author's collaborator network graph from the author-paper association matrix, the author collaboration network graph comprising an author set and a paper set, with authors as nodes and papers as ties, the weight between two nodes being computed as
  $D(i, j, k) = \left| S(i, k) - S(j, k) \right|$
  where D(i, j, k) is the difference between the weights of the i-th author and the j-th author in the k-th paper, and S(i, k) and S(j, k) are the weights of the i-th author and the j-th author in the k-th paper, respectively;
  Step 7: computing the academic cooperation distance between authors from the collaborator network graph, the distance being computed as
  $L(i, j) = \sum_{k=0}^{N} \left( k \times S(k, k+1) \right)$
  where L(i, j) is the academic cooperation distance between the authors corresponding to node i and node j, k is an intermediate node on the shortest path between node i and node j in the author collaboration network graph, and N is the number of intermediate nodes;
  Step 8: generating a hot research direction map from the OWL domain ontology, the author IDs and the research directions, the map being generated according to the formula
  H(i) = [formula image not recoverable from the text]
  where n is the number of authors working in the subclass research directions of the i-th research direction, H(k) is the hotness of the k-th subclass research direction, D(i, k) is the number of intermediate nodes on the shortest path between research direction i and research direction k, H(i) denotes the number of authors in the i-th research direction, and D(i, 0) = 1; the research directions corresponding to the leaf nodes of the OWL ontology are subclass research directions;
  Step 9: generating an author academic reputation map from the author IDs and the collaborator network graph, the map being a directed graph in which the first author is the disseminator node and the co-authors are receiver nodes, the author academic reputation being computed as
  $I(i) = \sum_{k=0}^{n} \left( I(k) \times A(i, k) \right)$
  where I(i) is the reputation of the i-th author, n is the number of co-authors of the i-th author, k is the k-th collaborator of the i-th author, A(i, k) is the distance between the i-th author and the k-th author, I(0) denotes the number of authors who collaborate directly with the i-th author, and A(i, 0) = 1.
  2. The method according to claim 1, characterized in that, in step 1, the OWL domain ontology contains the inheritance relations, equivalence relations and set-operation relations between domain terms.
  3. The method according to claim 2, characterized in that, in step 2, the author information comprises the author's name, gender, year of birth, native place, professional title, research direction, paper title, journal name, publication date and the author's institution; and in step 3, the unique author ID comprises the author's name, year of birth, gender, native place, institution name and a random code.
CN 201210072645 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors CN102609546B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201110408020 2011-12-08
CN201110408020.9 2011-12-08
CN 201210072645 CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210072645 CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Publications (2)

Publication Number Publication Date
CN102609546A (en) 2012-07-25
CN102609546B (en) 2014-11-05

Family

ID=46526918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210072645 CN102609546B (en) 2011-12-08 2012-03-19 Method and system for excavating information of academic journal paper authors

Country Status (1)

Country Link
CN (1) CN102609546B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020302B (en) * 2012-12-31 2016-03-02 中国科学院自动化研究所 Based on academic core of mining and related information extraction method and system for complex networks
CN104156437A (en) * 2014-08-13 2014-11-19 中科嘉速(北京)并行软件有限公司 Academic relationship network construction method based on paper author information extraction and relationship weight model
CN105653590A (en) * 2015-12-21 2016-06-08 青岛智能产业技术研究院 Name duplication disambiguation method of Chinese literature authors
CN106227835B (en) * 2016-07-25 2018-01-19 中南大学 Mining based research team bipartite network diagram hierarchical clustering direction
CN106886571A (en) * 2017-01-18 2017-06-23 大连理工大学 Social network analysis-based academic collaboration sustainability prediction method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320370A (en) * 2008-05-16 2008-12-10 崔志明;赵朋朋;方 巍 Deep layer web page data source sort management method based on query interface connection drawing
CN101685455A (en) * 2008-09-28 2010-03-31 华为技术有限公司;东南大学 Method and system of data retrieval

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2487601A1 (en) * 2004-05-04 2012-08-15 Boston Consulting Group, Inc. Method and apparatus for selecting, analyzing and visualizing related database records as a network

Also Published As

Publication number Publication date Type
CN102609546A (en) 2012-07-25 application

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted