CN106484839A - A journal impact assessment method based on academic big data

Info

Publication number
CN106484839A
Authority
CN
China
Prior art keywords
paper
pagerank
index
algorithm
represent
Legal status
Granted
Application number
CN201610874338.9A
Other languages
Chinese (zh)
Other versions
CN106484839B (en)
Inventor
夏锋
白晓梅
宁兆龙
刘号真
孔祥杰
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
2016-10-08
Filing date
2016-10-08
Publication date
2017-03-08
Application filed by Dalian University of Technology
Priority to CN201610874338.9A
Publication of CN106484839A
Application granted
Publication of CN106484839B
Expired - Fee Related (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a journal impact assessment method based on academic big data, which represents the impact of a journal by the average impact of all papers published in it. Paper impact is computed with a PageRank-based algorithm, AR-PageRank, which takes into account the similarity between a paper and the papers that cite it as well as the authors' H-indices. The similarity term makes the algorithm "weighted": during the iterations, when a paper "votes" for the papers it cites, each vote contributes differently depending on the similarity between the two papers. The H-index term greatly increases the differentiation in the "importance" of papers: the H-indices of a paper's authors are summed to give the paper's own H-index, which serves as a label for that paper. Differences between papers therefore emerge gradually during the computation, so that the impact of each paper can be distinguished relatively accurately, and the impact of each journal is finally obtained.

Description

A journal impact assessment method based on academic big data
Technical field
The present invention relates to methods for assessing journal impact in academia on the basis of academic big data, and in particular to a journal impact assessment method that combines the PageRank algorithm with the H-index.
Background art
With the continuous development of science and technology, more and more people are engaged in scientific research. Their research output usually takes the form of academic papers, patents, books, and the like, and this large population of researchers has produced hundreds of millions of records, collectively referred to as academic big data. Academic big data has many uses, and journal impact assessment is one of its important applications. The impact of a journal is a comprehensive effect founded on the journal's academic level and academic characteristics and manifested through its recognizability and social reputation; the quality of a journal determines its impact. The methods currently used for journal impact assessment are all citation-based analyses, including Impact Factor, Eigenfactor, and PageRank. However, these algorithms still suffer from problems such as overly narrow evaluation indicators and an inability to assess journal impact across fields.
Summary of the invention
The purpose of the present invention is mainly to address the shortcomings of the existing research described above by proposing a journal impact assessment method based on academic big data. By analyzing the citation information of papers published in journals, it proposes a weighted PageRank algorithm, AR-PageRank. This algorithm considers the degree of overlap between the reference lists of two papers and uses this overlap to characterize the similarity of the two papers. It also takes the authors' H-indices into account, since the author H-index of a paper represents, to some extent, the impact of that paper.
Technical solution:
A journal impact assessment method based on academic big data comprises the following steps:
1) Use the H-indices of a paper's authors as an influence factor in assessing the impact of journal papers;
2) Compute the similarity of two papers by comparing the "overlap" of their reference lists;
3) By analyzing the characteristics of the paper citation network, improve the PageRank algorithm by incorporating the H-index from 1) and the "overlap" from 2) as "weight" elements, which yields the AR-PageRank algorithm, and compute the AR-PageRank value of every paper;
4) Sum the AR-PageRank values of all papers in a journal and average the sum over the number of papers to obtain the journal's assessment value.
Step 1): The H-index is an author-level indicator that measures both the number of publications and the impact of a scholar or scientist. It reflects a scholar's academic achievement relatively accurately: the larger a scholar's H-index, the greater the impact of that scholar's papers.
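The description relies on each author's H-index H(a_n) but does not say how it is obtained. The sketch below assumes the standard Hirsch definition (the largest h such that the author has h papers each cited at least h times); the function name `author_h_index` and its input, a list of per-paper citation counts, are hypothetical.

```python
from typing import Iterable


def author_h_index(citation_counts: Iterable[int]) -> int:
    """Largest h such that at least h of the author's papers have >= h citations each."""
    counts = sorted(citation_counts, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


print(author_h_index([10, 8, 5, 4, 3]))  # 4
```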
The present invention defines the H-index of a paper with the following formula:

$$A(p_j) = \sum_{a_n \in p_j} H(a_n)$$

where p_j denotes a paper, A(p_j) is the H-index of p_j, a_n is an author of p_j, and H(a_n) is the author H-index of a_n.
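A minimal sketch of the paper H-index A(p_j) as the sum of its authors' H-indices. The function name `paper_h_index`, the dictionary lookup, and the author names are hypothetical; treating an author missing from the lookup as contributing 0 is an assumption the text does not state.

```python
from typing import Dict, List


def paper_h_index(authors: List[str], author_h: Dict[str, int]) -> int:
    """A(p_j): sum of the H-indices H(a_n) of all authors a_n of paper p_j."""
    # Assumption: an author with no known H-index contributes 0.
    return sum(author_h.get(a, 0) for a in authors)


# Hypothetical author H-indices.
author_h = {"alice": 12, "bob": 5, "carol": 0}
print(paper_h_index(["alice", "bob"], author_h))  # 17
```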
To distinguish how important each paper is within the paper collection, the present invention uses the paper H-index obtained above to define the importance of a paper in the network as follows:

$$\delta(p_j) = \frac{A(p_j)}{\max(\{A(p_z) \mid p_z \in P\})} + \theta$$

where P denotes the paper data set, max({A(p_z) | p_z ∈ P}) is the largest paper H-index in the data set, and θ = 0.01; the role of θ is to keep δ(p_j) from being 0.
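A sketch of the importance weight δ(p_j) = A(p_j) / max_z A(p_z) + θ with θ = 0.01. Here θ is read as an additive term, since that placement is what keeps δ(p_j) nonzero for a paper whose authors all have H-index 0. The function name `importance` is hypothetical, and returning θ for every paper when all paper H-indices are 0 is an added assumption to avoid dividing by zero.

```python
from typing import Dict

THETA = 0.01  # value of theta stated in the text


def importance(paper_h: Dict[str, float], theta: float = THETA) -> Dict[str, float]:
    """delta(p_j) = A(p_j) / max_z A(p_z) + theta for every paper p_j in the data set P."""
    max_a = max(paper_h.values())
    if max_a == 0:
        # Assumption: if no paper has a positive H-index, every paper gets theta.
        return {p: theta for p in paper_h}
    return {p: a / max_a + theta for p, a in paper_h.items()}


# Hypothetical paper H-indices A(p_j).
delta = importance({"p1": 17, "p2": 34, "p3": 0})
# p2 (largest A) -> 1.01, p1 -> 0.51, p3 -> 0.01
```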
Step 2): The reference list of an article reveals its research content and direction. With this in mind, the present invention computes the similarity of two articles by comparing the "overlap" of their reference lists, and calculates the reference-list "overlap" of two papers with the following formula:
where p_i and p_j are any two papers in the data set, OUT(p_i) and OUT(p_j) are the reference sets of p_i and p_j, and the resulting overlap value represents the similarity of p_i and p_j: the more similar the two papers, the larger the value.
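The exact overlap formula did not survive in this text. As a stand-in only, the sketch below uses the Jaccard index of the two reference sets, |OUT(p_i) ∩ OUT(p_j)| / |OUT(p_i) ∪ OUT(p_j)|, which has the stated property that more similar reference lists give a larger value; it is not necessarily the measure the patent uses, and the function name `reference_overlap` is hypothetical.

```python
from typing import Set


def reference_overlap(out_i: Set[str], out_j: Set[str]) -> float:
    """Hypothetical overlap score: Jaccard index of the reference sets OUT(p_i), OUT(p_j)."""
    union = out_i | out_j
    if not union:
        return 0.0  # two papers with no references share nothing
    return len(out_i & out_j) / len(union)


print(reference_overlap({"r1", "r2", "r3"}, {"r2", "r3", "r4"}))  # 0.5
```

The Jaccard index is bounded in [0, 1] and symmetric, which keeps the weights comparable across papers with reference lists of very different lengths.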
Step 3): Investigation shows that both the original PageRank algorithm and its variants are "unweighted" algorithms. "Unweighted" means that an article applies "egalitarianism" when it "votes" for the articles it cites: it does not distinguish among the cited articles but treats them all alike. The present invention modifies the PageRank algorithm by using the "similarity" obtained in step 2) as a feature of each cited article, so that the new algorithm adds a "weight" element to the original PageRank algorithm. In addition, the importance of each article in the network computed in step 1) is also incorporated into the PageRank algorithm, which yields the AR-PageRank algorithm.
The AR-PageRank algorithm is expressed as follows:
where d is the "damping coefficient", usually set to 0.85; IN(p_i) is the set of all articles that cite p_i; and PR(p_i) is the PageRank score of p_i. The unweighted part of the expression is the basic PageRank formula, and adding the similarity term to it turns the algorithm into a "weighted" PageRank: when paper p_j "votes" for each of its references, it "analyzes" that reference, so references with different similarity are treated differently.
Step 4): Based on the paper AR-PageRank values obtained in step 3), the paper scores are accumulated journal by journal, and the accumulated score is averaged; this average is used as the final journal score.
The journal scoring formula is as follows:

$$J(j_i) = \frac{\sum_{p_k \in \mathrm{Publish}(j_i)} PR(p_k)}{|\mathrm{Publish}(j_i)|}$$

where Publish(j_i) is the set of papers published in journal j_i, |Publish(j_i)| is the size of that set, and PR(p_k) is the AR-PageRank score of paper p_k.
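A minimal sketch of the journal score J(j_i) as the mean AR-PageRank score of the papers a journal publishes, followed by ranking the journals. The function name `journal_scores`, the input layout, and the toy values are hypothetical; skipping journals with no papers is an added assumption to avoid dividing by zero.

```python
from typing import Dict, List


def journal_scores(publish: Dict[str, List[str]], pr: Dict[str, float]) -> Dict[str, float]:
    """J(j_i): mean AR-PageRank score PR(p_k) over the papers p_k in Publish(j_i)."""
    return {
        journal: sum(pr[p] for p in papers) / len(papers)
        for journal, papers in publish.items()
        if papers  # skip journals with no papers to avoid division by zero
    }


scores = journal_scores({"J1": ["p1", "p2"], "J2": ["p3"]}, {"p1": 0.2, "p2": 0.4, "p3": 0.5})
ranking = sorted(scores, key=scores.get, reverse=True)  # ['J2', 'J1']
```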
Beneficial effects of the present invention: paper impact is computed with a PageRank-based algorithm, AR-PageRank, which takes into account the similarity between a paper and the papers that cite it as well as the authors' H-indices. The similarity term makes the algorithm "weighted": during the iterations, when a paper "votes" for the papers it cites, each vote contributes differently according to the similarity between the papers. The H-index term greatly increases the differentiation in the "importance" of papers: summing the H-indices of a paper's authors yields the paper's own H-index, which serves as a label for the paper, so that differences between papers gradually emerge during the computation. The impact of each paper can thus be distinguished relatively accurately, and the impact of each journal is finally obtained.
Brief description of the drawings
Fig. 1 shows the ranking results obtained by applying the PageRank, A-PageRank, and AR-PageRank algorithms to the DBLP data set.
Fig. 2 is a flow chart of the processing applied to Microsoft's MAG data set according to the experimental requirements.
Figs. 3, 4, and 5 show the correlation coefficients of each algorithm, computed with Spearman's correlation coefficient after applying the four algorithms to the DBLP, DBLP (2011-2015), and MAG data sets, respectively.
Figs. 6, 7, 8, and 9 compare the results of the four algorithms on the DBLP and DBLP (2011-2015) data sets.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, specific embodiments of the invention are described in further detail below.
An embodiment of the present invention provides a journal impact assessment method based on academic big data, comprising:
Step 1: Select the DBLP and MAG data sets as the experimental data sets for this method, and preprocess them.
The present invention uses the journal ranking recommended by the China Computer Federation (CCF) in 2015 as the evaluation benchmark and selects 42 journals in the "artificial intelligence" field (45 artificial intelligence journals for the MAG data set). Both data sets are preprocessed according to the experimental requirements; Fig. 2 shows the processing flow for the MAG data set.
The two data sets after preprocessing are summarized below:
Table 1: MAG data set
Table 2: DBLP data set
As the tables show, MAG contains far fewer papers than DBLP. Because too few papers could be extracted for the MAG data set, the present invention crawled the DBLP data set and used it as the "primary" data set, with MAG serving as the "comparison" data set.
To calculate the five-year journal impact factor (5-year IF), the present invention separately extracted the five years of data from 2011 to 2015 from the DBLP data set to form a third data set.
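The text does not describe how the 2011-2015 subset was cut. A minimal sketch, assuming hypothetical inputs (a map from paper ID to publication year and a map from paper ID to the set of cited paper IDs) and assuming that citations to papers outside the window are dropped; the function name `year_window` is hypothetical.

```python
from typing import Dict, Set, Tuple


def year_window(
    years: Dict[str, int],
    refs: Dict[str, Set[str]],
    start: int = 2011,
    end: int = 2015,
) -> Tuple[Dict[str, int], Dict[str, Set[str]]]:
    """Keep papers published in [start, end] and the citations among them."""
    keep = {p for p, y in years.items() if start <= y <= end}
    sub_years = {p: years[p] for p in keep}
    # Assumption: a citation survives only if both endpoints are inside the window.
    sub_refs = {p: refs.get(p, set()) & keep for p in keep}
    return sub_years, sub_refs


years = {"a": 2010, "b": 2012, "c": 2015, "d": 2016}
refs = {"b": {"a", "c"}, "c": set(), "d": {"b"}}
sub_years, sub_refs = year_window(years, refs)
```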
Table 3: DBLP data set (2011-2015)
Step 2: The algorithm implementation of the present invention uses three data sets: MAG, DBLP, and DBLP (2011-2015). The Eigenfactor, PageRank, A-PageRank, R-PageRank, and AR-PageRank algorithms are applied to the MAG and DBLP data sets.
AR-PageRank stands for Author Reference PageRank: "Author" indicates that the algorithm considers the H-index, and "Reference" indicates that it considers the relationship between a paper and its references. Accordingly, the A-PageRank algorithm considers only the H-index, and the R-PageRank algorithm considers only the relationship between papers and their references.
The AR-PageRank algorithm is expressed as:
The A-PageRank algorithm is expressed as:
The R-PageRank algorithm is expressed as:
The PageRank algorithm is expressed as:
Fig. 1 lists the ranking results obtained by applying the PageRank, A-PageRank, and AR-PageRank algorithms to the DBLP data set.
Step 3: The results obtained in step 2 are evaluated with Spearman's correlation coefficient to obtain the correlation coefficient of each algorithm.
The present invention measures the similarity between ranked lists mathematically. Spearman's correlation coefficient is a nonparametric measure of the dependence between two variables; it assesses how well the relationship between two statistical variables can be described by a monotonic function. It is computed as follows:
where ρ is Spearman's correlation coefficient, J_i is a journal in the journal list, R1 and R2 are journal lists, and R1(J_i) is the position of journal J_i in list R1.
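The formula itself is missing here. Assuming the patent uses the standard rank-difference form of Spearman's coefficient for two tie-free rankings of the same n journals, ρ = 1 - 6 Σ_i (R1(J_i) - R2(J_i))² / (n(n² - 1)), a minimal sketch follows. The function name `spearman_rho` and the journal IDs are hypothetical, and requiring both lists to contain the same distinct journals is an added assumption; for such inputs, `scipy.stats.spearmanr` applied to the two position vectors gives the same value.

```python
from typing import Sequence


def spearman_rho(r1: Sequence[str], r2: Sequence[str]) -> float:
    """Spearman's rho between two tie-free rankings of the same journals.

    rho = 1 - 6 * sum_i (R1(J_i) - R2(J_i))**2 / (n * (n**2 - 1))
    """
    if sorted(r1) != sorted(r2) or len(set(r1)) != len(r1):
        raise ValueError("both lists must rank the same distinct journals")
    n = len(r1)
    if n < 2:
        raise ValueError("need at least two journals")
    pos2 = {j: k for k, j in enumerate(r2)}  # R2(J_i): position of each journal in r2
    d2 = sum((k - pos2[j]) ** 2 for k, j in enumerate(r1))  # k is R1(J_i)
    return 1 - 6 * d2 / (n * (n * n - 1))


reference = ["J1", "J2", "J3", "J4"]  # hypothetical benchmark ranking
computed = ["J2", "J1", "J3", "J4"]   # hypothetical ranking produced by an algorithm
rho = spearman_rho(reference, computed)
```

Positions are 0-based here; because only differences of positions enter the formula, this gives the same ρ as 1-based ranks.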
Figs. 3, 4, and 5 show the correlation coefficients of the four algorithms on the different data sets. On the DBLP data set, AR-PageRank reaches its highest correlation coefficient when the number of journals equals 42, and it performs best on this data set. On DBLP (2011-2015), AR-PageRank again gives the best evaluation among all the algorithms. On the MAG data set, the correlation coefficient of A-PageRank is slightly higher than that of AR-PageRank, and the results of R-PageRank and PageRank are very close. These two observations are related to the characteristics of the MAG data set: its numbers of papers, cited papers, authors, and so on are all lower than those of the other two data sets, and the quality of a data set has an important influence on the experimental results.
Figs. 6, 7, 8, and 9 compare the results of the four algorithms on the DBLP and DBLP (2011-2015) data sets. The red polyline shows each algorithm's results on DBLP (2011-2015), and the black polyline shows its results on DBLP. The figures show that each algorithm performs better on DBLP than on DBLP (2011-2015). The latter is a subset of the former; the former is larger in scale and covers more kinds of data, so its experimental results are more accurate and reliable.
The above describes specific embodiments of the present invention and the technical principles applied. Any change made according to the concept of the present invention, provided the functions it produces do not go beyond the spirit covered by the description and drawings, shall fall within the protection scope of the present invention.

Claims (1)

1. A journal impact assessment method based on academic big data, characterized in that it comprises the following steps:
Step 1): Use the H-indices of a paper's authors as an influence factor in assessing the impact of journal papers. The H-index is an author-level indicator that measures both the number of publications and the impact of a scholar or scientist; it reflects a scholar's academic achievement relatively accurately, and the larger a scholar's H-index, the greater the impact of that scholar's papers;
The H-index of a paper is defined with the following formula:

$$A(p_j) = \sum_{a_n \in p_j} H(a_n)$$

where p_j denotes a paper, A(p_j) is the H-index of p_j, a_n is an author of p_j, and H(a_n) is the author H-index of a_n;
Using the H-index obtained from the paper's authors, the importance of a paper in the network is defined with the following formula:

$$\delta(p_j) = \frac{A(p_j)}{\max(\{A(p_z) \mid p_z \in P\})} + \theta$$

where P denotes the paper data set, max({A(p_z) | p_z ∈ P}) is the largest paper H-index in the data set, and θ = 0.01; the role of θ is to keep δ(p_j) from being 0;
Step 2): Compute the similarity of articles by comparing the "overlap" of the reference lists of two papers.
The reference list of an article reveals its research content and direction; the similarity of two articles is computed by comparing the "overlap" of their reference lists, and the "overlap" of two papers' reference lists is calculated with the following formula:
where p_i and p_j are any two papers in the data set, OUT(p_i) and OUT(p_j) are the reference sets of p_i and p_j, and the resulting overlap value represents the similarity of p_i and p_j: the more similar the two papers, the larger the value;
Step 3): By analyzing the characteristics of the paper citation network, improve the PageRank algorithm by incorporating the H-index from step 1) and the "overlap" from step 2) as "weight" elements, which yields the AR-PageRank algorithm, and compute the AR-PageRank value of every paper;
The AR-PageRank algorithm is expressed as follows:
where d is the "damping coefficient", set to 0.85; IN(p_i) is the set of all articles that cite p_i; and PR(p_i) is the PageRank score of p_i. The unweighted part of the expression is the basic PageRank formula, and adding the weighting term to it turns the algorithm into a "weighted" PageRank, namely the AR-PageRank algorithm;
Step 4): Aggregate the AR-PageRank values of all papers in a journal to obtain the journal's assessment value.
Based on the paper AR-PageRank values obtained in step 3), the paper scores are accumulated journal by journal and the accumulated score is averaged; this average is used as the final journal score;
The journal scoring formula is as follows:

$$J(j_i) = \frac{\sum_{p_k \in \mathrm{Publish}(j_i)} PR(p_k)}{|\mathrm{Publish}(j_i)|}$$

where Publish(j_i) is the set of papers published in journal j_i, |Publish(j_i)| is the size of that set, and PR(p_k) is the AR-PageRank score of paper p_k.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610874338.9A CN106484839B (en) 2016-10-08 2016-10-08 A journal impact assessment method based on academic big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610874338.9A CN106484839B (en) 2016-10-08 2016-10-08 A journal impact assessment method based on academic big data

Publications (2)

Publication Number Publication Date
CN106484839A 2017-03-08
CN106484839B 2018-07-06

Family

ID=58268416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610874338.9A Expired - Fee Related CN106484839B (en) 2016-10-08 2016-10-08 A journal impact assessment method based on academic big data

Country Status (1)

Country Link
CN (1) CN106484839B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391921A (en) * 2017-07-13 2017-11-24 武汉科技大学 Bibliography influence power appraisal procedure in a kind of scientific literature
CN107731285A (en) * 2017-05-10 2018-02-23 上海明品医药科技有限公司 One kind classification educational system education contribution degree computational methods
CN108021628A (en) * 2017-11-22 2018-05-11 华南理工大学 A kind of management system of science and technology theme
CN108614867A (en) * 2018-04-12 2018-10-02 科技部科技评估中心 Frontline technology sex index computational methods based on scientific paper and system
CN108764546A (en) * 2018-05-17 2018-11-06 鞍山师范学院 A kind of paper impact factor prediction technique based on academic big data
CN108897736A (en) * 2018-06-20 2018-11-27 大连诺道认知医学技术有限公司 Document sort method and device based on Paper Rank algorithm
CN109376218A (en) * 2018-09-14 2019-02-22 大连理工大学 One kind being based on cascade paper impact factor appraisal procedure
CN112883147A (en) * 2021-01-15 2021-06-01 上海柏观数据科技有限公司 Knowledge association-based thesis citation association index evaluation method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298579A (en) * 2010-06-22 2011-12-28 北京大学 Scientific and technical literature-oriented model and method for sequencing papers, authors and periodicals
US20150227559A1 (en) * 2007-07-26 2015-08-13 Dr. Hamid Hatami-Hanza Methods and systems for investigation of compositions of ontological subjects
CN105740452A (en) * 2016-02-03 2016-07-06 北京工业大学 Scientific and technical literature importance degree evaluation method based on PageRank and time decay

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150227559A1 (en) * 2007-07-26 2015-08-13 Dr. Hamid Hatami-Hanza Methods and systems for investigation of compositions of ontological subjects
CN102298579A (en) * 2010-06-22 2011-12-28 北京大学 Scientific and technical literature-oriented model and method for sequencing papers, authors and periodicals
CN105740452A (en) * 2016-02-03 2016-07-06 北京工业大学 Scientific and technical literature importance degree evaluation method based on PageRank and time decay

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOMEI BAI ET AL.: "PNCOIRank: Evaluating the Impact of Scholarly Articles with Positive and Negative Citations", WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web *
牛琪锴 (Niu Qikai) et al.: "Article impact evaluation based on the 'H-index' in scientific citation networks", Journal of Beijing Normal University (Natural Science) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107731285A (en) * 2017-05-10 2018-02-23 上海明品医药科技有限公司 One kind classification educational system education contribution degree computational methods
CN107391921A (en) * 2017-07-13 2017-11-24 武汉科技大学 Bibliography influence power appraisal procedure in a kind of scientific literature
CN108021628A (en) * 2017-11-22 2018-05-11 华南理工大学 A kind of management system of science and technology theme
CN108021628B (en) * 2017-11-22 2021-12-21 华南理工大学 Management system of science and technology theme
CN108614867A (en) * 2018-04-12 2018-10-02 科技部科技评估中心 Frontline technology sex index computational methods based on scientific paper and system
CN108764546A (en) * 2018-05-17 2018-11-06 鞍山师范学院 A kind of paper impact factor prediction technique based on academic big data
CN108764546B (en) * 2018-05-17 2021-04-13 鞍山师范学院 Thesis influence prediction method based on academic big data
CN108897736A (en) * 2018-06-20 2018-11-27 大连诺道认知医学技术有限公司 Document sort method and device based on Paper Rank algorithm
CN108897736B (en) * 2018-06-20 2022-04-12 大连诺道认知医学技术有限公司 Document sorting method and device based on Paper Rank algorithm
CN109376218A (en) * 2018-09-14 2019-02-22 大连理工大学 One kind being based on cascade paper impact factor appraisal procedure
CN109376218B (en) * 2018-09-14 2020-12-11 大连理工大学 Thesis influence assessment method based on cascade
CN112883147A (en) * 2021-01-15 2021-06-01 上海柏观数据科技有限公司 Knowledge association-based thesis citation association index evaluation method and device

Also Published As

Publication number Publication date
CN106484839B (en) 2018-07-06

Similar Documents

Publication Publication Date Title
CN106484839A (en) A journal impact assessment method based on academic big data
Xu et al. Identifying the semantic orientation of terms using S-HAL for sentiment analysis
Barrett Structural equation modelling: Adjudging model fit
CN103207913B (en) The acquisition methods of commercial fine granularity semantic relation and system
Wawer et al. Predicting webpage credibility using linguistic features
Haberman et al. Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions
Bertanha et al. Spatial errors in count data regressions
CN106778011A (en) A kind of scholar's influence power appraisal procedure based on academic heterogeneous network
Kelemework Automatic Amharic text news classification: Aneural networks approach
Suzuki et al. Investigating the effectiveness of Laplacian-based kernels in hub reduction
Espinosa et al. Bots and Gender Profiling using Character Bigrams.
Aitken Bayesian hierarchical random effects models in forensic science
Thoemmes M-bias, butterfly bias, and butterfly bias with correlated causes–a comment on Ding and Miratrix (2015)
Cheng et al. Application of a new superposition CES production function model
Majumdar A Generalised Fuzzy Soft Set Based Student Ranking System.
Zeng et al. Semantic multi-grain mixture topic model for text analysis
Fung et al. On sparse Fisher discriminant method for microarray data analysis
Sadika et al. Comparative study on textual data set using fuzzy clustering algorithms
Chulvi et al. Social or individual disagreement? Perspectivism in the annotation of sexist jokes
US20200034735A1 (en) System for generating topic inference information of lyrics
Henrik Classifying European Court of Human Rights cases using transformer based models
SHEN et al. PAN in, et al
Liu et al. A Comparison of Methods for Dimensionality Assessment of Categorical Item Responses
Bachchan et al. Plagiarism detection framework using monte carlo based artificial neural network for Nepali language
Nissa et al. Determinants of Investment Interest in The Millennial Generation of Salatiga in the Sharia Capital Market

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180706

Termination date: 20201008

CF01 Termination of patent right due to non-payment of annual fee