CN106484839A - A journal impact assessment method based on academic big data
- Publication number: CN106484839A
- Application number: CN201610874338.9A
- Authority: CN (China)
- Prior art keywords: paper, PageRank, index, algorithm, represent
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a journal impact assessment method based on academic big data, in which the impact of each journal is represented by the average impact of all papers published in it. Paper impact is computed with AR-PageRank, an algorithm based on PageRank that takes into account factors such as the similarity between a paper and the papers that cite it and the H-indices of the paper's authors. The former makes the algorithm a "weighted" one: during iteration, the "vote" that each paper casts for a paper it cites contributes a different amount depending on the similarity between the two papers. The latter expresses the "importance" of each paper more strongly: the authors' H-indices are summed to give the H-index of each paper, which serves as a label of that paper, so that the differences between papers gradually emerge during the computation. The impact of each paper can therefore be distinguished relatively accurately, and the impact of each journal is finally obtained.
Description
Technical field
The present invention relates to methods for assessing journal impact on the basis of academic big data in the academic field, and more particularly to a journal impact assessment method that combines the PageRank algorithm with the H-index.
Background art
With the continuous development of science and technology, more and more people are engaged in scientific research. Their research results are usually presented as academic papers, patents, books and similar forms, and this large body of researchers has produced hundreds of millions of data records, collectively referred to as academic big data. Academic big data has many uses, and journal impact assessment is one important application.
The impact of a journal rests on its academic level and academic characteristics and is a combined effect expressed through its recognition and social credibility; the quality of a journal determines its impact. The journal impact assessment methods in current use are all based on citation analysis (citation-based), including Impact Factor, Eigenfactor and PageRank-style algorithms. However, these algorithms rely on overly narrow evaluation indicators and cannot assess journal impact across fields.
Summary of the invention
The purpose of the present invention is to address the shortcomings of the existing research described above by proposing a journal impact assessment method based on academic big data. By analyzing the citation information of the papers in a journal, a weighted PageRank algorithm, AR-PageRank, is proposed. The algorithm considers the degree of overlap between the reference lists of two papers and uses this overlap to characterize the similarity of the two papers. It also takes the authors' H-indices into account, since the H-indices of a paper's authors represent the impact of that paper to a certain extent.
Technical solution:
A journal impact assessment method based on academic big data, with the following steps:
1) the H-indices of a paper's authors are used as an influencing factor in assessing the impact of journal papers;
2) the similarity of two papers is calculated by comparing the "overlap" of their reference lists;
3) by analyzing the characteristics of the paper citation network, the PageRank algorithm is improved: the H-index from 1) and the "overlap" from 2) are incorporated into PageRank as "weight" elements, yielding the AR-PageRank algorithm, which is used to calculate the AR-PageRank value of every paper;
4) the AR-PageRank values of all papers in a journal are summed and averaged to obtain the journal's assessment value.
Step 1): The H-index is an author-level indicator that measures both the number of a scholar's or scientist's publications and their impact. It reflects a scholar's academic achievement relatively accurately: the larger a scholar's H-index, the greater the impact of that scholar's papers.
The present invention defines the H-index of a paper with the following formula, which sums the H-indices of the paper's authors:
$$A(p_j) = \sum_{a_n \in p_j} H(a_n)$$
where p_j denotes a paper, A(p_j) denotes the H-index of p_j, a_n is an author of paper p_j, and H(a_n) denotes the author H-index of a_n.
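As a minimal sketch of step 1), the code below computes an author's H-index from per-paper citation counts (the standard definition) and then sums the authors' H-indices to obtain the paper-level value A(p_j). The function names and the dictionary layout are hypothetical and chosen only for illustration.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def paper_h_index(authors, author_h):
    """A(p_j): the sum of H(a_n) over the authors a_n of one paper.

    `authors` is an iterable of author ids; `author_h` maps author id to
    that author's H-index (both layouts are assumptions of this sketch).
    """
    return sum(author_h[a] for a in authors)
```

For example, a paper written by two authors with H-indices 4 and 7 would receive a paper H-index of 11 under this definition.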
To distinguish how important each paper is within the paper collection, the present invention uses the paper H-index obtained above, together with the following formula, to define the importance of a paper in the network:
where P denotes the paper dataset, Max({A(p_z) | p_z ∈ P}) denotes the maximum paper H-index in the dataset, and θ is set to 0.01; the role of θ is to ensure that δ(p_j) is not 0.
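The importance formula itself is not reproduced in this text. One reading that is consistent with the symbol definitions above is δ(p_j) = A(p_j) / Max({A(p_z) | p_z ∈ P}) + θ, that is, the paper H-index normalized by the dataset maximum and shifted by θ so that it never reaches 0. The sketch below implements that assumed form; the function name and the fallback for an all-zero dataset are hypothetical.

```python
def paper_importance(paper_h, theta=0.01):
    """Assumed form: delta(p) = A(p) / max A + theta.

    `paper_h` maps paper id to its paper-level H-index A(p).
    This is one plausible reconstruction, not the patent's confirmed formula.
    """
    max_h = max(paper_h.values())
    if max_h == 0:
        # Every paper has A(p) = 0; theta alone keeps delta non-zero.
        return {p: theta for p in paper_h}
    return {p: a / max_h + theta for p, a in paper_h.items()}
```

With θ = 0.01, a paper whose authors all have an H-index of 0 still receives an importance of 0.01 rather than 0.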
Step 2): The reference list of a paper reveals its research content and direction. For this reason, the present invention calculates the similarity of two papers by comparing the "overlap" of their reference lists, using the following formula:
where p_i and p_j denote any two papers in the dataset, and OUT(p_i) and OUT(p_j) denote the reference sets of p_i and p_j. The resulting value represents the similarity of p_i and p_j: the more similar the two papers, the larger the value.
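The overlap formula is not reproduced in this text. A common way to measure the overlap of two sets is the Jaccard coefficient, |OUT(p_i) ∩ OUT(p_j)| / |OUT(p_i) ∪ OUT(p_j)|, and the sketch below uses it as an assumption; the patent's own expression may normalize differently. The function name is hypothetical.

```python
def reference_overlap(out_i, out_j):
    """Jaccard overlap of two reference sets (an assumed similarity measure).

    Returns a value in [0, 1]; 0.0 when both reference sets are empty.
    """
    out_i, out_j = set(out_i), set(out_j)
    union = out_i | out_j
    if not union:
        return 0.0
    return len(out_i & out_j) / len(union)
```

Two papers that share two of four distinct references would score 0.5, and two papers with identical reference lists would score 1.0.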
Step 3): Investigation shows that both the original PageRank algorithm and its later variants are "unweighted" algorithms. "Unweighted" means that a paper is "egalitarian" when it "votes" for the papers it cites: it does not distinguish among the cited papers but treats them all alike. The present invention uses the "similarity" obtained in step 2) as a feature value of each cited paper to modify the PageRank algorithm, so that the new algorithm adds a "weight" element to the original PageRank algorithm. At the same time, the importance of each paper in the network, calculated in step 1), is also incorporated into PageRank, yielding the AR-PageRank algorithm.
The expression of the AR-PageRank algorithm is as follows:
where d is the "damping coefficient", usually set to 0.85; IN(p_i) denotes the set of all papers that cite p_i; and PR(p_i) denotes the PageRank score of p_i. The expression builds on the basic PageRank formula, and the added similarity term makes it a "weighted" PageRank algorithm: when paper p_j "votes" for each of its references, it "analyzes" that reference, so that references with different similarity are treated differently.
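The AR-PageRank expression is not reproduced in this text, so the following is only a sketch of one way the two ingredients could be combined, not the patent's formula. It assumes that (a) each citing paper splits its vote among its references in proportion to similarity, and (b) the importance δ decides how the (1 - d) share is distributed among papers. The function name, argument layout and both assumptions are hypothetical.

```python
def ar_pagerank(refs, delta, sim, d=0.85, max_iter=200, tol=1e-12):
    """Weighted PageRank sketch.

    refs[p]  : OUT(p), the set of papers cited by p
    delta[p] : importance of p (must be > 0 for every paper)
    sim(p, q): similarity between citing paper p and cited paper q
    """
    papers = sorted(set(refs) | {q for out in refs.values() for q in out})

    # Assumption (b): delta sets each paper's share of the (1 - d) mass.
    total_delta = sum(delta[p] for p in papers)
    base = {p: delta[p] / total_delta for p in papers}

    # Assumption (a): votes are proportional to similarity, with an even
    # split when every similarity for a citing paper is zero.
    votes = {}
    for p in papers:
        out = sorted(refs.get(p, ()))
        raw = {q: sim(p, q) for q in out}
        total = sum(raw.values())
        votes[p] = {q: (raw[q] / total if total > 0 else 1.0 / len(out))
                    for q in out}

    pr = {p: 1.0 / len(papers) for p in papers}
    for _ in range(max_iter):
        # Papers with no references return their score through `base`.
        dangling = sum(pr[p] for p in papers if not votes[p])
        new = {p: (1 - d + d * dangling) * base[p] for p in papers}
        for p in papers:
            for q, w in votes[p].items():
                new[q] += d * pr[p] * w
        change = sum(abs(new[p] - pr[p]) for p in papers)
        pr = new
        if change < tol:
            break
    return pr
```

Under these assumptions the scores always sum to 1, and a reference that is more similar to the citing paper receives a larger part of that paper's vote than a less similar one.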
Step 4): The paper AR-PageRank values obtained in step 3) are accumulated per journal, and the accumulated score is averaged; this average is taken as the final journal score.
The journal scoring formula is as follows:
$$\mathrm{Score}(j_i) = \frac{1}{|Publish(j_i)|} \sum_{p_k \in Publish(j_i)} PR(p_k)$$
where Score(j_i) denotes the score of journal j_i, Publish(j_i) denotes the set of papers published in journal j_i, |Publish(j_i)| denotes the size of that set, and PR(p_k) is the AR-PageRank score of paper p_k.
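Step 4) amounts to a per-journal mean of paper scores. A minimal sketch follows, assuming the paper scores and the paper-to-journal assignment are available as dictionaries; the function and argument names are hypothetical.

```python
from collections import defaultdict


def journal_scores(paper_scores, paper_journal):
    """Average the paper scores PR(p_k) within each journal.

    paper_scores : paper id -> AR-PageRank score
    paper_journal: paper id -> journal id
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for paper, score in paper_scores.items():
        journal = paper_journal[paper]
        totals[journal] += score
        counts[journal] += 1
    return {journal: totals[journal] / counts[journal] for journal in totals}
```

Averaging rather than summing keeps a journal from scoring highly merely because it publishes many papers.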
Beneficial effects of the present invention: Paper impact is computed with AR-PageRank, an algorithm based on PageRank that considers factors such as the similarity between a paper and the papers that cite it and the authors' H-indices. The former makes the algorithm a "weighted" one: during iteration, the "vote" that each paper casts for a paper it cites contributes a different amount depending on the similarity between the two papers. The latter expresses the "importance" of each paper more strongly: the authors' H-indices are summed to give the H-index of each paper, which serves as a label of that paper, so that the differences between papers gradually emerge during the computation. The impact of each paper can therefore be distinguished relatively accurately, and the impact of each journal is finally obtained.
Brief description of the drawings
Fig. 1 shows the ranking results obtained by applying the PageRank, A-PageRank and AR-PageRank algorithms to the DBLP dataset.
Fig. 2 is the flow chart of the data processing performed on the Microsoft MAG dataset according to the experimental requirements.
Fig. 3, Fig. 4 and Fig. 5 show the correlation coefficients of each algorithm, calculated with the Spearman correlation coefficient, when the four algorithms are applied to the DBLP, DBLP (2011-2015) and MAG datasets respectively.
Fig. 6, Fig. 7, Fig. 8 and Fig. 9 compare the results of the four algorithms on the DBLP and DBLP (2011-2015) datasets.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the present invention clearer, specific embodiments of the invention are described in further detail below.
An embodiment of the present invention provides a journal impact assessment method based on academic big data, which includes:
Step 1: The DBLP and MAG datasets are chosen as the experimental datasets for the method and are preprocessed.
The present invention refers to the journal ranking recommended by the China Computer Federation in 2015 and selects 42 journals in the "artificial intelligence" field as the evaluation standard (45 artificial intelligence journals are selected for the MAG dataset). The two datasets are preprocessed according to the experimental requirements; Fig. 2 gives the flow chart for processing the MAG dataset.
After preprocessing, the information of the two datasets is as follows:
Table 1. MAG dataset
Table 2. DBLP dataset
As the tables show, the number of papers in MAG is significantly smaller than in DBLP. Because the extracted MAG dataset contains too few papers, the present invention crawled the DBLP dataset and uses it as the "main" dataset, with the MAG dataset as the "comparison" dataset.
To calculate the five-year journal impact factor (5-year IF), the present invention separately extracts the data for the five years 2011-2015 from the DBLP dataset, forming a third dataset.
Table 3. DBLP dataset (2011-2015)
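Building the third dataset is a filter on publication year. A minimal sketch, assuming each paper record is a dictionary with a "year" field (the record layout and function name are hypothetical; the actual DBLP preprocessing is not described in this text):

```python
def extract_year_range(papers, first_year=2011, last_year=2015):
    """Keep only papers published in [first_year, last_year], inclusive."""
    return [p for p in papers if first_year <= p["year"] <= last_year]
```

Citations pointing to papers outside the selected window would also need to be dropped or handled separately, which this sketch does not attempt.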
Step 2: The algorithm implementation part of the present invention uses three datasets: MAG, DBLP and DBLP (2011-2015). The Eigenfactor, PageRank, A-PageRank, R-PageRank and AR-PageRank algorithms are applied to the MAG and DBLP datasets respectively.
AR-PageRank stands for Author Reference PageRank: "Author" means that the algorithm considers the H-index factor, and "Reference" means that it considers the relationship between a paper and its references. Accordingly, the A-PageRank algorithm considers only the H-index factor, and the R-PageRank algorithm considers only the relationship between a paper and its references.
The expression of the AR-PageRank algorithm is:
The expression of the A-PageRank algorithm is:
The expression of the R-PageRank algorithm is:
The expression of the PageRank algorithm is:
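For reference, the standard unweighted PageRank recurrence, written in the notation of this description, is commonly given as below. This is the textbook form rather than a reproduction of the patent's own expressions, which are not included in this text; N (the total number of papers) is a symbol introduced here, and some formulations use (1 - d) in place of (1 - d)/N.

```latex
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in IN(p_i)} \frac{PR(p_j)}{|OUT(p_j)|}
```

On this reading, A-PageRank would modify the recurrence only through the H-index-based importance, R-PageRank only through similarity-based vote weights in place of the uniform 1/|OUT(p_j)|, and AR-PageRank through both.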
Fig. 1 lists the ranking results obtained by applying the PageRank, A-PageRank and AR-PageRank algorithms to the DBLP dataset.
Step 3: The results obtained in step 2 are evaluated using the Spearman correlation coefficient, giving the correlation coefficient of each algorithm.
The present invention calculates the similarity between ranking lists mathematically. The Spearman correlation coefficient is a nonparametric measure of the dependence between two variables; it uses a monotonic function to evaluate the correlation of two statistical variables. The formula is as follows:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} \left( R_1(J_i) - R_2(J_i) \right)^2}{n (n^2 - 1)}$$
where ρ denotes the Spearman correlation coefficient, J_i denotes a journal in the journal lists, R_1 and R_2 are journal lists, R_1(J_i) denotes the position of journal J_i in list R_1, and n is the number of journals in the lists.
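The rank comparison can be sketched directly from the positions of each journal in two ordered lists. The code below uses the classical Spearman formula for rankings without ties and assumes both lists contain exactly the same journals; the function name is hypothetical.

```python
def spearman_rho(list1, list2):
    """Spearman correlation between two rankings of the same journals.

    Each list is ordered best-first and must contain the same items
    exactly once (no ties).
    """
    if len(set(list1)) != len(list1) or set(list1) != set(list2) or len(list1) != len(list2):
        raise ValueError("both lists must rank the same journals exactly once")
    n = len(list1)
    if n < 2:
        raise ValueError("at least two journals are required")
    pos1 = {journal: rank for rank, journal in enumerate(list1, start=1)}
    pos2 = {journal: rank for rank, journal in enumerate(list2, start=1)}
    squared_diff = sum((pos1[j] - pos2[j]) ** 2 for j in list1)
    return 1 - 6 * squared_diff / (n * (n * n - 1))
```

A value of 1 means the algorithm's ranking matches the reference ranking exactly, and -1 means it is exactly reversed.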
Fig. 3, Fig. 4 and Fig. 5 give the correlation coefficients of the four algorithms on the different datasets. On the DBLP dataset, the AR-PageRank algorithm has the largest correlation coefficient when the number of journals equals 42, so it performs best on this dataset. On the DBLP (2011-2015) dataset, AR-PageRank again gives the best assessment among the algorithms. On the MAG dataset, the correlation coefficient of A-PageRank is slightly higher than that of AR-PageRank, and the results of R-PageRank and PageRank are very close. These two outcomes are related to the characteristics of the MAG dataset: observation shows that its number of papers, number of cited papers, number of authors and so on are all lower than in the other two datasets, and the "quality" of a dataset has an important effect on the experimental results.
Fig. 6, Fig. 7, Fig. 8 and Fig. 9 compare the results of the four algorithms on the DBLP and DBLP (2011-2015) datasets. The red polyline shows the result of an algorithm on the DBLP (2011-2015) dataset, and the black polyline shows its result on the DBLP dataset. The figures show that each algorithm performs better on the DBLP dataset than on the DBLP (2011-2015) dataset. The latter is a subset of the former; the former is larger in scale and richer in data, so its experimental results are more accurate and reliable.
The above describes specific embodiments of the present invention and the technical principles used. Any change made according to the conception of the present invention whose resulting function does not go beyond the spirit covered by the description and drawings shall fall within the protection scope of the present invention.
Claims (1)
1. A journal impact assessment method based on academic big data, characterized in that the steps are as follows:
Step 1): The H-indices of a paper's authors are used as an influencing factor in assessing the impact of journal papers.
The H-index is an author-level indicator that measures both the number of a scholar's or scientist's publications and their impact; it reflects a scholar's academic achievement relatively accurately, and the larger a scholar's H-index, the greater the impact of that scholar's papers;
The H-index of a paper is defined by the following formula:
$$A(p_j) = \sum_{a_n \in p_j} H(a_n)$$
where: p_j denotes a paper, A(p_j) denotes the H-index of p_j, a_n is an author of paper p_j, and H(a_n) denotes the author H-index of a_n;
Using the paper H-index obtained and the following formula, the importance of a paper in the network is defined:
where: P denotes the paper dataset, Max({A(p_z) | p_z ∈ P}) denotes the maximum paper H-index in the dataset, and θ is set to 0.01, the role of θ being to ensure that δ(p_j) is not 0;
Step 2): The similarity of two papers is calculated by comparing the "overlap" of their reference lists.
The research content and direction of a paper are obtained from its reference list; the similarity of two papers is calculated by comparing the "overlap" of their reference lists, using the following formula:
where: p_i and p_j denote any two papers in the dataset, and OUT(p_i) and OUT(p_j) denote the reference sets of p_i and p_j; the resulting value represents the similarity of p_i and p_j, and the more similar the two papers, the larger the value;
Step 3): By analyzing the characteristics of the paper citation network, the PageRank algorithm is improved: the H-index from step 1) and the "overlap" from step 2) are incorporated into PageRank as "weight" elements, yielding the AR-PageRank algorithm, which is used to calculate the AR-PageRank value of every paper;
The expression of the AR-PageRank algorithm is as follows:
where: d is the "damping coefficient", set to 0.85; IN(p_i) denotes the set of all papers that cite p_i; PR(p_i) denotes the PageRank score of p_i; the expression builds on the basic PageRank formula, and the added similarity term makes it a "weighted" PageRank algorithm, namely the AR-PageRank algorithm;
Step 4): The AR-PageRank values of all papers in a journal are summed and averaged to obtain the journal's assessment value.
The paper AR-PageRank values obtained in step 3) are accumulated per journal, and the accumulated score is averaged; this average is taken as the final journal score;
The journal scoring formula is as follows:
$$\mathrm{Score}(j_i) = \frac{1}{|Publish(j_i)|} \sum_{p_k \in Publish(j_i)} PR(p_k)$$
where: Score(j_i) denotes the score of journal j_i, Publish(j_i) denotes the set of papers published in journal j_i, |Publish(j_i)| denotes the size of that set, and PR(p_k) is the AR-PageRank score of paper p_k.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610874338.9A CN106484839B (en) | 2016-10-08 | 2016-10-08 | A journal impact assessment method based on academic big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106484839A true CN106484839A (en) | 2017-03-08 |
CN106484839B CN106484839B (en) | 2018-07-06 |
Family
ID=58268416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610874338.9A Expired - Fee Related CN106484839B (en) | 2016-10-08 | 2016-10-08 | A journal impact assessment method based on academic big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106484839B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150227559A1 (en) * | 2007-07-26 | 2015-08-13 | Dr. Hamid Hatami-Hanza | Methods and systems for investigation of compositions of ontological subjects |
CN102298579A (en) * | 2010-06-22 | 2011-12-28 | 北京大学 | Scientific and technical literature-oriented model and method for sequencing papers, authors and periodicals |
CN105740452A (en) * | 2016-02-03 | 2016-07-06 | 北京工业大学 | Scientific and technical literature importance degree evaluation method based on PageRank and time decay |
Non-Patent Citations (2)
Title |
---|
XIAOMEI BAI ET AL.: "PNCOIRank: Evaluating the Impact of Scholarly Articles with Positive and Negative Citations", 《WWW "16 COMPANION PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE COMPANION ON WORLD WIDE WEB》 * |
牛琪锴等: "科学引文网中基于"H指数"的文章影响力评价", 《北京师范大学学报(自然科学版)》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107731285A (en) * | 2017-05-10 | 2018-02-23 | 上海明品医药科技有限公司 | One kind classification educational system education contribution degree computational methods |
CN107391921A (en) * | 2017-07-13 | 2017-11-24 | 武汉科技大学 | Bibliography influence power appraisal procedure in a kind of scientific literature |
CN108021628A (en) * | 2017-11-22 | 2018-05-11 | 华南理工大学 | A kind of management system of science and technology theme |
CN108021628B (en) * | 2017-11-22 | 2021-12-21 | 华南理工大学 | Management system of science and technology theme |
CN108614867A (en) * | 2018-04-12 | 2018-10-02 | 科技部科技评估中心 | Frontline technology sex index computational methods based on scientific paper and system |
CN108764546A (en) * | 2018-05-17 | 2018-11-06 | 鞍山师范学院 | A kind of paper impact factor prediction technique based on academic big data |
CN108764546B (en) * | 2018-05-17 | 2021-04-13 | 鞍山师范学院 | Thesis influence prediction method based on academic big data |
CN108897736A (en) * | 2018-06-20 | 2018-11-27 | 大连诺道认知医学技术有限公司 | Document sort method and device based on Paper Rank algorithm |
CN108897736B (en) * | 2018-06-20 | 2022-04-12 | 大连诺道认知医学技术有限公司 | Document sorting method and device based on Paper Rank algorithm |
CN109376218A (en) * | 2018-09-14 | 2019-02-22 | 大连理工大学 | One kind being based on cascade paper impact factor appraisal procedure |
CN109376218B (en) * | 2018-09-14 | 2020-12-11 | 大连理工大学 | Thesis influence assessment method based on cascade |
CN112883147A (en) * | 2021-01-15 | 2021-06-01 | 上海柏观数据科技有限公司 | Knowledge association-based thesis citation association index evaluation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106484839B (en) | 2018-07-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180706; Termination date: 20201008 |