CN110825942B - Method and system for calculating quality of thesis - Google Patents
- Publication number
- CN110825942B
- Authority
- CN
- China
- Prior art keywords
- paper
- frequency
- years
- calculating
- quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Abstract
The invention provides a method and a system for calculating paper quality and belongs to the field of information technology. The system comprises a paper capturing module, a preprocessing module and a paper quality calculation module. The method first builds an ordered set of the paper's citation years and the citation count in each year; it then calculates the paper's total citation count, the year-over-year growth rates of the citation count and their mean, and the fluctuation value of the yearly citation count; it normalizes the mean growth rate and the fluctuation value separately; and it finally obtains the quality score of the paper. The invention calculates the quality score objectively and accurately, removes the influence of factors such as subjective human judgment, the popularity of a research topic and complex citation motives, ensures the accuracy and objectivity of the result, and is easy to implement.
Description
Technical Field
The invention belongs to the field of information technology, and in particular provides a method and a system for calculating paper quality.
Background
Papers are an important form in which scientific research results are expressed, and they are the crystallization of researchers' intellectual work. Quantitative analysis of paper quality helps evaluate the research performance of academic entities such as researchers and journals, and it also helps researchers select and read high-quality papers and obtain high-value knowledge, which improves the efficiency and quality of research. Paper data is open and shared, and it has become the most widely used data source for evaluating the research performance of academic entities. An objective and accurate method for calculating paper quality therefore has important application value in talent evaluation, the study of discipline structure and importance, the identification of core journals and conferences, the sound development of national science and technology, and the effective management of research work.
Traditional paper quality evaluation relies on peer review, which cannot keep up with large-scale evaluation as the number of papers grows rapidly. Peer review is also a qualitative method, and its results are easily influenced by factors such as the subjective judgment of the reviewing experts. Researchers have therefore explored quantitative methods for calculating paper quality. The citation count of a paper is the number of documents that have cited it from its publication to the present. Because it is simple to calculate, the citation count is currently the most widely used measure of paper quality. However, citation motives are complex and citation practice is not always standardized, and the rising popularity of a research topic also increases the attention paid to related papers and the number of them published, so judging paper quality by citation count alone is not accurate enough. Existing methods for calculating paper quality also include the academic trace, the Altmetrics score and the PaperRank method. The academic trace combines several factors into a single measure, such as the number of citing papers, the numbers of highly cited and zero-cited papers, and the citation counts of the references and of the citing papers, which tends to raise the computational cost and lower efficiency.
The Altmetrics score is calculated from user behavior data about a work, such as reads, shares and comments, collected from social and news websites such as Twitter and Facebook. The method is constrained by social media, which affects the completeness, authenticity and reliability of the data. The PaperRank method borrows the PageRank algorithm used for Google web page ranking and introduces parameters such as virtual nodes, time-factor weights and decay-time factors to calculate paper quality. The method is computationally expensive and is easily affected by publication time, subject field and parameter choices.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a method and a system for calculating paper quality. The invention calculates the quality score of a paper objectively and accurately, removes the influence of factors such as subjective human judgment, the popularity of a research topic and complex citation motives, ensures the accuracy and objectivity of the result, and is easy to implement.
The invention provides a method for calculating paper quality, comprising the following steps:
1) establishing a set of the paper's yearly citation counts and citation years;
selecting any paper, taking each year from its publication to the present as a citation year, pairing each citation year with the citation count of that year, and expressing all pairs of the paper, in ascending order of citation year, as an ordered set $p = \{(y_1, c_1), (y_2, c_2), \ldots, (y_n, c_n)\}$;
where $y_i$ denotes a citation year of the paper, $c_i$ denotes the citation count of the paper in year $y_i$, and $n$ denotes the number of years since the paper was published;
2) calculating the total citation count of the paper, with the expression:
$$f = \sum_{i=1}^{n} c_i \tag{1}$$
where $f$ denotes the total citation count of the paper;
3) calculating the yearly growth rates of the paper's citation count and their mean, in the following steps:
3-1) calculating the growth rate of the citation count between each two adjacent years, with the expression:
$$k_i = \frac{c_{i+1} - c_i}{y_{i+1} - y_i}, \quad i = 1, \ldots, n-1 \tag{2}$$
where $k_i$ denotes the growth rate of the paper's citation count in the $i$-th year;
3-2) calculating the mean of the yearly growth rates of the citation count, with the expression:
$$\bar{k} = \frac{1}{n-1} \sum_{i=1}^{n-1} k_i \tag{3}$$
where $\bar{k}$ denotes the mean of the yearly growth rates of the paper's citation count;
4) calculating the fluctuation value of the paper's yearly citation count, with the expression:
$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n-1} \left(k_i - \bar{k}\right)^2} \tag{4}$$
where $\sigma$ denotes the fluctuation value, which measures how strongly the yearly citation count rises and falls;
5) normalizing the mean growth rate and the fluctuation value of the paper's yearly citation count separately using formulas (5) and (6), with the expressions:
$$g(x) = \begin{cases} b \times 10^{-1} + x \times 10^{-(b+1)}, & x \ge 1 \\ x \times 10^{-e}, & 0 < x < 1 \end{cases} \tag{5}$$
$$e = \arg\min_{e} \left( x \times 10^{e} > 1 \right) \tag{6}$$
where $x$ denotes the variable to be normalized, $b$ is the number of digits in the integer part of $x$, and $e$ is the minimum number of places the decimal point of $x$ must be shifted to the right for $x$ to become larger than 1;
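6) calculating the paper quality score, with the expression:
$$\pi = \left[ 1 + w_1 \cdot \operatorname{sgn}(\bar{k}) \cdot g(|\bar{k}|) - (1 - w_1) \cdot g(\sigma) \right] \times f \tag{7}$$
where $w_1$ denotes the influence weight of $\bar{k}$; $\operatorname{sgn}(\bar{k})$ denotes the sign of $\bar{k}$: if $\bar{k} > 0$ then $\operatorname{sgn}(\bar{k}) = 1$, if $\bar{k} = 0$ then $\operatorname{sgn}(\bar{k}) = 0$, and if $\bar{k} < 0$ then $\operatorname{sgn}(\bar{k}) = -1$; $|\bar{k}|$ is the absolute value of $\bar{k}$; and $(1 - w_1)$ denotes the influence weight of $\sigma$.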
The features and beneficial effects of the invention are as follows:
The invention considers both the speed and the fluctuation of the change in a paper's yearly citation count: the yearly growth rate represents how fast the citation count changes, and the fluctuation value represents how strongly it rises and falls from year to year. Using these two factors, the invention applies a positive or negative constraint to the paper's total citation count. This removes the external influence that factors such as research-topic popularity and complex citation motives have on the citation count, keeps subjective human judgment out of the evaluation result, ensures the accuracy and objectivity of the result, and is easy to implement. The invention has important application value in evaluating the research performance of academic entities such as researchers and journals, studying discipline structure and importance, identifying core journals and conferences, promoting the sound development of national science and technology, and managing research work effectively.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention.
Fig. 2 is a schematic diagram of the system of the present invention.
Detailed Description
The invention provides a method and a system for calculating paper quality, and the invention is further described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a method for calculating the quality of a paper, the overall flow is shown as figure 1, and the method comprises the following steps:
1) Establishing a set of the paper's yearly citation counts and citation years;
Select any paper, take each year from its publication to the present as a citation year, pair each citation year with the citation count of that year, and express all pairs of the paper, in ascending order of citation year, as an ordered set $p = \{(y_1, c_1), (y_2, c_2), \ldots, (y_n, c_n)\}$. Here $y_i$ denotes a citation year of the paper, and the $y_i$ must be arranged in ascending order; $c_i$ denotes the citation count of the paper in year $y_i$; $n$ denotes the number of years since the paper was published.
In this embodiment, the citation counts of paper t in 2015, 2016, 2017, 2018 and 2019 are assumed to be 20, 50, 40, 60 and 70 respectively. The set of paper t is expressed as $p = \{(2015, 20), (2016, 50), (2017, 40), (2018, 60), (2019, 70)\}$.
2) Calculating the total citation count of the paper;
The total citation count of the paper is obtained by summing its yearly citation counts, as shown in formula (1).
$$f = \sum_{i=1}^{n} c_i \tag{1}$$
In formula (1), $f$ denotes the total citation count of the paper and $c_i$ denotes the citation count of the paper in year $y_i$.
In this embodiment, according to formula (1), the total citation count of paper t is $f = 20 + 50 + 40 + 60 + 70 = 240$.
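Steps 1) and 2) can be sketched in a few lines of Python. This is a minimal illustration using the example data of paper t; the variable names `records`, `p` and `f` are chosen here for illustration and are not part of the patent text.

```python
# Yearly data for example paper t as (citation_year, citation_count) pairs,
# deliberately given out of order to show the sorting step.
records = [(2017, 40), (2015, 20), (2019, 70), (2016, 50), (2018, 60)]

# Step 1: build the ordered set p, ascending by citation year.
p = sorted(records, key=lambda pair: pair[0])

# Step 2: the total citation count f is the sum of the yearly counts, formula (1).
f = sum(count for _, count in p)

print(p)  # [(2015, 20), (2016, 50), (2017, 40), (2018, 60), (2019, 70)]
print(f)  # 240
```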
3) Calculating the yearly growth rates of the paper's citation count and their mean;
The growth rate of the citation count between each two adjacent years is calculated from the set obtained in step 1), in the following steps:
3-1) Using the citation counts and citation years in the set generated in step 1), calculate the growth rate of the citation count between each two adjacent years, as shown in formula (2).
$$k_i = \frac{c_{i+1} - c_i}{y_{i+1} - y_i}, \quad i = 1, \ldots, n-1 \tag{2}$$
In formula (2), $k_i$ denotes the growth rate of the paper's citation count in the $i$-th year. According to formula (2), the yearly growth rates of paper t are $k_1 = 30$, $k_2 = -10$, $k_3 = 20$ and $k_4 = 10$.
3-2) Calculate the mean of the yearly growth rates from the results of formula (2), as shown in formula (3).
$$\bar{k} = \frac{1}{n-1} \sum_{i=1}^{n-1} k_i \tag{3}$$
In formula (3), $\bar{k}$ denotes the mean of the yearly growth rates of the paper's citation count, and $n$ denotes the number of years since the paper was published. In this embodiment, $\bar{k} = (30 - 10 + 20 + 10) / 4 = 12.5$.
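Step 3) can be sketched as follows. The sketch assumes that the growth rate between two adjacent years is the difference in citation counts divided by the difference in years, which reproduces the mean of 12.5 used later in this embodiment; the function names are hypothetical.

```python
def growth_rates(p):
    """Growth rate between each pair of adjacent years (assumed form of formula (2))."""
    if len(p) < 2:
        raise ValueError("at least two citation years are needed")
    return [(c2 - c1) / (y2 - y1) for (y1, c1), (y2, c2) in zip(p, p[1:])]


def mean_growth_rate(k):
    """Arithmetic mean of the yearly growth rates, formula (3)."""
    return sum(k) / len(k)


# Ordered set of example paper t.
p = [(2015, 20), (2016, 50), (2017, 40), (2018, 60), (2019, 70)]
k = growth_rates(p)
k_bar = mean_growth_rate(k)

print(k)      # [30.0, -10.0, 20.0, 10.0]
print(k_bar)  # 12.5
```

Dividing by the year difference keeps the rate meaningful if a year is missing from the set, although the patent's set contains every year from publication to the present.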
4) Calculating the fluctuation value of the paper's yearly citation count;
The fluctuation value of the yearly citation count is obtained from the yearly growth rates and their mean from step 3), as shown in formula (4).
$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n-1} \left(k_i - \bar{k}\right)^2} \tag{4}$$
In formula (4), $\sigma$ denotes the fluctuation value, which measures how strongly the yearly citation count rises and falls. According to formula (4), in this embodiment, the fluctuation value of paper t is:
$$\sigma = \sqrt{\frac{(30 - 12.5)^2 + (-10 - 12.5)^2 + (20 - 12.5)^2 + (10 - 12.5)^2}{4}} = \sqrt{218.75} \approx 14.79$$
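The fluctuation value can be checked numerically. The sketch below assumes that formula (4) is the population standard deviation of the yearly growth rates, which matches the normalized value 0.21479 used for paper t in step 6); `fluctuation` is a hypothetical helper name.

```python
import math
import statistics


def fluctuation(k):
    """Population standard deviation of the growth rates (assumed form of formula (4))."""
    k_bar = sum(k) / len(k)
    return math.sqrt(sum((ki - k_bar) ** 2 for ki in k) / len(k))


# Yearly growth rates of example paper t.
k = [30.0, -10.0, 20.0, 10.0]
sigma = fluctuation(k)

print(round(sigma, 2))  # 14.79
# The standard library computes the same quantity.
print(math.isclose(sigma, statistics.pstdev(k)))  # True
```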
5) Normalizing the mean growth rate and the fluctuation value of the paper's yearly citation count separately;
The mean growth rate $\bar{k}$ and the fluctuation value $\sigma$ can differ greatly in magnitude. To make the two factors act on paper quality at the same scale and to simplify the subsequent calculation of the quality score, the invention designs a normalization method that maps $\bar{k}$ and $\sigma$ to the (0,1) interval, as shown in formula (5). The advantage of the method is that, without using the $\bar{k}$ and $\sigma$ of other papers, the normalized values of $\bar{k}$ and $\sigma$ for different papers keep the same order as the original values, which avoids introducing noise into the calculation of paper quality and benefits its accuracy.
$$g(x) = \begin{cases} b \times 10^{-1} + x \times 10^{-(b+1)}, & x \ge 1 \\ x \times 10^{-e}, & 0 < x < 1 \end{cases} \tag{5}$$
$$e = \arg\min_{e} \left( x \times 10^{e} > 1 \right) \tag{6}$$
In formula (5), $x$ denotes the variable to be normalized (here $\bar{k}$ or $\sigma$), and $b$ is the number of digits in the integer part of $x$. The value of $e$ is the minimum number of places the decimal point of $x$ must be shifted to the right for $x$ to become larger than 1, and it is calculated as shown in formula (6).
In this embodiment, for the $\bar{k}$ value of paper t (12.5), formula (5) gives $b = 2$ and $g(12.5) = 0.2125$. Likewise, for the fluctuation value $\sigma \approx 14.79$, $b = 2$ and $g(14.79) = 0.21479$. As a further example, if $x = 0.23$, then $e = 1$ and $g(0.23) = 0.023$.
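The normalization can be sketched in Python under one stated assumption: the piecewise form used below (b/10 + x/10^(b+1) for x ≥ 1, and x/10^e for 0 < x < 1) is inferred from the two worked values g(12.5) = 0.2125 and g(0.23) = 0.023, not quoted from the patent's formula. The function names are hypothetical.

```python
def integer_digits(x):
    """b: number of digits in the integer part of x, for x >= 1."""
    return len(str(int(x)))


def shift_exponent(x):
    """e: smallest positive integer with x * 10**e > 1, for 0 < x < 1 (formula (6))."""
    e = 1
    while x * 10 ** e <= 1:
        e += 1
    return e


def normalize(x):
    """Map x > 0 into (0, 1) while preserving order (assumed form of formula (5))."""
    if x <= 0:
        raise ValueError("normalize is only defined here for x > 0")
    if x >= 1:
        b = integer_digits(x)
        return b / 10 + x / 10 ** (b + 1)
    return x / 10 ** shift_exponent(x)


print(round(normalize(12.5), 5))   # 0.2125
print(round(normalize(14.79), 5))  # 0.21479
print(round(normalize(0.23), 5))   # 0.023
```

Under this form, values with more integer digits always normalize to a larger result, so the ordering of the original values is kept without reference to any other paper.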
6) Calculating the paper quality score;
The quality score of the paper is calculated from its total citation count, the mean yearly growth rate of the citation count and the fluctuation value, as shown in formula (7).
$$\pi = \left[ 1 + w_1 \cdot \operatorname{sgn}(\bar{k}) \cdot g(|\bar{k}|) - (1 - w_1) \cdot g(\sigma) \right] \times f \tag{7}$$
In formula (7), $w_1$ denotes the influence weight of $\bar{k}$, with a default value of 0.7. $\operatorname{sgn}(\bar{k})$ denotes the sign of $\bar{k}$: if $\bar{k} > 0$ then $\operatorname{sgn}(\bar{k}) = 1$, if $\bar{k} = 0$ then $\operatorname{sgn}(\bar{k}) = 0$, and if $\bar{k} < 0$ then $\operatorname{sgn}(\bar{k}) = -1$. $|\bar{k}|$ is the absolute value of $\bar{k}$. $(1 - w_1)$ denotes the influence weight of $\sigma$. $f$ denotes the total citation count of the paper.
In this embodiment, according to formula (7), the quality score of paper t is $\pi = [1 + 0.7 \times 0.2125 - 0.3 \times 0.21479] \times 240 = 260.23512$.
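The score calculation can be reproduced with a short sketch. It assumes the form of formula (7) implied by the worked example, namely [1 + w1 · sign · g(|k̄|) − (1 − w1) · g(σ)] × f, and takes the already normalized values as inputs; `quality_score` and its parameter names are hypothetical.

```python
def sign(value):
    """Return 1, 0 or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def quality_score(f, k_bar, g_abs_k_bar, g_sigma, w1=0.7):
    """Paper quality score (assumed form of formula (7)).

    f            total citation count
    k_bar        mean yearly growth rate (only its sign is used here)
    g_abs_k_bar  normalized absolute value of k_bar
    g_sigma      normalized fluctuation value
    w1           influence weight of the growth term, default 0.7
    """
    return (1 + w1 * sign(k_bar) * g_abs_k_bar - (1 - w1) * g_sigma) * f


# Values of example paper t from the embodiment.
score = quality_score(f=240, k_bar=12.5, g_abs_k_bar=0.2125, g_sigma=0.21479)
print(round(score, 5))  # 260.23512
```

With a negative mean growth rate the growth term is subtracted instead of added, so a paper whose citations are declining scores below its raw total citation count.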
For papers in the same subject field, a higher quality score calculated by the method of the invention indicates higher paper quality.
Based on the above method, the invention provides a paper quality calculation system whose structure is shown in FIG. 2. The system comprises a paper capturing module, a preprocessing module and a paper quality calculation module. The output of the paper capturing module is connected to the input of the preprocessing module, and the output of the preprocessing module is connected to the input of the paper quality calculation module.
The paper capturing module uses a web crawler to acquire the information of the paper whose quality is to be calculated and the information of the papers citing it, and sends both to the preprocessing module.
The preprocessing module counts the yearly citation counts and citation years of the paper whose quality is to be calculated from the information received from the paper capturing module, and sends them to the paper quality calculation module.
The paper quality calculation module calculates the quality score of the paper to be evaluated, using the above method, from the yearly citation counts and citation years received from the preprocessing module.
Preferably, the information of the paper whose quality is to be calculated, as acquired by the paper capturing module, comprises the title and authors of the paper; the information of its citing papers is crawled from academic websites using a web crawler and comprises the title and publication time of each citing paper.
Preferably, the preprocessing module counts the yearly citation counts and citation years of the paper to be evaluated according to the publication times of its citing papers.
In this example, paper t has 240 citing papers, of which 20 were published in 2015, 50 in 2016, 40 in 2017, 60 in 2018 and 70 in 2019. The citation counts of paper t in 2015, 2016, 2017, 2018 and 2019 are therefore 20, 50, 40, 60 and 70 respectively.
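The counting done by the preprocessing module can be sketched as follows. The function name `yearly_citation_counts` and its parameters are hypothetical, and the input is assumed to be a plain list of the publication years of the citing papers, already extracted from the crawled records.

```python
from collections import Counter


def yearly_citation_counts(citing_years, first_year, last_year):
    """Count citing papers per year and return ascending (year, count) pairs.

    Years without any citing paper are kept with a count of 0, so every year
    from publication to the present appears in the set.
    """
    counts = Counter(citing_years)
    return [(year, counts[year]) for year in range(first_year, last_year + 1)]


# Publication years of the 240 citing papers of example paper t.
citing_years = [2015] * 20 + [2016] * 50 + [2017] * 40 + [2018] * 60 + [2019] * 70

p = yearly_citation_counts(citing_years, first_year=2015, last_year=2019)
print(p)  # [(2015, 20), (2016, 50), (2017, 40), (2018, 60), (2019, 70)]
```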
Preferably, the paper quality calculation module consists of the following interconnected units: a set representation unit for the paper's yearly citation counts and citation years, a total citation count calculation unit, a growth rate calculation unit for the yearly citation count, a fluctuation value calculation unit for the yearly citation count, and a normalization unit for the growth rate and the fluctuation value of the yearly citation count;
the set representation unit expresses the paper's yearly citation counts and citation years as an ordered set according to the output of the preprocessing module;
the total citation count calculation unit computes the total citation count from the result returned by the set representation unit;
the growth rate calculation unit calculates the growth rate of the citation count between each two adjacent years, and the mean of these growth rates, from the result returned by the set representation unit;
the fluctuation value calculation unit calculates the fluctuation value of the yearly citation count from the result returned by the growth rate calculation unit;
the normalization unit maps the results returned by the growth rate calculation unit and the fluctuation value calculation unit to the (0,1) interval.
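How the three modules connect can be illustrated with a small end-to-end sketch. Everything here is an assumption made for illustration: `capture_paper` is a hard-coded stand-in for the web crawler, the first citing year stands in for the publication year, and the normalization and score formulas are the forms inferred from the worked example of paper t.

```python
import math
from collections import Counter


def capture_paper(title):
    """Hypothetical stand-in for the paper capturing module.

    A real module would crawl an academic website; here the publication
    years of the citing papers of example paper t are hard-coded.
    """
    return [2015] * 20 + [2016] * 50 + [2017] * 40 + [2018] * 60 + [2019] * 70


def preprocess(citing_years):
    """Preprocessing module: count citing papers per year, ascending by year."""
    counts = Counter(citing_years)
    # Assumption: the first citing year stands in for the publication year.
    return [(y, counts[y]) for y in range(min(counts), max(counts) + 1)]


def normalize(x):
    """Map x > 0 into (0, 1) with the digit-based rule (assumed form)."""
    if x >= 1:
        b = len(str(int(x)))      # digits in the integer part
        return b / 10 + x / 10 ** (b + 1)
    e = 1
    while x * 10 ** e <= 1:       # smallest e with x * 10**e > 1
        e += 1
    return x / 10 ** e


def calculate_quality(p, w1=0.7):
    """Paper quality calculation module, following steps 2) to 6)."""
    f = sum(c for _, c in p)
    k = [(c2 - c1) / (y2 - y1) for (y1, c1), (y2, c2) in zip(p, p[1:])]
    k_bar = sum(k) / len(k)
    sigma = math.sqrt(sum((ki - k_bar) ** 2 for ki in k) / len(k))
    sign = (k_bar > 0) - (k_bar < 0)
    # Assumption: a zero input contributes nothing, since the rule covers x > 0 only.
    g_k = normalize(abs(k_bar)) if k_bar != 0 else 0.0
    g_sigma = normalize(sigma) if sigma > 0 else 0.0
    return (1 + w1 * sign * g_k - (1 - w1) * g_sigma) * f


# Output of each module feeds the input of the next, as in FIG. 2.
score = calculate_quality(preprocess(capture_paper("paper t")))
print(round(score, 3))
```

The result is about 260.235, which agrees with the embodiment's 260.23512 up to the rounding of σ to 14.79 in the text.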
The subject matter of the present invention has been described in detail with reference to the preferred embodiments, and it is to be understood that the above description is not to be taken in a limiting sense. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.
Claims (2)
1. A method for calculating the quality of a paper, characterized by comprising the following steps:
1) establishing a set of the paper's yearly citation counts and citation years;
selecting any paper, taking each year from its publication to the present as a citation year, pairing each citation year with the citation count of that year, and expressing all pairs of the paper, in ascending order of citation year, as an ordered set $p = \{(y_1, c_1), (y_2, c_2), \ldots, (y_n, c_n)\}$;
where $y_i$ denotes a citation year of the paper, $c_i$ denotes the citation count of the paper in year $y_i$, and $n$ denotes the number of years since the paper was published;
2) calculating the total citation count of the paper, with the expression:
$$f = \sum_{i=1}^{n} c_i \tag{1}$$
where $f$ denotes the total citation count of the paper;
3) calculating the yearly growth rates of the paper's citation count and their mean, in the following steps:
3-1) calculating the growth rate of the citation count between each two adjacent years, with the expression:
$$k_i = \frac{c_{i+1} - c_i}{y_{i+1} - y_i}, \quad i = 1, \ldots, n-1 \tag{2}$$
where $k_i$ denotes the growth rate of the paper's citation count in the $i$-th year;
3-2) calculating the mean of the yearly growth rates of the citation count, with the expression:
$$\bar{k} = \frac{1}{n-1} \sum_{i=1}^{n-1} k_i \tag{3}$$
where $\bar{k}$ denotes the mean of the yearly growth rates of the paper's citation count;
4) calculating the fluctuation value of the paper's yearly citation count, with the expression:
$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n-1} \left(k_i - \bar{k}\right)^2} \tag{4}$$
where $\sigma$ denotes the fluctuation value, which measures how strongly the yearly citation count rises and falls;
5) normalizing the mean growth rate and the fluctuation value of the paper's yearly citation count separately using formulas (5) and (6), with the expressions:
$$g(x) = \begin{cases} b \times 10^{-1} + x \times 10^{-(b+1)}, & x \ge 1 \\ x \times 10^{-e}, & 0 < x < 1 \end{cases} \tag{5}$$
$$e = \arg\min_{e} \left( x \times 10^{e} > 1 \right) \tag{6}$$
where $x$ denotes the variable to be normalized, $b$ is the number of digits in the integer part of $x$, and $e$ is the minimum number of places the decimal point of $x$ must be shifted to the right for $x$ to become larger than 1;
6) calculating the paper quality score, with the expression:
$$\pi = \left[ 1 + w_1 \cdot \operatorname{sgn}(\bar{k}) \cdot g(|\bar{k}|) - (1 - w_1) \cdot g(\sigma) \right] \times f \tag{7}$$
where $w_1$ denotes the influence weight of $\bar{k}$; $\operatorname{sgn}(\bar{k})$ denotes the sign of $\bar{k}$, equal to 1 if $\bar{k} > 0$, 0 if $\bar{k} = 0$, and $-1$ if $\bar{k} < 0$; $|\bar{k}|$ is the absolute value of $\bar{k}$; and $(1 - w_1)$ denotes the influence weight of $\sigma$.
2. A paper quality calculation system based on the method of claim 1, comprising: a paper capturing module, a preprocessing module and a paper quality calculation module; the output of the paper capturing module is connected to the input of the preprocessing module, and the output of the preprocessing module is connected to the input of the paper quality calculation module;
the paper capturing module is used for acquiring the information of the paper whose quality is to be calculated and the information of the papers citing it, and for sending both to the preprocessing module;
the preprocessing module is used for counting the yearly citation counts and citation years of the paper whose quality is to be calculated from the information received from the paper capturing module, and for sending them to the paper quality calculation module;
and the paper quality calculation module is used for calculating the quality score of the paper to be evaluated from the yearly citation counts and citation years received from the preprocessing module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911003528.3A CN110825942B (en) | 2019-10-22 | 2019-10-22 | Method and system for calculating quality of thesis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110825942A (en) | 2020-02-21 |
CN110825942B (en) | 2021-06-29 |
Family
ID=69550019
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911003528.3A Active CN110825942B (en) | 2019-10-22 | 2019-10-22 | Method and system for calculating quality of thesis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110825942B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112883148B (en) * | 2021-01-15 | 2023-03-28 | 博观创新(上海)大数据科技有限公司 | Subject talent evaluation control method and device based on research trend matching |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559262A (en) * | 2013-11-04 | 2014-02-05 | 北京邮电大学 | Community-based author and academic paper recommending system and recommending method |
CN108132961A (en) * | 2017-11-06 | 2018-06-08 | 浙江工业大学 | A kind of bibliography based on reference prediction recommends method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080229828A1 (en) * | 2007-03-20 | 2008-09-25 | Microsoft Corporation | Establishing reputation factors for publishing entities |
CN101887460A (en) * | 2010-07-14 | 2010-11-17 | 北京大学 | Document quality assessment method and application |
CN102156706A (en) * | 2011-01-28 | 2011-08-17 | 清华大学 | Mentor recommendation system and method |
CN107229738B (en) * | 2017-06-18 | 2020-04-03 | 杭州电子科技大学 | Academic paper search ordering method based on document scoring model and relevancy |
CN109146330A (en) * | 2018-09-25 | 2019-01-04 | 浙江理工大学 | A kind of evaluation method of the academic aptitude of scientific research institution |
Also Published As
Publication number | Publication date |
---|---|
CN110825942A (en) | 2020-02-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||