CN110955749A - Paper attention prediction method - Google Patents
- Publication number
- CN110955749A
- Authority
- CN
- China
- Prior art keywords
- paper
- attention
- papers
- method comprises
- predicting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
Abstract
The invention relates to a method for predicting the attention of a paper. The paper to be predicted is identified and its information acquired, the attenuation factor of the current paper and the weighted citation counts of all papers are calculated, and finally the attention index of the paper is calculated and the attention of the paper is predicted. Starting from the time dimension, the method predicts the future attention of papers based on the characteristic that a paper's citation influence decays exponentially over time and on the difference in decay speed among papers of different ages. It can identify papers that currently receive little attention but may attract a large amount of attention later, can indirectly predict the development trend of related technologies, offers useful guidance for the research of scientific and technical personnel, and has high application value.
Description
Technical Field
The invention belongs to the technical field of electric digital data processing, and in particular relates to a method for predicting paper attention in the technical field of paper retrieval.
Background
With the development of science and technology, the number of scientific and technical papers has increased exponentially year by year.
A scientific and technical paper, also called an original paper or primary literature in information science, is a work in which scientific and technical personnel or other researchers, on the basis of scientific experiments (or trials), scientifically analyze, comprehensively study, and explain phenomena (or problems) in the natural sciences, engineering sciences, and the humanities and arts, further investigate particular phenomena and problems, summarize and innovate on existing results and conclusions, and present the outcome in electronic or written form according to the requirements of the relevant scientific and technical journals.
Papers, especially scientific and technical papers, are typically academic, innovative, and scientific in nature.
The large number of scientific and technical papers makes retrieval and evaluation more difficult, and it is hard to discover papers that are of high quality and represent trends in scientific and technological development.
In the prior art, paper quality is evaluated mainly by a paper's citation count and the impact factor of the journal in which it appears. This approach performs only simple statistics on historical citation data among papers; it can represent the past attention a paper has received but cannot predict its future attention. Yet the future attention of papers reflects the development trend of the related technology and offers important guidance for the research of scientific and technical personnel, so its absence deprives them of one dimension of information.
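For comparison, the citation-count baseline attributed to the prior art can be sketched as follows. This is a minimal illustration and not part of the patent; the `citations` list of (citing, cited) pairs and the paper identifiers are hypothetical.

```python
from collections import Counter

# Hypothetical citation records: (citing paper, cited paper).
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "C")]


def raw_citation_counts(pairs):
    """Count how many times each paper has been cited, ignoring when."""
    return Counter(cited for _citing, cited in pairs)


counts = raw_citation_counts(citations)
print(counts.most_common())
```

Every citation counts equally here regardless of its date, which is why such a tally describes past attention only.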
Disclosure of Invention
The invention addresses the problem that, in the prior art, paper quality evaluation relies mainly on simple statistics over historical citation data among papers, which represent only past attention and cannot predict future attention. It provides an improved method for predicting paper attention that makes full use of citation data to mine and predict the future attention of papers.
The technical solution adopted by the invention is a method for predicting the attention of a paper, comprising the following steps:
Step 1: identifying any paper i and acquiring the information of paper i;
Step 2: calculating the attenuation factor $\gamma_i$ of paper i at the current time;
Step 3: based on the attenuation factor $\gamma_i$, calculating the current weighted citation count of each paper;
Step 4: calculating the attention index of paper i based on the weighted citation counts of all papers, and predicting the attention of the paper.
Preferably, in step 1, the information of paper i includes the publication time $p_i$ of paper i.
Preferably, in step 2, the attenuation factor of the paper is $\gamma_i = (t - p_i)^{-\alpha}$, where $t$ is the current time and $\alpha$ is an adjustable parameter.
Preferably, the value of $t - p_i$ is in days.
Preferably, $\alpha \in (0, 1]$.
Preferably, in step 3, the weighted citation count of any paper k is $C_k = \sum_{j \in V} w_{jk}$, where $V$ is the set of all papers that cite paper k, and $w_{jk}$ is the citation weight corresponding to a paper j that cites paper k and has publication time $p_j$.
Preferably, the value of $p_j - t$ is the negative of a number of days.
Preferably, in step 4, the attention index of paper i is $P_i = C_i / \sum_{m \in E} C_m$, where $E$ is the set of all papers and m is any paper in $E$.
Preferably, the higher the attention index $P_i$ of paper i, the higher the predicted attention of paper i.
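To give a feel for the adjustable parameter α, the attenuation factor $(t - p_i)^{-\alpha}$ can be evaluated for a few values within the preferred range. The ages of 100 and 1000 days are arbitrary choices made for this illustration and do not come from the source.

```latex
\begin{array}{lccc}
t - p_i & \alpha = 0.1 & \alpha = 0.5 & \alpha = 1 \\
100 \text{ days}  & 100^{-0.1} \approx 0.631  & 100^{-0.5} = 0.1            & 100^{-1} = 0.01 \\
1000 \text{ days} & 1000^{-0.1} \approx 0.501 & 1000^{-0.5} \approx 0.0316 & 1000^{-1} = 0.001
\end{array}
```

A tenfold increase in age divides the factor by $10^{\alpha}$, so a larger α makes the factor fall off more quickly with age.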
The invention provides an improved method for predicting paper attention: the paper i to be predicted is identified and its information acquired, the attenuation factor $\gamma_i$ of the current paper and the weighted citation counts of all papers are calculated, the attention index $P_i$ of paper i is then calculated, and the attention of paper i is predicted based on $P_i$.
Starting from the time dimension, the method predicts the future attention of papers based on the characteristic that a paper's citation influence decays exponentially over time and on the difference in decay speed among papers of different ages. It can identify papers that currently receive little attention but may attract a large amount of attention later, can indirectly predict the development trend of related technologies, offers useful guidance for the research of scientific and technical personnel, and has high application value.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention is described in further detail with reference to the following examples, but the scope of the present invention is not limited thereto.
The invention relates to a method for predicting the attention of a paper. Note that i, j, k, and m as used in the invention are all subscripts denoting particular papers and are generally positive integers.
The method comprises the following steps.
Step 1: Identify any paper i and acquire the information of paper i.
In step 1, the information of paper i includes the publication time $p_i$ of paper i.
In the present invention, the publication time $p_i$ of paper i can be read directly from the data set.
In the invention, the prediction method is implemented on a computer: information on all existing papers, including but not limited to title, publication time, authors, and abstract, is collected by means such as web crawlers and integrated into a database on the computer for subsequent use.
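One way the collected paper information might be held in memory is sketched below. The `Paper` record, its field names, and the `build_citing_sets` helper are assumptions introduced for illustration; the source does not specify a schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Set


@dataclass
class Paper:
    """Minimal record for one paper (field names are hypothetical)."""
    paper_id: str
    title: str
    published: date                                  # publication time p_i
    cites: List[str] = field(default_factory=list)   # ids of papers this one cites


def build_citing_sets(papers: List[Paper]) -> Dict[str, Set[str]]:
    """Map each paper id k to the set of paper ids that cite k."""
    citing: Dict[str, Set[str]] = {p.paper_id: set() for p in papers}
    for p in papers:
        for k in p.cites:
            citing.setdefault(k, set()).add(p.paper_id)
    return citing


# Hypothetical three-paper data set.
papers = [
    Paper("A", "Old survey", date(2008, 3, 1)),
    Paper("B", "Follow-up", date(2015, 6, 1), cites=["A"]),
    Paper("C", "Recent work", date(2019, 1, 15), cites=["A", "B"]),
]
citing_sets = build_citing_sets(papers)
print(citing_sets)
```

Inverting the citation lists once up front gives, for every paper, the set of papers citing it, which is the set the later weighted sum runs over.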
Step 2: Calculate the attenuation factor $\gamma_i$ of paper i at the current time.
In step 2, the attenuation factor of the paper is $\gamma_i = (t - p_i)^{-\alpha}$, where $t$ is the current time and $\alpha$ is an adjustable parameter.
The value of $t - p_i$ is in days.
$\alpha \in (0, 1]$.
In the present invention, $\gamma_i$ is inversely related to the age of paper i: the older paper i is, i.e., the longer the time since its publication, the smaller $\gamma_i$ is in principle.
For example, if the attention of paper i is calculated on August 1, 2019 and paper i was published on March 1, 2008, the time difference is $t - p_i = 4170$ days; with $\alpha = 0.1$, $\gamma_i = 4170^{-0.1} \approx 0.43$.
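The worked example can be checked with a short sketch. The function name `attenuation_factor` and the input checks are assumptions; in particular, the source does not say how a paper published on or after the current date should be handled, so the sketch simply rejects that case.

```python
from datetime import date


def attenuation_factor(current: date, published: date, alpha: float) -> float:
    """Return gamma_i = (t - p_i) ** (-alpha), with t - p_i measured in days."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    age_days = (current - published).days
    if age_days <= 0:
        # Assumption: an age of zero or less is treated as invalid input.
        raise ValueError("paper must be published before the current time")
    return age_days ** (-alpha)


t = date(2019, 8, 1)
p_i = date(2008, 3, 1)
print((t - p_i).days)                              # 4170
print(round(attenuation_factor(t, p_i, 0.1), 2))   # 0.43
```

Passing a more recent publication date yields a larger factor, in line with the inverse relation between the factor and the age of the paper.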
Step 3: Based on the attenuation factor $\gamma_i$, calculate the current weighted citation count of each paper.
In step 3, the weighted citation count of any paper k is $C_k = \sum_{j \in V} w_{jk}$, where $V$ is the set of all papers that cite paper k, and $w_{jk}$ is the citation weight corresponding to a paper j that cites paper k and has publication time $p_j$.
The value of $p_j - t$ is the negative of a number of days.
In the present invention, the information of all papers is read from the data set, and the citation counts of all papers are weighted based on the attenuation factor $\gamma_i$ determined in step 2, yielding the current weighted citation count of each paper.
Let the publication time of paper j be $p_j$. If paper j cites paper k, the citation weight $w_{jk}$ of that citation can be calculated, and paper j is an element of the set $V$ of all papers citing paper k. The citation weights between each element of $V$ and paper k are calculated in turn, and their sum is the weighted citation count $C_k$.
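The summation over the citing set can be sketched as below. The text as extracted does not give the expression for the citation weight, so the weight is passed in as a function; `placeholder_weight`, its signature, its division by 365, and the choice to feed it the attenuation factor of the cited paper are all assumptions made only so that the sketch runs, not the formula of the invention.

```python
import math
from datetime import date
from typing import Callable, Dict, Iterable

# Assumed signature: (attenuation factor, p_j - t in days) -> w_jk.
WeightFn = Callable[[float, int], float]


def placeholder_weight(gamma: float, delta_days: int) -> float:
    """Hypothetical stand-in for w_jk; NOT the formula from the source."""
    return math.exp(gamma * delta_days / 365.0)


def weighted_citation_count(
    citing_ids: Iterable[str],
    pub_dates: Dict[str, date],
    current: date,
    gamma: float,
    weight_fn: WeightFn = placeholder_weight,
) -> float:
    """Return C_k = sum over j in V of w_jk, where V holds the papers citing k."""
    total = 0.0
    for j in citing_ids:
        delta_days = (pub_dates[j] - current).days  # p_j - t, a negative day count
        total += weight_fn(gamma, delta_days)
    return total


t = date(2019, 8, 1)
pub_dates = {"B": date(2015, 6, 1), "C": date(2019, 1, 15)}  # hypothetical citing papers
gamma_a = 4170 ** (-0.1)  # attenuation factor from the worked example in step 2
c_a = weighted_citation_count(["B", "C"], pub_dates, t, gamma_a)
print(round(c_a, 3))
```

Keeping the weight as a parameter means the real expression can be dropped in later without touching the summation; with a weight function that always returns 1, the result reduces to the plain citation count.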
Step 4: Calculate the attention index of paper i based on the weighted citation counts of all papers, and predict the attention of the paper.
In step 4, the attention index of paper i is $P_i = C_i / \sum_{m \in E} C_m$, where $E$ is the set of all papers and m is any paper in $E$.
The higher the attention index $P_i$ of paper i, the higher the predicted attention of paper i.
In the present invention, the attention index $P_i$ of paper i is obtained by dividing the weighted citation count of paper i by the sum of the weighted citation counts of all current papers. Based on the exponential decay over time and the difference in decay speed among papers of different ages, papers with high attention or a strong upward trend can be identified from a large number of papers.
In the invention, a higher predicted attention for paper i means that paper i may attract more attention in the future.
In the invention, the attention index can be calculated quickly by computer.
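A minimal sketch of the final step follows, assuming the attention index is the weighted citation count of a paper divided by the sum of the weighted citation counts of all papers, as the description indicates. The function name, the sample values, and the handling of an all-zero total are assumptions.

```python
from typing import Dict


def attention_indices(weighted_counts: Dict[str, float]) -> Dict[str, float]:
    """Return P_i = C_i / (sum of C_m over all papers m in E) for every paper."""
    total = sum(weighted_counts.values())
    if total == 0:
        # Assumption: with no weighted citations at all, every index is 0.
        return {paper_id: 0.0 for paper_id in weighted_counts}
    return {paper_id: c / total for paper_id, c in weighted_counts.items()}


# Hypothetical weighted citation counts C_m for a three-paper set E.
C = {"A": 0.95, "B": 0.79, "C": 0.0}
P = attention_indices(C)

# Papers ordered from highest to lowest predicted attention.
ranking = sorted(P, key=P.get, reverse=True)
print(ranking)
```

Because every index shares the same denominator, the indices sum to 1 and ranking by index is the same as ranking by weighted citation count; the normalization mainly makes values comparable across data sets of different sizes.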
The invention identifies the paper i to be predicted and acquires its information, calculates the attenuation factor $\gamma_i$ of the current paper and the weighted citation counts of all papers, then calculates the attention index $P_i$ of paper i, and predicts the attention of paper i based on $P_i$.
Starting from the time dimension, the method predicts the future attention of papers based on the characteristic that a paper's citation influence decays exponentially over time and on the difference in decay speed among papers of different ages. It can identify papers that currently receive little attention but may attract a large amount of attention later, can indirectly predict the development trend of related technologies, offers useful guidance for the research of scientific and technical personnel, and has high application value.
Claims (10)
1. A method for predicting the attention of a paper, characterized in that the method comprises the following steps:
Step 1: identifying any paper i and acquiring the information of paper i;
Step 2: calculating the attenuation factor $\gamma_i$ of paper i at the current time;
Step 3: based on the attenuation factor $\gamma_i$, calculating the current weighted citation count of each paper;
Step 4: calculating the attention index of paper i based on the weighted citation counts of all papers, and predicting the attention of the paper.
2. The method for predicting the attention of a paper according to claim 1, characterized in that: in step 1, the information of paper i includes the publication time $p_i$ of paper i.
3. The method of claim 2, wherein in step 2 the attenuation factor of the paper is $\gamma_i = (t - p_i)^{-\alpha}$, where $t$ is the current time and $\alpha$ is an adjustable parameter.
4. The method of claim 3, wherein the value of $t - p_i$ is in days.
5. The method of claim 3, wherein $\alpha \in (0, 1]$.
6. The method of claim 1, wherein in step 3 the weighted citation count of any paper k is $C_k = \sum_{j \in V} w_{jk}$, where $V$ is the set of all papers that cite paper k, and $w_{jk}$ is the citation weight corresponding to a paper j that cites paper k and has publication time $p_j$.
8. The method of claim 7, wherein the value of $p_j - t$ is the negative of a number of days.
10. The method of claim 1, wherein the higher the attention index $P_i$ of paper i, the higher the predicted attention of paper i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911019467.XA CN110955749A (en) | 2019-10-24 | 2019-10-24 | Paper attention prediction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110955749A (en) | 2020-04-03 |
Family
ID=69975681
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911019467.XA Pending CN110955749A (en) | 2019-10-24 | 2019-10-24 | Paper attention prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110955749A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006128089A2 (en) * | 2005-05-26 | 2006-11-30 | Conjugon, Inc. | Compositions and methods for treating tissue |
CN101887460A (en) * | 2010-07-14 | 2010-11-17 | 北京大学 | Document quality assessment method and application |
CN103440329A (en) * | 2013-09-04 | 2013-12-11 | 北京邮电大学 | Authoritative author and high-quality paper recommending system and recommending method |
CN104573103A (en) * | 2015-01-30 | 2015-04-29 | 福州大学 | Coauthor recommending method under scientific and technical literature heterogeneous network |
CN105740386A (en) * | 2016-01-27 | 2016-07-06 | 北京航空航天大学 | Thesis search method and device based on sorting integration |
CN105740452A (en) * | 2016-02-03 | 2016-07-06 | 北京工业大学 | Scientific and technical literature importance degree evaluation method based on PageRank and time decay |
-
2019
- 2019-10-24 CN CN201911019467.XA patent/CN110955749A/en active Pending
Non-Patent Citations (1)
Title |
---|
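"Research on Scientific Citation Network Analysis and Its Applications" (科学引文网络分析及其应用研究), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series * |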
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A survey on statistical methods for health care fraud detection | |
CN110674970A (en) | Enterprise legal risk early warning method, device, equipment and readable storage medium | |
US20210390457A1 (en) | Systems and methods for machine learning model interpretation | |
Sankar Ganesh et al. | Forecasting air quality index using an ensemble of artificial neural networks and regression models | |
CN112215696A (en) | Personal credit evaluation and interpretation method, device, equipment and storage medium based on time sequence attribution analysis | |
Nemeth et al. | The comparison of machine-learning methods XGBoost and LightGBM to predict energy development | |
Saghir et al. | Monitoring process variation using modified EWMA | |
CN111160959A (en) | User click conversion estimation method and device | |
CN113177700A (en) | Risk assessment method, system, electronic equipment and storage medium | |
Xing et al. | Seasonal and trend forecasting of tourist arrivals: An adaptive multiscale ensemble learning approach | |
CN115936895A (en) | Risk assessment method, device and equipment based on artificial intelligence and storage medium | |
CN108846128B (en) | Cross-domain text classification method based on adaptive noise reduction encoder | |
Yahaya et al. | An enhanced bank customers churn prediction model using a hybrid genetic algorithm and k-means filter and artificial neural network | |
CN115249081A (en) | Object type prediction method and device, computer equipment and storage medium | |
CN110955749A (en) | Paper attention prediction method | |
Méndez-Jiménez et al. | Modelling and forecasting of the radiation level time series at the Canfranc Underground Laboratory | |
Kamatani et al. | Construction of a system using a deep learning algorithm to count cell numbers in nanoliter wells for viable single-cell experiments | |
CN111028086A (en) | Enhanced index tracking method based on clustering and LSTM network | |
Baba et al. | Predicting book use in university libraries by synchronous obsolescence | |
Belhouchette | Facial recognition to identify emotions: an application of deep learning | |
Reid et al. | The use of skewness, kurtosis and neural networks for determining corrosion mechanism from electrochemical noise data | |
Davydenko et al. | Identification of cyclic changes in the operation mode of the production facility based on the monitoring data | |
CN113159419A (en) | Group feature portrait analysis method, device and equipment and readable storage medium | |
Wang et al. | Evolution and abrupt change for water use structure through matrix-based Renyi's alpha order entropy functional | |
CN112633528A (en) | Power grid primary equipment operation and maintenance cost determination method based on support vector machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200403 |