CN110955749A - Paper attention prediction method - Google Patents


Info

Publication number
CN110955749A
Authority
CN
China
Prior art keywords
paper
attention
papers
method comprises
predicting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911019467.XA
Other languages
Chinese (zh)
Inventor
周艳波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-24
Filing date
2019-10-24
Publication date
2020-04-03
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201911019467.XA
Publication of CN110955749A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Abstract

The invention relates to a method for predicting the attention a paper will receive. The paper to be predicted is identified and its information is acquired; the attenuation factor of that paper and the weighted citation counts of all papers are calculated; finally, the attention index of the paper is calculated and its attention is predicted. Working from the time dimension, the method predicts the future attention of papers based on two characteristics: the citation influence of a paper decays exponentially over time, and papers of different ages decay at different speeds. It can identify papers that currently receive little attention but may attract a great deal of attention later, and can thereby indirectly predict the development trend of related technologies. It offers useful guidance for the research of scientific and technical personnel and has high application value.

Description

Paper attention prediction method
Technical Field
The invention belongs to the technical field of electric digital data processing, and particularly relates to a method for predicting paper attention in the technical field of paper retrieval.
Background
With the development of science and technology, the number of scientific and technical papers has increased exponentially year by year.
Scientific and technical papers, also called original papers or primary literature in information science, are electronic or written accounts in which scientific and technical personnel or other researchers, on the basis of scientific experiments, analyze, study, and explain phenomena or problems in natural science, engineering, and the humanities and arts, investigate those phenomena and problems further, summarize and innovate on the resulting findings and conclusions, and present the work according to the requirements of scientific and technical journals.
Papers, especially scientific papers, are typically academic, innovative, and scientific in character.
The large number of scientific and technical papers makes retrieval and evaluation more difficult, and it is hard to discover high-quality papers that represent the trend of scientific and technological development.
In the prior art, the quality of a paper is evaluated mainly by its citation count and by the impact factor of the journal in which it appears. These methods apply simple statistics to historical citation data among papers, so they can represent only the past attention of a paper and cannot predict its future attention. The future attention of a paper, however, reflects the development trend of the related technology and offers significant guidance for researchers; without it, researchers lack an entire dimension of information.
Disclosure of Invention
The invention addresses the problem that, in the prior art, the evaluation of paper quality relies mainly on simple statistics over historical citation data among papers, which represent only the past attention of a paper and cannot predict its future attention. It provides an improved method for predicting paper attention that makes full use of citation data to mine and predict the future attention of papers.
The technical scheme adopted by the invention is a method for predicting the attention of a paper, comprising the following steps:
Step 1: identify any paper $i$ and acquire the information of paper $i$;
Step 2: calculate the attenuation factor $\gamma_i$ of paper $i$ at the current time;
Step 3: based on the attenuation factor $\gamma_i$, calculate the current weighted citation count of every paper;
Step 4: calculate the attention index of paper $i$ from the weighted citation counts of all papers, and predict the attention of the paper.
Preferably, in step 1, the information of paper $i$ includes the publication time $p_i$ of paper $i$.
Preferably, in step 2, the attenuation factor of the paper is $\gamma_i = (t - p_i)^{-\alpha}$, where $t$ is the current time and $\alpha$ is an adjustable parameter.
Preferably, $t - p_i$ is expressed in days.
Preferably, $\alpha \in (0, 1]$.
Preferably, in step 3, the weighted citation count of any paper $k$ is $C_k = \sum_{j \in V} w_{jk}$, where $V$ is the set of all papers that cite paper $k$, and $w_{jk}$ is the citation weight contributed to paper $k$ by a citing paper $j$ published at time $p_j$.
Preferably, $w_{jk}$ is given by the formula in Figure BDA0002246716990000021 (formula image not reproduced in this text),
where $t$ is the current time.
Preferably, $p_j - t$ is expressed as the negative of a number of days.
Preferably, in step 4, the attention index of paper $i$ is
$P_i = C_i / \sum_{m \in E} C_m$ (Figure BDA0002246716990000022),
where $E$ is the set of all papers and $m$ is any paper in $E$.
Preferably, the higher the attention index $P_i$ of paper $i$, the higher the predicted attention of paper $i$.
The invention provides an improved method for predicting paper attention. The paper $i$ to be predicted is identified and its information is acquired; the attenuation factor $\gamma_i$ of that paper and the weighted citation counts of all papers are calculated; finally, the attention index $P_i$ of paper $i$ is calculated, and the attention of paper $i$ is predicted from $P_i$.
Working from the time dimension, the method predicts the future attention of papers based on two characteristics: the citation influence of a paper decays exponentially over time, and papers of different ages decay at different speeds. It can identify papers that currently receive little attention but may attract a great deal of attention later, and can thereby indirectly predict the development trend of related technologies. It offers useful guidance for the research of scientific and technical personnel and has high application value.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention is described in further detail with reference to the following examples, but the scope of the present invention is not limited thereto.
The invention relates to a method for predicting the attention of a paper. The symbols $i$, $j$, $k$, and $m$ used herein are all subscripts denoting particular papers and are generally positive integers.
The method comprises the following steps.
Step 1: identify any paper $i$ and acquire the information of paper $i$.
In step 1, the information of paper $i$ includes the publication time $p_i$ of paper $i$.
In the present invention, the publication time $p_i$ of paper $i$ can be read directly from the data set.
In the present invention, the prediction method runs on a computer: information on all existing papers, including but not limited to title, publication time, authors, and abstract, is collected by means such as web crawlers and consolidated into a database on the computer for convenient later retrieval.
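As one possible illustration of such a local database, the following Python sketch stores paper records in SQLite and reads back a publication time. The table layout, the column names, and the helper names `build_paper_store` and `publication_date` are assumptions made for this example; the description does not specify a schema or a storage engine.

```python
import sqlite3
from datetime import date


def build_paper_store(records):
    """Load (paper_id, title, publication_date) tuples into an in-memory table.

    The schema is hypothetical; the description only says that paper
    information is integrated into a database for later retrieval.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE papers (paper_id TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )
    conn.executemany(
        "INSERT INTO papers VALUES (?, ?, ?)",
        [(pid, title, published.isoformat()) for pid, title, published in records],
    )
    conn.commit()
    return conn


def publication_date(conn, paper_id):
    """Return the stored publication time p_i of one paper as a date."""
    row = conn.execute(
        "SELECT published FROM papers WHERE paper_id = ?", (paper_id,)
    ).fetchone()
    if row is None:
        raise KeyError(paper_id)
    return date.fromisoformat(row[0])


store = build_paper_store([("i", "Example paper", date(2008, 3, 1))])
print(publication_date(store, "i"))  # 2008-03-01
```

Storing dates as ISO 8601 text keeps them sortable and lets them be converted back to `date` objects without extra parsing rules.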
Step 2: calculate the attenuation factor $\gamma_i$ of paper $i$ at the current time.
In step 2, the attenuation factor of the paper is $\gamma_i = (t - p_i)^{-\alpha}$, where $t$ is the current time and $\alpha$ is an adjustable parameter.
$t - p_i$ is expressed in days.
$\alpha \in (0, 1]$.
In the present invention, $\gamma_i$ is inversely related to the age of paper $i$: the older paper $i$ is, that is, the longer the time since its publication, the smaller $\gamma_i$ is in principle.
In the present invention, for example, if the attention of paper $i$ is calculated on August 1, 2019 and paper $i$ was published on March 1, 2008, the time difference between the two is $t - p_i = 4170$ days; with $\alpha = 0.1$, $\gamma_i = 4170^{-0.1} \approx 0.43$.
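The worked example above can be checked with a short calculation. The following Python sketch is illustrative only: the function name `attenuation_factor` and its input validation are assumptions, while the formula $\gamma_i = (t - p_i)^{-\alpha}$, the day unit, and the range of $\alpha$ are taken from the description.

```python
from datetime import date


def attenuation_factor(current: date, published: date, alpha: float) -> float:
    """Return gamma_i = (t - p_i) ** (-alpha), with t - p_i measured in days."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    age_days = (current - published).days
    if age_days <= 0:
        # Assumption: the description does not say how to treat a paper
        # published on or after the current date, so it is rejected here.
        raise ValueError("paper must be published before the current time")
    return age_days ** (-alpha)


t = date(2019, 8, 1)    # current time from the example
p_i = date(2008, 3, 1)  # publication time from the example
print((t - p_i).days, round(attenuation_factor(t, p_i, alpha=0.1), 2))
# 4170 0.43
```

For a fixed $\alpha$, the function returns smaller values for older papers, which matches the stated inverse relation between $\gamma_i$ and paper age.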
Step 3: based on the attenuation factor $\gamma_i$, calculate the current weighted citation count of every paper.
In step 3, the weighted citation count of any paper $k$ is $C_k = \sum_{j \in V} w_{jk}$, where $V$ is the set of all papers that cite paper $k$, and $w_{jk}$ is the citation weight contributed to paper $k$ by a citing paper $j$ published at time $p_j$.
The citation weight $w_{jk}$ is given by the formula in Figure BDA0002246716990000041 (formula image not reproduced in this text),
where $t$ is the current time.
$p_j - t$ is expressed as the negative of a number of days.
In the present invention, the information on all papers is read from the data set, and the citation counts of all papers are weighted using the attenuation factor $\gamma_i$ determined in step 2, yielding the current weighted citation count of each paper.
In the present invention, let the publication time of paper $j$ be $p_j$. If paper $j$ cites paper $k$, the citation weight $w_{jk}$ of that citation can be calculated, and paper $j$ is an element of the set $V$ of all papers citing paper $k$. The citation weights between each element of $V$ and paper $k$ are calculated in turn, and the sum of all these citation weights is the weighted citation count $C_k$.
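The summation described above can be sketched in Python as follows. Because the formula for the citation weight appears only as an image that is not reproduced in this text, the sketch does not guess it: the weight is supplied by the caller as a function. The names `weighted_citation_counts`, `WeightFn`, and `unit_weight`, and the three-argument signature of the weight function, are assumptions made for illustration.

```python
from datetime import date
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical signature: (publication date p_j of the citing paper,
# publication date of the cited paper k, current date t) -> weight w_jk.
WeightFn = Callable[[date, date, date], float]


def weighted_citation_counts(
    pub_dates: Dict[str, date],
    citations: Iterable[Tuple[str, str]],
    current: date,
    weight_fn: WeightFn,
) -> Dict[str, float]:
    """Return C_k for every paper k as the sum of w_jk over all citing papers j."""
    counts = {paper: 0.0 for paper in pub_dates}
    for citing, cited in citations:  # each pair means "paper j cites paper k"
        counts[cited] += weight_fn(pub_dates[citing], pub_dates[cited], current)
    return counts


def unit_weight(p_j: date, p_k: date, t: date) -> float:
    """Placeholder weight of 1.0; NOT the patent's formula."""
    return 1.0


pub_dates = {"k": date(2008, 3, 1), "j1": date(2015, 1, 1), "j2": date(2019, 7, 1)}
citations = [("j1", "k"), ("j2", "k")]
print(weighted_citation_counts(pub_dates, citations, date(2019, 8, 1), unit_weight))
# {'k': 2.0, 'j1': 0.0, 'j2': 0.0}
```

With the placeholder weight of 1.0 the result reduces to the raw citation count; any time-dependent weight built from the attenuation factor and $p_j - t$ can be passed in its place.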
Step 4: calculate the attention index of paper $i$ from the weighted citation counts of all papers, and predict the attention of the paper.
In step 4, the attention index of paper $i$ is
$P_i = C_i / \sum_{m \in E} C_m$ (Figure BDA0002246716990000042),
where $E$ is the set of all papers and $m$ is any paper in $E$.
The higher the attention index $P_i$ of paper $i$, the higher the predicted attention of paper $i$.
In the present invention, the attention index $P_i$ of paper $i$ is obtained by dividing the weighted citation count of paper $i$ by the sum of the weighted citation counts of all current papers. Based on exponential decay over time and the different decay speeds of papers of different ages, papers with high attention or a strong upward trend can be picked out from a large number of papers.
In the invention, the higher predicted attention of the paper i means that the paper i may attract more attention in the future.
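The normalization in step 4 can be illustrated with a minimal Python sketch. It assumes, as the description states, that the attention index of a paper is its weighted citation count divided by the sum of the weighted citation counts of all papers. The function names `attention_indices` and `rank_by_attention` and the handling of an all-zero total are assumptions for this example.

```python
from typing import Dict, List


def attention_indices(weighted_counts: Dict[str, float]) -> Dict[str, float]:
    """Return P_i = C_i / sum of C_m over all papers m in E."""
    total = sum(weighted_counts.values())
    if total == 0:
        # Assumption: with no weighted citations at all, every index is 0.
        return {paper: 0.0 for paper in weighted_counts}
    return {paper: count / total for paper, count in weighted_counts.items()}


def rank_by_attention(weighted_counts: Dict[str, float]) -> List[str]:
    """List paper ids from highest to lowest predicted attention."""
    indices = attention_indices(weighted_counts)
    return sorted(indices, key=indices.get, reverse=True)


counts = {"a": 3.0, "b": 1.0, "c": 0.0}  # hypothetical weighted citation counts
print(attention_indices(counts))  # {'a': 0.75, 'b': 0.25, 'c': 0.0}
print(rank_by_attention(counts))  # ['a', 'b', 'c']
```

Because every index shares the same denominator, ranking by $P_i$ gives the same order as ranking by the weighted citation count; the division only rescales the values so that they sum to 1.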
In the invention, the attention index can be quickly calculated through computer calculation.
In the invention, the paper $i$ to be predicted is identified and its information is acquired; the attenuation factor $\gamma_i$ of that paper and the weighted citation counts of all papers are calculated; finally, the attention index $P_i$ of paper $i$ is calculated, and the attention of paper $i$ is predicted from $P_i$.
Working from the time dimension, the method predicts the future attention of papers based on two characteristics: the citation influence of a paper decays exponentially over time, and papers of different ages decay at different speeds. It can identify papers that currently receive little attention but may attract a great deal of attention later, and can thereby indirectly predict the development trend of related technologies. It offers useful guidance for the research of scientific and technical personnel and has high application value.

Claims (10)

1. A method for predicting the attention of a paper, characterized in that the method comprises the following steps:
Step 1: identify any paper $i$ and acquire the information of paper $i$;
Step 2: calculate the attenuation factor $\gamma_i$ of paper $i$ at the current time;
Step 3: based on the attenuation factor $\gamma_i$, calculate the current weighted citation count of every paper;
Step 4: calculate the attention index of paper $i$ from the weighted citation counts of all papers, and predict the attention of the paper.
2. The method for predicting the attention of a paper according to claim 1, characterized in that: in step 1, the information of paper $i$ includes the publication time $p_i$ of paper $i$.
3. The method of claim 2, wherein in step 2 the attenuation factor of the paper is $\gamma_i = (t - p_i)^{-\alpha}$, where $t$ is the current time and $\alpha$ is an adjustable parameter.
4. The method of claim 3, wherein $t - p_i$ is expressed in days.
5. The method of claim 3, wherein $\alpha \in (0, 1]$.
6. The method of claim 1, wherein in step 3 the weighted citation count of any paper $k$ is $C_k = \sum_{j \in V} w_{jk}$, where $V$ is the set of all papers that cite paper $k$, and $w_{jk}$ is the citation weight contributed to paper $k$ by a citing paper $j$ published at time $p_j$.
7. The method of claim 6, wherein $w_{jk}$ is given by the formula in Figure FDA0002246716980000021 (formula image not reproduced in this text),
where $t$ is the current time.
8. The method of claim 7, wherein $p_j - t$ is expressed as the negative of a number of days.
9. The method of claim 1, wherein in step 4 the attention index of paper $i$ is
$P_i = C_i / \sum_{m \in E} C_m$ (Figure FDA0002246716980000022),
where $E$ is the set of all papers and $m$ is any paper in $E$.
10. The method of claim 1, wherein the higher the attention index $P_i$ of paper $i$, the higher the predicted attention of paper $i$.
CN201911019467.XA 2019-10-24 2019-10-24 Paper attention prediction method Pending CN110955749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911019467.XA CN110955749A (en) 2019-10-24 2019-10-24 Paper attention prediction method

Publications (1)

Publication Number Publication Date
CN110955749A true CN110955749A (en) 2020-04-03

Family

ID=69975681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911019467.XA Pending CN110955749A (en) 2019-10-24 2019-10-24 Paper attention prediction method

Country Status (1)

Country Link
CN (1) CN110955749A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006128089A2 (en) * 2005-05-26 2006-11-30 Conjugon, Inc. Compositions and methods for treating tissue
CN101887460A (en) * 2010-07-14 2010-11-17 北京大学 Document quality assessment method and application
CN103440329A (en) * 2013-09-04 2013-12-11 北京邮电大学 Authoritative author and high-quality paper recommending system and recommending method
CN104573103A (en) * 2015-01-30 2015-04-29 福州大学 Coauthor recommending method under scientific and technical literature heterogeneous network
CN105740386A (en) * 2016-01-27 2016-07-06 北京航空航天大学 Thesis search method and device based on sorting integration
CN105740452A (en) * 2016-02-03 2016-07-06 北京工业大学 Scientific and technical literature importance degree evaluation method based on PageRank and time decay

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》: "科学引文网络分析及其应用研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200403