CN106250438A

CN106250438A - Zero-citation article recommendation method and system based on a random walk model

Info

Publication number: CN106250438A
Application number: CN201610595617.1A
Authority: CN
Inventors: 吴峥; 邓丰雨; 宋振宇; 王乐群; 李世韬; 吴昊; 杨蕴意; 杨雨城; 何伟堃; 廖鸣; 廖一鸣; 齐雨; 赵璟浩; 傅洛伊; 王新兵
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2016-07-26
Filing date: 2016-07-26
Publication date: 2016-12-21
Anticipated expiration: 2036-07-26
Also published as: CN106250438B

Abstract

The invention provides a zero-citation article recommendation method and system based on a random walk model. The method comprises the following steps. Step 1: build an academic network model, and use a random walk method to obtain, for every paper, the feature values corresponding to its first author, its conference or journal, its institution, and its publication time. Step 2: set up a ranking model, and build a training set from the paper data processed in Step 1. Step 3: rank the training set with weak classifiers. Step 4: judge whether the ranking result of each weak classifier matches the true ranking of the training set, thereby obtaining an optimal ranking model. Step 5: use the ranking model to recommend to the user the zero-citation papers the user needs. The invention adopts a new approach to paper ranking so that newly published papers can be recommended more effectively, helping users find the most relevant new papers.

Description

Zero-citation article recommendation method and system based on a random walk model

Technical field

The present invention relates to the field of recommendation technology, and in particular to a zero-citation article recommendation method and system based on a random walk model.

Background art

Scientific research is a strategic support for improving social productivity and comprehensive national strength, and countries around the world attach great importance to investment in research. China places scientific and technological R&D at the core of its overall national development, and fiscal expenditure on scientific research has grown steadily. In 2012, China's R&D expenditure (industry and academia combined) exceeded one trillion yuan, reaching 1,029.84 billion yuan, the level of a moderately developed country.

One of the most direct outputs of scientific research is the scientific paper. Statistics show that from 2004 to 2014, Chinese researchers published a total of 1,369,800 scientific and technical papers, ranking second in the world, and these papers were cited 10,370,100 times in total, ranking fourth in the world. Research practice shows that scientific papers are an essential information resource for researchers carrying out or deepening their work. However, given the vast volume of literature in the information age, retrieving the academic resources one needs quickly and accurately is an important and challenging task for researchers. Effective ranking of scientific literature helps researchers find high-quality papers and identify promising research directions. Paper ranking also plays an important role in academic reward systems.

Traditional methods often use the citation count as the metric. However, this standard is too simplistic: it treats every citation as equally important and ignores the difference between high-quality citations and ordinary ones. Many researchers regard the paper citation network as analogous to a system of hyperlinked web pages and use the PageRank and HITS algorithms to assign each paper a score for ranking. In reality, however, a dynamic citation network differs from an ordinary computer network: a newly published paper can only cite papers published before it, and earlier papers cannot cite later ones. Because of this inherent property of the citation network, earlier papers have a built-in advantage in citations, which strongly affects the accuracy of conventional algorithms.

Much effort has been made to address this problem, but it has focused mostly on text analysis. Across the whole citation network, newly published papers have mostly not yet been cited by other papers, so existing algorithms give them low scores. Yet the directions represented by new papers are generally more cutting-edge than those of earlier papers and deserve researchers' attention. A new ranking algorithm is therefore of considerable significance for helping researchers obtain the resources they need, keep up with developments in their discipline, and improve their research capability, and in turn for strengthening national research capacity. This is particularly important in the era of big data: it not only makes frontier directions easier to identify but also substantially improves efficiency. Since 2000, the number of papers on paper ranking and recommendation systems has risen year by year; by incomplete statistics, more than 30 related papers appeared in 2013 alone. However, research on ranking newly published papers is still in its infancy. Tens of thousands of new papers are published every year, and without an accurate ranking algorithm researchers cannot quickly find the information they need in massive data. This motivated us to find a new algorithm that ranks newly published papers effectively and thereby predicts which papers are more likely to become hot topics and frontier directions in the next five to ten years. On this basis we invented the ZeroRank algorithm. Using authors, venues, and institutions as assessment indicators, and after analyzing and testing data from the past decade or more, it achieves effective prediction of hot papers and largely remedies the shortcomings of existing algorithms in assessing newly published papers.

Summary of the invention

In view of the defects in the prior art, an object of the present invention is to provide a zero-citation article recommendation method and system based on a random walk model.

The zero-citation article recommendation method based on a random walk model provided according to the present invention comprises the following steps:

Step 1: build an academic network model, and use a random walk method to obtain, for every paper, the feature values corresponding to its first author, its conference or journal, its institution, and its publication time;

Step 2: set up a ranking model, and build a training set from the paper data processed in Step 1;

Step 3: rank the training set with a weak classifier, where a weak classifier is a classifier that ranks using only a single feature value;

Step 4: judge whether the ranking result of the weak classifier matches the true ranking of the training set;

If it does not match, adjust the weight, in the ranking model, of the feature value corresponding to this weak classifier according to the difference between the weak classifier's ranking and the true ranking, adjust the weight of each fragment in the training set, and return to Step 3;

If it matches, judge whether the weak classifiers corresponding to all feature values have been used for ranking; if not, change the feature value type considered by the weak classifier and return to Step 3; if so, the optimal ranking model is obtained;

Step 5: use the optimal ranking model to recommend to the user the zero-citation papers the user needs.

Preferably, Step 1 comprises:

Step 1.1: use the academic graph data resource provided by Microsoft (Microsoft Academic Graph) to obtain all paper resources published from 1800 to the present;

Step 1.2: by extracting the key information of the papers, build an academic network model containing four classes of node sets and four classes of edge sets, where the key paper information includes the paper title, the authors, the journal or conference that published the paper, the institution of the paper, and the publication time;

Step 1.3: select a research field, take the papers of a certain year as the zero-citation paper set and the papers within a set time period as the training set, analyze the academic network model with the random walk method, and obtain the feature-value scores corresponding to each paper's first author, conference or journal, institution, and publication time, together with the paper's own score.

Preferably, Step 1.2 comprises:

Step 1.2.1: build the academic network model, denoted G:

G=(P ∪ A ∪ V ∪ F, E^PP∪E^PA∪E^PV∪E^PF)

An edge (p_v, p_u) ∈ E^PP indicates that paper v cites paper u;

An edge (p_v, a_u) ∈ E^PA indicates that the first author of paper v is u;

An edge (p_v, v_u) ∈ E^PV indicates that paper v is published in conference or journal u;

An edge (p_v, f_u) ∈ E^PF indicates that paper v comes from institution u;

where P, A, V, and F denote the four node sets formed by papers, authors, conferences and journals, and institutions, respectively; p_v denotes paper v, p_u paper u, a_u author u, v_u conference or journal u, and f_u institution u; and E^PP, E^PA, E^PV, and E^PF denote the edges between papers, between papers and authors, between papers and conferences/journals, and between papers and institutions, respectively;
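A minimal Python sketch of how the four node sets and four edge sets of G might be assembled from per-paper records. The `Paper` record and the function name `build_academic_network` are hypothetical, and the sketch assumes one first author, one venue, and one institution per paper, as in the edge definitions above.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:                      # hypothetical per-paper record
    pid: str
    year: int
    first_author: str
    venue: str                    # conference or journal
    institution: str
    references: list = field(default_factory=list)  # ids of papers this paper cites

def build_academic_network(papers):
    """Return node sets (P, A, V, F) and edge sets (E_PP, E_PA, E_PV, E_PF) of G."""
    P = {p.pid for p in papers}
    A = {p.first_author for p in papers}
    V = {p.venue for p in papers}
    F = {p.institution for p in papers}
    E_PA = {(p.pid, p.first_author) for p in papers}
    E_PV = {(p.pid, p.venue) for p in papers}
    E_PF = {(p.pid, p.institution) for p in papers}
    # (p_v, p_u) in E_PP means paper v cites paper u; references to papers
    # outside the data set are dropped so every edge ends at a node in P.
    E_PP = {(p.pid, u) for p in papers for u in p.references if u in P}
    return (P, A, V, F), (E_PP, E_PA, E_PV, E_PF)
```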

Step 1.2.2: establish the time relationship of the papers in the academic network model:

In the academic network G, paper publication times are ordered as t₀ < t₁ < ... < t_crt, where t₀ is the publication time of the earliest paper in the network (1800) and t_crt is the current year;

Step 1.2.3: build the zero-citation paper data set Z:

Z={p_z∈P|t(p_z)=t_crt}

where p_z denotes a paper in set Z and t(p_z) denotes its publication time.
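The sketch below shows how Z could be extracted once all information after t_crt is hidden, as Step B5 later describes for 2011. It assumes publication years are available as a dict; the helper names `snapshot` and `zero_citation_set` are hypothetical.

```python
def snapshot(pub_year, citations, t_crt):
    """Hide everything after t_crt: keep papers published up to t_crt and the
    citation edges (citing, cited) between them."""
    kept = {pid for pid, year in pub_year.items() if year <= t_crt}
    edges = {(s, d) for (s, d) in citations if s in kept and d in kept}
    return kept, edges

def zero_citation_set(pub_year, t_crt):
    """Z = {p_z in P | t(p_z) = t_crt}: the papers published in the current year."""
    return {pid for pid, year in pub_year.items() if year == t_crt}
```

Because later papers are removed first, no paper in Z can have an incoming citation edge in the snapshot.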

Preferably, Step 1.3 comprises:

Step 1.3.1: set the parameters ω₁, ω₂, ω₃, ω₄, ω₅, ρ, t_crt, where ω₁ is the contribution weight of other papers to a paper's score, ω₂ the contribution weight of the author, ω₃ the contribution weight of the conference or journal that published the paper, ω₄ the contribution weight of the institution that published the paper, and ω₅ the contribution weight of the publication time; ρ is the importance parameter of the publication time; and t_crt is the current year;

Step 1.3.2: initialize the paper scores with the following formula:

where p_i denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, ranging from 0 to N;

Step 1.3.3: compute the scores of authors, conferences or journals, and institutions from the paper scores, using the following formulas:

where a_i denotes the score of author i, v_i the score of conference or journal i, and f_i the score of institution i; A_i denotes author i; p_j denotes paper j; and AVG() is the averaging function;
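The formulas of Step 1.3.3 are not reproduced in this text, so the sketch below assumes, following the description of RankAVF in Step S3.1, that each entity's score is the average score of its adjacent papers. `entity_scores` is a hypothetical name; the same function serves authors, venues, and institutions.

```python
from collections import defaultdict

def entity_scores(paper_score, paper_entity_edges):
    """Average the scores of the papers adjacent to each entity, e.g. the
    assumed a_i = AVG over papers p_j adjacent to author A_i of p_j."""
    total = defaultdict(float)
    count = defaultdict(int)
    for pid, ent in paper_entity_edges:          # edges (paper, entity)
        total[ent] += paper_score[pid]
        count[ent] += 1
    return {ent: total[ent] / count[ent] for ent in total}
```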

Step 1.3.4: compute the paper scores with the following formula:

\begin{aligned} p_{i}' = {} & \omega_{1} \sum_{P_{j} \in \mathrm{in}(P_{i})} \frac{p_{j}}{|\mathrm{out}(P_{j})|} + \omega_{2} \frac{1}{Z_{A}} \mathrm{AVG}_{A_{j} \in \mathrm{neigh}(P_{i})} (a_{j}) + \omega_{3} \frac{1}{Z_{V}} \mathrm{AVG}_{V_{j} \in \mathrm{neigh}(P_{i})} (v_{j}) \\ & + \omega_{4} \frac{1}{Z_{F}} \mathrm{AVG}_{F_{j} \in \mathrm{neigh}(P_{i})} (f_{j}) + \omega_{5} \frac{1}{Z_{T}} \exp\left(-\rho (t_{i} - t_{crt})\right) \end{aligned};

where p_i' denotes the updated score of paper i; p_j denotes a paper j in in(P_i), i.e. a paper linked to paper i by a citation edge, and |out(P_j)| is the number of papers that P_j cites; a_j denotes the score of an author of paper i, v_j the score of the journal or conference that published paper i, f_j the score of the institution of paper i, and t_i the publication time of paper i; Z_A, Z_V, Z_F, Z_T are normalization variables; and ρ is the time decay factor.
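A direct, unoptimized Python transcription of the Step 1.3.4 update, applied to all papers at once. The dict-based inputs and the name `update_paper_scores` are illustrative assumptions; the normalization variables Z_A, Z_V, Z_F, Z_T are supplied by the caller because their values are not given here.

```python
import math

def update_paper_scores(p, in_nbrs, out_deg, a_avg, v_avg, f_avg, year, t_crt,
                        w, rho, Z):
    """One application of the Step 1.3.4 formula to every paper.

    p        : paper -> current score
    in_nbrs  : paper -> list of papers in in(P_i)
    out_deg  : paper -> |out(P_j)|
    a_avg, v_avg, f_avg : paper -> AVG of its neighbouring author /
               conference-or-journal / institution scores
    w = (w1, ..., w5); Z = (Z_A, Z_V, Z_F, Z_T)
    """
    w1, w2, w3, w4, w5 = w
    ZA, ZV, ZF, ZT = Z
    new_p = {}
    for i in p:
        cite_term = sum(p[j] / out_deg[j] for j in in_nbrs.get(i, []))
        new_p[i] = (w1 * cite_term
                    + w2 * a_avg[i] / ZA
                    + w3 * v_avg[i] / ZV
                    + w4 * f_avg[i] / ZF
                    + w5 * math.exp(-rho * (year[i] - t_crt)) / ZT)
    return new_p
```

With ρ negative (Step S1 derives ρ = −0.124), the last term equals exp(−0.124 · (t_crt − t_i)), so it is largest for papers published in the current year.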

Preferably, Step 2 comprises:

Step 2.1: for each time node t from t₀ to t_crt − 1, build fragment t from the paper citation relationships that have occurred up to time t; the resulting t_crt − t₀ fragments together form the zero-citation paper set;

Step 2.2: from the zero-citation paper set built in Step 2.1, obtain a training set containing the feature values of the t_crt − t₀ fragments.
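One possible way to cut the data into fragments, sketched in Python. It assumes (hypothetically) that the zero-citation papers of fragment t are those published in year t, and that their label y_t is the ranking by citations received in years t+1 to t_crt − 1; the source does not spell out the labelling window. `build_fragments` and the default span of 10 (matching t₀ = t_crt − 10 in Step S2.1) are illustrative.

```python
def build_fragments(pub_year, citations, t_crt, span=10):
    """pub_year: paper -> publication year; citations: set of (citing, cited).
    Returns {t: fragment} for t = t_crt - span, ..., t_crt - 1."""
    t0 = t_crt - span                       # Step S2.1 sets t0 = t_crt - 10
    fragments = {}
    for t in range(t0, t_crt):              # t_crt - t0 fragments in total
        # citation relations that have occurred by time t
        visible = {(s, d) for (s, d) in citations if pub_year[s] <= t}
        # zero-citation papers of this fragment: those published in year t
        papers = sorted(pid for pid, y in pub_year.items() if y == t)
        # assumed label: citations received in years t+1 .. t_crt-1
        later = {pid: 0 for pid in papers}
        for s, d in citations:
            if d in later and t < pub_year[s] < t_crt:
                later[d] += 1
        y_t = sorted(papers, key=lambda pid: -later[pid])   # true ranking
        fragments[t] = {"edges": visible, "papers": papers, "y": y_t}
    return fragments
```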

Preferably, in Step 1 the random walk method is executed in parallel, comprising the following steps:

Step A1: based on the feature values of adjacent papers, update the feature values of the first authors, conferences or journals, and institutions of the subsequent papers;

Step A2: judge whether the feature values of all paper nodes in the paper citation network built with the first-author, conference/journal, and institution information have been updated, and whether the updated feature values have all converged; if not, take the subsequent papers as the adjacent papers and return to Step A1; if so, proceed to Step 2.

The zero-citation article recommendation system based on a random walk model provided according to the present invention comprises:

An academic network model building module, configured to build the academic network model and to obtain, by the random walk method, the feature values corresponding to each paper's first author, conference or journal, institution, and publication time;

A training set building module, configured to set up a ranking model and to build a training set from the paper data processed by the academic network model building module;

A weak classifier ranking module, configured to rank the training set with weak classifiers, where a weak classifier is a classifier that ranks using only a single feature value;

A ranking model building module, configured to judge whether the ranking result of the weak classifier matches the true ranking of the training set, and to obtain the optimal ranking model.

Preferably, the academic network model building module comprises:

A retrieval module, configured to obtain, from the academic graph data resource provided by Microsoft (Microsoft Academic Graph), all paper resources published from 1800 to the present;

A model building module, configured to extract the key information of the papers and build an academic network model containing four classes of node sets and four classes of edge sets, where the key paper information includes the paper title, the authors, the journal or conference that published the paper, the institution of the paper, and the publication time;

A model analysis module, configured to select a research field, take the papers of a certain year as the zero-citation paper set and the papers within a set time period as the training set, analyze the academic network model with the random walk method, and obtain the feature-value scores corresponding to each paper's first author, conference or journal, institution, and publication time, together with the paper's own score.

Compared with the prior art, the present invention has the following beneficial effects:

1. The invention iteratively derives the basic parameters of the algorithm from existing data, trains and evolves automatically according to the performance of the algorithm model, and parallelizes the algorithm for big-data settings. Its new approach to paper ranking allows newly published papers to be recommended more effectively, meeting the search needs of researchers.

2. The invention effectively solves the problem of ranking zero-citation articles. By combining a random walk model with an adaptive algorithm, it analyzes information that conventional ranking algorithms do not consider, and it is particularly suited to analyzing the future influence and importance of newly published papers and producing a priority ranking of them.

Description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments with reference to the following drawings:

Fig. 1 is a flow chart of the zero-citation article recommendation method based on a random walk model provided by the present invention;

Fig. 2 is a schematic diagram of the data used to derive the time decay factor;

Fig. 3 is a schematic diagram of the academic network model;

Fig. 4 is a schematic diagram of the training set selection;

Fig. 5 is a diagram of the running time of the parallel algorithm.

Detailed description of the invention

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention.

Step S1: build the academic network model and use the random walk method to compute, for every paper, the scores of three feature values (the paper's first author, the conference or journal that published it, and its institution) as well as the paper's own score. The symbols used in the implementation steps are explained in Table 1.

Table 1. Definitions of symbols

Because paper resources on the Internet are highly scattered and the annual data updates are very large, building the academic network model is divided into two steps, S1.1 and S1.2, covering data acquisition and integration. The model is then analyzed mainly with a random walk, which is carried out in Step S1.3. The detailed steps of Step S1 are as follows:

Step S1.1: use the academic graph data resource provided by Microsoft (Microsoft Academic Graph) to obtain all paper resources published from 1800 to the present;

Step S1.2: use an optimized text analysis tool to extract the key paper information and build an academic network model containing four classes of node sets and four classes of edge sets (the model is shown in Fig. 3).

Step A1: build the academic network model, denoted G:

G=(P ∪ A ∪ V ∪ F, E^PP∪E^PA∪E^PV∪E^PF)

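An edge (p_v, p_u) ∈ E^PP indicates that paper v cites paper u;

An edge (p_v, a_u) ∈ E^PA indicates that the first author of paper v is u;

An edge (p_v, v_u) ∈ E^PV indicates that paper v is published in conference or journal u;

An edge (p_v, f_u) ∈ E^PF indicates that paper v comes from institution u.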

where P, A, V, and F denote the four node sets formed by papers, authors, conferences and journals, and institutions, respectively; p_v denotes paper v, p_u paper u, a_u author u, v_u conference or journal u, and f_u institution u; and E^PP, E^PA, E^PV, and E^PF denote the edges between papers, between papers and authors, between papers and conferences/journals, and between papers and institutions, respectively.

Step A2: establish the time relationship of the papers in the academic network model:

In the academic network G, paper publication times are ordered as t₀ < t₁ < ... < t_crt, where t₀ is the publication time of the earliest paper in the network (1800) and t_crt is the current year.

Step A3: build the zero-citation paper data set Z:

Z={p_z∈P|t(p_z)=t_crt}

where p_z denotes a paper in set Z, t(p_z) denotes its publication time, and t_crt denotes the current year.

Step S1.3: within each field, take the papers of 2011 as the zero-citation paper set, and compute the feature-value scores and the paper scores. Because the scores of papers, authors, conferences and journals, and institutions are interrelated, we designed an optimized random walk method for feature extraction.

The feature-value scoring and paper scoring proceed as follows:

Step B1: set the parameters ω₁, ω₂, ω₃, ω₄, ω₅, ρ, t_crt, where ω₁ is the contribution weight of other papers to a paper's score, ω₂ the contribution weight of the author, ω₃ the contribution weight of the conference or journal that published the paper, ω₄ the contribution weight of the institution that published the paper, and ω₅ the contribution weight of the publication time; ρ is the importance parameter of the publication time; and t_crt is the current year.

Step B2: initialize the paper scores with the following formula:

where p_i denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, ranging from 0 to N;

Step B3: compute the scores of authors, conferences or journals, and institutions from the paper scores, using the following formulas:

Step B4: compute the paper scores with the following formula:

Calculation of the decay factor ρ:

We selected the papers in computer science, 8,884,763 in total.

For each time elapsed since publication, we took the average number of citations the papers had received by that time and plotted the resulting curve of citation count versus time, shown in Fig. 2. Ignoring the first two points and fitting the curve with an exponential function gives the best result:

c·e^(−0.124t)

Therefore, ρ = −0.124 is used as the time decay factor.
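The fit can be reproduced with a log-linear least-squares regression, sketched below. This fitting technique is an assumption (the source only says an exponential function was fitted); `fit_exponential_decay` is a hypothetical helper, and the caller is expected to drop the first two points beforehand, as the text does.

```python
import math

def fit_exponential_decay(ts, ys):
    """Least-squares fit of ln y = ln c + k t; returns (c, k) for y ~ c * e^(k t).
    All ys must be positive."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    mean_t = sum(ts) / n
    mean_l = sum(logs) / n
    k = (sum((t - mean_t) * (l - mean_l) for t, l in zip(ts, logs))
         / sum((t - mean_t) ** 2 for t in ts))
    c = math.exp(mean_l - k * mean_t)
    return c, k
```

The fitted exponent k plays the role of ρ: with k = −0.124, the term exp(−ρ(t_i − t_crt)) in the paper-score formula decays by the same factor per year of age as the fitted citation curve.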

Handling of nodes with incomplete information

Because the author, conference/journal, and institution information in the data set is incomplete, virtual (dummy) nodes are used: for example, if paper u has no author information, a virtual author is assumed, and this author is assumed to have published only paper u.
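A small sketch of the dummy-node rule, assuming each paper record is a dict whose 'author', 'venue', and 'institution' fields may be missing. The naming scheme for virtual nodes is hypothetical; what matters is that each virtual node is unique to one paper, so it has exactly one neighbour.

```python
def fill_missing_with_dummies(records):
    """records: paper id -> {"author": ..., "venue": ..., "institution": ...}.
    Returns a copy in which every missing entity is replaced by a virtual
    node that belongs to that paper only."""
    filled = {}
    for pid, rec in records.items():
        new = dict(rec)                      # do not mutate the input
        for kind in ("author", "venue", "institution"):
            if not new.get(kind):
                new[kind] = f"__virtual_{kind}_{pid}"   # hypothetical naming
        filled[pid] = new
    return filled
```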

Concrete implementation of the averaging function:

Following the idea of the PageRank algorithm, the paper scores are computed as follows.

Build the graphs G_P = (P, E^PP), G_A = (P ∪ A, E^PA), G_V = (P ∪ V, E^PV), and G_F = (P ∪ F, E^PF), each containing the corresponding node set and edge set, where G_P is the paper graph, G_A the author graph, G_V the journal and conference graph, and G_F the institution graph;

First compute the scores of authors, conferences and journals, and institutions, with the initial paper scores given by

a = A_A p {computes the author score vector a}

v = A_V p {computes the conference or journal score vector v}

f = A_F p {computes the institution score vector f}

A_A, A_V, and A_F are normalized adjacency matrices recording the relations between authors and papers, between conferences/journals and papers, and between institutions and papers, respectively. The paper scores are then computed repeatedly:

Here the transposes of A_A, A_V, and A_F record the relations between papers and authors, between papers and conferences/journals, and between papers and institutions, respectively. The iteration continues until convergence, i.e. the computation stops when ‖p_k − p_(k+1)‖ < 10^-9.
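The matrix iteration can be sketched with NumPy. Several details are assumptions rather than statements from the source: the paper-side averaging uses row-normalized transposes of the incidence matrices (so that it computes the AVG of Step 1.3.4), the initial paper scores are uniform, the normalization variables default to 1, and convergence is measured with the L1 norm. Under these choices the update only converges when the weights are small enough to make it a contraction, so the sketch raises an error instead of looping forever.

```python
import numpy as np

def row_normalize(M):
    """Scale each row to sum to 1, so that M @ x averages x over the row's neighbours."""
    s = M.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0                       # rows without neighbours stay zero
    return M / s

def iterate_scores(B_A, B_V, B_F, C, years, t_crt, w, rho,
                   Z=(1.0, 1.0, 1.0, 1.0), tol=1e-9, max_iter=10_000):
    """B_A, B_V, B_F: 0/1 incidence matrices (entity x paper).
    C: paper x paper matrix with C[u, v] = 1 when paper v cites paper u."""
    A_A, A_V, A_F = row_normalize(B_A), row_normalize(B_V), row_normalize(B_F)
    # paper-side averaging over a paper's authors / venues / institutions
    T_A, T_V, T_F = row_normalize(B_A.T), row_normalize(B_V.T), row_normalize(B_F.T)
    out_deg = C.sum(axis=0)               # |out(p_v)| for every column v
    out_deg[out_deg == 0] = 1.0
    W = C / out_deg                       # W[u, v] = 1/|out(p_v)| if v cites u
    w1, w2, w3, w4, w5 = w
    ZA, ZV, ZF, ZT = Z
    time_term = np.exp(-rho * (years - t_crt)) / ZT
    p = np.full(C.shape[0], 1.0 / C.shape[0])   # assumed uniform start
    for _ in range(max_iter):
        a, v, f = A_A @ p, A_V @ p, A_F @ p      # a = A_A p, v = A_V p, f = A_F p
        p_next = (w1 * (W @ p) + w2 * (T_A @ a) / ZA + w3 * (T_V @ v) / ZV
                  + w4 * (T_F @ f) / ZF + w5 * time_term)
        if np.abs(p_next - p).sum() < tol:     # L1 change below 1e-9
            return p_next, (a, v, f)
        p = p_next
    raise RuntimeError("no convergence; try smaller weights w1..w4")
```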

Step B5: set up the zero-citation paper set (as shown in Fig. 4): take 2011 as the current year and hide all information from the years after it to obtain the zero-citation paper set.

Step B6: feature extraction: the papers from 1800 to 2010 are set as the training set, and the optimized random walk method is used to extract feature values from the training set.

Step S2: use a Ranking algorithm: build the training set, select weak classifiers, revise the existing ranking model according to each single weak classifier, and repeat until the optimal model is obtained;

To train a ranking model that combines the different feature values from Step S1, traditional methods would use linear regression or the k-nearest-neighbor algorithm, but such methods do not suit this problem: for two papers from different time periods, total citation counts are affected by time and historical factors, so ranking those two papers directly against each other is unreasonable. A Ranking algorithm is therefore used, and papers from different time periods are analyzed separately. The concrete steps are as follows:

Step S2.1: for each time node t from t₀ to t_crt − 1, build fragment t from the paper citation relationships that have occurred up to time t; the resulting t_crt − t₀ fragments together form the "zero-citation paper set". Since t₀ plays no key role in the experiments, t₀ is set to t_crt − 10;

Step S2.2: apply the feature extraction algorithm of Step S1 to the "zero-citation paper set" built in Step S2.1 to obtain a training set S containing the feature values of the t_crt − t₀ fragments, in which each element holds the "author", "conference", and "institution" feature values of fragment t together with y_t, the actual citation ranking of fragment t;

Step S2.3: iterate over the training set S produced in Step S2.2 with the AdaRank algorithm. In each round a new weak classifier k_n is added, its weight α_n is adjusted, and it is added to the current ranking model to obtain a new model r_n. The iteration ends when classifier performance no longer improves, yielding the optimal ranking model. Here r denotes the initial ranking model, composed of the weights of the three feature values "author", "conference", and "institution".
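AdaRank (Xu and Li) can be sketched as below, with each weak ranker restricted to a single feature in line with Step 3. Assumptions: fragments play the role of AdaRank's queries, linear-gain NDCG is the evaluation measure, and training stops as soon as the mean NDCG of the combined model stops improving. The function names are hypothetical, and this is not presented as the authors' exact implementation.

```python
import math

def ndcg(scores, relevance):
    """Linear-gain NDCG of the order induced by `scores` against `relevance`."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    dcg = sum(relevance[k] / math.log2(r + 2) for r, k in enumerate(order))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(r + 2) for r, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def adarank(fragments, n_features, rounds=50, eps=1e-12):
    """fragments: list of (X, y); X[d][k] = feature k of paper d, y[d] = its
    relevance (e.g. later citations). Returns the per-feature weights alpha."""
    m = len(fragments)
    P = [1.0 / m] * m                     # weight of each fragment
    alpha = [0.0] * n_features
    best_score = -math.inf

    def perf(weights, X, y):
        return ndcg([sum(w * x for w, x in zip(weights, row)) for row in X], y)

    def one_hot(k):                       # weak ranker = a single feature
        return [1.0 if j == k else 0.0 for j in range(n_features)]

    for _ in range(rounds):
        k = max(range(n_features),
                key=lambda f: sum(P[i] * perf(one_hot(f), X, y)
                                  for i, (X, y) in enumerate(fragments)))
        E = [perf(one_hot(k), X, y) for X, y in fragments]
        num = sum(P[i] * (1 + E[i]) for i in range(m))
        den = sum(P[i] * (1 - E[i]) for i in range(m))
        candidate = alpha.copy()
        candidate[k] += 0.5 * math.log((num + eps) / (den + eps))
        combined = [perf(candidate, X, y) for X, y in fragments]
        score = sum(combined) / m
        if score <= best_score:           # stop when performance stops improving
            break
        alpha, best_score = candidate, score
        z = sum(math.exp(-c) for c in combined)
        P = [math.exp(-c) / z for c in combined]   # poorly ranked fragments gain weight
    return alpha
```

The fragment weights P correspond to "adjusting the weight of each fragment" in Step 4, and alpha to the per-feature weights of the ranking model.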

Step S3: parallel random walk. Building on the random walk of Step S1, the invention provides a parallel solution that reduces running time and space requirements;

The random walk of Step S1 has O(M) time complexity and O(M+N) space complexity, where M is the number of edges in the academic network model and N is the total number of papers in the training set, which makes running it on a single machine impractical. A parallel solution for the random walk is therefore proposed.

Step S3.1: RankAVF scores the three factors in the academic network model that mainly influence paper scores: authors, conferences, and institutions. Using the feature extraction algorithm of the preceding steps, for each author, conference, and institution node in the academic network model it takes the feature values of the adjacent paper nodes, averages them to compute the node's feature value, and replaces the node's original feature value with the newly computed one, thereby updating the network. The updated feature values are then passed to the adjacent paper nodes, completing one AVF iteration. The formulas are as follows:

{compute the author score a from the paper scores p}

{compute the conference score v from the paper scores p}

{compute the institution score f from the paper scores p}

where AVG denotes the averaging function.

Step S3.2: RankP computes the new feature value of each paper node from the paper node's feature value obtained in the previous iteration and the feature values of the adjacent author, conference, and institution nodes, updates it, and passes the new feature value to the subsequent paper nodes of this paper node and to the adjacent author, conference, and institution nodes. The formula is as follows:

where AVG denotes the averaging function and exp denotes the exponential function.

Step S3.3: the two algorithms above operate on individual nodes of the academic network model and are iterated in parallel. Once the feature values computed for all paper nodes have converged, the algorithm stops iterating, yielding the scores of the newly published papers.
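A vertex-centric sketch of Steps S3.1 to S3.3, using a thread pool to update nodes independently within each synchronous round. It is illustrative only: CPython threads give no CPU speedup for pure-Python work like this, so a real deployment would distribute the same per-node functions across processes or machines. Uniform initial scores, unit normalization variables, and the dict-based graph layout are assumptions, and `parallel_rank` is a hypothetical name.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def parallel_rank(papers, ent_papers, paper_ents, cited_by, out_deg, year,
                  t_crt, w, rho, tol=1e-9, max_iter=10_000, workers=4):
    """papers: paper ids; ent_papers: entity -> its papers;
    paper_ents: paper -> {"A": [...], "V": [...], "F": [...]} (entity ids must be
    unique across kinds); cited_by: paper -> papers citing it, i.e. in(P_i);
    out_deg: paper -> number of papers it cites."""
    w1, w2, w3, w4, w5 = w
    p = {i: 1.0 / len(papers) for i in papers}        # assumed uniform start
    ents = list(ent_papers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            # RankAVF: each author/venue/institution node averages the scores
            # of its adjacent paper nodes; nodes are processed independently.
            def rank_avf(e):
                nbrs = ent_papers[e]
                return sum(p[j] for j in nbrs) / len(nbrs)
            score = dict(zip(ents, pool.map(rank_avf, ents)))

            # RankP: each paper combines the previous scores of the papers that
            # cite it with the averages of its adjacent entity nodes and recency.
            def rank_p(i):
                def avg(kind):
                    nbrs = paper_ents[i][kind]
                    return sum(score[e] for e in nbrs) / len(nbrs)
                cites = sum(p[j] / out_deg[j] for j in cited_by.get(i, []))
                return (w1 * cites + w2 * avg("A") + w3 * avg("V") + w4 * avg("F")
                        + w5 * math.exp(-rho * (year[i] - t_crt)))
            new_p = dict(zip(papers, pool.map(rank_p, papers)))

            if max(abs(new_p[i] - p[i]) for i in papers) < tol:
                return new_p                            # all paper nodes converged
            p = new_p
    raise RuntimeError("no convergence; try smaller weights w1..w4")
```

Each round is synchronous: every RankAVF task reads the same previous paper scores, and every RankP task reads the same freshly computed entity scores, so the result does not depend on how the pool schedules the tasks.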

Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art may make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where there is no conflict, the embodiments of the present application and the features in them may be combined with one another arbitrarily.

Claims

1. A zero-citation article recommendation method based on a random walk model, characterized by comprising the following steps:

Step 1: build an academic network model, and use a random walk method to obtain, for every paper, the feature values corresponding to its first author, its conference or journal, its institution, and its publication time;

Step 2: set up a ranking model, and build a training set from the paper data processed in Step 1;

Step 3: rank the training set with a weak classifier, where a weak classifier is a classifier that ranks using only a single feature value;

Step 4: judge whether the ranking result of the weak classifier matches the true ranking of the training set;

If it does not match, adjust the weight, in the ranking model, of the feature value corresponding to this weak classifier according to the difference between the weak classifier's ranking and the true ranking, adjust the weight of each fragment in the training set, and return to Step 3;

If it matches, judge whether the weak classifiers corresponding to all feature values have been used for ranking; if not, change the feature value type considered by the weak classifier and return to Step 3; if so, the optimal ranking model is obtained;

Step 5: use the optimal ranking model to recommend to the user the zero-citation papers the user needs.

2. The zero-citation article recommendation method based on a random walk model according to claim 1, characterized in that Step 1 comprises:

Step 1.1: use the academic graph data resource provided by Microsoft (Microsoft Academic Graph) to obtain all paper resources published from 1800 to the present;

Step 1.2: by extracting the key information of the papers, build an academic network model containing four classes of node sets and four classes of edge sets, where the key paper information includes the paper title, the authors, the journal or conference that published the paper, the institution of the paper, and the publication time;

Step 1.3: select a research field, take the papers of a certain year as the zero-citation paper set and the papers within a set time period as the training set, analyze the academic network model with the random walk method, and obtain the feature-value scores corresponding to each paper's first author, conference or journal, institution, and publication time, together with the paper's own score.

3. The zero-citation article recommendation method based on a random walk model according to claim 2, characterized in that Step 1.2 comprises:

Step 1.2.1: build the academic network model, denoted G:

G=(P ∪ A ∪ V ∪ F, E^PP∪E^PA∪E^PV∪E^PF)

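An edge (p_v, p_u) ∈ E^PP indicates that paper v cites paper u;

An edge (p_v, a_u) ∈ E^PA indicates that the first author of paper v is u;

An edge (p_v, v_u) ∈ E^PV indicates that paper v is published in conference or journal u;

An edge (p_v, f_u) ∈ E^PF indicates that paper v comes from institution u;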

where P, A, V, and F denote the four node sets formed by papers, authors, conferences and journals, and institutions, respectively; p_v denotes paper v, p_u paper u, a_u author u, v_u conference or journal u, and f_u institution u; and E^PP, E^PA, E^PV, and E^PF denote the edges between papers, between papers and authors, between papers and conferences/journals, and between papers and institutions, respectively;

Step 1.2.2: establish the time relationship of the papers in the academic network model:

In the academic network G, paper publication times are ordered as t₀ < t₁ < ... < t_crt, where t₀ is the publication time of the earliest paper in the network (1800) and t_crt is the current year;

Step 1.2.3: build the zero-citation paper data set Z:

Z={p_z∈P|t(p_z)=t_crt}

4. The zero-citation article recommendation method based on a random walk model according to claim 2, characterized in that Step 1.3 comprises:

Step 1.3.1: set the parameters ω₁, ω₂, ω₃, ω₄, ω₅, ρ, t_crt, where ω₁ is the contribution weight of other papers to a paper's score, ω₂ the contribution weight of the author, ω₃ the contribution weight of the conference or journal that published the paper, ω₄ the contribution weight of the institution that published the paper, and ω₅ the contribution weight of the publication year; ρ is the importance parameter of the publication time; and t_crt is the current year;

Step 1.3.2: initialize the paper scores using the following formula:

Step 1.3.3: calculate the scores of the authors, conferences or journals, and institutions from the paper scores, using the following formulas:

where $a_i$ denotes the score of author i, $v_i$ the score of conference or journal i, $f_i$ the score of institution i, $A_i$ denotes author i, $p_j$ denotes paper j, and $\mathrm{AVG}(\cdot)$ is the averaging function;
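The formulas for step 1.3.3 are not reproduced in this text. Given that $\mathrm{AVG}(\cdot)$ is named, one plausible reading, assumed here rather than taken from the claim, is that each author, venue or institution scores the average of the scores of the papers linked to it. The function name `entity_scores` and its argument layout are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def entity_scores(paper_score: dict, paper_entity: dict) -> dict:
    """Assumed reading of step 1.3.3: score each entity (first author,
    venue or institution) as AVG of the scores of its linked papers.

    paper_score:  paper id -> current paper score p_j
    paper_entity: paper id -> the single entity linked to that paper
    """
    linked = defaultdict(list)
    for paper, entity in paper_entity.items():
        linked[entity].append(paper_score[paper])
    return {entity: mean(scores) for entity, scores in linked.items()}
```

The same function can be called three times, once with the paper-to-author mapping, once with paper-to-venue and once with paper-to-institution.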

Step 1.3.4: calculate the paper scores using the following formula:

where $p'_i$ denotes the score of any paper i, $p_j$ denotes a paper j cited by paper i, $a_j$ denotes the author score of paper i, $v_j$ denotes the score of the journal or conference that includes paper i, $f_j$ denotes the score of the institution that published paper i, $t_i$ denotes the publication time of paper i, $Z_A$, $Z_V$, $Z_F$, $Z_T$ are normalization variables, and $\rho$ is the time decay factor.
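The step 1.3.4 formula itself is missing from this text, so the sketch below is only a hypothetical illustration of how the named quantities might combine. It assumes a weighted sum using $\omega_1, \dots, \omega_5$, a citation term equal to the mean score of the cited papers, normalization by $Z_A, Z_V, Z_F, Z_T$, and an exponential time decay $e^{-\rho (t_{crt} - t_i)}$. None of these choices is confirmed by the claim.

```python
import math


def paper_score(cited_scores, a, v, f, t_i, t_crt, w, rho, Z):
    """HYPOTHETICAL form of step 1.3.4 (the claimed formula is not shown).

    cited_scores: scores p_j of the papers cited by paper i
    a, v, f:      author, venue and institution scores of paper i
    t_i, t_crt:   publication year of paper i and the current year
    w:            weights (w1, w2, w3, w4, w5)
    Z:            normalisers (Z_A, Z_V, Z_F, Z_T)
    """
    w1, w2, w3, w4, w5 = w
    z_a, z_v, z_f, z_t = Z
    # Assumed citation term: mean score of the cited papers (0 if none).
    cite = sum(cited_scores) / len(cited_scores) if cited_scores else 0.0
    # Assumed time term: exponential decay in the paper's age, factor rho.
    recency = math.exp(-rho * (t_crt - t_i))
    return (w1 * cite + w2 * a / z_a + w3 * v / z_v
            + w4 * f / z_f + w5 * recency / z_t)
```

With this assumed form, a paper published in the current year gets the full time term $\omega_5 / Z_T$, and older papers are discounted by $\rho$ per year of age.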

5. The zero-citation article recommendation method based on a random walk model according to claim 1 or 4, characterized in that step 2 comprises:

Step 2.1: for each time node t from $t_0$ to $t_{crt}-1$, build the paper citation relationships that have occurred by time t into snapshot t, so that a total of $t_{crt}-t_0$ snapshots form the zero-citation paper set;

Step 2.2: from the zero-citation paper set built in step 2.1, obtain a training set containing the feature values of the $t_{crt}-t_0$ snapshots.
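Steps 2.1 and 2.2 can be pictured as slicing the citation history by year. The sketch below assumes, without confirmation from the claim, that a citation "has occurred by time t" when the citing paper was published in or before year t. The names `build_snapshots`, `citations` and `pub_year` are hypothetical:

```python
def build_snapshots(citations, pub_year, t0, t_crt):
    """One snapshot per time node t = t0 .. t_crt - 1 (t_crt - t0 in total).

    citations: list of (citing, cited) paper-id pairs
    pub_year:  paper id -> publication year
    Assumption: a citation exists by year t if the citing paper is
    published in or before t.
    """
    snapshots = {}
    for t in range(t0, t_crt):
        snapshots[t] = [(u, v) for (u, v) in citations if pub_year[u] <= t]
    return snapshots
```

Each snapshot can then be scored with the random-walk features to give one slice of the training set.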

6. The zero-citation article recommendation method based on a random walk model according to claim 1, characterized in that the random walk method in step 1 is executed in parallel, comprising the following steps:

Step A1: based on the feature values of adjacent papers, update the feature values of the first author, conference or journal, and institution of subsequent papers, respectively;

Step A2: determine whether the feature values of all paper nodes in the paper citation network, which is built from the first-author, conference or journal, and institution information, have all been updated and whether the updated feature values have all converged; if not, take the subsequent papers as the adjacent papers and return to step A1; if so, proceed to step 2.
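Steps A1 and A2 describe repeated updates that stop once every node's value has converged. One common way to make such updates parallel, assumed here and not stated in the claim, is to run synchronous (Jacobi-style) rounds in which every node reads only the previous round's values. The driver below is a hypothetical sketch; the per-node `update` rule is a placeholder to be supplied by the caller. `ThreadPoolExecutor` is used only to show that updates within a round are independent; for CPU-bound scoring in CPython a process pool or vectorized code would be needed for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def iterate_until_converged(nodes, update, init, tol=1e-6, max_rounds=100):
    """Run synchronous update rounds until the largest change is below tol.

    nodes:  list of node ids
    update: hypothetical rule f(node, previous_values) -> new value
    init:   node id -> initial value
    """
    nodes = list(nodes)
    values = dict(init)
    with ThreadPoolExecutor() as pool:
        for _ in range(max_rounds):
            snapshot = dict(values)  # every update reads only this round's input
            new_vals = list(pool.map(lambda n: update(n, snapshot), nodes))
            delta = max(abs(nv - snapshot[n]) for n, nv in zip(nodes, new_vals))
            values.update(zip(nodes, new_vals))
            if delta < tol:  # step A2: all values converged
                break
    return values
```

For example, with a rule where `"p2"` averages its own value with that of an adjacent paper `"p1"` fixed at 1.0, both values settle at 1.0.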

7. A zero-citation article recommendation system based on a random walk model, characterized by comprising:

Academic network model building module: used to build the academic network model and to obtain, by the random walk method, the feature values corresponding to the first author, conference or journal, institution and publication time of each paper;

Training set building module: establishes the ranking model and builds the training set from the paper data processed by the academic network model building module;

Weak classifier ranking module: ranks the training set with weak classifiers, where a weak classifier is a classifier that ranks by considering only a single feature value;

Ranking model building module: determines whether the ranking results of the weak classifiers match the true ranking results of the training set, and obtains the optimal ranking model.
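The weak-classifier and ranking-model modules can be illustrated with single-feature rankers compared against the true ranking. The claim does not say how "matches" is measured; this sketch assumes pairwise concordance (the fraction of paper pairs ordered the same way, closely related to Kendall's tau) and assumes the true ranking is given as a score per paper. Keeping the single best ranker is a simplification; a boosting method such as AdaRank would instead combine several weak rankers. All names are hypothetical:

```python
from itertools import combinations


def pairwise_agreement(scores, truth):
    """Fraction of paper pairs ordered the same way by `scores` and `truth`
    (pairs tied in either ranking are skipped)."""
    agree = total = 0
    for x, y in combinations(truth, 2):
        dt = truth[x] - truth[y]
        ds = scores[x] - scores[y]
        if dt == 0 or ds == 0:
            continue
        total += 1
        agree += (dt > 0) == (ds > 0)
    return agree / total if total else 0.0


def best_weak_ranker(features, truth):
    """Each weak classifier ranks by one feature only; keep the feature whose
    ranking best matches the true ranking of the training set.

    features: feature name -> {paper id -> feature value}
    truth:    paper id -> true ranking score
    """
    return max(features, key=lambda name: pairwise_agreement(features[name], truth))
```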

8. The zero-citation article recommendation system based on a random walk model according to claim 7, characterized in that the academic network model building module comprises:

Retrieval module: used to acquire, from the academic graph data resources provided by Microsoft, all paper resources published from 1800 to date;

Model building module: extracts the key information of each paper and establishes an academic network model containing four classes of node sets and four classes of edge sets; the key paper information includes the paper title, the authors, the journal or conference that includes the paper, the publishing institution, and the publication time;

Model analysis module: selects a research field; takes the papers of a given year as the zero-citation paper set and the papers within a set time period as the training set; analyzes the academic network model by the random walk method to obtain the feature-value scores corresponding to each paper's first author, conference or journal, institution and publication time, as well as the score of the paper itself.