CN106250438A - Zero-citation article recommendation method and system based on a random walk model - Google Patents


Info

Publication number
CN106250438A
Authority
CN
China
Legal status
Granted
Application number
CN201610595617.1A
Other languages
Chinese (zh)
Other versions
CN106250438B (en)
Inventor
吴峥
邓丰雨
宋振宇
王乐群
李世韬
吴昊
杨蕴意
杨雨城
何伟堃
廖鸣
廖一鸣
齐雨
赵璟浩
傅洛伊
王新兵
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN201610595617.1A
Publication of CN106250438A
Application granted
Publication of CN106250438B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention provides a zero-citation article recommendation method and system based on a random walk model, comprising: Step 1: build an academic network model and, by a random walk method, obtain the feature values corresponding to the first author, the conference or journal, the institution and the publication time of every paper; Step 2: build a ranking model, and construct a training set from the paper data processed in Step 1; Step 3: rank the training set with weak classifiers; Step 4: determine whether the ranking produced by the weak classifiers matches the true ranking of the training set, and thereby obtain an optimal ranking model; Step 5: recommend to the user the required zero-citation papers by means of the ranking model. The invention adopts a new approach to paper ranking so that newly published papers can be recommended more effectively, helping users obtain the most relevant new papers.

Description

Zero-citation article recommendation method and system based on a random walk model
Technical field
The present invention relates to the field of recommendation technology and, in particular, to a zero-citation article recommendation method and system based on a random walk model.
Background art
Scientific research is a strategic support for improving social productivity and overall national strength, and countries around the world attach great importance to investment in research. China places science and technology R&D at the core of its overall national development, and government expenditure on scientific research has increased steadily. In 2012, China's R&D expenditure (including industry and academia) exceeded one trillion yuan, reaching 1,029.84 billion yuan, the level of a moderately developed country.
One of the most direct outputs of scientific research is the scientific paper. According to statistics, from 2004 to 2014 Chinese researchers published a total of 1.3698 million scientific papers, ranking second in the world; these papers were cited 10.3701 million times in total, ranking fourth in the world. Research practice shows that scientific papers are a very important information resource for researchers who carry out research or pursue further study. However, faced with the vast ocean of literature in the information age, retrieving the academic resources one needs quickly and accurately is an important and challenging task for researchers. Effective ranking of the scientific literature helps researchers find high-quality papers and discover promising research directions. Paper ranking also plays an important role in academic reward systems.
Traditional methods often use the citation count as the metric. However, this criterion is too uniform: it treats every citation as equally important and ignores the difference between high-quality citations and ordinary ones. Many researchers regard the paper citation network as similar to a system of hyperlinked web pages and use the PageRank and HITS algorithms to give each paper a score for ranking. In practice, however, a dynamic citation network differs from an ordinary computer network: a newly published paper can cite only papers published before it, while earlier papers cannot cite later ones. Because of this inherent characteristic of citation networks, papers published earlier have an advantage in accumulating citations, which greatly affects the accuracy of conventional algorithms.
Many efforts have been made to solve this problem, but they focus mainly on text analysis or on examining the whole citation network. Newly published papers have not yet been cited by other papers, so existing algorithms give them low scores. However, the directions represented by new papers are generally more cutting-edge than those of earlier papers and therefore deserve researchers' attention. A new ranking algorithm is thus of considerable significance for helping researchers obtain the resources they need, follow developments in their disciplines in time and improve their research capability, thereby strengthening the nation's research capacity. This is especially important in the big-data era: it not only makes frontier directions easier to find but also brings a substantial gain in efficiency. Since 2000, the number of papers on paper ranking and recommender systems has risen year by year; by incomplete statistics, more than 30 related papers were published in 2013 alone. Nevertheless, research on ranking newly published papers is still at an early stage. Tens of thousands of new papers are published every year, and without an accurate ranking algorithm researchers cannot quickly find the information they need in such massive data. This motivates a new algorithm that ranks newly published papers effectively and thereby predicts which papers are more likely to become hot topics and frontier directions over the next five to ten years. On this basis we invented the ZeroRank algorithm. Using the author, the conference and the institution as evaluation indicators, and having analyzed and tested data from more than the past ten years, it achieves effective prediction of hot papers and largely compensates for the shortcomings of existing algorithms in evaluating newly published papers.
Summary of the invention
In view of the defects in the prior art, an object of the present invention is to provide a zero-citation article recommendation method and system based on a random walk model.
The zero-citation article recommendation method based on a random walk model provided according to the present invention comprises the following steps:
Step 1: build an academic network model, and obtain, by a random walk method, the feature values corresponding to the first author, the conference or journal, the institution and the publication time of every paper;
Step 2: build a ranking model, and construct a training set from the paper data processed in Step 1;
Step 3: rank the training set with a weak classifier, where a weak classifier is a classifier that ranks using only a single feature value;
Step 4: determine whether the ranking produced by the weak classifier matches the true ranking of the training set:
if they do not match, adjust the weight of the feature value corresponding to this weak classifier in the ranking model according to the difference between the weak classifier's ranking and the true ranking, adjust the weight of each fragment in the training set, and return to Step 3;
if they match, determine whether the weak classifiers corresponding to all feature values have performed ranking; if not, change the type of feature value considered by the weak classifier and return to Step 3; if so, the optimal ranking model is obtained;
Step 5: recommend the zero-citation papers required by the user by means of the optimal ranking model.
Preferably, Step 1 comprises:
Step 1.1: using the academic graph data resource provided by Microsoft, obtain all paper resources published from 1800 to the present;
Step 1.2: extract the key information of each paper and build an academic network model containing four classes of node sets and four classes of edge sets, where the key information of a paper comprises the paper title, the authors, the journal or conference that includes the paper, the institution from which the paper is published, and the publication time;
Step 1.3: select a research field, take the papers of a given year as the zero-citation paper set and the papers within a set time period as the training set, analyze the academic network model by the random walk method, and obtain the feature-value scores corresponding to the first author, the conference or journal, the institution and the publication time of each paper, together with the score of the paper itself.
Preferably, Step 1.2 comprises:
Step 1.2.1: build the academic network model, denoting the academic network by G:
$G = (P \cup A \cup V \cup F,\ E_{PP} \cup E_{PA} \cup E_{PV} \cup E_{PF})$
An edge $(p_v, p_u) \in E_{PP}$ indicates that paper v cites paper u;
An edge $(p_v, a_u) \in E_{PA}$ indicates that the first author of paper v is u;
An edge $(p_v, v_u) \in E_{PV}$ indicates that paper v is published in conference or journal u;
An edge $(p_v, f_u) \in E_{PF}$ indicates that paper v comes from institution u;
where P, A, V and F denote the four node sets formed by papers, authors, conferences and journals, and institutions, respectively; $p_v$ denotes paper v, $p_u$ paper u, $a_u$ author u, $v_u$ conference or journal u, and $f_u$ institution u; $E_{PP}$, $E_{PA}$, $E_{PV}$ and $E_{PF}$ denote the edges between papers, between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively;
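To make the four node sets and four edge sets concrete, the following is a minimal sketch that stores G in plain Python dictionaries. The class name `AcademicNetwork`, its fields and methods are hypothetical illustrations, not taken from the patent; because only the first author is modeled, each paper is assumed to link to exactly one author, one venue and one institution.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AcademicNetwork:
    """Heterogeneous graph G = (P ∪ A ∪ V ∪ F, E_PP ∪ E_PA ∪ E_PV ∪ E_PF) (illustrative)."""

    year: dict = field(default_factory=dict)         # paper -> publication year t(p)
    cites: dict = field(default_factory=lambda: defaultdict(set))     # E_PP: v -> {u}, v cites u
    cited_by: dict = field(default_factory=lambda: defaultdict(set))  # reverse of E_PP: u -> {v}
    author: dict = field(default_factory=dict)       # E_PA: paper -> first author
    venue: dict = field(default_factory=dict)        # E_PV: paper -> conference or journal
    institution: dict = field(default_factory=dict)  # E_PF: paper -> institution

    def add_paper(self, paper, year, author, venue, institution):
        self.year[paper] = year
        self.author[paper] = author
        self.venue[paper] = venue
        self.institution[paper] = institution

    def add_citation(self, citing, cited):
        # Edge (p_v, p_u) in E_PP: paper `citing` (v) cites paper `cited` (u).
        self.cites[citing].add(cited)
        self.cited_by[cited].add(citing)

    def out_degree(self, paper):
        # |out(p)|: number of papers cited by `paper`.
        return len(self.cites.get(paper, ()))
```

Keeping the reverse index `cited_by` alongside `cites` makes it cheap to look up the citing papers $in(P_i)$ that the paper-score formula sums over.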
Step 1.2.2: establish the time relationships of the papers in the academic network model:
In the academic network G, paper publication times are ordered as $t_0 < t_1 < \dots < t_{crt}$, where $t_0$ denotes the year in which the earliest paper in the network was published (1800) and $t_{crt}$ denotes the current year;
Step 1.2.3: build the zero-citation paper data set Z:
$Z = \{p_z \in P \mid t(p_z) = t_{crt}\}$
where $p_z$ denotes a paper in the set Z and $t(p_z)$ denotes its publication time.
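The time ordering and the set Z reduce to simple filters over publication years. A minimal sketch, assuming years are stored in a hypothetical `pub_year` dictionary mapping paper identifiers to years:

```python
def time_nodes(pub_year: dict[str, int]) -> list[int]:
    """Distinct publication years in increasing order: t_0 < t_1 < ... < t_crt."""
    return sorted(set(pub_year.values()))


def zero_citation_set(pub_year: dict[str, int], t_crt: int) -> set[str]:
    """Z = {p_z in P | t(p_z) = t_crt}: the papers published in the current year."""
    return {paper for paper, year in pub_year.items() if year == t_crt}
```

For example, with years `{"a": 1999, "b": 2011, "c": 2011, "d": 2005}` and $t_{crt} = 2011$, Z contains papers `b` and `c`.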
Preferably, Step 1.3 comprises:
Step 1.3.1: set the parameters $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \rho, t_{crt}$, where $\omega_1$ is the weight of the contribution of other papers to a paper's score, $\omega_2$ the weight of the author's contribution, $\omega_3$ the weight of the contribution of the conference or journal that includes the paper, $\omega_4$ the weight of the contribution of the institution that publishes the paper, $\omega_5$ the weight of the contribution of the publication time, $\rho$ the importance parameter of the publication time, and $t_{crt}$ the current year;
Step 1.3.2: initialize the paper scores using the following formula:
where $p_i$ denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, ranging from 0 to N;
Step 1.3.3: compute the scores of authors, conferences or journals, and institutions from the paper scores, using the following formulas:
where $a_i$ denotes the score of author i, $v_i$ the score of conference or journal i, $f_i$ the score of institution i, $A_i$ denotes author i, $p_j$ denotes paper j, and AVG(·) is the averaging function;
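The formulas of Step 1.3.3 are not preserved in this copy, but the text states that they average paper scores. The sketch below assumes that each author, venue or institution score is the plain mean of the scores of the papers linked to it; the function name `entity_scores` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def entity_scores(paper_score: dict, paper_to_entity: dict) -> dict:
    """Score of each author / venue / institution as the AVG of its papers' scores (assumed form).

    paper_score:     paper -> current paper score p_j
    paper_to_entity: paper -> the author (or venue, or institution) linked to it
    """
    papers_of = defaultdict(list)
    for paper, entity in paper_to_entity.items():
        papers_of[entity].append(paper_score[paper])
    return {entity: mean(scores) for entity, scores in papers_of.items()}
```

The same function serves all three entity types; only the `paper_to_entity` mapping changes (first author, venue or institution).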
Step 1.3.4: compute the paper scores using the following formula:
$$p_i' = \omega_1 \sum_{P_j \in in(P_i)} \frac{p_j}{|out(P_j)|} + \omega_2 \frac{1}{Z_A} \operatorname{AVG}_{A_j \in neigh(P_i)}(a_j) + \omega_3 \frac{1}{Z_V} \operatorname{AVG}_{V_j \in neigh(P_i)}(v_j) + \omega_4 \frac{1}{Z_F} \operatorname{AVG}_{F_j \in neigh(P_i)}(f_j) + \omega_5 \frac{1}{Z_T} \exp\left(-\rho (t_i - t_{crt})\right)$$
where $p_i'$ denotes the updated score of paper i, $p_j$ denotes the score of a paper j in $in(P_i)$, i.e. a paper that cites paper i, $|out(P_j)|$ is the number of papers cited by paper j, $a_j$ denotes the score of the author of paper i, $v_j$ the score of the journal or conference that includes paper i, $f_j$ the score of the institution that publishes paper i, $t_i$ the publication time of paper i, $Z_A, Z_V, Z_F, Z_T$ are normalization variables, and $\rho$ is the time decay factor.
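A single application of the Step 1.3.4 update can be written as a small function. This is a sketch under our own assumptions: the name `updated_score` and its argument layout are hypothetical, and the caller supplies the already-averaged author, venue and institution scores together with the normalizers $Z_A, Z_V, Z_F, Z_T$, whose values the patent does not give.

```python
import math
from typing import Mapping, Sequence


def updated_score(
    citing_scores: Sequence[float],   # p_j for every P_j in in(P_i)
    citing_outdeg: Sequence[int],     # |out(P_j)| for the same papers, in the same order
    author_avg: float,                # AVG of a_j over the authors adjacent to P_i
    venue_avg: float,                 # AVG of v_j over the venues adjacent to P_i
    inst_avg: float,                  # AVG of f_j over the institutions adjacent to P_i
    t_i: int,
    t_crt: int,
    w: Sequence[float],               # (w1, w2, w3, w4, w5)
    rho: float,                       # time decay factor
    Z: Mapping[str, float],           # normalizers under keys "A", "V", "F", "T"
) -> float:
    # PageRank-style term: each citing paper passes on p_j / |out(P_j)|.
    citation_term = sum(p / d for p, d in zip(citing_scores, citing_outdeg))
    return (w[0] * citation_term
            + w[1] * author_avg / Z["A"]
            + w[2] * venue_avg / Z["V"]
            + w[3] * inst_avg / Z["F"]
            + w[4] * math.exp(-rho * (t_i - t_crt)) / Z["T"])
```

With $\rho = -0.124$, the last term equals 1 for a paper published in the current year and shrinks as the paper gets older.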
Preferably, Step 2 comprises:
Step 2.1: for each time node t from $t_0$ to $t_{crt}-1$, build the t-th fragment from the citation relations that had occurred by time t, so that $t_{crt}-t_0$ fragments in total form the zero-citation paper collection;
Step 2.2: from the zero-citation paper collection built in Step 2.1, obtain a training set containing the feature values of the $t_{crt}-t_0$ fragments.
Preferably, in Step 1 the random walk method is executed in parallel, comprising the following steps:
Step A1: based on the feature values of the adjacent papers, update the feature values of the first author, the conference or journal, and the institution of the subsequent papers, respectively;
Step A2: determine whether the feature values of all paper nodes in the paper citation network formed by the first author, conference or journal, and institution information have been updated and whether all updated feature values have converged; if not, take the subsequent papers as the adjacent papers and return to Step A1; if so, proceed to Step 2.
The zero-citation article recommendation system based on a random walk model provided according to the present invention comprises:
Academic network model building module: builds the academic network model and obtains, by the random walk method, the feature values corresponding to the first author, the conference or journal, the institution and the publication time of every paper;
Training set building module: builds the ranking model and constructs the training set from the paper data processed by the academic network model building module;
Weak classifier ranking module: ranks the training set with weak classifiers, where a weak classifier is a classifier that ranks using only a single feature value;
Ranking model building module: determines whether the ranking produced by the weak classifiers matches the true ranking of the training set, and obtains the optimal ranking model.
Preferably, the academic network model building module comprises:
Retrieval module: obtains, from the academic graph data resource provided by Microsoft, all paper resources published from 1800 to the present;
Model construction module: extracts the key information of each paper and builds an academic network model containing four classes of node sets and four classes of edge sets, where the key information of a paper comprises the paper title, the authors, the journal or conference that includes the paper, the institution from which the paper is published, and the publication time;
Model analysis module: selects a research field, takes the papers of a given year as the zero-citation paper set and the papers within a set time period as the training set, analyzes the academic network model by the random walk method, and obtains the feature-value scores corresponding to the first author, the conference or journal, the institution and the publication time of each paper, together with the score of the paper itself.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention iteratively refines the basic parameters of the algorithm on existing data, trains and evolves automatically according to the performance of the algorithm model, and parallelizes the algorithm for big-data scenarios. By adopting a new approach to paper ranking, it recommends newly published papers more effectively and meets the search needs of researchers.
2. The present invention effectively solves the problem of ranking zero-citation articles. By combining a random walk model with an adaptive algorithm, it analyzes information that conventional ranking algorithms do not consider; it is particularly suited to analyzing the future influence and importance of newly published papers and to producing their priority ranking.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a flow chart of the zero-citation article recommendation method based on a random walk model provided by the present invention;
Fig. 2 is a schematic diagram of the data used to derive the time decay factor;
Fig. 3 is a schematic diagram of the academic network model;
Fig. 4 is a schematic diagram of training set selection;
Fig. 5 is a diagram of the running time of the parallel algorithm.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the present invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept; these all fall within the protection scope of the present invention.
The zero-citation article recommendation method based on a random walk model provided according to the present invention comprises the following steps:
Step S1: build the academic network model and use the random walk method to compute, for every paper, scores for three feature values (the first author who wrote the paper, the conference or journal that includes it, and the institution that published it) as well as a score for the paper itself. The symbols used in the implementation steps are explained in Table 1.
Table 1. Definitions of symbols
Because paper resources on the Internet are highly scattered and the annual volume of updated data is very large, building the academic network model is divided into two steps, S1.1 and S1.2, covering data acquisition and integration; the model is then analyzed mainly by random walk, which is carried out in Step S1.3. The detailed steps of Step S1 are as follows:
Step S1.1: using the academic graph data resource provided by Microsoft, obtain all paper resources published from 1800 to the present;
Step S1.2: using an optimized text analysis tool, extract the key information of each paper and build an academic network model containing four classes of node sets and four classes of edge sets (the model is shown in Fig. 3).
Step A1: build the academic network model, denoting the academic network by G:
$G = (P \cup A \cup V \cup F,\ E_{PP} \cup E_{PA} \cup E_{PV} \cup E_{PF})$
An edge $(p_v, p_u) \in E_{PP}$ indicates that paper v cites paper u;
An edge $(p_v, a_u) \in E_{PA}$ indicates that the first author of paper v is u;
An edge $(p_v, v_u) \in E_{PV}$ indicates that paper v is published in conference or journal u;
An edge $(p_v, f_u) \in E_{PF}$ indicates that paper v comes from institution u.
where P, A, V and F denote the four node sets formed by papers, authors, conferences and journals, and institutions, respectively; $p_v$ denotes paper v, $p_u$ paper u, $a_u$ author u, $v_u$ conference or journal u, and $f_u$ institution u; $E_{PP}$, $E_{PA}$, $E_{PV}$ and $E_{PF}$ denote the edges between papers, between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively.
Step A2: establish the time relationships of the papers in the academic network model:
In the academic network G, paper publication times are ordered as $t_0 < t_1 < \dots < t_{crt}$, where $t_0$ denotes the year in which the earliest paper in the network was published (1800) and $t_{crt}$ denotes the current year.
Step A3: build the zero-citation paper data set Z:
$Z = \{p_z \in P \mid t(p_z) = t_{crt}\}$
where $p_z$ denotes a paper in the set Z, $t(p_z)$ denotes its publication time, and $t_{crt}$ denotes the current year.
Step S1.3: within each field, take the papers of 2011 as the zero-citation paper set and compute the feature-value scores and the paper scores. Because the scores of papers, authors, conferences and journals, and institutions are interrelated, we designed an optimized random walk method for feature value extraction.
The steps for computing the feature-value scores and the paper scores are as follows:
Step B1: set the parameters $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \rho, t_{crt}$, where $\omega_1$ is the weight of the contribution of other papers to a paper's score, $\omega_2$ the weight of the author's contribution, $\omega_3$ the weight of the contribution of the conference or journal that includes the paper, $\omega_4$ the weight of the contribution of the institution that publishes the paper, $\omega_5$ the weight of the contribution of the publication time, $\rho$ the importance parameter of the publication time, and $t_{crt}$ the current year.
Step B2: initialize the paper scores using the following formula:
where $p_i$ denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, ranging from 0 to N;
Step B3: compute the scores of authors, conferences or journals, and institutions from the paper scores, using the following formulas:
where $a_i$ denotes the score of author i, $v_i$ the score of conference or journal i, $f_i$ the score of institution i, $A_i$ denotes author i, $p_j$ denotes paper j, and AVG(·) is the averaging function;
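Step B4: compute the paper scores using the following formula:
$$p_i' = \omega_1 \sum_{P_j \in in(P_i)} \frac{p_j}{|out(P_j)|} + \omega_2 \frac{1}{Z_A} \operatorname{AVG}_{A_j \in neigh(P_i)}(a_j) + \omega_3 \frac{1}{Z_V} \operatorname{AVG}_{V_j \in neigh(P_i)}(v_j) + \omega_4 \frac{1}{Z_F} \operatorname{AVG}_{F_j \in neigh(P_i)}(f_j) + \omega_5 \frac{1}{Z_T} \exp\left(-\rho (t_i - t_{crt})\right)$$
where $p_i'$ denotes the updated score of paper i, $p_j$ denotes the score of a paper j in $in(P_i)$, i.e. a paper that cites paper i, $a_j$ denotes the score of the author of paper i, $v_j$ the score of the journal or conference that includes paper i, $f_j$ the score of the institution that publishes paper i, $t_i$ the publication time of paper i, $Z_A, Z_V, Z_F, Z_T$ are normalization variables, and $\rho$ is the time decay factor.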
Calculation of the decay factor ρ:
Papers in computer science were selected, 8,884,763 in total.
Using the time elapsed since each paper's publication and the average number of citations papers have received at that time, a citation count versus time curve was plotted, as shown in Fig. 2. Ignoring the first two points, fitting this curve with an exponential function gives the best result:
$c\,e^{-0.124t}$
Therefore, ρ = -0.124 is used as the time decay factor, so that the time term $\exp(-\rho(t_i - t_{crt})) = e^{-0.124(t_{crt} - t_i)}$ decreases with the age of paper i.
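The exponential fit can be reproduced by a straight-line least-squares fit on the logarithm of the mean citation counts. The patent does not state how the fit was performed, so the log-linear method and the function name `fit_decay_rate` are assumptions; the fitted exponent k plays the role of ρ (k ≈ -0.124 in the reported experiment).

```python
import numpy as np


def fit_decay_rate(years_since_pub, mean_citations, skip=2):
    """Fit mean_citations ≈ c * exp(k * t), ignoring the first `skip` points.

    Taking logs turns the model into log(y) = log(c) + k * t, a straight line,
    so an ordinary degree-1 polynomial fit recovers k and log(c).
    """
    t = np.asarray(years_since_pub, dtype=float)[skip:]
    y = np.asarray(mean_citations, dtype=float)[skip:]
    k, log_c = np.polyfit(t, np.log(y), 1)  # coefficients, highest degree first
    return float(k), float(np.exp(log_c))
```

A log-linear fit weights relative errors evenly across the curve; a nonlinear fit on the raw counts would instead be dominated by the largest values.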
Handling of incomplete information
Because the author, conference or journal, and institution information in the data set is not always complete, dummy nodes are used to solve this problem: for example, if paper u has no author information, a virtual author is assumed who has published only paper u.
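The dummy-node rule can be sketched as follows. The function name `fill_missing` and the `virtual-<kind>-<paper>` naming scheme are hypothetical; the essential property is that every generated node is unique to one paper, so it has published only that paper.

```python
def fill_missing(paper_to_entity: dict, papers, kind: str) -> dict:
    """Give every paper lacking an author/venue/institution its own virtual node."""
    filled = dict(paper_to_entity)
    for p in papers:
        if filled.get(p) is None:
            # Hypothetical naming scheme; the virtual node is linked to paper p only.
            filled[p] = f"virtual-{kind}-{p}"
    return filled
```

Because each virtual node owns a single paper, its averaged score simply equals that paper's score, so missing metadata neither rewards nor penalizes the paper relative to its own citations.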
Implementation of the averaging function:
Following the idea of the PageRank algorithm, the paper scores are computed as follows.
Build the graphs $G_P = (P, E_{PP})$, $G_A = (P \cup A, E_{PA})$, $G_V = (P \cup V, E_{PV})$ and $G_F = (P \cup F, E_{PF})$, each containing the corresponding node set and edge set; $G_P$ is the paper graph, $G_A$ the author graph, $G_V$ the journal and conference graph, and $G_F$ the institution graph;
First compute the scores of authors, conferences and journals, and institutions; the initial paper score is
$a = A_A p$ {compute the author score vector a}
$v = A_V p$ {compute the conference or journal score vector v}
$f = A_F p$ {compute the institution score vector f}
$A_A$, $A_V$ and $A_F$ are normalized adjacency matrices recording the relations between authors and papers, between conferences or journals and papers, and between institutions and papers, respectively. The paper scores are then computed repeatedly:
$A_A^T$, $A_V^T$ and $A_F^T$, the transposes of $A_A$, $A_V$ and $A_F$, record the relations between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively. The iteration continues until convergence, i.e. the computation terminates when $|p^{k} - p^{k+1}| < 10^{-9}$.
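The matrix iteration can be sketched with NumPy. The exact repeated-update formula is lost from this copy, so the sketch combines the terms of the paper-score formula under several stated assumptions: the function names are hypothetical, the start vector is uniform, each averaged entity term is normalized to sum 1 (standing in for $Z_A, Z_V, Z_F$), and the new score vector is rescaled to sum 1 after each step. For the paper side, the transposed incidence matrix is row-normalized so that each paper averages its linked entities; with a single first author, venue and institution per paper this is identical to the plain transpose of the incidence matrix.

```python
import numpy as np


def row_normalize(m) -> np.ndarray:
    """Divide each row by its sum; all-zero rows stay zero."""
    m = np.asarray(m, dtype=float)
    s = m.sum(axis=1, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m), where=s > 0)


def zerorank_scores(C, B_A, B_V, B_F, years, t_crt, w, rho, tol=1e-9, max_iter=1000):
    """Iterate paper scores until |p^k - p^(k+1)| < tol (L1 norm).

    C[v, u] = 1 if paper v cites paper u (E_PP).
    B_X[p, x] = 1 if paper p is linked to entity x (E_PA, E_PV, E_PF).
    w = (w1, ..., w5); rho is the time decay factor.
    """
    n = C.shape[0]
    M = row_normalize(C).T                  # M[u, v] = 1/|out(v)| if v cites u
    to_entity = [row_normalize(B.T) for B in (B_A, B_V, B_F)]  # A_A, A_V, A_F
    to_paper = [row_normalize(B) for B in (B_A, B_V, B_F)]     # paper <- mean of its entities
    tau = np.exp(-rho * (np.asarray(years, dtype=float) - t_crt))
    tau /= tau.sum()                        # plays the role of Z_T
    p = np.full(n, 1.0 / n)                 # assumed uniform initialization
    for _ in range(max_iter):
        cite = M @ p                        # PageRank-style citation term
        side = [back @ (fwd @ p) for fwd, back in zip(to_entity, to_paper)]
        side = [x / x.sum() if x.sum() > 0 else x for x in side]  # Z_A, Z_V, Z_F
        p_new = w[0] * cite + sum(wk * x for wk, x in zip(w[1:4], side)) + w[4] * tau
        p_new /= p_new.sum()
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new
    return p
```

Sparse matrices (for example from SciPy) would be needed at the scale of millions of papers; dense arrays keep the sketch short.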
Step B5: set up the zero-citation paper set (as shown in Fig. 4): taking 2011 as the current year, hide all information from the years after it to obtain the zero-citation paper set.
Step B6: feature value extraction: papers from 1800 to 2010 are used as the training set, and the optimized random walk method is applied to extract their feature values.
Step S2: using a ranking algorithm, select data to construct the training set, select weak classifiers, revise the current ranking model according to each individual weak classifier, and repeat these operations until the optimal model is obtained;
To train the ranking model from the different feature values obtained in Step S1, a traditional approach would choose linear regression or the k-nearest-neighbor algorithm, but such methods are unsuitable for this problem: for two papers from different time periods, total citation counts are affected by time and historical factors, so ranking the two papers directly against each other is unreasonable. A ranking algorithm is therefore used that analyzes papers from different time periods separately. The specific steps are as follows:
Step S2.1: for each time node t from $t_0$ to $t_{crt}-1$, build the t-th fragment from the citation relations that had occurred by time t, so that $t_{crt}-t_0$ fragments in total form the "zero-citation paper collection". Because $t_0$ plays no critical role in the experiments, $t_0$ is set to $t_{crt}-10$;
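Fragment construction amounts to replaying the citation graph up to each year t. The sketch below is one reading of Step S2.1 under our own assumptions: the function name `build_fragments` and the fragment layout are hypothetical, and the later citation counts it returns are only one plausible source for the actual citation ranking $y_t$.

```python
from collections import Counter


def build_fragments(pub_year: dict[str, int], citations: list[tuple[str, str]],
                    t_crt: int, window: int = 10) -> dict:
    """Build one fragment per year t in [t_crt - window, t_crt - 1], i.e. t_0 = t_crt - window.

    citations: (v, u) pairs meaning paper v cites paper u.
    Each fragment is (visible_edges, zero_papers, later_citations):
      visible_edges   - citations made no later than year t (everything after t is hidden)
      zero_papers     - papers published in year t, which have no citations yet
      later_citations - citations those papers receive from papers published after t
    """
    fragments = {}
    for t in range(t_crt - window, t_crt):
        visible = [(v, u) for v, u in citations if pub_year[v] <= t]
        zero = sorted(p for p, year in pub_year.items() if year == t)
        later = Counter(u for v, u in citations if pub_year[v] > t and pub_year[u] == t)
        fragments[t] = (visible, zero, {p: later[p] for p in zero})
    return fragments
```

With the default `window=10` this yields the $t_{crt} - t_0 = 10$ fragments described above.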
Step S2.2: apply the feature extraction algorithm of Step S1 to the "zero-citation paper collection" built in Step S2.1 to obtain a training set S containing the feature values of the $t_{crt}-t_0$ fragments, in which the three feature components of each fragment represent the "author", "conference" and "institution" feature values of the t-th fragment, and $y_t$ denotes the actual citation ranking of the t-th fragment;
Step S2.3: for the training set S produced in Step S2.2, iterate with the AdaRank algorithm: each round adds a new weak classifier $k_n$, adjusts the weight $\alpha_n$ of the new classifier, and adds it to the current ranking model to obtain a new model $r_n$. The iteration ends when the classifier's performance no longer improves, yielding the optimal ranking model. Here r denotes the initial ranking model, composed of the weights of the three feature values "author", "conference" and "institution".
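The AdaRank loop of Step S2.3 can be sketched with each weak ranker being a single feature, as the patent defines it. AdaRank itself is usually run with IR measures such as MAP or NDCG; here the share of correctly ordered paper pairs is used as the performance measure, which is our own assumption, as are the function names and the fragment format.

```python
import math


def pairwise_accuracy(scores, relevance):
    """Share of differently-relevant pairs that `scores` orders correctly (a measure in [0, 1])."""
    correct = total = 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevance[i] > relevance[j]:
                total += 1
                correct += scores[i] > scores[j]
    return correct / total if total else 1.0


def adarank(fragments, n_features, max_rounds=20, eps=1e-12):
    """AdaRank-style boosting where each weak ranker is a single feature.

    fragments: list of (X, y); X holds one feature vector per paper, y its true citations.
    Returns one weight per feature (the combined linear ranking model).
    """
    m = len(fragments)
    dist = [1.0 / m] * m                 # P(i): weight of fragment i
    alpha = [0.0] * n_features
    # weak[k][i]: how well feature k alone ranks fragment i
    weak = [[pairwise_accuracy([row[k] for row in X], y) for X, y in fragments]
            for k in range(n_features)]
    best_alpha, best_perf = list(alpha), -1.0
    for _ in range(max_rounds):
        # Pick the weak ranker with the highest weighted performance.
        k = max(range(n_features), key=lambda f: sum(d * e for d, e in zip(dist, weak[f])))
        num = sum(d * (1 + e) for d, e in zip(dist, weak[k]))
        den = sum(d * (1 - e) for d, e in zip(dist, weak[k]))
        alpha[k] += 0.5 * math.log(num / max(den, eps))
        # Evaluate the combined model on every fragment and re-weight the fragments.
        perf = [pairwise_accuracy([sum(a * x for a, x in zip(alpha, row)) for row in X], y)
                for X, y in fragments]
        z = sum(math.exp(-e) for e in perf)
        dist = [math.exp(-e) / z for e in perf]
        mean_perf = sum(perf) / m
        if mean_perf <= best_perf:       # stop once performance no longer improves
            break
        best_alpha, best_perf = list(alpha), mean_perf
    return best_alpha
```

Fragments that the current model ranks poorly receive larger weights in the next round, which is how the per-fragment weight adjustment of Step 4 is realized here.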
Step S3: parallel random walk: building on the random walk of Step S1, the invention provides a parallel solution that saves running time and reduces space requirements;
Because the random walk of Step S1 has time complexity O(M) and space complexity O(M+N), where M is the number of edges in the academic network model and N is the total number of papers in the training set, running it on a single machine becomes impractical; a parallel random walk solution is therefore proposed.
Step S3.1: RankAVF scores the three factors in the academic network model that mainly influence a paper's score: the author, the conference and the institution. Following the feature extraction algorithm of the preceding step, each author, conference and institution node in the academic network model collects the feature values of its adjacent paper nodes and averages them to obtain its new feature value; the new value replaces the node's original feature value, updating the network, and is then passed to the adjacent paper nodes, completing one AVF iteration. The formulas are as follows:
{compute the author score a from the paper scores p}
{compute the conference score v from the paper scores p}
{compute the institution score f from the paper scores p}
where AVG denotes the mean function.
Step S3.2: RankP computes and updates the new feature value of each paper node from the paper node's feature value obtained in the previous iteration and the feature values of its adjacent author, conference and institution nodes, then passes the new value to the paper's subsequent paper nodes and to its adjacent author, conference and institution nodes. The formula is as follows:
where AVG denotes the mean function and exp denotes the exponential function.
Step S3.3: both steps above operate on individual nodes of the academic network model and are iterated in parallel; once the feature values computed for all paper nodes have converged, the algorithm stops iterating, yielding the scores of the newly published papers.
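The node-level parallelism of Steps S3.1 to S3.3 can be illustrated as a Jacobi-style iteration: the paper nodes are split into partitions, and each partition computes its RankP update from the previous iterate only, so partitions never wait on each other within a round. This is a sketch under stated assumptions: all names are hypothetical, only one entity type (for example the author) is modeled for brevity, RankAVF runs serially, and `ThreadPoolExecutor` merely demonstrates the partitioning. Because of CPython's global interpreter lock, threads will not speed up this pure-Python loop; a real deployment would use processes or a distributed graph engine.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from statistics import mean


def rank_avf(p, paper_entity):
    """RankAVF: every entity node takes the mean score of its adjacent papers."""
    groups = {}
    for paper, entity in paper_entity.items():
        groups.setdefault(entity, []).append(p[paper])
    return {entity: mean(scores) for entity, scores in groups.items()}


def rank_p(chunk, p, cited_by, out_deg, ent, paper_entity, tau, w):
    """RankP for one partition of paper nodes; reads only the previous iterate p."""
    return {i: w[0] * sum(p[j] / out_deg[j] for j in cited_by.get(i, ()))
               + w[1] * ent[paper_entity[i]] + w[2] * tau[i]
            for i in chunk}


def parallel_walk(papers, cited_by, out_deg, paper_entity, tau, w,
                  workers=2, tol=1e-9, max_iter=500):
    p = {i: 1.0 / len(papers) for i in papers}
    chunks = [papers[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            ent = rank_avf(p, paper_entity)                      # Step S3.1
            step = partial(rank_p, p=p, cited_by=cited_by, out_deg=out_deg, ent=ent,
                           paper_entity=paper_entity, tau=tau, w=w)
            new = {}
            for part in pool.map(step, chunks):                  # Step S3.2, one task per partition
                new.update(part)
            total = sum(new.values())
            new = {i: s / total for i, s in new.items()}
            if sum(abs(new[i] - p[i]) for i in papers) < tol:    # Step S3.3: convergence check
                return new
            p = new
    return p
```

Since every partition reads the same previous iterate, the result does not depend on how the papers are partitioned, apart from floating-point rounding in the normalizing sum.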
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the invention. Where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other arbitrarily.

Claims (8)

1. A zero-citation article recommendation method based on a random walk model, characterized by comprising the following steps:
Step 1: build an academic network model and, using a random walk method, obtain the feature values corresponding to each paper's first author, conference or journal, institution, and publication time;
Step 2: establish a ranking model, and construct a training set from the paper data processed in Step 1;
Step 3: rank the training set with a weak classifier, a weak classifier being a classifier that ranks using only a single feature value;
Step 4: determine whether the ranking produced by the weak classifier matches the true ranking of the training set:
if it does not match, adjust the weight of the feature value corresponding to this weak classifier in the ranking model according to the difference between the weak classifier's ranking and the true ranking, adjust the weight of each fragment in the training set, and return to Step 3;
if it matches, determine whether the weak classifiers corresponding to all feature values have been used for ranking; if not, change the feature type considered by the weak classifier and return to Step 3; if so, the optimal ranking model has been obtained;
Step 5: recommend the zero-citation papers the user needs by means of the optimal ranking model.
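Steps 3 and 4 of claim 1 describe a boosting loop over single-feature weak rankers. The sketch below is an AdaRank-style approximation, not the claimed procedure itself: it assumes Kendall's tau as the measure of agreement between a weak ranker and the true ranking, condenses the claim's repeat-until-match loop into one weighted pass per feature, and treats each time fragment as one weighted training unit. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(scores, truth):
    """Rank agreement in [-1, 1]; 1 means the scores order papers exactly like truth."""
    concordant = discordant = 0
    for i, j in combinations(range(len(truth)), 2):
        sign = (scores[i] - scores[j]) * (truth[i] - truth[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0


def train_feature_weights(fragments, num_features, eps=1e-12):
    """AdaRank-style pass: weak ranker f orders papers by feature f alone.

    fragments: list of (rows, truth) pairs, one per time fragment, where
    rows[k][f] is feature f of paper k and truth[k] its true ranking score.
    Returns one weight per feature for the combined ranking model.
    """
    frag_w = [1.0 / len(fragments)] * len(fragments)
    alphas = [0.0] * num_features
    for f in range(num_features):
        # How well the single-feature ranker matches the true ranking per fragment.
        perf = [kendall_tau([row[f] for row in rows], truth) for rows, truth in fragments]
        agree = sum(w * (1 + e) for w, e in zip(frag_w, perf))
        disagree = sum(w * (1 - e) for w, e in zip(frag_w, perf))
        # Feature weight grows with weighted agreement (negative if it ranks backwards).
        alphas[f] = 0.5 * math.log((agree + eps) / (disagree + eps))
        # Re-weight fragments: those this ranker got wrong matter more next round.
        frag_w = [w * math.exp(-alphas[f] * e) for w, e in zip(frag_w, perf)]
        total = sum(frag_w)
        frag_w = [w / total for w in frag_w]
    return alphas


def rank(rows, alphas):
    """Order paper indices by the weighted sum of their features, best first."""
    def score(row):
        return sum(a * x for a, x in zip(alphas, row))
    return sorted(range(len(rows)), key=lambda k: score(rows[k]), reverse=True)


fragments = [
    ([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]], [30, 10, 0]),
    ([[0.8, 0.3], [0.2, 0.7]], [5, 1]),
]
alphas = train_feature_weights(fragments, num_features=2)
print(alphas, rank(fragments[0][0], alphas))
```

In this toy data feature 0 agrees perfectly with the true order and feature 1 is exactly reversed, so the first receives a large positive weight and the second a large negative one.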
2. The zero-citation article recommendation method based on a random walk model according to claim 1, characterized in that Step 1 comprises:
Step 1.1: obtain all paper resources published from 1800 to the present from the academic graph data provided by Microsoft;
Step 1.2: extract the key information of each paper and build an academic network model containing four classes of node sets and four classes of edge sets, the key paper information including: the paper title, the authors, the journal or conference in which the paper appears, the publishing institution, and the publication time;
Step 1.3: select a research field, take the papers of a given year as the zero-citation paper set and the papers within a set time period as the training set, and analyze the academic network model with the random walk method to obtain the feature-value scores corresponding to each paper's first author, conference or journal, institution, and publication time, together with the score of the paper itself.
3. The zero-citation article recommendation method based on a random walk model according to claim 2, characterized in that Step 1.2 comprises:
Step 1.2.1: build the academic network model, denoting the academic network by G:
$$G = (P \cup A \cup V \cup F,\; E_{PP} \cup E_{PA} \cup E_{PV} \cup E_{PF})$$
Edge $(p_v, p_u) \in E_{PP}$ means that paper v cites paper u;
Edge $(p_v, a_u) \in E_{PA}$ means that the first author of paper v is u;
Edge $(p_v, v_u) \in E_{PV}$ means that paper v is published in conference or journal u;
Edge $(p_v, f_u) \in E_{PF}$ means that paper v comes from institution u;
where P, A, V, F denote the four node sets formed by papers, authors, conferences and journals, and institutions, respectively; $p_v$ denotes paper v, $p_u$ paper u, $a_u$ author u, $v_u$ conference or journal u, and $f_u$ institution u; $E_{PP}$, $E_{PA}$, $E_{PV}$, $E_{PF}$ denote the edges between papers, between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively;
Step 1.2.2: establish the time ordering of the papers in the academic network model:
in the academic network G, paper publication times are ordered $t_0 < t_1 < \dots < t_{crt}$, where $t_0$ is the publication year of the earliest paper in the network, namely 1800, and $t_{crt}$ is the current year;
Step 1.2.3: build the zero-citation paper dataset Z:
$$Z = \{\, p_z \in P \mid t(p_z) = t_{crt} \,\}$$
where $p_z$ denotes a paper in the set Z and $t(p_z)$ its publication time.
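A minimal in-memory representation of G and of the zero-citation set Z might look like the following. It assumes one first author, one venue and one institution per paper, stores $E_{PA}$, $E_{PV}$ and $E_{PF}$ as dictionaries keyed by paper and $E_{PP}$ as a set of (citing, cited) pairs; the class, method and identifier names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AcademicNetwork:
    """Heterogeneous graph G = (P ∪ A ∪ V ∪ F, E_PP ∪ E_PA ∪ E_PV ∪ E_PF)."""
    year: dict = field(default_factory=dict)          # paper -> publication year t(p)
    cites: set = field(default_factory=set)           # E_PP: (p_v, p_u) means v cites u
    first_author: dict = field(default_factory=dict)  # E_PA: paper -> first author
    venue: dict = field(default_factory=dict)         # E_PV: paper -> conference or journal
    institution: dict = field(default_factory=dict)   # E_PF: paper -> institution

    def add_paper(self, pid, year, author, venue, institution, references=()):
        self.year[pid] = year
        self.first_author[pid] = author
        self.venue[pid] = venue
        self.institution[pid] = institution
        self.cites.update((pid, ref) for ref in references)

    def zero_citation_set(self, t_crt):
        """Z = {p_z in P | t(p_z) = t_crt}: the papers published in the current year."""
        return {p for p, t in self.year.items() if t == t_crt}


g = AcademicNetwork()
g.add_paper("p1", 2014, "a1", "v1", "f1")
g.add_paper("p2", 2015, "a2", "v1", "f2", references=["p1"])
g.add_paper("p3", 2016, "a1", "v2", "f1", references=["p1", "p2"])
g.add_paper("p4", 2016, "a3", "v2", "f2", references=["p2"])
print(sorted(g.zero_citation_set(2016)))
```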
4. The zero-citation article recommendation method based on a random walk model according to claim 2, characterized in that Step 1.3 comprises:
Step 1.3.1: set the parameters $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \rho, t_{crt}$, where $\omega_1$ is the weight of the contribution of other papers to the paper score, $\omega_2$ the weight of the author's contribution to the paper score, $\omega_3$ the weight of the contribution of the conference or journal that includes the paper, $\omega_4$ the weight of the contribution of the institution publishing the paper, and $\omega_5$ the weight of the contribution of the publication year; $\rho$ is the importance parameter of the publication time, and $t_{crt}$ is the current year;
Step 1.3.2: initialize the paper scores with the following formula:
where $p_i$ denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, ranging from 0 to N;
Step 1.3.3: calculate the scores of the authors, conferences or journals, and institutions from the paper scores, respectively, with the following formulas:
where $a_i$ denotes the score of author i, $v_i$ the score of conference or journal i, $f_i$ the score of institution i, $A_i$ denotes author i, $p_j$ denotes paper j, and AVG(·) is the mean function;
Step 1.3.4: calculate the paper scores with the following formula:
where $p'_i$ denotes the new score of any paper i, $p_j$ denotes a paper j cited by paper i, $a_j$ the author score of paper i, $v_j$ the score of the journal or conference including paper i, $f_j$ the score of the institution publishing paper i, $t_i$ the publication time of paper i, $Z_A, Z_V, Z_F, Z_T$ are normalization variables, and $\rho$ is the time decay factor.
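The extracted text omits the update formula for $p'_i$. The Python sketch below shows one plausible way to combine the listed quantities and should not be read as the claimed formula. Its assumptions: a PageRank-style citation term in which each citing paper splits its score evenly over the papers it cites (the direction of this term is itself an assumption, since the translation is ambiguous); $Z_A$, $Z_V$, $Z_F$ taken as the sums of the respective scores over all papers; a time term $\exp(-\rho\,(t_{crt}-t_i))$ normalized by $Z_T$; and the leftover weight $1-\sum_k \omega_k$ spread uniformly over the papers. The function name and toy data are hypothetical.

```python
import math


def update_paper_scores(scores, cited_by, author_score, venue_score, inst_score,
                        year, t_crt, weights, rho):
    """One assumed RankP-style update returning p'_i for every paper i.

    cited_by[i]  -- papers j that cite paper i (assumed citation direction)
    author_score[i], venue_score[i], inst_score[i]
                 -- AVG-based scores of paper i's first author, venue, institution
    weights      -- (w1, ..., w5), assumed to sum to at most 1
    """
    w1, w2, w3, w4, w5 = weights
    papers = list(scores)
    n = len(papers)
    # Out-degree of each citing paper, so it splits its score over what it cites.
    out_degree = {p: 0 for p in papers}
    for i in papers:
        for j in cited_by.get(i, ()):
            out_degree[j] += 1
    # Assumed normalizers Z_A, Z_V, Z_F, Z_T: each term sums to 1 over all papers.
    z_a = sum(author_score[p] for p in papers)
    z_v = sum(venue_score[p] for p in papers)
    z_f = sum(inst_score[p] for p in papers)
    decay = {p: math.exp(-rho * (t_crt - year[p])) for p in papers}  # newer -> larger
    z_t = sum(decay.values())
    teleport = (1 - sum(weights)) / n  # leftover weight shared uniformly
    new_scores = {}
    for i in papers:
        citation = sum(scores[j] / out_degree[j] for j in cited_by.get(i, ()))
        new_scores[i] = (w1 * citation
                         + w2 * author_score[i] / z_a
                         + w3 * venue_score[i] / z_v
                         + w4 * inst_score[i] / z_f
                         + w5 * decay[i] / z_t
                         + teleport)
    return new_scores


scores = {"p1": 1 / 3, "p2": 1 / 3, "p3": 1 / 3}
cited_by = {"p1": ["p2", "p3"], "p2": ["p3"]}
same = {"p1": 0.5, "p2": 0.5, "p3": 0.5}
year = {"p1": 2014, "p2": 2015, "p3": 2016}
new = update_paper_scores(scores, cited_by, same, same, same, year,
                          t_crt=2016, weights=(0.5, 0.1, 0.1, 0.1, 0.1), rho=0.5)
print({p: round(s, 4) for p, s in new.items()})
```

With identical author, venue and institution scores, the ordering here is driven by the citation term (p1 is cited twice) and, for p3, only by the time term, which favours the newest paper.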
5. The zero-citation article recommendation method based on a random walk model according to claim 1 or 4, characterized in that Step 2 comprises:
Step 2.1: for each time node t from $t_0$ to $t_{crt}-1$, build the citation relationships that have occurred by time t into the t-th fragment, so that a total of $t_{crt}-t_0$ fragments form the zero-citation paper set;
Step 2.2: from the zero-citation paper set constructed in Step 2.1, obtain a training set containing the feature values of the $t_{crt}-t_0$ fragments.
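Steps 2.1 and 2.2 slice the citation history into one fragment per year. The sketch below builds those fragments from a paper-to-year map and a set of (citing, cited) pairs. It assumes that the papers "zero-citation at time t" are those published in year t (mirroring the definition of Z) and that their true ranking target is the number of citations they receive after t; both are assumptions, as are the function and field names.

```python
def build_fragments(year, cites, t0, t_crt):
    """One fragment per time node t = t0, ..., t_crt - 1 (t_crt - t0 in total).

    year:  paper -> publication year
    cites: set of (citing, cited) pairs
    """
    fragments = []
    for t in range(t0, t_crt):
        # Citation relationships that had already occurred by time t.
        edges = {(v, u) for v, u in cites if year[v] <= t}
        # Papers that were zero-citation at time t: published in year t itself.
        zero_cited = sorted(p for p, y in year.items() if y == t)
        # Assumed ranking target: citations each of them received after t.
        target = {p: sum(1 for v, u in cites if u == p and year[v] > t) for p in zero_cited}
        fragments.append({"t": t, "edges": edges, "zero_cited": zero_cited, "target": target})
    return fragments


year = {"p1": 2014, "p2": 2015, "p3": 2016, "p4": 2016}
cites = {("p2", "p1"), ("p3", "p1"), ("p3", "p2"), ("p4", "p2")}
frags = build_fragments(year, cites, t0=2014, t_crt=2016)
print([(f["t"], f["zero_cited"], f["target"]) for f in frags])
```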
6. The zero-citation article recommendation method based on a random walk model according to claim 1, characterized in that the random walk method in Step 1 is executed in parallel, comprising the following steps:
Step A1: based on the feature values of the adjacent papers, update the feature values of the first authors, conferences or journals, and institutions of their successor papers, respectively;
Step A2: determine whether the feature values of all paper nodes in the paper citation network built from the first-author, conference or journal, and institution information have been updated and whether all updated feature values have converged; if not, take the successor papers as the adjacent papers and return to Step A1; if so, proceed to Step 2.
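Steps A1 and A2 describe a sweep that starts from a set of papers and repeatedly moves on to their successors until every paper node has been visited. The sketch below shows only that traversal; the convergence half of Step A2 would wrap it in an outer loop that repeats the sweep until the feature values stop changing. The callback `update_node` and the other names are hypothetical, and `successors` is assumed to be precomputed from the citation edges.

```python
def propagate_by_frontier(start, successors, update_node):
    """Sweep the citation network frontier by frontier (Steps A1/A2, simplified).

    start:       initial adjacent papers
    successors:  paper -> papers that follow it in the citation network
    update_node: callback that updates the first-author, venue and institution
                 feature values attached to one paper (hypothetical)
    Returns the frontiers in visiting order. If update_node touches only the
    paper it is given, updates within one frontier could run in parallel.
    """
    visited = set()
    frontier = set(start)
    layers = []
    while frontier:
        for paper in sorted(frontier):  # sorted only to make the order deterministic
            update_node(paper)
        visited |= frontier
        layers.append(sorted(frontier))
        # Step A2: successor papers become the new adjacent papers.
        frontier = {q for p in frontier for q in successors.get(p, ()) if q not in visited}
    return layers


successors = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}
updated = []
layers = propagate_by_frontier({"p1"}, successors, updated.append)
print(layers, updated)
```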
7. A zero-citation article recommendation system based on a random walk model, characterized by comprising:
Academic network model building module: configured to build the academic network model and to obtain, by the random walk method, the feature values corresponding to each paper's first author, conference or journal, institution, and publication time;
Training set building module: configured to establish the ranking model and to construct the training set from the paper data processed by the academic network model building module;
Weak classifier ranking module: configured to rank the training set with a weak classifier, a weak classifier being a classifier that ranks using only a single feature value;
Ranking model building module: configured to determine whether the ranking produced by the weak classifier matches the true ranking of the training set, so as to obtain the optimal ranking model.
8. The zero-citation article recommendation system based on a random walk model according to claim 7, characterized in that the academic network model building module comprises:
Retrieval module: configured to obtain all paper resources published from 1800 to the present from the academic graph data provided by Microsoft;
Model building module: configured to extract the key information of each paper and build an academic network model containing four classes of node sets and four classes of edge sets, the key paper information including: the paper title, the authors, the journal or conference in which the paper appears, the publishing institution, and the publication time;
Model analysis module: configured to select a research field, take the papers of a given year as the zero-citation paper set and the papers within a set time period as the training set, and analyze the academic network model with the random walk method to obtain the feature-value scores corresponding to each paper's first author, conference or journal, institution, and publication time, together with the score of the paper itself.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610595617.1A CN106250438B (en) 2016-07-26 2016-07-26 Zero-citation article recommendation method and system based on random walk model

Publications (2)

Publication Number Publication Date
CN106250438A true CN106250438A (en) 2016-12-21
CN106250438B CN106250438B (en) 2020-07-14

Family

ID=57603682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610595617.1A Active CN106250438B (en) 2016-07-26 2016-07-26 Zero-citation article recommendation method and system based on random walk model

Country Status (1)

Country Link
CN (1) CN106250438B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038211A (en) * 2017-02-28 2017-08-11 大连理工大学 Paper impact factor assessment method based on quantum migration
CN108132961A (en) * 2017-11-06 2018-06-08 浙江工业大学 Reference recommendation method based on citation prediction
CN108228728A (en) * 2017-12-11 2018-06-29 北京航空航天大学 Parameterized paper network node representation learning method
CN108614867A (en) * 2018-04-12 2018-10-02 科技部科技评估中心 Academic paper-based technology frontier index calculation method and system
CN108764943A (en) * 2018-05-30 2018-11-06 公安部第三研究所 Suspicious user monitoring and analyzing method based on fund transaction network
CN109299379A (en) * 2018-10-30 2019-02-01 东软集团股份有限公司 Article recommendation method and device, storage medium and electronic equipment
CN109345416A (en) * 2018-09-12 2019-02-15 连尚(新昌)网络科技有限公司 Method and apparatus for recording citation relationships between works
CN109726297A (en) * 2018-12-28 2019-05-07 沈阳航空航天大学 Bipartite network node prediction algorithm based on mutual exclusion strategy
CN110209840A (en) * 2019-06-06 2019-09-06 北京百奥知信息科技有限公司 Paper impact factor assessment method based on multidimensional features
CN110254438A (en) * 2018-03-12 2019-09-20 松下知识产权经营株式会社 Information processing device and program recording medium
CN111198897A (en) * 2018-11-19 2020-05-26 中国农业大学 Scientific research hotspot topic analysis method and device and electronic equipment
CN111723578A (en) * 2020-06-09 2020-09-29 平安科技(深圳)有限公司 Hot spot prediction method and device based on random walk model and computer equipment
CN113392319A (en) * 2021-05-13 2021-09-14 宁波大学 Academic paper recommendation method based on network representation and auxiliary information embedding

Citations (7)

Publication number Priority date Publication date Assignee Title
CN102298579A (en) * 2010-06-22 2011-12-28 北京大学 Scientific and technical literature-oriented model and method for sequencing papers, authors and periodicals
CN102521337A (en) * 2011-12-08 2012-06-27 华中科技大学 Academic community system based on massive knowledge network
CN103440329A (en) * 2013-09-04 2013-12-11 北京邮电大学 Authoritative author and high-quality paper recommending system and recommending method
US20150111190A1 (en) * 2013-10-22 2015-04-23 Steven Michael VITTORIO Educational content search and results
CN104636426A (en) * 2014-12-22 2015-05-20 河海大学 Multi-factor comprehensive quantitative analysis and sorting method for academic influences of scientific research institutions
CN105550216A (en) * 2015-12-03 2016-05-04 百度在线网络技术(北京)有限公司 Searching method and device of academic research information and excavating method and device of academic research information
CN105740386A (en) * 2016-01-27 2016-07-06 北京航空航天大学 Thesis search method and device based on sorting integration

Non-Patent Citations (2)

Title
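YUJING WANG et al.: "Ranking Scientific Articles by Exploiting Citations, Authors, Journals, and Time Information", Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence *
秦臻 (Qin Zhen): "Research on Academic Social Network Modeling and Academic Resource Recommendation Methods", China Master's Theses Full-text Database, Information Science and Technology Series *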

Cited By (20)

Publication number Priority date Publication date Assignee Title
CN107038211A (en) * 2017-02-28 2017-08-11 大连理工大学 Paper impact factor assessment method based on quantum migration
CN108132961B (en) * 2017-11-06 2020-06-30 浙江工业大学 Reference recommendation method based on citation prediction
CN108132961A (en) * 2017-11-06 2018-06-08 浙江工业大学 Reference recommendation method based on citation prediction
CN108228728A (en) * 2017-12-11 2018-06-29 北京航空航天大学 Parameterized paper network node representation learning method
CN108228728B (en) * 2017-12-11 2020-07-17 北京航空航天大学 Parameterized paper network node representation learning method
CN110254438A (en) * 2018-03-12 2019-09-20 松下知识产权经营株式会社 Information processing device and program recording medium
CN108614867A (en) * 2018-04-12 2018-10-02 科技部科技评估中心 Academic paper-based technology frontier index calculation method and system
CN108614867B (en) * 2018-04-12 2022-03-15 科技部科技评估中心 Academic paper-based technology frontier index calculation method and system
CN108764943A (en) * 2018-05-30 2018-11-06 公安部第三研究所 Suspicious user monitoring and analyzing method based on fund transaction network
CN108764943B (en) * 2018-05-30 2021-09-24 公安部第三研究所 Suspicious user monitoring and analyzing method based on fund transaction network
CN109345416A (en) * 2018-09-12 2019-02-15 连尚(新昌)网络科技有限公司 Method and apparatus for recording citation relationships between works
CN109299379A (en) * 2018-10-30 2019-02-01 东软集团股份有限公司 Article recommendation method and device, storage medium and electronic equipment
CN111198897A (en) * 2018-11-19 2020-05-26 中国农业大学 Scientific research hotspot topic analysis method and device and electronic equipment
CN111198897B (en) * 2018-11-19 2023-06-13 中国农业大学 Scientific research hotspot topic analysis method and device and electronic equipment
CN109726297A (en) * 2018-12-28 2019-05-07 沈阳航空航天大学 Bipartite network node prediction algorithm based on mutual exclusion strategy
CN109726297B (en) * 2018-12-28 2022-12-23 沈阳航空航天大学 Bipartite network node prediction algorithm based on mutual exclusion strategy
CN110209840A (en) * 2019-06-06 2019-09-06 北京百奥知信息科技有限公司 Paper impact factor assessment method based on multidimensional features
CN111723578A (en) * 2020-06-09 2020-09-29 平安科技(深圳)有限公司 Hot spot prediction method and device based on random walk model and computer equipment
CN111723578B (en) * 2020-06-09 2023-11-17 平安科技(深圳)有限公司 Hot spot prediction method and device based on random walk model and computer equipment
CN113392319A (en) * 2021-05-13 2021-09-14 宁波大学 Academic paper recommendation method based on network representation and auxiliary information embedding

Also Published As

Publication number Publication date
CN106250438B (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN106250438A (en) Zero-citation article recommendation method and system based on random walk model
Olczyk A systematic retrieval of international competitiveness literature: a bibliometric study
CN103631859B (en) Intelligent review expert recommending method for science and technology projects
Fahimnia et al. Green supply chain management: A review and bibliometric analysis
CN105589948B (en) Reference citation network visualization and literature recommendation method and system
Eliacik et al. Influential user weighted sentiment analysis on topic based microblogging community
Zhou et al. Classifying the political leaning of news articles and users from user votes
CN103729432B (en) Method for analyzing and sequencing academic influence of theme literature in citation database
Zhou et al. A novel Data Envelopment Analysis model for evaluating industrial production and environmental management system
CN108614867B (en) Academic paper-based technology frontier index calculation method and system
CN106682172A (en) Keyword-based document research hotspot recommending method
CN105117422A (en) Intelligent social network recommender system
Chen et al. Spreadsheet property detection with rule-assisted active learning
CN103617481B (en) Process-oriented domain knowledge extraction and supply system and method
CN104636426A (en) Multi-factor comprehensive quantitative analysis and sorting method for academic influences of scientific research institutions
CN105631018A (en) Article feature extraction method based on topic model
Tuesta et al. Analysis of an advisor–advisee relationship: An exploratory study of the area of exact and earth sciences in Brazil
Semerikov et al. Automation of the Export Data from Open Journal Systems to the Russian Science Citation Index
Li et al. A hybrid model for experts finding in community question answering
Tayal et al. Personalized ranking of products using aspect-based sentiment analysis and Plithogenic sets
CN104572915B (en) Customer incident relatedness computation method based on content environment enhancement
Tang et al. Internationalizing AI: evolution and impact of distance factors
Liu et al. How to choose appropriate experts for peer review: An intelligent recommendation method in a big data context
Chen et al. Identifying the key success factors of movie projects in crowdfunding
Wei et al. Using network flows to identify users sharing extremist content on social media

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant