CN106250438A - Zero-citation article recommendation method and system based on a random walk model - Google Patents
Zero-citation article recommendation method and system based on a random walk model
- Publication number
- CN106250438A (application CN201610595617.1A)
- Authority
- CN
- China
- Prior art keywords
- paper
- represent
- score
- conference
- journal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention provides a zero-citation article recommendation method and system based on a random walk model, comprising: Step 1: build an academic network model and use a random walk method to obtain the feature values corresponding to the first author, conference or journal, institution, and publication time of each paper; Step 2: establish a ranking model and build a training set from the paper data processed in Step 1; Step 3: rank the training set with a weak classifier; Step 4: judge whether the weak classifier's ranking matches the true ranking of the training set, and thereby obtain the optimal ranking model; Step 5: recommend the required zero-citation documents to the user through the ranking model. The invention adopts a new approach to paper ranking so that newly published papers can be recommended more effectively, making it easier for users to find the most relevant new papers.
Description
Technical field
The present invention relates to the field of recommendation technology, and in particular to a zero-citation article recommendation method and system based on a random walk model.
Background
Scientific research is a strategic support for improving social productivity and overall national strength, and countries around the world attach great importance to investment in it. China places science and technology research and development at the core of its overall national development, and state spending on scientific research has grown steadily. In 2012, China's research and development expenditure (including industry and academia) exceeded one trillion yuan, reaching 1,029.84 billion yuan, the level of a moderately developed country.
One of the most direct outputs of scientific research is the scientific paper. According to statistics, from 2004 to 2014 Chinese researchers published a total of 1,369,800 scientific papers, ranking second in the world, and these papers were cited 10,370,100 times in total, ranking fourth in the world. Research practice shows that scientific papers are a very important information resource for researchers carrying out or deepening their work. However, faced with the vast body of literature in the information age, quickly and accurately retrieving the academic resources one needs is an important and challenging task for researchers. Effective ranking of scientific literature helps researchers find high-quality papers and discover promising research directions. Paper ranking also plays an important role in academic reward systems.
Traditional methods often use citation count as the metric. This standard is too uniform: it treats every citation as equally important and ignores the difference between high-quality citations and ordinary ones. Many researchers regard the paper citation network as similar to a system of web page links and apply the PageRank and HITS algorithms to score each paper for ranking. In practice, however, a dynamic citation network differs from an ordinary computer network, because a newly published paper can only cite papers published before it, and an earlier paper cannot cite a later one. This inherent property of citation networks gives earlier papers an advantage in citations, which strongly affects the accuracy of common algorithms.
Much effort has been made to solve this problem, but most of it focuses on text analysis or examines the whole citation network. Newly published papers have not yet been cited by other papers, so existing algorithms score them too low. Yet the directions represented by new papers are generally closer to the frontier than those of earlier papers and deserve more attention from researchers. A new ranking algorithm is therefore of considerable significance for helping researchers obtain the resources they need, keep up with developments in their discipline, improve their own research capability, and in turn strengthen national research capacity. This is especially important in the big data era: it makes frontier directions easier to find and also substantially improves efficiency. Since 2000, the number of papers on paper ranking and recommendation systems has risen year by year; according to incomplete statistics, more than 30 related papers were published in 2013 alone. Research on ranking newly published papers, however, is still at an early stage. Tens of thousands of new papers are published every year, and without an accurate ranking algorithm researchers cannot quickly find the information they need in this mass of data. This motivates a new algorithm that ranks newly published papers effectively and thereby predicts which papers are more likely to become hot topics and frontier directions in the next five to ten years. On this basis we invented the ZeroRank algorithm. Using authors, conferences, and institutions as evaluation indicators, and after analysis and testing on more than ten years of past data, it achieves effective prediction of hot papers and largely makes up for the shortcomings of existing algorithms in evaluating newly published papers.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a zero-citation article recommendation method and system based on a random walk model.
The zero-citation article recommendation method based on a random walk model provided by the present invention comprises the following steps:
Step 1: build an academic network model, and use a random walk method to obtain the feature values corresponding to the first author, conference or journal, institution, and publication time of each paper;
Step 2: establish a ranking model, and build a training set from the paper data processed in Step 1;
Step 3: rank the training set with a weak classifier, where a weak classifier is a classifier that ranks by considering only a single feature value;
Step 4: judge whether the ranking produced by the weak classifier matches the true ranking of the training set;
If it does not match, adjust the weight in the ranking model of the feature value corresponding to this weak classifier according to the difference between the weak classifier's ranking and the true ranking, adjust the weight of each fragment in the training set, and return to Step 3;
If it matches, judge whether the weak classifiers corresponding to all feature values have been used for ranking; if not, change the kind of feature value considered by the weak classifier and return to Step 3; if so, the optimal ranking model is obtained;
Step 5: recommend the required zero-citation documents to the user through the optimal ranking model.
Preferably, Step 1 includes:
Step 1.1: use the academic graph data resource provided by Microsoft to obtain all papers published from 1800 to the present;
Step 1.2: extract key information from the papers and build an academic network model containing four classes of node sets and four classes of edge sets; the key information of a paper includes: paper title, author, the journal or conference that published the paper, the publishing institution, and the publication time;
Step 1.3: select the field of the papers, take the papers of a certain year as the zero-citation paper set and the papers within a set time period as the training set, and analyze the academic network model by the random walk method to obtain the feature value scores corresponding to the first author, conference or journal, institution, and publication time of each paper, together with the score of the paper itself.
Preferably, Step 1.2 includes:
Step 1.2.1: build the academic network model, denoting the academic network by G:
$G = (P \cup A \cup V \cup F,\ E_{PP} \cup E_{PA} \cup E_{PV} \cup E_{PF})$
An edge $(p_v, p_u) \in E_{PP}$ indicates that paper v cites paper u;
An edge $(p_v, a_u) \in E_{PA}$ indicates that the first author of paper v is u;
An edge $(p_v, v_u) \in E_{PV}$ indicates that paper v was published in conference or journal u;
An edge $(p_v, f_u) \in E_{PF}$ indicates that paper v comes from institution u;
where P, A, V, and F denote the four classes of node sets formed by papers, authors, conferences and journals, and institutions, respectively; $p_v$ denotes paper v, $p_u$ denotes paper u, $a_u$ denotes author u, $v_u$ denotes conference or journal u, and $f_u$ denotes institution u; $E_{PP}$, $E_{PA}$, $E_{PV}$, and $E_{PF}$ denote the edges between papers, between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively;
Step 1.2.2: establish the time relationship of the papers in the academic network model:
In the academic network G, the publication times of papers are expressed as $t_0 < t_1 < \dots < t_{crt}$, where $t_0$ denotes the year 1800, when the earliest paper in the network was published, and $t_{crt}$ denotes the current year;
Step 1.2.3: build the zero-citation paper data set Z:
$Z = \{p_z \in P \mid t(p_z) = t_{crt}\}$
where $p_z$ denotes a paper in the set Z and $t(p_z)$ denotes the publication time of the paper.
Preferably, Step 1.3 includes:
Step 1.3.1: set the parameters $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \rho, t_{crt}$, where $\omega_1$ denotes the contribution weight of other papers to the paper score, $\omega_2$ the contribution weight of the author to the paper score, $\omega_3$ the contribution weight of the conference or journal that published the paper, $\omega_4$ the contribution weight of the institution that published the paper, and $\omega_5$ the contribution weight of the publication time; $\rho$ denotes the importance parameter of the publication time, and $t_{crt}$ denotes the current year;
Step 1.3.2: initialize the paper scores; the formula is as follows:
where $p_i$ denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, with i ranging from 0 to N;
Step 1.3.3: compute the scores of authors, conferences or journals, and institutions from the paper scores; the formulas are as follows:
where $a_i$ denotes the score of author i, $v_i$ the score of conference or journal i, and $f_i$ the score of institution i; $A_i$ denotes author i, $p_j$ denotes paper j, and AVG() is the averaging function;
Step 1.3.4: compute the paper scores; the formula is as follows:
where $p_i'$ denotes any paper i, $p_j$ denotes a paper j cited by paper i, $a_j$ denotes the author score of paper i, $v_j$ denotes the score of the journal or conference that published paper i, $f_j$ denotes the score of the institution that published paper i, $t_i$ denotes the publication time of paper i, $Z_A, Z_V, Z_F, Z_T$ are normalization variables, and $\rho$ is the time decay factor.
Preferably, Step 2 includes:
Step 2.1: for each time node t from $t_0$ to $t_{crt}-1$, build fragment t from the citation relationships that have already occurred among the papers up to time t; in total, $t_{crt}-t_0$ fragments make up the zero-citation paper collection;
Step 2.2: from the zero-citation paper collection built in Step 2.1, obtain a training set containing the feature values of the $t_{crt}-t_0$ fragments.
Preferably, Step 1 performs the random walk method in parallel, comprising the following steps:
Step A1: update the feature values of the first author, conference or journal, and institution of each subsequent paper based on the feature values of its adjacent papers;
Step A2: judge whether the feature values of all paper nodes in the paper citation network formed from the first author, conference or journal, and institution information have been updated and whether the updated feature values have all converged; if not, take the subsequent papers as the adjacent papers and return to Step A1; if so, proceed to Step 2.
The zero-citation article recommendation system based on a random walk model provided by the present invention includes:
An academic network model building module, used to build the academic network model and to obtain, by the random walk method, the feature values corresponding to the first author, conference or journal, institution, and publication time of each paper;
A training set building module, which establishes the ranking model and builds a training set from the paper data processed by the academic network model building module;
A weak classifier ranking module, which ranks the training set with a weak classifier, where a weak classifier is a classifier that ranks by considering only a single feature value;
A ranking model building module, which judges whether the weak classifier's ranking matches the true ranking of the training set and obtains the optimal ranking model.
Preferably, the academic network model building module includes:
A retrieval module, used to obtain all papers published from 1800 to the present from the academic graph data resource provided by Microsoft;
A model building module, which extracts key information from the papers and builds an academic network model containing four classes of node sets and four classes of edge sets; the key information of a paper includes: paper title, author, the journal or conference that published the paper, the publishing institution, and the publication time;
A model analysis module, which selects the field of the papers, takes the papers of a certain year as the zero-citation paper set and the papers within a set time period as the training set, and analyzes the academic network model by the random walk method to obtain the feature value scores corresponding to the first author, conference or journal, institution, and publication time of each paper, together with the score of the paper itself.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention builds the basic parameters of the algorithm by iteratively processing existing data, automatically trains and evolves in real time according to the performance of the algorithm model, and realizes parallel processing of the algorithm for big data. It adopts a new approach to paper ranking so that newly published papers are recommended more effectively, meeting the search needs of researchers.
2. The present invention effectively solves the problem of ranking zero-citation articles. By combining a random walk model with an adaptive algorithm, it analyzes information that traditional ranking algorithms do not take into account, and is particularly suited to analyzing the future influence and importance of newly published papers and obtaining their priority ranking.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a flow chart of the zero-citation article recommendation method based on a random walk model provided by the present invention;
Fig. 2 is a schematic diagram of the data used to derive the time decay factor;
Fig. 3 is a schematic diagram of the academic network model;
Fig. 4 is a schematic diagram of the selection of the training set;
Fig. 5 is a diagram of the running time of the parallel algorithm.
Detailed description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention.
The zero-citation article recommendation method based on a random walk model provided by the present invention comprises the following steps:
Step S1: build the academic network model and use the random walk method to compute the scores of three feature values, namely the first author who wrote each paper, the conference or journal that accepted the paper, and the institution that published the paper, and to score the paper itself. The symbols used in the implementation steps are explained in Table 1.
Table 1. Definition of symbols
Because paper resources on the Internet are widely scattered and the volume of data added each year is very large, building the academic network model is divided into two steps, Step S1.1 and Step S1.2, which cover data acquisition and integration. The model is then analyzed mainly by random walk, and the algorithm is carried out in Step S1.3. The detailed steps involved in Step S1 are as follows:
Step S1.1: use the academic graph data resource provided by Microsoft to obtain all papers published from 1800 to the present;
Step S1.2: use an optimized text analysis tool to extract key information from the papers and build an academic network model containing four classes of node sets and four classes of edge sets (the model is shown in Fig. 3).
Step A1: build the academic network model, denoting the academic network by G:
$G = (P \cup A \cup V \cup F,\ E_{PP} \cup E_{PA} \cup E_{PV} \cup E_{PF})$
An edge $(p_v, p_u) \in E_{PP}$ indicates that paper v cites paper u;
An edge $(p_v, a_u) \in E_{PA}$ indicates that the first author of paper v is u;
An edge $(p_v, v_u) \in E_{PV}$ indicates that paper v was published in conference or journal u;
An edge $(p_v, f_u) \in E_{PF}$ indicates that paper v comes from institution u.
Here P, A, V, and F denote the four classes of node sets formed by papers, authors, conferences and journals, and institutions, respectively; $p_v$ denotes paper v, $p_u$ denotes paper u, $a_u$ denotes author u, $v_u$ denotes conference or journal u, and $f_u$ denotes institution u; $E_{PP}$, $E_{PA}$, $E_{PV}$, and $E_{PF}$ denote the edges between papers, between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively.
Step A2: establish the time relationship of the papers in the academic network model:
In the academic network G, the publication times of papers are expressed as $t_0 < t_1 < \dots < t_{crt}$, where $t_0$ denotes the year 1800, when the earliest paper in the network was published, and $t_{crt}$ denotes the current year.
Step A3: build the zero-citation paper data set Z:
$Z = \{p_z \in P \mid t(p_z) = t_{crt}\}$
where $p_z$ denotes a paper in the set Z, $t(p_z)$ denotes the publication time of the paper, and $t_{crt}$ denotes the current year.
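Steps A1 to A3 can be illustrated with a minimal Python sketch. The `Paper` record, its field names, and the helper functions are hypothetical and chosen only for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    # Hypothetical record layout, not taken from the patent.
    pid: str
    year: int
    first_author: str
    venue: str          # conference or journal
    institution: str
    cites: list = field(default_factory=list)  # ids of cited papers

def build_network(papers):
    """Return the node sets (P, A, V, F) and edge sets (E_PP, E_PA, E_PV, E_PF)."""
    P = {p.pid for p in papers}
    A = {p.first_author for p in papers}
    V = {p.venue for p in papers}
    F = {p.institution for p in papers}
    # (p_v, p_u) in E_PP means paper v cites paper u; unknown targets are dropped.
    E_PP = {(p.pid, c) for p in papers for c in p.cites if c in P}
    E_PA = {(p.pid, p.first_author) for p in papers}
    E_PV = {(p.pid, p.venue) for p in papers}
    E_PF = {(p.pid, p.institution) for p in papers}
    return (P, A, V, F), (E_PP, E_PA, E_PV, E_PF)

def zero_citation_set(papers, t_crt):
    """Z = {p_z in P | t(p_z) = t_crt}: papers published in the current year."""
    return {p.pid for p in papers if p.year == t_crt}

papers = [
    Paper("p1", 2009, "a1", "v1", "f1"),
    Paper("p2", 2011, "a2", "v1", "f2", cites=["p1"]),
]
nodes, edges = build_network(papers)
Z = zero_citation_set(papers, t_crt=2011)
```

Storing the edge sets as sets of pairs keeps the sketch close to the notation of G; a sparse matrix would be the more practical choice at the scale the patent describes.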
Step S1.3: within each field, take the papers of 2011 as the zero-citation paper set and compute the feature value scores and paper scores. Because the scores of papers, authors, conferences and journals, and institutions are interrelated, we designed an optimized random walk method for feature value extraction.
The steps for computing feature value scores and paper scores are as follows:
Step B1: set the parameters $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \rho, t_{crt}$, where $\omega_1$ denotes the contribution weight of other papers to the paper score, $\omega_2$ the contribution weight of the author to the paper score, $\omega_3$ the contribution weight of the conference or journal that published the paper, $\omega_4$ the contribution weight of the institution that published the paper, and $\omega_5$ the contribution weight of the publication time; $\rho$ denotes the importance parameter of the publication time, and $t_{crt}$ denotes the current year.
Step B2: initialize the paper scores; the formula is as follows:
where $p_i$ denotes any paper, N denotes the number of papers in the field, and i indexes the i-th paper, ranging from 0 to N;
Step B3: compute the scores of authors, conferences or journals, and institutions from the paper scores; the formulas are as follows:
where $a_i$ denotes the score of author i, $v_i$ the score of conference or journal i, and $f_i$ the score of institution i; $A_i$ denotes author i, $p_j$ denotes paper j, and AVG() is the averaging function;
Step B4: compute the paper scores; the formula is as follows:
where $p_i'$ denotes any paper i, $p_j$ denotes a paper j cited by paper i, $a_j$ denotes the author score of paper i, $v_j$ denotes the score of the journal or conference that published paper i, $f_j$ denotes the score of the institution that published paper i, $t_i$ denotes the publication time of paper i, $Z_A, Z_V, Z_F, Z_T$ are normalization variables, and $\rho$ is the time decay factor.
Calculation of the decay factor ρ:
Papers in computer science were selected, 8,884,763 in total.
From the time elapsed since each paper's publication and the average number of citations to papers at that time, a citation count versus time curve was plotted, as shown in Fig. 2. Ignoring the first two points, fitting this curve with an exponential function gives the best result:
$c\,e^{-0.124t}$
Therefore ρ = -0.124 is used as the time decay factor.
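The fit described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the exponential was fitted, so the sketch uses a least-squares line through log(citations) with NumPy, and the curve it fits is synthetic data generated for the example, not the data behind Fig. 2. The function name `fit_decay` is hypothetical.

```python
import numpy as np

def fit_decay(years_since_pub, mean_citations, skip=2):
    """Fit mean_citations ~ c * exp(rho * t), ignoring the first `skip` points.

    Assumption: a log-linear least-squares fit; the patent does not name
    the fitting procedure.
    """
    t = np.asarray(years_since_pub, dtype=float)[skip:]
    y = np.asarray(mean_citations, dtype=float)[skip:]
    rho, log_c = np.polyfit(t, np.log(y), 1)  # slope, intercept
    return float(np.exp(log_c)), float(rho)

# Synthetic curve for illustration only: two early points off the trend,
# followed by an exact exponential decay.
t = np.arange(0, 20)
y = 3.0 * np.exp(-0.124 * t)
y[:2] = [0.5, 1.5]
c, rho = fit_decay(t, y)
```

Skipping the first two points mirrors the text: citations typically rise shortly after publication before the decay sets in, so those points do not follow the exponential trend.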
Handling of nodes with incomplete information
Because author, conference and journal, and institution information in the data set is not always complete, dummy nodes are used to solve this problem. For example, if paper u has no author information, a virtual author is assumed, and this author is assumed to have published only paper u.
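A minimal sketch of the dummy-node idea, assuming papers are held as dictionaries. The field names and the `virtual:` naming scheme are hypothetical; the only property taken from the text is that each virtual node belongs to exactly one paper.

```python
def fill_missing(records, fields=("first_author", "venue", "institution")):
    """Replace missing author, venue, or institution with a per-paper dummy node."""
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input is left unchanged
        for name in fields:
            if not rec.get(name):
                # The dummy id embeds the paper id, so the virtual node is
                # linked to this paper and to no other.
                rec[name] = f"virtual:{name}:{rec['pid']}"
        filled.append(rec)
    return filled

out = fill_missing([
    {"pid": "u", "first_author": None, "venue": "v1", "institution": ""},
])
```

Because a virtual author has a single paper, its averaged score equals that paper's own score, so the missing field neither raises nor lowers the paper relative to its own value.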
Implementation of the averaging function:
Following the idea of the PageRank algorithm, the paper scores are computed as follows.
Build the graphs $G_P = (P, E_{PP})$, $G_A = (P \cup A, E_{PA})$, $G_V = (P \cup V, E_{PV})$, and $G_F = (P \cup F, E_{PF})$, each containing the corresponding node set and edge set; $G_P$ denotes the paper graph, $G_A$ the author graph, $G_V$ the journal and conference graph, and $G_F$ the institution graph;
First compute the scores of authors, conferences and journals, and institutions, starting from the initial paper scores:
$a = A_A\,p$ (computes the author score vector a)
$v = A_V\,p$ (computes the conference or journal score vector v)
$f = A_F\,p$ (computes the institution score vector f)
$A_A$, $A_V$, and $A_F$ are normalized adjacency matrices that record the relations between authors and papers, between conferences and journals and papers, and between institutions and papers, respectively. The paper scores are then computed repeatedly:
$A_A^{T}$, $A_V^{T}$, and $A_F^{T}$ are the transposes of $A_A$, $A_V$, and $A_F$ and record the relations between papers and authors, between papers and conferences and journals, and between papers and institutions, respectively. The computation converges in the end and stops when $|p_k - p_{k+1}| < 10^{-9}$.
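The iteration can be sketched in Python with NumPy. The paper-score update formula is not reproduced in this text, so the form used below is an assumption assembled from the symbol definitions: a weighted sum of a citation term, the averaged author, venue, and institution scores (each divided by its sum, standing in for $Z_A$, $Z_V$, $Z_F$), and a time term $e^{\rho (t_{crt} - t_i)}$ divided by $Z_T$. The uniform initial score 1/N, the equal weights, the direction of score flow (from a citing paper to the papers it cites), and the function names are likewise assumptions.

```python
import numpy as np

def group_mean_operator(labels):
    """Return S with (S @ p)[i] = mean of p over papers sharing labels[i]."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    return same / same.sum(axis=1, keepdims=True)

def random_walk_scores(cites, authors, venues, insts, years, t_crt,
                       w=(0.2, 0.2, 0.2, 0.2, 0.2), rho=-0.124,
                       tol=1e-9, max_iter=1000):
    """Iterate an assumed paper-score update until |p_k - p_{k+1}| < tol."""
    n = len(years)
    # M[j, i] = 1/outdeg(i) if paper i cites paper j (assumed direction of flow).
    M = np.zeros((n, n))
    for i, refs in enumerate(cites):
        for j in refs:
            M[j, i] = 1.0 / len(refs)
    S_A, S_V, S_F = (group_mean_operator(x) for x in (authors, venues, insts))
    time_term = np.exp(rho * (t_crt - np.asarray(years, dtype=float)))
    p = np.full(n, 1.0 / n)  # assumed uniform initial score
    for _ in range(max_iter):
        a, v, f = S_A @ p, S_V @ p, S_F @ p  # AVG over each paper's author, venue, institution
        new_p = (w[0] * (M @ p)
                 + w[1] * a / a.sum()
                 + w[2] * v / v.sum()
                 + w[3] * f / f.sum()
                 + w[4] * time_term / time_term.sum())
        if np.abs(new_p - p).sum() < tol:
            return new_p
        p = new_p
    return p

# Toy network: paper 1 cites paper 0, paper 2 cites papers 0 and 1.
p = random_walk_scores(
    cites=[[], [0], [0, 1]],
    authors=["a", "a", "b"], venues=["v", "w", "v"], insts=["f", "f", "f"],
    years=[2008, 2009, 2010], t_crt=2011,
)
```

The dense group-mean matrices are only practical for toy inputs; at the scale described in the text, sparse adjacency matrices would be needed.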
Step B5: set up the zero-citation paper set (as shown in Fig. 4): take 2011 as the current year and hide the information from years after the current year to obtain the zero-citation paper set.
Step B6: feature value extraction: take the papers from 1800 to 2010 as the training set and apply the optimized random walk method to extract feature values from it.
Step S2: use a ranking algorithm: select data to build the training set, select weak classifiers, revise the existing ranking model according to each single weak classifier, and repeat these operations until the optimal model is obtained;
To train a ranking model that combines the different feature values from Step S1, the traditional approach would be linear regression or the k-nearest-neighbor algorithm, but such methods are not suitable for this problem. For two papers from different time periods, the total citation counts are affected by time and historical factors, so ranking the two papers against each other is unreasonable. A ranking algorithm is therefore used, and papers from different time periods are analyzed separately. The specific steps are as follows:
Step S2.1: for each time node t from $t_0$ to $t_{crt}-1$, build fragment t from the citation relationships that have already occurred among the papers up to time t; in total, $t_{crt}-t_0$ fragments make up the "zero-citation paper collection". Because $t_0$ does not play a key role in the experiments, $t_0$ is set to $t_{crt}-10$;
Step S2.2: apply the feature value extraction algorithm of Step S1 to the "zero-citation paper collection" built in Step S2.1 to obtain a training set S containing the feature values of the $t_{crt}-t_0$ fragments, in which each fragment t carries its "author", "conference", and "institution" feature values and $y_t$ denotes the actual citation ranking of fragment t;
Step S2.3: for the training set S produced in Step S2.2, iterate with the AdaRank algorithm. Each round of the iteration adds a new weak classifier $k_n$, adjusts the weight $\alpha_n$ of the new classifier, and adds it to the current ranking model to obtain a new model $r_n$. The iteration ends when classifier performance no longer improves, giving the optimal ranking model. Here r denotes the initial ranking model, which is composed of the weights of the three feature values "author", "conference", and "institution".
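Step S2.3 can be illustrated with a compact AdaRank-style loop. This is a sketch under assumptions, not the patent's implementation: each weak classifier ranks by a single feature column, ranking quality is measured by a pairwise concordance score in [-1, 1] (AdaRank is usually described with measures such as MAP or NDCG), and the function names and toy data are hypothetical.

```python
import numpy as np

def concordance(scores, truth):
    """Pairwise agreement between predicted scores and true values, in [-1, 1]."""
    ds = np.sign(scores[:, None] - scores[None, :])
    dt = np.sign(truth[:, None] - truth[None, :])
    mask = dt != 0
    return float((ds * dt)[mask].mean()) if mask.any() else 0.0

def adarank(fragments, n_rounds=10, eps=1e-12):
    """fragments: list of (X, y), X = papers x features, y = true citation counts.

    Returns one weight per feature; the model scores a paper as X @ alpha.
    """
    fragments = [(np.asarray(X, float), np.asarray(y, float)) for X, y in fragments]
    m, n_feat = len(fragments), fragments[0][0].shape[1]
    # E[k, i]: performance of the single-feature weak ranker k on fragment i.
    E = np.array([[concordance(X[:, k], y) for X, y in fragments]
                  for k in range(n_feat)])
    P = np.full(m, 1.0 / m)   # weight of each fragment
    alpha = np.zeros(n_feat)  # weight of each weak ranker in the model
    best = -np.inf
    for _ in range(n_rounds):
        k = int(np.argmax(E @ P))  # best weak ranker under current fragment weights
        num, den = P @ (1.0 + E[k]), P @ (1.0 - E[k])
        trial = alpha.copy()
        trial[k] += 0.5 * np.log((num + eps) / (den + eps))
        perf = np.array([concordance(X @ trial, y) for X, y in fragments])
        if perf.mean() <= best:    # stop when performance no longer improves
            break
        alpha, best = trial, perf.mean()
        P = np.exp(-perf)          # emphasise fragments the model ranks poorly
        P /= P.sum()
    return alpha

# Toy fragments: feature 0 orders papers exactly like the citation counts.
X = np.array([[3.0, 0.0], [2.0, 1.0], [1.0, 0.5]])
y = np.array([30.0, 20.0, 10.0])
alpha = adarank([(X, y), (X, y)])
```

Reweighting fragments by `exp(-perf)` is what distinguishes this from a plain greedy feature search: later rounds favour the feature that helps on the time fragments the current model still ranks badly.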
Step S3: parallel random walk. A parallel solution is devised on the basis of the random walk of Step S1, which saves running time and reduces space requirements;
The random walk of Step S1 has a time complexity of O(M) and a space complexity of O(M+N), where M denotes the number of edges in the academic network model and N denotes the total number of papers in the training set, which makes running it on a single machine impractical. A parallel solution for the random walk is therefore proposed.
Step S3.1: RankAVF scores the three factors in the academic network model that mainly influence paper scores: author, conference, and institution. Using the feature value extraction algorithm of Step S1, for each author, conference, and institution node in the academic network model it takes the feature values of the adjacent paper nodes, averages them to compute the node's feature value, and replaces the node's original feature value with the newly computed one to update the network. The newly computed feature value is then passed to the adjacent paper nodes, completing one AVF iteration. The formulas are as follows:
(compute the author score a from the paper scores p)
(compute the conference score v from the paper scores p)
(compute the institution score f from the paper scores p)
where AVG denotes the mean function.
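Because each author, conference, or institution node only needs the scores of its own adjacent papers, one RankAVF round can be computed node by node. The sketch below illustrates that independence with a thread pool from the Python standard library; the function name, the `membership` mapping, and the choice of threads are assumptions, and the patent does not state which parallel framework it uses.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def rank_avf(paper_scores, membership, max_workers=4):
    """One AVF round: score each entity as the mean score of its adjacent papers.

    paper_scores: {paper id: score}
    membership:   {entity id: [paper ids adjacent to that entity]}
    """
    def avg(item):
        entity, pids = item
        return entity, mean(paper_scores[pid] for pid in pids)

    # Each entity is independent of the others, so the work can be split
    # across workers without coordination.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(avg, membership.items()))

scores = {"p1": 0.2, "p2": 0.4, "p3": 0.6}
author_scores = rank_avf(scores, {"a1": ["p1", "p2"], "a2": ["p3"]})
```

Threads are used here only to show the structure of the computation; in CPython they give little speed-up for CPU-bound work, so a real deployment would more likely use processes or a distributed graph-processing system.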
Step S3.2: RankP computes and updates the feature value of each paper node from the paper node feature values obtained in the previous iteration and the feature values of the adjacent author, conference, and institution nodes, and passes the newly computed feature value to the subsequent paper nodes of this paper node and to the adjacent author, conference, and institution nodes. The formula is as follows:
where AVG denotes the mean function and exp denotes the exponential function.
Step S3.3: the two algorithms above operate on individual nodes of the academic network model and are iterated in parallel. Once the feature values computed for all paper nodes have converged, the algorithm stops iterating, and the scores for newly published papers are obtained.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the present invention. Provided they do not conflict, the embodiments of this application and the features in the embodiments may be combined with one another arbitrarily.
Claims (8)
1. A zero-citation article recommendation method based on a random walk model, characterized by comprising the following steps:
Step 1: build an academic network model, and use a random walk method to obtain the feature values corresponding to the first author, conference or journal, institution, and publication time of each paper;
Step 2: establish a ranking model, and build a training set from the paper data processed in Step 1;
Step 3: rank the training set with a weak classifier, where a weak classifier is a classifier that ranks by considering only a single feature value;
Step 4: judge whether the ranking produced by the weak classifier matches the true ranking of the training set;
If it does not match, adjust the weight in the ranking model of the feature value corresponding to this weak classifier according to the difference between the weak classifier's ranking and the true ranking, adjust the weight of each fragment in the training set, and return to Step 3;
If it matches, judge whether the weak classifiers corresponding to all feature values have been used for ranking; if not, change the kind of feature value considered by the weak classifier and return to Step 3; if so, the optimal ranking model is obtained;
Step 5: recommend the required zero-citation documents to the user through the optimal ranking model.
2. The zero-citation article recommendation method based on a random walk model according to claim 1, characterized in that Step 1 includes:
Step 1.1: use the academic graph data resource provided by Microsoft to obtain all papers published from 1800 to the present;
Step 1.2: extract key information from the papers and build an academic network model containing four classes of node sets and four classes of edge sets; the key information of a paper includes: paper title, author, the journal or conference that published the paper, the publishing institution, and the publication time;
Step 1.3: select the field of the papers, take the papers of a certain year as the zero-citation paper set and the papers within a set time period as the training set, and analyze the academic network model by the random walk method to obtain the feature value scores corresponding to the first author, conference or journal, institution, and publication time of each paper, together with the score of the paper itself.
3. The zero-citation article recommendation method based on a random walk model according to claim 2, characterized in that step 1.2 comprises:
Step 1.2.1: build the academic network model, denoting the academic network by $G$:
$G = (P \cup A \cup V \cup F,\; E_{PP} \cup E_{PA} \cup E_{PV} \cup E_{PF})$
An edge $(p_v, p_u) \in E_{PP}$ indicates that paper $v$ cites paper $u$;
An edge $(p_v, a_u) \in E_{PA}$ indicates that the first author of paper $v$ is $u$;
An edge $(p_v, v_u) \in E_{PV}$ indicates that paper $v$ is published in conference or journal $u$;
An edge $(p_v, f_u) \in E_{PF}$ indicates that paper $v$ comes from institution $u$;
where $P$, $A$, $V$, $F$ denote the four classes of node sets formed by papers, authors, conferences and journals, and institutions, respectively; $p_v$ denotes paper $v$, $p_u$ denotes paper $u$, $a_u$ denotes author $u$, $v_u$ denotes conference or journal $u$, and $f_u$ denotes institution $u$; $E_{PP}$, $E_{PA}$, $E_{PV}$, $E_{PF}$ denote the edges between papers, between papers and authors, between papers and conferences or journals, and between papers and institutions, respectively;
Step 1.2.2: establish the time relationship of the papers in the academic network model:
in the academic network $G$, the publication times of the papers are expressed as $t_0 < t_1 < \dots < t_{crt}$, where $t_0$ denotes the publication time of the earliest paper in the network (the year 1800) and $t_{crt}$ denotes the current year;
Step 1.2.3: build the zero-citation paper data set $Z$:
$Z = \{\, p_z \in P \mid t(p_z) = t_{crt} \,\}$
where $p_z$ denotes a paper in the set $Z$, and $t(p_z)$ denotes the publication time of that paper.
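As a rough illustration of the structure defined in this claim, the Python sketch below stores the four node sets and four edge sets in plain dictionaries and derives the zero-citation set Z from the publication year. The `Paper` record, its field names, and the functions `build_academic_network` and `zero_citation_set` are hypothetical names chosen for this example; they do not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    pid: str            # paper identifier
    first_author: str   # only the first author is linked, as in E_PA
    venue: str          # conference or journal
    institution: str
    year: int           # publication time t(p)
    cites: list = field(default_factory=list)  # ids of the papers this paper cites

def build_academic_network(papers):
    """Return (nodes, edges) for G = (P ∪ A ∪ V ∪ F, E_PP ∪ E_PA ∪ E_PV ∪ E_PF)."""
    nodes = {"P": set(), "A": set(), "V": set(), "F": set()}
    edges = {"PP": set(), "PA": set(), "PV": set(), "PF": set()}
    for p in papers:
        nodes["P"].add(p.pid)
        nodes["A"].add(p.first_author)
        nodes["V"].add(p.venue)
        nodes["F"].add(p.institution)
        edges["PA"].add((p.pid, p.first_author))
        edges["PV"].add((p.pid, p.venue))
        edges["PF"].add((p.pid, p.institution))
        for cited in p.cites:
            edges["PP"].add((p.pid, cited))  # paper p cites paper `cited`
    return nodes, edges

def zero_citation_set(papers, t_crt):
    """Z = {p in P | t(p) = t_crt}: papers published in the current year."""
    return {p.pid for p in papers if p.year == t_crt}
```

Keeping each edge class in its own set mirrors the definition of G and makes it easy to walk only one relation at a time, for example paper-to-venue edges.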
4. The zero-citation article recommendation method based on a random walk model according to claim 2, characterized in that step 1.3 comprises:
Step 1.3.1: set the parameters $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \rho, t_{crt}$, where $\omega_1$ denotes the contribution weight of the other papers to the score, $\omega_2$ denotes the contribution weight of the author to the paper score, $\omega_3$ denotes the contribution weight of the conference or journal that includes the paper to the paper score, $\omega_4$ denotes the contribution weight of the institution publishing the paper to the paper score, $\omega_5$ denotes the contribution weight of the publication year to the paper score, $\rho$ denotes the importance parameter of the publication time, and $t_{crt}$ denotes the current year;
Step 1.3.2: initialize the paper score values, using the following formula:
[formula not reproduced in the source text]
where $p_i$ denotes any paper, $N$ denotes the number of papers in the field, and $i$ denotes the $i$-th article, with $i$ ranging from 0 to $N$;
Step 1.3.3: from the paper score values, calculate the scores of the authors, the conferences or journals, and the institutions respectively, using the following formulas:
[formulas not reproduced in the source text]
where $a_i$ denotes the score of author $i$, $v_i$ denotes the score of conference or journal $i$, $f_i$ denotes the score of institution $i$, $A_i$ denotes author $i$, $p_j$ denotes paper $j$, and $\mathrm{AVG}(\cdot)$ is the averaging function;
Step 1.3.4: calculate the score of each paper, using the following formula:
[formula not reproduced in the source text]
where $p'_i$ denotes any paper $i$, $p_j$ denotes a paper $j$ cited by paper $i$, $a_j$ denotes the author score of paper $i$, $v_j$ denotes the score of the journal or conference that includes paper $i$, $f_j$ denotes the score of the institution publishing paper $i$, $t_i$ denotes the publication time of paper $i$, $Z_A, Z_V, Z_F, Z_T$ are normalization variables, and $\rho$ is the time decay factor.
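Because the formulas of steps 1.3.2 to 1.3.4 are not legible in this text, the Python sketch below shows only one plausible reading of them, not the patent's equations. It assumes a uniform initial score of 1/N, takes each author, venue and institution score as the average (AVG) of the scores of its papers, and combines the five weighted terms additively with an exponential time decay. The direction of the citation term (score flowing from citing papers to the cited paper) and every function and parameter name here are assumptions for illustration.

```python
import math
from statistics import mean

def init_scores(paper_ids):
    """ASSUMPTION: uniform start, 1/N for each of the N papers."""
    ids = list(paper_ids)
    return {p: 1.0 / len(ids) for p in ids}

def entity_scores(paper_scores, entity_of):
    """Score of an author, venue or institution = AVG of the scores of its papers."""
    grouped = {}
    for pid, entity in entity_of.items():
        grouped.setdefault(entity, []).append(paper_scores[pid])
    return {entity: mean(vals) for entity, vals in grouped.items()}

def update_paper_scores(scores, cited_by, author_of, venue_of, inst_of, year_of, w, rho, t_crt):
    """One update round. ASSUMPTION: additive combination with exponential time decay.

    cited_by[p] lists the papers that cite p; w = (w1, w2, w3, w4, w5).
    """
    a = entity_scores(scores, author_of)
    v = entity_scores(scores, venue_of)
    f = entity_scores(scores, inst_of)
    decay = {p: math.exp(-rho * (t_crt - year_of[p])) for p in scores}
    # Normalization variables Z_A, Z_V, Z_F, Z_T, taken here as plain sums over all papers.
    z_a = sum(a[author_of[p]] for p in scores)
    z_v = sum(v[venue_of[p]] for p in scores)
    z_f = sum(f[inst_of[p]] for p in scores)
    z_t = sum(decay.values())
    w1, w2, w3, w4, w5 = w
    return {
        p: w1 * sum(scores[q] for q in cited_by.get(p, ()))
        + w2 * a[author_of[p]] / z_a
        + w3 * v[venue_of[p]] / z_v
        + w4 * f[inst_of[p]] / z_f
        + w5 * decay[p] / z_t
        for p in scores
    }
```

Under these assumptions a cited paper outranks an otherwise identical uncited one, and a recent paper outranks an otherwise identical older one, which is the behavior the time decay factor is meant to provide for papers that have had no chance to collect citations.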
5. The zero-citation article recommendation method based on a random walk model according to claim 1 or 4, characterized in that step 2 comprises:
Step 2.1: for each time node $t$ from $t_0$ to $t_{crt}-1$, build the paper citation relationships that have occurred by time $t$ into fragment $t$, so that $t_{crt}-t_0$ fragments in total make up the zero-citation paper set;
Step 2.2: from the zero-citation paper set built in step 2.1, obtain a training set comprising the feature values of the $t_{crt}-t_0$ fragments.
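The fragment construction in step 2.1 can be sketched in a few lines of Python. The sketch assumes that each citation carries the year in which it occurred and that fragment t is cumulative, containing every citation made up to and including year t; the claim does not spell out either point, and the function name `build_fragments` is hypothetical.

```python
def build_fragments(citations, t0, t_crt):
    """Build one citation snapshot per year t in [t0, t_crt - 1].

    citations: iterable of (citing_id, cited_id, year) triples.
    ASSUMPTION: fragment t holds all citations with year <= t (cumulative).
    """
    citations = list(citations)
    return {
        t: {(src, dst) for src, dst, year in citations if year <= t}
        for t in range(t0, t_crt)
    }
```

With `t0 = 2010` and `t_crt = 2013` this yields three fragments (2010, 2011, 2012), matching the count of t_crt - t_0 fragments stated in the claim.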
6. The zero-citation article recommendation method based on a random walk model according to claim 1, characterized in that the random walk method in step 1 is executed in parallel and comprises the following steps:
Step A1: based on the feature values of the adjacent papers, update the feature values of the first author, the conference or journal, and the institution of the subsequent papers respectively;
Step A2: judge whether the feature values of all paper nodes in the paper citation network formed from the first-author, conference or journal, and institution information have been updated and whether the updated feature values have all converged; if not, take the subsequent papers as the adjacent papers and return to step A1; if so, proceed to step 2.
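The update-until-convergence loop of steps A1 and A2 can be illustrated with the Python sketch below. It is a generic synchronous iteration, not the patent's parallel scheme: every node is updated from a snapshot of the previous round, so the per-node updates are independent and can be handed to a thread pool. The function `iterate_until_converged`, the caller-supplied `update_fn`, and the tolerance-based convergence test are assumptions made for this example. Note that CPython threads do not speed up CPU-bound work; a process pool or a distributed framework would be needed for real parallel gains.

```python
from concurrent.futures import ThreadPoolExecutor

def iterate_until_converged(values, update_fn, tol=1e-9, max_iters=1000, workers=4):
    """Repeat synchronous updates until no node changes by `tol` or more.

    values:    dict mapping node -> current feature value
    update_fn: hypothetical callable (node, previous_values) -> new value
    """
    nodes = list(values)
    if not nodes:
        return {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iters):
            snapshot = dict(values)  # every node reads the same previous round
            new_vals = list(pool.map(lambda n: update_fn(n, snapshot), nodes))
            delta = max(abs(nv - snapshot[n]) for n, nv in zip(nodes, new_vals))
            values = dict(zip(nodes, new_vals))
            if delta < tol:  # all updated values have converged
                break
    return values
```

For example, the contraction `update_fn = lambda n, old: 0.5 * old[n] + 1.0` drives every node to its fixed point 2.0 regardless of the starting value.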
7. A zero-citation article recommendation system based on a random walk model, characterized by comprising:
an academic network model building module, configured to build an academic network model and, by a random walk method, obtain the feature values corresponding to the first author, the conference or journal, the institution, and the publication time of each paper;
a training set building module, configured to establish a ranking model and select paper data processed by the academic network model building module to build a training set;
a weak classifier ranking module, configured to rank the training set with a weak classifier, where a weak classifier is a classifier that ranks by considering only a single feature value;
a ranking model building module, configured to judge whether the ranking result of the weak classifier matches the true ranking result of the training set, and to obtain the optimal ranking model.
8. The zero-citation article recommendation system based on a random walk model according to claim 7, characterized in that the academic network model building module comprises:
a retrieval module, configured to obtain, from the academic graph data resource provided by Microsoft, all paper resources published from 1800 to the present;
a model building module, configured to build, by extracting key information from the papers, an academic network model comprising four classes of node sets and four classes of edge sets, where the key information of a paper includes: the paper title, the authors, the journal or conference in which the paper is included, the institution publishing the paper, and the publication time of the paper;
a model analysis module, configured to select the field of the papers, take the papers of a given year as the zero-citation paper set and the papers within a set time period as the training set, and analyze the academic network model by the random walk method to obtain the feature-value scores corresponding to the first author, the conference or journal, the institution, and the publication time of each paper, together with the score of the paper itself.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610595617.1A CN106250438B (en) | 2016-07-26 | 2016-07-26 | Zero-citation article recommendation method and system based on random walk model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106250438A true CN106250438A (en) | 2016-12-21 |
CN106250438B CN106250438B (en) | 2020-07-14 |
Family
ID=57603682
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610595617.1A Active CN106250438B (en) | 2016-07-26 | 2016-07-26 | Zero-citation article recommendation method and system based on random walk model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250438B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038211A (en) * | 2017-02-28 | 2017-08-11 | 大连理工大学 | A kind of paper impact factor appraisal procedure based on quantum migration |
CN108132961A (en) * | 2017-11-06 | 2018-06-08 | 浙江工业大学 | A kind of bibliography based on reference prediction recommends method |
CN108228728A (en) * | 2017-12-11 | 2018-06-29 | 北京航空航天大学 | A kind of paper network node of parametrization represents learning method |
CN108614867A (en) * | 2018-04-12 | 2018-10-02 | 科技部科技评估中心 | Frontline technology sex index computational methods based on scientific paper and system |
CN108764943A (en) * | 2018-05-30 | 2018-11-06 | 公安部第三研究所 | Suspicious user method for monitoring and analyzing based on funds transaction network |
CN109299379A (en) * | 2018-10-30 | 2019-02-01 | 东软集团股份有限公司 | Article recommended method, device, storage medium and electronic equipment |
CN109345416A (en) * | 2018-09-12 | 2019-02-15 | 连尚(新昌)网络科技有限公司 | It is a kind of for recording the method and apparatus of the adduction relationship between works |
CN109726297A (en) * | 2018-12-28 | 2019-05-07 | 沈阳航空航天大学 | A kind of two subnetwork node prediction algorithms based on mutual exclusion strategy |
CN110209840A (en) * | 2019-06-06 | 2019-09-06 | 北京百奥知信息科技有限公司 | A kind of paper impact factor appraisal procedure based on multidimensional characteristic |
CN110254438A (en) * | 2018-03-12 | 2019-09-20 | 松下知识产权经营株式会社 | Information processing unit and program recorded medium |
CN111198897A (en) * | 2018-11-19 | 2020-05-26 | 中国农业大学 | Scientific research hotspot topic analysis method and device and electronic equipment |
CN111723578A (en) * | 2020-06-09 | 2020-09-29 | 平安科技(深圳)有限公司 | Hot spot prediction method and device based on random walk model and computer equipment |
CN113392319A (en) * | 2021-05-13 | 2021-09-14 | 宁波大学 | Academic paper recommendation method based on network representation and auxiliary information embedding |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102298579A (en) * | 2010-06-22 | 2011-12-28 | 北京大学 | Scientific and technical literature-oriented model and method for sequencing papers, authors and periodicals |
CN102521337A (en) * | 2011-12-08 | 2012-06-27 | 华中科技大学 | Academic community system based on massive knowledge network |
CN103440329A (en) * | 2013-09-04 | 2013-12-11 | 北京邮电大学 | Authoritative author and high-quality paper recommending system and recommending method |
US20150111190A1 (en) * | 2013-10-22 | 2015-04-23 | Steven Michael VITTORIO | Educational content search and results |
CN104636426A (en) * | 2014-12-22 | 2015-05-20 | 河海大学 | Multi-factor comprehensive quantitative analysis and sorting method for academic influences of scientific research institutions |
CN105550216A (en) * | 2015-12-03 | 2016-05-04 | 百度在线网络技术(北京)有限公司 | Searching method and device of academic research information and excavating method and device of academic research information |
CN105740386A (en) * | 2016-01-27 | 2016-07-06 | 北京航空航天大学 | Thesis search method and device based on sorting integration |
Non-Patent Citations (2)
Title |
---|
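Yujing Wang et al.: "Ranking Scientific Articles by Exploiting Citations, Authors, Journals, and Time Information", Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence * |
Qin Zhen: "Research on Academic Social Network Modeling and Academic Resource Recommendation Methods", China Master's Theses Full-text Database, Information Science and Technology series * |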
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038211A (en) * | 2017-02-28 | 2017-08-11 | 大连理工大学 | A kind of paper impact factor appraisal procedure based on quantum migration |
CN108132961B (en) * | 2017-11-06 | 2020-06-30 | 浙江工业大学 | Reference recommendation method based on citation prediction |
CN108132961A (en) * | 2017-11-06 | 2018-06-08 | 浙江工业大学 | A kind of bibliography based on reference prediction recommends method |
CN108228728A (en) * | 2017-12-11 | 2018-06-29 | 北京航空航天大学 | A kind of paper network node of parametrization represents learning method |
CN108228728B (en) * | 2017-12-11 | 2020-07-17 | 北京航空航天大学 | Parameterized thesis network node representation learning method |
CN110254438A (en) * | 2018-03-12 | 2019-09-20 | 松下知识产权经营株式会社 | Information processing unit and program recorded medium |
CN108614867A (en) * | 2018-04-12 | 2018-10-02 | 科技部科技评估中心 | Frontline technology sex index computational methods based on scientific paper and system |
CN108614867B (en) * | 2018-04-12 | 2022-03-15 | 科技部科技评估中心 | Academic paper-based technology frontier index calculation method and system |
CN108764943A (en) * | 2018-05-30 | 2018-11-06 | 公安部第三研究所 | Suspicious user method for monitoring and analyzing based on funds transaction network |
CN108764943B (en) * | 2018-05-30 | 2021-09-24 | 公安部第三研究所 | Suspicious user monitoring and analyzing method based on fund transaction network |
CN109345416A (en) * | 2018-09-12 | 2019-02-15 | 连尚(新昌)网络科技有限公司 | It is a kind of for recording the method and apparatus of the adduction relationship between works |
CN109299379A (en) * | 2018-10-30 | 2019-02-01 | 东软集团股份有限公司 | Article recommended method, device, storage medium and electronic equipment |
CN111198897A (en) * | 2018-11-19 | 2020-05-26 | 中国农业大学 | Scientific research hotspot topic analysis method and device and electronic equipment |
CN111198897B (en) * | 2018-11-19 | 2023-06-13 | 中国农业大学 | Scientific research hotspot topic analysis method and device and electronic equipment |
CN109726297A (en) * | 2018-12-28 | 2019-05-07 | 沈阳航空航天大学 | A kind of two subnetwork node prediction algorithms based on mutual exclusion strategy |
CN109726297B (en) * | 2018-12-28 | 2022-12-23 | 沈阳航空航天大学 | Bipartite network node prediction algorithm based on mutual exclusion strategy |
CN110209840A (en) * | 2019-06-06 | 2019-09-06 | 北京百奥知信息科技有限公司 | A kind of paper impact factor appraisal procedure based on multidimensional characteristic |
CN111723578A (en) * | 2020-06-09 | 2020-09-29 | 平安科技(深圳)有限公司 | Hot spot prediction method and device based on random walk model and computer equipment |
CN111723578B (en) * | 2020-06-09 | 2023-11-17 | 平安科技(深圳)有限公司 | Hot spot prediction method and device based on random walk model and computer equipment |
CN113392319A (en) * | 2021-05-13 | 2021-09-14 | 宁波大学 | Academic paper recommendation method based on network representation and auxiliary information embedding |
Also Published As
Publication number | Publication date |
---|---|
CN106250438B (en) | 2020-07-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |