CN102262661B - Web page access forecasting method based on k-order hybrid Markov model - Google Patents


Info

Publication number
CN102262661B
Authority
CN
China
Prior art keywords
page
user
markov model
session
access
Legal status
Expired - Fee Related
Application number
CN 201110200145
Other languages
Chinese (zh)
Other versions
CN102262661A (en
Inventor
顾庆 (Gu Qing)
任颖新 (Ren Yingxin)
汤九斌 (Tang Jiubin)
陈道蓄 (Chen Daoxu)
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
2011-07-18
Filing date
2011-07-18
Publication date
2013-06-12
Application filed by Nanjing University (2011-07-18)
Priority to CN 201110200145
Publication of CN102262661A (2011-11-30)
Application granted
Publication of CN102262661B (2013-06-12)
Legal status: Expired - Fee Related


Abstract

The invention discloses a Web page access prediction method based on a k-order hybrid Markov model. The method comprises the following steps: first, collecting and processing Web server access log data, identifying clients and users, and eliminating meaningless access data; then identifying user sessions to build a Web log database; selecting log data from the database according to the prediction target and organizing it, session by session, into (k+1)-tuples used to train the k-order hybrid Markov model; learning and calibrating the parameter set of the k-order hybrid Markov model with the expectation-maximization algorithm; and finally identifying sessions from the target user's page access operations and applying the model to predict the Web page the user will access next. The method can be used to recommend the pages a user needs to access, reducing page access delay and improving user experience. In addition, from the Web server's point of view, it can help improve the organization of Web pages, guide the ranking of search engine results and improve the page caching mechanism, thereby enhancing quality of service.

Description

Web page access prediction method based on a k-order hybrid Markov model
Technical field
The present invention relates to personalized prediction of Web page access. It addresses the Internet-era situation in which the information held on Web servers grows ever larger and more complex while user access volume keeps increasing. The page a user is likely to access next must be determined from the user's access characteristics, helping the user find the needed information faster and better, while also helping the Web server cache the pages users need in advance and improve its page link structure, thereby improving server access efficiency.
Background art
With the rapid development of the Internet, the World Wide Web (WWW) has become a worldwide hypermedia platform for acquiring information. People depend more and more on the Web to obtain all kinds of information, and browsing the Web has become part of daily life. A complete Web system comprises the Web server, the client browser, the HTTP protocol that carries communication between client and server, and the Hypertext Markup Language (HTML) and Uniform Resource Locator (URL) used to describe Web pages. As Web applications keep growing in scale, Web servers host more and more pages and their organizational structure becomes increasingly complex, so users easily get lost in an ocean of information. Letting users find the information they need faster and better, and letting Web servers provide personalized service based on users' habits to improve quality of service, have become problems that current Web applications need to solve.
Summary of the invention
The technical problem addressed by the invention is to provide a method that predicts Web access from a user's historical access characteristics and, at the same time, from the user's current access situation, predicts or recommends the page (or set of pages) the user will access next.
To achieve this object, the invention adopts the following steps:
1) First, collect and clean the Web server access log data: for each access record in the log, identify the client browser and the user; eliminate meaningless access data, such as records generated on the Web server by automated programs like robots and crawlers, and records that are not page accesses; and extract from each record an access operation $o = \langle u, x, t \rangle$, where $u$ denotes the user, $x$ the Web page and $t$ the page access time;
2) Then identify user sessions $S$ and use them to build the Web log database, which stores the historical data used for Web page access prediction;
3) Select and organize log data from the database according to the prediction target, arranging it session by session into a set of (k+1)-tuples;
4) Build the k-order hybrid Markov model and train it with the expectation-maximization (EM) algorithm, i.e. learn and calibrate the parameter set of the k-order hybrid Markov model from the data set $\mathcal{D}$ of (k+1)-tuples;
5) From the target user's access operations on Web pages, identify the user's most recent session and apply the trained k-order hybrid Markov model to predict the Web page the user will access next.
In step 2) above, user sessions are identified as follows. Let a session be $S = \{o_1, o_2, \ldots, o_l\}$, where all operations $o_i$ in $S$ are performed by the same user. Given the time $t_i$ of the last access operation $o_i = \langle u, x_i, t_i \rangle$ and the page $x_i$ it accessed, whether the next operation $o_{i+1} = \langle u, x_{i+1}, t_{i+1} \rangle$ belongs to the same session $S$ is decided by the following three conditions:
Page $x_{i+1}$ is referenced by page $x_i$, i.e. the URL of $x_{i+1}$ is contained in page $x_i$;
Page $x_{i+1}$ is referenced by a page other than $x_i$ already accessed in session $S$, say $x_j$ ($j < i$, with corresponding access operation $o_j \in S$), and the difference between $t_j$ and $t_{i+1}$ is less than the session threshold (e.g. 30 minutes);
Page $x_{i+1}$ is not referenced by any page accessed in session $S$, but the difference between the previous operation time $t_i$ and $t_{i+1}$ is less than the page threshold (e.g. 5 minutes).
If any one of these conditions is satisfied, $o_{i+1}$ is judged to belong to session $S$; otherwise operation $o_{i+1}$ starts a new session.
In step 4) above, the k-order hybrid Markov model consists of k state-transition matrices $\{\Lambda_1, \Lambda_2, \ldots, \Lambda_k\}$ and a weight vector $A = \{\alpha_1, \alpha_2, \ldots, \alpha_k\}$. Let the total number of pages on the Web server be $n$. Each transition matrix $\Lambda_j$ is an $n \times n$ matrix whose element $\lambda_j(x, y)$ is the probability that, within the same session, page $y$ is accessed $j$ steps after page $x$ (for $j = 1$, immediately after it), i.e. the conditional probability $p(x_{k+1} \mid x_{k-j+1})$ with $x_{k-j+1} = x$ and $x_{k+1} = y$.
All $kn^2$ elements of the k state-transition matrices, together with the k weights of the weight vector $A$, form the parameter set that must be trained. In step 4), given the data set $\mathcal{D}$, these parameters are learned and calibrated with the EM algorithm. First the initial value of each parameter is computed; the elements of the weight vector $A$ are initialized to:
$$\alpha_j = \frac{1}{k} \quad (1 \le j \le k)$$
For state-transition matrix $\Lambda_j$, let the number of (k+1)-tuples in data set $\mathcal{D}$ be $m$, with tuple $X_i = \langle x_{i,1}, x_{i,2}, \ldots, x_{i,k}, x_{i,k+1} \rangle$. The initial value of element $\lambda_j(x, y)$ is:
[formula not reproduced in the source text]
where the function $\delta(\cdot)$ is defined as:
$$\delta(b) = \begin{cases} 1 & b \text{ is true} \\ 0 & b \text{ is false} \end{cases}$$
Applying the EM algorithm requires computing, from the current parameter values, the new parameter values for the next iteration, denoted $\alpha'_j$ and $\lambda'_j(x, y)$ respectively. To compute them, a set of hidden variables $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$ is introduced, where the hidden variable $\theta_i$ describes the random origin of tuple $X_i$ itself: $\theta_i(j)$ is the posterior probability, under the current parameters, that the last page $x_{i,k+1}$ was generated by the lag-$j$ component $\Lambda_j$. With $j$ denoting the number of steps between the two pages, $\theta_i(j)$ is computed as:
$$\theta_i(j) = \frac{\alpha_j \, \lambda_j(x_{i,k-j+1}, x_{i,k+1})}{\sum_{l=1}^{k} \alpha_l \, \lambda_l(x_{i,k-l+1}, x_{i,k+1})}$$
From $\theta_i(j)$, the new value of each parameter is estimated as follows:
$$\alpha'_j = \frac{1}{m} \sum_{i=1}^{m} \theta_i(j)$$
[formula for $\lambda'_j(x, y)$ not reproduced in the source text]
Substituting the previous-round parameter values and the new parameter values in turn gives the likelihood of the training set under each. Whether training of the k-order hybrid Markov model is complete is decided by checking whether the likelihood has converged.
In step 5) above, the trained k-order hybrid Markov model predicts the user's next Web page as follows. First identify the target user's current session and obtain the $k$ pages $\langle x_1, x_2, \ldots, x_k \rangle$ most recently accessed consecutively in it. Then, for every page $y$ on the Web server, compute the conditional probability $p(y \mid x_1, x_2, \ldots, x_k)$:
$$p(y \mid x_1, x_2, \ldots, x_k) = \sum_{j=1}^{k} \alpha_j \, \lambda_j(x_{k-j+1}, y)$$
Finally, select the $T$ pages with the highest probabilities (for example $T = 10$) as the pages to prefetch or recommend to the user.
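A toy numeric illustration of the prediction formula, with $k = 2$ and made-up parameter values that are not taken from the patent: $\alpha_1 = 0.6$, $\alpha_2 = 0.4$, $\lambda_1(x_2, y) = 0.5$ and $\lambda_2(x_1, y) = 0.2$. The $j = 1$ term looks at the latest page $x_2$ and the $j = 2$ term at the page before it, $x_1$:

$$p(y \mid x_1, x_2) = \alpha_1 \lambda_1(x_2, y) + \alpha_2 \lambda_2(x_1, y) = 0.6 \times 0.5 + 0.4 \times 0.2 = 0.38$$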
Existing approaches compile users' historical access records from the Web access log and identify user sessions; taking the session as the unit, they mine association rules or sequential patterns of Web page access and compute their support and confidence, then, from the sequence of pages the user has already visited, compute the (stationary) probability of other pages being accessed, and thereby predict or recommend the page the user needs next. Such methods either lack session identification and effective organization of the data; or use a low-order Markov model, which gives too low a prediction precision (e.g. Top-T accuracy); or use a high-order Markov model, whose high model complexity and low data coverage increase prediction time and complexity and may even reduce prediction precision.
The method is computationally simple and has low model complexity, giving high execution performance. It is also readily extensible and easy to combine with other Web mining methods such as user cluster analysis. Experimental data show that its execution performance matches that of a low-order Markov model while its prediction precision is comparable to that of a high-order Markov model. The method also includes log-data preparation techniques such as user session analysis, which effectively improve Web page access prediction.
Description of drawings
Fig. 1 shows the overall framework of the Web page access prediction method;
Fig. 2 shows the processing flow for cleaning the Web access log and identifying user sessions;
Fig. 3 shows the processing flow for selecting and organizing log data according to the prediction target;
Fig. 4 shows the flow for training the k-order hybrid Markov model with the EM algorithm;
Fig. 5 shows the flow for predicting Web pages with the k-order hybrid Markov model;
Fig. 6 compares the experimental results of the 2-order hybrid Markov model and the full 2-order Markov model.
Embodiment
Fig. 1 shows the general technical framework of the Web page access prediction method. Its inputs are the Web server's historical log records and the Web pages recently accessed by the target user; its output is the set of Web pages recommended or prefetched for the target user. The framework is divided into 5 modules: cleaning the Web page access log and identifying users; identifying user sessions and building the Web log database; selecting and organizing log data according to the prediction target; training the k-order hybrid Markov model; and finally predicting the next accessed page from the target user's operations.
A Web server and its applications continuously record user access log data while running; different Web servers record logs in slightly different ways and formats. In the standard case, every HTTP request from a user (or a request over another protocol such as FTP; the method mainly targets Web page requests) generates one log record. A log record usually contains the client IP address, the requested resource (page or URL), optional parameters, the HTTP status, the user agent (browser and operating system type and version), the referrer (page), the user cookie, and so on. Web log data record the access behaviour of different users, and analysing and using them sensibly can effectively improve a Web server's quality of service.
Table 1 shows an example of log records from a Web server (maya.cs.depaul.edu):
Table 1:
[table image not reproduced in the source text]
In Table 1, record 1 shows a user at IP address 1.2.3.4 accessing the page "/classes/cs589/papers.html" on the server (maya.cs.depaul.edu) with the HTTP method "GET"; the user's browser (Mozilla) and operating system (Windows NT) are recorded in the agent field, and the referrer field shows where the user came from to reach the current page, namely "http://dataminingresources.blogspot.com/". Record 2 shows the same user, after visiting "papers.html", accessing the resource "/classes/cs589/papers/cms-tai.pdf", with referrer "papers.html". Record 3 shows another user who reached a page on this server through a Google search, with referrer "http://www.google.com/search? ..." and requested resource "/classes/ds575/papers/hyperlink.pdf". Finally, records 4 to 6 show a third user accessing server resources in succession, all of which actually belong to the same page, "/classes/cs480/announce.html"; records 5 and 6 fetch 2 resource objects embedded in that page.
Fig. 2 shows the processing flow for cleaning the Web access log, identifying user sessions and storing them in the log database. When cleaning the log, the accessing user is first identified for each log record. User identification can be based on the client IP address and browser, which is a rough method; more accurate identification relies on the client cookie, which the server creates on the client to identify and track the user's requests. Invalid access records are then removed, including records that are not page accesses and records produced by automated programs such as crawlers.
In the cleaned Web access log, each record can be defined as an access operation $o = \langle u, x, t \rangle$, where $u$ denotes the user, $x$ the Web page and $t$ the page access time, so the log can be viewed as a list of access operations. This list is then split by user, maintaining one access-operation list per user.
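The cleaning and per-user grouping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: each raw record is taken to be a hypothetical `(user_id, url, timestamp, user_agent)` tuple whose user ID was already derived from a cookie or from IP address plus browser, and the suffix and robot-marker lists are illustrative choices, since the text only says that non-page records and crawler traffic are removed.

```python
from collections import defaultdict

# Illustrative filters (assumptions): the text only says to drop
# non-page records and records produced by robots and crawlers.
NON_PAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def to_operations(records):
    """Turn raw records into one time-ordered list of o = <u, x, t> per user.

    records: iterable of hypothetical (user_id, url, timestamp, user_agent)
    tuples. Returns {user_id: [(user_id, url, timestamp), ...]}.
    """
    per_user = defaultdict(list)
    for user, url, ts, agent in records:
        if url.lower().endswith(NON_PAGE_SUFFIXES):
            continue  # embedded resource, not a page view
        if any(marker in agent.lower() for marker in ROBOT_MARKERS):
            continue  # automated client such as a crawler
        per_user[user].append((user, url, ts))
    for ops in per_user.values():
        ops.sort(key=lambda op: op[2])  # chronological order per user
    return dict(per_user)
```

Real logs would first need parsing from the server's own format into these tuples, which this sketch leaves out.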
As shown in Fig. 2, user sessions must then be identified in each user's list. A session $S = \{o_1, o_2, \ldots, o_l\}$ is a group of related page operations that a user performs on the Web server close together in time. Page operations within a session depend on one another, which can be used to analyse the user's access behaviour and habits, whereas page operations in different sessions can be regarded as independent. Ignoring session identification introduces a large amount of noise into Web page prediction and makes the predictions inaccurate.
User sessions $S$ are identified from the time $t_i$ and page $x_i$ of each access operation $o_i = \langle u, x_i, t_i \rangle$. All operations $o_i$ in a session $S$ must be performed by the same user and are ordered chronologically. For the user's next operation $o_{i+1} = \langle u, x_{i+1}, t_{i+1} \rangle$, whether $o_{i+1}$ belongs to the same session $S$ is decided by the following three conditions:
Page $x_{i+1}$ is referenced by page $x_i$, i.e. the URL of $x_{i+1}$ is contained in page $x_i$;
Page $x_{i+1}$ is referenced by a page other than $x_i$ already accessed in session $S$, say $x_j$ ($j < i$, with corresponding access operation $o_j \in S$), and the difference between $t_j$ and $t_{i+1}$ is less than the session threshold (e.g. 30 minutes);
Page $x_{i+1}$ is not referenced by any page accessed in session $S$, but the difference between the previous operation time $t_i$ and $t_{i+1}$ is less than the page threshold (e.g. 5 minutes).
If any one of these conditions is satisfied, $o_{i+1}$ is judged to belong to session $S$; otherwise operation $o_{i+1}$ starts a new session. The session threshold and the page threshold can be adjusted to actual conditions. Once user sessions are identified, the Web log database is built. It is organized in two levels, by user and by session, and serves as the data basis for Web access prediction.
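The three session conditions can be sketched in Python as below. This assumes a hypothetical `links` dictionary mapping each page URL to the set of URLs it references (the text does not say how this link structure is obtained), timestamps in seconds, and the example thresholds of 30 and 5 minutes; it is an illustration of the rule, not the patent's code.

```python
def _links_to(page, links):
    """Pages referenced by `page`; `links` maps URL -> set of linked URLs."""
    return links.get(page, set())


def same_session(session, op, links, session_threshold, page_threshold):
    """Apply the three conditions to decide whether `op` joins `session`."""
    _, x_next, t_next = op
    _, x_last, t_last = session[-1]
    # Condition 1: x_{i+1} is referenced by the previous page x_i.
    if x_next in _links_to(x_last, links):
        return True
    # Condition 2: x_{i+1} is referenced by an earlier page x_j (j < i)
    # of the session and t_{i+1} - t_j is below the session threshold.
    for _, x_j, t_j in session[:-1]:
        if x_next in _links_to(x_j, links) and t_next - t_j < session_threshold:
            return True
    # Condition 3: no page of the session references x_{i+1}, but
    # t_{i+1} - t_i is below the page threshold.
    referenced = any(x_next in _links_to(x, links) for _, x, _ in session)
    return not referenced and t_next - t_last < page_threshold


def split_sessions(ops, links, session_threshold=30 * 60, page_threshold=5 * 60):
    """ops: one user's (user, page, t) operations sorted by t in seconds."""
    sessions, current = [], []
    for op in ops:
        if current and same_session(current, op, links,
                                    session_threshold, page_threshold):
            current.append(op)
        else:
            if current:
                sessions.append(current)
            current = [op]  # o_{i+1} opens a new session
    if current:
        sessions.append(current)
    return sessions
```

As stated in the text, condition 1 carries no time limit, so a request that follows a link on the previous page joins the open session however long the gap is.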
Fig. 3 shows the processing flow for selecting log data according to the prediction target and organizing it into a list of (k+1)-tuples. Log data are first selected from the database according to the prediction target. The target user's own historical access data can be retrieved, but if the amount of data is small the quality of Web page prediction may suffer. One improvement is to select all historical access data of similar users: a clustering method can group into one class users with similar attributes (such as age, sex, education and income level) or similar access patterns.
After the log data are obtained, a constant $k$ is chosen, meaning that the $k$ preceding page operations are considered when predicting the next page, i.e. the page the user accesses next depends on the $k$ pages accessed before it. Choosing $k$ requires a trade-off between the minimum size of user sessions and prediction precision. (k+1)-tuples $X = \langle x_1, x_2, \ldots, x_k, x_{k+1} \rangle$ are extracted session by session. The pages of each (k+1)-tuple belong to the same session and form a group of pages accessed consecutively by the user in that session. Two adjacent (k+1)-tuples from the same session may share pages; for example, the tuple following $X$ can be $X' = \langle x_2, x_3, \ldots, x_{k+1}, x_{k+2} \rangle$. All (k+1)-tuples form a set $\mathcal{D}$, which is used to train the k-order hybrid Markov model.
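Assuming each session has already been reduced to its ordered list of page identifiers, the overlapping (k+1)-tuple extraction can be sketched in Python as follows (the function name is illustrative):

```python
def make_tuples(sessions, k):
    """Slide a window of width k+1 (step 1) over each session's pages.

    sessions: list of page sequences, one per session. Tuples never cross
    a session boundary, and sessions shorter than k+1 pages contribute
    nothing, which is why k trades off against the minimum session size.
    """
    tuples = []
    for pages in sessions:
        for start in range(len(pages) - k):
            tuples.append(tuple(pages[start:start + k + 1]))
    return tuples
```

For a session a, b, c, d and k = 2 this yields (a, b, c) and (b, c, d), matching the overlap between $X$ and $X'$ described above.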
Fig. 4 shows the processing flow for training the k-order hybrid Markov model with the EM algorithm. The k-order hybrid Markov model consists of k state-transition matrices $\{\Lambda_1, \Lambda_2, \ldots, \Lambda_k\}$ and a weight vector $A = \{\alpha_1, \alpha_2, \ldots, \alpha_k\}$. Let the total number of pages on the Web server be $n$. Each transition matrix $\Lambda_j$ is an $n \times n$ matrix whose element $\lambda_j(x, y)$ is the probability that, within the same session, page $y$ is accessed $j$ steps after page $x$ (for $j = 1$, immediately after it), i.e. the conditional probability $p(x_{k+1} \mid x_{k-j+1})$ with $x_{k-j+1} = x$ and $x_{k+1} = y$.
All $kn^2$ elements of the k state-transition matrices, together with the k weights of the weight vector $A$, form the parameter set that must be trained. By comparison, a full k-order Markov model needs a separate next-page distribution for each of the $n^k$ possible k-page histories, i.e. on the order of $n^{k+1}$ parameters.
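A quick size comparison, assuming the page count of the MSWEB-DATA experiment described below ($n = 285$) and $k = 2$, and counting the full k-order model as $n^{k+1}$ transition entries (a standard count rather than the patent's own expression):

$$kn^2 + k = 2 \times 285^2 + 2 = 162\,452 \qquad \text{versus} \qquad n^{k+1} = 285^3 = 23\,149\,125$$

so in this setting the hybrid model stores roughly 140 times fewer values.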
Following Fig. 4, given the data set $\mathcal{D}$, these parameters are learned and calibrated with the expectation-maximization (EM) algorithm. First the initial value of each parameter is computed; the elements of the weight vector $A$ are initialized to:
$$\alpha_j = \frac{1}{k} \quad (1 \le j \le k) \tag{1}$$
For state-transition matrix $\Lambda_j$, let the number of (k+1)-tuples in data set $\mathcal{D}$ be $m$, with tuple $X_i = \langle x_{i,1}, x_{i,2}, \ldots, x_{i,k}, x_{i,k+1} \rangle$. The initial value of element $\lambda_j(x, y)$ is given by formula (2):
[formula (2) not reproduced in the source text]
where the function $\delta(\cdot)$ is defined as:
$$\delta(b) = \begin{cases} 1 & b \text{ is true} \\ 0 & b \text{ is false} \end{cases} \tag{3}$$
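The initialization can be sketched in Python with pages numbered 0..n-1. Formula (2) survives only as an image, so this sketch assumes, rather than reproduces, that $\lambda_j(x, y)$ starts as the empirical frequency counted with $\delta(\cdot)$: the share of tuples whose $(k-j+1)$-th page is $x$ that end in $y$. Leaving rows with no observations uniform is a further assumption.

```python
def init_params(tuples, n, k):
    """Initial weights (formula (1)) and ASSUMED empirical-frequency matrices.

    tuples: list of (k+1)-tuples of page indices in range(n).
    Returns (alpha, lam) with lam[j-1][x][y] standing for lambda_j(x, y).
    """
    alpha = [1.0 / k] * k  # formula (1): alpha_j = 1/k
    lam = []
    for j in range(1, k + 1):
        counts = [[0.0] * n for _ in range(n)]
        for t in tuples:
            # 0-indexed: x_{i,k-j+1} is t[k-j] and x_{i,k+1} is t[k]
            counts[t[k - j]][t[k]] += 1.0
        matrix = []
        for row in counts:
            total = sum(row)
            # assumption: contexts never seen at lag j get a uniform row
            matrix.append([c / total for c in row] if total else [1.0 / n] * n)
        lam.append(matrix)
    return alpha, lam
```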
The likelihood function $L$ is then computed. It represents the probability that the data set $\mathcal{D}$ occurs (i.e. is obtained as the sample) under the current parameter set, and is computed by formula (4):
[formula (4) not reproduced in the source text]
where $p(X_i \mid \Lambda_1, \Lambda_2, \ldots, \Lambda_k, A)$ is the probability that the (k+1)-tuple $X_i$ is obtained as a sample value under the current parameter set, computed as:
$$p(X_i \mid \Lambda_1, \Lambda_2, \ldots, \Lambda_k, A) = \sum_{j=1}^{k} \alpha_j \, \lambda_j(x_{i,k-j+1}, x_{i,k+1}) \tag{5}$$
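Formula (5) translates directly into code. Formula (4) is not legible, so the sketch below assumes the tuples are treated as independent samples, making the data-set likelihood the product of formula (5) over all $m$ tuples; it returns the logarithm to avoid floating-point underflow. It also assumes every tuple has non-zero probability, which holds when the matrices were initialized from the same tuples.

```python
import math


def tuple_prob(t, alpha, lam):
    """Formula (5): sum_j alpha_j * lambda_j(x_{i,k-j+1}, x_{i,k+1})."""
    k = len(alpha)
    return sum(alpha[j - 1] * lam[j - 1][t[k - j]][t[k]] for j in range(1, k + 1))


def log_likelihood(tuples, alpha, lam):
    """ASSUMED form of formula (4): log of the product of formula (5)."""
    return sum(math.log(tuple_prob(t, alpha, lam)) for t in tuples)
```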
Next, the new parameter values for the next iteration are computed from the current parameter values and denoted $\alpha'_j$ and $\lambda'_j(x, y)$ respectively. To compute them, a set of hidden variables $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$ is introduced, where the hidden variable $\theta_i$ describes the random origin of tuple $X_i$ itself: $\theta_i(j)$ is the posterior probability, under the current parameters, that the last page $x_{i,k+1}$ was generated by the lag-$j$ component $\Lambda_j$. With $j$ denoting the number of steps between the two pages, $\theta_i(j)$ is computed as:
$$\theta_i(j) = \frac{\alpha_j \, \lambda_j(x_{i,k-j+1}, x_{i,k+1})}{\sum_{l=1}^{k} \alpha_l \, \lambda_l(x_{i,k-l+1}, x_{i,k+1})} \tag{6}$$
From $\theta_i(j)$, the new value of each parameter is estimated as follows:
$$\alpha'_j = \frac{1}{m} \sum_{i=1}^{m} \theta_i(j) \tag{7}$$
[formula (8) for $\lambda'_j(x, y)$ not reproduced in the source text]
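One EM iteration, combining formula (6) as the E-step with formula (7) for the weights, might look like this in Python. The update for $\lambda'_j(x, y)$ is not legible in the source; the sketch assumes the standard weighted-count update for this kind of mixture, in which each tuple's lag-$j$ transition is counted with weight $\theta_i(j)$ and each row is renormalized. Keeping the previous row when a context receives no weight is another assumption.

```python
def em_step(tuples, alpha, lam):
    """One EM iteration; lam[j-1][x][y] stands for lambda_j(x, y).

    Assumes every tuple has non-zero probability under (alpha, lam).
    """
    k, n, m = len(alpha), len(lam[0]), len(tuples)
    new_alpha = [0.0] * k
    weighted = [[[0.0] * n for _ in range(n)] for _ in range(k)]
    for t in tuples:
        terms = [alpha[j - 1] * lam[j - 1][t[k - j]][t[k]] for j in range(1, k + 1)]
        total = sum(terms)
        for j in range(1, k + 1):
            theta = terms[j - 1] / total              # theta_i(j), formula (6)
            new_alpha[j - 1] += theta / m             # accumulates formula (7)
            weighted[j - 1][t[k - j]][t[k]] += theta  # ASSUMED lambda' counts
    new_lam = []
    for j in range(k):
        matrix = []
        for x in range(n):
            row_total = sum(weighted[j][x])
            matrix.append([w / row_total for w in weighted[j][x]] if row_total
                          else list(lam[j][x]))
        new_lam.append(matrix)
    return new_alpha, new_lam
```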
With the new parameter values, formulas (4) and (5) are applied again to compute the likelihood under the new parameter set.
The relative difference $\Delta$ between the new likelihood and the old likelihood is then computed by formula (9):
[formula (9) not reproduced in the source text]
Finally, $\Delta$ is compared with a given value such as $10^{-6}$. If it is smaller, the iteration has stabilized (converged) and training of the k-order hybrid Markov model ends; otherwise the next iteration begins, computing a new parameter set from the current one and repeating the above calculations.
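The stopping rule can be written as a generic loop. Formula (9) is not legible, so this sketch assumes the relative difference is $|L_{new} - L_{old}| / |L_{old}|$ computed on the log-likelihood, uses the example threshold $10^{-6}$, and adds a `max_iter` safeguard that the text does not mention. `step` and `loglik` are hypothetical callables that would wrap one EM iteration (formulas (6) to (8)) and the likelihood (formulas (4) and (5)).

```python
def train_until_converged(params, step, loglik, tol=1e-6, max_iter=1000):
    """Iterate `step` until the relative change in `loglik` drops below tol.

    Returns (final_params, number_of_iterations_run).
    """
    old = loglik(params)
    for it in range(1, max_iter + 1):
        params = step(params)
        new = loglik(params)
        # ASSUMED formula (9): relative difference of new vs. old likelihood
        delta = abs(new - old) / abs(old) if old != 0 else abs(new - old)
        if delta < tol:
            return params, it
        old = new
    return params, max_iter
```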
Fig. 5 shows the flow for predicting Web pages with the k-order hybrid Markov model. First the target user's most recent session is identified and the $k$ pages $\langle x_1, x_2, \ldots, x_k \rangle$ most recently accessed consecutively in it are obtained. Then, for every page $y$ on the Web server, the conditional probability $p(y \mid x_1, x_2, \ldots, x_k)$ is computed:
$$p(y \mid x_1, x_2, \ldots, x_k) = \sum_{j=1}^{k} \alpha_j \, \lambda_j(x_{k-j+1}, y) \tag{10}$$
Finally, the $T$ pages with the highest probabilities (Top-T, for example $T = 10$) are selected as the pages to prefetch or recommend to the user.
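Formula (10) and the Top-T selection can be sketched in Python as follows, assuming pages are numbered 0..n-1 and the trained matrices are stored as nested lists with `lam[j-1][x][y]` holding $\lambda_j(x, y)$; `heapq.nlargest` simply picks the $T$ highest-scoring pages. The function name is illustrative.

```python
import heapq


def predict_top_t(history, alpha, lam, top_t=10):
    """Score every page with formula (10) and return the top_t of them.

    history: the k most recent pages <x_1, ..., x_k> of the current
    session as page indices, x_k being the latest.
    Returns (probability, page) pairs, highest probability first.
    """
    k = len(alpha)
    if len(history) != k:
        raise ValueError("history must contain exactly k pages")
    n = len(lam[0])
    scores = []
    for y in range(n):
        # x_{k-j+1} is history[k-j] with 0-based indexing
        p = sum(alpha[j - 1] * lam[j - 1][history[k - j]][y] for j in range(1, k + 1))
        scores.append((p, y))
    return heapq.nlargest(top_t, scores)
```

Scoring all pages costs $O(kn)$ operations per prediction.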
The method applies the k-order hybrid Markov model on the basis of identified user sessions to predict, from the target user's $k$ preceding page accesses, the page the user will access next. It strikes an effective balance between prediction precision and model complexity: on the one hand it makes full use of the Web server's historical access data and completes prediction at low time and space cost; on the other hand its prediction precision is roughly equal to that of a high-order Markov model, and it is readily extensible.
Fig. 6 compares the experimental results of the 2-order ($k = 2$) hybrid Markov model and the full 2-order Markov model. The experiment uses the MSWEB-DATA log data, which contain 285 pages and from which 37211 user sessions can be identified. Two performance indicators are compared. The first is Top-T prediction precision, defined as the proportion of cases in which the page the user actually accesses next is among the top $T$ pages recommended by the method; the second is algorithm execution time, mainly the prediction time of module 5. To make the experiment robust, 80% of the user sessions are randomly selected for model training and the remaining 20% are used to test prediction precision and measure execution time. The algorithms are implemented in Java and run on a machine with a 1.6 GHz clock and 2 GB of memory. Because session selection is random, the experiment is repeated 10 times and the mean of each indicator is reported.
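The Top-T precision used in the experiment can be expressed as a small Python helper. `predict` is a hypothetical callable returning the ranked recommendations for a history, and the random 80/20 session split is not shown.

```python
def top_t_precision(test_cases, predict, top_t=10):
    """Share of cases whose actually visited next page is in the top_t list.

    test_cases: iterable of (history, actual_next_page) pairs.
    predict: callable(history, top_t) -> list of recommended pages.
    """
    cases = list(test_cases)
    if not cases:
        return 0.0
    hits = sum(1 for history, actual in cases if actual in predict(history, top_t))
    return hits / len(cases)
```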
As Fig. 6 shows, in Top-T prediction precision the method achieves better results than the full 2-order Markov model, reaching a prediction precision above 80% on the experimental data set when $T = 10$. In algorithm execution time the method is far better than the full 2-order Markov model, needing less than 1% of the latter's prediction time.
Addressing the Web page access prediction problem, the invention effectively predicts or recommends the page a target user will access next from the user's historical access characteristics and patterns, helping users quickly locate the information they want. It also helps Web servers organize their page structure effectively and provide page caching, improving quality of service. The specific embodiment above shows that the method achieves more accurate Web page prediction with lower model complexity.

Claims (4)

1. A Web page access prediction method based on a k-order hybrid Markov model, characterized by comprising the following steps:
1) first, collecting and cleaning the Web server access log data: for each access record in the log, identifying the client browser and the user; eliminating meaningless access data; and extracting from each record an access operation $o = \langle u, x, t \rangle$, where $u$ denotes the user, $x$ the Web page and $t$ the page access time;
2) identifying user sessions $S$ and using them to build the Web log database, which stores the historical data used for Web page access prediction;
3) selecting and organizing log data from the database according to the prediction target and arranging it session by session into a set of (k+1)-tuples, the flow being: first selecting users and obtaining session data according to the prediction target; then extracting (k+1)-tuples $X = \langle x_1, x_2, \ldots, x_k, x_{k+1} \rangle$ session by session, each (k+1)-tuple belonging to one session and comprising a group of pages accessed consecutively by the user in that session, the accessed pages of two adjacent (k+1)-tuples being allowed to overlap; and finally forming all (k+1)-tuples into a data set $\mathcal{D}$;
4) building the k-order hybrid Markov model, training it with the EM algorithm, and learning and calibrating its parameter set from the data set $\mathcal{D}$, the flow being: the k-order hybrid Markov model consists of k state-transition matrices $\{\Lambda_1, \Lambda_2, \ldots, \Lambda_k\}$ and a weight vector $A = \{\alpha_1, \alpha_2, \ldots, \alpha_k\}$; with $n$ the total number of pages on the Web server, each transition matrix $\Lambda_j$ is an $n \times n$ matrix whose element $\lambda_j(x, y)$ is the probability that, within the same session, page $y$ is accessed $j$ steps after page $x$, i.e. the conditional probability $p(x_{k+1} \mid x_{k-j+1})$ with $x_{k-j+1} = x$ and $x_{k+1} = y$; all $kn^2$ elements of the k state-transition matrices, together with the k weights of the weight vector $A$, form the parameter set to be trained;
first, given the data set $\mathcal{D}$, computing the initial value of each parameter, the elements of the weight vector $A$ being initialized to:
$$\alpha_j = \frac{1}{k}, \quad (1 \le j \le k)$$
for state-transition matrix $\Lambda_j$, letting the number of (k+1)-tuples in data set $\mathcal{D}$ be $m$, with tuple $X_i = \langle x_{i,1}, x_{i,2}, \ldots, x_{i,k}, x_{i,k+1} \rangle$, the initial value of element $\lambda_j(x, y)$ being:
[formula not reproduced in the source text]
where the function $\delta(\cdot)$ is defined as:
$$\delta(b) = \begin{cases} 1 & b \text{ is true} \\ 0 & b \text{ is false} \end{cases}$$
then computing from the current parameter values the new parameter values needed for the next iteration, denoted $\alpha'_j$ and $\lambda'_j(x, y)$ respectively; to compute the new parameter values, introducing a set of hidden variables $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$, where the hidden variable $\theta_i$ represents the random characteristic of tuple $X_i$ itself; with $j$ denoting the number of steps between the two accessed pages, $\theta_i(j)$ being computed as:
$$\theta_i(j) = \frac{\alpha_j \, \lambda_j(x_{i,k-j+1}, x_{i,k+1})}{\sum_{l=1}^{k} \alpha_l \, \lambda_l(x_{i,k-l+1}, x_{i,k+1})}$$
from $\theta_i(j)$, estimating the new value of each parameter as follows:
$$\alpha'_j = \frac{1}{m} \sum_{i=1}^{m} \theta_i(j)$$
[formula for $\lambda'_j(x, y)$ not reproduced in the source text]
finally substituting the previous-round parameter values and the new parameter values in turn, computing the likelihood of the training set, and determining whether training of the k-order hybrid Markov model is complete by judging whether the likelihood has converged;
5) from the target user's access operations on Web pages, identifying the user's most recent session and applying the trained k-order hybrid Markov model to predict the Web page the user will access next.
2. The Web page access prediction method based on a k-order hybrid Markov model according to claim 1, characterized in that eliminating the meaningless access data in step 1) comprises removing records that are not page accesses and records produced by automated programs such as crawlers.
3. The Web page access prediction method based on a k-order hybrid Markov model according to claim 1 or 2, characterized in that the process of identifying user sessions in step 2) is: let a session be $S = \{o_1, o_2, \ldots, o_l\}$, all operations $o_i$ being performed by the same user; then, given the time $t_i$ of the last access operation $o_i = \langle u, x_i, t_i \rangle$ and the page $x_i$ it accessed, judging whether the next operation $o_{i+1} = \langle u, x_{i+1}, t_{i+1} \rangle$ belongs to the same session $S$, this judgment being based on the following three conditions:
page $x_{i+1}$ is referenced by page $x_i$, i.e. the URL of $x_{i+1}$ is contained in page $x_i$;
page $x_{i+1}$ is referenced by a page $x_j$ other than $x_i$ already accessed in session $S$, and the difference between $t_j$ and $t_{i+1}$ is less than the set session threshold;
page $x_{i+1}$ is not referenced by any page accessed in session $S$, but the difference between the previous operation time $t_i$ and $t_{i+1}$ is less than the set page threshold;
if any one of these conditions is satisfied, $o_{i+1}$ is judged to belong to session $S$; otherwise operation $o_{i+1}$ starts a new session.
4. The Web page access prediction method based on a k-order hybrid Markov model according to claim 1 or 2, characterized in that, in step 5), the process of predicting with the trained k-order hybrid Markov model the Web page the user will access next is: first identifying the target user's current session; then obtaining the $k$ pages $\langle x_1, x_2, \ldots, x_k \rangle$ most recently accessed consecutively by the user in the current session; next, for every page $y$ on the Web server, computing the conditional probability $p(y \mid x_1, x_2, \ldots, x_k)$:
$$p(y \mid x_1, x_2, \ldots, x_k) = \sum_{j=1}^{k} \alpha_j \, \lambda_j(x_{k-j+1}, y)$$
finally selecting the $T$ pages with the highest probabilities as the pages to prefetch or recommend to the user.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110200145 CN102262661B (en) 2011-07-18 2011-07-18 Web page access forecasting method based on k-order hybrid Markov model

Publications (2)

Publication Number Publication Date
CN102262661A (en) 2011-11-30
CN102262661B (en) 2013-06-12

Family

ID=45009290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110200145 Expired - Fee Related CN102262661B (en) 2011-07-18 2011-07-18 Web page access forecasting method based on k-order hybrid Markov model

Country Status (1)

Country Link
CN (1) CN102262661B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679496B (en) * 2012-09-19 2021-10-08 盛趣信息技术(上海)有限公司 State recommendation method and system
CN104182801B (en) * 2013-05-22 2017-06-23 阿里巴巴集团控股有限公司 Method and apparatus for predicting website traffic
CN104516897B (en) * 2013-09-29 2018-03-02 国际商业机器公司 Method and apparatus for ranking applications
CN103646008B (en) * 2013-12-13 2016-06-08 东南大学 Web service composition method
CN104778036B (en) * 2015-01-16 2017-12-29 中国船舶重工集团公司第七0九研究所 Method and system for generating candidate browsing interfaces for a user
CN105930400B (en) * 2016-04-15 2019-10-11 南京大学 Session search method based on a Markov decision process model
CN106650800B (en) * 2016-12-08 2020-06-30 南京航空航天大学 Distributed learning method for Markov equivalence class models based on Storm
CN108345481B (en) * 2017-01-22 2023-04-18 腾讯科技(深圳)有限公司 Page display method and device, client and server
CN109218741B (en) * 2017-07-04 2021-10-22 阿里巴巴集团控股有限公司 Live broadcast control method and device
CN107729544B (en) * 2017-11-01 2021-06-22 阿里巴巴(中国)有限公司 Method and device for recommending applications
CN107895039B (en) * 2017-11-29 2020-11-24 华中科技大学 Method for constructing log database of campus network authentication system
CN108509640A (en) * 2018-04-11 2018-09-07 焦点科技股份有限公司 Page layout optimization method based on sequence prediction
CN108763453B (en) * 2018-05-28 2020-06-16 浙江口碑网络技术有限公司 Page data processing method and device based on behavior prediction
CN113779450A (en) * 2020-08-31 2021-12-10 北京沃东天骏信息技术有限公司 Page access method and page access device
CN112035744B (en) * 2020-08-31 2023-07-25 建信金融科技有限责任公司 Page recommendation method, device, equipment and storage medium
CN112733060B (en) * 2021-01-13 2023-12-01 中南大学 Cache replacement method and device based on session cluster prediction and computer equipment
CN113132350A (en) * 2021-03-12 2021-07-16 嘉兴职业技术学院 Anti-crawler strategy generation method based on Markov decision process
CN116167829B (en) * 2023-04-26 2023-08-29 湖南惟客科技集团有限公司 Multidimensional and multi-granularity user behavior analysis method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN101471071A (en) * 2007-12-26 2009-07-01 中国科学院自动化研究所 Speech synthesis system based on mixed hidden Markov model

Also Published As

Publication number Publication date
CN102262661A (en) 2011-11-30

Similar Documents

Publication Publication Date Title
CN102262661B (en) Web page access forecasting method based on k-order hybrid Markov model
CN103177090B (en) Topic detection method and device based on big data
White et al. Predicting short-term interests using activity-based search context
US8990208B2 (en) Information management and networking
US8615514B1 (en) Evaluating website properties by partitioning user feedback
US8255390B2 (en) Session based click features for recency ranking
WO2015192667A1 (en) Advertisement recommending method and advertisement recommending server
US20130246383A1 (en) Cursor Activity Evaluation For Search Result Enhancement
CN101261634B (en) Learning method and system based on incremental Q-learning
CN105740444A (en) User score-based project recommendation method
CN101180624A (en) Link-based spam detection
CN104008203A (en) User interest discovery method incorporating ontology context
WO2018232331A1 (en) Systems and methods for optimizing and simulating webpage ranking and traffic
Drechsler et al. Rapid viability analysis for metapopulations in dynamic habitat networks
CN102222098A (en) Method and system for pre-fetching webpage
JP4894580B2 (en) Seasonality analysis system, seasonality analysis method, and seasonality analysis program
WO2012021653A2 (en) Search engine optimization at scale
RU2733481C2 (en) Method and system for generating a feature for ranking documents
CN112487283A (en) Method and device for training model, electronic equipment and readable storage medium
JP6630874B2 (en) Search needs evaluation device, evaluation system, evaluation method, and evaluation module production method
US10146876B2 (en) Predicting real-time change in organic search ranking of a website
Forsati et al. An efficient algorithm for web recommendation systems
KR100975510B1 (en) Method and System for Updating Web Page Index
Wang et al. Evaluating similarity measures for dataset search
Dai et al. An efficient web usage mining approach using chaos optimization and particle swarm optimization algorithm based on optimal feedback model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2013-06-12

Termination date: 2019-07-18
