CN102222098A - Method and system for pre-fetching webpage - Google Patents

Publication number
CN102222098A
Authority
CN
China
Prior art keywords
user
webpage
cluster
access pattern
ant group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011101654593A
Other languages
Chinese (zh)
Inventor
彭海朋
万淼
沈红斌
李丽香
王枞
杨义先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN2011101654593A
Publication of CN102222098A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and a system for pre-fetching a webpage based on chaotic ant colony optimization clustering in order to achieve the purpose of improving quality of service of a website. The method comprises the following steps of: pre-processing a weblog to acquire a trusted weblog; establishing a user access mode matrix for expressing whether a user accesses a feature webpage according to an access interest of the user and the trusted weblog; performing optimization clustering on the user access mode matrix by adopting a clustering algorithm based on the chaotic ant colony optimization, marking a category to which the user belongs according to a preset category number label, and establishing a user public archive; and extracting a webpage of which the pre-fetching probability exceeds a preset pre-fetching probability threshold and storing the webpage in a cache. Compared with the conventional pre-fetching technology, the invention has the advantage that the accuracy is greatly improved.

Description

A webpage prefetching method and system
Technical field
The present invention relates to webpage prefetching technology, and in particular to a webpage prefetching method and system based on chaotic ant colony optimization clustering.
Background
With the rapid development and wide adoption of the Internet, the tension between the fast growth of information and the limits of human attention keeps increasing, and network users care more and more about how to find the most suitable information in the shortest time. Website operators likewise increasingly want to understand how visitors behave on their sites and to mine customer activity information from the vast data of a huge customer base, so that users can obtain personalized service.
To increase a website's influence and provide better service to users, the site structure should be improved according to users' browsing patterns so as to raise Web service quality, and personalized recommendation should ultimately be realized.
Summary of the invention
The technical problem to be solved by the present invention is to provide a webpage prefetching technique that improves website service quality.
To solve the above technical problem, the present invention first provides a webpage prefetching method, comprising the following steps:
Pre-processing a weblog to obtain a trusted weblog;
Establishing, according to the users' access interests and the trusted weblog, a user access pattern matrix that expresses whether each user has visited each feature webpage;
Performing optimization clustering on the user access pattern matrix with a clustering algorithm based on chaotic ant colony optimization, marking the category to which each user belongs with a preset category label, and establishing a user public archive;
According to the user public archive, extracting the pages whose prefetch probability exceeds a preset prefetch probability threshold and saving them in a cache.
The step of pre-processing the weblog comprises:
Performing data cleansing, user identification and session identification on the weblog.
The step of performing data cleansing on the weblog comprises:
Filtering out the pictures in webpages, and filtering out dynamic webpages and webpages whose click rate is below a preset click threshold.
The step of performing optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization, marking the category to which each user belongs with the category label, and establishing the user public archive comprises:
Performing optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization to obtain the positions of the cluster centres;
Marking the category to which each user belongs with the category label according to the distance between the user and each cluster centre, and establishing the user public archive according to the categories to which the users belong.
The present invention also provides a webpage prefetching system based on chaotic ant colony optimization clustering, comprising:
A pre-processing module, configured to pre-process a weblog to obtain a trusted weblog;
A first establishing module, configured to establish, according to the users' access interests and the trusted weblog, a user access pattern matrix that expresses whether each user has visited each feature webpage;
A second establishing module, configured to perform optimization clustering on the user access pattern matrix with a clustering algorithm based on chaotic ant colony optimization, mark the category to which each user belongs with a preset category label, and establish a user public archive;
A prefetching module, configured to extract, according to the user public archive, the pages whose prefetch probability exceeds a preset prefetch probability threshold and save them in a cache.
The pre-processing module is configured to perform data cleansing, user identification and session identification on the weblog to obtain the trusted weblog.
The pre-processing module is configured to filter out the pictures in webpages, and to filter out dynamic webpages and webpages whose click rate is below a preset click threshold.
The second establishing module comprises:
A clustering unit, configured to perform optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization to obtain the positions of the cluster centres;
An establishing unit, configured to mark the category to which each user belongs with the category label according to the distance between the user and each cluster centre, and to establish the user public archive according to the categories to which the users belong.
Compared with the prior art, the present invention has the following advantages:
Weblogs are massive, high-dimensional and varied in data scale. Addressing these characteristics, the webpage prefetching technique based on chaotic ant colony optimization clustering proposed by the present invention converges well, is applicable to data sets whose classes have multiple sizes and densities, and is applicable to high-dimensional data. Compared with existing prefetching techniques, the group webpage prefetching scheme proposed by the present invention significantly improves accuracy.
The technical solution of the present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, multiprocessor systems, network PCs, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the webpage prefetching method based on chaotic ant colony optimization clustering according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the composition of the webpage prefetching system based on chaotic ant colony optimization clustering according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the drawings and examples.
The present invention provides a webpage prefetching method and system based on chaotic ant colony optimization, with the aim of reducing response time and improving website service quality.
Web log data have their own characteristics, such as large volume, fast updates and complex structure. Much research has applied traditional clustering methods to Web users, but no particularly efficient method exists, and the results obtained are hard to use as a basis for personalized user recommendation. The chaotic ant swarm (CAS) optimization algorithm is a population-based optimization technique. It is simple, converges quickly, needs little prior knowledge, does not require gradient information about the objective during optimization, and is therefore quite general. The clustering algorithm based on chaotic ant colony optimization (CAS-C) performs well in experiments on large-scale high-dimensional data: its clustering results are stable, it is insensitive to the initial centre values, it can be used on data sets whose classes differ in size, and it can find the globally optimal solution. These properties meet the particular needs of Web user clustering.
The iterative equation of the CAS-C algorithm model can be described as follows:
Formula (1)
Wherein,
(1) $t$ denotes the current iteration step, and $t-1$ denotes the previous iteration step;
(2) $r_i$ is the organization factor of the i-th ant, $y_i(t)$ is the organization variable of the i-th ant at iteration step $t$, and $y_i(0) = 0.999$;
(3) $z_{id}(t)$ denotes the current state of the d-th dimension variable of the i-th ant, where $d = 1, 2, \ldots, D$ and $D$ is the dimension of the search space;
(4) $z_{kid}(t)$ denotes the current position, in the d-th dimension, of the i-th ant searching for the centre of the k-th class;
(5) $zbest_{kgd}(t-1)$ denotes the best position in the d-th dimension found within the previous $t-1$ steps by the i-th ant searching for the centre of the k-th class and by its neighbouring ants, that is, the global best position of all ants;
(6) the symbol shown in Figure BSA00000520644500041 and $V_{id}$ ($0 < V_{id} < 1$) determine the search range of the i-th ant in the d-th dimension;
(7) $a$ is a sufficiently large positive constant, which can be taken as $a = 2000$;
(8) $b$ is a constant, $0 \le b \le 2/3$;
(9) $\exp$ denotes the exponential function with base $e$, where $e \approx 2.71828$.
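The image for Formula (1) is not reproduced in this text. For orientation only, the update rule as it is commonly stated in the published chaotic ant swarm literature has the form sketched below. Two things are assumptions here: the symbol $\psi_d$ for the quantity in item (6), and carrying the class index $k$ through the position variable. The sketch should not be read as the exact Formula (1) of this application.

```latex
\begin{aligned}
y_i(t) &= y_i(t-1)^{\,1 + r_i},\\
z_{kid}(t) &= \Bigl(z_{kid}(t-1) + \tfrac{7.5}{\psi_d}\,V_{id}\Bigr)
  \exp\!\Bigl(\bigl(1 - e^{-a\,y_i(t)}\bigr)
  \bigl(3 - \psi_d\bigl(z_{kid}(t-1) + \tfrac{7.5}{\psi_d}\,V_{id}\bigr)\bigr)\Bigr)\\
&\quad - \tfrac{7.5}{\psi_d}\,V_{id}
  + e^{-2a\,y_i(t) + b}\,\bigl(zbest_{kgd}(t-1) - z_{kid}(t-1)\bigr)
\end{aligned}
```

Read this way, $y_i(t)$ shrinks toward zero as $t$ grows (because $y_i(0) = 0.999 < 1$), so the chaotic first term fades while the last term increasingly pulls each ant toward the best position found so far.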
The user access pattern matrix serves as the input matrix of the CAS-C algorithm. Given the number of clusters K, the CAS-C algorithm executes the following steps:
1. Initialization. Before the CAS-C algorithm starts iterating, its parameters must be set in advance and given initial values. Let t = 1, and for each cluster centre randomly generate the positions of n ants in the search space.
2. Iteration begins. Let t = t + 1. Each ant moves according to the iterative equation, Formula (1). The best position found by each ant and its neighbours in the previous t steps is computed, and the cost of the objective function is calculated from it.
3. Compare the objective function cost of the current step with the cost from the previous iteration. If the current value is smaller than the previous one and the maximum number of iteration steps has not been reached, advance: update the current ant positions and update the objective function cost. Euclidean distance is used in the calculation to measure the distance between ants in the data space.
4. When the number of executed steps reaches the preset maximum number of iteration steps Istep, the algorithm stops and goes to step 5; otherwise it returns to step 2.
5. Mark the cluster centres. After the iteration stops, the algorithm has converged to a few points in the space, that is, all ants have moved to several fixed positions in the data space, and these points are the cluster centres finally obtained by the clustering algorithm.
6. Partition the data to obtain the clustering result. According to the cluster centres obtained, each data item in the data set is assigned to the corresponding class by the minimum-distance principle, which gives the final clustering result.
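Steps 3 and 6 both rest on Euclidean distance. The following is a minimal Python sketch of the distance, of a cost function, and of the minimum-distance assignment in step 6. The text does not spell out the objective function, so the cost used here (sum of distances from each row to its nearest centre) is an assumption, and all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clustering_cost(rows, centres):
    """Assumed objective: sum of distances from each row to its nearest centre."""
    return sum(min(euclidean(row, c) for c in centres) for row in rows)

def assign_to_nearest(rows, centres):
    """Step 6: label each row with the index of its closest cluster centre."""
    return [
        min(range(len(centres)), key=lambda k: euclidean(row, centres[k]))
        for row in rows
    ]

# Four users over four feature pages, and two candidate centres.
rows = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
centres = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 1.0]]
labels = assign_to_nearest(rows, centres)
cost = clustering_cost(rows, centres)
```

In the full algorithm the centres would come from the converged ant positions of step 5 rather than being written by hand.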
Embodiment one: a webpage prefetching method based on chaotic ant colony optimization clustering
As shown in Fig. 1, this embodiment mainly comprises the following steps:
Step S110: pre-process the weblog to obtain a trusted weblog. The pre-processing mainly comprises data cleansing, user identification and session identification. The data cleansing comprises filtering out the pictures in webpages, and filtering out dynamic webpages and webpages whose click rate is below a preset click threshold.
In this embodiment, the preset click-rate threshold for webpages is 2. A webpage whose click rate is below this threshold generally reflects a transient user action and cannot represent the user's attention or browsing interest.
Step S120: according to the users' access interests and the trusted weblog, establish a user access pattern matrix that expresses whether each user has visited each feature webpage.
Step S130: perform optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization, mark the category to which each user belongs with a preset category label, and establish a user public archive.
This comprises: performing optimization clustering on the user access pattern matrix with the CAS-C algorithm to obtain the positions of the cluster centres; marking the category to which each user belongs with the preset category label according to the distance between the user and each cluster centre; and establishing the user public archive according to the categories to which the users belong.
After clustering, each user is given the label of the category into which the user falls. For example, if 100 users are divided into 6 categories by the clustering algorithm, there are 6 corresponding category labels, and each user has its own category label.
Step S140: according to the user public archive and the preset prefetch probability threshold, extract in advance the pages whose prefetch probability exceeds the threshold and save them in the server's cache, as cached pages for subsequent user visits. When a user subsequently visits, this reduces the user's access time, improves the system's response speed and improves service quality.
For each user category, let $P = \{p_1, p_2, \ldots, p_m\}$ be the set of webpages obtained on the server side. The webpage prefetching rule is defined as follows:
$$\{p_1, p_2, \ldots, p_x\} \xrightarrow{c} \{q_1, q_2, \ldots, q_j\}$$
where $P_1 = \{p_1, p_2, \ldots, p_x\}$ is the set of webpages the user has visited and $P_2 = \{q_1, q_2, \ldots, q_j\}$ is the set of webpages to prefetch; then
Figure BSA00000520644500052
Here $c$ is the prefetch probability threshold, expressed as the proportion of users who have visited $P_2$ among the users who have visited $P_1$.
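The ratio behind the rule can be computed directly from the pages each user in one category has visited. The Python sketch below is illustrative only: the function names and the `user_pages` structure (user id mapped to a set of URLs) are assumptions, and scoring one candidate page at a time is a simplification of the page set $P_2$.

```python
def prefetch_confidence(user_pages, visited, candidate):
    """Share of users who visited all of `visited` that also visited all of `candidate`."""
    visited, candidate = set(visited), set(candidate)
    base = [pages for pages in user_pages.values() if visited <= pages]
    if not base:
        return 0.0
    hits = sum(1 for pages in base if candidate <= pages)
    return hits / len(base)

def pages_to_prefetch(user_pages, visited, threshold):
    """Pages not yet visited whose confidence exceeds the preset threshold."""
    unseen = set().union(*user_pages.values()) - set(visited)
    return sorted(
        p for p in unseen
        if prefetch_confidence(user_pages, visited, [p]) > threshold
    )

# Pages visited by the users of one category (hypothetical data).
cluster = {
    "u1": {"/a", "/b", "/c"},
    "u2": {"/a", "/b"},
    "u3": {"/a", "/c"},
    "u4": {"/d"},
}
to_cache = pages_to_prefetch(cluster, ["/a"], 0.5)
```

Three users visited `/a`; two of them also visited `/b` and two visited `/c`, so both pages score 2/3 and pass a threshold of 0.5, while `/d` scores 0.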
Data cleansing in step S110 removes inconsistent and irrelevant data from the Web log data source and converts the Web log into reliable, accurate data suitable for data mining, that is, the trusted weblog.
First, the relevant Web log data are read from multiple servers and merged, then parsed and stored in the corresponding data fields. Web log data contain attributes such as the IP address, user ID, URL of the requested page, request method, access time, transfer protocol, number of bytes transferred, error code and user agent. A single user request may cause the browser to download several ancillary files automatically, such as pictures. All the downloaded files together make up one page view, so that a single request corresponds to multiple log entries.
Data cleansing reduces the number of Web log records according to this analysis, and mainly covers the following three aspects.
(1) URL extension: on a general information website, only content pages are relevant to the user's request. Requests for picture files (extensions such as gif and jpg) and script-type files (extensions js, cgi and css) can be regarded as unrelated to the user's request and should be deleted. In general a user does not explicitly ask for all the pictures and script files on a webpage. Most picture and script entries in the log are scripts that lay out the page frame and pictures carried by content pages, and they are downloaded automatically as ancillary files when the user reads the page text. These picture and script files therefore do not truly reflect the user's request behaviour and are removed during data cleansing.
(2) Action: the GET action is the action by which a user requests a webpage. Other actions, such as POST (generally the action of a user submitting a form), can be filtered out, keeping only the actions that request webpages.
(3) Status code: the status code indicates the result of the user's request. Codes beginning with 2 indicate success: 200 indicates that the transaction succeeded, and 206 indicates that the server has completed a partial GET request. Codes beginning with 3 indicate that the request was successfully redirected: 302 indicates that the requested page was found, 303 suggests that the client access another URL or use another method, and 305 indicates that the requested resource must be obtained from the address specified by the server. Codes beginning with 4 indicate a link error: 400 indicates a bad request (such as a syntax error), and 401 indicates that request authorization failed. Codes beginning with 5 indicate a server error: 500 indicates an internal server error, and 501 indicates that the server does not support the requested function. During data cleansing, entries whose codes begin with 4 or 5 should be filtered out. In short, entries for failed requests and server errors are removed, and entries for successful and successfully redirected requests are kept.
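The three checks above, together with the click threshold of 2 used in this embodiment, can be sketched in Python as follows. The entry layout (a dict with `url`, `method` and `status` keys) and the function names are assumptions made for illustration; parsing a real server log into that layout is left out, and filtering of dynamic webpages is not shown.

```python
import os
from collections import Counter
from urllib.parse import urlparse

# Extensions named in the text as unrelated to the user's request.
SKIPPED_EXTENSIONS = {".gif", ".jpg", ".js", ".cgi", ".css"}

def keep_entry(entry):
    """Return True if a parsed log entry survives the three checks."""
    path = urlparse(entry["url"]).path
    extension = os.path.splitext(path)[1].lower()
    if extension in SKIPPED_EXTENSIONS:      # (1) URL extension
        return False
    if entry["method"] != "GET":             # (2) action
        return False
    return entry["status"] // 100 in (2, 3)  # (3) keep 2xx and 3xx only

def clean_log(entries, min_clicks=2):
    """Apply the three checks, then drop pages clicked fewer than min_clicks times."""
    kept = [e for e in entries if keep_entry(e)]
    clicks = Counter(e["url"] for e in kept)
    return [e for e in kept if clicks[e["url"]] >= min_clicks]

log = [
    {"url": "/index.html", "method": "GET", "status": 200},
    {"url": "/index.html", "method": "GET", "status": 302},
    {"url": "/logo.gif", "method": "GET", "status": 200},
    {"url": "/style.css?v=2", "method": "GET", "status": 200},
    {"url": "/form.html", "method": "POST", "status": 200},
    {"url": "/missing.html", "method": "GET", "status": 404},
    {"url": "/news.html", "method": "GET", "status": 200},
]
trusted = clean_log(log)
```

Counting clicks only after the three checks is a design choice of this sketch, so that picture and error entries do not inflate a page's click count. Here `/news.html` passes the three checks but is clicked only once, so it is dropped by the threshold.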
User identification in step S110. When mining user access patterns or performing cluster analysis on users, user identification is essential: a group is made up of individuals, and the features of the group can be recognized only when the individuals are clearly understood. The existence of local caches, proxy servers and firewalls makes user identification very complicated. Current user identification methods mainly include IP address and agent, embedded session identifiers (sessionID), registration, cookies, agent software, and modified browsers. After user identification, n users are selected.
Session identification in step S110. A session is the sequence of pages that the same user requests consecutively during one browsing process; it represents one effective visit by the user to the server. Session identification takes place after user identification and decomposes each user's access sequence over a period of time into the corresponding sessions. Pages requested by different users obviously belong to different sessions. A commonly used session identification method is the timeout method, in which a timeout threshold is set. The system's default time threshold is 30 minutes.
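A minimal Python sketch of the timeout method follows. The text does not say whether the 30 minutes bounds the gap between consecutive requests or the whole session; this sketch assumes the gap interpretation, and the function name and input layout are hypothetical.

```python
from datetime import datetime, timedelta

def split_sessions(requests, timeout=timedelta(minutes=30)):
    """Split one user's (timestamp, url) requests into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` (assumed interpretation of the timeout method).
    """
    sessions = []
    last_time = None
    for timestamp, url in sorted(requests):
        if last_time is None or timestamp - last_time > timeout:
            sessions.append([])
        sessions[-1].append(url)
        last_time = timestamp
    return sessions

start = datetime(2011, 6, 20, 9, 0)
requests = [
    (start, "/a"),
    (start + timedelta(minutes=10), "/b"),
    (start + timedelta(minutes=50), "/c"),  # 40 minutes after "/b"
]
sessions = split_sessions(requests)
```

Because different users always belong to different sessions, the function is meant to be called once per identified user.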
To make the clustering algorithm easier to apply, the data in the Web log must be given a formal representation, so that they take an input form the clustering algorithm can readily handle.
Step S120 above can be divided into two processes: feature webpage extraction and establishing the user access pattern matrix.
In the feature webpage selection process, pages requested by only a single user and pages that occur in only one session are filtered out of the trusted weblog, so that the webpages of interest to many users are obtained and form a set of pages of interest. To mine the common interests of users, the pre-processed user log needs further filtering. A page requested by only one user cannot represent the interest of a group of users and is filtered out. A page that occurs in only one session reflects only a transient concern of the user, cannot represent the user's lasting interest, and is also filtered out. After this processing, a set of pages of interest $L = \{URL_1, URL_2, \ldots, URL_m\}$ is obtained, consisting of the addresses of m webpages that interest users. The webpages in this set serve as the feature webpages for user clustering.
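The two filters can be sketched in Python as below. The input layout (a list of `(user_id, urls)` pairs, one per session) and the function name are assumptions for illustration.

```python
def select_feature_pages(sessions):
    """Keep pages requested by more than one user and seen in more than one session.

    `sessions` is a list of (user_id, [urls]) pairs, one pair per session.
    """
    users_per_page = {}
    sessions_per_page = {}
    for index, (user, urls) in enumerate(sessions):
        for url in set(urls):
            users_per_page.setdefault(url, set()).add(user)
            sessions_per_page.setdefault(url, set()).add(index)
    return sorted(
        url for url in users_per_page
        if len(users_per_page[url]) > 1 and len(sessions_per_page[url]) > 1
    )

sessions = [
    ("u1", ["/a", "/b"]),
    ("u2", ["/a", "/c"]),
    ("u1", ["/b", "/d"]),
    ("u3", ["/b"]),
]
feature_pages = select_feature_pages(sessions)
```

Since sessions never span users, a page requested by two users necessarily appears in two sessions; the second check is kept only to mirror the two conditions as the text states them.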
In the process of establishing the user access pattern matrix, a browsing pattern vector is built for each selected user on the basis of the set of pages of interest L. For the j-th user ($j = 1, 2, \ldots, n$, where n is the total number of users), a browsing pattern vector $A_j = \{R_1, R_2, \ldots, R_m\}$ is created, where $R_l$ ($l = 1, 2, \ldots, m$, with m the number of feature webpages) is a binary variable that indicates whether this user has visited feature webpage $URL_l$. If the user has requested $URL_l$, the value of $R_l$ is 1; otherwise it is 0. $A_j$ thus records whether user j has visited each webpage in the set L and reflects this user's browsing behaviour; it is called the browsing pattern vector of a single user. Combining the browsing pattern vectors of all users gives a user access pattern matrix A of size n × m. Each row of this matrix represents one user, each column represents one feature webpage, and each element is 1 or 0, indicating whether the given user has clicked the given feature webpage. This user access pattern matrix is used as the input of the user clustering algorithm.
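As a small illustration, the n × m matrix can be built from per-user page sets as in the Python sketch below. The function name and the `user_pages` mapping are hypothetical; rows are ordered by sorted user id simply to make the result reproducible.

```python
def build_access_matrix(user_pages, feature_pages):
    """Return (users, matrix), where matrix[j][l] is 1 if user j visited feature page l."""
    users = sorted(user_pages)
    matrix = [
        [1 if url in user_pages[user] else 0 for url in feature_pages]
        for user in users
    ]
    return users, matrix

user_pages = {
    "u1": {"/a", "/b"},
    "u2": {"/a"},
    "u3": {"/b", "/x"},  # "/x" is not a feature page and is ignored
}
users, matrix = build_access_matrix(user_pages, ["/a", "/b"])
```

Each row of `matrix` is one user's browsing pattern vector, and the whole list of rows is what would be passed to the clustering step.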
Embodiment two: a webpage prefetching system based on chaotic ant colony optimization clustering
With reference to the embodiment shown in Fig. 1, this embodiment, as shown in Fig. 2, mainly comprises a pre-processing module 210, a first establishing module 220, a second establishing module 230 and a prefetching module 240, wherein:
The pre-processing module 210 is configured to pre-process a weblog to obtain a trusted weblog;
The first establishing module 220 is connected to the pre-processing module 210 and is configured to establish, according to the users' access interests and the trusted weblog, a user access pattern matrix that expresses whether each user has visited each feature webpage;
The second establishing module 230 is connected to the first establishing module 220 and is configured to perform optimization clustering on the user access pattern matrix with a clustering algorithm based on chaotic ant colony optimization, mark the category to which each user belongs with a preset category label, and establish a user public archive;
The prefetching module 240 is connected to the second establishing module 230 and is configured to extract, according to the user public archive, the pages whose prefetch probability exceeds a preset prefetch probability threshold and save them in a cache.
The pre-processing module 210 is configured to perform data cleansing, user identification and session identification on the weblog to obtain the trusted weblog.
The pre-processing module 210 is configured to filter out the pictures in webpages, and to filter out dynamic webpages and webpages whose click rate is below a preset click threshold.
The second establishing module 230 comprises:
A clustering unit, configured to perform optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization to obtain the positions of the cluster centres;
An establishing unit, configured to mark the category to which each user belongs with the category label according to the distance between the user and each cluster centre, and to establish the user public archive according to the categories to which the users belong.
Although embodiments of the present invention are disclosed above, the content described is only an embodiment adopted to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be determined by the scope defined in the appended claims.

Claims (8)

1. A webpage prefetching method based on chaotic ant colony optimization clustering, characterized by comprising the following steps:
Pre-processing a weblog to obtain a trusted weblog;
Establishing, according to the users' access interests and the trusted weblog, a user access pattern matrix that expresses whether each user has visited each feature webpage;
Performing optimization clustering on the user access pattern matrix with a clustering algorithm based on chaotic ant colony optimization, marking the category to which each user belongs with a preset category label, and establishing a user public archive;
According to the user public archive, extracting the pages whose prefetch probability exceeds a preset prefetch probability threshold and saving them in a cache.
2. The method according to claim 1, characterized in that the step of pre-processing the weblog comprises:
Performing data cleansing, user identification and session identification on the weblog.
3. The method according to claim 2, characterized in that the step of performing data cleansing on the weblog comprises:
Filtering out the pictures in webpages, and filtering out dynamic webpages and webpages whose click rate is below a preset click threshold.
4. The method according to claim 1, characterized in that the step of performing optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization, marking the category to which each user belongs with the category label, and establishing the user public archive comprises:
Performing optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization to obtain the positions of the cluster centres;
Marking the category to which each user belongs with the category label according to the distance between the user and each cluster centre, and establishing the user public archive according to the categories to which the users belong.
5. A webpage prefetching system based on chaotic ant colony optimization clustering, characterized by comprising:
A pre-processing module, configured to pre-process a weblog to obtain a trusted weblog;
A first establishing module, configured to establish, according to the users' access interests and the trusted weblog, a user access pattern matrix that expresses whether each user has visited each feature webpage;
A second establishing module, configured to perform optimization clustering on the user access pattern matrix with a clustering algorithm based on chaotic ant colony optimization, mark the category to which each user belongs with a preset category label, and establish a user public archive;
A prefetching module, configured to extract, according to the user public archive, the pages whose prefetch probability exceeds a preset prefetch probability threshold and save them in a cache.
6. The system according to claim 5, characterized in that:
The pre-processing module is configured to perform data cleansing, user identification and session identification on the weblog to obtain the trusted weblog.
7. The system according to claim 6, characterized in that:
The pre-processing module is configured to filter out the pictures in webpages, and to filter out dynamic webpages and webpages whose click rate is below a preset click threshold.
8. The system according to claim 1, characterized in that the second establishing module comprises:
A clustering unit, configured to perform optimization clustering on the user access pattern matrix with the clustering algorithm based on chaotic ant colony optimization to obtain the positions of the cluster centres;
An establishing unit, configured to mark the category to which each user belongs with the category label according to the distance between the user and each cluster centre, and to establish the user public archive according to the categories to which the users belong.
CN2011101654593A 2011-06-20 2011-06-20 Method and system for pre-fetching webpage Pending CN102222098A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101654593A CN102222098A (en) 2011-06-20 2011-06-20 Method and system for pre-fetching webpage

Publications (1)

Publication Number Publication Date
CN102222098A true CN102222098A (en) 2011-10-19

Family

ID=44778650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101654593A Pending CN102222098A (en) 2011-06-20 2011-06-20 Method and system for pre-fetching webpage

Country Status (1)

Country Link
CN (1) CN102222098A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102446222A (en) * 2011-12-22 2012-05-09 华为技术有限公司 Method, device and system of webpage content preloading
CN103020214A (en) * 2012-12-07 2013-04-03 北京奇虎科技有限公司 Method and equipment for processing access website history record information
WO2013071779A1 (en) * 2011-11-15 2013-05-23 Tencent Technology (Shenzhen) Company Limited Method and device for accessing web pages
CN103577439A (en) * 2012-07-27 2014-02-12 北京搜狗信息服务有限公司 Webpage pre-reading method and webpage pre-reading system
CN104077296A (en) * 2013-03-27 2014-10-01 联想(北京)有限公司 Information processing method and server
CN104221046A (en) * 2011-12-08 2014-12-17 谷歌公司 Method and apparatus for pre-fetching place page data for subsequent display on a mobile computing device
CN105117213A (en) * 2015-07-30 2015-12-02 青岛海尔智能家电科技有限公司 Preprocessing method and apparatus based on release-subscription mode
CN105759715A (en) * 2016-02-23 2016-07-13 柳州职业技术学院 Intelligent self-tuning injection molding machine control method
CN103744959B (en) * 2014-01-06 2017-01-25 同济大学 Webpage class feature vector extracting method based on ant colony algorithm
CN107851071A (en) * 2015-08-11 2018-03-27 三菱电机株式会社 Web browsing apparatus and web viewing programs
CN111104600A (en) * 2019-12-23 2020-05-05 杭州安恒信息技术股份有限公司 WEB site webpage recommendation method, device, equipment and medium
CN112131199A (en) * 2020-09-25 2020-12-25 杭州安恒信息技术股份有限公司 Log processing method, device, equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320375A (en) * 2008-07-04 2008-12-10 浙江大学 Digital book search method based on user click action
CN101430708A (en) * 2008-11-21 2009-05-13 哈尔滨工业大学深圳研究生院 Blog hierarchy classification tree construction method based on label clustering
CN101944358A (en) * 2010-08-27 2011-01-12 太原理工大学 Ant colony algorithm-based codebook classification method and codebook classification device thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MIAO WAN et al.: "CAS based clustering algorithm for Web users", Nonlinear Dynamics *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013071779A1 (en) * 2011-11-15 2013-05-23 Tencent Technology (Shenzhen) Company Limited Method and device for accessing web pages
CN104221046A (en) * 2011-12-08 2014-12-17 谷歌公司 Method and apparatus for pre-fetching place page data for subsequent display on a mobile computing device
CN102446222A (en) * 2011-12-22 2012-05-09 华为技术有限公司 Method, device and system of webpage content preloading
CN102446222B (en) * 2011-12-22 2014-12-10 华为技术有限公司 Method, device and system of webpage content preloading
CN103577439B (en) * 2012-07-27 2017-02-08 北京搜狗信息服务有限公司 Webpage pre-reading method and webpage pre-reading system
CN103577439A (en) * 2012-07-27 2014-02-12 北京搜狗信息服务有限公司 Webpage pre-reading method and webpage pre-reading system
CN103020214A (en) * 2012-12-07 2013-04-03 北京奇虎科技有限公司 Method and equipment for processing access website history record information
CN104077296B (en) * 2013-03-27 2017-12-29 联想(北京)有限公司 The method and server of processing information
US9614886B2 (en) 2013-03-27 2017-04-04 Lenovo (Beijing) Co., Ltd. Method for processing information and server
CN104077296A (en) * 2013-03-27 2014-10-01 联想(北京)有限公司 Information processing method and server
CN103744959B (en) * 2014-01-06 2017-01-25 同济大学 Webpage class feature vector extracting method based on ant colony algorithm
CN105117213A (en) * 2015-07-30 2015-12-02 青岛海尔智能家电科技有限公司 Preprocessing method and apparatus based on release-subscription mode
CN105117213B (en) * 2015-07-30 2021-10-19 青岛海尔智能家电科技有限公司 Pre-processing method and device based on publish-subscribe mode
CN107851071A (en) * 2015-08-11 2018-03-27 三菱电机株式会社 Web browsing apparatus and web viewing programs
CN105759715A (en) * 2016-02-23 2016-07-13 柳州职业技术学院 Intelligent self-tuning injection molding machine control method
CN111104600A (en) * 2019-12-23 2020-05-05 杭州安恒信息技术股份有限公司 WEB site webpage recommendation method, device, equipment and medium
CN111104600B (en) * 2019-12-23 2023-04-07 杭州安恒信息技术股份有限公司 WEB site webpage recommendation method, device, equipment and medium
CN112131199A (en) * 2020-09-25 2020-12-25 杭州安恒信息技术股份有限公司 Log processing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN102222098A (en) Method and system for pre-fetching webpage
CN102158365A (en) User clustering method and system in weblog mining
CN102254004A (en) Method and system for modeling Web in weblog excavation
CN106446228B (en) Method and device for collecting and analyzing WEB page data
Das et al. Creating meaningful data from web logs for improving the impressiveness of a website by using path analysis method
US10572565B2 (en) User behavior models based on source domain
Senkul et al. Improving pattern quality in web usage mining by using semantic information
CN102073726B (en) Structured data import method and device for search engine system
CN107862553A (en) Advertisement real-time recommendation method, device, terminal device and storage medium
US20110246462A1 (en) Method and System for Prompting Changes of Electronic Document Content
CN111259220B (en) Data acquisition method and system based on big data
Chitraa et al. An enhanced clustering technique for web usage mining
Suguna et al. User interest level based preprocessing algorithms using web usage mining
Chakraborty et al. Clustering of web sessions by FOGSAA
CN116226494B (en) Crawler system and method for information search
CN111127057B (en) Multi-dimensional user portrait recovery method
CN113961811B (en) Event map-based conversation recommendation method, device, equipment and medium
Khonsha et al. New hybrid web personalization framework
Maratea et al. An heuristic approach to page recommendation in web usage mining
Shen et al. A Catalogue Service for Internet GIS Services Supporting Active Service Evaluation and Real‐Time Quality Monitoring
Maheswari et al. Algorithm for Tracing Visitors' On-Line Behaviors for Effective Web Usage Mining
Adhiya et al. AN EFFICIENT AND NOVEL APPROACH FOR WEB SEARCH PERSONALIZATION USING WEB USAGE MINING.
Anitha An efficient agglomerative clustering algorithm for web navigation pattern identification
Jayaprakash et al. A Comprehensive Survey on Data Preprocessing Methods in Web Usage Minning
Zubi et al. Applying web mining application for user behavior understanding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20111019