CN1198224C - Adaptive internet catalogue web page recommending method - Google Patents


Info

Publication number
CN1198224C
Authority
CN
China
Prior art keywords
webpage
many examples
bag
execution
user
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 03131974
Other languages
Chinese (zh)
Other versions
CN1471020A (en)
Inventor
Zhou Zhihua (周志华)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2003-06-24
Filing date
2003-06-24
Publication date
2005-04-20
Application filed by Nanjing University
Priority to CN 03131974
Publication of CN1471020A
Application granted
Publication of CN1198224C
Anticipated expiration
Expired - Fee Related (current legal status)


Abstract

The present invention discloses an adaptive Internet directory web page recommending method comprising the following steps: a user submits a browse path through a client; after acquiring the web page, an Internet intermediate server first judges whether it is a directory web page; if so, the page is passed to the directory web page recommending part, otherwise it is passed to the ordinary web page recommending part for processing; the recommendation result is then submitted to the user through the client. The advantage of the invention is that the user does not need to indicate in detail which linked content was of interest in previously browsed directory web pages, yet Internet directory web pages can still be recommended according to the user's individual preferences and usage, which helps improve the performance of the Internet intermediate server.

Description

An adaptive Internet directory web page recommending method
I. Technical Field
The present invention relates to Internet intermediate servers, and in particular to a method for adaptively recommending Internet directory web pages according to the user's individual preferences and usage.
II. Background Art
The Internet intermediate server is a barrier between client computers and the Internet. It generally provides services such as a firewall to client computers; in addition, it stores the historical browsing records of a large number of users. As the number of user accesses grows, the server can adaptively adjust and improve itself according to users' habits and preferences, helping users obtain information more efficiently.
On receiving a browse path submitted by the user through a client computer, the server first obtains the new web page at that path and then uses a web page recommending method to analyze the page content against the user's historical browsing records. It thereby judges whether the user would be interested in the content of the new page and informs the user in a friendly way, so that less time is wasted on uninteresting pages. Normally, the usable historical browsing records must contain complete user feedback, i.e., the user has explicitly stated which specific content is of interest.
Users frequently encounter directory web pages while browsing information on the Internet. These pages provide only titles or summaries, while the actual content resides in the subordinate pages they link to. For example, every major portal site (such as www.sina.com.cn) contains a large number of directory web pages. Because a directory web page contains many links, it is impractical to require the user to spend a lot of time pointing out each specific link of interest; one can only ask the user to answer "yes" or "no" as to whether the page contains linked content of interest. Consequently, ordinary web page recommending methods cannot handle this situation, and there is currently no recommending method designed specifically for directory web pages. This makes it difficult for an Internet intermediate server to provide an effective recommendation service when it encounters directory web pages.
III. Summary of the Invention
The objective of the invention is to address the prior-art problem that Internet directory web pages are difficult to recommend well, by providing an adaptive Internet directory web page recommending method that helps improve the performance of the Internet intermediate server.
To achieve this objective, the invention provides a method that uses multi-instance learning, a machine learning technique, to analyze Internet directory web pages and thereby recommend them. The method comprises the following steps:
(1) receive a directory web page;
(2) if the history data set is not empty, go to step 3; otherwise go to step 18;
(3) take a link from the current directory web page;
(4) if the link text contains at least a preset threshold of t words, go to step 5; otherwise go to step 8;
(5) obtain the content of the linked web page;
(6) remove stop words (words without practical meaning) from the content of the linked web page and compute word frequencies;
(7) generate one instance of a multi-instance bag;
(8) if the directory web page has no other unexamined links, go to step 9; otherwise take an unexamined link and go to step 4;
(9) collect the instances produced in steps 3 to 8 into a multi-instance bag B;
(10) compute the distance between B and each multi-instance bag stored in the history data set;
(11) set the decision set of B to empty;
(12) take a multi-instance bag B' from the history data set;
(13) if B' is among the r bags nearest to B, add B' to the decision set of B;
(14) if B is among the c bags nearest to B', add B' to the decision set of B;
(15) if the history data set has no other unexamined bags, go to step 16; otherwise take an unexamined bag as B' and go to step 13;
(16) if the number of positive bags in the decision set of B is no less than the number of negative bags, the recommendation result is "recommend"; otherwise it is "do not recommend";
(17) submit the recommendation result to the user;
(18) if the user provides feedback on the current directory web page, go to step 19; otherwise go to step 21;
(19) if the current directory web page has already undergone knowledge representation processing, go to step 20; otherwise execute steps 3 to 9 first and then step 20;
(20) if the user's feedback indicates that the current directory web page contains linked content of interest, store the corresponding bag in the history data set as a positive bag; otherwise store it as a negative bag;
(21) end.
The advantage of the invention is that the user does not need to point out which specific linked content was of interest in previously browsed directory web pages; the method can still recommend Internet directory web pages according to the user's individual preferences and usage, helping improve the performance of the Internet intermediate server.
The preferred embodiment is described in detail below with reference to the accompanying drawings.
IV. Description of the Drawings
Fig. 1 is a schematic diagram of the Internet intermediate server handling web page recommendation.
Fig. 2 is a flowchart of the method of the invention.
Fig. 3 is a flowchart of the knowledge representation process.
Fig. 4 is a flowchart of the recommendation process.
V. Detailed Description of the Embodiment
The Internet intermediate server is the barrier between client computers and the Internet; users connect to the Internet intermediate server through client computers.
As shown in Fig. 1, the user submits a browse path through a client computer. After obtaining the web page, the Internet intermediate server first judges whether it is a directory web page; if so, the page is passed to the directory web page recommending part, otherwise it is passed to the ordinary web page recommending part. The recommendation result is submitted to the user through the client computer, and the user can also submit feedback through the client computer. The present invention mainly concerns the directory web page recommending part, i.e., part 1 in Fig. 1.
The method of the invention is shown in Fig. 2. Step 10 is the start. Step 12 receives a directory web page. Step 14 judges whether the current history data set is empty. If it is not empty, the conditions for recommending directory web pages are met and step 16 is executed; otherwise they are not met and the method goes to step 22. The history data set is empty when the user first accesses the Internet through the Internet intermediate server; it grows as the user's number of accesses increases, and it can be pruned appropriately when necessary. Step 16 applies knowledge representation processing to the directory web page currently being processed, expressing the page as a multi-instance bag so that step 18 can analyze it and make a recommendation using multi-instance learning. Steps 16 and 18 are described in detail later with reference to Fig. 3 and Fig. 4 respectively. Step 20 submits the recommendation result to the user. Whether or not the current directory web page was recommended by the Internet intermediate server, the user can provide feedback on it, i.e., answer "yes" or "no" as to whether the page contains linked content of interest.
Step 22 of Fig. 2 judges whether the user has provided feedback. If so, steps 24 to 26 are executed to save the current directory web page to assist future recommendations; otherwise the page need not be saved and the method goes to step 28. Step 24 judges whether the current directory web page has already undergone knowledge representation processing. If the Internet intermediate server made a recommendation for the page, the page has already been processed, so step 26 is executed directly; otherwise step 16 is executed first and then step 26. Step 26 saves the multi-instance bag corresponding to the current directory web page to the history data set according to the user's feedback: if the user answered "yes", i.e., the page contains linked content of interest, the bag is stored as a positive bag; otherwise it is stored as a negative bag. Step 28 is the end.
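The control flow of Fig. 2 (steps 14 to 26) can be sketched in Python as follows. This is a hypothetical illustration, not the patent's implementation: the names `History`, `handle_directory_page`, `represent`, `recommend` and `get_feedback` are invented here, and the knowledge representation (Fig. 3) and recommendation (Fig. 4) steps are passed in as assumed callables.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class History:
    """History data set: (bag, is_positive) pairs; empty on the user's first visit."""
    entries: List[Tuple[Any, bool]] = field(default_factory=list)


def handle_directory_page(
    page: str,
    history: History,
    represent: Callable[[str], Any],                                # step 16 (Fig. 3), assumed
    recommend: Callable[[Any, History], bool],                      # step 18 (Fig. 4), assumed
    get_feedback: Callable[[str, Optional[bool]], Optional[bool]],  # steps 20-22, assumed
) -> Optional[bool]:
    bag = None
    result: Optional[bool] = None
    if history.entries:                  # step 14: recommend only if history is non-empty
        bag = represent(page)            # step 16
        result = recommend(bag, history) # step 18
    # Step 20 shows `result` (None means no recommendation was made). The user may
    # answer True ("contains links of interest"), False, or None (no feedback).
    feedback = get_feedback(page, result)
    if feedback is not None:             # step 22
        if bag is None:                  # step 24: page not yet represented
            bag = represent(page)        # step 16
        history.entries.append((bag, feedback))  # step 26
    return result
```

On the first call the history is empty, so no recommendation is returned, but any feedback is still stored and enables recommendations on later calls.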
Fig. 3 details step 16 of Fig. 2. Its role is to express a directory web page in the knowledge representation form required by multi-instance learning, namely a multi-instance bag. Step 160 of Fig. 3 is the start. Step 161 takes a link from the directory web page currently being examined. Step 162 counts the words in the link text: if there are at least t words, the link is considered worth analyzing and step 163 is executed; otherwise the link is considered to point to advertisements or other content irrelevant to the recommendation task, so it is not analyzed and the method goes to step 168. Here t is a preset integer; for example, t can be set to 6 Chinese characters for Chinese web pages and to 4 English words for English web pages.
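The link filter of step 162 might look like the sketch below. It assumes, as illustrative choices not taken from the patent, that links arrive as `(anchor_text, url)` pairs, that each CJK Unified Ideograph counts as one "word" on Chinese pages, and that English words are whitespace-separated tokens; the thresholds are the example values given above.

```python
import re
from typing import List, Tuple

# Example thresholds from the text: 6 Chinese characters or 4 English words.
DEFAULT_T = {"zh": 6, "en": 4}


def link_word_count(anchor_text: str, lang: str) -> int:
    """Count the 'words' in a link's anchor text (step 162)."""
    if lang == "zh":
        # Assumption: each CJK Unified Ideograph counts as one word.
        return len(re.findall(r"[\u4e00-\u9fff]", anchor_text))
    # Assumption: English words are whitespace-separated tokens.
    return len(anchor_text.split())


def links_to_analyze(links: List[Tuple[str, str]], lang: str) -> List[Tuple[str, str]]:
    """Keep (anchor_text, url) pairs with at least t words; shorter links are
    treated as ads or other content irrelevant to recommendation."""
    t = DEFAULT_T[lang]
    return [(text, url) for text, url in links if link_word_count(text, lang) >= t]
```

For example, a two-word English link such as "Buy now" is skipped, while a six-word headline is kept for analysis.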
Step 163 of Fig. 3 locates the linked web page from the URL of the current link. Step 164 obtains the content of that page. Step 165 removes words that have no practical meaning for the recommendation task, for example common function words on Chinese pages and "the", "a", "is" on English pages. Step 166 then computes word frequencies over the remaining text on the page, i.e., counts how often each distinct word occurs. Step 167 takes the n most frequent words to form an item vector $[t_1, t_2, \ldots, t_n]$, where $t_1$ is the most frequent word, $t_2$ the second most frequent, and so on; here n is a preset integer, for example 10. The resulting item vector is one instance in the multi-instance bag of the current directory web page. Step 168 judges whether the current directory web page still has unexamined links; if so, step 169 takes an unexamined link and the method returns to step 162; otherwise step 170 collects all instances into a multi-instance bag. The bag is a set of item vectors: for example, with m instances the bag is $\{[t_{11}, t_{12}, \ldots, t_{1n}], [t_{21}, t_{22}, \ldots, t_{2n}], \ldots, [t_{m1}, t_{m2}, \ldots, t_{mn}]\}$, where $[t_{i1}, t_{i2}, \ldots, t_{in}]$ is the i-th instance and $t_{ij}$ is the word with the j-th highest frequency in the web page corresponding to the i-th instance. Step 171 is the end of Fig. 3.
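Steps 165 to 170 can be sketched as below. This is an illustrative sketch only: it assumes English text tokenized as runs of ASCII letters (Chinese pages would need a word segmenter, which is not shown), and `STOP_WORDS` holds just the three examples named above. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

Instance = Tuple[str, ...]  # item vector [t_1, ..., t_n], most frequent word first

# Example stop words from the text; a practical list would be much longer.
STOP_WORDS = frozenset({"the", "a", "is"})


def page_to_instance(text: str, n: int = 10) -> Instance:
    """Turn one linked page's text into an item vector (steps 165-167)."""
    # Assumption: English text, tokenized as lowercase runs of ASCII letters.
    tokens = re.findall(r"[a-z]+", text.lower())
    words = [w for w in tokens if w not in STOP_WORDS]   # step 165: drop stop words
    counts = Counter(words)                              # step 166: word frequencies
    # Step 167: the n most frequent words; equal counts keep first-seen order.
    return tuple(word for word, _ in counts.most_common(n))


def pages_to_bag(page_texts: Iterable[str], n: int = 10) -> FrozenSet[Instance]:
    """Step 170: the bag is the set of instances of all analyzed links."""
    return frozenset(page_to_instance(text, n) for text in page_texts)
```

For instance, with n = 2 the text "The cat is a cat and the dog" yields the instance ("cat", "and"): "cat" occurs twice, and "and" precedes "dog" among the words seen once.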
Fig. 4 details step 18 of Fig. 2. Its role is to analyze the multi-instance bag of a directory web page using multi-instance learning and thereby make a recommendation for the page. Step 180 of Fig. 4 is the start. Step 181 receives the multi-instance bag generated by step 16 of Fig. 2 (i.e., by Fig. 3); for convenience this bag is called B below. Step 182 computes the distance between B and each bag stored in the history data set, using the following formula designed specifically for this invention:
$$\mathrm{Dist}(X, Y) = \min_{x \in X,\, y \in Y} \left( 1 - \sum_{\substack{i, j = 1 \\ x_i = y_j}}^{n} \frac{1}{n} \right)$$
where X and Y denote two different multi-instance bags and Dist(X, Y) is the distance between X and Y; the smaller the distance, the more similar X and Y are. min(Z) takes the minimum of Z. x denotes an instance in X and y an instance in Y; $x_i$ is the i-th word of x and $y_j$ is the j-th word of y. n is the number of words in each instance of X and Y, i.e., n in step 167 of Fig. 3.
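The distance formula translates directly into code: each matching word pair contributes 1/n, the instance distance is 1 minus that sum, and the bag distance is the minimum over all instance pairs. The sketch below assumes instances are sequences of n words; the function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def instance_distance(x: Sequence[str], y: Sequence[str], n: int) -> float:
    """1 minus (number of index pairs (i, j) with x_i == y_j) / n."""
    matches = sum(1 for xi in x for yj in y if xi == yj)
    return 1 - matches / n


def bag_distance(X: Iterable[Sequence[str]], Y: Iterable[Sequence[str]], n: int) -> float:
    """Dist(X, Y): the smallest instance distance over all pairs x in X, y in Y."""
    ys: List[Sequence[str]] = list(Y)
    return min(instance_distance(x, y, n) for x in X for y in ys)
```

Because the n words of an instance are distinct (they are the top-n words of a page), the match count equals the size of the word overlap. Identical word sets therefore give distance 0 and disjoint ones give 1; two bags are as close as their most similar pair of linked pages.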
Step 183 of Fig. 4 sets the decision set of B to empty. The decision set of B is a set of bags drawn from the history data set; these bags determine the recommendation result for B. Step 184 takes a bag from the history data set; for convenience it is called B' below. Step 185 considers B together with the bags in the history data set and judges whether B' is one of the r bags nearest to B; if so, step 186 adds B' to the decision set of B; otherwise the method goes to step 187. Here r is a preset integer, for example 2. Step 187 considers B together with the bags in the history data set and judges whether B is one of the c bags nearest to B'; if so, step 186 adds B' to the decision set of B; otherwise the method goes to step 188. Here c is a preset integer, for example 4. Step 188 judges whether the history data set contains other unexamined bags; if so, step 189 takes an unexamined bag, calls it B', and returns to step 185; otherwise step 190 is executed.
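Steps 183 to 189 can be sketched as a function that returns the indices of the history bags in B's decision set. The sketch is hypothetical: `decision_set` and its parameters are invented names, `dist` stands for any bag distance such as the formula above, and two tie rules are assumptions because the text does not specify them. Ties among B's nearest bags are broken by history order, and B counts as "among the c nearest" to B' when fewer than c other candidates are strictly closer to B'.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def decision_set(
    B: T,
    history: Sequence[Tuple[T, bool]],
    dist: Callable[[T, T], float],
    r: int = 2,
    c: int = 4,
) -> List[int]:
    """Return indices of the history bags that form the decision set of B."""
    bags = [bag for bag, _ in history]
    # Step 185 helper: the r history bags nearest to B (ties broken by history order).
    nearest_to_b = set(sorted(range(len(bags)), key=lambda i: dist(B, bags[i]))[:r])
    chosen: List[int] = []                      # step 183: start with an empty set
    for i, b_prime in enumerate(bags):          # steps 184, 188, 189
        if i in nearest_to_b:                   # step 185
            chosen.append(i)                    # step 186
            continue
        # Step 187: is B among the c bags nearest to B'? Candidates are B plus all
        # other history bags (assumed tie rule: count only strictly closer ones).
        d_to_b = dist(b_prime, B)
        closer = sum(1 for j, other in enumerate(bags)
                     if j != i and dist(b_prime, other) < d_to_b)
        if closer < c:
            chosen.append(i)                    # step 186
    return chosen
```

With toy one-dimensional "bags" and `dist = lambda a, b: abs(a - b)`, a history of bags at 0, 1, 10, 11 and 12, B = 0.5, r = 1 and c = 2 gives the decision set [0, 1]: bag 0 is B's nearest bag, and B is the nearest bag to bag 1.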
Step 190 of Fig. 4 counts the bags in the decision set of B. If the number of positive bags is no less than the number of negative bags, the directory web page corresponding to B is considered to contain linked content of interest to the user, and step 191 yields the result "recommend"; otherwise the page is considered not to contain such content, and step 192 yields the result "do not recommend". Step 193 is the end of Fig. 4.
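The vote in step 190 reduces to comparing two counts. In this minimal sketch the labels of the decision-set bags are passed as booleans (True for a positive bag); the function name is hypothetical.

```python
from typing import Iterable


def vote(labels: Iterable[bool]) -> str:
    """Step 190: recommend if positive bags are no fewer than negative bags."""
    labels = list(labels)
    positives = sum(labels)
    negatives = len(labels) - positives
    return "recommend" if positives >= negatives else "do not recommend"
```

Because the rule is "no less than", a tie (including an empty decision set, where both counts are 0) yields "recommend".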

Claims (2)

1. An adaptive Internet directory web page recommending method, comprising the following steps:
The user submits a browse path through a client computer; after obtaining the web page, the Internet intermediate server first judges whether it is a directory web page; if so, the page is passed to the directory web page recommending part, otherwise it is passed to the ordinary web page recommending part; and the recommendation result is submitted to the user through the client computer; characterized in that the directory web page recommending part comprises the following steps:
(1) receiving a directory web page;
(2) if the history data set is not empty, executing step (3); otherwise going to step (18);
(3) taking a link from the current directory web page;
(4) if the link text contains at least a preset threshold of t words, executing step (5); otherwise going to step (8);
(5) obtaining the content of the linked web page;
(6) removing words without practical meaning from the content of the linked web page and computing word frequencies;
(7) generating one instance of a multi-instance bag;
(8) if the directory web page has no other unexamined links, executing step (9); otherwise taking an unexamined link and going to step (4);
(9) collecting the instances produced in steps (3) to (8) into a multi-instance bag B;
(10) calculating the distance between B and each multi-instance bag stored in the history data set;
(11) setting the decision set of B to empty;
(12) taking a multi-instance bag B' from the history data set;
(13) if B' is among the r multi-instance bags nearest to B, adding B' to the decision set of B;
(14) if B is among the c multi-instance bags nearest to B', adding B' to the decision set of B;
(15) if the history data set has no other unexamined multi-instance bags, executing step (16); otherwise taking an unexamined multi-instance bag as B' and going to step (13);
(16) if the number of positive multi-instance bags in the decision set of B is no less than the number of negative multi-instance bags, the recommendation result is "recommend"; otherwise the recommendation result is "do not recommend";
(17) submitting the recommendation result to the user;
(18) if the user provides feedback on the current directory web page, executing step (19); otherwise going to step (21);
(19) if the current directory web page has already undergone knowledge representation processing, executing step (20); otherwise executing steps (3) to (9) first and then step (20);
(20) if the user's feedback indicates that the current directory web page contains linked content of interest, storing the corresponding multi-instance bag in the history data set as a positive multi-instance bag; otherwise storing it in the history data set as a negative multi-instance bag;
(21) ending;
where t, r and c are preset integers.
2. The adaptive Internet directory web page recommending method according to claim 1, characterized in that:
the distance between B and each multi-instance bag stored in the history data set in step (10) is calculated by the following formula:
$$\mathrm{Dist}(X, Y) = \min_{x \in X,\, y \in Y} \left( 1 - \sum_{\substack{i, j = 1 \\ x_i = y_j}}^{n} \frac{1}{n} \right)$$
where X and Y denote two different multi-instance bags, and Dist(X, Y) denotes the distance between X and Y; x denotes an instance in X and y an instance in Y; $x_i$ denotes the i-th word of x, $y_j$ denotes the j-th word of y, and n denotes the number of words in each instance of X and Y.
CN 03131974 2003-06-24 2003-06-24 Adaptive internet catalogue web page recommending method Expired - Fee Related CN1198224C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 03131974 CN1198224C (en) 2003-06-24 2003-06-24 Adaptive internet catalogue web page recommending method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 03131974 CN1198224C (en) 2003-06-24 2003-06-24 Adaptive internet catalogue web page recommending method

Publications (2)

Publication Number Publication Date
CN1471020A 2004-01-28
CN1198224C 2005-04-20

Family

ID=34153929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 03131974 Expired - Fee Related CN1198224C (en) 2003-06-24 2003-06-24 Adaptive internet catalogue web page recommending method

Country Status (1)

Country Link
CN (1) CN1198224C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7698626B2 (en) 2004-06-30 2010-04-13 Google Inc. Enhanced document browsing with automatically generated links to relevant information
CN101005490B (en) * 2006-01-20 2010-06-02 中国科学院计算技术研究所 Method for providing personalized service facing final user
CN101071424B (en) * 2006-06-23 2010-08-25 腾讯科技(深圳)有限公司 Personalized information push system and method
CN101661483B (en) * 2008-08-29 2012-10-03 株式会社理光 Recommendation system and recommendation method
US20120203723A1 (en) * 2011-02-04 2012-08-09 Telefonaktiebolaget Lm Ericsson (Publ) Server System and Method for Network-Based Service Recommendation Enhancement
CN103544313B (en) * 2013-11-04 2017-09-08 北京国双科技有限公司 Data processing method and device for webpage recommending

Also Published As

Publication number Publication date
CN1471020A (en) 2004-01-28

Similar Documents

Publication Publication Date Title
US8060520B2 (en) Optimization of targeted advertisements based on user profile information
US7584171B2 (en) Collaborative-filtering content model for recommending items
US7574422B2 (en) Collaborative-filtering contextual model optimized for an objective function for recommending items
US20080201219A1 (en) Query classification and selection of associated advertising information
US20090077081A1 (en) Attribute-Based Item Similarity Using Collaborative Filtering Techniques
US20070112792A1 (en) Personalized search and headlines
US20110184928A1 (en) Detection of behavior-based associations between search strings and items
US20080306937A1 (en) Using search trails to provide enhanced search interaction
US20100082582A1 (en) Combining log-based rankers and document-based rankers for searching
US20080120287A1 (en) Collaborative-filtering contextual model based on explicit and implicit ratings for recommending items
US20100100607A1 (en) Adjusting Content To User Profiles
AU2003230990B2 (en) System and method for navigating search results
US20020124075A1 (en) Probability associative matrix algorithm
US20080114738A1 (en) System for improving document interlinking via linguistic analysis and searching
JP2008524695A (en) Search engine for computer networks
JP2013500541A (en) Assign keywords to web pages
CN102722498A (en) Search engine and implementation method thereof
CN103365839A (en) Recommendation search method and device for search engines
US20100125781A1 (en) Page generation by keyword
CN103870461A (en) Topic recommendation method, device and server
CN102722499A (en) Search engine and implementation method thereof
CN1198224C (en) Adaptive internet catalogue web page recommending method
US11334592B2 (en) Self-orchestrated system for extraction, analysis, and presentation of entity data
US7814109B2 (en) Automatic categorization of network events
US20090077093A1 (en) Feature Discretization and Cardinality Reduction Using Collaborative Filtering Techniques

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050420