Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Referring to Fig. 1, Fig. 1 is a flowchart of the webpage recommendation method according to an embodiment of the present invention, which comprises the following steps:
Step 101: obtain a click query log, where the click query log comprises a user ID, a keyword, and a webpage ID.
A click query log is the record made of a user's search behavior when the user queries information with a search engine. It may include information such as a user identifier (ID), a keyword, and a webpage identifier (ID). One click query log entry can be recorded each time the user clicks a search result. For example, when a user searches for "Tiguan", the search engine returns multiple search results; if user A clicks the webpage whose webpage ID is 1234, one click query log entry can be recorded as: user A, Tiguan, 1234. In practice, when a user queries information with a search engine, the provider of the search service generally logs the user's search behavior. Here, the keyword is the keyword the user queried in the search engine, the webpage ID is the ID of the webpage the user clicked among the search results for that keyword, and each webpage has a unique webpage ID.
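A click query log entry of the kind described above can be sketched as a small record type. The comma-separated layout, the name ClickRecord, and the function parse_click_record are assumptions made for illustration; the specification only states that an entry holds a user ID, a keyword, and a webpage ID.

```python
from typing import NamedTuple


class ClickRecord(NamedTuple):
    """One click query log entry (hypothetical in-memory form)."""
    user_id: str
    keyword: str
    page_id: str


def parse_click_record(line: str) -> ClickRecord:
    # Assumes a "user ID, keyword, webpage ID" layout with comma separators.
    user_id, keyword, page_id = (field.strip() for field in line.split(","))
    return ClickRecord(user_id, keyword, page_id)


record = parse_click_record("user A, example keyword, 1234")
```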
Step 102: aggregate the keyword information of each user ID and build an interest model for that user ID; aggregate the webpage IDs of all user IDs, obtain the keyword information in the webpage corresponding to each webpage ID, and build an interest model for that webpage ID; determine the degree of association between a user ID and a webpage ID according to the interest model of the user ID and the interest model of the webpage ID.
In fact, if a user is interested in the search results for a certain keyword, this indicates to some extent that the user is also interested in that keyword. The number of times the user clicks search results under each keyword can therefore serve as a measure of the user's degree of interest in that keyword. Likewise, if a user is interested in the search results under a certain type of keyword, this indicates to some extent that the user is also interested in that type of keyword, so the number of times the user clicks search results for each type of keyword can serve as a measure of the user ID's degree of interest in that type. Accordingly, the interest model of a user ID can be built from the user ID's degree of interest in each keyword, from its degree of interest in each type of keyword, or from both combined.
Similarly, if a keyword occurs many times in the webpage corresponding to a webpage ID, this indicates to some extent that the content of that webpage is probably more relevant to the keyword. The occurrence count of each keyword in the webpage can therefore serve as a measure of the webpage ID's degree of interest in that keyword. Likewise, if a type of keyword occurs many times in the webpage, the content of the webpage is probably more relevant to that type, so the occurrence count of each type of keyword in the webpage can serve as a measure of the webpage ID's degree of interest in that type. Accordingly, the interest model of a webpage ID can be built from the occurrence count of each keyword in the corresponding webpage, from the occurrence count of each type of keyword, or from both combined.
The methods for building the interest model of a user ID and the interest model of a webpage ID are described below.
First, the interest model can be built according to the user ID's degree of interest in each keyword:
In this case, the interest model of the user ID includes only a keyword-related first interest item. The first interest item may include multiple first interest sub-items, where each first interest sub-item represents the user ID's interest in one keyword and may specifically include the keyword and the user ID's degree of interest in the keyword;
Aggregating the keyword information of each user ID and building the interest model of that user ID may specifically include: collecting all keywords queried by the user corresponding to the user ID, counting the number of webpage IDs the user clicked when querying each keyword, and determining the user ID's degree of interest in that keyword according to that number.
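The per-keyword click counting described above can be sketched as follows. The specification does not say how a click count is turned into a degree of interest, so dividing by the user's largest count (giving values in (0, 1]) is an assumption, as are the function and parameter names.

```python
from collections import Counter, defaultdict


def user_keyword_interest(click_log):
    """Build {user_id: {keyword: interest}} from (user_id, keyword, page_id) tuples.

    Each tuple is treated as one click. Normalizing by the user's most-clicked
    keyword is an illustrative assumption, not taken from the specification.
    """
    counts = defaultdict(Counter)
    for user_id, keyword, _page_id in click_log:
        counts[user_id][keyword] += 1

    model = {}
    for user_id, per_keyword in counts.items():
        top = max(per_keyword.values())
        model[user_id] = {kw: n / top for kw, n in per_keyword.items()}
    return model


log = [
    ("user A", "Highlander", "123"),
    ("user A", "Highlander", "234"),
    ("user A", "ix35", "900"),
]
interest = user_keyword_interest(log)
```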
Correspondingly, the interest model can be built according to the occurrence count of each keyword in the webpage corresponding to the webpage ID:
In this case, the interest model of the webpage ID includes a keyword-related second interest item. The second interest item may include multiple second interest sub-items, where each second interest sub-item represents the webpage ID's interest in one keyword and may specifically include the keyword and the webpage ID's degree of interest in the keyword;
Aggregating the webpage IDs of all user IDs, obtaining the keyword information of the webpage corresponding to each webpage ID, and building the interest model of that webpage ID includes: performing word segmentation on the content of the webpage corresponding to the webpage ID, removing invalid words, counting the occurrences of each remaining keyword in the webpage, and determining the webpage ID's degree of interest in the keyword according to its occurrence count.
It should be noted that the invalid words referred to in this specification may specifically include prepositions, adverbs, interjections, adjectives, and words whose occurrence count is less than a first preset ratio and/or greater than a second preset ratio (that is, words occurring too rarely or too often in the webpage are treated as invalid words), where the first preset ratio is less than the second preset ratio. In addition, the content of the webpage corresponding to a webpage ID may specifically be the title and summary of the webpage, or may be information such as the title and body text of the webpage.
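A minimal sketch of the webpage-side processing follows. It assumes the page has already been segmented into words, that invalid parts of speech are supplied as a stop-word set, and that the two preset ratios are measured against the total number of remaining words; none of these details is fixed by the specification, and the normalization by the largest count is likewise an assumption.

```python
from collections import Counter


def page_keyword_interest(tokens, stop_words, min_ratio=0.0, max_ratio=1.0):
    """Return {keyword: interest} for one page given its segmented words.

    stop_words stands in for prepositions, adverbs, interjections, adjectives.
    min_ratio / max_ratio play the role of the first / second preset ratio.
    """
    counts = Counter(t for t in tokens if t not in stop_words)
    total = sum(counts.values())
    # Drop words that occur too rarely or too often relative to the page.
    kept = {w: n for w, n in counts.items() if min_ratio <= n / total <= max_ratio}
    if not kept:
        return {}
    top = max(kept.values())
    return {w: n / top for w, n in kept.items()}


tokens = ["Highlander", "review", "Highlander", "the", "test", "Highlander", "review"]
page_interest = page_keyword_interest(tokens, stop_words={"the"}, min_ratio=0.2)
```

With min_ratio=0.2 the word "test" (1 of 6 remaining words) is treated as invalid, while "Highlander" and "review" are kept.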
When the interest model of the user ID is built according to the user ID's degree of interest in keywords, and the interest model of the webpage ID is built according to the occurrence counts of keywords in the webpage, the interest model of each user ID can be mapped to an N-dimensional vector V_K1, in which each dimension represents the user ID's degree of interest in one keyword; the interest model of each webpage ID is mapped to an N-dimensional vector V_K2, in which each dimension represents the webpage ID's degree of interest in one keyword; and the degree of association between the user ID and the webpage ID is determined by calculating the distance D_K between V_K1 and V_K2. Here, the distance between V_K1 and V_K2 can be calculated by a prior-art method, for example by calculating the cosine distance between the two.
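The vector comparison can be sketched with sparse dictionaries standing in for the N-dimensional vectors. The text speaks of a cosine distance but uses the result as a degree of association, so the sketch computes cosine similarity, where a larger value means a stronger association; that reading, and the function name, are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {dimension: weight}."""
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty model is treated as unrelated to everything
    return dot / (norm_u * norm_v)


user_vector = {"k1": 0.9, "k2": 0.8}   # user ID's interest per keyword
page_vector = {"k1": 0.9, "k3": 0.6}   # webpage ID's interest per keyword
d_k = cosine_similarity(user_vector, page_vector)
```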
Second, the interest model can be built according to the user ID's degree of interest in each type of keyword:
In this case, the interest model of the user ID includes only a keyword-type-related third interest item. The third interest item may include multiple third interest sub-items, where each third interest sub-item represents the user ID's interest in one type of keyword and may specifically include the keyword type and the user ID's degree of interest in the keyword type;
Aggregating the keyword information of each user ID and building the interest model of that user ID may specifically include: collecting all keywords queried by the user corresponding to the user ID and determining the type to which each keyword belongs; counting the number of webpage IDs the user clicked when querying each type of keyword, and determining the user ID's degree of interest in that type of keyword according to that number.
Correspondingly, the interest model can be built according to the occurrence count of each type of keyword in the webpage corresponding to the webpage ID:
In this case, the interest model of the webpage ID includes a keyword-type-related fourth interest item. The fourth interest item may include multiple fourth interest sub-items, where each fourth interest sub-item represents the webpage ID's interest in one type of keyword and may specifically include the keyword type and the webpage ID's degree of interest in the keyword type;
Aggregating the webpage IDs of all user IDs, obtaining the keyword information in the webpage corresponding to each webpage ID, and building the interest model of that webpage ID includes: performing word segmentation on the content of the webpage corresponding to the webpage ID, removing invalid words, determining the type to which each remaining keyword belongs, counting the occurrences of each type of keyword in the webpage, and determining the webpage ID's degree of interest in that type of keyword according to its occurrence count.
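The type-level counting, on either the user side or the webpage side, can be sketched as an aggregation of keyword counts by type. The keyword-to-type lookup table and the choice to express each type's interest as its share of the total are assumptions for illustration; the specification does not say how a keyword's type is determined.

```python
from collections import Counter


def type_interest(keyword_counts, keyword_type):
    """Aggregate {keyword: count} into {type: share of all typed counts}.

    keyword_type is a hypothetical {keyword: type} lookup; keywords without
    a known type are ignored.
    """
    per_type = Counter()
    for keyword, count in keyword_counts.items():
        kw_type = keyword_type.get(keyword)
        if kw_type is not None:
            per_type[kw_type] += count
    total = sum(per_type.values())
    return {t: n / total for t, n in per_type.items()} if total else {}


counts = {"Highlander": 3, "ix35": 1, "grilled fish": 1}
types = {"Highlander": "automobile", "ix35": "automobile", "grilled fish": "food"}
interest_by_type = type_interest(counts, types)
```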
When the interest model of the user ID is built according to the user ID's degree of interest in each type of keyword, and the interest model of the webpage ID is built according to the occurrence count of each type of keyword in the webpage, the interest model of each user ID can be mapped to an N-dimensional vector V_C1, in which each dimension represents the user ID's degree of interest in one type of keyword; the interest model of each webpage ID is mapped to an N-dimensional vector V_C2, in which each dimension represents the webpage ID's degree of interest in one type of keyword; and the degree of association between the user ID and the webpage ID is determined by calculating the distance D_C between V_C1 and V_C2. Here, the distance between V_C1 and V_C2 can be calculated by a prior-art method, for example by calculating the cosine distance between the two.
Finally, the interest model can be built according to the user ID's degree of interest in each keyword and in each type of keyword:
In this case, the interest model of the user ID includes the keyword-related first interest item and the keyword-type-related third interest item;
Aggregating the keyword information of each user ID and building the interest model of that user ID may specifically include: collecting all keywords queried by the user corresponding to the user ID and determining the type to which each keyword belongs; counting the number of webpage IDs the user clicked when querying each keyword, and determining the user ID's degree of interest in that keyword according to that number; counting the number of webpage IDs the user clicked when querying each type of keyword, and determining the user ID's degree of interest in that type of keyword according to that number.
An example of an interest model built according to the user ID's degree of interest in each keyword and in each type of keyword is as follows:
[Tiguan: 0.9 ix35: 0.8 braised pork in brown sauce: 0.6 grilled fish: 0.5] [automobile: 0.8 food: 0.2]
In the first bracket, "Tiguan", "ix35", "braised pork in brown sauce", and "grilled fish" are keywords, and the number after the colon following each keyword is the user ID's degree of interest in that keyword. In the second bracket, "automobile" and "food" are keyword types, and the number after the colon following each keyword type is the user ID's degree of interest in that type of keyword.
Correspondingly, the interest model can be built according to the occurrence counts of each keyword and each type of keyword in the webpage corresponding to the webpage ID:
In this case, the interest model of the webpage ID includes the keyword-related second interest item and the keyword-type-related fourth interest item;
Aggregating the webpage IDs of all user IDs, obtaining the keyword information in the webpage corresponding to each webpage ID, and building the interest model of that webpage ID includes: performing word segmentation on the content of the webpage corresponding to the webpage ID, removing invalid words, counting the occurrences of each remaining keyword in the webpage, and determining the webpage ID's degree of interest in the keyword according to its occurrence count; determining the type to which each keyword belongs, counting the occurrences of each type of keyword in the webpage, and determining the webpage ID's degree of interest in that type of keyword according to its occurrence count.
An example of an interest model built according to the occurrence counts of each keyword and each type of keyword in the webpage corresponding to the webpage ID is as follows:
[Tiguan: 0.9 review: 0.8 test drive: 0.6] [automobile: 0.8]
In the first bracket, "Tiguan", "review", and "test drive" are keywords, and the number after the colon following each keyword is the webpage ID's degree of interest in that keyword. In the second bracket, "automobile" is a keyword type, and the number after the colon following the keyword type is the webpage ID's degree of interest in that type of keyword.
When the interest model of the user ID is built according to the user ID's degree of interest in each keyword and each type of keyword, and the interest model of the webpage ID is built according to the occurrence counts of each keyword and each type of keyword in the webpage corresponding to the webpage ID, the user ID's degrees of interest in keywords and in keyword types in the interest model of each user ID can be mapped to N-dimensional vectors V_K1 and V_C1 respectively; the webpage ID's degrees of interest in keywords and in keyword types in the interest model of each webpage ID are mapped to N-dimensional vectors V_K2 and V_C2 respectively; the distance D_K between V_K1 and V_K2 and the distance D_C between V_C1 and V_C2 are calculated; and the degree of association between the user ID and the webpage ID is determined by a weighted calculation over D_K and D_C, for example using the following formula: D = a × D_K + (1 - a) × D_C, where D is the degree of association between the user ID and the webpage ID, and a is a preset real number greater than 0 and less than 1.
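The weighted combination D = a × D_K + (1 - a) × D_C can be written directly. The function name and the default value of a are illustrative assumptions; the only constraint taken from the text is that a lies strictly between 0 and 1.

```python
def combined_association(d_k, d_c, a=0.5):
    """Weighted degree of association D = a * D_K + (1 - a) * D_C."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must be greater than 0 and less than 1")
    return a * d_k + (1.0 - a) * d_c


# Keyword-level association 0.8, type-level association 0.4, weight a = 0.25.
d = combined_association(0.8, 0.4, a=0.25)
```

A larger a gives more weight to exact keyword matches; a smaller a favors the coarser keyword-type match.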
An example of calculated degrees of association between a user ID and webpage IDs is as follows:
User A -> webpage A: 0.9 -> webpage B: 0.7 -> webpage C: 0.3
In this example, the degrees of association between user A and webpage A, webpage B, and webpage C are 0.9, 0.7, and 0.3 respectively.
Step 103: when a user's click-search-result command is received and the user enters a wireless web search transcoding page, select a first preset number of webpage IDs in descending order of their degree of association with the user ID, and recommend the webpage corresponding to each selected webpage ID in the transcoding page.
In this step, when a user wants to view a search result, the user clicks that search result. The background server thereby receives the click-search-result command, provides the user with the wireless web search transcoding page requested by the command, and performs webpage recommendation in that transcoding page. Specifically, according to the degrees of association between the user ID and webpage IDs established in step 102, the webpage IDs can be sorted by their degree of association with the user ID, the first preset number of top-ranked webpage IDs selected, and the webpage corresponding to each selected webpage ID recommended in the transcoding page. Recommending in the transcoding page the first preset number of webpages most highly associated with the user ID presents the user with the webpages the user is most likely to be interested in, so the user can find webpages of interest without choosing another query string and searching again, and can thus quickly reach the target webpage. For example, when user A clicks a search result and enters the wireless web search transcoding page, the webpages corresponding to the two webpage IDs most highly associated with user A, namely webpage A and webpage B, can be recommended in the transcoding page.
Here, recommending a webpage in the transcoding page actually means placing the link address of the webpage in the transcoding page, so that the user can enter the recommended webpage of interest by clicking the link address.
In practice, after a user clicks a search result and enters the wireless web search transcoding page, if the user is interested in the content of the transcoding page, the user's degree of interest in other webpages with similar content is also likely to be relatively high. Other webpages whose content is similar to that of the transcoding page can therefore also be recommended to the user ID.
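The selection in step 103 amounts to a top-N pick by degree of association. The sketch below assumes the associations for one user ID are already available as a dictionary; the function name is hypothetical.

```python
def recommend_pages(associations, n):
    """Return the n webpage IDs with the highest degree of association.

    associations: {page_id: degree of association with the user ID}.
    """
    ranked = sorted(associations.items(), key=lambda item: item[1], reverse=True)
    return [page_id for page_id, _degree in ranked[:n]]


user_a = {"webpage A": 0.9, "webpage B": 0.7, "webpage C": 0.3}
recommended = recommend_pages(user_a, 2)
```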
In fact, a user's interests are time-bound: within a given period the user is often interested in only one kind of content. For example, a user who is currently interested in automobile-related content searches only for automobile-related content. The searches and search-result clicks of a user ID within a period of time therefore characterize the interests of that user ID during that period. In addition, if two user IDs have searched for the same keyword, they can be considered similar in interest; for example, if user A and user B have both searched for "Tiguan", both can be considered interested in the Tiguan automobile. If the two users have also clicked the same webpage ID, they can likewise be considered similar in interest. If two users are similar in interest, the webpage IDs they click can also be considered to have a certain association.
For example, the search and click behavior of user A within the preset time is shown in Table 1:
Table 1
The search and click behavior of user B within the preset time is shown in Table 2:
Table 2
Here, user A and user B both searched for "Tiguan" and both clicked the webpages whose webpage IDs are 1234 and 2345. User A and user B are therefore similar in interest, and the webpage IDs they clicked are also associated.
Because the webpage IDs clicked by user A and user B are associated, webpage clustering can be performed on them. Different strategies can be used. For example, the webpage IDs clicked by user A and by user B can be clustered separately, giving the cluster [1234 2345 3456 4567 5678 6789] formed by the webpage IDs user A clicked and the cluster [1234 2345 7890 8901] formed by the webpage IDs user B clicked; all webpage IDs clicked by user A and user B can be clustered together, giving [1234 2345 3456 4567 5678 6789 7890 8901]; or the webpage IDs clicked by both user A and user B can be clustered, giving [1234 2345].
Of the webpage clusters obtained by these three methods, the cluster obtained by the last method has the highest degree of association among its webpage IDs. The degree of association between webpage IDs can therefore be determined based on users' search and click behavior. If two user IDs have searched for the same keyword and clicked the same webpage IDs, the webpage IDs clicked by both users have a relatively high degree of association. In the example above, user A and user B both clicked the webpages whose webpage IDs are 1234 and 2345, so the two webpages with webpage IDs 1234 and 2345 have a relatively high degree of association.
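The three clustering strategies correspond to plain set operations on the clicked webpage IDs. The variable names below are illustrative; the webpage IDs are the ones given in the text.

```python
# Webpage IDs clicked by each user within the preset time.
clicked_a = {1234, 2345, 3456, 4567, 5678, 6789}
clicked_b = {1234, 2345, 7890, 8901}

# Strategy 1: one cluster per user.
per_user_clusters = [sorted(clicked_a), sorted(clicked_b)]

# Strategy 2: everything either user clicked (set union).
all_clicked = sorted(clicked_a | clicked_b)

# Strategy 3: only the webpages both users clicked (set intersection).
common_clicked = sorted(clicked_a & clicked_b)
```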
Therefore, the embodiment of the present invention shown in Fig. 1 may further include: collecting each keyword queried within a preset time by the users corresponding to all user IDs and, for each user who queried the keyword, establishing an association relation between every two webpage IDs clicked by that user; aggregating the webpage IDs clicked within the preset time by all users who searched for the keyword and, for each webpage ID, counting the occurrences of each association relation involving that webpage ID, and determining, according to those occurrence counts, the degree of association between that webpage ID and each webpage ID with which it has an association relation. In this way, when a user's click-search-result command is received and the user enters the wireless web search transcoding page, a second preset number of webpage IDs can additionally be selected in descending order of their degree of association with the transcoding page, and the webpage corresponding to each selected webpage ID recommended in the transcoding page.
The method for determining the degree of association between webpage IDs is illustrated below. Suppose user A clicked webpages 1, 2, and 3 within the preset time; this indicates that webpages 1, 2, and 3 are associated, so a group of association relations [1-2], [2-3], [1-3] can be generated. Suppose user B clicked webpages 1, 2, and 4 within the preset time; this indicates that webpages 1, 2, and 4 are associated, so another group of association relations [1-2], [2-4], [1-4] can be generated. For webpage 1, the occurrences of its association relations [1-2], [1-3], and [1-4] can then be counted: [1-2] occurs twice, and [1-3] and [1-4] each occur once. Webpage 1 can therefore be considered more highly associated with webpage 2 than with webpage 3 or webpage 4. When a user clicks a search result and enters webpage 1, webpage 2 can be recommended to the user in webpage 1.
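The pairwise counting in this example can be sketched as follows. The input is assumed to be already restricted to one keyword and one preset time window, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_associations(clicks_by_user):
    """Count how many users clicked each unordered pair of webpage IDs.

    clicks_by_user: {user_id: iterable of webpage IDs clicked for one keyword}.
    """
    pair_counts = Counter()
    for pages in clicks_by_user.values():
        for a, b in combinations(sorted(set(pages)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def related_pages(pair_counts, page_id):
    """List (other_page, count) for page_id, most frequently co-clicked first."""
    related = Counter()
    for (a, b), n in pair_counts.items():
        if a == page_id:
            related[b] += n
        elif b == page_id:
            related[a] += n
    return related.most_common()


pairs = count_associations({"user A": [1, 2, 3], "user B": [1, 2, 4]})
best_for_page_1 = related_pages(pairs, 1)[0]
```

Here the pair (1, 2) is counted twice, so webpage 2 ranks first among the webpages associated with webpage 1.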
In practice, the webpage IDs in the search results for the same keyword are also associated with one another, and the degree of association varies with the content of the webpage corresponding to each webpage ID. For example, the search results for the keyword "Highlander" (format: search rank, title, webpage ID) include the following:
1: Highlander authoritative review: 123
2: Highlander vs. Odyssey vs. Mazda 8: 234
3: "Toyota Highlander" latest pictures and prices: 345
4: Highlander, combined discount of 15,000 yuan, lowest price 288,800 yuan: 456
5: Highlander pros and cons _ Auto China Highlander reviews: 567
Here, the webpages with webpage IDs 123, 234, and 567 focus mainly on reviews of the Highlander, and the degree of association among these three webpages is relatively high, whereas the webpages with webpage IDs 345 and 456 focus mainly on Highlander prices, and the degree of association between these two webpages is relatively high. Webpage clustering can therefore also be performed on the search results for the same keyword, the degree of association between webpage IDs in the same webpage cluster being relatively high. In the above example, webpage clustering yields the webpage clusters [123 234 567] and [345 456] for the keyword "Highlander".
In fact, when a user queries information with a search engine, the keyword searched and the corresponding search results are usually logged; for example, the keyword and its search result list are recorded in a search result display log, where the search result list may include one or more webpage IDs. The search result list for each keyword can thus be obtained from the search result display log. Alternatively, the search result list returned by the search engine can be obtained directly when the keyword is searched.
After the search result list for a keyword is obtained, the content of the webpage corresponding to each webpage ID in the list can be analyzed to obtain the keyword information in that webpage and generate the feature vector corresponding to the webpage ID; one or more webpage clusters for the keyword can then be generated according to the feature vectors of all webpage IDs in the keyword's search result list. In this way, when a user's click-search-result command is received and the user enters the wireless web search transcoding page, the webpage cluster containing the transcoding page can be looked up among the one or more webpage clusters for the keyword the user queried, a third preset number of webpage IDs selected from the cluster found, and the webpage corresponding to each selected webpage ID recommended in the transcoding page.
The above method of obtaining the keyword information in the webpage corresponding to a webpage ID and generating the feature vector corresponding to the webpage ID may specifically be: performing word segmentation on the content of the webpage, removing invalid words, counting the occurrences of each remaining keyword in the webpage, and generating the feature vector corresponding to the webpage ID according to the occurrence count of each keyword in the webpage. Here, the content of the webpage may be the title and summary of the webpage, or the title and body text of the webpage. If every word is used as a dimension, the resulting feature vector may be large; a dimensionality reduction technique can be used to convert the high-dimensional vector into a low-dimensional one.
In addition, the method of generating one or more webpage clusters for a keyword according to the feature vectors of all webpage IDs in the keyword's search result list may specifically be: clustering the feature vectors of all webpage IDs in the search result list of each keyword using the K-nearest neighbor (KNN) classification algorithm.
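Feature vector generation over a shared vocabulary can be sketched as below. The input is assumed to be the keywords left in each page after segmentation and invalid-word removal; dimensionality reduction is omitted, and the function name is hypothetical.

```python
from collections import Counter


def feature_vectors(pages):
    """Turn {page_id: remaining keywords} into count vectors over one vocabulary.

    Returns (vocabulary, {page_id: vector}), where vector[i] is the number of
    times vocabulary[i] occurs in that page.
    """
    vocabulary = sorted({word for words in pages.values() for word in words})
    vectors = {}
    for page_id, words in pages.items():
        counts = Counter(words)
        vectors[page_id] = [counts[word] for word in vocabulary]
    return vocabulary, vectors


vocabulary, vectors = feature_vectors({
    123: ["Highlander", "review", "review"],
    345: ["Highlander", "price"],
})
```

Because every page is projected onto the same vocabulary, the resulting vectors have equal length and can be compared or clustered directly.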
The webpage recommendation method of the embodiment of the present invention has been described in detail above. The present invention also provides a webpage recommendation apparatus, which enables a user to quickly find a target webpage.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of the webpage recommendation apparatus according to an embodiment of the present invention. The apparatus comprises a log acquisition unit 201, a first analysis unit 202, and a recommendation unit 203, wherein:
The log acquisition unit 201 is configured to obtain a click query log, where the click query log comprises a user ID, a keyword, and a webpage ID;
The first analysis unit 202 is configured to aggregate the keyword information of each user ID in the click query log and build the interest model of that user ID; aggregate the webpage IDs of all user IDs in the click query log, obtain the keyword information in the webpage corresponding to each webpage ID, and build the interest model of that webpage ID; and determine the degree of association between a user ID and a webpage ID according to the interest model of the user ID and the interest model of the webpage ID;
The recommendation unit 203 is configured to, when a user's click-search-result command is received and the user enters a wireless web search transcoding page, select a first preset number of webpage IDs in descending order of their degree of association with the user ID, and recommend the webpage corresponding to each selected webpage ID in the transcoding page.
In the above apparatus,
The interest model of the user ID comprises a first interest item, the first interest item comprises multiple first interest sub-items, and each first interest sub-item comprises a keyword and the user ID's degree of interest in the keyword;
When aggregating the keyword information of each user ID in the click query log and building the interest model of that user ID, the first analysis unit 202 is configured to: collect all keywords queried by the user corresponding to the user ID, count the number of webpage IDs the user clicked when querying each keyword, and determine the user ID's degree of interest in that keyword according to that number;
The interest model of the webpage ID comprises a second interest item, the second interest item comprises multiple second interest sub-items, and each second interest sub-item comprises a keyword and the webpage ID's degree of interest in the keyword;
When aggregating the webpage IDs of all user IDs in the click query log, obtaining the keyword information in the webpage corresponding to each webpage ID, and building the interest model of that webpage ID, the first analysis unit 202 is configured to: perform word segmentation on the content of the webpage corresponding to the webpage ID, remove invalid words, count the occurrences of each remaining keyword in the webpage, and determine the webpage ID's degree of interest in the keyword according to its occurrence count.
When determining the degree of association between a user ID and a webpage ID according to the interest model of the user ID and the interest model of the webpage ID, the first analysis unit 202 is configured to:
Generate an N-dimensional vector V_K1 according to the user ID's degree of interest in each keyword in the interest model of each user ID;
Generate an N-dimensional vector V_K2 according to the webpage ID's degree of interest in each keyword in the interest model of each webpage ID;
Calculate the distance D_K between the N-dimensional vectors V_K1 and V_K2, and take D_K as the degree of association between the user ID and the webpage ID.
In the above apparatus,
The interest model of the user ID comprises a third interest item, the third interest item comprises a plurality of third interest sub-items, and each third interest sub-item comprises a keyword type and the user ID's degree of interest in that keyword type;
The first analytic unit 202, when gathering the keyword information of each user ID in the click inquiry log and establishing the interest model of that user ID, is specifically configured to: gather all keywords queried by the user corresponding to the user ID and determine the type to which each keyword belongs; count the number of webpage IDs the user clicked when querying each class of keywords, and determine the user ID's degree of interest in that class of keywords according to the number of webpage IDs clicked;
The interest model of the webpage ID comprises a fourth interest item, the fourth interest item comprises a plurality of fourth interest sub-items, and each fourth interest sub-item comprises a keyword type and the webpage ID's degree of interest in that keyword type;
The first analytic unit 202, when gathering the webpage IDs clicked by all user IDs in the click inquiry log, obtaining the keyword information in the webpage corresponding to each webpage ID, and establishing the interest model of that webpage ID, is configured to: perform word segmentation on the content of the webpage corresponding to the webpage ID, remove invalid words, determine the type to which each remaining keyword belongs, count the number of occurrences of each class of keywords in the webpage, and determine the webpage ID's degree of interest in each class of keywords according to that class's number of occurrences.
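As an illustration of the category-level user interest model described above, the following Python sketch counts clicks per keyword type for each user. The function name user_category_interest, the keyword-to-type mapping passed in as a dictionary, the "unknown" fallback type, and normalization by the user's total clicks are assumptions; the source does not say how keyword types are determined or how counts are turned into a degree of interest.

```python
from collections import Counter, defaultdict

def user_category_interest(click_log, keyword_type):
    """Build {user_id: {keyword type: degree of interest}}.

    click_log: iterable of (user_id, keyword, page_id) records.
    keyword_type: hypothetical mapping from keyword to its type.
    """
    clicks = defaultdict(Counter)
    for user_id, keyword, _page_id in click_log:
        # Each log record represents one clicked webpage ID.
        category = keyword_type.get(keyword, "unknown")
        clicks[user_id][category] += 1
    model = {}
    for user_id, counts in clicks.items():
        total = sum(counts.values())
        # Assumption: degree of interest = share of the user's clicks.
        model[user_id] = {c: n / total for c, n in counts.items()}
    return model
```

For example, a user with two clicks under a sports keyword and one click under a technology keyword would get degrees of interest of 2/3 and 1/3 under this normalization.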
The first analytic unit 202, when determining the degree of association between a user ID and a webpage ID according to the interest model of the user ID and the interest model of the webpage ID, is configured to:
generate an N-dimensional vector $V_{C1}$ from the user ID's degree of interest in each class of keywords, as recorded in the interest model of each user ID;
generate an N-dimensional vector $V_{C2}$ from the webpage ID's degree of interest in each class of keywords, as recorded in the interest model of each webpage ID;
calculate the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and take $D_C$ as the degree of association between the user ID and the webpage ID.
In the above apparatus,
The interest model of the user ID comprises a first interest item and a third interest item; the first interest item comprises a plurality of first interest sub-items, and each first interest sub-item comprises a keyword and the user ID's degree of interest in that keyword; the third interest item comprises a plurality of third interest sub-items, and each third interest sub-item comprises a keyword type and the user ID's degree of interest in that keyword type;
The first analytic unit 202, when gathering the keyword information of each user ID in the click inquiry log and establishing the interest model of that user ID, is configured to: gather all keywords queried by the user corresponding to the user ID and determine the type to which each keyword belongs; count the number of webpage IDs the user clicked when querying each keyword, and determine the user ID's degree of interest in that keyword according to the number of webpage IDs clicked; count the number of webpage IDs the user clicked when querying each class of keywords, and determine the user ID's degree of interest in that class of keywords according to the number of webpage IDs clicked;
The interest model of the webpage ID comprises a second interest item and a fourth interest item; the second interest item comprises a plurality of second interest sub-items, and each second interest sub-item comprises a keyword and the webpage ID's degree of interest in that keyword; the fourth interest item comprises a plurality of fourth interest sub-items, and each fourth interest sub-item comprises a keyword type and the webpage ID's degree of interest in that keyword type;
The first analytic unit 202, when gathering the webpage IDs clicked by all user IDs in the click inquiry log, obtaining the keyword information in the webpage corresponding to each webpage ID, and establishing the interest model of that webpage ID, is configured to: perform word segmentation on the content of the webpage corresponding to the webpage ID, remove invalid words, count the number of occurrences of each remaining keyword in the webpage, and determine the webpage ID's degree of interest in each keyword according to that keyword's number of occurrences; determine the type to which each keyword belongs, count the number of occurrences of each class of keywords in the webpage, and determine the webpage ID's degree of interest in each class of keywords according to that class's number of occurrences.
The first analytic unit 202, when determining the degree of association between a user ID and a webpage ID according to the interest model of the user ID and the interest model of the webpage ID, is configured to:
generate an N-dimensional vector $V_{K1}$ from the user ID's degree of interest in each keyword, and an N-dimensional vector $V_{C1}$ from the user ID's degree of interest in each class of keywords, as recorded in the interest model of each user ID;
generate an N-dimensional vector $V_{K2}$ from the webpage ID's degree of interest in each keyword, and an N-dimensional vector $V_{C2}$ from the webpage ID's degree of interest in each class of keywords, as recorded in the interest model of each webpage ID;
calculate the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$ and the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and perform a weighted calculation on $D_K$ and $D_C$ to obtain the degree of association between the user ID and the webpage ID.
The first analytic unit 202 uses the following formula to perform the weighted calculation on $D_K$ and $D_C$ and obtain the degree of association between the user ID and the webpage ID:
$D = a \cdot D_K + (1 - a) \cdot D_C$, where $D$ is the degree of association between the user ID and the webpage ID, and $a$ is a preset real number greater than 0 and less than 1.
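The weighted combination of the keyword-level and category-level distances can be written as a short Python function. The function name combined_association and the default value 0.5 for the weight are assumptions for illustration; the source only requires the weight to be a preset real number strictly between 0 and 1.

```python
def combined_association(d_k, d_c, a=0.5):
    """Return D = a * D_K + (1 - a) * D_C.

    d_k: distance between the keyword-level vectors (D_K).
    d_c: distance between the category-level vectors (D_C).
    a:   preset weight; 0.5 is only an illustrative default.
    """
    if not 0.0 < a < 1.0:
        raise ValueError("a must be greater than 0 and less than 1")
    return a * d_k + (1.0 - a) * d_c
```

A larger weight makes the exact keywords dominate the result, while a smaller weight gives more influence to the broader keyword types.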
In addition, the apparatus further comprises a second analytic unit 204;
The second analytic unit 204 is configured to: gather each keyword queried within a preset time by the users corresponding to all user IDs in the click inquiry log; for each user who queried the keyword, establish an association between every two webpage IDs clicked by that user; gather the webpage IDs clicked within the preset time by all users who queried the keyword; and, for each webpage ID, count the number of occurrences of each association involving that webpage ID and determine, according to those counts, the degree of association between that webpage ID and each webpage ID associated with it;
The recommendation unit 203, upon receiving a user's command to click a search result and enter a wireless web search transcoding page, is further configured to select a second preset number of webpage IDs in descending order of their degree of association with the transcoding page, and to recommend the webpages corresponding to the selected webpage IDs on the transcoding page.
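The co-click association and the selection of the most strongly associated webpages can be sketched in Python as below. The function names co_click_associations and recommend_related are hypothetical, the log is assumed to have been filtered to the preset time window already, the counts are summed across keywords, and the raw co-click count is used directly as the degree of association; none of these details is fixed by the source.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_click_associations(click_log):
    """Count how often two webpage IDs are clicked by the same user
    for the same keyword.

    click_log: iterable of (user_id, keyword, page_id) records, assumed
    to be restricted to the preset time window beforehand.
    """
    clicked = defaultdict(set)
    for user_id, keyword, page_id in click_log:
        clicked[(keyword, user_id)].add(page_id)
    pair_counts = defaultdict(Counter)
    for pages in clicked.values():
        # Associate every two webpage IDs clicked by this user.
        for p, q in combinations(sorted(pages), 2):
            pair_counts[p][q] += 1
            pair_counts[q][p] += 1
    return pair_counts

def recommend_related(pair_counts, page_id, n):
    """Return up to n webpage IDs most associated with page_id."""
    related = pair_counts.get(page_id, Counter())
    return [p for p, _count in related.most_common(n)]
```

Here n plays the role of the second preset number, and page_id is the webpage ID of the transcoding page the user has entered.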
In addition, the apparatus further comprises a third analytic unit 205;
The log acquisition unit 201 is further configured to obtain the search result list corresponding to each keyword;
The third analytic unit 205 is configured to: for each webpage ID in the search result list corresponding to each keyword, obtain the keyword information in the webpage corresponding to that webpage ID and generate the feature vector corresponding to that webpage ID; and generate one or more webpage clusters corresponding to the keyword according to the feature vectors of all webpage IDs in the search result list corresponding to the keyword;
The recommendation unit 203, upon receiving a user's command to click a search result and enter a wireless web search transcoding page, is further configured to: search the one or more webpage clusters corresponding to the keyword the user searched for, find the webpage cluster containing the transcoding page, select a third preset number of webpage IDs from the cluster found, and recommend the webpages corresponding to the selected webpage IDs on the transcoding page.
The third analytic unit 205, when obtaining the keyword information in the webpage corresponding to a webpage ID and generating the feature vector corresponding to that webpage ID, is configured to: perform word segmentation on the content of the webpage corresponding to the webpage ID, remove invalid words, count the number of occurrences of each remaining keyword in the webpage, and generate the feature vector corresponding to the webpage ID according to the number of occurrences of each keyword in the webpage;
The third analytic unit 205, when generating the one or more webpage clusters corresponding to a keyword according to the feature vectors of all webpage IDs in the search result list corresponding to that keyword, is configured to: cluster the feature vectors of all webpage IDs in the search result list corresponding to the keyword using the K-nearest-neighbor (KNN) classification algorithm.
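The grouping of search results by their keyword-count feature vectors can be illustrated with the sketch below. It deliberately does not reproduce the KNN-based procedure named in the source, whose details are not given; instead it uses a simple greedy grouping in which each webpage joins the existing cluster whose first member is most similar by cosine similarity, or starts a new cluster. The function names cosine and group_pages and the threshold value 0.5 are assumptions.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def group_pages(features, threshold=0.5):
    """Greedy stand-in for the clustering step.

    features: {page_id: Counter of keyword occurrences}.
    Returns a list of clusters, each a list of webpage IDs.
    """
    clusters = []  # the first member of each cluster is its representative
    for page_id in sorted(features):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(features[page_id], features[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([page_id])
        else:
            best.append(page_id)
    return clusters
```

Given the clusters for a keyword, the recommendation step would look up the cluster containing the transcoding page and pick the third preset number of other webpage IDs from it.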
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.