CN103324645A - Method and device for recommending webpage - Google Patents

Publication number
CN103324645A
Authority
CN
China
Prior art keywords
webpage
user
interest
keyword
ids
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100808315A
Other languages
Chinese (zh)
Other versions
CN103324645B (en
Inventor
王犇
何军
杨志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201210080831.5A priority Critical patent/CN103324645B/en
Publication of CN103324645A publication Critical patent/CN103324645A/en
Application granted granted Critical
Publication of CN103324645B publication Critical patent/CN103324645B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for recommending a webpage. The method comprises the following steps: a click query log is acquired, the click query log comprising user IDs (identifiers), keywords and webpage IDs; the keyword information of each user ID is gathered, and an interest model of the user ID is established; the webpage IDs of all the user IDs are gathered, keyword information in the webpage corresponding to each webpage ID is acquired, and an interest model of the webpage ID is established; the association degrees between the user IDs and the webpage IDs are determined according to the interest models of the user IDs and the interest models of the webpage IDs; when a user's command to click a search result is received and a wireless web search transcoding page is entered, a first preset number of webpage IDs are selected in descending order of their association degree with the user ID, and the webpage corresponding to each selected webpage ID is recommended in the transcoding page. With the method and the device, the target webpage can be found rapidly.

Description

Webpage recommendation method and device
Technical Field
The invention relates to the technical field of data mining, in particular to a webpage recommendation method and device.
Background
With the growth in the number of mobile internet users, more and more searches are performed from mobile phone terminals. To help users find the information they need, a wireless search engine generally displays, in the wireless web page search transcoding page that the user has clicked into, some keywords related to that web page, or keywords related to the current query string, for the user to click and search.
However, providing related keywords for the user to click while browsing a wireless web page search transcoding page essentially only narrows the search range, improves search accuracy and helps the user obtain better search results. The user still has to select a new query string, search again and review the new search results before finding the web pages of interest, so the intermediate process is long.
Disclosure of Invention
In view of this, the present invention provides a web page recommendation method, which can quickly find a target web page.
In order to achieve the above object, the present invention provides a web page recommendation method, including:
acquiring a click query log, wherein the click query log comprises a user ID, a keyword and a webpage ID;
summarizing the keyword information of each user ID, and establishing an interest model of the user ID; summarizing the webpage IDs of all user IDs, acquiring keyword information in a webpage corresponding to each webpage ID, and establishing an interest model of the webpage ID; determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID;
when a click search result command of a user is received and the wireless webpage search transcoding page is entered, a first preset number of webpage IDs are selected according to the sequence from high to low of the association degree with the user ID, and the webpage corresponding to each selected webpage ID is recommended in the transcoding page.
The invention also provides a webpage recommendation device, which comprises: a log acquiring unit 201, a first analysis unit and a recommending unit;
the log acquiring unit is used for acquiring a click query log, and the click query log comprises a user ID, a keyword and a webpage ID;
the first analysis unit is used for summarizing and clicking the keyword information of each user ID in the query log and establishing an interest model of the user ID; summarizing and clicking the webpage IDs of all user IDs in the query log, acquiring keyword information in a webpage corresponding to each webpage ID, and establishing an interest model of the webpage ID; determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID;
and the recommending unit is used for selecting a first preset number of webpage IDs according to the sequence from high to low of the association degree with the user ID when a click search result command of the user is received to enter a wireless webpage search transcoding page, and recommending the webpage corresponding to each selected webpage ID in the transcoding page.
According to the technical scheme, the click query logs are analyzed, the interest model of the user ID and the interest model of the webpage ID are established, the association degree of the user ID and the webpage ID is established according to the interest model of the user ID and the interest model of the webpage ID, therefore, when a user clicks a search result to enter a wireless search transcoding page, the webpage IDs with the first preset number are selected according to the sequence of the association degree with the user ID from high to low, and the webpage corresponding to each selected webpage ID is recommended in the transcoding page, so that the user can quickly find a target webpage.
Drawings
FIG. 1 is a flowchart of a web page recommendation method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a web page recommendation device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly apparent, the technical solutions of the present invention are described in detail below with reference to the accompanying drawings and examples.
Referring to fig. 1, fig. 1 is a flowchart of a web page recommendation method according to an embodiment of the present invention, including the following steps:
step 101, obtaining a click query log, wherein the click query log comprises a user ID, a keyword and a webpage ID.
The click query log is the record of a user's search behavior when the user queries information with a search engine, and may include information such as a user identifier (ID), a keyword, and a web page identifier (ID). Each time the user clicks a search result, one click query log entry may be recorded. For example, user A searches for "Tiguan" and the search engine returns a plurality of search results; if user A clicks the web page whose web page ID is 1234, a click query log entry such as the following may be recorded: user A, Tiguan, 1234. In practical applications, when a user queries information using a search engine, the service provider of the search service typically logs the user's search behavior. Here, the keyword is the keyword that the user queried in the search engine, and the web page ID is the ID of the web page that the user clicked in the search results for that keyword; each web page has a unique web page ID.
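A log entry of this kind can be sketched as a small record type. This is only an illustration: the comma-separated line format and the names `ClickRecord` and `parse_click_line` are assumptions, since the description only states which three fields an entry contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickRecord:
    """One click query log entry: who searched, for what, and which page was clicked."""
    user_id: str
    keyword: str
    page_id: str


def parse_click_line(line: str) -> ClickRecord:
    # Assumed format: "user_id, keyword, page_id" (the separator is hypothetical).
    user_id, keyword, page_id = (field.strip() for field in line.split(","))
    return ClickRecord(user_id, keyword, page_id)


record = parse_click_line("user A, ix35, 1234")
```

Keeping the three fields together in one immutable record makes the later aggregation steps (per user ID and per web page ID) straightforward to express.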
Step 102, summarizing the keyword information of each user ID, and establishing an interest model of the user ID; summarizing the webpage IDs of all user IDs, acquiring keyword information in a webpage corresponding to each webpage ID, and establishing an interest model of the webpage ID; and determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID.
In fact, if a user is interested in a search result corresponding to a certain keyword, it can be stated to some extent that the user is also interested in the keyword, and therefore, the number of clicks of the user on the search result under each keyword can be used as an index for measuring the degree of interest of the user on the keyword. If the user is interested in the search results under a certain category of keywords, the user can be shown to be interested in the category of keywords to a certain extent, and therefore, the number of clicks of the search results corresponding to each category of keywords by the user can be used as an index for measuring the interest degree of the user ID in the category of keywords. Therefore, an interest model can be established according to the interest degree of the user ID for each keyword, the interest model can also be established according to the interest degree of the user ID for each type of keyword, and the interest model of the user ID can also be established by integrating the interest degree of the user ID for each keyword and the interest degree of each type of keyword.
Similarly, if a keyword appears in a web page corresponding to a web page ID many times, it can be described to some extent that the content of the web page corresponding to the web page ID may be related to the keyword, and therefore, the number of occurrences of each keyword in the web page corresponding to the web page ID can be used as an index for measuring the interest degree of the web page ID in the keyword. Similarly, if a certain type of keyword appears in the web page ID for many times, it can be described to a certain extent that the content of the web page corresponding to the web page ID may be relatively related to the type of keyword, and therefore, the appearance frequency of each type of keyword in the web page corresponding to the web page ID can be used as an index for measuring the interest degree of the web page ID in the type of keyword. Therefore, the web page ID interest model can be established according to the occurrence frequency of each keyword in the web page corresponding to the web page ID, the web page ID interest model can also be established according to the occurrence frequency of each type of keyword in the web page corresponding to the web page ID, and the web page ID interest model can also be established by integrating the occurrence frequency of each keyword in the web page corresponding to the web page ID and the occurrence frequency of each type of keyword.
The following describes the methods for establishing an interest model of a user ID and an interest model of a web page ID, respectively:
first, an interest model can be established according to the interest degree of each keyword by the user ID:
in this case, the interest model of the user ID includes only the first interest item related to the keyword, and the first interest item may include a plurality of first interest sub-items, where each first interest sub-item represents the interest of the user ID in one keyword, and the specific content may include the keyword and the interest level of the user ID in the keyword;
the summarizing of the keyword information of each user ID and establishing the interest model of the user ID may specifically include: summarizing all the keywords inquired by the user corresponding to the user ID, counting the number of the clicked webpage IDs when the user inquires each keyword, and determining the interest degree of the user ID for the keyword according to the number of the clicked webpage IDs.
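The per-keyword user model described above might be sketched as follows. The description does not say how a click count is converted into an interest degree, so the scaling used here (dividing by the user's largest count) is an assumption, as are the function name and the choice to count distinct web page IDs.

```python
from collections import defaultdict


def build_user_keyword_model(click_log):
    """Map each user ID to {keyword: interest degree}.

    click_log is an iterable of (user_id, keyword, page_id) tuples.
    """
    clicked = defaultdict(lambda: defaultdict(set))
    for user_id, keyword, page_id in click_log:
        # Collect the distinct page IDs clicked under each keyword.
        clicked[user_id][keyword].add(page_id)

    model = {}
    for user_id, per_keyword in clicked.items():
        counts = {kw: len(pages) for kw, pages in per_keyword.items()}
        top = max(counts.values())
        # Assumption: scale by the user's largest count so degrees fall in (0, 1].
        model[user_id] = {kw: n / top for kw, n in counts.items()}
    return model


log = [
    ("user A", "ix35", "1234"),
    ("user A", "ix35", "2345"),
    ("user A", "braised pork in brown sauce", "3456"),
]
model = build_user_keyword_model(log)
```

In this toy log, user A clicked two pages under "ix35" and one under "braised pork in brown sauce", so the resulting degrees are 1.0 and 0.5 respectively.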
Correspondingly, an interest model can be established according to the occurrence frequency of each keyword in the webpage corresponding to the webpage ID:
in this case, the interest model of the web page ID includes a second interest item related to the keyword, where the second interest item may include a plurality of second interest sub-items, where each second interest sub-item represents an interest of the web page ID in one keyword, and the specific content may include the keyword and an interest level of the web page ID in the keyword;
the summarizing of the webpage IDs of all the user IDs, the obtaining of the keyword information of the webpage corresponding to each webpage ID, and the establishing of the interest model of the webpage ID comprise the following steps: segmenting the content of the webpage corresponding to the webpage ID into words, removing invalid words, counting the occurrence frequency of each remaining keyword in the webpage, and determining the interest degree of the webpage ID in the keyword according to the occurrence frequency of the keyword.
It should be noted that the invalid words described in this document may specifically include prepositions, adverbs, interjections, adjectives, and words whose occurrence ratio is less than a first preset ratio and/or greater than a second preset ratio (i.e., words that occur too rarely or too often in the web page corresponding to the web page ID are regarded as invalid words), where the first preset ratio is less than the second preset ratio. The content of the web page corresponding to the web page ID may specifically include information such as the title and summary of the web page, or information such as the title and body text of the web page.
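The web page side of the model (word segmentation, invalid-word removal, occurrence counting) can be sketched as below. Several points are assumptions made for illustration: whitespace splitting stands in for a real word segmenter, part-of-speech filtering is reduced to a caller-supplied `stop_words` set, the two preset ratios are interpreted as a word's share of the remaining tokens, and the raw count is used directly as the interest degree.

```python
from collections import Counter


def build_page_keyword_model(text, stop_words, min_ratio=0.0, max_ratio=1.0):
    """Return {keyword: occurrence count} for one web page's content."""
    # Assumption: whitespace tokenization stands in for real word segmentation.
    tokens = [t for t in text.lower().split() if t not in stop_words]
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    # Drop words whose share of the page is below min_ratio or above max_ratio.
    return {w: n for w, n in counts.items() if min_ratio <= n / total <= max_ratio}


page_model = build_page_keyword_model(
    "the ix35 review and the ix35 test drive", stop_words={"the", "and"}
)
```

With the default ratios nothing beyond the stop words is filtered; tightening `max_ratio` removes words that dominate the page, which matches the idea that overly frequent words carry little information.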
When the interest model of the user ID is established according to the interest degree of the user ID in each keyword, and the interest model of the web page ID is established according to the number of occurrences of each keyword in the web page corresponding to the web page ID, the interest model of each user ID can be mapped to an N-dimensional vector $V_{K1}$, in which each dimension represents the interest degree of the user ID in one keyword, and the interest model of each web page ID can be mapped to an N-dimensional vector $V_{K2}$, in which each dimension represents the interest degree of the web page ID in one keyword. The association degree between the user ID and the web page ID is then determined by calculating the distance $D_K$ between $V_{K1}$ and $V_{K2}$. Here, the distance between $V_{K1}$ and $V_{K2}$ can be calculated using existing methods, for example, the cosine distance between the two vectors.
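A minimal sketch of the cosine computation between a user interest vector and a web page interest vector follows. It stores each vector sparsely as a dictionary keyed by keyword, which is an implementation assumption, and it returns the cosine similarity (larger means more closely associated) rather than a distance; the description leaves the exact convention open.

```python
import math


def cosine_similarity(user_model, page_model):
    """Cosine of the angle between two sparse interest vectors (dicts)."""
    shared = set(user_model) & set(page_model)
    dot = sum(user_model[k] * page_model[k] for k in shared)
    norm_u = math.sqrt(sum(v * v for v in user_model.values()))
    norm_p = math.sqrt(sum(v * v for v in page_model.values()))
    if norm_u == 0 or norm_p == 0:
        # An empty model has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_p)
```

Only keywords present in both models contribute to the dot product, so a user and a page with no keywords in common score 0.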
Secondly, establishing an interest model according to the interest degree of the user ID to each type of keywords:
in this case, the interest model of the user ID only includes a third interest item related to keyword types, and the third interest item may include a plurality of third interest sub-items, where each third interest sub-item represents the interest of the user ID in one type of keyword, and the specific content may include a keyword type and the interest degree of the user ID in the keyword type;
the summarizing of the keyword information of each user ID and establishing the interest model of the user ID may specifically include: summarizing all keywords queried by a user corresponding to the user ID and determining the type of each keyword; counting the number of the clicked webpage IDs when the user inquires each type of keyword, and determining the interest degree of the user ID on the type of keyword according to the number of the clicked webpage IDs of the user.
Accordingly, an interest model can be established according to the occurrence times of each type of keywords in the webpage ID:
in this case, the interest model of the web page ID includes a fourth interest item related to keyword types, where the fourth interest item may include a plurality of fourth interest sub-items, where each fourth interest sub-item represents the interest of the web page ID in one type of keyword, and the specific content may include a keyword type and the interest degree of the web page ID in the keyword type;
summarizing the webpage IDs of all the user IDs, acquiring keyword information in the webpage corresponding to each webpage ID, and establishing an interest model of the webpage ID comprises the following steps: segmenting the content of the webpage corresponding to the webpage ID into words, removing invalid words, determining the type of each remaining keyword, counting the occurrence frequency of each type of keyword in the webpage, and determining the interest degree of the webpage ID in the type of keyword according to the occurrence frequency of that type of keyword.
When the interest model of the user ID is established according to the interest degree of the user ID in each type of keyword, and the interest model of the web page ID is established according to the number of occurrences of each type of keyword in the web page corresponding to the web page ID, the interest model of each user ID can be mapped to an N-dimensional vector $V_{C1}$, in which each dimension represents the interest degree of the user ID in one type of keyword, and the interest model of each web page ID can be mapped to an N-dimensional vector $V_{C2}$, in which each dimension represents the interest degree of the web page ID in one type of keyword. The association degree between the user ID and the web page ID is then determined by calculating the distance $D_C$ between $V_{C1}$ and $V_{C2}$. Here, the distance between $V_{C1}$ and $V_{C2}$ can be calculated using existing methods, for example, the cosine distance between the two vectors.
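The type-level model can be derived from keyword-level counts once each keyword has a type. The sketch below assumes a hypothetical lookup table `keyword_types` (the description does not say how a keyword's type is determined) and simply sums the counts of keywords that share a type, which is one possible reading of "counting per type".

```python
from collections import defaultdict


def aggregate_by_type(keyword_counts, keyword_types):
    """Sum per-keyword counts into per-type counts.

    keyword_types is a hypothetical lookup {keyword: type}; keywords
    without a known type are skipped.
    """
    type_counts = defaultdict(int)
    for keyword, count in keyword_counts.items():
        keyword_type = keyword_types.get(keyword)
        if keyword_type is not None:
            type_counts[keyword_type] += count
    return dict(type_counts)


keyword_types = {"ix35": "automobile", "braised pork in brown sauce": "food"}
clicks_per_keyword = {"ix35": 3, "braised pork in brown sauce": 1, "unknown": 2}
clicks_per_type = aggregate_by_type(clicks_per_keyword, keyword_types)
```

The same helper works for both sides of the model: click counts per keyword for a user ID, or occurrence counts per keyword for a web page ID.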
Finally, an interest model can be established according to the interest degree of the user ID in each keyword and in each type of keyword:
in this case, the interest model of the user ID includes a first interest item related to a keyword and a third interest item related to a keyword type;
the summarizing of the keyword information of each user ID and establishing the interest model of the user ID may specifically include: summarizing all keywords queried by a user corresponding to the user ID and determining the type of each keyword; counting the number of webpage IDs clicked when the user inquires each keyword, and determining the interest degree of the user ID for the keyword according to the number of the webpage IDs clicked by the user; counting the number of the clicked webpage IDs when the user inquires each type of keyword, and determining the interest degree of the user ID on the type of keyword according to the number of the clicked webpage IDs of the user.
An example of a user ID interest model established from the interest degree in each keyword and in each type of keyword is as follows:
[Tiguan: 0.9, ix35: 0.8, braised pork in brown sauce: 0.6, grilled fish: 0.5] [automobile: 0.8, food: 0.2]
In the first pair of square brackets, "Tiguan", "ix35", "braised pork in brown sauce" and "grilled fish" are keywords, and the number after the colon following each keyword is the interest degree of the user ID in that keyword; in the second pair of square brackets, "automobile" and "food" are keyword types, and the number after the colon following each keyword type is the interest degree of the user ID in that keyword type.
Accordingly, an interest model can be established according to each keyword in the web page ID and the occurrence frequency of each type of keyword:
in this case, the interest model of the web page ID includes a second interest item related to the keyword and a fourth interest item related to the keyword type;
summarizing the webpage IDs of all the user IDs, acquiring keyword information in the webpage corresponding to each webpage ID, and establishing an interest model of the webpage ID comprises the following steps: segmenting the content of the webpage corresponding to the webpage ID into words, removing invalid words, counting the occurrence frequency of each remaining keyword in the webpage, and determining the interest degree of the webpage ID in the keyword according to the occurrence frequency of the keyword; determining the type of each keyword, counting the occurrence frequency of each type of keyword in the webpage, and determining the interest degree of the webpage ID in the type of keyword according to the occurrence frequency of that type of keyword.
An example of a web page ID interest model established from the number of occurrences of each keyword and of each type of keyword is as follows:
[Tiguan: 0.9, review: 0.8, test drive: 0.6] [automobile: 0.8]
In the first pair of square brackets, "Tiguan", "review" and "test drive" are keywords, and the number after the colon following each keyword is the interest degree of the web page ID in that keyword; in the second pair of square brackets, "automobile" is a keyword type, and the number after the colon following each keyword type is the interest degree of the web page ID in that keyword type.
When the interest model of the user ID is established according to the interest degree of the user ID in each keyword and in each type of keyword, and the interest model of the web page ID is established according to the number of occurrences of each keyword and of each type of keyword in the web page corresponding to the web page ID, the interest degrees of the user ID in the keywords and in the keyword types can be mapped to N-dimensional vectors $V_{K1}$ and $V_{C1}$ respectively, and the interest degrees of the web page ID in the keywords and in the keyword types can be mapped to N-dimensional vectors $V_{K2}$ and $V_{C2}$ respectively. The distance $D_K$ between $V_{K1}$ and $V_{K2}$ and the distance $D_C$ between $V_{C1}$ and $V_{C2}$ are calculated, and the association degree between the user ID and the web page ID is determined by a weighted calculation on $D_K$ and $D_C$, for example using the following formula: $D = a \times D_K + (1 - a) \times D_C$, where $D$ is the association degree between the user ID and the web page ID, and $a$ is a preset real number greater than 0 and smaller than 1.
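The weighted combination of the keyword-level and type-level scores can be written directly from the formula. In this sketch the function name and the default weight of 0.5 are assumptions; the description only requires the weight a to lie strictly between 0 and 1.

```python
def combined_association(d_k, d_c, a=0.5):
    """Blend keyword-level and type-level scores: D = a*D_K + (1-a)*D_C."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must be strictly between 0 and 1")
    return a * d_k + (1.0 - a) * d_c
```

For instance, with a keyword-level score of 0.8, a type-level score of 0.4 and a = 0.75, the combined value is 0.75 × 0.8 + 0.25 × 0.4 = 0.7. A larger a gives more influence to exact keyword matches, a smaller a to the broader keyword types.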
An example of calculating the association between the user ID and the web page ID is as follows:
User A -> Web page A: 0.9 -> Web page B: 0.7 -> Web page C: 0.3
In this example, the association degrees between user A and web page A, web page B and web page C are 0.9, 0.7 and 0.3, respectively.
Step 103, when a click search result command of the user is received and the wireless webpage search transcoding page is entered, selecting a first preset number of webpage IDs according to the sequence from high to low of the association degree with the user ID, and recommending the webpage corresponding to each selected webpage ID in the transcoding page.
In this step, when the user wants to view a certain search result, the user clicks that search result, so that the background server receives the click search result command, provides the user with the wireless web page search transcoding page requested by the command, and at the same time performs web page recommendation in the transcoding page. Specifically, the web page IDs may be sorted by their association degree with the user ID, as established in step 102, the first preset number of top-ranked web page IDs are selected, and the web page corresponding to each selected web page ID is then recommended in the transcoding page. By recommending in the transcoding page the first preset number of web pages with the highest association degree with the user ID, the web pages most likely to interest the user are presented directly, so the user can find a web page of interest without selecting a new query string and searching again, and can therefore quickly reach the target web page. For example, when user A clicks a search result and enters a wireless web page search transcoding page, the two web page IDs with the highest association degree with user A, namely web page A and web page B, may be selected, and the web pages corresponding to web page A and web page B are recommended in the transcoding page.
Here, recommending a web page in the transcoding page actually means placing the link address of the web page in the transcoding page, so that the user can enter the recommended web page of interest by clicking the link address.
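The selection in step 103 amounts to a top-N ranking by association degree. A minimal sketch, assuming the association degrees for one user are already available as a dictionary (the function name and the tie-breaking rule are illustrative choices, not taken from the description):

```python
def recommend_pages(associations, n):
    """Return the n page IDs with the highest association degree.

    associations maps page ID -> association degree for one user.
    """
    # Sort by degree descending; break ties by page ID for a stable order.
    ranked = sorted(associations.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked[:n]]


user_a = {"web page A": 0.9, "web page B": 0.7, "web page C": 0.3}
top_two = recommend_pages(user_a, 2)
```

With the example degrees of 0.9, 0.7 and 0.3 and a preset number of 2, the selection is web page A followed by web page B; their link addresses would then be placed in the transcoding page.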
In practical application, after a user clicks a search result to enter a wireless webpage for searching a transcoding page, if the user is interested in the content of the transcoding page, the degree of interest of the user in other webpages similar to the content of the transcoding page is relatively high, so that other webpages similar to the content of the transcoding page can be recommended to the user ID.
In fact, a user's interests are periodic: within a given period the user is often interested in only certain content. For example, a user who is currently interested only in automobile-related content searches only for content related to automobiles, so the searches of a user ID over a period of time, together with the clicks on the search results, characterize the interests of the user ID during that period. In addition, if two user IDs have searched for the same keyword, the two user IDs may be considered to have similar interests; for example, if both user A and user B have searched for "Tiguan", both may be considered interested in the Tiguan. Two users may also be considered to have similar interests if they have clicked the same web page ID. If two users have similar interests, the web page IDs they click can be considered to have some relevance to each other.
For example, the search and click behavior of user A within a preset time is shown in Table 1:
[Table 1: search and click behavior of user A (table image not available)]
The search and click behavior of user B within the preset time is shown in Table 2:
[Table 2: search and click behavior of user B (table image not available)]
Here, both user A and user B searched for "Tiguan" and clicked the web pages with web page IDs 1234 and 2345, so user A and user B have similar interests, and the web page IDs they clicked are also related.
Because the web page IDs clicked by user A and user B are related, web page clustering can be performed on them. Web page clustering may follow different strategies, for example: clustering the web page IDs clicked by user A and by user B separately, which yields the cluster [1234, 2345, 3456, 4567, 5678, 6789] formed by the web page IDs clicked by user A and the cluster [1234, 2345, 7890, 8901] formed by the web page IDs clicked by user B; or clustering all the web page IDs clicked by either user A or user B, which yields [1234, 2345, 3456, 4567, 5678, 6789, 7890, 8901]; or clustering only the web page IDs clicked by both user A and user B, which yields [1234, 2345].
In the webpage clusters obtained by the last method, the relevance between the webpage IDs is the highest. Accordingly, the degree of association between web page IDs may be determined based on the user's search and click behavior. If two user IDs search for the same keyword and click on the same web page ID, the web page IDs clicked by two users together have a higher association degree, for example, if the user a and the user B both click on web pages with web page IDs 1234 and 2345, the two web pages with web page IDs 1234 and 2345 have a higher association degree.
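The three clustering strategies correspond to plain set operations on each user's clicked web page IDs. The sketch below assumes the IDs in the example are the four-digit values 1234, 2345 and so on, and represents each user's clicks as a set; both are assumptions made for illustration.

```python
# Web page IDs clicked by each user within the preset time (assumed four-digit IDs).
user_a_clicks = {"1234", "2345", "3456", "4567", "5678", "6789"}
user_b_clicks = {"1234", "2345", "7890", "8901"}

# Strategy 1: one cluster per user.
per_user = [sorted(user_a_clicks), sorted(user_b_clicks)]

# Strategy 2: everything clicked by either user (set union).
all_clicked = sorted(user_a_clicks | user_b_clicks)

# Strategy 3: only pages clicked by both users (set intersection).
co_clicked = sorted(user_a_clicks & user_b_clicks)
```

The intersection is the smallest cluster and, as noted above, the one whose members are most strongly related, since every page in it was chosen by both users.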
Therefore, in the embodiment of the present invention shown in fig. 1, the method may further include: summarizing the keywords queried within a preset time by the users corresponding to all user IDs and, for each user who queried a given keyword, establishing an association relation between every two web page IDs clicked by that user; then summarizing the web page IDs clicked within the preset time by all users who searched the keyword, counting, for each web page ID, the number of occurrences of each association relation involving that web page ID, and determining the association degree between the web page ID and each web page ID associated with it according to those counts. In this way, when a click search result command of a user is received and the wireless web page search transcoding page is entered, a second preset number of web page IDs can additionally be selected in descending order of their association degree with the transcoding page, and the web page corresponding to each selected web page ID is recommended in the transcoding page.
The following description will be given by way of example to a method for determining the association between web page IDs: suppose that a user A clicks the web pages 1, 2 and 3 within a preset time, which indicates that the web pages 1, 2 and 3 have relevance, so that a group of relevance relations [1-2], [2-3] and [1-3] can be generated; suppose that the user B clicks on the web pages 1, 2, and 4 within a preset time, which indicates that the web pages 1, 2, and 4 have relevance, another set of relevance relationships [1-2], [2-4], [1-4] can be generated. Then, for the web page 1, the occurrence frequency of the association relations [1-2], [1-3], [1-4] corresponding to the web page 1 can be counted, wherein [1-2] occurs 2 times, and [1-3] and [1-4] each occur once, and then, the association degree between the web page 1 and the web page 2 can be considered to be higher than the association degree between the web page 1 and the web page 3 and the association degree between the web page 1 and the web page 4. When the user clicks on the search result into web page 1, web page 2 may be recommended to the user in web page 1.
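The pair counting in this example can be sketched as follows. For brevity the sketch ignores the per-keyword grouping and the preset time window and takes each user's clicked pages as given; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def count_page_pairs(clicks_by_user):
    """Count how often each unordered pair of page IDs is co-clicked."""
    pair_counts = Counter()
    for pages in clicks_by_user.values():
        # Each user contributes every pair among their distinct clicked pages once.
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    return pair_counts


def related_pages(page_id, pair_counts):
    """Pages associated with page_id, most frequently co-clicked first."""
    related = Counter()
    for (left, right), count in pair_counts.items():
        if left == page_id:
            related[right] = count
        elif right == page_id:
            related[left] = count
    return sorted(related.items(), key=lambda item: (-item[1], item[0]))


pairs = count_page_pairs({"user A": [1, 2, 3], "user B": [1, 2, 4]})
ranking = related_pages(1, pairs)
```

Here the pair (1, 2) is counted twice and the pairs (1, 3) and (1, 4) once each, so web page 2 ranks first among the pages related to web page 1, in line with the example.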
In practical applications, the web page IDs in the search results for the same keyword are also associated with one another, and the association degree varies with the content of the corresponding web pages. For example, the search results for the keyword "Highlander" (in the format search ranking position: title: web page ID) include the following items:
1: Highlander authoritative review: 123
2: Highlander VS Odyssey VS Mazda 8: 234
3: Latest Toyota Highlander price quotes: 345
4: Highlander discounted by 15,000 yuan, lowest price 288,800 yuan: 456
5: Highlander shortcomings - Chinese automotive review of the Highlander: 567
The web pages with web page IDs 123, 234 and 567 mainly focus on reviews of the Highlander, so the association degree among these three web pages is relatively high; the web pages with web page IDs 345 and 456 mainly focus on price quotes for the Highlander, so the association degree between these two web pages is relatively high. Therefore, the search results for the same keyword can be clustered into web page clusters, and web page IDs belonging to the same web page cluster have a higher association degree. In the above example, clustering yields the web page clusters [123, 234, 567] and [345, 456] for the keyword "Highlander".
In practice, when a user queries information with a search engine, the keywords searched by the user and the corresponding search results are usually logged. For example, each keyword and its corresponding search result list are recorded in a search result presentation log, where the search result list may include one or more web page IDs. The search result list corresponding to each keyword can thus be obtained from the search result presentation log. Alternatively, when a keyword is searched, the search result list returned by the search engine for that keyword may be acquired directly.
After the search result list corresponding to the keyword is obtained, the content of the web page corresponding to each web page ID in the search result list is analyzed to obtain the keyword information in that web page and to generate a feature vector corresponding to the web page ID. One or more web page clusters corresponding to the keyword can then be generated from the feature vectors of all the web page IDs in the search result list corresponding to the keyword. Therefore, when a click search result command of a user is received and the wireless web page search transcoding page is entered, the web page cluster containing the transcoding page can be looked up among the one or more web page clusters corresponding to the keyword queried by the user, a third preset number of web page IDs are selected from that web page cluster, and the web page corresponding to each selected web page ID is recommended in the transcoding page.
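The cluster lookup at recommendation time might look like the sketch below, which uses the web page IDs from the example above. The function name `recommend_from_cluster` is hypothetical, and excluding the page currently being viewed is an assumption; the description only says that a third preset number of web page IDs is selected from the cluster.

```python
def recommend_from_cluster(clusters, current_page_id, third_preset_number):
    """Pick up to third_preset_number other pages from the cluster that
    contains the page currently shown in the transcoding page."""
    for cluster in clusters:
        if current_page_id in cluster:
            # Assumption: do not recommend the page the user is already on.
            others = [pid for pid in cluster if pid != current_page_id]
            return others[:third_preset_number]
    return []  # the current page belongs to no known cluster

# Clusters for one keyword, as lists of web page IDs.
clusters = [[123, 234, 567], [345, 456]]
print(recommend_from_cluster(clusters, 123, 2))  # [234, 567]
print(recommend_from_cluster(clusters, 456, 2))  # [345]
```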
The method for acquiring the keyword information in the web page corresponding to the web page ID and generating the feature vector corresponding to the web page ID may specifically be: segmenting the content of the web page corresponding to the web page ID into words, removing invalid words, counting the number of occurrences of each remaining keyword in the web page, and generating the feature vector corresponding to the web page ID from those counts. Here, the content of the web page may be the title and summary of the web page corresponding to the web page ID, or its title and body. If every word is taken as a dimension, the resulting feature vector is very large, so a dimension reduction technique can be used to convert the high-dimensional vector into a low-dimensional vector.
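A minimal sketch of this feature-vector step is given below. Whitespace splitting stands in for real word segmentation (Chinese text would need a dedicated segmenter), and the invalid-word list, the fixed vocabulary and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical invalid-word (stop-word) list.
INVALID_WORDS = {"the", "of", "a", "and"}

def term_frequencies(text):
    """Split the text into words, drop invalid words, and count the rest."""
    words = [w for w in text.lower().split() if w not in INVALID_WORDS]
    return Counter(words)

def to_vector(freqs, vocabulary):
    """Lay the counts out in a fixed keyword order so vectors are comparable."""
    return [freqs.get(word, 0) for word in vocabulary]

freqs = term_frequencies("the price of the car and the car review")
vocabulary = ["car", "price", "review", "engine"]  # one dimension per keyword
print(to_vector(freqs, vocabulary))  # [2, 1, 1, 0]
```

Restricting the vocabulary to a fixed keyword list is one simple way to keep the dimension small; the description leaves the choice of dimension reduction technique open.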
In addition, the method for generating one or more web page clusters corresponding to each keyword according to the feature vectors corresponding to all the web page IDs in the search result list corresponding to each keyword may specifically be: clustering the feature vectors corresponding to all the web page IDs in the search result list corresponding to each keyword by using a K-Nearest Neighbor (KNN) classification algorithm.
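The description names the KNN algorithm but does not say how it is applied to unlabeled feature vectors. The sketch below therefore shows only one possible reading, stated here as an assumption: each page is linked to its single most similar neighbour (by cosine similarity) when the similarity reaches a threshold, and linked pages form a cluster. The threshold, the three-dimensional example vectors and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_by_nearest_neighbor(vectors, min_similarity=0.5):
    """Group page IDs by linking each page to its most similar neighbour.

    vectors: dict mapping page ID -> feature vector (all the same length).
    """
    parent = {pid: pid for pid in vectors}

    def find(pid):
        # Union-find root lookup with path halving.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    for pid, vec in vectors.items():
        others = [(cosine(vec, vectors[o]), o) for o in vectors if o != pid]
        if not others:
            continue
        best_similarity, best = max(others)
        if best_similarity >= min_similarity:
            parent[find(pid)] = find(best)  # merge the two groups

    groups = {}
    for pid in vectors:
        groups.setdefault(find(pid), []).append(pid)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical dimensions: (evaluation terms, price terms, other terms).
vectors = {
    123: [3, 0, 1], 234: [2, 0, 1], 567: [4, 0, 0],
    345: [0, 3, 1], 456: [0, 2, 0],
}
print(cluster_by_nearest_neighbor(vectors))  # [[123, 234, 567], [345, 456]]
```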
The web page recommendation method of the embodiment of the invention has been described in detail above. The invention also provides a web page recommendation device that enables a user to quickly find a target web page.
Referring to Fig. 2, which is a schematic structural diagram of a web page recommendation device according to an embodiment of the present invention, the device includes: a log obtaining unit 201, a first analysis unit 202 and a recommending unit 203; wherein,
the log obtaining unit 201 is configured to obtain a click query log, where the click query log includes a user ID, a keyword, and a web page ID;
the first analysis unit 202 is configured to summarize the keyword information of each user ID in the click query log and establish an interest model of the user ID; summarize the web page IDs of all user IDs in the click query log, acquire keyword information in the web page corresponding to each web page ID, and establish an interest model of the web page ID; and determine the association degree between the user ID and the web page ID according to the interest model of the user ID and the interest model of the web page ID;
the recommending unit 203 is configured to, when a click search result command of a user is received and the wireless web page search transcoding page is entered, select a first preset number of web page IDs in descending order of association degree with the user ID, and recommend the web page corresponding to each selected web page ID in the transcoding page.
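The selection performed by the recommending unit can be sketched as a top-N lookup. The dictionary layout keyed by (user ID, web page ID) and the function name `select_top_pages` are assumptions for illustration, and the association degrees in the example are made-up values.

```python
def select_top_pages(association, user_id, first_preset_number):
    """Return the page IDs most associated with user_id, highest degree first.

    association: dict mapping (user_id, page_id) -> association degree.
    """
    scored = [(degree, page) for (user, page), degree in association.items()
              if user == user_id]
    # Descending association degree; page ID breaks ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page for _, page in scored[:first_preset_number]]

association = {
    ("u1", 123): 0.9, ("u1", 345): 0.4, ("u1", 567): 0.7,
    ("u2", 123): 0.1,
}
print(select_top_pages(association, "u1", 2))  # [123, 567]
```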
In the above apparatus,
the interest model of the user ID comprises a first interest item, the first interest item comprises a plurality of first interest sub-items, and each first interest sub-item comprises a keyword and the interest degree of the user ID in the keyword;
when summarizing the keyword information of each user ID in the click query log and establishing the interest model of the user ID, the first analysis unit 202 is configured to: summarize all keywords queried by the user corresponding to the user ID, count the number of web page IDs clicked when the user queried each keyword, and determine the interest degree of the user ID in the keyword according to the number of clicked web page IDs;
the interest model of the web page ID comprises a second interest item, the second interest item comprises a plurality of second interest sub-items, and each second interest sub-item comprises a keyword and the interest degree of the web page ID in the keyword;
when summarizing the web page IDs of all user IDs in the click query log, acquiring keyword information in the web page corresponding to each web page ID, and establishing the interest model of the web page ID, the first analysis unit 202 is configured to: segment the content of the web page corresponding to the web page ID into words, remove invalid words, count the number of occurrences of each remaining keyword in the web page, and determine the interest degree of the web page ID in the keyword according to the number of occurrences of the keyword.
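The two keyword-level interest models can be sketched as plain dictionaries. Several choices here are assumptions rather than anything the description fixes: the raw count is used directly as the interest degree, repeated clicks on the same page under the same keyword are counted once, the page content is assumed to be already segmented into words, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_user_interest(click_log):
    """click_log: iterable of (user_id, keyword, page_id) records.

    Returns {user_id: {keyword: number of distinct pages clicked}}.
    """
    clicked = defaultdict(set)
    for user_id, keyword, page_id in click_log:
        clicked[(user_id, keyword)].add(page_id)
    model = defaultdict(dict)
    for (user_id, keyword), pages in clicked.items():
        model[user_id][keyword] = len(pages)
    return dict(model)

def build_page_interest(page_words, invalid_words):
    """page_words: the page content already segmented into words.

    Returns {keyword: number of occurrences} after removing invalid words.
    """
    return dict(Counter(w for w in page_words if w not in invalid_words))

log = [("u1", "car", 123), ("u1", "car", 234),
       ("u1", "phone", 900), ("u2", "car", 123)]
print(build_user_interest(log))
# {'u1': {'car': 2, 'phone': 1}, 'u2': {'car': 1}}
print(build_page_interest(["car", "the", "car", "price"], {"the"}))
# {'car': 2, 'price': 1}
```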
When determining the association degree between the user ID and the web page ID according to the interest model of the user ID and the interest model of the web page ID, the first analysis unit 202 is configured to:
generate an N-dimensional vector $V_{K1}$ according to the interest degree of the user ID in each keyword in the interest model of each user ID;
generate an N-dimensional vector $V_{K2}$ according to the interest degree of the web page ID in each keyword in the interest model of each web page ID;
calculate the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$, and record $D_K$ as the association degree between the user ID and the web page ID.
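The description does not name the distance measure. As an assumption, the sketch below uses cosine similarity, so that a larger value means a stronger association and the "high to low" ordering used at recommendation time applies directly. The N dimensions are taken to be the union of keywords in the two models, which is also an assumption, as is the function name.

```python
import math

def association_degree(user_interest, page_interest):
    """Cosine similarity between two {keyword: interest degree} models."""
    # The N dimensions: every keyword that appears in either model.
    keywords = sorted(set(user_interest) | set(page_interest))
    v1 = [user_interest.get(k, 0) for k in keywords]  # user vector
    v2 = [page_interest.get(k, 0) for k in keywords]  # web page vector
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0

user = {"car": 2, "phone": 1}
print(association_degree(user, {"car": 5, "price": 1}))  # shares "car"
print(association_degree(user, {"recipe": 3}))           # 0.0, nothing shared
```

A page that shares keywords with the user scores above one that shares none, so it would be ranked earlier among the recommendations.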
In the above apparatus,
the interest model of the user ID comprises a third interest item, the third interest item comprises a plurality of third interest sub-items, and each third interest sub-item comprises a keyword type and the interest degree of the user ID in the keyword type;
when summarizing the keyword information of each user ID in the click query log and establishing the interest model of the user ID, the first analysis unit 202 is configured to: summarize all keywords queried by the user corresponding to the user ID and determine the type of each keyword; count the number of web page IDs clicked when the user queried each type of keyword, and determine the interest degree of the user ID in the type of keyword according to the number of clicked web page IDs;
the interest model of the web page ID comprises a fourth interest item, the fourth interest item comprises a plurality of fourth interest sub-items, and each fourth interest sub-item comprises a keyword type and the interest degree of the web page ID in the keyword type;
when summarizing the web page IDs of all user IDs in the click query log, acquiring keyword information in the web page corresponding to each web page ID, and establishing the interest model of the web page ID, the first analysis unit 202 is configured to: segment the content of the web page corresponding to the web page ID into words, remove invalid words, determine the type of each remaining keyword, count the number of occurrences of each type of keyword in the web page, and determine the interest degree of the web page ID in the type of keyword according to the number of occurrences of that type of keyword.
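A type-level model can be derived by mapping each keyword to a type and accumulating the counts per type. The keyword-to-type table below is purely hypothetical (the description does not say how types are determined), and summing keyword-level counts is an assumed simplification.

```python
from collections import Counter

# Hypothetical keyword-to-type table; the source does not specify one.
KEYWORD_TYPES = {"sedan": "automobile", "suv": "automobile", "phone": "electronics"}

def interest_by_type(keyword_interest, keyword_types=KEYWORD_TYPES):
    """Aggregate {keyword: degree} into {keyword type: degree}."""
    by_type = Counter()
    for keyword, degree in keyword_interest.items():
        # Keywords missing from the table fall into a catch-all type.
        by_type[keyword_types.get(keyword, "other")] += degree
    return dict(by_type)

print(interest_by_type({"sedan": 2, "suv": 3, "phone": 1}))
# {'automobile': 5, 'electronics': 1}
```

The same function can be applied to a user model (click counts) or to a web page model (occurrence counts), giving two vectors over the same set of types.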
When determining the association degree between the user ID and the web page ID according to the interest model of the user ID and the interest model of the web page ID, the first analysis unit 202 is configured to:
generate an N-dimensional vector $V_{C1}$ according to the interest degree of the user ID in each type of keyword in the interest model of each user ID;
generate an N-dimensional vector $V_{C2}$ according to the interest degree of the web page ID in each type of keyword in the interest model of each web page ID;
calculate the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and record $D_C$ as the association degree between the user ID and the web page ID.
In the above apparatus,
the interest model of the user ID comprises a first interest item and a third interest item; the first interest item comprises a plurality of first interest sub-items, and each first interest sub-item comprises a keyword and the interest degree of the user ID in the keyword; the third interest item comprises a plurality of third interest sub-items, and each third interest sub-item comprises a keyword type and the interest degree of the user ID in the keyword type;
when summarizing the keyword information of each user ID in the click query log and establishing the interest model of the user ID, the first analysis unit 202 is configured to: summarize all keywords queried by the user corresponding to the user ID and determine the type of each keyword; count the number of web page IDs clicked when the user queried each keyword, and determine the interest degree of the user ID in the keyword according to that number; count the number of web page IDs clicked when the user queried each type of keyword, and determine the interest degree of the user ID in the type of keyword according to that number;
the interest model of the web page ID comprises a second interest item and a fourth interest item; the second interest item comprises a plurality of second interest sub-items, and each second interest sub-item comprises a keyword and the interest degree of the web page ID in the keyword; the fourth interest item comprises a plurality of fourth interest sub-items, and each fourth interest sub-item comprises a keyword type and the interest degree of the web page ID in the keyword type;
when summarizing the web page IDs of all user IDs in the click query log, acquiring keyword information in the web page corresponding to each web page ID, and establishing the interest model of the web page ID, the first analysis unit 202 is configured to: segment the content of the web page corresponding to the web page ID into words, remove invalid words, count the number of occurrences of each remaining keyword in the web page, and determine the interest degree of the web page ID in the keyword according to the number of occurrences of the keyword; determine the type of each keyword, count the number of occurrences of each type of keyword in the web page, and determine the interest degree of the web page ID in the type of keyword according to the number of occurrences of that type of keyword.
When determining the association degree between the user ID and the web page ID according to the interest model of the user ID and the interest model of the web page ID, the first analysis unit 202 is configured to:
generate an N-dimensional vector $V_{K1}$ according to the interest degree of the user ID in each keyword in the interest model of each user ID; generate an N-dimensional vector $V_{C1}$ according to the interest degree of the user ID in each type of keyword in the interest model of each user ID;
generate an N-dimensional vector $V_{K2}$ according to the interest degree of the web page ID in each keyword in the interest model of each web page ID; generate an N-dimensional vector $V_{C2}$ according to the interest degree of the web page ID in each type of keyword in the interest model of each web page ID;
calculate the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$ and the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and perform a weighted calculation on $D_K$ and $D_C$ to obtain the association degree between the user ID and the web page ID.
The first analysis unit 202 performs the weighted calculation on $D_K$ and $D_C$ with the following formula to obtain the association degree between the user ID and the web page ID:
$D = a \times D_K + (1 - a) \times D_C$, wherein D is the association degree between the user ID and the web page ID, a is a preset value, and a is a real number greater than 0 and less than 1.
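The weighted combination translates directly into code. In this sketch the function name `combined_association` and the default weight of 0.5 are assumptions; only the formula and the range of a come from the text.

```python
def combined_association(d_k, d_c, a=0.5):
    """Weighted combination D = a * D_K + (1 - a) * D_C.

    d_k: keyword-level association degree (D_K).
    d_c: keyword-type-level association degree (D_C).
    a:   preset weight, a real number strictly between 0 and 1.
    """
    if not 0 < a < 1:
        raise ValueError("a must be a real number greater than 0 and less than 1")
    return a * d_k + (1 - a) * d_c

# A larger a gives more weight to the keyword-level degree.
print(combined_association(0.8, 0.4, a=0.75))
print(combined_association(0.8, 0.4, a=0.25))
```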
In addition, the apparatus further comprises a second analysis unit 204;
the second analysis unit 204 is configured to: for each keyword queried within a preset time by the users corresponding to all user IDs in the click query log, and for each user who queried the keyword, establish an association relation between any two web page IDs clicked by that user; summarize the web page IDs clicked by all users who queried the keyword within the preset time, count, for each web page ID, the number of occurrences of each association relation involving that web page ID, and determine the association degree between the web page ID and each web page ID associated with it according to the number of occurrences of each association relation;
the recommending unit 203 is further configured to, when a click search result command of a user is received and the wireless web page search transcoding page is entered, select a second preset number of web page IDs in descending order of association degree with the transcoding page, and recommend the web page corresponding to each selected web page ID in the transcoding page.
In addition, the apparatus further comprises: a third analysis unit 205;
the log obtaining unit 201 is further configured to: acquiring a search result list corresponding to each keyword;
the third analysis unit 205 is configured to, for each web page ID in the search result list corresponding to each keyword, acquire keyword information in a web page corresponding to the web page ID, and generate a feature vector corresponding to the web page ID; generating one or more webpage clusters corresponding to each keyword according to the feature vectors corresponding to all webpage IDs in the search result list corresponding to each keyword;
when a click search result command of a user is received and the wireless webpage search transcoding page is entered, the recommending unit 203 further searches for a webpage cluster where the transcoding page is located in one or more webpage clusters corresponding to the keyword searched by the user, selects a third preset number of webpage IDs in the searched webpage clusters, and recommends the webpage corresponding to each selected webpage ID in the transcoding page.
When the third analysis unit 205 acquires the keyword information in the web page corresponding to the web page ID and generates the feature vector corresponding to the web page ID, it is configured to: segmenting the content of the webpage corresponding to the webpage ID, removing invalid words, counting the occurrence frequency of each remaining keyword in the webpage, and generating a feature vector corresponding to the webpage ID according to the occurrence frequency of each keyword in the webpage;
when generating one or more web page clusters corresponding to each keyword according to the feature vectors corresponding to all the web page IDs in the search result list corresponding to the keyword, the third analysis unit 205 is configured to: cluster the feature vectors corresponding to all the web page IDs in the search result list corresponding to the keyword by using a K-Nearest Neighbor (KNN) classification algorithm.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (22)

1. A method for web page recommendation, the method comprising:
acquiring a click query log, wherein the click query log comprises a user ID, a keyword and a webpage ID;
summarizing the keyword information of each user ID, and establishing an interest model of the user ID; summarizing the webpage IDs of all user IDs, acquiring keyword information in a webpage corresponding to each webpage ID, and establishing an interest model of the webpage ID; determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID;
when a click search result command of a user is received and the wireless web page search transcoding page is entered, selecting a first preset number of web page IDs in descending order of association degree with the user ID, and recommending the web page corresponding to each selected web page ID in the transcoding page.
2. The web page recommendation method of claim 1,
the interest model of the user ID comprises a first interest item, the first interest item comprises a plurality of first interest sub-items, and the first interest sub-items comprise keywords and interest degrees of the user ID on the keywords;
the step of summarizing the keyword information of each user ID and establishing an interest model of the user ID comprises the following steps: summarizing all keywords inquired by a user corresponding to the user ID, counting the number of clicked webpage IDs when the user inquires each keyword, and determining the interest degree of the user ID for the keywords according to the number of the clicked webpage IDs;
the interest model of the webpage ID comprises a second interest item, the second interest item comprises a plurality of second interest sub-items, and the second interest sub-items comprise keywords and interest degrees of the webpage ID on the keywords;
the step of summarizing the web page IDs of all the user IDs, acquiring keyword information in the web page corresponding to each web page ID, and establishing an interest model of the web page ID comprises: segmenting the content of the web page corresponding to the web page ID into words, removing invalid words, counting the number of occurrences of each remaining keyword in the web page, and determining the interest degree of the web page ID in the keyword according to the number of occurrences of the keyword.
3. The web page recommendation method of claim 2,
the determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID comprises the following steps:
generating an N-dimensional vector $V_{K1}$ according to the interest degree of the user ID in each keyword in the interest model of each user ID;
generating an N-dimensional vector $V_{K2}$ according to the interest degree of the web page ID in each keyword in the interest model of each web page ID;
calculating the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$, and recording $D_K$ as the association degree between the user ID and the web page ID.
4. The web page recommendation method of claim 1,
the interest model of the user ID comprises a third interest item, the third interest item comprises a plurality of third interest sub-items, and each third interest sub-item comprises a keyword type and the interest degree of the user ID in the keyword type;
the step of summarizing the keyword information of each user ID and establishing an interest model of the user ID comprises the following steps: summarizing all keywords queried by a user corresponding to the user ID and determining the type of each keyword; counting the number of click web page IDs when the user inquires each type of keyword, and determining the interest degree of the user ID in the type of keyword according to the number of the click web page IDs;
the interest model of the webpage ID comprises a fourth interest item, the fourth interest item comprises a plurality of fourth interest sub-items, and the fourth interest sub-items comprise a keyword type and interest degree of the webpage ID on the keyword type;
the step of summarizing the web page IDs of all the user IDs, acquiring keyword information in the web page corresponding to each web page ID, and establishing an interest model of the web page ID comprises: segmenting the content of the web page corresponding to the web page ID into words, removing invalid words, determining the type of each remaining keyword, counting the number of occurrences of each type of keyword in the web page, and determining the interest degree of the web page ID in the type of keyword according to the number of occurrences of that type of keyword.
5. The web page recommendation method of claim 4,
the determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID comprises the following steps:
generating an N-dimensional vector $V_{C1}$ according to the interest degree of the user ID in each type of keyword in the interest model of each user ID;
generating an N-dimensional vector $V_{C2}$ according to the interest degree of the web page ID in each type of keyword in the interest model of each web page ID;
calculating the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and recording $D_C$ as the association degree between the user ID and the web page ID.
6. The web page recommendation method of claim 1,
the interest model of the user ID comprises a first interest item and a third interest item; the first interest item comprises a plurality of first interest sub-items, and each first interest sub-item comprises a keyword and the interest degree of the user ID in the keyword; the third interest item comprises a plurality of third interest sub-items, and each third interest sub-item comprises a keyword type and the interest degree of the user ID in the keyword type;
the step of summarizing the keyword information of each user ID and establishing an interest model of the user ID comprises: summarizing all keywords queried by the user corresponding to the user ID and determining the type of each keyword; counting the number of web page IDs clicked when the user queried each keyword, and determining the interest degree of the user ID in the keyword according to that number; counting the number of web page IDs clicked when the user queried each type of keyword, and determining the interest degree of the user ID in the type of keyword according to that number;
the interest model of the web page ID comprises a second interest item and a fourth interest item; the second interest item comprises a plurality of second interest sub-items, and each second interest sub-item comprises a keyword and the interest degree of the web page ID in the keyword; the fourth interest item comprises a plurality of fourth interest sub-items, and each fourth interest sub-item comprises a keyword type and the interest degree of the web page ID in the keyword type;
the step of summarizing the web page IDs of all the user IDs, acquiring keyword information in the web page corresponding to each web page ID, and establishing an interest model of the web page ID comprises: segmenting the content of the web page corresponding to the web page ID into words, removing invalid words, counting the number of occurrences of each remaining keyword in the web page, and determining the interest degree of the web page ID in the keyword according to the number of occurrences of the keyword; determining the type of each keyword, counting the number of occurrences of each type of keyword in the web page, and determining the interest degree of the web page ID in the type of keyword according to the number of occurrences of that type of keyword.
7. The web page recommendation method of claim 6,
the determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID comprises the following steps:
generating an N-dimensional vector $V_{K1}$ according to the interest degree of the user ID in each keyword in the interest model of each user ID; generating an N-dimensional vector $V_{C1}$ according to the interest degree of the user ID in each type of keyword in the interest model of each user ID;
generating an N-dimensional vector $V_{K2}$ according to the interest degree of the web page ID in each keyword in the interest model of each web page ID; generating an N-dimensional vector $V_{C2}$ according to the interest degree of the web page ID in each type of keyword in the interest model of each web page ID;
calculating the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$ and the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and performing a weighted calculation on $D_K$ and $D_C$ to obtain the association degree between the user ID and the web page ID.
8. The web page recommendation method according to claim 7, wherein:
the weighted calculation on $D_K$ and $D_C$ to obtain the association degree between the user ID and the web page ID adopts the following formula:
$D = a \times D_K + (1 - a) \times D_C$, wherein D is the association degree between the user ID and the web page ID, a is a preset value, and a is a real number greater than 0 and less than 1.
9. The method for recommending web pages according to claim 1, further comprising: summarizing each keyword queried by a user corresponding to all user IDs within preset time, and establishing an association relation between any two webpage IDs clicked by the user aiming at each user ID for querying the keyword; summarizing the webpage IDs clicked by all users inquiring the keyword within preset time, counting the occurrence frequency of each association relation corresponding to each webpage ID aiming at each webpage ID, and determining the association degree between each webpage ID and the webpage ID having the association relation with the webpage ID according to the occurrence frequency of each association relation;
when a click search result command of a user is received and the wireless webpage search transcoding page is entered, a second preset number of webpage IDs are further selected from high to low according to the association degree of the transcoding page, and the webpage corresponding to each selected webpage ID is recommended in the transcoding page.
10. The method for recommending web pages according to claim 1, further comprising:
acquiring a search result list corresponding to each keyword, acquiring keyword information of a webpage corresponding to the webpage ID aiming at each webpage ID in the search result list corresponding to the keyword, and generating a feature vector corresponding to the webpage ID;
generating one or more webpage clusters corresponding to each keyword according to the feature vectors corresponding to all webpage IDs in the search result list corresponding to each keyword;
when a click search result command of a user is received to enter a wireless webpage search transcoding page, further searching a webpage cluster where the transcoding page is located in one or more webpage clusters corresponding to the keyword searched by the user, selecting a third preset number of webpage IDs in the searched webpage clusters, and recommending the webpage corresponding to each selected webpage ID in the transcoding page.
11. The web page recommendation method of claim 10,
the method for acquiring the keyword information of the webpage corresponding to the webpage ID and generating the feature vector corresponding to the webpage ID comprises the following steps: segmenting the content of the webpage corresponding to the webpage ID, removing invalid words, counting the occurrence frequency of each remaining keyword in the webpage, and generating a feature vector corresponding to the webpage ID according to the occurrence frequency of each keyword in the webpage;
the method for generating one or more web page clusters corresponding to each keyword according to the feature vectors corresponding to all the web page IDs in the search result list corresponding to each keyword comprises: clustering the feature vectors corresponding to all the web page IDs in the search result list corresponding to the keyword by using a K-Nearest Neighbor (KNN) classification algorithm.
12. A web page recommendation apparatus, comprising: a log obtaining unit, a first analysis unit and a recommending unit;
the log obtaining unit is configured to obtain a click query log, wherein the click query log comprises a user ID, a keyword and a web page ID;
the first analysis unit is configured to summarize the keyword information of each user ID in the click query log and establish an interest model of the user ID; summarize the web page IDs of all user IDs in the click query log, acquire keyword information in the web page corresponding to each web page ID, and establish an interest model of the web page ID; and determine the association degree between the user ID and the web page ID according to the interest model of the user ID and the interest model of the web page ID;
the recommending unit is configured to, when a click search result command of a user is received and the wireless web page search transcoding page is entered, select a first preset number of web page IDs in descending order of association degree with the user ID, and recommend the web page corresponding to each selected web page ID in the transcoding page.
13. The web page recommendation device of claim 12,
the interest model of the user ID comprises a first interest item, the first interest item comprises a plurality of first interest sub-items, and the first interest sub-items comprise keywords and interest degrees of the user ID on the keywords;
when summarizing the keyword information of each user ID in the click query log and establishing the interest model of the user ID, the first analysis unit is configured to: summarize all keywords queried by the user corresponding to the user ID, count the number of web page IDs clicked when the user queried each keyword, and determine the interest degree of the user ID in the keyword according to the number of clicked web page IDs;
the interest model of the web page ID comprises a second interest item, the second interest item comprises a plurality of second interest sub-items, and each second interest sub-item comprises a keyword and the interest degree of the web page ID in the keyword;
when summarizing the web page IDs of all user IDs in the click query log, acquiring keyword information in the web page corresponding to each web page ID, and establishing the interest model of the web page ID, the first analysis unit is configured to: segment the content of the web page corresponding to the web page ID into words, remove invalid words, count the number of occurrences of each remaining keyword in the web page, and determine the interest degree of the web page ID in the keyword according to the number of occurrences of the keyword.
14. The web page recommendation device of claim 13,
the first analysis unit, when determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID, is used for:
generating an N-dimensional vector $V_{K1}$ according to the interest degree of the user ID in each keyword in the interest model of each user ID;
generating an N-dimensional vector $V_{K2}$ according to the interest degree of the webpage ID in each keyword in the interest model of each webpage ID;
calculating the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$, and recording $D_K$ as the association degree between the user ID and the webpage ID.
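The vector comparison above can be sketched as follows. The claim does not name a distance measure, so cosine similarity is used here purely as an assumption: a higher value means a closer match, which fits ranking from high to low. The helper names `to_vector` and `cosine_similarity` and the shared `vocabulary` list are hypothetical.

```python
import math


def to_vector(model, vocabulary):
    """Project an interest model {keyword: degree} onto a fixed keyword order."""
    return [float(model.get(keyword, 0)) for keyword in vocabulary]


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    vocabulary = ["car", "phone"]  # the N keywords, in a fixed order
    v_k1 = to_vector({"phone": 2, "car": 1}, vocabulary)  # user ID model
    v_k2 = to_vector({"phone": 4, "car": 2}, vocabulary)  # webpage ID model
    print(cosine_similarity(v_k1, v_k2))
```

Both vectors must be built over the same keyword order, otherwise the dimensions do not line up.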
15. The web page recommendation device of claim 12,
the interest model of the user ID comprises a third interest item, the third interest item comprises a plurality of third interest sub-items, and the third interest sub-items comprise a keyword type and the interest degree of the user ID in the keyword type;
the first analysis unit, when gathering the keyword information of each user ID in the click query log and establishing the interest model of the user ID, is used for: summarizing all keywords queried by the user corresponding to the user ID and determining the type of each keyword; counting the number of webpage IDs clicked when the user queried each type of keyword, and determining the interest degree of the user ID in that type of keyword according to the number of webpage IDs clicked by the user;
the interest model of the webpage ID comprises a fourth interest item, the fourth interest item comprises a plurality of fourth interest sub-items, and the fourth interest sub-items comprise a keyword type and the interest degree of the webpage ID in the keyword type;
the first analysis unit, when summarizing the webpage IDs of all the user IDs in the click query log, acquiring the keyword information in the webpage corresponding to each webpage ID, and establishing the interest model of the webpage ID, is used for: segmenting the content of the webpage corresponding to the webpage ID into words, removing invalid words, determining the type of each remaining keyword, counting the occurrence frequency of each type of keyword in the webpage, and determining the interest degree of the webpage ID in the keyword type according to the occurrence frequency of that type of keyword.
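A type-level interest model can be sketched by rolling a keyword-level model up into keyword types. The `KEYWORD_TYPES` table and the `"other"` fallback are hypothetical; the claim does not say how a keyword's type is determined. Summing per-keyword degrees is also an approximation assumed here: for user models it may count a webpage twice if it was clicked under two keywords of the same type.

```python
from collections import Counter

# Hypothetical keyword -> type table, for illustration only.
KEYWORD_TYPES = {"iphone": "electronics", "laptop": "electronics", "sedan": "autos"}


def type_level_model(keyword_model, keyword_types=KEYWORD_TYPES, default="other"):
    """Aggregate {keyword: degree} into {keyword type: summed degree}."""
    totals = Counter()
    for keyword, degree in keyword_model.items():
        totals[keyword_types.get(keyword, default)] += degree
    return dict(totals)


if __name__ == "__main__":
    print(type_level_model({"iphone": 2, "laptop": 1, "sedan": 3, "pizza": 1}))
```

The same function applies to both user ID models and webpage ID models, since both are keyword-to-degree mappings.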
16. The web page recommendation device of claim 15,
the first analysis unit, when determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID, is used for:
generating an N-dimensional vector $V_{C1}$ according to the interest degree of the user ID in each type of keyword in the interest model of each user ID;
generating an N-dimensional vector $V_{C2}$ according to the interest degree of the webpage ID in each type of keyword in the interest model of each webpage ID;
calculating the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and recording $D_C$ as the association degree between the user ID and the webpage ID.
17. The web page recommendation device of claim 12,
the interest model of the user ID comprises a first interest item and a third interest item; the first interest item comprises a plurality of first interest sub-items, and the first interest sub-items comprise keywords and the interest degrees of the user ID in the keywords; the third interest item comprises a plurality of third interest sub-items, and the third interest sub-items comprise keyword types and the interest degrees of the user ID in the keyword types;
the first analysis unit, when gathering the keyword information of each user ID in the click query log and establishing the interest model of the user ID, is used for: summarizing all keywords queried by the user corresponding to the user ID and determining the type of each keyword; counting the number of webpage IDs clicked when the user queried each keyword, and determining the interest degree of the user ID in the keyword according to the number of webpage IDs clicked by the user; counting the number of webpage IDs clicked when the user queried each type of keyword, and determining the interest degree of the user ID in that type of keyword according to the number of webpage IDs clicked by the user;
the interest model of the webpage ID comprises a second interest item and a fourth interest item; the second interest item comprises a plurality of second interest sub-items, and the second interest sub-items comprise keywords and the interest degrees of the webpage ID in the keywords; the fourth interest item comprises a plurality of fourth interest sub-items, and the fourth interest sub-items comprise keyword types and the interest degrees of the webpage ID in the keyword types;
the first analysis unit, when summarizing the webpage IDs of all the user IDs in the click query log, acquiring the keyword information in the webpage corresponding to each webpage ID, and establishing the interest model of the webpage ID, is used for: segmenting the content of the webpage corresponding to the webpage ID into words, removing invalid words, counting the occurrence frequency of each remaining keyword in the webpage, and determining the interest degree of the webpage ID in the keyword according to the occurrence frequency of the keyword; determining the type of each keyword, counting the occurrence frequency of each type of keyword in the webpage, and determining the interest degree of the webpage ID in the keyword type according to the occurrence frequency of that type of keyword.
18. The web page recommendation device of claim 17,
the first analysis unit, when determining the association degree of the user ID and the webpage ID according to the interest model of the user ID and the interest model of the webpage ID, is used for:
generating an N-dimensional vector $V_{K1}$ according to the interest degree of the user ID in each keyword in the interest model of each user ID; generating an N-dimensional vector $V_{C1}$ according to the interest degree of the user ID in each type of keyword in the interest model of each user ID;
generating an N-dimensional vector $V_{K2}$ according to the interest degree of the webpage ID in each keyword in the interest model of each webpage ID; generating an N-dimensional vector $V_{C2}$ according to the interest degree of the webpage ID in each type of keyword in the interest model of each webpage ID;
calculating the distance $D_K$ between the N-dimensional vectors $V_{K1}$ and $V_{K2}$ and the distance $D_C$ between the N-dimensional vectors $V_{C1}$ and $V_{C2}$, and performing a weighted calculation on $D_K$ and $D_C$ to obtain the association degree between the user ID and the webpage ID.
19. The web page recommendation device of claim 18,
the first analysis unit adopts the following formula to perform a weighted calculation on $D_K$ and $D_C$ to obtain the association degree between the user ID and the webpage ID:
$D = a \times D_K + (1 - a) \times D_C$, wherein $D$ is the association degree between the user ID and the webpage ID, $a$ is a preset value, and $a$ is a real number greater than 0 and less than 1.
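The weighted calculation can be checked with a short worked example. The function name `combined_association` and the default value of `a` are assumptions made for illustration; the claim only requires that `a` be a preset real number strictly between 0 and 1.

```python
def combined_association(d_k, d_c, a=0.5):
    """Blend keyword-level and type-level association: D = a*D_K + (1 - a)*D_C."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must be strictly between 0 and 1")
    return a * d_k + (1.0 - a) * d_c


if __name__ == "__main__":
    # With a = 0.25: 0.25 * 0.8 + 0.75 * 0.4 = 0.2 + 0.3
    print(combined_association(0.8, 0.4, a=0.25))
```

A larger `a` gives more weight to exact keyword overlap, while a smaller `a` favors broader keyword-type overlap.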
20. The web page recommendation device of claim 12, further comprising a second analysis unit;
the second analysis unit is used for summarizing each keyword queried within a preset time by the users corresponding to all the user IDs in the click query log, and, for each user who queried the keyword, establishing an association relation between any two webpage IDs clicked by that user; summarizing the webpage IDs clicked within the preset time by all users who queried the keyword, counting, for each webpage ID, the occurrence frequency of each association relation corresponding to that webpage ID, and determining the association degree between each webpage ID and the webpage IDs having an association relation with it according to the occurrence frequency of each association relation;
and the recommending unit further selects a second preset number of webpage IDs according to the sequence from high to low of the association degree with the transcoding page when receiving a click search result command of a user to enter a wireless webpage search transcoding page, and recommends the webpage corresponding to each selected webpage ID in the transcoding page.
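The co-click association built by the second analysis unit can be sketched as pair counting. This sketch assumes the log has already been restricted to the preset time window, that records are `(user_id, keyword, webpage_id)` tuples, and that the association degree equals the raw pair count; the function name `co_click_association` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_click_association(click_log, keyword):
    """For one keyword, map each webpage ID to {related webpage ID: co-click count}."""
    per_user = defaultdict(set)  # user_id -> webpage IDs clicked for this keyword
    for user_id, kw, page_id in click_log:
        if kw == keyword:
            per_user[user_id].add(page_id)

    pair_counts = Counter()
    for pages in per_user.values():
        # Any two webpage IDs clicked by the same user form one association relation.
        for a, b in combinations(sorted(pages), 2):
            pair_counts[(a, b)] += 1

    related = defaultdict(dict)
    for (a, b), count in pair_counts.items():
        related[a][b] = count
        related[b][a] = count
    return dict(related)


if __name__ == "__main__":
    log = [("u1", "phone", "p1"), ("u1", "phone", "p2"),
           ("u2", "phone", "p1"), ("u2", "phone", "p2"), ("u2", "phone", "p3")]
    print(co_click_association(log, "phone"))
```

Given the transcoding page's webpage ID, the second preset number of recommendations would then be the entries of its inner dictionary with the highest counts.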
21. The web page recommendation device of claim 12, further comprising: a third analysis unit;
the log obtaining unit is further configured to: acquiring a search result list corresponding to each keyword;
the third analysis unit is used for acquiring, for each webpage ID in the search result list corresponding to each keyword, the keyword information in the webpage corresponding to the webpage ID, and generating a feature vector corresponding to the webpage ID; and generating one or more webpage clusters corresponding to each keyword according to the feature vectors corresponding to all the webpage IDs in the search result list corresponding to that keyword;
and the recommending unit is used for further searching a webpage cluster where the transcoding page is located in one or more webpage clusters corresponding to the keyword searched by the user when receiving a click search result command of the user to enter the wireless webpage search transcoding page, selecting a third preset number of webpage IDs in the searched webpage clusters, and recommending the webpage corresponding to each selected webpage ID in the transcoding page.
22. The web page recommendation device of claim 21,
the third analysis unit, when acquiring the keyword information in the webpage corresponding to the webpage ID and generating the feature vector corresponding to the webpage ID, is configured to: segment the content of the webpage corresponding to the webpage ID into words, remove invalid words, count the occurrence frequency of each remaining keyword in the webpage, and generate the feature vector corresponding to the webpage ID according to the occurrence frequency of each keyword in the webpage;
the third analysis unit, when generating one or more webpage clusters corresponding to each keyword according to the feature vectors corresponding to all the webpage IDs in the search result list corresponding to the keyword, is configured to: cluster the feature vectors corresponding to all the webpage IDs in the search result list corresponding to the keyword by adopting a K-nearest-neighbor (KNN) classification algorithm.
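The following sketch illustrates the feature-vector and clustering steps. The claim names a KNN classification algorithm without giving details; the code substitutes a simple greedy nearest-neighbor grouping with a cosine-similarity threshold, which is an assumption and not the claimed algorithm. All function names, the `threshold` parameter, and the choice to exclude the current page from its own recommendations are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text, vocabulary):
    """Term-frequency feature vector of a page over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def group_pages(vectors, threshold=0.5):
    """Greedy grouping: a page joins the cluster of its most similar earlier page
    if their similarity reaches `threshold`, otherwise it starts a new cluster."""
    clusters, assigned = [], {}  # assigned: page_id -> cluster index
    for page_id, vec in vectors.items():
        best_id, best_sim = None, threshold
        for other_id in assigned:
            sim = cosine(vec, vectors[other_id])
            if sim >= best_sim:
                best_id, best_sim = other_id, sim
        if best_id is None:
            assigned[page_id] = len(clusters)
            clusters.append([page_id])
        else:
            assigned[page_id] = assigned[best_id]
            clusters[assigned[best_id]].append(page_id)
    return clusters


def recommend_from_cluster(clusters, current_page, n):
    """Pick up to n other pages from the cluster containing `current_page`."""
    for cluster in clusters:
        if current_page in cluster:
            return [p for p in cluster if p != current_page][:n]
    return []


if __name__ == "__main__":
    vectors = {"p1": [2, 0, 1], "p2": [4, 0, 2], "p3": [0, 3, 0], "p4": [0, 1, 0]}
    clusters = group_pages(vectors)
    print(clusters)
    print(recommend_from_cluster(clusters, "p3", 1))
```

In `recommend_from_cluster`, `n` corresponds to the third preset number and `current_page` to the transcoding page the user has entered.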
CN201210080831.5A 2012-03-23 2012-03-23 A kind of webpage recommending method and device Active CN103324645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210080831.5A CN103324645B (en) 2012-03-23 2012-03-23 A kind of webpage recommending method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210080831.5A CN103324645B (en) 2012-03-23 2012-03-23 A kind of webpage recommending method and device

Publications (2)

Publication Number Publication Date
CN103324645A true CN103324645A (en) 2013-09-25
CN103324645B CN103324645B (en) 2018-10-09

Family

ID=49193392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210080831.5A Active CN103324645B (en) 2012-03-23 2012-03-23 A kind of webpage recommending method and device

Country Status (1)

Country Link
CN (1) CN103324645B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551806A (en) * 2008-04-03 2009-10-07 北京搜狗科技发展有限公司 Personalized website navigation method and system
CN101819572A (en) * 2009-09-15 2010-09-01 电子科技大学 Method for establishing user interest model
CN101789018A (en) * 2010-02-09 2010-07-28 清华大学 Method and device for constructing webpage click describing files based on mutual information
US20110302155A1 (en) * 2010-06-03 2011-12-08 Microsoft Corporation Related links recommendation
CN101853308A (en) * 2010-06-11 2010-10-06 中兴通讯股份有限公司 Method and application terminal for personalized meta-search
CN102364467A (en) * 2011-09-29 2012-02-29 北京亿赞普网络技术有限公司 Network search method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王宇: "基于搜索历史的用户兴趣建模", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559265A (en) * 2013-11-04 2014-02-05 北京中搜网络技术股份有限公司 Individualized push method of cell phone client
CN103678710A (en) * 2013-12-31 2014-03-26 同济大学 Information recommendation method based on user behaviors
CN104063443A (en) * 2014-06-13 2014-09-24 百度在线网络技术(北京)有限公司 Method and device for providing search result
CN104268268A (en) * 2014-10-13 2015-01-07 宁波公众信息产业有限公司 Method and system for associating webpage information
CN104268268B (en) * 2014-10-13 2018-05-22 宁波公众信息产业有限公司 A kind of webpage information correlating method and system
CN105989020A (en) * 2015-01-29 2016-10-05 北京灵集科技有限公司 Method and device for multi-data source matching of call network
CN105989020B (en) * 2015-01-29 2019-09-10 北京灵集科技有限公司 A kind of matched method and apparatus of call network multi-data source
CN106156106B (en) * 2015-04-03 2019-10-22 阿里巴巴集团控股有限公司 The calculation method and device of user characteristic data
CN106156106A (en) * 2015-04-03 2016-11-23 阿里巴巴集团控股有限公司 The computational methods of user characteristic data and device
CN105488205A (en) * 2015-12-09 2016-04-13 百度在线网络技术(北京)有限公司 Page generation method and page generation apparatus
CN105488205B (en) * 2015-12-09 2019-05-03 百度在线网络技术(北京)有限公司 Page generation method and device
CN105608071A (en) * 2015-12-21 2016-05-25 北京奇虎科技有限公司 Generation method and device for determining machine learning algorithm of head word
CN105528456A (en) * 2015-12-25 2016-04-27 北京奇虎科技有限公司 User type based search interface showing method and device
CN105528456B (en) * 2015-12-25 2019-04-26 北京奇虎科技有限公司 Search interface methods of exhibiting and device based on user type
CN105678335A (en) * 2016-01-08 2016-06-15 车智互联(北京)科技有限公司 Click rate pre-estimation method, device and calculating equipment
CN105678335B (en) * 2016-01-08 2019-07-02 车智互联(北京)科技有限公司 It estimates the method, apparatus of clicking rate and calculates equipment
CN105589971B (en) * 2016-01-08 2018-12-18 车智互联(北京)科技有限公司 The method, apparatus and recommender system of training recommended models
CN105589971A (en) * 2016-01-08 2016-05-18 车智互联(北京)科技有限公司 Method and device for training recommendation model, and recommendation system
CN107544980B (en) * 2016-06-24 2020-07-24 北京国双科技有限公司 Method and device for searching webpage
CN107544980A (en) * 2016-06-24 2018-01-05 北京国双科技有限公司 A kind of method and device for searching webpage
CN106294596A (en) * 2016-07-29 2017-01-04 北京小米移动软件有限公司 The method and device of information search
CN106844680A (en) * 2017-01-25 2017-06-13 百度在线网络技术(北京)有限公司 The methods of exhibiting and device of recommendation information
CN108153857A (en) * 2017-12-22 2018-06-12 北京奇虎科技有限公司 A kind of method and system for being used to be associated network access data processing
CN109241403A (en) * 2018-08-03 2019-01-18 腾讯科技(深圳)有限公司 Item recommendation method, device, machinery equipment and computer readable storage medium
CN109241403B (en) * 2018-08-03 2022-11-22 腾讯科技(北京)有限公司 Project recommendation method and device, machine equipment and computer-readable storage medium
CN109685539A (en) * 2018-08-21 2019-04-26 平安普惠企业管理有限公司 Homepage methods of exhibiting, equipment, storage medium and device based on data processing
CN109871380A (en) * 2019-01-14 2019-06-11 深圳市东信时代信息技术有限公司 A kind of crowd's packet application method and system based on Redis
CN109871380B (en) * 2019-01-14 2022-11-11 深圳市东信时代信息技术有限公司 Crowd pack application method and system based on Redis
CN110990571B (en) * 2019-12-02 2024-04-02 北京秒针人工智能科技有限公司 Method and device for acquiring discussion duty ratio, storage medium and electronic equipment
CN110990571A (en) * 2019-12-02 2020-04-10 精硕科技(北京)股份有限公司 Method and device for obtaining discussion occupation ratio, storage medium and electronic equipment
CN112507230A (en) * 2020-12-16 2021-03-16 平安银行股份有限公司 Webpage recommendation method and device based on browser, electronic equipment and storage medium
CN112507230B (en) * 2020-12-16 2024-05-17 平安银行股份有限公司 Webpage recommendation method and device based on browser, electronic equipment and storage medium
CN117725314A (en) * 2023-12-18 2024-03-19 无锡市泛亚资讯网络有限公司 Keyword-based website management popularization method and system
CN117725314B (en) * 2023-12-18 2024-06-07 无锡市泛亚资讯网络有限公司 Keyword-based website management popularization method and system

Also Published As

Publication number Publication date
CN103324645B (en) 2018-10-09

Similar Documents

Publication Publication Date Title
CN103324645B (en) A kind of webpage recommending method and device
US20200311155A1 (en) Systems for and methods of finding relevant documents by analyzing tags
TWI636416B (en) Method and system for multi-phase ranking for content personalization
KR101171405B1 (en) Personalization of placed content ordering in search results
CN108763321B (en) Related entity recommendation method based on large-scale related entity network
US8745067B2 (en) Presenting comments from various sources
US8341147B2 (en) Blending mobile search results
JP5501373B2 (en) System and method for collecting and ranking data from multiple websites
CN102799591B (en) Method and device for providing recommended word
US20060248072A1 (en) System and method for spam identification
US8819006B1 (en) Rich content for query answers
US20070214133A1 (en) Methods for filtering data and filling in missing data using nonlinear inference
US20130007124A1 (en) System and method for performing a semantic operation on a digital social network
US20100306249A1 (en) Social network systems and methods
US8290986B2 (en) Determining quality measures for web objects based on searcher behavior
US20100161592A1 (en) Query Intent Determination Using Social Tagging
WO2012088591A9 (en) System and method for performing a semantic operation on a digital social network
KR20140091530A (en) Relevance of name and other search queries with social network features
US20100185623A1 (en) Topical ranking in information retrieval
US20120150846A1 (en) Web-Relevance Based Query Classification
CN102411626A (en) Correlation fraction distribution-based method for classifying query intentions
CN102364467A (en) Network search method and system
Deepak et al. Operators for similarity search: Semantics, techniques and usage scenarios
KR100671077B1 (en) Server, Method and System for Providing Information Search Service by Using Sheaf of Pages
US20130332440A1 (en) Refinements in Document Analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131028

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131028

Address after: 518057 Tencent Building, 16, Nanshan District hi tech park, Guangdong, Shenzhen

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant