WO2009000174A1

WO2009000174A1 - Method and device of web page rank

Info

Publication number: WO2009000174A1
Application number: PCT/CN2008/070608
Authority: WO
Inventors: Zhiyuan Liu
Original assignee: Tencent Technology (Shenzhen) Company Limited
Priority date: 2007-06-25
Filing date: 2008-03-27
Publication date: 2008-12-31
Also published as: CN101079064A; CN101079064B

Abstract

A web page ranking method and device are provided, applicable to the field of computer applications. In the method, each web page corresponds to at least one content-related web page type, and each web page corresponds to one web page type vector containing at least one element, each element representing the weight of one of the web page types to which the web page corresponds. When a first user clicks a web page, the element of the clicked web page's vector that corresponds to the web page type the first user uses most is determined, and the value of that element is increased. When a second user searches for web pages, the web page type to which the search content corresponds is determined, and the retrieved pages are sorted based on the value of the element of each web page's vector corresponding to the determined web page type. This resolves the problems of malicious clicking and indiscriminate score inflation that arise in the prior art when web pages are scored directly by users' click counts.

Description

Web page sorting method and device

Technical field

The present invention relates to the field of computer applications, and in particular, to a web page sorting method and apparatus.

Background of the invention

Search engines are an area where competition is currently fierce. The focus of search engine competition is not only rich content, but also user experience. In general, the problem that search engines face now is not insufficient information but excessive information. Searching for a keyword often results in thousands of results.

In practice, users of a search engine want the first page, or even the first five Uniform Resource Locators (URLs), to contain the information they are looking for, so ranking becomes the key factor in the quality of a search engine. The well-known search engine Google was able to become the world's leading search engine in a short period of time because the PageRank technology it invented effectively addresses the sorting problem.

Nowadays, however, network companies understand PageRank technology and most have adopted it. In fact, the ranking results of any current large search engine are not based on a single algorithm but are the combined result of dozens or even hundreds of factors. Commonly used algorithms include PageRank, the HITS algorithm (a hyperlink-based search algorithm), the Hilltop algorithm (a search engine ranking algorithm for large categories), and so on.

In the prior art, when a search engine sorts its search results, web pages are scored directly by the number of user clicks.

Summary of the invention

An embodiment of the present invention provides a method for sorting webpages. The method includes: each webpage corresponds to at least one webpage category related to its content, and each webpage corresponds to one webpage category vector; the webpage category vector includes at least one element, and the at least one element respectively represents the weight of each of the at least one webpage category corresponding to the webpage;

Determine the type of webpage that the user uses the most according to the webpage that the user has visited;

When the first user clicks on a webpage, determining, in the webpage category vector corresponding to the clicked webpage, the element corresponding to the webpage category most used by the first user, and increasing the value of the determined element;

When the second user searches for the webpage, the webpage category corresponding to the search content is determined, and the searched at least one webpage is sorted according to the value of the element corresponding to the determined webpage category in each webpage category vector.

Another embodiment of the present invention provides a webpage sorting apparatus, the apparatus comprising: a first module, configured to determine, according to the webpages visited by a user, the webpage category that is most used by the user, and to determine the webpage category vector corresponding to each webpage; wherein each webpage corresponds to at least one webpage category related to its content, each webpage corresponds to one webpage category vector, the webpage category vector includes at least one element, and the at least one element respectively represents the weight of each of the at least one webpage category corresponding to the webpage;

a second module, configured to: when the first user clicks on a webpage, determine, in the webpage category vector determined by the first module for the clicked webpage, the element corresponding to the webpage category most used by the first user, and increase the value of the determined element;

a third module, configured to: when the second user searches for webpages, determine the webpage category corresponding to the search content, and sort the at least one webpage found by the search according to the value, obtained from the second module, of the element corresponding to the determined webpage category in each webpage category vector.

Brief description of the drawings

FIG. 1 is a flowchart of a web page sorting method according to an embodiment of the present invention;

FIG. 2 is a structural diagram of a search engine in an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a webpage sorting apparatus according to an embodiment of the present invention.

Mode for carrying out the invention

In order to make the objects, the technical solutions and the advantages of the present invention more comprehensible, the present invention will be further described in detail below with reference to the accompanying drawings. It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the present invention determines the user's expert category according to the Internet Protocol (IP) log of the user's accesses, and adds a score to the webpage category vector of each webpage the user clicks; when a user retrieves information, the search results are sorted according to the webpage category vectors.

Embodiments of the present invention provide a method for sorting web pages. In the method, each web page corresponds to at least one web page category related to the content, and each web page corresponds to a web page category vector. The web page category vector includes at least one element, and at least one element included in the web page category vector respectively represents a weight of each of the at least one web page category corresponding to the web page. The web page category vector is an n-dimensional vector, where n is equal to the number of web page categories. It should be noted that the web page category vector can be implemented with an array containing n elements, n equal to the number of web page categories.

First, the webpage category that the user uses most is determined from the webpages the user has visited; that is, the user's expert category is determined. The webpage category that the user uses most can be determined from the user's behavior. For example, the IP logs of the user's accesses can be classified, and the webpage category that the user uses most determined from the IP category the user accesses most; or the search terms entered by the user can be classified, and the most-used webpage category determined from the category to which the user's most frequently used terms belong. There are, of course, other ways of determining the webpage category that the user uses most from the user's behavior, all well known to those skilled in the art, and they are not listed here.

After the webpage category that each user uses most has been determined, when a first user clicks on a webpage, the element of the clicked webpage's category vector that corresponds to the webpage category most used by the first user is determined, and the value of that element is increased. Specifically, the value of the element may be increased by 1. The same operation is repeated when another user clicks on the webpage.

When the second user searches for the webpage, first determine the webpage category corresponding to the search content input by the second user, and sort the searched at least one webpage according to the value of the element corresponding to the determined webpage category in each webpage category vector.

FIG. 1 shows the flow of a webpage sorting method provided by an embodiment of the present invention, which is described in detail below.

In step S101, the web page category vector established by the user is stored.

Here, a vector is a one-dimensional matrix that can hold the score of an item for every element of a given set. In the embodiment of the present invention, a vector is assigned to each webpage to hold its value for each category in the category set. For example, if the category set is {"sports", "news"}, the webpage vector holds the webpage's score for "sports" and its score for "news", and both scores can be read by accessing the vector. In practice, the category set contains on the order of hundreds of categories, so the web page vector holds a score for each of those hundreds of categories.

Every web page is given an n-dimensional vector, called the web page category vector. The dimension n of the vector is equal to the number of categories in the web page category set. The vector expresses the weight of the web page in each category, that is, the proportion of the web page that falls in each category. Because a web page does not necessarily belong to a single category, a vector can be used to indicate the weight of the web page in each category: the weight for each category is represented by one element, and the array of these elements constitutes the vector. In the prior art, most websites are able to establish a category set A according to the content of current Internet web pages, such as history, military, tourism, humanities, automobiles, and the like.
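As a minimal sketch of the data structure described above, a web page category vector can be held in a plain list indexed by category. The category names, the page identifier 1001, and the names `CATEGORIES`, `CATEGORY_INDEX`, `new_category_vector`, and `get_weight` are illustrative assumptions and do not come from the patent text.

```python
# Hypothetical category set A; the text says real sets hold hundreds of entries.
CATEGORIES = ["sports", "news", "computer"]
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def new_category_vector():
    """Return an n-element vector of zero weights, n = number of categories."""
    return [0] * len(CATEGORIES)


def get_weight(vector, category):
    """Read the weight a page holds for one category."""
    return vector[CATEGORY_INDEX[category]]


# One vector per web page, keyed here by an assumed page identifier.
page_vectors = {1001: new_category_vector()}
page_vectors[1001][CATEGORY_INDEX["sports"]] = 3
```

A list keeps the vector dense, which suits a category set of a few hundred entries; a sparse mapping would be an equally valid choice.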

In step S102, the IP logs of the user's accesses are classified, and the expert category of the user is determined according to the IP category that the user accesses most. The process of obtaining the IP log is as follows. The typical structure of a search engine, shown in FIG. 2, includes a crawler, an indexer, a retriever, and so on. The crawler's main work is to assign a Uniform Resource Locator identifier (URLID) to each webpage and to download the webpage. The crawler assigns each Internet webpage a unique identifier, the URLID, to distinguish it from other webpages. Each URLID corresponds to a structure that includes the text content of the webpage, additional properties of the webpage, and so on.

The crawler downloads web pages from the Internet, assigns each a unique URLID, and stores them in the original database. The indexer reads the web page information from the original database, indexes it, and stores the result in the index database.

When the user inputs the search information for information retrieval, the retriever receives the user input, obtains the record from the index database and returns it to the user after sorting, and records the user's operation log to the user behavior log.
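The crawler, indexer, and retriever roles described above can be sketched roughly as follows. This is a toy illustration under stated assumptions: page text is passed in directly instead of being downloaded, the index is a simple word-to-URLID mapping, and the names `Crawler`, `build_index`, `retrieve`, and the example URLs are hypothetical.

```python
import itertools


class Crawler:
    """Assigns each URL a unique URLID and stores the page text."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.original_db = {}  # URLID -> {"url": ..., "text": ...}

    def store(self, url, text):
        urlid = next(self._next_id)
        self.original_db[urlid] = {"url": url, "text": text}
        return urlid


def build_index(original_db):
    """Indexer: map each lowercase word to the URLIDs whose text contains it."""
    index = {}
    for urlid, page in original_db.items():
        for word in set(page["text"].lower().split()):
            index.setdefault(word, []).append(urlid)
    return index


def retrieve(index, behavior_log, user, query):
    """Retriever: look up the query and record the operation in the log."""
    behavior_log.append({"user": user, "query": query})
    return sorted(index.get(query.lower(), []))


crawler = Crawler()
crawler.store("http://example.com/a", "ThinkPad T43 review")
crawler.store("http://example.com/b", "football news")
index = build_index(crawler.original_db)
behavior_log = []
hits = retrieve(index, behavior_log, "u1", "T43")
```

The behavior log collected by the retriever is what the later steps would read to work out each user's expert category.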

The algorithm used to determine the expert category of each user is as follows.

Define the expert array UserType[], where UserType[i] represents the expert category of the i-th user.

For (each user i)

    Define the category counter array TypeCounter[]

    Read all historical search records of user i

    For (each search search[j] of user i)

        Classify search[j] and obtain category ID = a

        TypeCounter[a] = TypeCounter[a] + 1 (add 1 to this category for the user)

    UserType[i] = the category with the largest count in TypeCounter

Return the expert category array UserType[].

Among them, the user's expert category represents the most used web page category.
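The counting procedure above could look roughly like this in Python. It is a sketch under stated assumptions: `classify` is a toy keyword lookup standing in for the search engine's real classifier, and the user identifiers and search terms are invented for illustration.

```python
from collections import Counter


def classify(search_term):
    """Toy keyword classifier; a stand-in for the engine's real classifier."""
    keywords = {"T43": "computer", "laptop": "computer", "football": "sports"}
    return keywords.get(search_term, "news")


def expert_categories(search_history):
    """Map each user to the category that most of that user's searches fall in.

    search_history: dict of user id -> list of search terms.
    """
    user_type = {}
    for user, searches in search_history.items():
        type_counter = Counter(classify(term) for term in searches)
        if type_counter:  # users with no history get no expert category
            user_type[user] = type_counter.most_common(1)[0][0]
    return user_type


history = {"u1": ["T43", "laptop", "football"], "u2": ["football"], "u3": []}
user_type = expert_categories(history)
```

Here `u1` ends up with the expert category "computer" (two of three searches), `u2` with "sports", and `u3` has no entry because there is no history to count.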

For example, the user inputs the search information "T43", and the search engine classifies the search string and obtains the category "computer". When the search engine sorts the search results, it takes the web page category vectors into account, and pages with a larger weight for "computer" are ranked first.

In step S103, when the user clicks on a webpage in the search engine's results, a score is added to the webpage category vector of that webpage according to the user's determined expert category.

For example, suppose a user clicks on a web page while searching with the search engine. If the user is an expert in one of the categories of the web page category vector, the weight of that category is increased in the vector of the clicked web page. That is, for each web page the user clicks, a value is added to the element of its web page category vector that corresponds to the user's expert category, increasing the weight of that element.

In a specific implementation, the algorithm used to add a score to the webpage category vector of the clicked webpage according to the user's expert category is as follows.

IF (user clicks on a web page)

    Determine the user's expert category

    IF (user belongs to expert category a, a ∈ A)

        The value of element a of the web page's category vector increases by 1
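The click-scoring step above might be sketched as follows. The function name `record_click`, the data layout, and the choice to ignore clicks from users with no known expert category are assumptions made for illustration.

```python
def record_click(page_vectors, user_type, categories, user, page_id):
    """Add 1 to the clicked page's weight for the user's expert category."""
    category = user_type.get(user)
    if category is None or category not in categories:
        return  # assumed behavior: no known expert category, no score added
    vector = page_vectors.setdefault(page_id, [0] * len(categories))
    vector[categories.index(category)] += 1


categories = ["sports", "news", "computer"]
user_type = {"u1": "computer", "u2": "sports"}
page_vectors = {}
record_click(page_vectors, user_type, categories, "u1", 1001)
record_click(page_vectors, user_type, categories, "u1", 1001)
record_click(page_vectors, user_type, categories, "u2", 1001)
record_click(page_vectors, user_type, categories, "u9", 1001)  # unknown user
```

Because each click only raises the element for the clicking user's own expert category, repeated clicks by a "sports" expert do not change how the page ranks for "computer" queries.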

In step S104, when a user searches through the search engine, the search results are optimally sorted by referring to the scores in the web page category vectors.

The algorithm used in this step is as follows.

IF (user searches for the term "KKK")

    Classify "KKK" and obtain its category a

    The search engine calls the retriever to obtain the search results

    Pre-sort the search results (in this embodiment of the present invention, the search results are pre-sorted using the PageRank technique)

    For (each search result page c)

        Query the web page category vector corresponding to page c, and read the expert recommendation value of the page for category a

        Adjust the position of page c in the sorted results according to its expert recommendation value, moving pages with larger values forward

    Return the sorted page collection and display the sorted page results
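One possible reading of the re-ranking step is sketched below. The text only says that the order is adjusted so that pages with larger expert recommendation values move forward; this sketch assumes a full stable sort by that value, with the pre-sorted order used to break ties. The function name `rerank` and the sample data are hypothetical.

```python
def rerank(presorted_pages, page_vectors, categories, query_category):
    """Reorder pre-sorted results by the expert recommendation value.

    presorted_pages: page ids in their initial (for example PageRank) order.
    Python's sort is stable, so pages with equal values keep that order.
    """
    idx = categories.index(query_category)

    def recommendation(page_id):
        vector = page_vectors.get(page_id)
        return vector[idx] if vector else 0

    return sorted(presorted_pages, key=recommendation, reverse=True)


categories = ["sports", "news", "computer"]
page_vectors = {1: [0, 0, 1], 2: [5, 0, 0], 3: [0, 0, 4]}
result = rerank([1, 2, 3, 4], page_vectors, categories, "computer")
```

For a "computer" query, page 3 moves to the front, page 1 follows, and pages 2 and 4, which have no "computer" weight, keep their pre-sorted order at the end.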

FIG. 3 shows the structure of a webpage sorting apparatus provided by an embodiment of the present invention.

The web page category vector storage module 11 stores the web page category vectors established by the user, where each web page category vector identifies the weight, in each category of the web page category set, of the web page corresponding to that vector. Each webpage corresponds to at least one webpage category related to its content, and each webpage corresponds to one webpage category vector; the webpage category vector includes at least one element, and the at least one element respectively represents the weight of each of the at least one webpage category corresponding to the webpage.

The user expert category determining module 12 classifies the IP logs of the user's accesses and determines the expert category of the user according to the IP category that the user accesses most. When the user clicks a webpage in the search results, the webpage category vector adding module 13 adds a score to the webpage category vector of that webpage according to the expert category of the user determined by the user expert category determining module 12. The specific process has been described above and is not repeated here. When a user enters a query through the search engine for information retrieval, the webpage optimization ranking module 14 optimally sorts the retrieved webpages by referring to their webpage category vectors. The web page display module 15 displays the optimally sorted webpages.

The embodiment of the present invention determines the user's expert category according to the IP log of the user's accesses and adds a score to the webpage category vector of each webpage the user clicks; when a user retrieves information, the search results are sorted according to the webpage category vectors. This solves the prior-art problem that scoring webpages directly by click count allows users to click maliciously and inflate scores indiscriminately.

The above is only the preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for sorting web pages, comprising:

Each webpage corresponds to at least one webpage category related to the content, and each webpage corresponds to one webpage category vector, the webpage category vector includes at least one element, and the at least one element respectively represents a weight of each of the at least one webpage category corresponding to the webpage;

The webpage category that the user has used most is determined according to the webpage visited by the user;

2. The method for sorting webpages according to claim 1, wherein determining the types of webpages that are most used by the user according to the webpages visited by the user comprises:

The IP logs accessed by the user are classified, and the webpage category that the user uses most is determined according to the IP category that the user accesses the most.

3. The web page sorting method according to claim 2, wherein the web page category vector is an n-dimensional vector, and n is equal to the number of web page categories.

4. The method of sorting a webpage according to claim 2, wherein the value of the element in the webpage category vector is a click count, and increasing the value of the determined element comprises:

Add 1 to the value of the determined element.

5. The method for sorting web pages according to claim 2, wherein the method further comprises:

Show sorted pages.

6. A web page sorting apparatus, comprising: a first module, configured to determine, according to the webpages visited by the user, the webpage category that is most used by the user, and to determine the webpage category vector corresponding to each webpage; wherein each webpage corresponds to at least one webpage category related to the content, and each webpage corresponds to one webpage category vector, the webpage category vector includes at least one element, and the at least one element respectively represents a weight of each of the at least one webpage category corresponding to the webpage;

a third module, configured to: when the second user searches for webpages, determine the webpage category corresponding to the search content, and sort the at least one webpage found by the search according to the value, obtained from the second module, of the element corresponding to the determined webpage category in each webpage category vector.

7. The webpage sorting apparatus according to claim 6, wherein the first module comprises:

a first unit, configured to determine a webpage category corresponding to the webpage, and a webpage category vector corresponding to the webpage and including weights of the webpage categories;

And a second unit, configured to determine, according to the webpage accessed by the user, one of the webpage categories determined by the first unit as the webpage category that is most used by the user.

8. The web page sorting apparatus according to claim 7, wherein:

The first unit is configured to classify IP logs accessed by the user, and determine a webpage category that is most used by the user according to the IP category that the user accesses the most.

9. The web page sorting apparatus according to claim 8, wherein the web page category vector is an n-dimensional vector, and n is equal to the number of web page categories.

10. The web page sorting apparatus according to claim 8, wherein the apparatus further comprises:

a fourth module, configured to receive the webpages sorted by the third module and display the sorted webpages.