CN101079064B - Web page sequencing method and device - Google Patents
- Publication number
- CN101079064B (application CN2007100761642A, first published as CN101079064A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- user
- classification
- web page
- expert
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention provides a web page ranking method and device in the field of computer applications. The method comprises the following steps: storing the web page category vectors established by the user; classifying the IP logs of user visits and determining each user's expert category as the web page category that the user visits most; when a user clicks a web page in the search engine's results, adding points to the value, in that page's category vector, of the category matching the user's expert category; and, when a user enters a query term into the search engine to retrieve information, classifying the query term to obtain its web page category and re-ranking the retrieved pages according to the values of that category in their category vectors. The invention solves the problems of malicious clicking and blind point-adding that arise in the prior art when points are added to a web page directly from the number of user clicks.
Description
Technical field
The invention belongs to the field of computer applications and relates in particular to a web page ranking method and device.
Background technology
Search is currently a fiercely competitive field; beyond the breadth of its content, a search engine also competes on user experience. In general, the problem search engines face today is not too little information but too much: a search for a single keyword often returns tens of millions of results.
In practice, users of a search engine expect the first page, or even the first five uniform resource locators (URLs), to contain the information they want, so ranking has become a key factor in search engine quality. The well-known search engine Google became a world-class search engine in a short time precisely because the PageRank technology it invented addresses the ranking problem effectively.
Today, however, network companies all understand PageRank and most have adopted it. In fact, the ranking produced by any large search engine today is not the output of a single algorithm but the combined result of tens or even hundreds of factors. Commonly used algorithms include not only PageRank but also the HITS algorithm (a hyperlink-based search algorithm) and the Hilltop algorithm (a search engine ranking algorithm suited to broad categories), because a single algorithm is easily reverse-engineered and gamed. All of these algorithms serve one purpose: bringing the ranking closer to what users want.
In the prior art, when the results of a user's search are ranked, points are added to a web page directly according to the number of user clicks. Because this approach does not differentiate between users, it invites malicious clicking, and the resulting point-based recommendation is largely blind.
Summary of the invention
The purpose of the embodiment of the invention is to provide a web page ranking method that solves the prior-art problem in which adding points to web pages directly by user click count invites malicious clicking and makes point-based recommendation blind.
The embodiment of the invention is implemented as a web page ranking method comprising the following steps:
storing the web page category vectors established by the user;
classifying the IP logs of user visits, and determining the web page category that a user visits most as that user's expert category;
when a user clicks a web page in the search engine's results, adding points, according to the user's determined expert category, to the value of the category in that page's category vector that matches the user's expert category, where a web page's category vector holds the weight of that page in each web page category;
when a user enters a query term into the search engine to retrieve information, classifying the query term to obtain its category, and adjusting the ranking of the search result pages according to the value of the query term's category in each page's category vector, so that pages with larger values for that category are moved forward.
Another purpose of the embodiment of the invention is to provide a web page ranking device comprising:
a web page category vector storage module, for storing the web page category vectors established by the user;
a user expert category determination module, for classifying the IP logs of user visits and determining the web page category that a user visits most as that user's expert category;
a web page category vector scoring module, for adding points, when a user clicks a web page in the search engine's results and according to the user's determined expert category, to the value of the category in that page's category vector that matches the user's expert category, where a web page's category vector holds the weight of that page in each web page category;
a web page ranking optimization module, for classifying the query term when a user enters a query term into the search engine to retrieve information, obtaining the category of the query term, and adjusting the ranking of the search result pages according to the value of the query term's category in each page's category vector, so that pages with larger values for that category are moved forward.
The embodiment of the invention assigns users to expert categories according to the IP logs of their visits, adds points to the value of the category vector of each page a user clicks, and, when a user searches for information, ranks that user's search results according to the web page category vectors. This solves the prior-art problem in which adding points to web pages directly by user click count invites malicious clicking and makes point-based recommendation blind.
Description of drawings
Fig. 1 is a flowchart of the web page ranking method provided by the embodiment of the invention;
Fig. 2 is a block diagram of a typical search engine;
Fig. 3 is a structural diagram of the web page ranking device provided by the embodiment of the invention.
Embodiment
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The embodiment of the invention assigns users to expert categories according to the Internet Protocol (IP) logs of their visits, adds points to the value of the category vector of each page a user clicks, and, when a user searches for information, ranks that user's search results according to the web page category vectors.
Fig. 1 shows the flow of the web page ranking method provided by the embodiment of the invention, described in detail below.
In step S101, the web page category vectors established by the user are stored.
Here, a vector is a one-dimensional array that can hold the score of an item with respect to every element of a set. The embodiment assigns each web page a vector that holds the page's value for each category in the category set. For example, if the category set is {"sports", "news"}, the page's vector holds the page's score for "sports" and its score for "news", and both scores can be read by accessing the vector. In practice the category set contains on the order of hundreds of categories, so each page's vector holds that page's score for each of those hundreds of categories.
The category vector of every web page is an n-dimensional vector, where n equals the number of categories in the web page category set A. The vector gives the weight of the page in each category, that is, the proportion of the page attributable to each category. Because a web page does not necessarily belong to a single category, a vector can express how much weight the page carries in each one. In the prior art, most websites can establish a category set A from the content of current Internet pages, for example history, military affairs, tourism, humanities, automobiles, and so on.
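The category vector described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the `page_vectors` mapping, and the helper functions are hypothetical and do not come from the patent.

```python
# Hypothetical category set A; the patent lists such names only as examples.
CATEGORIES = ["history", "military", "travel", "humanities", "automobile"]


def new_category_vector():
    """Return an n-dimensional vector of zero weights, one slot per category in A."""
    return [0.0] * len(CATEGORIES)


def weight(vector, category):
    """Read a page's weight for one category by that category's position in A."""
    return vector[CATEGORIES.index(category)]


# page_vectors maps a page identifier (URLID) to that page's category vector.
page_vectors = {101: new_category_vector()}
page_vectors[101][CATEGORIES.index("history")] = 0.7
page_vectors[101][CATEGORIES.index("travel")] = 0.3
```

A plain list indexed by category position keeps the vector dimension equal to the size of A, which matches the n-dimensional description; a sparse mapping would be a reasonable alternative when A has hundreds of categories.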
In step S102, the IP logs of user visits are classified, and the user's expert category is determined from the IP category the user visits most.
The IP logs of user visits are obtained as follows. The typical structure of a search engine, shown in Fig. 2, comprises a crawler, an indexer, a searcher, and so on. The crawler's main work is to assign each web page a uniform resource locator identifier (URLID) and to download the page. The crawler assigns each Internet page a unique identifier so that different URLIDs can be told apart; the structure associated with a URLID contains the page's text content, its additional attributes, and so on.
The crawler downloads web pages from the Internet, assigns each a unique URLID, and stores them in the raw database. The indexer reads page information from the raw database, builds an index, and stores it in the index database.
When a user enters a query to retrieve information, the searcher accepts the input, fetches records from the index database, ranks them, and returns them to the user, while recording the user's operations in the user behavior log.
The algorithm used to determine a user's expert category is as follows.
Define an expert array UserType[], where UserType[i] denotes the expert category of the i-th user.
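One way to fill such a UserType structure is sketched below. It assumes that each log record has already been classified into a category by some upstream classifier, which the text does not specify; the record shape `(user_id, category)` and the function names are hypothetical.

```python
from collections import Counter


def expert_category(visited_categories):
    """Return the category that appears most often in a user's classified visit log."""
    if not visited_categories:
        return None  # no log entries, so no expert category can be assigned
    return Counter(visited_categories).most_common(1)[0][0]


def build_user_type(visit_log):
    """Build UserType (user id -> expert category) from (user_id, category) records."""
    per_user = {}
    for user_id, category in visit_log:
        per_user.setdefault(user_id, []).append(category)
    return {user_id: expert_category(cats) for user_id, cats in per_user.items()}


# Hypothetical classified log: user 0 mostly visits "computer" pages.
log = [(0, "computer"), (0, "computer"), (0, "sports"), (1, "news")]
UserType = build_user_type(log)
```

A dictionary is used here instead of an array so that user identifiers need not be consecutive integers; with dense integer ids, a list indexed by i would match the UserType[i] notation directly.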
For example, a user enters the query "T43"; the search engine classifies the query string and obtains the category "computer". When the search engine ranks the retrieved results, it takes the web page category vectors into account, so that pages with a larger "computer" weight are ranked nearer the front.
In step S103, when a user clicks a web page in the search engine's results, points are added to the value of that page's category vector according to the user's determined expert category.
For example, after searching, a user clicks one of the result pages. If the user is an expert in one of the categories of the web page category vector, points are added to the page's weight for that category in its vector. That is, the category vector of the clicked page is credited according to the clicking user's expert category.
In a specific implementation, when points are added to the value of the clicked page's category vector according to the user's expert category, the algorithm used is as follows,
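A minimal sketch of how such a bonus-point update might look is given here; it is an illustration, not the patent's own listing. The increment `BONUS`, the function name `add_bonus`, and the dictionary-based data layout are all assumptions.

```python
BONUS = 1.0  # hypothetical increment per expert click; the source gives no value


def add_bonus(page_vectors, categories, user_type, user_id, url_id, bonus=BONUS):
    """Credit a clicked page only in the clicking user's expert category."""
    category = user_type.get(user_id)
    if category is None or category not in categories:
        return False  # user has no expert category, so the click earns nothing
    vector = page_vectors.setdefault(url_id, [0.0] * len(categories))
    vector[categories.index(category)] += bonus
    return True


categories = ["computer", "sports", "news"]
user_type = {0: "computer"}  # user 0 is a "computer" expert
page_vectors = {}
add_bonus(page_vectors, categories, user_type, user_id=0, url_id=42)
add_bonus(page_vectors, categories, user_type, user_id=7, url_id=42)  # unknown user
```

Because a click changes only the slot for the clicker's expert category, repeated clicks by a user with no expertise in a topic cannot raise a page's score for that topic, which is the effect the method aims for.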
In step S104, when a user searches with the search engine, the user's search results are re-ranked with reference to the scores in the web page category vectors.
The algorithm used in this step is as follows:
IF (the user searches for the entry "KKK")
{
    Classify "KKK"; let the resulting category be a.
    The search engine calls the searcher to obtain the retrieval results.
    Pre-rank the retrieval results; in this embodiment of the invention, the results are ranked with the PageRank technique.
    FOR (each retrieved result page c)
    {
        Look up the category vector of page c and read the page's expert recommendation value U_a for category a.
        Adjust the rank of page c according to the size of U_a; pages with a larger U_a are moved forward.
    }
    Return the ranked set of web pages and display the ranked results.
}
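The re-ranking loop above can be sketched in Python as follows. The sketch assumes the query has already been classified into category a and that the results arrive pre-ranked; it interprets "adjust the rank according to U_a" as a stable sort by U_a, which is one possible reading rather than something the text fixes. The function and variable names are hypothetical.

```python
def rerank(results, page_vectors, categories, query_category):
    """Re-order pre-ranked results so that pages with a larger U_a come first.

    results is a list of URLIDs already ordered by the pre-ranking step
    (PageRank in the embodiment). Python's sort is stable, so pages with
    equal U_a keep their pre-ranked order.
    """
    idx = categories.index(query_category)

    def u_a(url_id):
        # Pages without a stored vector are treated as having U_a = 0.
        vector = page_vectors.get(url_id)
        return vector[idx] if vector else 0.0

    return sorted(results, key=u_a, reverse=True)


categories = ["computer", "sports"]
page_vectors = {1: [0.0, 5.0], 2: [3.0, 0.0], 3: [3.0, 1.0]}
ranked = rerank([1, 2, 3, 4], page_vectors, categories, "computer")
```

Using a stable sort means the pre-ranking still decides the order among pages that experts have credited equally, so the category vector acts as an adjustment on top of the existing ranking rather than a replacement for it.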
Fig. 3 shows the structure of the web page ranking device provided by the embodiment of the invention.
The web page category vector storage module 11 stores the web page category vectors established by the user; each vector identifies the weight of its corresponding web page in each category of the web page category set.
The user expert category determination module 12 classifies the IP logs of user visits and determines the user's expert category from the IP category the user visits most. When a user clicks a web page in the search engine's results, the web page category vector scoring module 13 adds points to the value of that page's category vector according to the expert category determined by the user expert category determination module 12. The detailed process is described above and is not repeated here.
When a user enters a query term into the search engine to retrieve information, the web page ranking optimization module 14 re-ranks the retrieved pages with reference to their category vectors, and the web page display module 15 displays the re-ranked pages.
The embodiment of the invention assigns users to expert categories according to the IP logs of their visits, adds points to the value of the category vector of each page a user clicks, and, when a user searches for information, ranks that user's search results according to the web page category vectors. This solves the prior-art problem in which adding points to web pages directly by user click count invites malicious clicking and makes point-based recommendation blind.
The above is only a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (4)
1. A web page ranking method, characterized in that the method comprises the following steps:
storing the web page category vectors established by the user;
classifying the IP logs of user visits, and determining the web page category that a user visits most as that user's expert category;
when a user clicks a web page in the search engine's results, adding points, according to the user's determined expert category, to the value of the category in that page's category vector that matches the user's expert category, where a web page's category vector holds the weight of that page in each web page category;
when a user enters a query term into the search engine to retrieve information, classifying the query term to obtain its category, and adjusting the ranking of the search result pages according to the value of the query term's category in each page's category vector, so that pages with larger values for that category are moved forward.
2. The web page ranking method according to claim 1, characterized in that the method further comprises:
displaying the re-ranked web pages.
3. A web page ranking device, characterized in that the device comprises:
a web page category vector storage module, for storing the web page category vectors established by the user;
a user expert category determination module, for classifying the IP logs of user visits and determining the web page category that a user visits most as that user's expert category;
a web page category vector scoring module, for adding points, when a user clicks a web page in the search engine's results and according to the user's determined expert category, to the value of the category in that page's category vector that matches the user's expert category, where a web page's category vector holds the weight of that page in each web page category;
a web page ranking optimization module, for classifying the query term when a user enters a query term into the search engine to retrieve information, obtaining the category of the query term, and adjusting the ranking of the search result pages according to the value of the query term's category in each page's category vector, so that pages with larger values for that category are moved forward.
4. The web page ranking device according to claim 3, characterized in that the device further comprises a web page display module for displaying the re-ranked web pages.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2007100761642A CN101079064B (en) | 2007-06-25 | 2007-06-25 | Web page sequencing method and device |
PCT/CN2008/070608 WO2009000174A1 (en) | 2007-06-25 | 2008-03-27 | Method and device of web page rank |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2007100761642A CN101079064B (en) | 2007-06-25 | 2007-06-25 | Web page sequencing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101079064A CN101079064A (en) | 2007-11-28 |
CN101079064B true CN101079064B (en) | 2011-11-30 |
Family
ID=38906543
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2007100761642A Active CN101079064B (en) | 2007-06-25 | 2007-06-25 | Web page sequencing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN101079064B (en) |
WO (1) | WO2009000174A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108182186A (en) * | 2016-12-08 | 2018-06-19 | 广东精点数据科技股份有限公司 | A kind of Web page sequencing method based on random forests algorithm |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101515360A (en) | 2009-04-13 | 2009-08-26 | 阿里巴巴集团控股有限公司 | Method and server for recommending network object information to user |
CN101840420B (en) * | 2010-04-02 | 2011-12-28 | 清华大学 | Search aid system, search aid method and program |
CN101996240A (en) * | 2010-10-13 | 2011-03-30 | 蔡亮华 | Method and device for providing information |
CN102542474B (en) | 2010-12-07 | 2015-10-21 | 阿里巴巴集团控股有限公司 | Result ranking method and device |
CN102541857A (en) * | 2010-12-08 | 2012-07-04 | 腾讯科技(深圳)有限公司 | Webpage sorting method and device |
CN102722503A (en) * | 2011-03-31 | 2012-10-10 | 北京百度网讯科技有限公司 | Method and device for sequencing search results |
CN102231152B (en) * | 2011-05-25 | 2014-09-03 | 北京捷讯华泰科技有限公司 | Searching method for precisely inquiring based on IP (Internet Protocol) address of mobile terminal |
CN102956009B (en) | 2011-08-16 | 2017-03-01 | 阿里巴巴集团控股有限公司 | A kind of electronic commerce information based on user behavior recommends method and apparatus |
CN103164804B (en) | 2011-12-16 | 2016-11-23 | 阿里巴巴集团控股有限公司 | The information-pushing method of a kind of personalization and device |
CN109344321B (en) * | 2012-05-08 | 2021-11-02 | 潍坊久宝智能科技有限公司 | System for obtaining user personalized features |
TWI465948B (en) * | 2012-05-25 | 2014-12-21 | Gemtek Technology Co Ltd | Method for dlna pre-browsing and customizing browsing result and digital media device using the same |
CN102722545B (en) * | 2012-05-25 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | A kind of method, device and equipment for sorting to releasing news |
CN103399861B (en) * | 2013-07-04 | 2017-03-08 | 百度在线网络技术(北京)有限公司 | A kind of network address in Web side navigation recommends methods, devices and systems |
CN104636366B (en) * | 2013-11-11 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Method and device for acquiring search result queue |
CN105224657B (en) * | 2015-09-30 | 2018-10-12 | 北京奇虎科技有限公司 | A kind of information recommendation method and electronic equipment based on search engine |
CN107153656B (en) * | 2016-03-03 | 2020-12-01 | 阿里巴巴集团控股有限公司 | Information searching method and device |
CN105763633B (en) * | 2016-04-14 | 2019-05-21 | 上海牙木通讯技术有限公司 | A kind of correlating method of domain name and website visiting behavior |
CN107870941B (en) * | 2016-09-27 | 2021-11-02 | 北京搜狗科技发展有限公司 | Webpage sorting method, device and equipment |
CN106777201B (en) * | 2016-12-23 | 2021-01-08 | 北京奇元科技有限公司 | Method and device for sorting recommended data on search result page |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6947924B2 (en) * | 2002-01-07 | 2005-09-20 | International Business Machines Corporation | Group based search engine generating search results ranking based on at least one nomination previously made by member of the user group where nomination system is independent from visitation system |
CN1389811A (en) * | 2002-02-06 | 2003-01-08 | 北京造极人工智能技术有限公司 | Intelligent search method of search engine |
US7028027B1 (en) * | 2002-09-17 | 2006-04-11 | Yahoo! Inc. | Associating documents with classifications and ranking documents based on classification weights |
US7693827B2 (en) * | 2003-09-30 | 2010-04-06 | Google Inc. | Personalization of placed content ordering in search results |
US20050256848A1 (en) * | 2004-05-13 | 2005-11-17 | International Business Machines Corporation | System and method for user rank search |
- 2007-06-25: CN2007100761642A, patent CN101079064B (active)
- 2008-03-27: PCT/CN2008/070608, patent WO2009000174A1 (application filing)
Also Published As
Publication number | Publication date |
---|---|
CN101079064A (en) | 2007-11-28 |
WO2009000174A1 (en) | 2008-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101079064B (en) | Web page sequencing method and device | |
US20200311155A1 (en) | Systems for and methods of finding relevant documents by analyzing tags | |
CN101908071B (en) | Method and device thereof for improving search efficiency of search engine | |
CN101154224B (en) | Websites navigation method and system thereof | |
TWI391834B (en) | Systems for and methods of finding relevant documents by analyzing tags | |
CN101295319B (en) | Method and device for expanding query, search engine system | |
US8832058B1 (en) | Systems and methods for syndicating and hosting customized news content | |
CN111708740A (en) | Mass search query log calculation analysis system based on cloud platform | |
US20050065959A1 (en) | Systems and methods for clustering search results | |
US20010047353A1 (en) | Methods and systems for enabling efficient search and retrieval of records from a collection of biological data | |
US20070250501A1 (en) | Search result delivery engine | |
US20070162448A1 (en) | Adaptive hierarchy structure ranking algorithm | |
CN102004782A (en) | Search result sequencing method and search result sequencer | |
CN101641697A (en) | Related search queries for a webpage and their applications | |
CN103577489A (en) | Method and device of searching web browsing history | |
CN102722498A (en) | Search engine and implementation method thereof | |
CN102214183A (en) | Search engine query method for combining feedback contents of pages with fixed ranking | |
KR20040017008A (en) | System and method for offering information using a search engine | |
KR100671077B1 (en) | Server, Method and System for Providing Information Search Service by Using Sheaf of Pages | |
Lei et al. | Improved relevance ranking in WebGather | |
JP4094844B2 (en) | Document collection apparatus for specific use, method thereof, and program for causing computer to execute | |
CN1838123A (en) | Information search method and system based on fixed keyword | |
Surendiran | Similarity Matrix Approach in Web Clustering | |
Choudhary et al. | Various link algorithms in web mining | |
CN117370485A (en) | Method and system for building index library and retrieval system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C41 | Transfer of patent application or patent right or utility model | ||
TR01 | Transfer of patent right |
Effective date of registration: 20160106 Address after: The South Road in Guangdong province Shenzhen city Fiyta building 518057 floor 5-10 Nanshan District high tech Zone Patentee after: Shenzhen Tencent Computer System Co., Ltd. Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403 Patentee before: Tencent Technology (Shenzhen) Co., Ltd. |