CN101079064B - Web page sequencing method and device - Google Patents


Info

Publication number
CN101079064B
Authority
CN
China
Prior art keywords
webpage
user
classification
web page
expert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007100761642A
Other languages
Chinese (zh)
Other versions
CN101079064A (en)
Inventor
刘致远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN2007100761642A
Publication of CN101079064A
Priority to PCT/CN2008/070608 (WO2009000174A1)
Application granted
Publication of CN101079064B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Abstract

The invention provides a web page ranking method and device in the field of computer applications. The method comprises the following steps: storing the web page category vector established by the user; classifying the IP log of the user's visits and determining the user's expert category as the web page category the user visits most; when the user clicks a web page in the search engine's results, adding points to the value in that page's web page category vector according to the determined expert category; and, when the user enters a search term into the search engine to retrieve information, classifying the search term to obtain its web page category and re-ranking the retrieved web pages according to the value for that category in each page's web page category vector. The invention solves the prior-art problem that awarding points to a web page directly by the number of user clicks invites malicious clicking and indiscriminate bonus points.

Description

A web page sequencing (ranking) method and device
Technical field
The invention belongs to the field of computer applications, and in particular relates to a web page ranking method and device.
Background
Search engines are a fiercely competitive field today. Beyond the richness of its content, a search engine also competes on user experience. In general, the problem search engines now face is not too little information but too much: a search for a single keyword often returns tens of millions of results.
In practice, users of a search engine want the first page, or even the first five uniform resource locators (URLs), to contain the information they are looking for, so ranking has become a key factor in search engine quality. The well-known search engine Google became a world-class search engine in a short time precisely because the PageRank technique it invented solves the ranking problem effectively.
Today, however, every Internet company understands the PageRank technique and most have adopted it. In practice, the ranking produced by any sizeable search engine comes not from a single algorithm but from the combined effect of dozens or even hundreds of factors. Commonly used algorithms include not only PageRank but also the HITS algorithm (a search algorithm based on hyperlinks) and the Hilltop algorithm (a search engine ranking algorithm suited to broad categories), because a single algorithm is easy to reverse-engineer and game. The many algorithms serve one purpose: to bring the ranking closer to what users want.
In the prior art, when the results of a user's search are ranked, points are awarded to a web page directly according to the number of user clicks. Because this method does not distinguish between users, it invites malicious clicking, and the resulting bonus points are awarded indiscriminately.
Summary of the invention
One purpose of the embodiment of the invention is to provide a web page ranking method that solves the prior-art problem that awarding points to a web page directly by the number of user clicks invites malicious clicking and indiscriminate bonus points.
The embodiment of the invention is realized as a web page ranking method comprising the following steps:
storing the web page category vector established by the user;
classifying the IP log of the user's visits, and determining the web page category the user visits most as the user's expert category;
when the user clicks a web page in the search engine's results, adding points, according to the determined expert category, to the value of the category in that page's web page category vector that matches the user's expert category, the web page category vector being the weights of the page in each web page category;
when the user enters a search term into the search engine to retrieve information, classifying the search term to obtain its category, and adjusting the ranking of the result pages according to the size of the value for that category in each result page's web page category vector, pages with larger values being moved forward.
Another purpose of the embodiment of the invention is to provide a web page ranking device, the device comprising:
a web page category vector storage module, for storing the web page category vector established by the user;
a user expert category determination module, for classifying the IP log of the user's visits and determining the web page category the user visits most as the user's expert category;
a web page category vector bonus-point module, for adding points, when the user clicks a web page in the search engine's results and according to the determined expert category, to the value of the category in that page's web page category vector that matches the user's expert category, the web page category vector being the weights of the page in each web page category;
a web page optimized ranking module, for classifying the search term when the user enters a search term into the search engine to retrieve information, obtaining the category of the search term, and adjusting the ranking of the result pages according to the size of the value for that category in each result page's web page category vector, pages with larger values being moved forward.
The embodiment of the invention divides users into expert categories according to the IP log of their visits, adds points to the value in a page's web page category vector when a user clicks that page, and ranks a user's search results according to the web page category vectors when the user searches for information. This solves the prior-art problem that awarding points to a web page directly by the number of user clicks invites malicious clicking and indiscriminate bonus points.
Description of drawings
Fig. 1 is a flowchart of the web page ranking method provided by the embodiment of the invention;
Fig. 2 is a block diagram of a typical search engine;
Fig. 3 is a structural diagram of the web page ranking device provided by the embodiment of the invention.
Embodiment
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
In the embodiment of the invention, users are divided into expert categories according to the Internet Protocol (IP) log of their visits; when a user clicks a web page, points are added to the value in that page's web page category vector; and when a user searches for information, the search results are ranked according to the web page category vectors.
Fig. 1 shows the flow of the web page ranking method provided by the embodiment of the invention, detailed as follows.
In step S101, the web page category vector established by the user is stored.
Here, a vector is a one-dimensional array that can hold one item's score for every element of a set. The embodiment assigns a vector to each web page to hold the page's value for each category in the category set. For example, if the category set is {"sports", "news"}, the page's vector holds the page's score for "sports" and its score for "news", and both scores can be read by accessing the vector. In practice the category set has on the order of a hundred or more entries, so the web page vector holds each page's score for each of those categories.
The web page category vector of every web page is an n-dimensional vector, where the dimension n equals the number of categories in the web page category set A. The vector gives the weight of the page in each category, that is, the proportion of the page attributable to each category. Because a web page does not necessarily belong to a single category, a vector can express how much weight the page carries in each one. In the prior art, most websites establish a category set A from the content of current Internet web pages, for example history, military, tourism, humanities, and automobiles.
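As an illustration of step S101, the following is a minimal Python sketch of a per-page category vector. It is not taken from the patent: the names `CATEGORIES`, `PageCategoryVector`, and `page_vectors`, the choice of a dictionary keyed by URL ID as storage, and the example weights are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical category set A, using the example categories named in the text.
CATEGORIES = ["history", "military", "tourism", "humanities", "automobile"]


@dataclass
class PageCategoryVector:
    """An n-dimensional vector holding one weight per category in CATEGORIES."""
    weights: list = field(default_factory=lambda: [0.0] * len(CATEGORIES))

    def get(self, category: str) -> float:
        # Read the page's weight for one category.
        return self.weights[CATEGORIES.index(category)]

    def add(self, category: str, points: float) -> None:
        # Increase the page's weight for one category.
        self.weights[CATEGORIES.index(category)] += points


# Assumed storage: page identifier (URL ID) -> category vector.
page_vectors = {}
page_vectors[1001] = PageCategoryVector()
page_vectors[1001].add("tourism", 0.7)
page_vectors[1001].add("history", 0.3)
```

A page that belongs partly to several categories simply carries non-zero weights in more than one position, which is the property the text relies on.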
In step S102, the IP log of the user's visits is classified, and the user's expert category is determined from the category the user visits most.
The IP log of a user's visits is obtained as follows. A typical search engine, shown in Fig. 2, comprises a crawler, an indexer, a searcher, and so on. The crawler's main work is to assign uniform resource locator identifiers (URL IDs) to web pages and to download the pages. The crawler assigns a unique identifier to each Internet page so that pages can be distinguished by their URL IDs; the structure corresponding to a URL ID contains the page's text content, the page's additional attributes, and so on.
The crawler downloads web pages from the Internet, assigns each a unique URL ID, and stores them in the raw database. The indexer reads page information from the raw database, builds an index, and stores it in the index database.
When a user enters a query to retrieve information, the searcher accepts the user's input, fetches records from the index database, ranks them, and returns them to the user, while recording the user's operations in the user behavior log.
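The data flow described above can be sketched roughly as follows. This is only an illustration under assumptions: the `Crawler` class, the in-memory `raw_db` dictionary, the `behavior_log` list, and the log record fields are hypothetical stand-ins for the crawler, raw database, and user behavior log that the text mentions without specifying.

```python
import itertools


class Crawler:
    """Assigns a unique URL ID to each page and stores it in a raw database."""

    def __init__(self):
        self._next_id = itertools.count(1)  # source of unique URL IDs
        self.url_ids = {}                   # URL -> URL ID
        self.raw_db = {}                    # URL ID -> page record

    def store(self, url, text):
        # Reuse the existing ID if the URL was seen before.
        if url not in self.url_ids:
            self.url_ids[url] = next(self._next_id)
        url_id = self.url_ids[url]
        self.raw_db[url_id] = {"url": url, "text": text}
        return url_id


# Assumed shape of the user behavior log: one record per click.
behavior_log = []


def log_click(user_ip, query, url_id):
    """Record which result a user (identified by IP) clicked for a query."""
    behavior_log.append({"ip": user_ip, "query": query, "url_id": url_id})
```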
The algorithm used to determine a user's expert category is as follows.
Define an expert array UserType[], where UserType[i] denotes the expert category of the i-th user.
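The listing for this algorithm is not reproduced in this text beyond the definition of UserType[]. One possible reading of step S102, offered only as a hedged sketch, counts each user's visits per category and takes the most visited category. The function name `determine_expert_categories`, the `classify_page` callback, and the use of an IP string as the user key are assumptions; the returned dictionary plays the role of UserType[].

```python
from collections import Counter


def determine_expert_categories(access_log, classify_page):
    """Return {user: most-visited category}.

    access_log: iterable of (user_id, url_id) visit records (assumed format).
    classify_page: function mapping a url_id to a category name (assumed).
    """
    counts = {}
    for user_id, url_id in access_log:
        category = classify_page(url_id)
        counts.setdefault(user_id, Counter())[category] += 1
    # most_common(1) yields the category with the highest visit count.
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


# Hypothetical example data.
page_category = {1: "computer", 2: "computer", 3: "sports"}
log = [("10.0.0.1", 1), ("10.0.0.1", 2), ("10.0.0.1", 3), ("10.0.0.2", 3)]
user_type = determine_expert_categories(log, page_category.get)
```

In this example the first user visits two "computer" pages and one "sports" page, so "computer" becomes that user's expert category.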
For example, when a user enters the query "T43", the search engine classifies the query string and obtains the category "computer". When the search engine ranks the retrieved results, it takes the web page category vectors into account and places pages with a larger weight for "computer" nearer the front.
In step S103, when the user clicks a web page in the search engine's results, points are added to the value in that page's web page category vector according to the user's determined expert category.
For example, after searching with the search engine, a user selects and clicks a web page. If the user is an expert in a category of the web page category vector, points are added to the page's weight for that category in the corresponding vector. That is, for a page the user clicks, the value in the page's web page category vector is increased according to the user's expert category.
In a specific implementation, when points are added to the value in the clicked page's web page category vector according to the user's expert category, the algorithm used is as follows.
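The listing for this step does not appear in this text. As an illustration only, the update in step S103 might look like the sketch below, which follows the wording of the claims: the clicked page gains points in the component that matches the clicking user's expert category. The function name `award_click`, the fixed `bonus` of 1.0, and the use of a dictionary keyed by category in place of the n-dimensional vector are all assumptions.

```python
def award_click(page_vectors, user_type, user_id, url_id, bonus=1.0):
    """Add `bonus` to the clicked page's weight in the user's expert category.

    page_vectors: {url_id: {category: weight}} (dict stands in for the vector).
    user_type: {user_id: expert category}, the role of UserType[] in the text.
    """
    category = user_type.get(user_id)
    if category is None:
        return  # no expert category is known for this user, so no points
    vector = page_vectors.setdefault(url_id, {})
    vector[category] = vector.get(category, 0.0) + bonus


# Hypothetical example: a "computer" expert clicks page 7 twice.
page_vectors = {}
user_type = {"10.0.0.1": "computer"}
award_click(page_vectors, user_type, "10.0.0.1", 7)
award_click(page_vectors, user_type, "10.0.0.1", 7)
award_click(page_vectors, user_type, "10.0.0.9", 7)  # unknown user: ignored
```

Because a click only raises the weight in the clicker's own expert category, repeated clicks by a user cannot raise a page's weight in unrelated categories, which is how the scheme limits the effect of malicious clicking.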
In step S104, when the user searches with the search engine, the user's search results are re-ranked with reference to the scores in the web page category vectors.
The algorithm used in this step is as follows.
IF (the user searches for the term "KKK")
{
    Classify "KKK" to obtain its category a.
    The search engine calls the searcher to obtain the retrieval results.
    Pre-sort the retrieval results; in this embodiment of the invention, the results are sorted with the PageRank technique.
    FOR (each retrieved web page c)
    {
        Query the web page category vector of page c and read the page's expert recommendation value U_a for category a.
        Adjust the position of page c in the ranking according to the size of U_a; pages with larger U_a are moved forward.
    }
    Return the ranked set of web pages and display the ranked results.
}
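The re-ranking step can be sketched in Python as follows. This is one simple interpretation, not the patent's definitive procedure: it sorts strictly by the expert recommendation value U_a and falls back on the pre-sorted order for ties, whereas the text only says that pages with larger U_a are moved forward. The function name `rerank` and the dictionary-based vectors are assumptions.

```python
def rerank(results, page_vectors, query_category):
    """Re-rank pre-sorted results by their weight for the query's category.

    results: url_ids already pre-sorted (for example by PageRank).
    page_vectors: {url_id: {category: weight}} (dict stands in for the vector).
    query_category: category a obtained by classifying the search term.
    """
    def expert_score(url_id):
        # U_a: the page's expert recommendation value for category a.
        return page_vectors.get(url_id, {}).get(query_category, 0.0)

    # sorted() is stable, so pages with equal U_a keep their pre-sorted order.
    return sorted(results, key=expert_score, reverse=True)


# Hypothetical example: page 2 has the largest "computer" weight.
vectors = {1: {"computer": 0.0}, 2: {"computer": 5.0}, 3: {"sports": 9.0}}
ranked = rerank([1, 2, 3], vectors, "computer")
```

A production ranker would more likely blend U_a with the pre-sort score rather than let it dominate; the strict sort here is chosen only to keep the sketch short.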
Fig. 3 shows the structure of the web page ranking device provided by the embodiment of the invention.
The web page category vector storage module 11 stores the web page category vectors established by the user, where each vector identifies the weight of its corresponding web page in each category of the web page category set.
The user expert category determination module 12 classifies the IP log of the user's visits and determines the user's expert category from the category the user visits most. When the user clicks a web page in the search engine's results, the web page category vector bonus-point module 13 adds points to the value in that page's web page category vector according to the expert category determined by module 12. The detailed process is described above and is not repeated here.
When the user enters a search term into the search engine to retrieve information, the web page optimized ranking module 14 re-ranks the retrieved pages with reference to their web page category vectors, and the web page display module 15 displays the re-ranked pages.
The embodiment of the invention divides users into expert categories according to the IP log of their visits, adds points to the value in a page's web page category vector when a user clicks that page, and ranks a user's search results according to the web page category vectors when the user searches for information. This solves the prior-art problem that awarding points to a web page directly by the number of user clicks invites malicious clicking and indiscriminate bonus points.
The above is only a preferred embodiment of the invention and does not limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (4)

1. A web page ranking method, characterized in that the method comprises the following steps:
storing the web page category vector established by the user;
classifying the IP log of the user's visits, and determining the web page category the user visits most as that user's expert category;
when the user clicks a web page in the search engine's results, adding points, according to the determined expert category, to the value of the category in that page's web page category vector that matches the user's expert category, the web page category vector being the weights of the page in each web page category;
when the user enters a search term into the search engine to retrieve information, classifying the search term to obtain its category, and adjusting the ranking of the result pages according to the size of the value for that category in each result page's web page category vector, pages with larger values being moved forward.
2. The web page ranking method of claim 1, characterized in that the method further comprises:
displaying the re-ranked web pages.
3. A web page ranking device, characterized in that the device comprises:
a web page category vector storage module, for storing the web page category vector established by the user;
a user expert category determination module, for classifying the IP log of the user's visits and determining the web page category the user visits most as that user's expert category;
a web page category vector bonus-point module, for adding points, when the user clicks a web page in the search engine's results and according to the determined expert category, to the value of the category in that page's web page category vector that matches the user's expert category, the web page category vector being the weights of the page in each web page category;
a web page optimized ranking module, for classifying the search term when the user enters a search term into the search engine to retrieve information, obtaining the category of the search term, and adjusting the ranking of the result pages according to the size of the value for that category in each result page's web page category vector, pages with larger values being moved forward.
4. The web page ranking device of claim 3, characterized in that the device further comprises a web page display module for displaying the re-ranked web pages.
CN2007100761642A 2007-06-25 2007-06-25 Web page sequencing method and device Active CN101079064B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2007100761642A CN101079064B (en) 2007-06-25 2007-06-25 Web page sequencing method and device
PCT/CN2008/070608 WO2009000174A1 (en) 2007-06-25 2008-03-27 Method and device of web page rank

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007100761642A CN101079064B (en) 2007-06-25 2007-06-25 Web page sequencing method and device

Publications (2)

Publication Number Publication Date
CN101079064A CN101079064A (en) 2007-11-28
CN101079064B true CN101079064B (en) 2011-11-30

Family

ID=38906543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007100761642A Active CN101079064B (en) 2007-06-25 2007-06-25 Web page sequencing method and device

Country Status (2)

Country Link
CN (1) CN101079064B (en)
WO (1) WO2009000174A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108182186A (en) * 2016-12-08 2018-06-19 广东精点数据科技股份有限公司 A kind of Web page sequencing method based on random forests algorithm

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515360A (en) 2009-04-13 2009-08-26 阿里巴巴集团控股有限公司 Method and server for recommending network object information to user
CN101840420B (en) * 2010-04-02 2011-12-28 清华大学 Search aid system, search aid method and program
CN101996240A (en) * 2010-10-13 2011-03-30 蔡亮华 Method and device for providing information
CN102542474B (en) 2010-12-07 2015-10-21 阿里巴巴集团控股有限公司 Result ranking method and device
CN102541857A (en) * 2010-12-08 2012-07-04 腾讯科技(深圳)有限公司 Webpage sorting method and device
CN102722503A (en) * 2011-03-31 2012-10-10 北京百度网讯科技有限公司 Method and device for sequencing search results
CN102231152B (en) * 2011-05-25 2014-09-03 北京捷讯华泰科技有限公司 Searching method for precisely inquiring based on IP (Internet Protocol) address of mobile terminal
CN102956009B (en) 2011-08-16 2017-03-01 阿里巴巴集团控股有限公司 A kind of electronic commerce information based on user behavior recommends method and apparatus
CN103164804B (en) 2011-12-16 2016-11-23 阿里巴巴集团控股有限公司 The information-pushing method of a kind of personalization and device
CN109344321B (en) * 2012-05-08 2021-11-02 潍坊久宝智能科技有限公司 System for obtaining user personalized features
TWI465948B (en) * 2012-05-25 2014-12-21 Gemtek Technology Co Ltd Method for dlna pre-browsing and customizing browsing result and digital media device using the same
CN102722545B (en) * 2012-05-25 2015-11-25 百度在线网络技术(北京)有限公司 A kind of method, device and equipment for sorting to releasing news
CN103399861B (en) * 2013-07-04 2017-03-08 百度在线网络技术(北京)有限公司 A kind of network address in Web side navigation recommends methods, devices and systems
CN104636366B (en) * 2013-11-11 2020-06-02 腾讯科技(深圳)有限公司 Method and device for acquiring search result queue
CN105224657B (en) * 2015-09-30 2018-10-12 北京奇虎科技有限公司 A kind of information recommendation method and electronic equipment based on search engine
CN107153656B (en) * 2016-03-03 2020-12-01 阿里巴巴集团控股有限公司 Information searching method and device
CN105763633B (en) * 2016-04-14 2019-05-21 上海牙木通讯技术有限公司 A kind of correlating method of domain name and website visiting behavior
CN107870941B (en) * 2016-09-27 2021-11-02 北京搜狗科技发展有限公司 Webpage sorting method, device and equipment
CN106777201B (en) * 2016-12-23 2021-01-08 北京奇元科技有限公司 Method and device for sorting recommended data on search result page

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6947924B2 (en) * 2002-01-07 2005-09-20 International Business Machines Corporation Group based search engine generating search results ranking based on at least one nomination previously made by member of the user group where nomination system is independent from visitation system
CN1389811A (en) * 2002-02-06 2003-01-08 北京造极人工智能技术有限公司 Intelligent search method of search engine
US7028027B1 (en) * 2002-09-17 2006-04-11 Yahoo! Inc. Associating documents with classifications and ranking documents based on classification weights
US7693827B2 (en) * 2003-09-30 2010-04-06 Google Inc. Personalization of placed content ordering in search results
US20050256848A1 (en) * 2004-05-13 2005-11-17 International Business Machines Corporation System and method for user rank search

Also Published As

Publication number Publication date
CN101079064A (en) 2007-11-28
WO2009000174A1 (en) 2008-12-31

Similar Documents

Publication Publication Date Title
CN101079064B (en) Web page sequencing method and device
US20200311155A1 (en) Systems for and methods of finding relevant documents by analyzing tags
CN101908071B (en) Method and device thereof for improving search efficiency of search engine
CN101154224B (en) Websites navigation method and system thereof
TWI391834B (en) Systems for and methods of finding relevant documents by analyzing tags
CN101295319B (en) Method and device for expanding query, search engine system
US8832058B1 (en) Systems and methods for syndicating and hosting customized news content
CN111708740A (en) Mass search query log calculation analysis system based on cloud platform
US20050065959A1 (en) Systems and methods for clustering search results
US20010047353A1 (en) Methods and systems for enabling efficient search and retrieval of records from a collection of biological data
US20070250501A1 (en) Search result delivery engine
US20070162448A1 (en) Adaptive hierarchy structure ranking algorithm
CN102004782A (en) Search result sequencing method and search result sequencer
CN101641697A (en) Related search queries for a webpage and their applications
CN103577489A (en) Method and device of searching web browsing history
CN102722498A (en) Search engine and implementation method thereof
CN102214183A (en) Search engine query method for combining feedback contents of pages with fixed ranking
KR20040017008A (en) System and method for offering information using a search engine
KR100671077B1 (en) Server, Method and System for Providing Information Search Service by Using Sheaf of Pages
Lei et al. Improved relevance ranking in WebGather
JP4094844B2 (en) Document collection apparatus for specific use, method thereof, and program for causing computer to execute
CN1838123A (en) Information search method and system based on fixed keyword
Surendiran Similarity Matrix Approach in Web Clustering
Choudhary et al. Various link algorithms in web mining
CN117370485A (en) Method and system for building index library and retrieval system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160106

Address after: Floors 5-10, Fiyta Building, South Road, High-tech Zone, Nanshan District, Shenzhen, Guangdong Province 518057

Patentee after: Shenzhen Tencent Computer System Co., Ltd.

Address before: Room 403, East Block 2, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518044

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.