WO2009000174A1 - Web page sorting method and apparatus - Google Patents

Web page sorting method and apparatus

Info

Publication number
WO2009000174A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
category
user
web page
vector
Prior art date
Application number
PCT/CN2008/070608
Other languages
English (en)
Chinese (zh)
Inventor
Zhiyuan Liu
Original Assignee
Tencent Technology (Shenzhen) Company Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited
Publication of WO2009000174A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Definitions

  • The present invention relates to the field of computer applications, and in particular, to a web page sorting method and apparatus.
  • Background of the invention
  • Search engines are currently an area of fierce competition.
  • Search engines compete not only on rich content but also on user experience.
  • The problem search engines face today is not too little information but too much: searching for a keyword often returns thousands of results.
  • The embodiments of the present invention are implemented as the following web page sorting method.
  • The method includes: each webpage corresponds to at least one content-related webpage category, and each webpage corresponds to one webpage category vector; the webpage category vector includes at least one element, and the at least one element respectively represents the weight of each of the at least one webpage category corresponding to the webpage;
  • when a first user clicks on a webpage, the element corresponding to the webpage category that the first user uses most is determined in the webpage category vector of the clicked webpage, and the value of the determined element is increased;
  • when a second user searches for webpages, the webpage category corresponding to the search content is determined, and the at least one webpage found is sorted according to the value of the element corresponding to the determined webpage category in each webpage category vector.
  • A webpage sorting apparatus, comprising: a first module, configured to determine, according to the webpages visited by a user, the webpage category that the user uses most, and to determine the webpage category vector corresponding to each webpage; wherein each webpage corresponds to at least one content-related webpage category, each webpage corresponds to one webpage category vector, the webpage category vector includes at least one element, and the at least one element respectively represents the weight of each of the at least one webpage category corresponding to the webpage;
  • a second module, configured to: when a first user clicks on a webpage, determine, in the webpage category vector determined by the first module for the clicked webpage, the element corresponding to the webpage category that the first user uses most, and increase the value of the determined element;
  • a third module, configured to: when a second user searches for webpages, determine the webpage category corresponding to the search content, and sort the at least one webpage found according to the value, obtained from the second module, of the element corresponding to the determined webpage category in each webpage category vector.
  • FIG. 1 is a flowchart of a web page sorting method according to an embodiment of the present invention;
  • FIG. 2 is a structural diagram of a search engine in an embodiment of the present invention;
  • FIG. 3 is a schematic structural diagram of a webpage sorting apparatus according to an embodiment of the present invention.
  • Mode for carrying out the invention
  • The embodiment of the present invention determines the user's expert category according to the Internet Protocol (IP) log of the user's accesses, adds a score to the web page category vector of each web page the user clicks, and, when a user retrieves information, sorts that user's search results according to the web page category vectors.
  • Embodiments of the present invention provide a method for sorting web pages.
  • Each web page corresponds to at least one content-related web page category, and each web page corresponds to one web page category vector.
  • The web page category vector includes at least one element, and these elements respectively represent the weight of each of the at least one web page category corresponding to the web page.
  • The web page category vector is an n-dimensional vector, where n is equal to the number of web page categories. It can be implemented as an array containing n elements.
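The array representation described above can be sketched as follows. This is a minimal illustration only: the category names, `CATEGORIES`, `CATEGORY_INDEX`, `new_category_vector` and `weight_of` are hypothetical and do not come from the patent, which only states that the array has n elements, one per web page category.

```python
# Hypothetical category set; the patent only requires that n equals
# the number of web page categories.
CATEGORIES = ["sports", "news", "computer"]

# Map each category name to its position in the n-element array.
CATEGORY_INDEX = {name: i for i, name in enumerate(CATEGORIES)}


def new_category_vector():
    """Return an n-element array of weights, one per web page category."""
    return [0] * len(CATEGORIES)


def weight_of(vector, category):
    """Read the weight a web page holds for one category."""
    return vector[CATEGORY_INDEX[category]]


# Example: a page whose "news" weight has been set to 3.
page_vector = new_category_vector()
page_vector[CATEGORY_INDEX["news"]] = 3
```

A plain list keeps the element for a category at a fixed index, so the weight for any category can be read in constant time.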
  • The web page category that the user uses most, that is, the user's expert category, is determined based on the web pages the user has visited.
  • The web page category that the user uses most can be determined from the user's behavior. For example, the IP log of the user's accesses is classified, and the most-used category is determined from the IP category the user accesses most; alternatively, the search terms entered by the user are classified, and the most-used category is determined from the category to which the user's most frequently used terms belong.
  • There are other ways of determining the user's most-used web page category from the user's behavior, all of which are well known to those skilled in the art and are not listed here.
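Once each logged access or search term has been assigned a category, picking the most-used category reduces to a frequency count. The sketch below assumes the classification has already been done by some other component; `most_used_category` and the sample data are hypothetical names and values used for illustration only.

```python
from collections import Counter


def most_used_category(categories_of_actions):
    """Return the category that occurs most often in a user's history.

    `categories_of_actions` is an iterable of category names, one per
    classified log entry or search term. Returns None for an empty history.
    """
    counts = Counter(categories_of_actions)
    if not counts:
        return None
    # most_common(1) yields a single (category, count) pair.
    return counts.most_common(1)[0][0]


# Example: categories obtained by classifying one user's access log.
visited = ["computer", "sports", "computer", "news", "computer"]
expert_category = most_used_category(visited)
```

How ties between equally frequent categories should be resolved is not specified in the text; this sketch simply returns whichever `Counter.most_common` lists first.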
  • When a first user clicks on a web page, the element corresponding to the web page category that the first user uses most is determined in the web page category vector of the clicked page, and the value of that element is increased.
  • The value may specifically be increased by adding 1 to the element. This action is repeated when another user clicks on the web page.
  • When a second user searches for web pages, the web page category corresponding to the search content entered by the second user is first determined, and the at least one web page found is sorted according to the value of the element corresponding to the determined web page category in each web page category vector.
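The click-time update can be sketched as below, assuming the category vectors are kept in a dictionary keyed by a page identifier and that the clicking user's most-used category is already known. The function name `record_click`, its parameters, and the handling of users with no determined category are assumptions made for illustration.

```python
def record_click(page_vectors, url_id, user_expert_category, category_index):
    """Add 1 to the clicked page's element for the clicking user's category.

    page_vectors: dict mapping a page identifier to its list of weights.
    category_index: dict mapping a category name to its array position.
    Users with no determined category leave the vector unchanged
    (an assumption; the text does not cover this case).
    """
    if user_expert_category is None:
        return
    # Create an all-zero vector the first time a page is clicked.
    vector = page_vectors.setdefault(url_id, [0] * len(category_index))
    vector[category_index[user_expert_category]] += 1
```

Because only the element matching the user's own most-used category is incremented, a click from a user whose history is mostly "sports" does not raise a page's "computer" weight.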
  • FIG. 1 shows the flow of the web page sorting method provided by an embodiment of the present invention, which is described in detail below.
  • In step S101, the web page category vector established by the user is stored.
  • A vector is a one-dimensional matrix that can hold the score of an item for every element of a given set.
  • Here the vector holds the web page's score for each category in the category set. For example, if the category set is {"sports", "news"}, the web page vector holds the page's score for "sports" and its score for "news", and both scores can be read by accessing the vector.
  • The category set contains on the order of hundreds of categories, so the web page vector holds a score for each of those hundreds of categories.
  • An n-dimensional vector is used for every web page and is called the web page category vector.
  • The dimension n of the vector is equal to the number of categories in the web page category set.
  • The vector expresses the weight of the web page in each category, that is, the proportion of the page that falls in each category. Because a web page does not necessarily belong to a single category, a vector is used to indicate the page's weight in every category: each category's weight is represented by one element, and the array of these elements constitutes the vector.
  • Most websites are able to establish a category set A according to the content of current Internet web pages, such as history, military, tourism, humanities, automobiles, and the like.
  • In step S102, the IP logs of the user's accesses are classified, and the expert category of the user is determined according to the IP category that the user accesses most.
  • The process of obtaining the IP log accessed by the user is described as follows.
  • The typical search engine structure shown in FIG. 2 includes a crawler, an indexer, a retriever, and so on. The crawler's main work is to assign a uniform resource locator identifier (URLID) to each web page and to download the web page.
  • The crawler assigns a unique URLID to each Internet web page to distinguish different web pages. The URLID corresponds to a structure that includes the text content of the web page, additional properties of the web page, and so on.
  • The crawler downloads the web page from the Internet, assigns it a unique URLID, and stores it in the original database.
  • The indexer reads the web page information from the original database, indexes it, and stores it in the index database.
  • The retriever receives the user's input, obtains records from the index database, sorts them and returns them to the user, and records the user's operations in the user behavior log.
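The crawler's bookkeeping described above, one unique URLID per page pointing to a structure with the page text and additional properties, might look like the following sketch. `PageRecord`, `OriginalDatabase` and `store` are hypothetical names, the download step is left out (the page text is passed in), and an in-memory dictionary stands in for the original database.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class PageRecord:
    """Structure a URLID points to: page text plus additional properties."""
    url: str
    text: str
    attributes: dict = field(default_factory=dict)


class OriginalDatabase:
    """In-memory stand-in for the crawler's original database."""

    def __init__(self):
        self._next_id = count(1)   # source of fresh URLIDs
        self._id_by_url = {}       # URL -> URLID, keeps identifiers unique
        self.records = {}          # URLID -> PageRecord

    def store(self, url, text, **attributes):
        """Assign a URLID to a new URL (reusing it for a known one) and save the page."""
        url_id = self._id_by_url.get(url)
        if url_id is None:
            url_id = next(self._next_id)
            self._id_by_url[url] = url_id
        self.records[url_id] = PageRecord(url, text, attributes)
        return url_id
```

An indexer in this sketch would iterate over `records` to build the index database, and the category vectors could be keyed by the same URLID.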
  • The algorithm used is as follows.
  • The user's expert category represents the web page category that the user uses most.
  • For example, the user enters the search string "T43", and the search engine classifies the search string to obtain the category "computer".
  • When the search engine sorts the search results, the web page category vector is taken into account, and pages with larger weights for "computer" are ranked first.
  • In step S103, when the user clicks on a web page in the search engine's results, a score is added to the web page category vector of that page according to the determined expert category of the user.
  • That is, when a user clicks on a web page and the user is an expert in one of the categories of the page's category vector, the page's weight for that category is increased: a value is added to the element of the vector that corresponds to the user's expert category.
  • In step S104, when the user searches through the search engine, the search results are optimally sorted by referring to the scores in the web page category vectors.
  • The algorithm used in this step is as follows.
  • The search results are pre-sorted; in one embodiment of the present invention, the pre-sorting uses the PageRank technique.
  • For each search result page c, the web page category vector corresponding to page c is queried, and the page's expert recommendation value for category a is read.
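The text does not spell out how the expert recommendation value is combined with the pre-sorted order, so the sketch below makes one simple assumption: results are ordered by their value for the query category a, and pages with equal values keep their pre-sorted (for example PageRank) order. The name `rerank` and its parameters are hypothetical.

```python
def rerank(pre_sorted_url_ids, page_vectors, category_index, query_category):
    """Re-order pre-sorted results by expert recommendation value.

    pre_sorted_url_ids: result identifiers, already ordered by the pre-sort.
    page_vectors: dict mapping an identifier to its category weight list.
    query_category: category a obtained by classifying the search content.
    """
    a = category_index[query_category]

    def expert_value(url_id):
        # Pages without a stored vector are treated as having weight 0.
        vector = page_vectors.get(url_id)
        return vector[a] if vector is not None else 0

    # sorted() is stable, so ties preserve the pre-sorted order.
    return sorted(pre_sorted_url_ids, key=expert_value, reverse=True)
```

A production ranker would more likely blend the two signals into a single score; the stable sort is used here only because it keeps the example short and deterministic.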
  • FIG. 3 shows the structure of a web page sorting apparatus provided by an embodiment of the present invention.
  • The web page category vector storage module 11 stores the web page category vectors established by the user, where each vector identifies the weights of its web page across the web page category set.
  • Each webpage corresponds to at least one content-related webpage category, and each webpage corresponds to one webpage category vector; the vector includes at least one element, and these elements respectively represent the weight of each of the at least one webpage category corresponding to the webpage.
  • The user expert category determining module 12 classifies the IP logs of the user's accesses, and determines the expert category of the user according to the IP category that the user accesses most.
  • The webpage category vector adding module 13 adds a score to the webpage category vector of the clicked webpage according to the expert category of the user determined by the user expert category determining module 12. The specific process has been described above and is not repeated here.
  • the webpage optimization ranking module When the user enters an index through the search engine for information retrieval, the webpage optimization ranking module
  • The web page display module 15 displays the optimally sorted web pages.
  • The embodiment of the present invention determines the user's expert category according to the IP log of the user's accesses, adds a score to the web page category vector of each web page the user clicks, and, when a user retrieves information, sorts the search results according to the web page category vectors. This solves the prior-art problem that adding points to web pages directly according to users' click counts leads to malicious clicks and blind addition of points.

Abstract

The present invention relates to a method and device for sorting web pages, applicable to the field of computer applications. According to the method: each web page corresponds to at least one content-related web page category, and each web page corresponds to a web page category vector containing at least one element representing the respective weight of the web page category or categories to which the web page corresponds; when a first user clicks on a web page, the element of the clicked page's vector corresponding to the web page category that the first user uses most is determined, and the value of the determined element is increased; when a second user searches for web pages, the web page categories to which the search content corresponds are determined, and the page or pages obtained are sorted on the basis of the value of the element of each web page vector corresponding to the determined web page category. This solves the problems of malicious user clicks and blind addition of points to web pages, which arise in the prior art from adding points to web pages directly according to users' click counts.
PCT/CN2008/070608 2007-06-25 2008-03-27 Web page sorting method and apparatus WO2009000174A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2007100761642A CN101079064B (zh) 2007-06-25 2007-06-25 一种网页排序方法及装置
CN200710076164.2 2007-06-25

Publications (1)

Publication Number Publication Date
WO2009000174A1 (fr) 2008-12-31

Family

ID=38906543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/070608 WO2009000174A1 (fr) 2007-06-25 2008-03-27 Web page sorting method and apparatus

Country Status (2)

Country Link
CN (1) CN101079064B (fr)
WO (1) WO2009000174A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2709647C1 (ru) * 2016-04-14 2019-12-19 Шанхай Яму Коммюникейшн Текнолоджи Ко., Лтд Способ ассоциирования доменного имени с характеристикой посещения веб-сайта

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515360A (zh) 2009-04-13 2009-08-26 阿里巴巴集团控股有限公司 向用户推荐网络目标信息的方法和服务器
CN101840420B (zh) * 2010-04-02 2011-12-28 清华大学 搜索辅助系统与搜索辅助方法
CN101996240A (zh) * 2010-10-13 2011-03-30 蔡亮华 提供信息的方法和装置
CN102542474B (zh) 2010-12-07 2015-10-21 阿里巴巴集团控股有限公司 查询结果排序方法及装置
CN102541857A (zh) * 2010-12-08 2012-07-04 腾讯科技(深圳)有限公司 一种网页排序方法和装置
CN102722503A (zh) * 2011-03-31 2012-10-10 北京百度网讯科技有限公司 一种对检索结果进行排序的方法及装置
CN102231152B (zh) * 2011-05-25 2014-09-03 北京捷讯华泰科技有限公司 基于移动终端ip地址进行精确查询的搜索方法
CN102956009B (zh) 2011-08-16 2017-03-01 阿里巴巴集团控股有限公司 一种基于用户行为的电子商务信息推荐方法与装置
CN103164804B (zh) 2011-12-16 2016-11-23 阿里巴巴集团控股有限公司 一种个性化的信息推送方法及装置
CN103390008B (zh) * 2012-05-08 2018-09-28 六六鱼信息科技(上海)有限公司 一种获取用户个性化特征的方法和系统
CN102722545B (zh) * 2012-05-25 2015-11-25 百度在线网络技术(北京)有限公司 一种用于对已发布信息进行排序的方法、装置与设备
TWI465948B (zh) * 2012-05-25 2014-12-21 Gemtek Technology Co Ltd 前置瀏覽及瀏覽資料客製化的方法及其數位媒體裝置
CN103399861B (zh) * 2013-07-04 2017-03-08 百度在线网络技术(北京)有限公司 一种网址导航中的网址推荐方法、装置和系统
CN104636366B (zh) * 2013-11-11 2020-06-02 腾讯科技(深圳)有限公司 一种获取搜索结果队列的方法和装置
CN105224657B (zh) * 2015-09-30 2018-10-12 北京奇虎科技有限公司 一种基于搜索引擎的信息推荐方法及电子设备
CN107153656B (zh) * 2016-03-03 2020-12-01 阿里巴巴集团控股有限公司 一种信息搜索方法和装置
CN107870941B (zh) * 2016-09-27 2021-11-02 北京搜狗科技发展有限公司 一种网页排序方法、装置及设备
CN108182186B (zh) * 2016-12-08 2020-10-02 广东精点数据科技股份有限公司 一种基于随机森林算法的网页排序方法
CN106777201B (zh) * 2016-12-23 2021-01-08 北京奇元科技有限公司 搜索结果页上的推荐数据的排序方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1389811A (zh) * 2002-02-06 2003-01-08 北京造极人工智能技术有限公司 搜索引擎的智能化搜索方法
US20030131000A1 (en) * 2002-01-07 2003-07-10 International Business Machines Corporation Group-based search engine system
US20050256848A1 (en) * 2004-05-13 2005-11-17 International Business Machines Corporation System and method for user rank search
WO2006017364A1 (fr) * 2004-07-13 2006-02-16 Google, Inc. Personnalisation de classement de contenu place dans des resultats de recherche
US7028027B1 (en) * 2002-09-17 2006-04-11 Yahoo! Inc. Associating documents with classifications and ranking documents based on classification weights

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2709647C1 (ru) * 2016-04-14 2019-12-19 Шанхай Яму Коммюникейшн Текнолоджи Ко., Лтд Способ ассоциирования доменного имени с характеристикой посещения веб-сайта
RU2709647C9 (ru) * 2016-04-14 2020-04-02 Шанхай Яму Коммюникейшн Текнолоджи Ко., Лтд Способ ассоциирования доменного имени с характеристикой посещения веб-сайта

Also Published As

Publication number Publication date
CN101079064A (zh) 2007-11-28
CN101079064B (zh) 2011-11-30

Similar Documents

Publication Publication Date Title
WO2009000174A1 (fr) Web page sorting method and apparatus
Davison Recognizing nepotistic links on the web
US6560600B1 (en) Method and apparatus for ranking Web page search results
Wu et al. Identifying link farm spam pages
US7383299B1 (en) System and method for providing service for searching web site addresses
US8751511B2 (en) Ranking of search results based on microblog data
US9443022B2 (en) Method, system, and graphical user interface for providing personalized recommendations of popular search queries
US10025855B2 (en) Federated community search
US20050060290A1 (en) Automatic query routing and rank configuration for search queries in an information retrieval system
US20070233808A1 (en) Propagating useful information among related web pages, such as web pages of a website
US20060095430A1 (en) Web page ranking with hierarchical considerations
CN111708740A (zh) 基于云平台的海量搜索查询日志计算分析系统
US20070067304A1 (en) Search using changes in prevalence of content items on the web
US20060282413A1 (en) System and method for a search engine using reading grade level analysis
KR20070098521A (ko) 웹 크롤링 프로세스 동안 웹 사이트에 우선순위를 부여하기위한 시스템 및 방법
WO2007051397A1 (fr) Systeme d’extraction d’informations et procede d’extraction d’informations
Baeza-Yates Web usage mining in search engines
WO2005111787A2 (fr) Procedes et appareils permettant de mettre en place un moteur de recherche local
US20150186385A1 (en) Method, System, and Graphical User Interface For Improved Search Result Displays Via User-Specified Annotations
US20070094250A1 (en) Using matrix representations of search engine operations to make inferences about documents in a search engine corpus
US20100125781A1 (en) Page generation by keyword
US9275145B2 (en) Electronic document retrieval system with links to external documents
EP1938214A1 (fr) Recherche basee sur des changements de frequence d'occurrence d'elements de contenu sur le web
WO1997049048A1 (fr) Systeme et procede de recherche de documents hypertextes
US20140046951A1 (en) Automated substitution of terms by compound expressions during indexing of information for computerized search

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application; Ref document number: 08715344; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase; Ref country code: DE
WWE Wipo information: entry into national phase; Ref document number: 446/CHENP/2010; Country of ref document: IN
32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established; Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.04.10)
122 Ep: pct application non-entry in european phase; Ref document number: 08715344; Country of ref document: EP; Kind code of ref document: A1