WO2016066066A1 - Method and apparatus for using anchor text as a web page title - Google Patents

Method and apparatus for using anchor text as a web page title

Info

Publication number
WO2016066066A1
Authority
WO
WIPO (PCT)
Prior art keywords
anchor text
title
anchor
web page
webpage
Prior art date
Application number
PCT/CN2015/092752
Other languages
English (en)
French (fr)
Inventor
魏少俊
Original Assignee
北京奇虎科技有限公司
奇智软件(北京)有限公司
Priority claimed from CN201410602298.3A (CN104331458B)
Priority claimed from CN201410602297.9A (CN104317931B)
Application filed by 北京奇虎科技有限公司 and 奇智软件(北京)有限公司
Publication of WO2016066066A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of Internet technologies, and in particular, to a method and apparatus for using anchor text as a web page title.
  • the page title is a high-level summary of a web page that reflects the core content of the web page. Search algorithms usually give higher weight to words in the title, so the page title is very important in SEO (Search Engine Optimization).
  • Webmasters therefore add many keywords to the page title, such as repeated keywords or keywords unrelated to the page content, which makes the title long. Take the title "Android (安卓) development video tutorial - Lao Luo Android development video tutorial - video tutorial - mobile development portal": the only really valuable information in it is "Lao Luo Android development video tutorial". A long title has no material impact on the user's browsing, but on a terminal with a limited screen size (such as a mobile phone) it wastes a significant amount of display space.
  • In the related art, search engines truncate the title, but truncating at a fixed length clearly gives poor results. How to provide a web page title that is concise and still summarizes the page content has therefore become a technical problem to be solved.
  • In view of the above problems, the present invention is proposed to provide a method and corresponding apparatus for using anchor text as a web page title that overcome the above problems or at least partially solve or alleviate them.
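  • According to one aspect of the invention, a method for using anchor text as a web page title is provided, including:
  • determining an external anchor text link pointing to the web page whose title is to be determined;
  • replacing the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
  • Further, if there are multiple external anchor text links, replacing the original web page title with the anchor text corresponding to the external anchor text link includes:
  • obtaining one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined;
  • selecting one anchor text from the one or more different anchor texts as the web page title of the web page.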
  • According to another aspect of the invention, an apparatus for using anchor text as a web page title is provided, including:
  • a determining module adapted to determine an external anchor text link pointing to the web page whose title is to be determined;
  • a processing module adapted to replace the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
  • Further, if there are multiple external anchor text links, the processing module further includes:
  • an obtaining sub-module adapted to obtain one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined;
  • a selection sub-module adapted to select one anchor text from the one or more different anchor texts as the web page title of the web page.
  • According to another aspect of the invention, a computer program is provided, comprising computer readable code which, when run on a computing device, causes the computing device to perform the method of using anchor text as a web page title. A computer readable medium storing this computer program is also provided.
  • According to the technical solution provided by the invention, an external anchor text link pointing to the web page whose title is to be determined is determined, and the anchor text corresponding to that link replaces the original web page title as the web page title of the web page.
  • The anchor text corresponding to an external anchor text link is a description, written by another web page, of the page the link points to, and it can accurately describe the content of that page.
  • Unlike the original title of the target page, the anchor text is not stuffed with keywords (such as repeated keywords or keywords unrelated to the page content) by the webmaster or administrator of the site hosting the linking page;
  • the original title of the target page, by contrast, is written by the webmaster of the site hosting that page, who adds many keywords to it, making the title long. Compared with the original title, the anchor text used as the new title therefore describes the target page more concisely in format or word count and contains no irrelevant keywords, so the new title describes the page more accurately and objectively.
  • In summary, by replacing the original web page title with the anchor text corresponding to the external anchor text link, the invention can provide a web page title that is concise and summarizes the page content accurately and objectively.
  • FIG. 1A shows a flow chart of a method of using anchor text as a web page title, in accordance with one embodiment of the present invention
  • FIG. 1B shows a flow chart of a method of using anchor text as a web page title in accordance with another embodiment of the present invention
  • FIG. 2 is a schematic diagram showing a title displayed in a search result of a mobile terminal (such as a mobile phone) with an original web page title;
  • FIG. 3 is a schematic diagram showing the use of the anchor text as a web page title in a search result of a mobile terminal using the present invention
  • FIG. 4A is a schematic structural diagram of an apparatus for using anchor text as a web page title according to an embodiment of the present invention.
  • FIG. 4B is a schematic structural diagram of an apparatus for using anchor text as a web page title according to another embodiment of the present invention.
  • FIG. 5 shows a block diagram of a computing device for performing a method of using anchor text as a web page title in accordance with the present invention.
  • FIG. 6 shows a storage unit for holding or carrying program code implementing a method of using anchor text as a web page title in accordance with the present invention.
  • FIG. 1A shows a flowchart of a method for using anchor text as a web page title according to an embodiment of the present invention. As shown in FIG. 1A, the method includes at least the following steps S102 to S104.
  • Step S102: determine an external anchor text link pointing to the web page whose title is to be determined.
  • Step S104: replace the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
  • According to the technical solution provided by the invention, an external anchor text link pointing to the web page whose title is to be determined is determined, and the anchor text corresponding to that link replaces the original web page title as the web page title of the web page.
  • The anchor text corresponding to an external anchor text link is a description, written by another web page, of the page the link points to, and it can accurately describe the content of that page.
  • Unlike the original title of the target page, the anchor text is not stuffed with keywords (such as repeated keywords or keywords unrelated to the page content) by the webmaster or administrator of the site hosting the linking page;
  • the original title of the target page, by contrast, is written by the webmaster of the site hosting that page, who adds many keywords to it, making the title long. Compared with the original title, the anchor text used as the new title therefore describes the target page more concisely in format or word count and contains no irrelevant keywords, so the new title describes the page more accurately and objectively.
  • In summary, by replacing the original web page title with the anchor text corresponding to the external anchor text link, the invention can provide a web page title that is concise and summarizes the page content accurately and objectively.
  • FIG. 1B illustrates a flow chart of a method of using anchor text as a web page title in accordance with another embodiment of the present invention. As shown in FIG. 1B, the method includes at least steps S102 to S108.
  • Step S102: determine an external anchor text link pointing to the web page whose title is to be determined.
  • Step S106: obtain one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined.
  • Step S108: select one anchor text from the one or more different anchor texts as the web page title of the web page.
  • According to the technical solution provided by the invention, one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined are obtained, and one anchor text is then selected from them as the web page title of the web page.
  • The anchor text corresponding to an external link is a description, written by another web page, of the page the link points to, and it can accurately describe the content of that page.
  • The invention selects a more suitable anchor text from the one or more different anchor texts as the web page title. Its description of the target page is more concise in format or word count and contains no irrelevant keywords, so the page is described more accurately and objectively.
  • In summary, by selecting one anchor text from the one or more different anchor texts as the web page title of the web page, the invention can provide a web page title that is concise and summarizes the page content accurately and objectively.
  • The external anchor text link mentioned in step S102 is a link from another website into one's own website that appears in the form of anchor text.
  • The anchor text may take the form of, for example, text or a picture.
  • For example, suppose the web page whose title is to be determined is web page b of website B, and website A contains a link "Web page b of website B" that leads to web page b of website B. The link "Web page b of website B" on website A can then serve as an external anchor text link pointing to the web page whose title is to be determined (that is, web page b of website B). The link "Web page b of website B" may be in text form or picture form.
  • Further, the external anchor text link pointing to the web page whose title is to be determined may be determined from the link relationships between web pages captured by a web page fetcher, which may be a web crawler, a web spider, a web robot, or the like.
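As a minimal sketch of step S102, the following Python filters crawled link relationships for links that point to the target page from another site. The `Link` record, the `external_anchor_links` helper, and the example URLs are hypothetical, and "another website" is approximated here by comparing host names; the patent does not specify how the link graph is stored or how sites are compared.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Link(NamedTuple):
    """One crawled link relationship (hypothetical record layout)."""
    source_url: str   # page that contains the link
    target_url: str   # page the link points to
    anchor_text: str  # visible text of the link (or a description of a picture link)


def external_anchor_links(links, page_url):
    """Return the crawled links that point to page_url from a different host.

    Assumption: two URLs belong to different websites when their host names differ.
    """
    page_host = urlsplit(page_url).hostname
    return [
        link for link in links
        if link.target_url == page_url
        and urlsplit(link.source_url).hostname != page_host
    ]


crawled = [
    Link("http://a.example/index.html", "http://b.example/b.html", "Web page b of website B"),
    Link("http://b.example/home.html", "http://b.example/b.html", "next page"),
    Link("http://a.example/index.html", "http://c.example/c.html", "some other page"),
]
external = external_anchor_links(crawled, "http://b.example/b.html")
```

Only the link from `a.example` qualifies: the second link is internal to website B and the third points to a different page.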
  • The web page title may be the title displayed in search results, or the title recorded when the search engine indexes the web page.
  • A conventional search engine directly uses the title created or determined for a web page by the webmaster or administrator of its site as the title displayed in search results or the title recorded when the search engine indexes the page. That is, if the web page title in the invention is the title displayed in search results or the title recorded at indexing, the original web page title may be the title created or determined for the page by the webmaster or administrator of the site hosting it.
  • Replacing the original web page title with the anchor text corresponding to the external anchor text link can provide a web page title that is concise and summarizes the page content accurately and objectively.
  • If each of the one or more different anchor texts corresponds to one or more external links, the invention provides a preferred clustering approach for step S106, which obtains the one or more different anchor texts corresponding to the one or more external links. In this approach, the anchor text corresponding to each external link is obtained, and the obtained anchor texts are clustered to generate multiple groups, where the anchor texts within each group are identical.
  • The anchor text of each group is then used as the one or more different anchor texts corresponding to the one or more external links.
  • For example, suppose the external links pointing to the web page whose title is to be determined are link 1, link 2, link 3, link 4, link 5, and link 6, and their anchor texts are anchor text A, anchor text B, anchor text C, anchor text B, anchor text C, and anchor text D, respectively. These anchor texts can be clustered so that identical anchor texts fall into one group. This yields several groups, and the resulting one or more different anchor texts are anchor text A, anchor text B, anchor text C, and anchor text D.
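The six-link example above can be sketched in Python as follows. Because each group contains only identical anchor texts, the clustering is modelled here as exact-match grouping; the helper name `cluster_anchor_texts` and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def cluster_anchor_texts(link_to_anchor):
    """Group external links by identical anchor text.

    Returns a dict mapping each distinct anchor text to the links that use it.
    """
    groups = defaultdict(list)
    for link, anchor in link_to_anchor.items():
        groups[anchor].append(link)
    return dict(groups)


link_to_anchor = {
    "link 1": "anchor text A",
    "link 2": "anchor text B",
    "link 3": "anchor text C",
    "link 4": "anchor text B",
    "link 5": "anchor text C",
    "link 6": "anchor text D",
}
groups = cluster_anchor_texts(link_to_anchor)
distinct_anchor_texts = list(groups)  # one entry per group, in first-seen order
```

Keeping the links of each group also gives the per-anchor-text link count, which the level calculation in Method 2 can use as a parameter value.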
  • step S108 selects one anchor text from one or more different anchor texts as the web page title of the web page.
  • Selecting one anchor text from the one or more anchor texts to replace the original web page title can be implemented in various ways, for example according to the text length of the anchor text or according to the level of the anchor text. These two ways are described in detail below.
  • Method 1: selecting one anchor text from the one or more anchor texts according to the text length of the anchor text.
  • The text length of each anchor text may be determined, and one anchor text is then selected from those whose text length is less than or equal to a specified length to replace the original web page title as the web page title of the web page.
  • The specified length may be set according to actual conditions or requirements, for example from the screen size of the terminal that presents the search results, or by taking the average length of the anchor texts as the specified length.
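A minimal Python sketch of Method 1 follows, using the average anchor text length as the specified length. The function names and the sample anchor texts are illustrative assumptions, and length is measured in characters with `len`. The source does not say which of the qualifying anchor texts is finally chosen, so the sketch only returns the candidates.

```python
def average_length(anchor_texts):
    """Average character length of the anchor texts, used as the specified length."""
    return sum(len(text) for text in anchor_texts) / len(anchor_texts)


def candidates_within_length(anchor_texts, max_length):
    """Keep the anchor texts whose length is less than or equal to max_length."""
    return [text for text in anchor_texts if len(text) <= max_length]


# Hypothetical anchor texts pointing to the same page.
anchor_texts = [
    "Lao Luo Android development video tutorial",
    "Android development video tutorial - Lao Luo Android development video tutorial"
    " - video tutorial - mobile development portal",
    "video tutorial",
]
limit = average_length(anchor_texts)
short_enough = candidates_within_length(anchor_texts, limit)
```

With these sample values the long, keyword-stuffed text exceeds the average and is dropped, while the two shorter texts remain as candidates.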
  • Method 2: selecting one anchor text from the one or more anchor texts according to the level of the anchor text.
  • The parameter values of each anchor text may be obtained, the level of each anchor text calculated from its parameter values, and the anchor text of a specified level then selected to replace the original web page title as the web page title of the web page.
  • The parameter values of each anchor text may include: the total number of external links corresponding to the anchor text; the total number of web pages hosting external links for the anchor text whose main domain is the same as that of the uniform resource locator (URL) of the web page; the total number of web pages hosting external links for the anchor text whose main domain differs from that of the URL of the web page; the PageRank of the web pages hosting the external links for the anchor text; the number of times the external links for the anchor text are clicked; and so on.
  • Further, a weight may be determined for each parameter value of an anchor text, and the parameter values are then combined in a weighted calculation to obtain the level of the anchor text.
  • For example, suppose the parameter values obtained for each anchor text are P1, P2, P3, P4, and P5, which denote, respectively, the total number of external links corresponding to the anchor text, the total number of web pages hosting external links with the same main domain as the URL of the web page, the total number of web pages hosting external links with a main domain different from that of the URL of the web page, the PageRank of the web pages hosting those external links, and the number of times those external links are clicked.
  • With the weights of these parameter values determined as a1, a2, a3, a4, and a5, one or more of the parameter values P1 to P5 are weighted by the corresponding weights a1 to a5 to obtain the level of each anchor text.
  • The calculated levels can then be sorted, and the anchor text ranked first (that is, the one with the highest level) is selected as the anchor text of the specified level.
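Method 2 can be sketched in Python as below, under the assumption that the weighted calculation is a plain weighted sum a1*P1 + ... + a5*P5; the source does not fix the exact formula. The parameter names, weights, and sample values are all hypothetical.

```python
# Parameter names standing in for P1..P5 (hypothetical identifiers).
PARAMS = ("link_count", "same_domain_pages", "other_domain_pages", "pagerank", "clicks")


def level(params, weights):
    """Level of one anchor text, assumed here to be a weighted sum of its parameters."""
    return sum(weights[name] * params[name] for name in PARAMS)


def highest_level_anchor(anchor_params, weights):
    """Return the anchor text with the highest level."""
    return max(anchor_params, key=lambda text: level(anchor_params[text], weights))


# Illustrative weights a1..a5 and parameter values; not taken from the patent.
weights = {
    "link_count": 1.0,
    "same_domain_pages": 0.5,
    "other_domain_pages": 2.0,
    "pagerank": 3.0,
    "clicks": 0.01,
}
anchor_params = {
    "anchor text A": {"link_count": 1, "same_domain_pages": 1, "other_domain_pages": 0,
                      "pagerank": 2.0, "clicks": 10},
    "anchor text B": {"link_count": 2, "same_domain_pages": 0, "other_domain_pages": 2,
                      "pagerank": 5.0, "clicks": 300},
}
best = highest_level_anchor(anchor_params, weights)
```

Using `max` avoids a full sort when only the top-ranked anchor text is needed; sorting the levels, as the text describes, gives the same first element.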
  • Further, one anchor text may be selected by combining Methods 1 and 2. For example, the anchor texts whose text length is less than or equal to the specified length are determined first, the levels of those anchor texts are then calculated, and the anchor text of the specified level is selected to replace the original web page title as the web page title of the web page. As another example, the text length may be used as one of the parameter values of each anchor text when calculating its level.
  • The combinations listed above are merely illustrative; other combinations are also applicable to the present invention.
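The first combination described above (filter by length, then pick the highest level) might look like the following Python sketch. It assumes the levels have already been computed and are passed in as a mapping. Falling back to the original title when no anchor text fits is an assumption of this sketch, not something the source specifies.

```python
def select_title(levels, max_length, original_title):
    """Pick the highest-level anchor text among those short enough.

    levels maps each distinct anchor text to its precomputed level.
    Assumption: keep the original title when no anchor text fits the length limit.
    """
    fitting = {text: lv for text, lv in levels.items() if len(text) <= max_length}
    if not fitting:
        return original_title
    return max(fitting, key=fitting.get)


original = ("Android development video tutorial - Lao Luo Android development video tutorial"
            " - video tutorial - mobile development portal")
# Hypothetical levels for three anchor texts pointing to the same page.
levels = {
    "Lao Luo Android development video tutorial": 24.0,
    original: 30.0,
    "video tutorial": 3.0,
}
title = select_title(levels, max_length=50, original_title=original)
```

Here the highest-level anchor text is excluded by the length limit, so the next-best candidate that fits is chosen.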
  • FIG. 2 is a schematic diagram of the original web page title used as the title displayed in the search results of a mobile terminal (such as a mobile phone). It can be seen that the original title "Android development video tutorial - Lao Luo Android development video tutorial - video tutorial - mobile development portal" is too long when displayed on the mobile terminal, wasting a significant amount of screen display space.
  • FIG. 3 is a schematic diagram of the anchor text used as the web page title in the search results of a mobile terminal according to the present invention. Replacing the original title "Android development video tutorial - Lao Luo Android development video tutorial - video tutorial - mobile development portal" with the anchor text "Lao Luo Android development video tutorial" makes the title more concise without losing information and saves screen display space.
  • an embodiment of the present invention further provides an apparatus for using anchor text as a web page title to implement the above method of using anchor text as a web page title.
  • FIG. 4A is a block diagram showing the structure of an apparatus for using anchor text as a web page title, in accordance with one embodiment of the present invention.
  • The apparatus includes at least a determining module 410 and a processing module 420.
  • The determining module 410 is adapted to determine an external anchor text link pointing to the web page whose title is to be determined.
  • The processing module 420 is coupled to the determining module 410 and is adapted to replace the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
  • The external anchor text link determined by the determining module 410 is a link from another website into one's own website that appears in the form of anchor text.
  • The anchor text may take the form of, for example, text or a picture.
  • The web page title includes the title displayed in search results or the title recorded when the search engine indexes the web page.
  • A conventional search engine directly uses the title created or determined for a web page by the webmaster or administrator of its site as the title displayed in search results or the title recorded when the search engine indexes the page. That is, if the web page title in the invention is the title displayed in search results or the title recorded at indexing, the original web page title may be the title created or determined for the page by the webmaster or administrator of the site hosting it.
  • The processing module 420 further includes an obtaining sub-module 421 and a selection sub-module 422.
  • The obtaining sub-module 421 is adapted to obtain one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined.
  • The selection sub-module 422, coupled to the obtaining sub-module 421, is adapted to select one anchor text from the one or more different anchor texts as the web page title of the web page.
  • The obtaining sub-module 421 is further adapted to: parse the web page whose title is to be determined and determine one or more external links pointing to it; and obtain one or more different anchor texts corresponding to the one or more external links.
  • The web page whose title is to be determined may be parsed and the link relationships between web pages captured by a web page fetcher obtained, thereby determining one or more external links pointing to the web page. The web page fetcher may be a web crawler, a web spider, a web robot, or the like.
  • Each of the one or more different anchor texts corresponds to one or more external links.
  • The obtaining sub-module 421 is further adapted to obtain the one or more different anchor texts by clustering, that is: obtain the anchor text corresponding to each of the one or more external links; cluster the obtained anchor texts to generate multiple groups of anchor text, where the anchor texts within each group are identical; and use the anchor text of each group as the one or more different anchor texts corresponding to the one or more external links.
  • The selection sub-module 422 is further adapted to select one anchor text according to text length, that is: determine the text length of each of the one or more anchor texts; and select, from the anchor texts whose text length is less than or equal to a specified length, one anchor text to replace the original web page title as the web page title of the web page.
  • The selection sub-module 422 is further adapted to: determine the level of each of the one or more different anchor texts; and select the anchor text of a specified level as the web page title of the web page.
  • The selection sub-module 422 is further adapted to: obtain the parameter values of each of the one or more different anchor texts; and calculate the level of each anchor text from its parameter values.
  • The selection sub-module 422 is further adapted to: determine a weight for each parameter value of each anchor text; and combine the parameter values of each anchor text in a weighted calculation to obtain its level.
  • The specified level is the highest level.
  • The calculated levels can be sorted, and the anchor text ranked first (that is, the one with the highest level) is selected as the anchor text of the specified level.
  • The parameter values of each anchor text include at least one of the following: the total number of external links corresponding to the anchor text; the total number of web pages hosting external links with the same main domain as the URL of the web page; the total number of web pages hosting external links with a main domain different from that of the URL of the web page; the PageRank of the web pages hosting the external links; and the number of times the external links are clicked.
  • the embodiment of the present invention can achieve the following beneficial effects:
  • According to the technical solution provided by the invention, an external anchor text link pointing to the web page whose title is to be determined is determined, and the anchor text corresponding to that link replaces the original web page title as the web page title of the web page.
  • The anchor text corresponding to an external anchor text link is a description, written by another web page, of the page the link points to, and it can accurately describe the content of that page.
  • Unlike the original title of the target page, the anchor text is not stuffed with keywords (such as repeated keywords or keywords unrelated to the page content) by the webmaster or administrator of the site hosting the linking page;
  • the original title of the target page, by contrast, is written by the webmaster of the site hosting that page, who adds many keywords to it, making the title long. Compared with the original title, the anchor text used as the new title therefore describes the target page more concisely in format or word count and contains no irrelevant keywords, so the new title describes the page more accurately and objectively.
  • In summary, by replacing the original web page title with the anchor text corresponding to the external anchor text link, the invention can provide a web page title that is concise and summarizes the page content accurately and objectively.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • All features disclosed in this specification (including the accompanying claims, the abstract and the drawings), and all processes or units of any method or device so disclosed, may be combined in any combination.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • the various component embodiments of the present invention may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • In practice, a microprocessor or digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the apparatus for using anchor text as a web page title in accordance with an embodiment of the present invention.
  • The invention can also be implemented as an apparatus or device program (for example, a computer program or a computer program product) for performing some or all of the methods described herein.
  • a program implementing the invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • FIG. 5 illustrates a computing device that can implement the method of using anchor text as a web page title in accordance with the present invention.
  • The computing device conventionally includes a processor 510 and a computer program product or computer readable medium in the form of a memory 520.
  • The memory 520 can be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM.
  • The memory 520 has a storage space 530 for program code 531 for performing any of the method steps described above.
  • For example, the storage space 530 for program code may include individual pieces of program code 531, each implementing one of the steps of the above methods.
  • The program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such computer program products are typically portable or fixed storage units as described with reference to FIG. 6.
  • The storage unit may have storage segments, storage spaces, and the like arranged similarly to the memory 520 in the computing device of FIG. 5.
  • The program code may, for example, be compressed in an appropriate form.
  • The storage unit includes computer readable code 531', that is, code readable by a processor such as 510, which, when run by a computing device, causes the computing device to perform the steps of the methods described above. In addition, note that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

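A method and apparatus for using anchor text as a web page title. The method includes: determining an external anchor text link pointing to the web page whose title is to be determined (S102); and replacing the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page (S104). If there are multiple external anchor text links, the method further includes: obtaining one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined (S106); and selecting one anchor text from the one or more different anchor texts as the web page title of the web page (S108). By replacing the original web page title with the anchor text corresponding to the external anchor text link, the method and apparatus can provide a web page title that is concise and summarizes the page content accurately and objectively.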

Description

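Method and apparatus for using anchor text as a web page title
Technical Field
The present invention relates to the field of Internet technologies, and in particular to a method and apparatus for using anchor text as a web page title.
Background
A web page title is a high-level summary of a web page and reflects its core content. Search algorithms usually give higher weight to words in the title, so the web page title is very important in SEO (Search Engine Optimization). Webmasters therefore add many keywords to the web page title, such as repeated keywords or keywords unrelated to the page content, which makes the title long. Take the title "Android (安卓) development video tutorial - Lao Luo Android development video tutorial - video tutorial - mobile development portal": the only really valuable information in it is "Lao Luo Android development video tutorial". A long title has no material impact on the user's browsing, but on a terminal with a limited screen size (such as a mobile phone) it wastes a significant amount of screen display space.
In the related art, search engines truncate the title, but truncating at a fixed length clearly gives poor results. How to provide a web page title that is concise and still summarizes the page content has therefore become a technical problem to be solved.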
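Summary of the Invention
In view of the above problems, the present invention is proposed to provide a method and corresponding apparatus for using anchor text as a web page title that overcome the above problems or at least partially solve or alleviate them.
According to one aspect of the invention, a method for using anchor text as a web page title is provided, including:
determining an external anchor text link pointing to the web page whose title is to be determined;
replacing the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
Further, if there are multiple external anchor text links, replacing the original web page title with the anchor text corresponding to the external anchor text link includes:
obtaining one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined;
selecting one anchor text from the one or more different anchor texts as the web page title of the web page.
According to another aspect of the invention, an apparatus for using anchor text as a web page title is provided, including:
a determining module adapted to determine an external anchor text link pointing to the web page whose title is to be determined;
a processing module adapted to replace the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
Further, if there are multiple external anchor text links, the processing module further includes:
an obtaining sub-module adapted to obtain one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined;
a selection sub-module adapted to select one anchor text from the one or more different anchor texts as the web page title of the web page.
According to another aspect of the invention, a computer program is also provided, comprising computer readable code which, when run on a computing device, causes the computing device to perform the method of using anchor text as a web page title.
According to another aspect of the invention, a computer readable medium is also provided, in which the above computer program is stored.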
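The beneficial effects of the invention are as follows:
According to the technical solution provided by the invention, an external anchor text link pointing to the web page whose title is to be determined is determined, and the anchor text corresponding to that link replaces the original web page title as the web page title of the web page. The anchor text corresponding to an external anchor text link is a description, written by another web page, of the page the link points to, and it can accurately describe the content of that page. Unlike the original title of the target page, the anchor text is not stuffed with keywords (such as repeated keywords or keywords unrelated to the page content) by the webmaster or administrator of the site hosting the linking page, whereas the original title of the target page is written by the webmaster of the site hosting that page, who adds many keywords to it, making the title long. Compared with the original title, the anchor text used as the new title therefore describes the target page more concisely in format or word count and contains no irrelevant keywords, so the new title describes the page more accurately and objectively. In summary, by replacing the original web page title with the anchor text corresponding to the external anchor text link, the invention can provide a web page title that is concise and summarizes the page content accurately and objectively.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.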
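Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
FIG. 1A shows a flowchart of a method for using anchor text as a web page title according to one embodiment of the invention;
FIG. 1B shows a flowchart of a method for using anchor text as a web page title according to another embodiment of the invention;
FIG. 2 shows a schematic diagram of the original web page title used as the title displayed in the search results of a mobile terminal (such as a mobile phone);
FIG. 3 shows a schematic diagram of the anchor text used as the web page title in the search results of a mobile terminal according to the invention;
FIG. 4A shows a schematic structural diagram of an apparatus for using anchor text as a web page title according to one embodiment of the invention;
FIG. 4B shows a schematic structural diagram of an apparatus for using anchor text as a web page title according to another embodiment of the invention;
FIG. 5 shows a block diagram of a computing device for performing the method of using anchor text as a web page title according to the invention; and
FIG. 6 shows a storage unit for holding or carrying program code that implements the method of using anchor text as a web page title according to the invention.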
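Detailed Description
The invention is further described below with reference to the drawings and specific embodiments. To solve the above technical problem, an embodiment of the invention provides a method for using anchor text as a web page title. FIG. 1A shows a flowchart of the method according to one embodiment of the invention. As shown in FIG. 1A, the method includes at least the following steps S102 to S104.
Step S102: determine an external anchor text link pointing to the web page whose title is to be determined.
Step S104: replace the original web page title with the anchor text corresponding to the external anchor text link as the web page title of the web page.
According to the technical solution provided by the invention, an external anchor text link pointing to the web page whose title is to be determined is determined, and the anchor text corresponding to that link replaces the original web page title as the web page title of the web page. The anchor text corresponding to an external anchor text link is a description, written by another web page, of the page the link points to, and it can accurately describe the content of that page. Unlike the original title of the target page, the anchor text is not stuffed with keywords (such as repeated keywords or keywords unrelated to the page content) by the webmaster or administrator of the site hosting the linking page, whereas the original title of the target page is written by the webmaster of the site hosting that page, who adds many keywords to it, making the title long. Compared with the original title, the anchor text used as the new title therefore describes the target page more concisely in format or word count and contains no irrelevant keywords, so the new title describes the page more accurately and objectively. In summary, by replacing the original web page title with the anchor text corresponding to the external anchor text link, the invention can provide a web page title that is concise and summarizes the page content accurately and objectively.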
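FIG. 1B shows a flowchart of a method for using anchor text as a web page title according to another embodiment of the present invention. As shown in FIG. 1B, the method comprises at least steps S102 to S108.
Step S102: determine an external anchor text link that points to the web page whose title is to be determined.
Step S106: obtain one or more different anchor texts corresponding to one or more external links that point to the web page whose title is to be determined.
Step S108: select one anchor text from the one or more different anchor texts as the web page title of the web page.
According to the technical solution provided by the present invention, one or more different anchor texts corresponding to one or more external links pointing to the web page whose title is to be determined are obtained, and one anchor text is then selected from them as the web page title of the web page. The anchor text corresponding to an external link is a description, written by another web page, of the page the link points to, and it can accurately describe that page's content. The present invention selects a more suitable anchor text from the one or more different anchor texts as the web page title; its description of the pointed-to page is more concise in format and length and contains no irrelevant keywords, so the page is described more accurately and objectively. In summary, by selecting one anchor text from the one or more different anchor texts as the web page title, the present invention can provide a web page title that is concise and that summarizes the page content accurately and objectively.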
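The external anchor text link mentioned in step S102 above is a link that leads from another website into one's own website and that appears in the form of anchor text. The anchor text may take the form of, for example, text or an image. For example, suppose the web page whose title is to be determined is web page b of website B, and a link "web page b of website B" on website A leads to web page b of website B. That link on website A can then serve as an external anchor text link pointing to the web page whose title is to be determined (that is, web page b of website B), and the link may be in text form or image form. Further, the external anchor text links pointing to the web page can be determined from the link relationships between web pages fetched by a web page fetcher, which may be a web crawler, a web spider, a web robot, or the like.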
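Step S102 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `AnchorCollector` and `external_anchor_links` are hypothetical, the crawler output is assumed to be a plain mapping from source URL to HTML, "another website" is approximated by comparing host names, and the `alt` text of an image is assumed to stand in for the anchor text of an image link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._parts = []     # text and alt fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href, self._parts = attrs["href"], []
        elif tag == "img" and self._href is not None:
            # Assumption: an image anchor is described by its alt text.
            self._parts.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._parts).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None


def external_anchor_links(crawled_pages, target_url):
    """Return [(source_url, anchor_text)] for links from other hosts to target_url."""
    target_host = urlparse(target_url).netloc
    found = []
    for source_url, html in crawled_pages.items():
        if urlparse(source_url).netloc == target_host:
            continue  # same host as the target page, so not an external link
        collector = AnchorCollector()
        collector.feed(html)
        for href, text in collector.links:
            # Resolve relative hrefs against the page that contains them.
            if urljoin(source_url, href) == target_url and text:
                found.append((source_url, text))
    return found
```

Calling `external_anchor_links(pages, "http://b.example/b.html")` on crawled pages from other hosts returns the source URL and anchor text of every link that resolves to that page. A production system would also normalize URLs (trailing slashes, fragments, redirects), which this sketch leaves out.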
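Then, the anchor text corresponding to the external anchor text link is used, in place of the original web page title, as the web page title of the web page. The web page title here may be the title displayed in search results, the title recorded when a search engine indexes the page, and so on. A conventional search engine directly uses the title that the webmaster or administrator of the hosting site created or chose for the page as the title displayed in search results or recorded at indexing time. In other words, when the web page title of the present invention is the title displayed in search results or the title recorded when a search engine indexes the page, the original web page title may be the title created or chosen for the page by the webmaster or administrator of the hosting site. As described above, because the title matters so much in SEO, webmasters or administrators add many keywords to it and make it very long. A long title does not materially affect the user's browsing, but on a terminal with a limited screen size (such as a mobile phone) it wastes a significant amount of display space. By using the anchor text corresponding to the external anchor text link in place of the original web page title, the embodiment of the present invention can provide a web page title that is concise and that summarizes the page content accurately and objectively.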
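Where each of the one or more different anchor texts corresponds to one or more external links, the present invention provides a preferred clustering method. As mentioned for step S106, one or more different anchor texts corresponding to the one or more external links are obtained. In this scheme, the anchor text corresponding to each of the external links can be obtained, and the obtained anchor texts can then be clustered to generate multiple groups of anchor texts, where the anchor texts within each group are identical. The anchor text of each group is then taken as one of the one or more different anchor texts corresponding to the external links. For example, suppose the external links pointing to the web page whose title is to be determined are link 1, link 2, link 3, link 4, link 5 and link 6, and their anchor texts are anchor text A, anchor text B, anchor text C, anchor text B, anchor text C and anchor text D, respectively. Clustering these anchor texts places identical anchor texts in the same group, which yields four groups, so the different anchor texts obtained are anchor text A, anchor text B, anchor text C and anchor text D.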
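Because every group contains identical anchor texts, the clustering described above amounts to grouping links by exact anchor text. A minimal Python sketch, in which the function name `group_anchor_texts` and the input mapping are illustrative assumptions:

```python
def group_anchor_texts(link_to_anchor):
    """Group external links by identical anchor text.

    link_to_anchor maps each external link to its anchor text. The result maps
    each distinct anchor text to the links that use it, in first-seen order.
    """
    groups = {}
    for link, anchor in link_to_anchor.items():
        groups.setdefault(anchor, []).append(link)
    return groups


# The six-link example from the text.
links = {
    "link 1": "anchor text A",
    "link 2": "anchor text B",
    "link 3": "anchor text C",
    "link 4": "anchor text B",
    "link 5": "anchor text C",
    "link 6": "anchor text D",
}
groups = group_anchor_texts(links)
distinct_anchor_texts = list(groups)  # anchor texts A, B, C and D
```

Keeping the links of each group, rather than only the distinct texts, preserves the per-anchor link count that a later ranking step can use.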
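After step S106 obtains the one or more different anchor texts corresponding to the one or more external links that point to the web page whose title is to be determined, step S108 selects one anchor text from them as the web page title of the web page. Selecting one anchor text to replace the original web page title can be done in several ways, for example according to the text length of the anchor text or according to the rank of the anchor text. These two approaches are described in detail below.
Approach 1: selecting one anchor text from the one or more anchor texts according to text length. In Approach 1, the text length of each anchor text can be determined, and one anchor text is then selected, from those whose text length is less than or equal to a specified length, to replace the original web page title as the web page title of the web page. The specified length can be set according to actual circumstances or requirements, for example according to the screen size of the terminal that presents the search results, or by taking the average length of the anchor texts as the specified length.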
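Approach 1 can be sketched as below. The text does not say which of the qualifying anchor texts to pick or what to do when none qualifies, so two choices here are assumptions made only for illustration: the longest anchor text that still fits is preferred, and the original title is kept as a fallback. The function names are hypothetical, and length is measured in characters.

```python
def candidates_within_length(anchor_texts, max_length=None):
    """Return the anchor texts whose length is <= max_length.

    If max_length is None, the average anchor text length is used as the
    specified length, one of the options mentioned in the text.
    """
    if not anchor_texts:
        return []
    if max_length is None:
        max_length = sum(len(t) for t in anchor_texts) / len(anchor_texts)
    return [t for t in anchor_texts if len(t) <= max_length]


def pick_by_length(anchor_texts, original_title, max_length=None):
    """Pick one anchor text to replace original_title.

    Assumption: among the candidates that fit, the longest one is chosen
    because it is likely to carry the most information.
    Assumption: if nothing fits, the original title is kept.
    """
    candidates = candidates_within_length(anchor_texts, max_length)
    if not candidates:
        return original_title
    return max(candidates, key=len)
```

With the example title from the background section, a limit of 20 characters selects "老罗Android开发视频教程" over both the full keyword-padded title and a two-character anchor such as "教程".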
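Approach 2: selecting one anchor text from the one or more anchor texts according to the rank of the anchor text. In Approach 2, the parameter values of each anchor text can be obtained, the rank of each anchor text is calculated from its parameter values, and the anchor text with a specified rank is then selected to replace the original web page title as the web page title of the web page. Here, the parameter values of an anchor text may be: the total number of external links corresponding to the anchor text; the total number of web pages that contain external links corresponding to the anchor text and that share the main domain of the web page's Uniform Resource Locator (URL); the total number of web pages that contain external links corresponding to the anchor text and whose main domain differs from that of the web page's URL; the PageRank of the web pages containing the external links corresponding to the anchor text; the number of times the external links corresponding to the anchor text have been clicked; and so on. Further, a weight can be determined for each parameter value, and the parameter values can be weighted to calculate the rank of each anchor text. For example, let the parameter values obtained for each anchor text be P1, P2, P3, P4 and P5, denoting the five quantities above in the order listed. The weights of these parameter values are determined to be a1, a2, a3, a4 and a5, respectively, and one or more of the parameter values P1, P2, P3, P4 and P5 are weighted by their respective weights a1, a2, a3, a4 and a5 to obtain the rank of each anchor text.
In addition, the calculated ranks of the anchor texts can be sorted, and the anchor text with the top rank (that is, the highest rank) is selected as the anchor text of the specified rank.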
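One way to read the weighting in Approach 2 is as a weighted sum, rank = a1·P1 + a2·P2 + a3·P3 + a4·P4 + a5·P5. The text only says that the parameter values are "weighted", so the weighted sum is an assumption, as are the names `AnchorStats`, `anchor_rank` and `highest_ranked` and the sample numbers. A weight of zero drops a parameter, which covers weighting "one or more" of the values.

```python
from dataclasses import astuple, dataclass


@dataclass
class AnchorStats:
    """Parameter values P1..P5 of one anchor text (field names are illustrative)."""
    total_links: int          # P1: external links using this anchor text
    same_domain_pages: int    # P2: linking pages sharing the target's main domain
    other_domain_pages: int   # P3: linking pages on a different main domain
    source_pagerank: float    # P4: PageRank of the linking pages (aggregation assumed)
    clicks: int               # P5: clicks on the external links


def anchor_rank(stats, weights):
    """Weighted sum a1*P1 + ... + a5*P5 (assumed form of the weighting)."""
    return sum(a * p for a, p in zip(weights, astuple(stats)))


def highest_ranked(stats_by_anchor, weights):
    """Return the anchor text whose rank is highest."""
    return max(stats_by_anchor, key=lambda text: anchor_rank(stats_by_anchor[text], weights))


# Hypothetical values, for illustration only.
stats_by_anchor = {
    "anchor text A": AnchorStats(1, 0, 1, 0.5, 10),
    "anchor text B": AnchorStats(2, 0, 2, 1.0, 40),
}
weights = (1.0, 0.5, 2.0, 10.0, 0.1)  # a1..a5
best = highest_ranked(stats_by_anchor, weights)
```

Using `max` with a key function gives the same result as sorting all ranks and taking the first entry, without building the full sorted list.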
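Approaches 1 and 2 can also be combined to select one anchor text from the one or more anchor texts. For example, the anchor texts whose text length is less than or equal to the specified length are first identified, their ranks are then calculated, and the anchor text of the specified rank is selected to replace the original web page title as the web page title of the web page. As another example, the text length can be used as one of the parameter values of each anchor text when calculating its rank. These examples are only illustrative; other combinations are equally applicable to the present invention.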
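Both combinations can be sketched briefly. As before, this is an illustration under assumptions rather than the patented implementation: the rank is taken to be a weighted sum, the function names and sample numbers are hypothetical, the original title is kept when no anchor text fits, and in the second variant the length enters the rank with a negative weight so that shorter texts score higher.

```python
def choose_filter_then_rank(original_title, anchors, weights, max_length):
    """Combination 1: keep anchor texts that fit, then pick the highest rank.

    anchors maps each anchor text to a tuple of its parameter values.
    """
    fitting = {t: p for t, p in anchors.items() if len(t) <= max_length}
    if not fitting:
        return original_title  # assumption: fall back to the original title
    return max(fitting, key=lambda t: sum(a * v for a, v in zip(weights, fitting[t])))


def choose_length_as_parameter(anchors, weights, length_weight):
    """Combination 2: treat the text length as one more weighted parameter."""
    def rank(text):
        base = sum(a * v for a, v in zip(weights, anchors[text]))
        return base + length_weight * len(text)
    return max(anchors, key=rank)


# Hypothetical parameter values: (number of external links, number of clicks).
anchors = {
    "老罗Android开发视频教程": (3, 120),
    "Android开发视频教程-老罗Android开发视频教程-视频教程-移动开发门户": (5, 300),
    "教程": (1, 2),
}
weights = (1.0, 0.01)
title_1 = choose_filter_then_rank("original title", anchors, weights, max_length=20)
title_2 = choose_length_as_parameter(anchors, weights, length_weight=-0.2)
```

In this example the keyword-padded text has the highest raw score, but the length filter in the first variant and the length penalty in the second both lead to "老罗Android开发视频教程".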
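The method for using anchor text as a web page title provided by the present invention is described in detail below through a specific embodiment, which takes the case where the web page title is the title displayed in search results. FIG. 2 shows the original web page title used as the title displayed in search results on a mobile terminal (such as a mobile phone). The original title "Android开发视频教程-老罗Android开发视频教程-视频教程-移动开发门户" is too long when displayed on the mobile terminal and visibly wastes screen display space. FIG. 3 shows the search results displayed on a mobile terminal when anchor text is used as the web page title according to the present invention: the anchor text "老罗Android开发视频教程" replaces the original title, which makes the title more concise without losing information and saves screen display space.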
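Based on the same inventive concept, an embodiment of the present invention also provides an apparatus for using anchor text as a web page title, which implements the above method.
FIG. 4A shows a schematic structural diagram of an apparatus for using anchor text as a web page title according to one embodiment of the present invention. Referring to FIG. 4A, the apparatus comprises at least a determining module 410 and a processing module 420.
The functions of the components of the apparatus and the connections between them are described below:
the determining module 410 is adapted to determine an external anchor text link that points to the web page whose title is to be determined;
the processing module 420 is coupled to the determining module 410 and is adapted to use the anchor text corresponding to the external anchor text link, in place of the original web page title, as the web page title of the web page.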
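In one embodiment, the external anchor text link determined by the determining module 410 is a link that leads from another website into one's own website and that appears in the form of anchor text. The anchor text may take the form of, for example, text or an image.
In one embodiment, the web page title comprises the title displayed in search results or the title recorded when a search engine indexes the web page. A conventional search engine directly uses the title that the webmaster or administrator of the hosting site created or chose for the page as the title displayed in search results or recorded at indexing time. In other words, when the web page title of the present invention is the title displayed in search results or the title recorded when a search engine indexes the page, the original web page title may be the title created or chosen for the page by the webmaster or administrator of the hosting site.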
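In one embodiment, if there are multiple external anchor text links, then with reference to FIG. 4B the processing module 420 further comprises an obtaining submodule 421 and a selecting submodule 422.
The functions of the obtaining submodule 421 and the selecting submodule 422, the connection between them, and their connection to the processing module 420 are described below:
the obtaining submodule 421 is adapted to obtain one or more different anchor texts corresponding to one or more external links that point to the web page whose title is to be determined;
the selecting submodule 422 is coupled to the obtaining submodule 421 and is adapted to select one anchor text from the one or more different anchor texts as the web page title of the web page.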
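The module structure of FIG. 4A and FIG. 4B can be mirrored in software roughly as follows. The class and method names are hypothetical, the link index stands in for the crawler's link graph, and the selection rule (first anchor text that fits a length limit, with the original title as a fallback) is a placeholder assumption chosen to keep the sketch short.

```python
class DeterminingModule:
    """Module 410: determines the external anchor text links of a page."""

    def __init__(self, link_index):
        # link_index maps a target URL to [(source URL, anchor text), ...].
        self.link_index = link_index

    def determine(self, page_url):
        return self.link_index.get(page_url, [])


class ObtainingSubmodule:
    """Submodule 421: obtains the distinct anchor texts of the links."""

    def obtain(self, links):
        # dict.fromkeys removes duplicates while keeping first-seen order.
        return list(dict.fromkeys(text for _, text in links))


class SelectingSubmodule:
    """Submodule 422: selects one anchor text (placeholder rule: first that fits)."""

    def __init__(self, max_length):
        self.max_length = max_length

    def select(self, anchor_texts):
        fitting = [t for t in anchor_texts if len(t) <= self.max_length]
        return fitting[0] if fitting else None


class ProcessingModule:
    """Module 420: replaces the original title with the selected anchor text."""

    def __init__(self, obtaining, selecting):
        self.obtaining = obtaining
        self.selecting = selecting

    def process(self, links, original_title):
        chosen = self.selecting.select(self.obtaining.obtain(links))
        return chosen or original_title  # assumption: keep the original if none fits


class TitleApparatus:
    """Couples module 410 to module 420, as in FIG. 4A."""

    def __init__(self, determining, processing):
        self.determining = determining
        self.processing = processing

    def title_for(self, page_url, original_title):
        links = self.determining.determine(page_url)
        return self.processing.process(links, original_title)
```

Passing the submodules into `ProcessingModule` keeps the selection strategy replaceable, so a rank-based selector could be swapped in without touching the other modules.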
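In one embodiment, the obtaining submodule 421 is further adapted to: parse the web page whose title is to be determined and determine one or more external links that point to the web page; and obtain one or more different anchor texts corresponding to the one or more external links. For example, by parsing the web page whose title is to be determined, the link relationships between web pages fetched by a web page fetcher can be obtained, and the one or more external links pointing to the web page can then be determined, where the web page fetcher may be a web crawler, a web spider, a web robot, or the like.
In one embodiment, each of the one or more different anchor texts corresponds to one or more external links.
In one embodiment, the obtaining submodule 421 is further adapted to obtain the one or more different anchor texts corresponding to the one or more external links by clustering, that is: to obtain the anchor text corresponding to each of the one or more external links; to cluster the obtained anchor texts to generate multiple groups of anchor texts, where the anchor texts within each group are identical; and to take the anchor text of each group as one of the one or more different anchor texts corresponding to the one or more external links.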
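In one embodiment, the selecting submodule 422 is further adapted to select one anchor text from the one or more anchor texts according to text length, that is: to determine the text length of each of the one or more anchor texts; and to select one anchor text, from those whose text length is less than or equal to a specified length, to replace the original web page title as the web page title of the web page.
In one embodiment, the selecting submodule 422 is further adapted to: determine the rank of each of the one or more different anchor texts; and select the anchor text of a specified rank as the web page title of the web page.
In one embodiment, the selecting submodule 422 is further adapted to: obtain the parameter values of each of the one or more different anchor texts; and calculate the rank of each anchor text from its parameter values.
In one embodiment, the selecting submodule 422 is further adapted to: determine a weight for each parameter value of each anchor text; and weight the parameter values of each anchor text to calculate its rank.
In one embodiment, the specified rank is the highest rank. The calculated ranks of the anchor texts can be sorted, and the anchor text with the top rank (that is, the highest rank) is selected as the anchor text of the specified rank.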
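In one embodiment, the parameter values of each anchor text comprise at least one of the following:
the total number of external links corresponding to the anchor text;
the total number of web pages that contain external links corresponding to the anchor text and that share the main domain of the web page's Uniform Resource Locator (URL);
the total number of web pages that contain external links corresponding to the anchor text and whose main domain differs from that of the web page's URL;
the PageRank of the web pages containing the external links corresponding to the anchor text;
the number of times the external links corresponding to the anchor text have been clicked.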
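The five parameter values listed above could be computed from per-link records along the following lines. Several points are assumptions made for illustration: the record fields (`source_url`, `pagerank`, `clicks`) are hypothetical, the main domain is approximated by the last two labels of the host name (a real system would need a public suffix list to handle hosts such as `example.co.uk`), and PageRank and click counts are aggregated by summing over the links, which the text does not specify.

```python
from urllib.parse import urlparse


def main_domain(url):
    """Approximate the main domain as the last two labels of the host name.

    Assumption: adequate for hosts like www.b.example, wrong for multi-part
    public suffixes, where a public suffix list would be required.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def parameter_values(page_url, link_records):
    """Compute the five parameter values for ONE anchor text.

    link_records is a list of dicts, one per external link that uses the
    anchor text, with the hypothetical keys source_url, pagerank and clicks.
    """
    target = main_domain(page_url)
    same = {r["source_url"] for r in link_records if main_domain(r["source_url"]) == target}
    other = {r["source_url"] for r in link_records if main_domain(r["source_url"]) != target}
    return {
        "total_links": len(link_records),                             # P1
        "same_domain_pages": len(same),                               # P2
        "other_domain_pages": len(other),                             # P3
        "source_pagerank": sum(r["pagerank"] for r in link_records),  # P4 (summed)
        "clicks": sum(r["clicks"] for r in link_records),             # P5
    }
```

Sets are used for the second and third values because those parameters count linking pages rather than links, so two links on the same page are counted once.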
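According to any one of the above preferred embodiments or a combination of several of them, the embodiments of the present invention can achieve the following beneficial effects:
According to the technical solution provided by the present invention, an external anchor text link pointing to the web page whose title is to be determined is determined, and the anchor text corresponding to that link is used, in place of the original web page title, as the web page title of the web page. The anchor text corresponding to an external anchor text link is a description, written by another web page, of the page the link points to, and it can accurately describe that page's content. It differs from the original title of the pointed-to page in the following way: the webmaster or administrator of the site hosting the other web page does not pad the anchor text with many keywords, such as repeated keywords or keywords unrelated to the page content, whereas the original title is written by the webmaster of the site hosting the pointed-to page, who adds many keywords to it and makes it very long. Compared with the original title, the anchor text used as the new web page title is therefore more concise in format and length and contains no irrelevant keywords, so the new title describes the web page more accurately and objectively. In summary, by using the anchor text corresponding to the external anchor text link in place of the original web page title, the present invention can provide a web page title that is concise and that summarizes the page content accurately and objectively.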
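Numerous specific details are set forth in the description provided here. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single previously disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment can be combined into one module, unit or component, and can furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.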
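The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the apparatus for using anchor text as a web page title according to embodiments of the present invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 5 shows a computing device that can implement the method for using anchor text as a web page title according to the present invention. The computing device conventionally comprises a processor 510 and a computer program product or computer-readable medium in the form of a memory 520. The memory 520 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk or ROM. The memory 520 has a storage space 530 for program code 531 for performing any of the method steps described above. For example, the storage space 530 for program code may include individual program codes 531, each implementing one of the steps of the above method. The program code can be read from, or written to, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 6. The storage unit may have storage segments, storage space and the like arranged similarly to the memory 520 in the computing device of FIG. 5. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 531', that is, code that can be read by a processor such as 510; when run by a computing device, the code causes the computing device to perform the steps of the method described above. References herein to "one embodiment", "an embodiment" or "one or more embodiments" mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Note also that instances of the phrase "in one embodiment" do not necessarily all refer to the same embodiment.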
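It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
It should also be noted that the language used in this specification has been selected principally for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.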

Claims (26)

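  1. A method for using anchor text as a web page title, comprising:
    determining an external anchor text link that points to a web page whose title is to be determined;
    using the anchor text corresponding to the external anchor text link, in place of the original web page title, as the web page title of the web page.
  2. The method according to claim 1, wherein the web page title comprises a title displayed in search results or a title recorded when a search engine indexes the web page.
  3. The method according to claim 1 or 2, wherein, if there are multiple external anchor text links, using the anchor text corresponding to the external anchor text link, in place of the original web page title, as the web page title of the web page comprises:
    obtaining one or more different anchor texts corresponding to one or more external links that point to the web page whose title is to be determined;
    selecting one anchor text from the one or more different anchor texts as the web page title of the web page.
  4. The method according to claim 3, wherein obtaining the one or more different anchor texts corresponding to the one or more external links that point to the web page whose title is to be determined comprises:
    parsing the web page whose title is to be determined, and determining one or more external links that point to the web page;
    obtaining one or more different anchor texts corresponding to the one or more external links.
  5. The method according to any one of claims 1-4, wherein each of the one or more different anchor texts corresponds to one or more external links.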
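  6. The method according to any one of claims 1-5, wherein obtaining the one or more different anchor texts corresponding to the one or more external links comprises:
    obtaining the anchor text corresponding to each of the one or more external links;
    clustering the obtained anchor texts corresponding to each of the one or more external links to generate multiple groups of anchor texts, wherein the anchor texts within each group are identical;
    taking the anchor text of each of the multiple groups as the one or more different anchor texts corresponding to the one or more external links.
  7. The method according to any one of claims 1-6, wherein selecting one anchor text from the one or more different anchor texts as the web page title of the web page comprises:
    determining the text length of each of the one or more anchor texts;
    selecting one anchor text, from the anchor texts whose text length is less than or equal to a specified length, to replace the original web page title as the web page title of the web page.
  8. The method according to any one of claims 1-7, wherein selecting one anchor text from the one or more different anchor texts as the web page title of the web page comprises:
    determining the rank of each of the one or more different anchor texts;
    selecting the anchor text of a specified rank as the web page title of the web page.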
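  9. The method according to any one of claims 1-8, wherein determining the rank of each of the one or more different anchor texts comprises:
    obtaining the parameter values of each of the one or more different anchor texts;
    calculating the rank of each anchor text from the obtained parameter values of that anchor text.
  10. The method according to any one of claims 1-9, wherein calculating the rank of each anchor text from the obtained parameter values of that anchor text comprises:
    determining a weight for each parameter value of each anchor text;
    weighting the parameter values of each anchor text to calculate the rank of each anchor text.
  11. The method according to any one of claims 1-10, wherein the specified rank is the highest rank.
  12. The method according to any one of claims 1-11, wherein the parameter values of each anchor text comprise at least one of the following:
    the total number of external links corresponding to the anchor text;
    the total number of web pages that contain external links corresponding to the anchor text and that share the main domain of the Uniform Resource Locator (URL) of the web page;
    the total number of web pages that contain external links corresponding to the anchor text and whose main domain differs from that of the URL of the web page;
    the PageRank of the web pages containing the external links corresponding to the anchor text;
    the number of times the external links corresponding to the anchor text have been clicked.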
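  13. An apparatus for using anchor text as a web page title, comprising:
    a determining module, adapted to determine an external anchor text link that points to a web page whose title is to be determined;
    a processing module, adapted to use the anchor text corresponding to the external anchor text link, in place of the original web page title, as the web page title of the web page.
  14. The apparatus according to claim 13, wherein the web page title comprises a title displayed in search results or a title recorded when a search engine indexes the web page.
  15. The apparatus according to claim 13 or 14, wherein, if there are multiple external anchor text links, the processing module further comprises:
    an obtaining submodule, adapted to obtain one or more different anchor texts corresponding to one or more external links that point to the web page whose title is to be determined;
    a selecting submodule, adapted to select one anchor text from the one or more different anchor texts as the web page title of the web page.
  16. The apparatus according to claims 13-15, wherein the obtaining submodule is further adapted to:
    parse the web page whose title is to be determined, and determine one or more external links that point to the web page;
    obtain one or more different anchor texts corresponding to the one or more external links.
  17. The apparatus according to any one of claims 13-16, wherein each of the one or more different anchor texts corresponds to one or more external links.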
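  18. The apparatus according to any one of claims 13-17, wherein the obtaining submodule is further adapted to:
    obtain the anchor text corresponding to each of the one or more external links;
    cluster the obtained anchor texts corresponding to each of the one or more external links to generate multiple groups of anchor texts, wherein the anchor texts within each group are identical;
    take the anchor text of each of the multiple groups as the one or more different anchor texts corresponding to the one or more external links.
  19. The apparatus according to any one of claims 13-18, wherein the selecting submodule is further adapted to:
    determine the text length of each of the one or more anchor texts;
    select one anchor text, from the anchor texts whose text length is less than or equal to a specified length, to replace the original web page title as the web page title of the web page.
  20. The apparatus according to any one of claims 13-19, wherein the selecting submodule is further adapted to:
    determine the rank of each of the one or more different anchor texts;
    select the anchor text of a specified rank as the web page title of the web page.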
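  21. The apparatus according to any one of claims 13-20, wherein the selecting submodule is further adapted to:
    obtain the parameter values of each of the one or more different anchor texts;
    calculate the rank of each anchor text from the obtained parameter values of that anchor text.
  22. The apparatus according to any one of claims 13-21, wherein the selecting submodule is further adapted to:
    determine a weight for each parameter value of each anchor text;
    weight the parameter values of each anchor text to calculate the rank of each anchor text.
  23. The apparatus according to any one of claims 13-22, wherein the specified rank is the highest rank.
  24. The apparatus according to any one of claims 13-23, wherein the parameter values of each anchor text comprise at least one of the following:
    the total number of external links corresponding to the anchor text;
    the total number of web pages that contain external links corresponding to the anchor text and that share the main domain of the Uniform Resource Locator (URL) of the web page;
    the total number of web pages that contain external links corresponding to the anchor text and whose main domain differs from that of the URL of the web page;
    the PageRank of the web pages containing the external links corresponding to the anchor text;
    the number of times the external links corresponding to the anchor text have been clicked.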
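  25. A computer program comprising computer-readable code which, when run on a computing device, causes the computing device to perform the method for using anchor text as a web page title according to any one of claims 1-12.
  26. A computer-readable medium in which the computer program according to claim 25 is stored.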
PCT/CN2015/092752 2014-10-31 2015-10-23 Method and apparatus for using anchor text as a web page title WO2016066066A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201410602298.3A CN104331458B (zh) 2014-10-31 2014-10-31 Method and apparatus for using anchor text as a web page title
CN201410602298.3 2014-10-31
CN201410602297.9A CN104317931B (zh) 2014-10-31 2014-10-31 Method and apparatus for determining a web page title
CN201410602297.9 2014-10-31

Publications (1)

Publication Number Publication Date
WO2016066066A1 true WO2016066066A1 (zh) 2016-05-06

Family

ID=55856604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/092752 WO2016066066A1 (zh) 2014-10-31 2015-10-23 Method and apparatus for using anchor text as a web page title

Country Status (1)

Country Link
WO (1) WO2016066066A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020087515A1 (en) * 2000-11-03 2002-07-04 Swannack Christopher Martyn Data acquisition system
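CN101383782A (zh) * 2008-10-16 2009-03-11 深圳市迅雷网络技术有限公司 Method and system for obtaining a network resource identifier
CN102135967A (zh) * 2010-01-27 2011-07-27 华为技术有限公司 Method, apparatus and system for extracting web page keywords
CN104317931A (zh) * 2014-10-31 2015-01-28 北京奇虎科技有限公司 Method and apparatus for determining a web page title
CN104331458A (zh) * 2014-10-31 2015-02-04 北京奇虎科技有限公司 Method and apparatus for using anchor text as a web page title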

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15854962

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15854962

Country of ref document: EP

Kind code of ref document: A1