CN100476830C - Network resource searching method and system - Google Patents

Publication number
CN100476830C
Authority
CN
China
Prior art keywords
resource
page
index
resources
information
Prior art date
Application number
CNB2007101003098A
Other languages
Chinese (zh)
Other versions
CN101097578A (en)
Inventor
挺 刘
周连强
贾建坤
高立琦
Original Assignee
北京金山软件有限公司;北京金山数字娱乐科技有限公司;哈尔滨工业大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山软件有限公司;北京金山数字娱乐科技有限公司;哈尔滨工业大学
Priority to CNB2007101003098A
Publication of CN101097578A
Application granted
Publication of CN100476830C

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing
    • Y02D10/40 Reducing energy consumption at software or application level
    • Y02D10/45 Information retrieval in databases

Abstract

The present invention discloses a network resource retrieval method and system that address a shortcoming of existing web page information retrieval: users spend time and effort yet cannot obtain resources quickly and accurately. The method comprises: creating a web page index and, for the resources contained in the web pages, creating a resource index corresponding to each web page; receiving a search keyword entered by a user and querying the web page index for web pages that match the keyword; querying the resource index for the resources contained in the matching web pages; and displaying a search result that contains both the information of the matching web pages and the corresponding resource information. The invention displays a summary of the web page text on one side of the page (for example, the left) and the corresponding resource information (such as resource names and resource links) on the other side (for example, the right), so that users can see at a glance which downloadable resources each web page contains and can quickly obtain the resources they want by downloading them directly.

Description

A network resource retrieval method and system

Technical Field

The present invention relates to search engine technology, and in particular to a network resource retrieval method and system.

Background

With the rapid development of network technology, web pages carry more and more content, such as MP3 files, application software, and study courses. In many cases, therefore, users performing Web information retrieval care not only about the content of a page but also about the resource links the page contains, such as audio files and video files.

With existing web page retrieval services such as Baidu and Google, when a user enters keywords to search for a video resource, the results page returns links to web pages that contain the keywords together with a brief summary of each page. The user must click a chosen link and browse that page to determine whether it contains the required resource or other content of interest before downloading or obtaining it.

In this way a user can find the information or resources of interest by looking through web pages. However, the results page does not tell the user which downloadable resources each web page contains, so the user must spend time and effort on further screening and cannot quickly obtain the desired resource. Moreover, most resources on web pages are named with simple identifiers, so searching for web pages by keyword often fails to return accurate results.

For example, suppose the content of a web page contains the keyword "College Listening, Book 1" and the page offers resources such as "part1.mp3", "part2.mp3", and "part3.mp3"; the user needs to find this page and download the resources. When the user searches with "College Listening, Book 1" as the keyword, a series of related web pages may be returned, but not every one of them necessarily offers these resources for download, so the user has to browse the pages to screen them. If the user instead searches for "part1.mp3", the returned web pages often include, besides College Listening Book 1, other unrelated resources; for example, a downloadable clip of some movie may also be named part1.mp3, and the user again has to screen the results.

In short, although existing search sites offer direct download for specific kinds of resources, such as Baidu's MP3 search, they cannot meet users' needs for downloading resources of all kinds.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a network resource retrieval method and system, so as to solve the problem that existing web page information retrieval requires users to spend time and effort on further screening and does not let them obtain resources quickly and accurately.

To solve the above technical problem, according to the specific embodiments provided herein, the present invention discloses the following technical solutions:

A network resource retrieval method, comprising:

creating a web page index and, for the resources contained in the web pages, creating a resource index corresponding to each web page;

receiving a search keyword entered by a user, and querying the web page index for web pages that match the keyword;

querying the resource index for the resources contained in the matching web pages; and

displaying a search result that contains the information of the matching web pages and the corresponding resource information.

Preferably, the web page information is displayed on one side of the page and the corresponding resource information on the other side.

Preferably, all resources contained in a web page are sorted by their relevance to the keyword, and the information of the top-ranked resources is displayed.

The resource index is built using the URL of the web page on which the resources are located as the index key.

The method further comprises: sorting the retrieved web page information according to the user's focus, that is, with emphasis either on web page content or on resource content.

A network resource retrieval system, comprising:

an indexing unit, configured to create a web page index and, for the resources contained in the web pages, to create a resource index corresponding to each web page;

a retrieval unit, configured to query the web page index for web pages that match the search keyword, and to query the resource index for the resources contained in the matching web pages; and

a query agent unit, configured to receive the search keyword entered by the user and, through retrieval by the retrieval unit, to display to the user a search result that contains the information of the matching web pages and the corresponding resource information.

Preferably, the query agent unit displays the web page information on one side of the page and the corresponding resource information on the other side.

The system further comprises a sorting unit, configured to sort the retrieved web page information according to the user's focus, with emphasis either on web page content or on resource content.

The sorting unit further sorts all resources contained in a web page by their relevance to the keyword, and the information of the top-ranked resources is displayed through the query agent unit. The indexing unit builds the resource index using the URL of the web page on which the resources are located as the index key.

According to the specific embodiments provided herein, the present invention achieves the following technical effects:

First, by building a web page index and a resource index corresponding to the web pages, the web page information and the resource information that match the user's search keyword can be displayed together. Because the resource information is shown directly, users can see at a glance which downloadable resources each web page contains without opening the page on which the resources are located, and can quickly obtain the resources they want by downloading them directly from the results page.

Moreover, the display interface is novel: a summary of the web page text is displayed on one side of the page (for example, the left) and the corresponding resource information (such as resource names and resource links) on the other side (for example, the right), which departs from the display style of traditional search engines.

Second, the summary of each web page on the results page serves as an auxiliary description of the resources on that page; the user can judge from the summary of the page on which a resource is located whether that resource is the one needed. The summary thus gives the user a basis for judging the resource, which makes that judgment more accurate and therefore makes resource acquisition more accurate.

Third, when the search results are sorted, the user's focus (on web page content or on resource content) is taken into account, and the anchors of the resources in a web page are also used as an indicator in the weight calculation. Results are returned in a different order depending on the user's focus, which better meets the user's needs.

Brief Description of the Drawings

Figure 1 is a flowchart of the steps for quickly retrieving the resources contained in web pages according to an embodiment of the present invention;

Figure 2 is a schematic diagram of the relationship between the web page text index and the resource index in the embodiment;

Figure 3 is a rendering of the search results page in an embodiment of the present invention;

Figure 4 is a structural diagram of the system for quickly retrieving the resources contained in web pages according to an embodiment of the present invention.

Detailed Description

To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.

To address the problems that the results page does not tell users which downloadable resources each web page contains, and that simple resource names prevent users from getting accurate search results, an embodiment of the present invention provides a method for quickly retrieving the resources contained in web pages. By creating a web page index and a resource index keyed by the URL of the web page on which the resources are located, the resources in a web page can be retrieved together with the page during a search and displayed on the results page at the same time, so that users can download them directly and quickly obtain the resources they want.

Figure 1 is a flowchart of the steps for quickly retrieving the resources contained in web pages according to an embodiment of the present invention. The following description takes resource acquisition in Web search as an example.

Step 101: use a web crawler to fetch web pages from the Internet.

Step 102: build an index over the fetched web pages. Specifically: extract the body text of each web page and convert its character encoding according to the encoding of the page; then segment the text into words and remove stop words such as "的", "啊", and "哦" (particles and interjections); then build an inverted index over the remaining body keywords, using those keywords as the index terms. An example of building an inverted index follows:

The body keywords of text 1 are: aaa bbb ccc ddd;
The body keywords of text 2 are: bbb ddd yyy;
After the inverted index is built on the keywords:

aaa 1
bbb 1,2
ccc 1
ddd 1,2
yyy 2

To find which texts contain the keyword bbb, it suffices to fetch the text numbers that correspond to that keyword, namely 1 and 2.
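The inverted index example above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the function name `build_inverted_index` and the `STOP_WORDS` set are hypothetical, and the input is assumed to be already segmented into words, since the patent does not specify a segmentation tool.

```python
from collections import defaultdict

# Stop words named in Step 102; a real list would be longer (assumption).
STOP_WORDS = {"的", "啊", "哦"}

def build_inverted_index(docs):
    """Map each keyword to the sorted list of text numbers that contain it.

    `docs` maps a text number to its list of already-segmented words.
    """
    postings = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            if word not in STOP_WORDS:
                postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

docs = {
    1: ["aaa", "bbb", "ccc", "ddd"],
    2: ["bbb", "ddd", "yyy"],
}
index = build_inverted_index(docs)
print(index["bbb"])  # [1, 2]
```

Looking up a keyword is then a single dictionary access, which is the property the example in the text relies on.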

Step 103: analyze the resource links that the web pages may contain, and create a separate resource index. The index is created as follows:

First, obtain the links marked by <a href="link">name</a> tags in the web page, together with their anchor text. <a href="link">name</a> is the HTML construct that defines a link, where "name" is the text displayed on the web page and is called the anchor text. For example, if a personal website links to China Central Television (www.cctv.com) under the label "News Channel", a visitor who clicks "News Channel" on that site is taken to http://www.cctv.com, and "News Channel" is then the anchor text for the CCTV home page.

Second, determine whether each obtained link is a resource. If a link ends with a string such as ".mp3" or ".exe", it is clearly a downloadable resource; if a link contains characters such as "?" or "&", it may be a redirect link, and further confirmation is needed as to whether it corresponds to a resource. Any of the various methods well known to those skilled in the art may be used to decide whether a link is a resource link; they are not described in detail here.

Third, for the links judged to be resource links, create a separate resource index for each web page that contains resources.
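The first two sub-steps of Step 103 (collecting links with their anchor text, then deciding which links are resources) might look like the following sketch, which uses Python's standard `html.parser` module. The class name `LinkExtractor`, the function `classify_link`, the suffix list, and the example URLs are all assumptions made for illustration; the patent leaves the actual resource test to methods known in the art.

```python
from html.parser import HTMLParser

# Assumed suffix list; the text names only ".mp3" and ".exe" as examples.
RESOURCE_SUFFIXES = (".mp3", ".exe")

class LinkExtractor(HTMLParser):
    """Collect (href, anchor_text) pairs from <a href="...">text</a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # pieces of anchor text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def classify_link(href):
    """Return 'resource', 'unconfirmed', or 'page' for a link."""
    if href.lower().endswith(RESOURCE_SUFFIXES):
        return "resource"
    if "?" in href or "&" in href:
        return "unconfirmed"  # possibly a redirect; needs further checking
    return "page"

html = (
    '<a href="http://example.com/part1.mp3">Part 1</a> '
    '<a href="/go?id=7">Part 2</a> '
    '<a href="/about.html">About</a>'
)
parser = LinkExtractor()
parser.feed(html)
for href, anchor_text in parser.links:
    print(anchor_text, href, classify_link(href))
```

Links classified as "unconfirmed" correspond to the redirect case in the text; how they are resolved is outside the scope of this sketch.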

In this step the URL of the web page on which the resources are located serves as the index key. Figure 2 is a schematic diagram of the relationship between the web page text index and the resource index in this embodiment. In the figure, a "feature item" is an index keyword of the web page text index; each "feature item" corresponds to a series of web page URLs, and each URL of a page that contains resources in turn corresponds to the full set of resources contained in that page.

Of course, other index keys may be chosen when building the resource index, for example the position number of each web page in the web page index.
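The two-level relationship described for Figure 2 can be pictured as two dictionaries, one keyed by feature item and one keyed by page URL. The sketch below is an assumption-laden illustration: `build_resource_index`, the sample URLs, and the `(anchor_text, link)` tuple layout are hypothetical and not taken from the patent.

```python
def build_resource_index(pages):
    """Map each page URL to the list of (anchor_text, link) resources on it.

    `pages` maps a page URL to the resource links already identified on that
    page. Pages without resources get no entry in the resource index.
    """
    resource_index = {}
    for page_url, resources in pages.items():
        if resources:
            resource_index[page_url] = list(resources)
    return resource_index

pages = {
    "http://example.com/book1": [
        ("Part 1", "http://example.com/part1.mp3"),
        ("Part 2", "http://example.com/part2.mp3"),
    ],
    "http://example.com/forum": [],
}

# Web page index: feature item -> URLs of the pages whose text contains it.
page_index = {
    "listening": ["http://example.com/book1", "http://example.com/forum"],
}
resource_index = build_resource_index(pages)

# Following the chain: feature item -> page URL -> resources on that page.
for url in page_index["listening"]:
    print(url, resource_index.get(url, []))
```

Keying by URL keeps the resource index independent of how the web page index numbers its entries; using a position number instead, as the text allows, would only change the dictionary key.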

Step 104: the user enters query keywords in the search box and triggers a query event.

Step 105: on receiving the query event, the server obtains the query keywords entered by the user.

Step 106: segment the obtained query keywords into words. The purpose of segmentation is to obtain the most commonly used word roots in the keywords. For example, for the keyword "中国政府推出知识产权新举措" ("the Chinese government introduces new intellectual property measures"), the segmentation result may be "中国" (China), "政府" (government), "知识产权" (intellectual property), "举措" (measures), or it may be "中国政府", "知识产权举措", and so on. Segmentation effectively excludes pairings that are not common combinations, such as "国政", and so reduces the number of roots to search.
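The patent does not say which segmentation algorithm is used. As one possible illustration, the sketch below applies forward maximum matching, a common dictionary-based technique that is swapped in here as an assumption. The function `segment`, the `DICTIONARY` word list, and the `max_len` parameter are all hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.

    A single character that matches no dictionary word is emitted on its own.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical word list; a real segmenter would load a full lexicon.
DICTIONARY = {"中国", "政府", "推出", "知识产权", "举措"}

terms = segment("中国政府推出知识产权新举措", DICTIONARY)
print(terms)  # ['中国', '政府', '推出', '知识产权', '新', '举措']
```

Because matching always starts at a word boundary found by the dictionary, the cross-boundary pairing "国政" never appears in the output, which is the effect the text describes.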

Step 107: query the web page index to obtain the web pages that match the keywords. For example, in the web page index of Figure 2, which uses "feature items" as index terms, find the "feature item" that equals the keyword; all web pages corresponding to that "feature item" are the web pages that match the keyword.

Step 108: look up the resource index corresponding to the URL of each web page, and find in it all the resources the page contains. Unlike traditional information retrieval, the present invention retrieves the resource information contained in the web pages together with the web page information that matches the user's keywords.
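Steps 107 and 108 amount to a two-stage lookup: first the web page index, then the resource index for each matching URL. The sketch below illustrates this under assumptions: the function `search`, the sample data, and the choice to require every query term (intersection) are not specified by the patent.

```python
def search(terms, page_index, resource_index):
    """Return [(page_url, resources)] for pages that contain every query term.

    Requiring all terms is an assumption; the patent does not say how the
    results for several terms are combined.
    """
    if not terms:
        return []
    matching = set(page_index.get(terms[0], []))
    for term in terms[1:]:
        matching &= set(page_index.get(term, []))
    # Stage 2: for each matching page, fetch its resources by URL.
    return [(url, resource_index.get(url, [])) for url in sorted(matching)]

page_index = {
    "college": ["http://example.com/book1"],
    "listening": ["http://example.com/book1", "http://example.com/forum"],
}
resource_index = {
    "http://example.com/book1": [("Part 1", "http://example.com/part1.mp3")],
}

results = search(["college", "listening"], page_index, resource_index)
print(results)
```

A page with no entry in the resource index is still returned, with an empty resource list, so the page side of the result is never lost.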

Step 109: display the returned search results, which include the web page information and the resource information contained in the web pages. The retrieved web page information is displayed on one side of the results page and the corresponding resource information on the other side.

Figure 3 shows a rendering of the search results page in an embodiment of the present invention. In this example, the left side of the page shows information such as the body-text summary and the link of each retrieved web page, and the right side shows information such as the names and links of the resources of that web page.

In a preferred embodiment of the present invention, the search results are sorted before they are displayed, and are then displayed in sorted order. For sorting web page information, the sorting rules are divided into sorting that emphasizes web page content and sorting that emphasizes resource content.

Typically, the server sorts the retrieved web pages according to some strategy, for example by scoring the pages and returning them in order of score. Scoring refers to several indicators, such as the frequency or the discriminating power of the keywords; each page is scored by its relevance to each indicator, and the indicator scores are finally combined in a weighted sum to give the page's final score. In this embodiment, because a resource index has been introduced, the frequency with which the keywords occur in the anchors of a page's resources is also used as a ranking indicator when pages are scored. If the user's focus is on resources, the weight of this indicator is raised; if the user's focus is on web pages, the weight of this indicator is lowered and the weights of the other indicators are raised.

The ranking of web pages therefore differs with the focus. Through a user option, if the user chooses to emphasize web page content when searching, the web page content retrieved from the web page index is given a high weight; if the user chooses to emphasize resource content, the anchor text of the resources retrieved from the resource index is given a high weight.
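The focus-dependent weighted sum can be illustrated with two indicators: keyword frequency in the page text and keyword frequency in the resource anchors. The weight values, the field names `text_freq` and `anchor_freq`, and the functions `page_score` and `rank_pages` below are hypothetical; the patent gives no concrete weights or formula beyond a weighted sum.

```python
# Hypothetical weights (text indicator, anchor indicator) per user focus.
WEIGHTS = {
    "page": (0.8, 0.2),      # user emphasizes web page content
    "resource": (0.3, 0.7),  # user emphasizes resource content
}

def page_score(text_freq, anchor_freq, focus):
    """Weighted sum of the two indicators under the chosen focus."""
    w_text, w_anchor = WEIGHTS[focus]
    return w_text * text_freq + w_anchor * anchor_freq

def rank_pages(pages, focus):
    """Sort pages from highest to lowest score."""
    return sorted(
        pages,
        key=lambda p: page_score(p["text_freq"], p["anchor_freq"], focus),
        reverse=True,
    )

pages = [
    {"url": "http://example.com/article", "text_freq": 5, "anchor_freq": 0},
    {"url": "http://example.com/downloads", "text_freq": 1, "anchor_freq": 4},
]

print([p["url"] for p in rank_pages(pages, "page")])
print([p["url"] for p in rank_pages(pages, "resource")])
```

With these sample numbers the article page ranks first under the page focus and the downloads page ranks first under the resource focus, which shows how the same result set is reordered by the user option.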

When the resource information of a web page is displayed, display space is limited, so if a page has many resources only some of them are usually shown. There are several ways to choose them, for example taking the first few in the order in which the resources appear on the page, or choosing by resource name. In this embodiment, to give users a better experience and let them find the resources they want at a glance, the resources are sorted before the ones to display are chosen: they are ranked by their relevance to the search keywords, and the highly relevant ones are shown on the page.
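Selecting which resources to show can be sketched as a sort followed by a cut-off. The relevance measure used here (how often the query terms occur in the resource name) is an assumption, as are the function name `top_resources` and the `limit` parameter; the patent only says that resources are ranked by relevance to the search keywords.

```python
def top_resources(resources, terms, limit=3):
    """Return the `limit` resources most relevant to the query terms.

    Each resource is a (name, link) pair. Relevance is the number of times
    the query terms occur in the name (an assumed measure). Python's sort is
    stable, so ties keep the order in which resources appear on the page.
    """
    def relevance(resource):
        name, _link = resource
        lowered = name.lower()
        return sum(lowered.count(term.lower()) for term in terms)

    return sorted(resources, key=relevance, reverse=True)[:limit]

resources = [
    ("intro.mp3", "http://example.com/intro.mp3"),
    ("listening part1.mp3", "http://example.com/part1.mp3"),
    ("listening listening extra.mp3", "http://example.com/extra.mp3"),
    ("notes.exe", "http://example.com/notes.exe"),
]

shown = top_resources(resources, ["listening"], limit=2)
print([name for name, _link in shown])
```

Falling back to page order on ties means that the simpler strategy mentioned in the text, taking the first few resources as they appear, is what happens when no resource matches the keywords at all.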

The above embodiment provides a novel interface for presenting search results. The resource information contained in the web pages is shown directly, so users can see at a glance which downloadable resources each page contains and can download them without opening the page on which they are located. In addition, users can judge from the body-text summary of that page whether a resource is the one they need, which further improves the accuracy of resource acquisition.

Moreover, because the corresponding resources are shown on the right while the user searches for web pages, the user may come across a needed resource by chance and then download it, which stimulates the user's latent demand. If users find the site novel and useful they will visit it more often, which increases the stickiness of the site.

An embodiment of the present invention further provides a resource acquisition system. Again taking resource acquisition in Web search as an example, Figure 4 is a structural diagram of the system for quickly retrieving the resources contained in web pages according to an embodiment of the present invention. The system mainly comprises an indexing unit 401, a retrieval unit 402, and a query agent unit 403.

The indexing unit 401 builds the web page index and the resource index. To build the web page index, the indexing unit 401 first extracts the body text of each web page and converts its character encoding according to the encoding of the page, then segments the text into words, and builds an inverted index of web pages using the segmented body keywords as index terms.

For each web page that contains resource links, the indexing unit 401 also builds a separate resource index, using the URL of the web page on which the resources are located as the index term (see Figure 2); looking up the URL of a web page finds all the resources the page contains. The indexing unit 401 first analyzes the web page to obtain its links and anchor text, then determines whether each link is a resource link, and, for the links that are, builds a resource index for all the resources present in the web page.

The retrieval unit 402 uses the web page index and the resource index built by the indexing unit 401 to query the web page information and resource information that match the search keywords. First, the retrieval unit 402 segments the search keywords into words, excluding pairings that are not common combinations; then it queries the web page index with the search keywords to obtain the matching web pages; finally it uses each web page's URL to find all the resources the page contains. In this way the retrieval unit 402 retrieves the resource information contained in the web pages together with the web page information.

The query agent unit 403 receives the search keywords entered by the user and passes them to the retrieval unit 402 for processing; when the retrieval unit 402 returns the search results, the query agent unit 403 displays them to the user. The present invention proposes a novel way of presenting results: one side of the results page (for example, the left) shows the retrieved web page information, such as the body-text summary and link of each web page, and the other side (for example, the right) shows the corresponding resource information, such as resource names and resource links.

Preferably, a user option is also provided. Depending on whether the user chooses to emphasize web page content or resource content, the retrieval unit 402 first sorts the retrieved web page information and resource information separately and then returns them to the query agent unit 403. When web pages are sorted, the frequency with which the keywords occur in the anchors of the resources is also used as a ranking indicator: if the user's focus is on resources, the weight of this indicator is raised; if the user's focus is on web pages, the weight of this indicator is lowered and the weights of the other indicators are raised. When resources are sorted, they are ranked by their relevance to the search keywords, with highly relevant resources placed first. If a web page contains many resources, the query agent unit 403 displays only part of the resource information.

The overall processing flow of the system is as follows. First, a web crawler 404 fetches web pages from the Internet and stores them in a database 405. The indexing unit 401 then extracts the body text of the web pages from the database 405 and creates the web page index and the resource index. When the query agent unit 403 receives search keywords entered by the user, the retrieval unit 402 performs the information retrieval: it queries the web page index and the resource index, sorts the web page information that matches the search keywords and the corresponding resource information, and returns them to the query agent unit 403. The query agent unit 403 displays information such as the body-text summary and link of each web page on the left side of the page, and information such as the corresponding resource names and resource links on the right. Users can therefore download the resources they need directly from the search results page, which improves the speed and accuracy of resource acquisition.
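The division of work among units 401, 402, and 403 can be sketched as three small classes. This is a structural illustration only, under several assumptions: the class and method names are hypothetical, the crawler 404 and database 405 are replaced by direct calls to `add_page`, queries are single terms, and sorting is omitted.

```python
class IndexUnit:
    """Builds the web page index and the per-page resource index (unit 401)."""

    def __init__(self):
        self.page_index = {}      # term -> set of page URLs
        self.resource_index = {}  # page URL -> list of (name, link)

    def add_page(self, url, terms, resources):
        for term in terms:
            self.page_index.setdefault(term, set()).add(url)
        if resources:
            self.resource_index[url] = list(resources)


class RetrievalUnit:
    """Looks up matching pages, then the resources of each page (unit 402)."""

    def __init__(self, index_unit):
        self.index_unit = index_unit

    def retrieve(self, term):
        urls = sorted(self.index_unit.page_index.get(term, ()))
        return [(url, self.index_unit.resource_index.get(url, [])) for url in urls]


class QueryAgent:
    """Accepts the user's keyword and formats one row per page (unit 403)."""

    def __init__(self, retrieval_unit):
        self.retrieval_unit = retrieval_unit

    def handle(self, keyword):
        rows = []
        for url, resources in self.retrieval_unit.retrieve(keyword):
            names = ", ".join(name for name, _link in resources)
            rows.append(f"{url} | {names}")  # page side | resource side
        return rows


index_unit = IndexUnit()
index_unit.add_page(
    "http://example.com/book1",
    ["college", "listening"],
    [("part1.mp3", "http://example.com/part1.mp3")],
)
index_unit.add_page("http://example.com/forum", ["listening"], [])

agent = QueryAgent(RetrievalUnit(index_unit))
for row in agent.handle("listening"):
    print(row)
```

Keeping the query agent separate from the retrieval unit mirrors the text: the agent only receives keywords and presents results, while all index access stays in the retrieval unit.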

For the parts of the system shown in Figure 4 that are not described in detail, refer to the corresponding parts of the method shown in Figure 1; for brevity they are not repeated here.

The network resource retrieval method and system provided by the present invention have been described in detail above. The description is intended only to help in understanding the method of the present invention and its core idea; at the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1、一种网络资源检索方法,其特征在于,包括: 创建网页索引,并对网页中包含的资源,创建对应每个网页的资源索引; 接收用户输入的检索关键词,并在网页索引中查询符合所述关键词的网页; 在资源索引中查询所述符合关键词的网页包含的资源; 将包含所述符合关键词的网页信息和相应资源信息的检索结果显示。 1. A method for retrieving a network resource, characterized by, comprising: creating an index page, and the page contains the resource, creating a resource index corresponding to each page; receiving a search keyword input by the user, and query the index page pages that match the keyword; a resource in the resources to meet the index pages containing the query keyword; a search result including the web page information coincides with the keyword and the corresponding resource information display.
2、 根据权利要求1所述的方法,其特征在于:在页面的一侧显示网页信息,另一侧显示相应的资源信息。 2. The method according to claim 1, wherein: the web page information displayed on one side of the page, the other side of the display corresponding resource information.
3、 根据权利要求1所述的方法,其特征在于:按照资源与所述关键词的相关性高低,将网页包含的所有资源排序,并将排名靠前的部分资源信息显示。 3. The method according to claim 1, characterized in that: according to the keyword of resources and the level of correlation, all the pages containing resource ordering and ranking resource information display portion.
4、 根据权利要求1所述的方法,其特征在于:以资源所在网页的URL为索引建立资源索引。 4. The method of claim 1, wherein: the URL of the webpage where the resources to establish a resource index to index.
5、 根据权利要求1所述的方法,其特征在于,还包括:根据用户的不同侧重点,按照侧重网页内容或者侧重资源内容,对检索到的网页信息进行排序。 5. The method according to claim 1, characterized in that, further comprising: Depending on the user's focus, or focus on web content in accordance with the focus content resources, the retrieved web page information sorted.
6. A network resource retrieval system, characterized by comprising: an indexing unit, configured to create a web page index and, for the resources contained in web pages, to create a resource index corresponding to each web page; a retrieval unit, configured to query the web page index for web pages matching a search keyword and to query the resource index for the resources contained in the web pages matching the keyword; and a query agent unit, configured to receive the search keyword input by a user and, through retrieval by the retrieval unit, to display to the user a search result that includes the information of the web pages matching the keyword and the corresponding resource information.
7. The system according to claim 6, characterized in that the query agent unit displays the web page information on one side of the page and the corresponding resource information on the other side.
8. The system according to claim 6, characterized by further comprising: a sorting unit, configured to sort the retrieved web page information with emphasis on either web page content or resource content, according to the user's focus.
9. The system according to claim 8, characterized in that the sorting unit further sorts all resources contained in a web page by their relevance to the keyword, and the information of the top-ranked portion of the resources is displayed through the query agent unit.
10. The system according to claim 6, characterized in that the indexing unit builds the resource index using the URL of the web page where a resource is located as the index.
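
The two-index lookup recited in claims 1, 3 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names (`ResourceSearch`, `Resource`, `tokenize`), the whitespace-and-punctuation tokenizer, and the keyword-count relevance score are all assumptions made for illustration. The sketch keeps a page index (keyword to page URLs) and a resource index keyed by page URL, queries the first and then the second, and returns each matching page with its top-ranked resources.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Resource:
    name: str  # hypothetical display name, e.g. a file name
    link: str  # hypothetical download or playback address


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative only)."""
    return re.findall(r"\w+", text.lower())


class ResourceSearch:
    def __init__(self):
        self.page_index = defaultdict(set)  # keyword -> URLs of matching pages
        self.resource_index = {}            # page URL -> resources on that page
        self.titles = {}                    # page URL -> page title

    def add_page(self, url, title, text, resources):
        """Index a page and, keyed by the page URL, the resources it contains."""
        self.titles[url] = title
        for word in tokenize(f"{title} {text}"):
            self.page_index[word].add(url)
        self.resource_index[url] = list(resources)

    def search(self, keyword, top_n=3):
        """Find matching pages, then look up and rank each page's resources."""
        keyword = keyword.lower()
        results = []
        for url in sorted(self.page_index.get(keyword, ())):
            # Assumed relevance score: occurrences of the keyword in the name.
            ranked = sorted(
                self.resource_index.get(url, []),
                key=lambda r: tokenize(r.name).count(keyword),
                reverse=True,
            )
            results.append(
                {"url": url, "title": self.titles[url], "resources": ranked[:top_n]}
            )
        return results
```

Each returned entry pairs the page information with its resource list, so a front end could render the page information on one side and the resources on the other, as in claims 2 and 7.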
CNB2007101003098A 2007-06-07 2007-06-07 Network resource searching method and system CN100476830C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101003098A CN100476830C (en) 2007-06-07 2007-06-07 Network resource searching method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101003098A CN100476830C (en) 2007-06-07 2007-06-07 Network resource searching method and system

Publications (2)

Publication Number Publication Date
CN101097578A CN101097578A (en) 2008-01-02
CN100476830C true CN100476830C (en) 2009-04-08

Family

ID=39011411

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101003098A CN100476830C (en) 2007-06-07 2007-06-07 Network resource searching method and system

Country Status (1)

Country Link
CN (1) CN100476830C (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546309B (en) 2008-03-26 2012-07-04 国际商业机器公司 Method and equipment for constructing indexes to resource content in computer network
CN101566984B (en) 2008-07-11 2011-02-09 博采林电子科技(深圳)有限公司 Search engine used in personal hand-held equipment and resource search method
CN101661490B (en) 2008-08-28 2013-01-02 国际商业机器公司 Search engine, client thereof and method for searching page
CN101398844B (en) 2008-10-28 2012-01-25 华为终端有限公司 Resource file searching method and mobile terminal
CN103942268B (en) * 2010-05-31 2018-11-13 百度在线网络技术(北京)有限公司 The method of application of a combination of search, device and application interface
CN102063454A (en) * 2010-05-31 2011-05-18 百度在线网络技术(北京)有限公司 Method and equipment combining search and application
CN102314456A (en) * 2010-06-30 2012-01-11 百度在线网络技术(北京)有限公司 Web page move search method and system
US8412771B2 (en) * 2010-10-21 2013-04-02 Yahoo! Inc. Matching items of user-generated content to entities
CN102682003B (en) * 2011-03-10 2017-02-08 北京音之邦文化科技有限公司 A method for determining the location of a particular sort of link resource, apparatus and equipment
CN103248641A (en) * 2012-02-07 2013-08-14 腾讯科技(深圳)有限公司 Network download method, device and system
CN103514221B (en) * 2012-06-28 2016-12-28 百度在线网络技术(北京)有限公司 One kind of site resources management method and apparatus for web
CN104820686B (en) * 2012-06-28 2019-06-21 北京奇虎科技有限公司 A kind of network search method and network searching system
CN102799663A (en) * 2012-07-13 2012-11-28 深圳市同洲电子股份有限公司 Input method and input method system
CN104123297B (en) * 2013-04-26 2018-04-06 宏碁股份有限公司 SUMMARY file search method and the distal end of the electronic device
CN103294507A (en) * 2013-05-09 2013-09-11 优视科技有限公司 Method and device for providing information of downloading resources
US20140379747A1 (en) * 2013-06-19 2014-12-25 Microsoft Corporation Identifying relevant apps in response to queries
CN103455567A (en) * 2013-08-18 2013-12-18 苏州量跃信息科技有限公司 Method and system for loading application interfaces based on search index entries
CN103605758B (en) * 2013-11-22 2017-09-08 中国科学院深圳先进技术研究院 File search method for a mobile terminal apparatus and
CN103955529B (en) * 2014-05-12 2018-05-01 中国科学院计算机网络信息中心 An Internet search for information presentation method of polymerization
CN104199862B (en) * 2014-08-15 2017-10-20 北京奇虎科技有限公司 Content-based provider of customized search method, server and system
CN104794165B (en) * 2015-03-26 2018-08-10 百度在线网络技术(北京)有限公司 Species page show method, apparatus and system
CN105608071A (en) * 2015-12-21 2016-05-25 北京奇虎科技有限公司 Generation method and device for determining machine learning algorithm of head word
CN105550335A (en) * 2015-12-22 2016-05-04 北京奇虎科技有限公司 Method and device for providing search abstract embedded with resource downloading information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6691104B1 (en) 2000-01-12 2004-02-10 International Business Machines Corporation System and method for personalizing and applying a post processing tool system
CN1808426A (en) 2005-01-17 2006-07-26 马岩 Universal file search system and method
CN1809827A (en) 2000-11-21 2006-07-26 汤姆森许可公司 System and process for network site fragmented search

Also Published As

Publication number Publication date
CN101097578A (en) 2008-01-02

Similar Documents

Publication Publication Date Title
Jansen et al. Determining the user intent of web search engine queries
US6490579B1 (en) Search engine system and method utilizing context of heterogeneous information resources
CN100403305C (en) System for generating search results including searching by subdomain hints and providing sponsored results by subdomain
CA2248911C (en) System and method for locating resources on a network using resource evaluations derived from electronic messages
US9164987B2 (en) Translating a search query into multiple languages
US8239377B2 (en) Systems and methods for enhancing search query results
US8341147B2 (en) Blending mobile search results
US6311194B1 (en) System and method for creating a semantic web and its applications in browsing, searching, profiling, personalization and advertising
US7062488B1 (en) Task/domain segmentation in applying feedback to command control
CN101124609B (en) Search systems and methods using in-line contextual queries
KR100313462B1 (en) A method of displaying searched information in distance order in web search engine
US6725275B2 (en) Streaming media search and continuous playback of multiple media resources located on a network
US20020099685A1 (en) Document retrieval system; method of document retrieval; and search server
US8458207B2 (en) Using anchor text to provide context
JP5175339B2 (en) Method and system for providing appropriate information to a user of the apparatus in the local network
US20020091835A1 (en) System and method for internet content collaboration
US8510453B2 (en) Framework for correlating content on a local network with information on an external network
US7676462B2 (en) Method, apparatus, and program for refining search criteria through focusing word definition
US20080250010A1 (en) Method and system for determining and pre-processing potential user queries related to content in a network
CN103136329B (en) More integrated query revised model
KR101554293B1 (en) Cross-language information retrieval
US6970863B2 (en) Front-end weight factor search criteria
US9940398B1 (en) Customization of search results for search queries received from third party sites
US20020069194A1 (en) Client based online content meta search
US6968332B1 (en) Facility for highlighting documents accessed through search or browsing

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model