CN101169780A - Semantic ontology retrieval system and method - Google Patents

Semantic ontology retrieval system and method

Publication number
CN101169780A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
ontology
text
index
semantic
file
Prior art date
Application number
CN 200610149803
Other languages
Chinese (zh)
Inventor
伟 王
琦 舒
琦 方
钟杰萍
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date

Abstract

An embodiment of the present invention discloses a retrieval system based on semantic ontology. The system comprises a semantic ontology index database and a semantic ontology index processing unit. A semantic ontology search processing unit obtains a text hit file list and matches it against the semantic ontology index in the semantic ontology index database to obtain a document semantic classification table. The retrieval system can thus recognize the semantic information of the files to be retrieved and present search results classified by semantics. An embodiment of the present invention also discloses a retrieval method based on semantic ontology. The method first builds a semantic ontology index for files that already have a text index; when a user searches, it performs semantic ontology index matching on the text matching results, so that the final output presents a semantic classification on top of the traditional text matching results, which makes querying more convenient for the user.

Description

A retrieval system and method based on semantic ontology

[0001] Technical Field

[0002] The present invention relates to information retrieval technology, and in particular to a retrieval system and method based on semantic ontology.

[0003] Background

[0004] With the rapid development of retrieval technology, text-based information retrieval has gradually matured, forming a complete set of approaches and well-developed algorithms, and it is widely used in search engines such as Google, AltaVista, Lycos, and Yahoo.

[0005] FIG. 1 is a structural block diagram of an existing text search engine. As shown in FIG. 1, the existing text search engine comprises: a spider control module 101, a uniform resource locator (URL) database 102, a web spider 103, a URL extraction module 104, a web page database 105, a link information extraction module 106, a text indexing module 107, a link database 108, an index database 109, a page ranking module 110, and a query server 111.

[0006] The web spider 103 crawls web pages from the Internet and sends them to the web page database 105. The URL extraction module 104 extracts URLs from the pages crawled by the web spider 103 and sends them to the URL database 102. The spider control module 101 obtains page URLs from the URL database 102 and directs the web spider 103 to crawl further pages; these steps repeat until all pages have been crawled.
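
The crawl loop in paragraph [0006] can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the engine's actual code: the in-memory `FAKE_WEB` dictionary stands in for the Internet, and the names `crawl`, `url_db`, and `page_db` are hypothetical stand-ins for the URL database 102 and the web page database 105.

```python
from collections import deque

# Hypothetical stand-in for the Internet: URL -> (page text, outgoing links).
FAKE_WEB = {
    "a.html": ("page A", ["b.html", "c.html"]),
    "b.html": ("page B", ["a.html"]),
    "c.html": ("page C", []),
}

def crawl(seed_url):
    """Crawl every page reachable from seed_url, breadth first."""
    url_db = deque([seed_url])   # plays the role of URL database 102
    page_db = {}                 # plays the role of web page database 105
    while url_db:                # spider control: pick the next URL
        url = url_db.popleft()
        if url in page_db or url not in FAKE_WEB:
            continue             # already crawled, or not reachable
        text, links = FAKE_WEB[url]   # the web spider fetches the page
        page_db[url] = text
        url_db.extend(links)     # URL extraction feeds the URL database
    return page_db

pages = crawl("a.html")
```

The check against `page_db` is what lets the loop terminate even though pages link back to each other.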

[0007] The system obtains text information from the web page database 105 and sends it to the text indexing module 107, which builds an index and sends it to the index database 109. Meanwhile, the link information extraction module 106 obtains link information from the web page database 105 and sends it to the link database 108. The link information in the link database 108 gives the page ranking module 110 the basis for ranking pages.

[0008] When a user submits a query request through the query server 111, the query server 111 looks up pages related to the request in the index database 109. Meanwhile, the page ranking module 110 combines the query request with the link information in the link database 108 to evaluate the relevance of the search results. The query server 111 then sorts the results by relevance and assembles the final page returned to the user.

[0009] Although existing text retrieval can find files that contain the user's text query, it cannot recognize the content and meaning of the files found. This is because existing text retrieval is based on text string matching. The problem with such retrieval is that when different words can express the same meaning, or one word has different meanings in different contexts, precision and recall are limited, and the results fall far short of the user's needs. For example, when the user's search keyword is "天堂" ("paradise"), the system cannot tell whether a matching file is about the "paradise" game or "paradise" music. The proposal of the Semantic Web offers an opportunity to solve these problems.

[0010] The Semantic Web is a network of web pages whose content can be automatically processed and recognized by computers. Building on the existing Internet, it extends web pages with data that computers can recognize and adds documents intended specifically for computer use; that is, web pages are annotated in an ontology language to make their semantics explicit, so that page information can be understood by people and also automatically processed and recognized by computers. Semantically annotated pages generally use Extensible Markup Language (XML) or Hypertext Markup Language (HTML) to annotate the data and the Resource Description Framework (RDF) as the data description model, combined with a semantic ontology, so that the annotated data has explicit semantics. Ontology is a concept that originates in philosophy, where it refers to the theory of existence and its nature and laws; it was later adopted by the field of artificial intelligence, where it specifically denotes an explicit specification of a conceptualization. An ontology can express the concepts of a domain and their interrelations explicitly and formally, thereby making the semantics of terms explicit, and so plays an important role in semantic querying. The semantic ontology referred to here defines the basic terms that make up the concepts of a subject domain and the relations among them, and specifies the rules for combining these terms and relations to define extensions of the vocabulary.

[0011] The purpose of semantic retrieval is to enhance and improve traditional search results using data obtained from the Semantic Web. FIG. 2 is a structural block diagram of an existing semantic search system. As shown in FIG. 2, the existing semantic search system comprises: a query interface 201, a query preprocessing module 202, a semantic ontology reasoning engine 203, an annotation ontology library 204, a traditional search module 205, and a result return interface 206.

[0012] The query interface 201 obtains the user's query information and sends it to the query preprocessing module 202.

[0013] The query preprocessing module 202 analyzes the user's query information, splits it into query keywords using word segmentation, and sends the keywords to the semantic ontology reasoning engine 203.

[0014] Based on the ontology concept vocabulary and the relations between concepts defined in the annotation ontology library 204, the semantic ontology reasoning engine 203 infers the ontology concept terms that correspond to the query keywords and returns them to the query preprocessing module 202.

[0015] The query preprocessing module 202 sends the ontology concept terms returned by the semantic ontology reasoning engine 203 to the traditional search module 205 and specifies a search by semantics. Searching by semantics here means that, where web pages have been semantically annotated, string matching is performed against the semantic concepts with which the pages are annotated rather than directly against the content of the pages themselves.

[0016] The traditional search module 205 performs the semantic search and sends the search results to the result return interface 206, which then returns them to the user.

[0017] As can be seen, the semantic search system above matches the user's query keywords against the semantic concept terms with which web pages are annotated.

[0018] In summary, although existing text retrieval can find files that contain the query keywords, it cannot recognize the semantic information of the files found. Existing semantic retrieval, in turn, no longer performs keyword retrieval, so the files found include too many results that do not match the user's query information, and the efficiency of matching user query keywords against semantic concept terms is also unsatisfactory. The search accuracy of existing retrieval technology is therefore low.

[0019] Summary of the Invention

[0020] In view of this, a main object of embodiments of the present invention is to provide a retrieval system based on semantic ontology so as to improve search accuracy.

[0021] Another object of embodiments of the present invention is to provide a retrieval method based on semantic ontology so as to improve search accuracy.

[0022] To achieve the above objects, the technical solution of the present invention is implemented as follows:

[0023] An embodiment of the present invention discloses a retrieval system based on semantic ontology, the system comprising:

[0024] a semantic ontology index database, configured to store a semantic ontology index;

[0025] a semantic ontology search processing unit, configured to obtain a text hit file list and match the text hit file list against the semantic ontology index in the semantic ontology index database to obtain a document semantic classification table.

[0026] An embodiment of the present invention also discloses a retrieval method based on semantic ontology, the method comprising the following steps:

[0027] A. obtaining files for which a text index has been built, and building a semantic ontology index for the obtained files;

[0028] B. obtaining a text hit file list, and performing semantic ontology index matching on the text hit file list to obtain a document semantic classification table.

[0029] The retrieval system and method based on semantic ontology provided by embodiments of the present invention therefore have the following advantage: a semantic ontology index is first built for files that already have a text index; when a user searches, text index matching is performed on the text query information entered by the user to obtain a text hit file list, and semantic ontology index matching is then performed on the text hit file list to obtain a document semantic classification table. The text retrieval results thus carry semantic classification information, which improves search accuracy.

[0030] Brief Description of the Drawings

[0031] FIG. 1 is a structural block diagram of an existing text search engine;

[0032] FIG. 2 is a structural block diagram of an existing semantic search system;

[0033] FIG. 3 is a structural block diagram of a retrieval system based on semantic ontology according to an embodiment of the present invention;

[0034] FIG. 4 is a flowchart of the semantic ontology index processing unit building the semantic ontology index in an embodiment of the present invention;

[0035] FIG. 5 is a flowchart of the retrieval system of the embodiment shown in FIG. 3 performing a search for a user;

[0036] FIG. 6 is a schematic diagram of two resource descriptions defined in an embodiment of the present invention;

[0037] FIG. 7 is a schematic diagram of the result inferred from FIG. 6;

[0038] FIG. 8 is a relation diagram that the annotation ontology library in an embodiment of the present invention establishes for the semantic ontology vocabulary of the embodiment;

[0039] FIG. 9 is the relation diagram of the semantic ontology vocabulary of FIG. 8 after reasoning.

[0040] Detailed Description

[0041] To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0042] FIG. 3 is a structural block diagram of a retrieval system based on semantic ontology according to an embodiment of the present invention. As shown in FIG. 3, the system comprises: a search interface module 301, a document semantic classification rule engine 302, a search processing module 303, a semantic ontology reasoning engine 304, an annotation ontology library 305, an index database 306, an index processing module 307, a file database 308, and a network file crawling module 309. The search processing module 303 comprises a text search processing unit 310, a semantic ontology search processing unit 311, and a sorting processing unit 312; the index database 306 comprises a text index 315 and a semantic ontology index 316; the index processing module 307 comprises a text index processing unit 313 and a semantic ontology index processing unit 314.

[0043] The network file crawling module 309 is mainly responsible for crawling web pages from the Internet and saving the crawled pages in the file database 308. The network file crawling module 309 generally uses a web crawling program, such as a "web robot" or "web spider", to traverse the web space, scan websites within a certain Internet Protocol (IP) address range, and follow links on the network from one page to another and from one website to another, collecting network files.

[0044] The file database 308 stores the files available for user retrieval, including audio files, video files, and text files. These files may be network files or non-network files. Every file in the file database 308 has a unique file identifier (DocID).

[0045] The index processing module 307 is mainly responsible for analyzing the files saved in the file database 308, extracting keywords from file content, eliminating duplicate files, and so on, and for building different types of index information for the files in the file database 308. The index processing module 307 comprises the text index processing unit 313 and the semantic ontology index processing unit 314.

[0046] The text index processing unit 313 is a conventional processing unit for building a text index: it analyzes file content, extracts keywords and file identification information, and builds the text index. Because the conventional text indexing process is mature prior art, it is not described again here.

[0047] The semantic ontology index processing unit 314 is responsible for building the semantic ontology index for files that already have a text index. It first analyzes a file for which a text index has been built and determines whether the file contains semantic annotation information; if so, it extracts the relevant semantic annotation information and file identification information and builds the semantic ontology index for that file.

[0048] The index database 306 stores the index information built by the index processing module 307, namely the text index 315 built by the text index processing unit 313 and the semantic ontology index 316 built by the semantic ontology index processing unit 314.

[0049] The search processing module 303 is responsible for handling the user's query request: by matching the user's text query information against the index information of the files, it returns the files that satisfy the user's query conditions to the user in a certain sorted order. The search processing module 303 comprises the text search processing unit 310, the semantic ontology search processing unit 311, and the sorting processing unit 312.

[0050] The text search processing unit 310 is responsible for matching the text query information entered by the user against the text index 315 to find the text hit file identification information that satisfies the user's query conditions.

[0051] The semantic ontology search processing unit 311 is responsible for matching the text hit file identification information produced by the text search processing unit 310 against the semantic ontology index 316, semantically classifying that information to obtain the document semantic classification table.
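
One way to picture the classification step in paragraph [0051] is the Python sketch below. It assumes the semantic ontology index is available as a mapping from DocID to annotated concepts; the index contents, the concept names `Game` and `Music`, the function `classify_hits`, and the choice to collect unannotated files under `None` are all hypothetical, and the layout of the actual document semantic classification table may differ.

```python
from collections import defaultdict

# Hypothetical semantic ontology forward index: DocID -> annotated concepts.
semantic_forward_index = {
    1: ["Game"],
    4: ["Music"],
    5: ["Game", "Music"],
}

def classify_hits(hit_doc_ids, semantic_index):
    """Group text-hit DocIDs by ontology concept.

    Files without semantic annotation are collected under the key None
    (an assumption made for this sketch).
    """
    table = defaultdict(list)
    for doc_id in hit_doc_ids:
        concepts = semantic_index.get(doc_id)
        if not concepts:
            table[None].append(doc_id)
            continue
        for concept in concepts:
            table[concept].append(doc_id)
    return dict(table)

classification = classify_hits([1, 2, 4, 5], semantic_forward_index)
```

Only files that already appear in the text hit list are classified, which mirrors the order of operations described above: text matching first, semantic matching second.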

[0052] The annotation ontology library 305 and the semantic ontology reasoning engine 304 are responsible for performing semantic reasoning on the set of ontology concept terms in the document semantic classification table produced by the semantic ontology search processing unit 311 to obtain an extended set of semantic ontology terms. The annotation ontology library 305 stores the defined set of semantic ontology concept terms and the relations among the semantic ontology concepts; the semantic ontology reasoning engine 304 defines the reasoning rules and performs the reasoning operations.

[0053] Depending on what the semantic ontology reasoning engine 304 infers, the document semantic classification rule engine 302 triggers the semantic classification rules it defines and extends and consolidates the document semantic classification table.

[0054] The sorting processing unit 312 is responsible for ordering and optimizing the final results: for the document semantic classification table obtained after a series of processing steps, such as text index matching, semantic ontology index matching, and semantic reasoning extension, it computes the relevance and importance of the documents and, based on the computed results, returns the found files to the search interface module 301 in sorted order.

[0055] The search interface module 301 is responsible for interaction between the system and the user: it forwards the text query information entered by the user to the search processing module 303 and returns the sorted results of the sorting processing unit 312 to the user.

[0056] The text index 315 stored in the index database 306 comprises a text forward index and a text inverted index. Table 1 is the text forward index table and Table 2 is the text inverted index table, as shown in Table 1 and Table 2:

[0057] Table 1

[0058]

[0059] Table 2

[0060]

[0061] As the two tables above show, the text forward index uses the file identifier as the key and maps file identifiers to keywords, whereas the text inverted index uses the keyword as the key and maps keywords to file identifiers.
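
The relationship between the two index forms in paragraph [0061] can be sketched in Python as follows. The sample entries and the function name `invert` are hypothetical and are not taken from Table 1 or Table 2; the sketch only shows how a forward index keyed by DocID turns into an inverted index keyed by keyword.

```python
from collections import defaultdict

# Hypothetical forward index: DocID -> keywords found in that file.
forward_index = {
    1: ["paradise", "game"],
    2: ["paradise", "application"],
    3: ["music"],
}

def invert(forward):
    """Turn a DocID -> keywords mapping into keyword -> sorted DocIDs."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for keyword in forward[doc_id]:
            if doc_id not in inverted[keyword]:
                inverted[keyword].append(doc_id)
    return dict(inverted)

inverted_index = invert(forward_index)
```

A query for one keyword then becomes a single dictionary lookup in `inverted_index` instead of a scan over every file.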

[0062] Similarly, the semantic ontology index 316 stored in the index database 306 comprises a semantic ontology forward index and a semantic ontology inverted index. Table 3 is the semantic ontology forward index table and Table 4 is the semantic ontology inverted index table, as shown in Table 3 and Table 4:

[0063] Table 3

[0064]

[0065]

[0066] Table 4

[0067]

[0068] The semantic ontology forward index uses the file identifier as the key and maps file identifiers to semantic identifiers, whereas the semantic ontology inverted index uses the semantic identifier as the key and maps semantic identifiers to file identifiers.

[0069] FIG. 4 is a flowchart of the semantic ontology index processing unit 314 building the semantic ontology index 316 in an embodiment of the present invention. The semantic ontology index is built on top of the text index built by the text index processing unit 313; the trigger condition for the process is that the text index processing unit 313 has already built a text index for a file. Referring to FIG. 4, the process of building the semantic ontology index comprises the following steps:

[0070] Step 401: the semantic ontology index processing unit 314 first reads a file that has been processed by the text index processing unit 313 and for which a text index has been built.

[0071] Step 402: the semantic ontology index processing unit 314 determines whether the file it has read is annotated with semantic markup. If the file is annotated with semantic markup, step 403 is performed; otherwise, the process of building a semantic ontology index for the file ends.

[0072] A semantically annotated file differs from a file without semantic annotation in that the semantically annotated file carries ontology concept mapping information. For example, if the web page with file identifier 9 and address http://grids.ucs.indiana.edu/ptliupages/publications/index.html mainly describes matters to be aware of when doing research, the page can be annotated with the concept "Research". Some existing semantic annotation information takes the form of comments, and some is embedded in the web page as an XML packet. This example gives semantic annotation information, expressed as a comment, produced with the text annotation tool OntoMat from Stanford University:

[0073] <html>

[0074] <head>

[0075] <!--<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

[0076] xmlns:daml="http://www.daml.org/2001/03/daml+oil#"

[0077] xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">

[0078] <Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>

[0079] </rdf:RDF>

[0080] -->

[0081] <title>Community Grids Publications</title>

[0082] This example indicates that the content of the web page http://grids.ucs.indiana.edu/ptliupages/publications/index.html is mainly about "Research". For a page annotated with the OntoMat tool, the semantic annotation information is placed in a comment in the HTML head, beginning with <rdf:RDF and ending with </rdf:RDF>. Therefore, when the semantic ontology index processing unit 314 detects semantic annotation information that begins with <rdf:RDF and ends with </rdf:RDF>, it determines that the web page file has been annotated with semantic markup.
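
The detection rule in paragraph [0082] could be approximated as in the Python sketch below. It is a minimal illustration rather than the unit's actual logic: the regular expression, the function name `find_annotation`, and the `plain` page are assumptions, and the sample page is adapted from the listing above with the RDF written as well-formed XML.

```python
import re

# Matches an HTML comment whose body starts with <rdf:RDF and ends with
# </rdf:RDF>, and captures the RDF text inside it.
RDF_COMMENT = re.compile(r"<!--\s*(<rdf:RDF\b.*?</rdf:RDF>)\s*-->", re.DOTALL)

def find_annotation(html):
    """Return the embedded RDF block, or None if the page is not annotated."""
    match = RDF_COMMENT.search(html)
    return match.group(1) if match else None

annotated = """<html><head>
<!--<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">
<Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>
</rdf:RDF>
-->
<title>Community Grids Publications</title>"""

plain = "<html><head><title>No annotation</title></head></html>"
```

A page for which `find_annotation` returns `None` would end the process at step 402; any other page would continue to step 403.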

[0083] Step 403: the semantic ontology index processing unit 314 reads the semantic annotation information of the file.

[0084] In this embodiment, the semantic ontology index processing unit 314 reads the semantic annotation information of the web page with file identifier 9, that is, it reads the comment in the HTML head. Table 5 shows the format of the extracted semantic annotation information, as shown in Table 5:

[0085] Table 5

[0086]

[0087] Step 404: the semantic ontology index processing unit 314 extracts the semantic ontology concept terms from the semantic annotation information it has read and builds the semantic ontology index.

[0088] In this embodiment, the semantic ontology index processing unit 314 calls the relevant RDF document processing application programming interface (API) to extract the semantic ontology concept term "Research" from the semantic annotation information, builds the semantic ontology forward index for web page 9, and at the same time converts it into a semantic ontology inverted index. Table 6 is the semantic ontology forward index of web page 9 and Table 7 is the semantic ontology inverted index of web page 9, as shown in Table 6 and Table 7:
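
Step 404 can be sketched in Python as follows. The text does not name the RDF processing API it uses, so this sketch substitutes Python's standard `xml.etree.ElementTree` parser as a stand-in; the function names `extract_concepts` and `index_file`, the dictionary layout of the two indexes, and the well-formed RDF string are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def extract_concepts(rdf_text):
    """Return the local names of the typed nodes directly under rdf:RDF."""
    root = ET.fromstring(rdf_text)
    # Tags look like "{namespace}Research"; keep only the local name.
    return [child.tag.rsplit("}", 1)[-1] for child in root]

def index_file(doc_id, rdf_text, forward, inverted):
    """Add one file to the semantic forward and inverted indexes."""
    concepts = extract_concepts(rdf_text)
    forward[doc_id] = concepts
    for concept in concepts:
        inverted.setdefault(concept, []).append(doc_id)

# Well-formed rendering of the annotation for web page 9 (assumed form).
rdf = (
    '<rdf:RDF xmlns:rdf="' + RDF_NS + '" '
    'xmlns="http://annotation.semanticweb.org/iswc/iswc.daml#">'
    '<Research rdf:about="http://grids.ucs.indiana.edu/ptliupages/publications/index.html"/>'
    '</rdf:RDF>'
)

forward_index, inverted_index = {}, {}
index_file(9, rdf, forward_index, inverted_index)
```

After the call, the forward index maps file 9 to the concept "Research" and the inverted index maps "Research" back to file 9, which is the pairing the paragraph describes.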

[0089] Table 6

[0090]

[0091] Table 7

[0092]

[0093] Step 405: the semantic ontology index processing unit 314 saves the semantic ontology forward index and semantic ontology inverted index it has built in the index database 306, forming the content of the semantic ontology index 316.

[0094] The reason the processing step of the text index processing unit 313 must come before the semantic ontology index is built is that, when a user searches, the files matching the text query information entered by the user are found first, and semantic ontology index matching is then performed on those files. The processing step of the text index processing unit 313 ensures that every file that has a text index and carries semantic information has corresponding semantic ontology index information in the semantic ontology index 316. It thereby avoids the situation, which would arise if files were read directly from the file database 308 for semantic ontology indexing, in which a file has a semantic ontology index but no text index.

[0095] FIG. 5 is a flowchart of the retrieval system of the embodiment shown in FIG. 3 performing a search for a user. As shown in FIG. 5, the process comprises the following steps:

[0096] Step 501: the search interface module 301 obtains the text query information entered by the user and sends it to the search processing module 303. In this embodiment, the query information entered by the user is assumed to be "天堂" ("paradise").

[0097] Step 502: the search processing module 303 receives the text query information sent by the search interface module 301, preprocesses it by word segmentation, and then sends the segmented query keywords to the text search processing unit 310.

[0098] The details of word segmentation are described in the existing literature on search engines and are not repeated here. In this embodiment, segmenting the text query information "天堂" yields the keyword "天堂".

[0099] Step 503: the text search processing unit 310 matches the segmented query keywords against the text inverted index and sends the resulting text hit file list to the semantic ontology search processing unit 311.

[0100] After receiving the query keywords, the text search processing unit 310 sends the index database 306 a request to read the text inverted index, and the index database 306 returns the text inverted index in the text index 315 in response. The text search processing unit 310 matches the user's query keyword "天堂" against the text inverted index to obtain the identifiers of the web page files that contain the keyword, that is, the text hit file identifier list, and sends the text hit file list to the semantic ontology search processing unit 311 for processing.

[0101] For simplicity, this embodiment assumes that only 20 files have been indexed. Table 8 is the text inverted index table that the index database 306 returns to the text search processing unit 310, as shown in Table 8:

[0102] Table 8

[0103] [Table 8: image not reproduced in this text]

[0104] In Table 8, each row holds one keyword and the file identifier sequence of the files in which that keyword appears. The sequence is a binary string whose length, 20, equals the total number of indexed files. Each bit represents one file, and a bit's position equals the file's identifier: the first bit stands for file 1, the second bit for file 2, and so on. A bit of 0 means the keyword does not appear in the corresponding file; a bit of 1 means it does.

[0105] The text search processing unit 310 matches the user's query keyword "paradise" to the "paradise" entry in Table 8, extracts the file identifier sequence that follows it, namely the text hit file list 11011011111110001011, and sends it to the semantic ontology search processing unit 311. The files whose bits are 1 in the text hit file list are the hit files.

[0106] Similarly, if the user's text query is "paradise application" (天堂应用), segmentation yields the keywords "paradise" and "application". Each keyword is matched in the text inverted index, and the two file identifier sequences that follow them are combined with a bitwise AND, giving 01011000100010001010. A bit of 1 in this result marks a file in which both "paradise" and "application" appear.
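The bit-sequence lookup and the AND of paragraphs [0104] to [0106] can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the sequence for "paradise" is taken from the description, while the sequence for "application" is a hypothetical value chosen so that the AND reproduces the result stated in [0106]. The function names are likewise assumptions.

```python
# File identifier sequences are kept as strings; the leftmost bit is file 1.
TOTAL_FILES = 20

text_inverted_index = {
    "paradise": "11011011111110001011",     # value given in the description
    "application": "01011100100010101010",  # HYPOTHETICAL: Table 8 is not available
}

def bits_and(a: str, b: str) -> str:
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

def text_hits(keywords):
    """AND together the sequences of all keywords; unknown keywords hit nothing."""
    result = "1" * TOTAL_FILES
    for keyword in keywords:
        sequence = text_inverted_index.get(keyword, "0" * TOTAL_FILES)
        result = bits_and(result, sequence)
    return result

def file_ids(bits: str):
    """Convert a bit string into the list of 1-based file identifiers."""
    return [i for i, bit in enumerate(bits, start=1) if bit == "1"]

print(text_hits(["paradise", "application"]))
print(file_ids(text_hits(["paradise"])))
```

Keeping the sequences as strings mirrors the notation of the text; a production index would more likely use integers or a compressed bitmap.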

[0107] Step 504: After obtaining the text hit file list, the semantic ontology search processing unit 311 first decides whether to perform semantic ontology inverted index matching.

[0108] The semantic ontology search processing unit 311 bases this decision on the number of text hit files. If the number of hit files exceeds a threshold, it performs semantic ontology inverted index matching and proceeds to step 505; otherwise it performs semantic ontology forward index matching and proceeds to step 506. The threshold may be a predefined value stored in the semantic ontology search processing unit 311, or a value that the retrieval system adjusts dynamically according to statistics or other conditions.

[0109] After receiving the text hit file list 11011011111110001011, the semantic ontology search processing unit 311 counts the 1 bits in the sequence and obtains 14, so the number of text hit files is 14. Assuming a threshold of 10, 14 is greater than 10, so semantic ontology inverted index matching is performed. With a threshold of 15, 14 is less than 15, so semantic ontology forward index matching is performed.
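The decision of step 504 amounts to a population count followed by a comparison. A minimal sketch, assuming the hit list is held as a bit string; the function name and the returned labels are hypothetical:

```python
def choose_matching(hit_bits: str, threshold: int) -> str:
    """Pick the matching strategy of step 504 from the number of hit files."""
    hit_count = hit_bits.count("1")  # number of text hit files
    # Strictly greater than the threshold: inverted index matching (step 505).
    # Otherwise: forward index matching (step 506).
    return "inverted" if hit_count > threshold else "forward"

hits = "11011011111110001011"
print(hits.count("1"), choose_matching(hits, 10), choose_matching(hits, 15))
```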

[0110] Step 505: The semantic ontology search processing unit 311 performs semantic ontology inverted index matching on the files in the text hit file list to obtain the document semantic classification table.

[0111] First, the semantic ontology search processing unit 311 sends the index database 306 a request to read the semantic ontology inverted index, and the index database 306 returns it. The semantic ontology search processing unit 311 then reads each record of the semantic ontology inverted index in turn and intersects the record's file identifier sequence with the text hit file list, that is, it applies a bitwise AND to the two binary sequences and overwrites the corresponding file identifier sequence in the semantic ontology inverted index table with the result. Finally, it filters out the records whose intersection is empty, so that the original semantic ontology inverted index table becomes the document semantic classification table. Step 507 is then performed.

[0112] Table 9 is the semantic ontology inverted index table that the index database 306 returns to the semantic ontology search processing unit 311 in this embodiment:

[0113] Table 9

[0114] [Table 9: image not reproduced in this text]

[0115] Table 9 assumes that the 20 indexed files involve only five semantic ontology concepts, that is, there are five distinct semantic identifiers across all files. The file identifier sequence after each semantic identifier shows where that ontology concept occurs among the 20 files. It is represented in the same way as the file identifier sequences of the text inverted index: each bit represents one file, and a bit's position equals the file's identifier. A bit of 0 means the corresponding file is not annotated with the ontology concept; a bit of 1 means it is. For example, the file identifier sequence of pop music is 01011010110001100000, which means that files 2, 4, 5, 7, 9, 10, 14 and 15 are annotated with the concept pop music, reflecting that their content relates to pop music.

[0116] The semantic ontology search processing unit 311 reads each file identifier sequence of the semantic ontology inverted index shown in Table 9, applies a bitwise AND with the text hit file list 11011011111110001011, and stores the result in the position of the corresponding file identifier sequence in Table 9, overwriting the original sequence. Finally, it filters out the semantic identifier entries whose intersection is empty, that is, whose AND result is all zeros, which yields the document semantic classification table. Table 10 is the resulting document semantic classification table:

[0117] Table 10

[0118] [Table 10: image not reproduced in this text]

[0119] In this way, the text hit file list 11011011111110001011 is classified by semantics.
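Step 505 can be sketched as an AND of every semantic ontology inverted index record with the hit list, followed by dropping the all-zero results. Because Table 9 is not available in this text, only the pop music sequence below comes from the description. The other four sequences, and the fifth concept name "film", are hypothetical placeholders chosen so that four concepts survive, as the embodiment states.

```python
def bits_and(a: str, b: str) -> str:
    """Bitwise AND of two equal-length bit strings."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))

semantic_inverted_index = {
    "pop music": "01011010110001100000",        # value given in the description
    "computer games": "10000001000000001000",   # HYPOTHETICAL
    "classical music": "00000000001100000001",  # HYPOTHETICAL
    "novels": "00000000000010000010",           # HYPOTHETICAL
    "film": "00100100000000010000",             # HYPOTHETICAL fifth concept
}

def inverted_index_match(index, hit_bits):
    """Intersect each record with the hit list and drop empty intersections."""
    table = {}
    for concept, bits in index.items():
        intersection = bits_and(bits, hit_bits)
        if "1" in intersection:  # filter out all-zero results
            table[concept] = intersection
    return table

classification = inverted_index_match(semantic_inverted_index, "11011011111110001011")
for concept, bits in classification.items():
    print(concept, bits)
```

The sketch builds a new table rather than overwriting the index in place, which keeps the stored index reusable for the next query; the description overwrites a copy returned by the index database, so the effect is the same.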

[0120] Step 506: The semantic ontology search processing unit 311 performs semantic ontology forward index matching on the files in the text hit file list to obtain the document semantic classification table.

[0121] First, the semantic ontology search processing unit 311 sends the index database 306 a request to read the semantic ontology forward index. Table 11 is the semantic ontology forward index table that the index database 306 returns in response to this request:

[0122] Table 11

[0123] [Table 11: image not reproduced in this text]

[0124] [Table 11, continued: image not reproduced in this text]

[0125] The semantic ontology search processing unit 311 converts the text hit file list 11011011111110001011 into concrete file identifiers: 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 17, 19 and 20. Using each file identifier as a query condition, it looks up the corresponding record in the semantic ontology forward index and obtains a semantic ontology forward index that contains only these file identifiers. Table 12 is the semantic ontology forward index table obtained by this process:

[0126] Table 12

[0127] [Table 12: image not reproduced in this text]

[0128] [Table 12, continued: image not reproduced in this text]

[0129] Finally, each semantic ontology concept that appears in Table 12 is taken as a key, and the file identifiers under which that key appears are collected. This completes the conversion from forward index to inverted index and yields the document semantic classification table. Table 13 is the document semantic classification table obtained by this process:

[0130] Table 13

[0131] [Table 13: image not reproduced in this text]

[0132] Step 507 is then performed.

[0133] The processing is split into semantic ontology inverted index matching and semantic ontology forward index matching for reasons of efficiency. Inverted index matching has to match the text hit file list against every record of the semantic ontology inverted index in turn and compute an intersection each time. This full scan of the semantic ontology inverted index is computationally expensive. When the number of text hit files is small, forward index matching therefore reduces the amount of computation. Whichever matching method is used, the resulting document semantic classification table is the same, that is, Table 13 is identical to Table 10.
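Step 506 can be sketched as a lookup of each hit file in the forward index followed by an inversion. Table 11 is not available in this text, so the forward index below is hypothetical: only the files annotated with pop music (2, 4, 5, 7, 9, 10, 14, 15) follow the description, and the remaining annotations, including the concept "film", are placeholders.

```python
# HYPOTHETICAL forward index: file identifier -> annotated ontology concepts.
forward_index = {
    1: ["computer games"], 2: ["pop music"], 3: ["film"], 4: ["pop music"],
    5: ["pop music"], 6: ["film"], 7: ["pop music"], 8: ["computer games"],
    9: ["pop music"], 10: ["pop music"], 11: ["classical music"],
    12: ["classical music"], 13: ["novels"], 14: ["pop music"],
    15: ["pop music"], 16: ["film"], 17: ["computer games"], 18: [],
    19: ["novels"], 20: ["classical music"],
}

def forward_index_match(index, hit_bits):
    """Keep only the hit files, then invert concept -> file identifier sequence."""
    total = len(hit_bits)
    hit_ids = [i for i, bit in enumerate(hit_bits, start=1) if bit == "1"]
    # Reduced forward index holding only the hit files (the role of Table 12).
    reduced = {file_id: index.get(file_id, []) for file_id in hit_ids}
    # Forward-to-inverted conversion (the role of Table 13).
    table = {}
    for file_id, concepts in reduced.items():
        for concept in concepts:
            bits = table.setdefault(concept, ["0"] * total)
            bits[file_id - 1] = "1"
    return {concept: "".join(bits) for concept, bits in table.items()}

result = forward_index_match(forward_index, "11011011111110001011")
for concept, bits in result.items():
    print(concept, bits)
```

The work here grows with the number of hit files rather than with the number of ontology concepts, which is why the description prefers this path when there are few hits.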

[0134] Step 507: The semantic ontology search processing unit 311 uses the semantic ontology reasoning engine 304, the annotation ontology library 305 and the document semantic classification rule engine 302 to reason over the semantic terms in the document semantic classification table, expands the table according to the reasoning result, and sends the expanded document semantic classification table to the ranking processing unit 312.

[0135] After completing the semantic ontology index matching, the semantic ontology search processing unit 311 first sends the semantic ontology concept terms of the document semantic classification table to the semantic ontology reasoning engine 304 for semantic reasoning. Using the semantic ontology concepts and relationships defined in the annotation ontology library 305, together with its own inference rules, the semantic ontology reasoning engine 304 generates an RDF document that expresses the relationships among the semantic ontology terms and returns it to the semantic ontology search processing unit 311. The semantic ontology search processing unit 311 then matches this RDF document against the trigger conditions of the semantic classification rules defined in the document semantic classification rule engine 302, determines which rules need to be triggered, and triggers them, producing the document semantic classification table expanded by reasoning. Finally, it sends the expanded document semantic classification table to the ranking processing unit 312.

[0136] In this embodiment, the semantic ontology search processing unit 311 sends the four semantic ontology concept terms of Table 10 or Table 13 (pop music, computer games, classical music and novels) to the semantic ontology reasoning engine 304 for reasoning. The engine works on resources expressed as RDF triples and applies the defined inference rules to them. An RDF triple has the form (subject, predicate, individual), where the individual is what RDF calls the object. For example, Figure 6 defines two resource descriptions: Shenzhen 601 belongs to Guangdong 602, and Guangdong 602 belongs to China 603. An inference rule is also defined: (?a, belongs to, ?b), (?b, belongs to, ?c) → (?a, belongs to, ?c). The rule means that if a belongs to b and b belongs to c, then a belongs to c. From the relationships in Figure 6, the result in Figure 7 can therefore be inferred: Shenzhen 601 belongs to China 603.
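The transitivity rule of paragraph [0136] can be sketched as a small fixed-point computation over a set of triples. This is a hedged illustration only: the description does not say how the reasoning engine 304 evaluates its rules, and the function name is hypothetical.

```python
def infer_transitive(triples, predicate):
    """Apply (?a, p, ?b), (?b, p, ?c) -> (?a, p, ?c) until nothing new is derived."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p1, b in list(facts):
            for b2, p2, c in list(facts):
                if p1 == predicate and p2 == predicate and b == b2:
                    derived = (a, predicate, c)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

# The two resource descriptions of Figure 6.
figure_6 = [
    ("Shenzhen", "belongs to", "Guangdong"),
    ("Guangdong", "belongs to", "China"),
]
inferred = infer_transitive(figure_6, "belongs to")
print(("Shenzhen", "belongs to", "China") in inferred)
```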

[0137] Assume that the annotation ontology library 305 defines the relationships shown in Figure 8 for the four ontology concepts of this embodiment: the parent class of pop music 801 is popular music 802; the parent class of both popular music 802 and classical music 803 is music 804; the parent class of novels 805 is literature 806; and the parent class of computer games 807 is games 808. After the inference rules are applied, the RDF relationships of the four ontology concepts are as shown in Figure 9: the parent class of both pop music 801 and classical music 803 is music 804; the parent class of novels 805 is literature 806; and the parent class of computer games 807 is games 808. The output in RDF triple format is:

[0138] (pop music, parent class, music)

[0139] (classical music, parent class, music)

[0140] (novels, parent class, literature)

[0141] (computer games, parent class, games)
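One way to reproduce the four triples above from the relationships of Figure 8 is to follow each concept's parent links to the top of its hierarchy. The sketch assumes that this is how Figure 9 is derived; the description only states the input and output relationships, and the function names are hypothetical.

```python
# Direct parent-class relationships of Figure 8, as listed in the description.
parent_of = {
    "pop music": "popular music",
    "popular music": "music",
    "classical music": "music",
    "novels": "literature",
    "computer games": "games",
}

def top_ancestor(concept, parents):
    """Follow parent links until a concept without a parent is reached."""
    seen = {concept}
    while concept in parents:
        concept = parents[concept]
        if concept in seen:  # guard against cyclic ontologies
            raise ValueError("cycle in parent-class relationships")
        seen.add(concept)
    return concept

def parent_triples(concepts, parents):
    """Emit (concept, parent class, top ancestor) triples."""
    return [(c, "parent class", top_ancestor(c, parents)) for c in concepts]

triples = parent_triples(
    ["pop music", "classical music", "novels", "computer games"], parent_of
)
for triple in triples:
    print(triple)
```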

[0142] The document semantic classification rule engine 302 defines the following semantic classification rule: if several triples share a common individual and their predicate is "parent class", a new document category is added to the document semantic classification table. The category name is the name of that individual, and its file identifier sequence is the union of the file identifier sequences of the subject terms of those triples, that is, the result of a bitwise OR. Table 14 is this semantic classification rule table:

[0143] Table 14

[0144] [Table 14: image not reproduced in this text]

[0145] Semantic reasoning followed by expansion and integration according to the semantic classification rule yields the expanded document semantic classification table. Table 15 is the expanded document semantic classification table:

[0146] Table 15

[0147] [Table 15: image not reproduced in this text]
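The semantic classification rule of paragraph [0142] can be sketched as grouping triples by their common individual and OR-ing the sequences of the subjects. Tables 10, 14 and 15 are not available in this text, so the sketch makes two assumptions: every sequence except pop music is a hypothetical placeholder, and "several triples" is read as at least two, so a parent with a single child adds no category.

```python
from collections import defaultdict

def bits_or(a: str, b: str) -> str:
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))

def expand_table(table, triples, predicate="parent class"):
    """Add one category per individual shared by two or more matching triples."""
    groups = defaultdict(list)
    for subject, pred, individual in triples:
        if pred == predicate and subject in table:
            groups[individual].append(subject)
    expanded = dict(table)
    for individual, subjects in groups.items():
        if len(subjects) < 2:  # ASSUMPTION: a single triple does not trigger the rule
            continue
        merged = "0" * len(table[subjects[0]])
        for subject in subjects:
            merged = bits_or(merged, table[subject])
        expanded[individual] = merged
    return expanded

classification_table = {
    "pop music": "01011010110000000000",        # pop music AND hit list, from the text
    "computer games": "10000001000000001000",   # HYPOTHETICAL
    "classical music": "00000000001100000001",  # HYPOTHETICAL
    "novels": "00000000000010000010",           # HYPOTHETICAL
}
rdf_triples = [
    ("pop music", "parent class", "music"),
    ("classical music", "parent class", "music"),
    ("novels", "parent class", "literature"),
    ("computer games", "parent class", "games"),
]
expanded_table = expand_table(classification_table, rdf_triples)
print(expanded_table["music"])
```

Under the other reading, where a single triple also triggers the rule, the `len(subjects) < 2` check would be dropped and the categories literature and games would appear as well.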

[0148] Step 508: The ranking processing unit 312 computes the relevance and importance of the files in the document semantic classification table produced by semantic reasoning, ranks the files according to the computed results, and sends the ranked results together with the document semantic classification information to the search interface module 301.

[0149] Step 509: The search interface module 301 returns the received ranking results and semantic classification information to the user as the search results.

[0150] The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.

Claims (24)

  1. A retrieval system based on semantic ontology, characterized in that the system comprises:
    a semantic ontology index database, configured to store a semantic ontology index; and
    a semantic ontology search processing unit, configured to obtain a text hit file list and to match the text hit file list against the semantic ontology index in the semantic ontology index database to obtain a document semantic classification table.
  2. The system according to claim 1, characterized in that the system further comprises a semantic ontology index processing unit, configured to obtain files for which a text index has been built and to build a semantic ontology index for the obtained files.
  3. The system according to claim 2, characterized in that the system further comprises:
    a text index processing unit, configured to build a text index for files;
    a text index database, configured to store the text index; and
    a text search processing unit, configured to match the user's text query against the text index in the text index database to obtain the text hit file list.
  4. The system according to claim 1, 2 or 3, characterized in that the system further comprises:
    a semantic ontology reasoning engine, configured to perform semantic reasoning on the semantic ontology vocabulary set in the document semantic classification table, according to the semantic ontology vocabulary set in an annotation ontology library and the relationships between its semantic ontology terms, to obtain an expanded semantic ontology vocabulary set;
    the annotation ontology library, configured to store the semantic ontology vocabulary set and the relationships between semantic ontology terms; and
    a document semantic classification rule engine, configured to store semantic classification rules, to match and trigger the corresponding semantic classification rules according to the expanded semantic ontology vocabulary set inferred by the semantic ontology reasoning engine, and to expand and integrate the document semantic classification table to obtain an expanded document semantic classification table.
  5. The system according to claim 4, characterized in that the system further comprises a ranking processing unit, configured to rank the files in the expanded document semantic classification table.
  6. The system according to claim 5, characterized in that the system further comprises a search interface module, configured to send the user's text query to the text search processing unit and to return the ranking result of the ranking processing unit to the user.
  7. The system according to claim 3, characterized in that the system further comprises a file database, configured to store files for use by the semantic ontology index processing unit in building the semantic ontology index and by the text index processing unit in building the text index.
  8. The system according to claim 7, characterized in that the system further comprises a network file crawling module, configured to crawl network files from the Internet and save them in the file database.
  9. The system according to claim 3, characterized in that the text index comprises a text forward index and a text inverted index, and the semantic ontology index comprises a semantic ontology forward index and a semantic ontology inverted index.
  10. The system according to claim 1, characterized in that the semantic ontology search processing unit performs semantic ontology forward index matching or semantic ontology inverted index matching on the text hit file list.
  11. The system according to claim 10, characterized in that the semantic ontology search processing unit performs semantic ontology inverted index matching when the number of text hit files in the text hit file list is greater than a threshold, and otherwise performs semantic ontology forward index matching.
  12. The system according to claim 11, characterized in that the threshold is a predefined fixed value or a dynamically adjustable value.
  13. The system according to claim 3, characterized in that the text search processing unit and the semantic ontology search processing unit are integrated in one search processing module; the text index processing unit and the semantic ontology index processing unit are integrated in one index processing module; and the text index database and the semantic ontology index database are integrated into one index database.
  14. The system according to claim 5, characterized in that the text search processing unit, the semantic ontology search processing unit and the ranking processing unit are integrated in one search processing module.
  15. A retrieval method based on semantic ontology, characterized in that the method comprises the following steps:
    A. obtaining files for which a text index has been built, and building a semantic ontology index for the obtained files;
    B. obtaining a text hit file list, and performing semantic ontology index matching on the text hit file list to obtain a document semantic classification table.
  16. The method according to claim 15, characterized in that:
    before step A, the method further comprises a step of building a text index for the files in a file database;
    before step B, the method further comprises a step of performing text index matching on the user's text query to obtain the text hit file list.
  17. The method according to claim 15 or 16, characterized in that the method further comprises the following steps:
    C. performing semantic reasoning on the semantic ontology vocabulary set in the document semantic classification table to obtain an expanded semantic ontology vocabulary set;
    D. expanding and integrating the document semantic classification table according to the inferred expanded semantic ontology vocabulary set to obtain an expanded document semantic classification table.
  18. The method according to claim 17, characterized in that the method further comprises a step of ranking the files in the expanded document semantic classification table.
  19. The method according to claim 15, characterized in that building the semantic ontology index in step A comprises building a semantic ontology forward index and a semantic ontology inverted index, and the semantic ontology index matching performed on the text hit file list in step B is semantic ontology inverted index matching or semantic ontology forward index matching.
  20. The method according to claim 15, characterized in that, before step B, the method further comprises a step of determining whether semantic ontology inverted index matching or semantic ontology forward index matching is to be performed on the text hit file list in step B.
  21. The method according to claim 20, characterized in that the determining step is: when the number of text hit files in the text hit file list is greater than a threshold, semantic ontology inverted index matching is performed in step B; otherwise semantic ontology forward index matching is performed in step B.
  22. The method according to claim 21, characterized in that the threshold is a predefined fixed value or a dynamically adjustable value.
  23. The method according to claim 16, characterized in that building the text index comprises building a text forward index and a text inverted index, and the text index matching performed on the user's text query is text inverted index matching or text forward index matching.
  24. The method according to claim 16, characterized in that, before the text index is built, the method further comprises a step of establishing the file database.
CN 200610149803 2006-10-25 2006-10-25 Semantic ontology retrieval system and method CN101169780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610149803 CN101169780A (en) 2006-10-25 2006-10-25 Semantic ontology retrieval system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610149803 CN101169780A (en) 2006-10-25 2006-10-25 Semantic ontology retrieval system and method

Publications (1)

Publication Number Publication Date
CN101169780A (en) 2008-04-30

Family

ID=39390409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610149803 CN101169780A (en) 2006-10-25 2006-10-25 Semantic ontology retrieval system and method

Country Status (1)

Country Link
CN (1) CN101169780A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799835A (en) * 2010-04-21 2010-08-11 中国测绘科学研究院 Ontology-driven geographic information retrieval system and method
CN101917413A (en) * 2010-07-29 2010-12-15 清华大学 Service assembly system and method based on service quality optimization and semantic information integration
CN101944099A (en) * 2010-06-24 2011-01-12 西北工业大学 Method for automatically classifying text documents by utilizing body
CN101566984B (en) 2008-07-11 2011-02-09 博采林电子科技(深圳)有限公司 Search engine used in personal hand-held equipment and resource search method
CN102063453A (en) * 2010-05-31 2011-05-18 百度在线网络技术(北京)有限公司 Method and device for searching based on demands of user
CN102073692A (en) * 2010-12-16 2011-05-25 北京农业信息技术研究中心 Agricultural field ontology library based semantic retrieval system and method
CN102725759A (en) * 2010-02-05 2012-10-10 微软公司 Semantic table of contents for search results
CN102750277A (en) * 2011-04-18 2012-10-24 腾讯科技(深圳)有限公司 Method and device for obtaining information
CN102799677A (en) * 2012-07-20 2012-11-28 河海大学 Water conservation domain information retrieval system and method based on semanteme
CN102880645A (en) * 2012-08-24 2013-01-16 上海云叟网络科技有限公司 Semantic intelligent search method
CN103020283A (en) * 2012-12-27 2013-04-03 华北电力大学 Semantic search method based on dynamic reconfiguration of background knowledge
CN103136360A (en) * 2013-03-07 2013-06-05 北京宽连十方数字技术有限公司 Internet behavior markup engine and behavior markup method corresponding to same
CN103177123A (en) * 2013-04-15 2013-06-26 昆明理工大学 Method for improving database retrieval information relevancy
CN103440284A (en) * 2013-08-14 2013-12-11 郭克华 Multimedia storage and search method supporting cross-type semantic search
CN104462060A (en) * 2014-12-03 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for calculating text similarity and realizing search processing through computer
CN104615729A (en) * 2014-10-30 2015-05-13 南京源成语义软件科技有限公司 Network searching method based on semantic net technology
CN104765779A (en) * 2015-03-20 2015-07-08 浙江大学 Patent document inquiry extension method based on YAGO2s
CN104866598A (en) * 2015-06-01 2015-08-26 北京理工大学 Heterogeneous database integrating method based on configurable templates
WO2016009321A1 (en) * 2014-07-14 2016-01-21 International Business Machines Corporation System for searching, recommending, and exploring documents through conceptual associations and inverted table for storing and querying conceptual indices
CN105335510A (en) * 2015-10-30 2016-02-17 成都博睿德科技有限公司 Text data efficient searching method
CN102750277B (en) * 2011-04-18 2016-12-14 深圳市世纪光速信息技术有限公司 Information acquisition method and apparatus
CN103886099B (en) * 2014-04-09 2017-02-15 中国人民大学 A semantic retrieval system and method for fuzzy concepts
CN107590166A (en) * 2016-12-20 2018-01-16 百度在线网络技术(北京)有限公司 Data generation method and device based on query contents

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101566984B (en) 2008-07-11 2011-02-09 博采林电子科技(深圳)有限公司 Search engine used in personal hand-held equipment and resource search method
CN102725759B (en) * 2010-02-05 2015-11-25 微软技术许可有限责任公司 Semantic directory for search results
CN102725759A (en) * 2010-02-05 2012-10-10 微软公司 Semantic table of contents for search results
CN101799835A (en) * 2010-04-21 2010-08-11 中国测绘科学研究院 Ontology-driven geographic information retrieval system and method
CN101799835B (en) 2010-04-21 2012-07-04 中国测绘科学研究院 Ontology-driven geographic information retrieval system and method
CN102063453A (en) * 2010-05-31 2011-05-18 百度在线网络技术(北京)有限公司 Method and device for searching based on demands of user
CN101944099A (en) * 2010-06-24 2011-01-12 西北工业大学 Method for automatically classifying text documents by utilizing body
CN101917413A (en) * 2010-07-29 2010-12-15 清华大学 Service assembly system and method based on service quality optimization and semantic information integration
CN101917413B (en) 2010-07-29 2013-07-17 清华大学 Service assembly system and method based on service quality optimization and semantic information integration
CN102073692A (en) * 2010-12-16 2011-05-25 北京农业信息技术研究中心 Agricultural field ontology library based semantic retrieval system and method
CN102073692B (en) * 2010-12-16 2016-04-27 北京农业信息技术研究中心 Semantic retrieval system and method based on agriculture ontology
CN102750277A (en) * 2011-04-18 2012-10-24 腾讯科技(深圳)有限公司 Method and device for obtaining information
CN102750277B (en) * 2011-04-18 2016-12-14 深圳市世纪光速信息技术有限公司 Information acquisition method and apparatus
CN102799677A (en) * 2012-07-20 2012-11-28 河海大学 Water conservation domain information retrieval system and method based on semanteme
CN102799677B (en) * 2012-07-20 2014-11-12 河海大学 Water conservation domain information retrieval system and method based on semanteme
CN102880645A (en) * 2012-08-24 2013-01-16 上海云叟网络科技有限公司 Semantic intelligent search method
CN102880645B (en) * 2012-08-24 2015-12-16 上海云叟网络科技有限公司 Semantic intelligent search methods
CN103020283A (en) * 2012-12-27 2013-04-03 华北电力大学 Semantic search method based on dynamic reconfiguration of background knowledge
CN103020283B (en) * 2012-12-27 2015-12-09 华北电力大学 Semantic retrieval method based on dynamic reconfiguration of background knowledge
CN103136360A (en) * 2013-03-07 2013-06-05 北京宽连十方数字技术有限公司 Internet behavior markup engine and behavior markup method corresponding to same
CN103136360B (en) * 2013-03-07 2016-09-07 北京宽连十方数字技术有限公司 Internet behavior markup engine and behavior markup method corresponding to same
CN103177123A (en) * 2013-04-15 2013-06-26 昆明理工大学 Method for improving database retrieval information relevancy
CN103177123B (en) * 2013-04-15 2016-05-11 昆明理工大学 Method for improving database retrieval information relevancy
CN103440284A (en) * 2013-08-14 2013-12-11 郭克华 Multimedia storage and search method supporting cross-type semantic search
CN103440284B (en) * 2013-08-14 2016-04-20 郭克华 Multimedia storage and search method supporting cross-type semantic search
CN103886099B (en) * 2014-04-09 2017-02-15 中国人民大学 A semantic retrieval system and method for fuzzy concepts
WO2016009321A1 (en) * 2014-07-14 2016-01-21 International Business Machines Corporation System for searching, recommending, and exploring documents through conceptual associations and inverted table for storing and querying conceptual indices
CN104615729A (en) * 2014-10-30 2015-05-13 南京源成语义软件科技有限公司 Network searching method based on semantic net technology
CN104462060B (en) * 2014-12-03 2017-08-01 百度在线网络技术(北京)有限公司 Method and device for calculating text similarity and realizing search processing through computer
CN104462060A (en) * 2014-12-03 2015-03-25 百度在线网络技术(北京)有限公司 Method and device for calculating text similarity and realizing search processing through computer
CN104765779A (en) * 2015-03-20 2015-07-08 浙江大学 Patent document inquiry extension method based on YAGO2s
CN104866598A (en) * 2015-06-01 2015-08-26 北京理工大学 Heterogeneous database integrating method based on configurable templates
CN104866598B (en) * 2015-06-01 2018-05-08 北京理工大学 Heterogeneous database integrating method based on configurable templates
CN105335510A (en) * 2015-10-30 2016-02-17 成都博睿德科技有限公司 Text data efficient searching method
CN107590166A (en) * 2016-12-20 2018-01-16 百度在线网络技术(北京)有限公司 Data generation method and device based on query contents

Similar Documents

Publication Publication Date Title
Diao et al. Path sharing and predicate evaluation for high-performance XML filtering
Agrawal et al. DBXplorer: A system for keyword-based search over relational databases
US6519586B2 (en) Method and apparatus for automatic construction of faceted terminological feedback for document retrieval
US6735583B1 (en) Method and system for classifying and locating media content
US6148289A (en) System and method for geographically organizing and classifying businesses on the world-wide web
US7346629B2 (en) Systems and methods for search processing using superunits
Wen et al. Clustering user queries of a search engine
US6199067B1 (en) System and method for generating personalized user profiles and for utilizing the generated user profiles to perform adaptive internet searches
US5920859A (en) Hypertext document retrieval system and method
US7567959B2 (en) Multiple index based information retrieval system
US20040143644A1 (en) Meta-search engine architecture
US20110161309A1 (en) Method Of Sorting The Result Set Of A Search Engine
US20090070346A1 (en) Systems and methods for clustering information
US20070192293A1 (en) Method for presenting search results
US6993534B2 (en) Data store for knowledge-based data mining system
US7702618B1 (en) Information retrieval system for archiving multiple document versions
US20060294155A1 (en) Detecting spam documents in a phrase based information retrieval system
US20050114324A1 (en) System and method for improved searching on the internet or similar networks and especially improved MetaNews and/or improved automatically generated newspapers
US20030163302A1 (en) Method and system of knowledge based search engine using text mining
US20090254540A1 (en) Method and apparatus for automated tag generation for digital content
US20030212649A1 (en) Knowledge-based data mining system
US20070260586A1 (en) Systems and methods for selecting and organizing information using temporal clustering
US20110270820A1 (en) Dynamic Indexing while Authoring and Computerized Search Methods
US20090100015A1 (en) Web-based workspace for enhancing internet search experience
US8359191B2 (en) Deriving ontology based on linguistics and community tag clouds

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C12 Rejection of an application for a patent