
Method and apparatus for ordering incidence relation search result

Info

Publication number
CN100524317C
Authority
CN
Grant status
Grant
Prior art keywords
method
apparatus
ordering
incidence
relation
Application number
CN 200710163152
Other languages
Chinese (zh)
Other versions
CN101140588A (en)
Inventor
孙小林 (Sun Xiaolin)
文坤梅 (Wen Kunmei)
李瑞轩 (Li Ruixuan)
舒琦 (Shu Qi)
赵艳涛 (Zhao Yantao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.); 华中科技大学 (Huazhong University of Science and Technology)

Abstract

The present invention discloses a method and apparatus for ranking relationship search results. The method comprises: parsing the triple information of each instance in an ontology and constructing an instance relationship graph; for two input instances, traversing all relationship paths between the two instances in the instance relationship graph and generating search result information; computing the domain relevance and/or relationship length and/or relationship frequency; and ranking the search result information. The apparatus comprises an ontology parsing module, a relationship search module, and a relationship ranking module. As the above technical solution shows, by computing the domain relevance and/or relationship length and/or relationship frequency, the search results can be ranked flexibly, so that users can obtain the information they want from the search results more accurately and effectively.

Description

Method and apparatus for ranking relationship search results

Technical Field

The present invention relates to a method and apparatus for ranking search results, and in particular to a method and apparatus for ranking relationship search results.

Background

A major goal of today's Web is information sharing: users should be able to access the information they need regardless of platform, language, or protocol, and this is the business need behind the contemporary Web. Over the past decade, therefore, Web search has concentrated on standards that ensure network information resources can be accessed, and the development of the Semantic Web and of its layered architecture can make computers more efficient at retrieving information resources.

Current research on semantic search technology focuses mainly on searching for instances (that is, individuals or entities) in Semantic Web resources. In real-world applications, however, what people find most interesting is not the entities in network resources but the semantic associations between them. The research focus should therefore shift from searching for keywords or semantic annotations in the traditional Web to searching for the relationships between resources in the Semantic Web. Association search should provide an effective way to answer questions such as "Is there some semantic association between entity X and entity Y?" Research on semantic associations already exists and has made some progress.

Another important problem is how, once entity associations have been found, to judge their importance from the user's point of view, that is, how to rank the relationship search results. As semantic resources grow richer, the number of relationships between entities will exceed the number of entities themselves, so the method used to rank relationships is especially important. Research on ranking methods for relationship search results contributes to the further development of the Semantic Web and will also help advance semantic search technology.

Ranking semantic search results is a key technical problem that semantic search must solve. The number of relationships between entities in a knowledge base may far exceed the number of entities themselves. Traditional result-ranking methods can only rank textual information and cannot recognize semantic information, so they cannot rank results on a semantic basis. Current work mostly combines traditional search-engine ranking algorithms with information retrieval techniques to try new ways of ranking semantic search results: some rank the result set by the importance of Semantic Web resources, some concentrate information retrieval on semantic metadata in an attempt to discover complex relationships within it, and one proposes a ranking method that predicts user needs in order to identify semantic associations. Research on ranking association search results draws on techniques from many areas, such as ontology, the Semantic Web, link analysis, social network science, and statistics.

An ontology is an explicit, formal specification of a shared conceptualization. Ontologies are a research hotspot in many fields, such as knowledge engineering, natural language processing, collaborative information systems, intelligent information integration, and knowledge management, and they provide a shared, common understanding of the knowledge of a particular domain. Within a domain, an ontology defines concepts strictly and fixes the precise meaning of each concept through the relationships between concepts, expressing commonly accepted, shareable knowledge; this resolves the problems of one concept having many terms and one term having many concepts (meanings). Ontology modeling comprises a hierarchical description of the important concepts in a domain: the important properties of each concept are described through an "attribute-value" mechanism, relationships between concepts are described by corresponding logical statements, and individual instances of interest in the domain are assigned to one or more concepts. Research on relationships is built on ontologies, and relationship search is the search for relationships between entities in an ontology. Understanding the concept and role of ontologies is therefore the foundation of research on ranking semantic association search results.

The Semantic Web is the next-generation World Wide Web advocated by Tim Berners-Lee, the inventor of the World Wide Web. It aims to give every resource on the Web a unique identifier and to establish machine-processable semantic links of various kinds between resources. The concept of semantic search was proposed in 2003; in recent years related research has gradually developed in this field and made initial progress. The Semantic Web is the application of ontology to the World Wide Web, and as the next-generation Web it will undoubtedly affect how websites are built and how users use them.

Link analysis (hyperlink analysis), also called structure analysis, uses hyperlinks as its main input for studying properties of the Web, especially hidden macroscopic properties. Link analysis on the Web rests on the following two assumptions:

Assumption 1: a hyperlink from page A to page B represents a recommendation of page B by the author of page A.

Assumption 2: if page A and page B are connected by a hyperlink, they are likely to be about the same topic.

If pages are viewed as vertices and links as directed edges, the whole Web can be regarded as a directed graph, called the Web graph, which can be studied and analyzed with complex network theory. Well-known link analysis algorithms include Google's PageRank algorithm, the HITS (Hyperlink-Induced Topic Search) algorithm, and the ARC (Automatic Resource Compilation) algorithm. Although these algorithms are used on the traditional World Wide Web, link analysis contributes the concept of viewing the whole Web as a directed graph.

Relationships between entities in an ontology have some similarities with the subject of social network research, another branch of computer science, so existing results of social network research can be used to find out which relationships users care about; a reasonably clear understanding of the state of social network research is therefore needed. Social network research grew out of the development of disciplines such as sociology, anthropology, and epidemiology, and sociologists gradually developed it into a powerful tool: social network analysis (SNA). By mapping and analyzing the relationships between people within groups, organizations, and communities, SNA provides a rich set of methods, tools, and techniques for systematically describing and analyzing social networks. The theoretical perspective of SNA focuses on the relationships between actors (the network topology) rather than on attributes of the actors themselves, and it emphasizes the mutual influence and dependence among actors that give rise to emergent collective behavior. A social network is a set consisting of multiple nodes (actors) and the links between them (relationships between actors); representing a network by nodes and links gives social network analysis a sound formal definition. Social network data should therefore include at least structural variables and composition variables. A structural variable measures a particular relationship between two actors and is the cornerstone of a social network data set; for example, it can measure flows of information or knowledge between people, or trade and investment between enterprises. A composition variable, also called an actor attribute variable, usually describes a single actor; for example, it can measure an actor's gender or profession, or an enterprise's industry or size.

Statistics is a methodological science that studies random phenomena and is characterized by inference; the idea of "inferring the whole from a part" runs throughout it. Specifically, it studies the principles and methods for collecting, organizing, and analyzing numerical data that reflect the overall state of things, and for inferring the characteristics of the population on that basis. Understanding things through statistics proceeds in the steps: study design → sample survey → statistical inference → conclusion. Here, study design means planning survey research and experimental research, the sample survey is the process of collecting data, and statistical inference is the process of analyzing the data. The main function of statistics is clearly inference, and its inferential method is a form of incomplete induction, because it uses partial data to infer the whole. The number of relationships between entities is enormous, so it is impossible to determine users' focus purely by manual calculation; only a reasoning and analysis technique that draws global conclusions from local sample surveys can do so, and statistics is exactly such a discipline.

In existing search technology, keyword retrieval merely performs simple word matching over the full text, that is, it retrieves entities, and the retrieved results fall far short of users' requirements. Retrieval of relationships between entities has therefore emerged. However, existing relationship retrieval offers no technical solution for further ranking the relationship results, so users cannot find the information they want efficiently and accurately, and their needs are not met.

Summary

The object of the embodiments of the present invention is to provide a method and apparatus for ranking relationship search results, so that users can obtain the information they want from the search results more accurately and effectively.

To achieve the above object, an embodiment of the present invention provides a method for ranking relationship search results, comprising: parsing the triple information of each instance in an ontology, and constructing an instance relationship graph from the triple information of each instance; for any two input instances of the ontology, traversing all relationship paths between the two instances in the instance relationship graph, and generating search result information for all relationships between the two instances; computing, from the search result information, the domain relevance, the relationship length, or the relationship frequency; and ranking the search result information by domain relevance, relationship length, or relationship frequency, or by any combination of domain relevance, relationship length, and relationship frequency; wherein the domain relevance $D_R$ of each piece of search result information is calculated by the following formula:

$$D_R = \alpha + (1-\alpha) \times \frac{|N|}{length(R)} \times \left(1 - \frac{|N'|}{length(R)}\right)$$

where $R$ is the relationship corresponding to each piece of search result information:

$R = \{O_1, P_1, O_2, P_2, O_3, \ldots, O_{n-1}, P_{n-1}, O_n\}$, where $n$ equals $length(R)$, the path length of the relationship; $\alpha$ is an adjustment factor, $0 < \alpha < 1$; $N$ is the set of instances $O_i$ and properties $P_i$, in the relationship corresponding to each piece of search result information, that belong to the user's domain of interest $D$:

$$N = \{O_i, P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \in D) \cap (P_i \in D)\}$$

$N'$ is the set of instances $O_i$ and properties $P_i$, in the relationship corresponding to each piece of search result information, that do not belong to the user's domain of interest $D$:

$$N' = \{O_i, P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \notin D) \cap (P_i \notin D)\}$$

The relationship length $L_R$ of each piece of search result information is calculated by the following formula:

$$L_R = \frac{1}{length(R)} \quad \text{or} \quad L_R = 1 - \frac{1}{length(R)}$$

The relationship frequency $F_R$ of each piece of search result information is calculated by the following formula:

$F_R$ = [formula illegible in source]

where the terms are the relative out-degree and the relative in-degree of relationship $R$.

An embodiment of the present invention further provides an apparatus for ranking relationship search results, comprising: an ontology parsing module, configured to parse the triple information of each instance in an ontology and construct an instance relationship graph from the triple information of each instance; a relationship search module, configured to traverse, for any two input instances of the ontology, all relationship paths between the two instances in the instance relationship graph, and to generate search result information for all relationships between the two instances; and a relationship ranking module, configured to compute, from the search result information, the domain relevance, the relationship length, or the relationship frequency, and to rank the search result information by domain relevance, relationship length, or relationship frequency, or by any combination of domain relevance, relationship length, and relationship frequency; wherein the domain relevance $D_R$ of each piece of search result information is calculated by the following formula:

$$D_R = \alpha + (1-\alpha) \times \frac{|N|}{length(R)} \times \left(1 - \frac{|N'|}{length(R)}\right)$$

where $R$ is the relationship corresponding to each piece of search result information:

$R = \{O_1, P_1, O_2, P_2, O_3, \ldots, O_{n-1}, P_{n-1}, O_n\}$, where $n$ equals $length(R)$, the path length of the relationship; $\alpha$ is an adjustment factor, $0 < \alpha < 1$; $N$ is the set of instances $O_i$ and properties $P_i$, in the relationship corresponding to each piece of search result information, that belong to the user's domain of interest $D$:

$$N = \{O_i, P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \in D) \cap (P_i \in D)\}$$

$N'$ is the set of instances $O_i$ and properties $P_i$, in the relationship corresponding to each piece of search result information, that do not belong to the user's domain of interest $D$:

$$N' = \{O_i, P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \notin D) \cap (P_i \notin D)\}$$

The relationship length $L_R$ of each piece of search result information is calculated by the following formula:

$$L_R = \frac{1}{length(R)} \quad \text{or} \quad L_R = 1 - \frac{1}{length(R)}$$

The relationship frequency $F_R$ of each piece of search result information is calculated by the following formula:

$F_R$ = [formula illegible in source]

where the terms are the relative out-degree and the relative in-degree of relationship $R$. As the above technical solutions show, by computing the domain relevance and/or relationship length and/or relationship frequency, the embodiments of the present invention can rank search results flexibly, so that users can obtain the information they want from the search results more accurately and effectively.
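The text leaves open how "any combination" of the three measures is formed. A minimal Python sketch, assuming $D_R$, $L_R$ and $F_R$ have already been computed for each result and that a combination is a weighted sum with user-chosen weights; the class `ScoredResult`, the function `rank_results` and the weight keys are hypothetical, not part of the patent:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredResult:
    """One relationship search result with its precomputed measures."""
    path: List[str]   # alternating instances and properties: O1, P1, ..., On
    domain: float     # domain relevance D_R
    length: float     # relationship length measure L_R
    frequency: float  # relationship frequency F_R


def rank_results(results: List[ScoredResult],
                 weights: Dict[str, float]) -> List[ScoredResult]:
    """Sort results by a weighted sum of the selected measures, highest first.

    ASSUMPTION: the combination rule is a weighted sum. A measure with no
    weight (or weight 0) is ignored, so a single non-zero weight ranks by
    that measure alone.
    """
    def score(r: ScoredResult) -> float:
        return (weights.get("domain", 0.0) * r.domain
                + weights.get("length", 0.0) * r.length
                + weights.get("frequency", 0.0) * r.frequency)

    # sorted() stays stable with reverse=True, so ties keep search order.
    return sorted(results, key=score, reverse=True)
```

Passing `{"length": 1.0}` ranks by $L_R$ alone, while `{"domain": 1.0, "length": 0.5}` mixes two measures; the weights are the user's knobs in this sketch.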

The technical solutions of the present invention are described in further detail below with reference to the drawings and embodiments.

Brief Description of the Drawings

Fig. 1 is a flowchart of the relationship ranking method according to Embodiment 1 of the present invention;

Fig. 2 is a schematic structural diagram of the apparatus for ranking relationship search results according to Embodiment 2 of the present invention;

Fig. 3 is schematic diagram 1 of the ontology structure according to Embodiment 3 of the present invention;

Fig. 4 is schematic diagram 2 of the ontology structure according to Embodiment 3 of the present invention;

Fig. 5 is a schematic diagram of the user interface according to Embodiment 3 of the present invention;

Fig. 6 is a schematic diagram of the search result interface according to Embodiment 3 of the present invention;

Fig. 7 is schematic diagram 1 of the user interface according to Embodiment 4 of the present invention;

Fig. 8 is schematic diagram 1 of the search result interface according to Embodiment 4 of the present invention;

Fig. 9 is schematic diagram 2 of the user interface according to Embodiment 4 of the present invention;

Fig. 10 is schematic diagram 2 of the search result interface according to Embodiment 4 of the present invention;

Fig. 11 is schematic diagram 3 of the search result interface according to Embodiment 4 of the present invention;

Fig. 12 is schematic diagram 3 of the user interface according to Embodiment 4 of the present invention;

Fig. 13 is schematic diagram 4 of the search result interface according to Embodiment 4 of the present invention;

Fig. 14 compares the ranking results produced by the system according to an embodiment of the invention with manual ranking results.

Detailed Description

Semantic relationship search is the search for semantic associations between instances in an ontology: two instances in a knowledge domain are said to be semantically associated if they are connected directly through one or more properties, or indirectly through similar (identical or derived) properties. These associations form a relationship graph whose nodes are the instances. A semantic association is based on the Resource Description Framework (RDF) view of property sequences, and can be regarded as a labeled path in the knowledge base.

Ranking semantic association results relies on related techniques such as statistics, link analysis, social networks, and lexical analysis.

For ranking semantic associations, the embodiments of the present invention mainly consider several key ranking criteria, which users can set according to their own needs when searching.

To design a suitable ranking method, the key factors that affect ranking must first be identified. A search $Q = (O', O'')$ expresses that the user wishes to query the relationships that exist between instances $O'$ and $O''$. Its query result is a relationship $R = \{O_1, P_1, O_2, P_2, O_3, \ldots, P_{n-1}, O_n\}$, composed of the set of all instances $O_i$ and properties $P_i$ on the relationship path. The following three factors are the key factors that affect ranking: domain relevance, semantic association path length, and relationship frequency.

1) Domain relevance is the degree to which all the instances and properties appearing in a relationship are relevant to the user's domain of interest; it is denoted $D_R$.

Domain relevance can be adjusted and assigned by the user. A user may be more interested in certain domains, and different users have different domains of interest. In a specific application, several domains can be defined and different domains given different weights; domains of interest to the user can be given relatively high weights. A domain consists of the set of all instances and properties related to it. For example, the academic domain generally contains instances such as "teacher", "paper", and "course", and properties such as "publishes" and "teaches". The more instances and properties of a relationship belong to the user's domain of interest, the greater the domain relevance of that relationship should be, and relationships with greater relevance are the results the user is more interested in.

2) Semantic association length is the effect that the length of the semantic association path connecting two instances has on the ranking of relationship results; it is denoted $L_R$.

For a query result $R = \{O_1, P_1, O_2, P_2, O_3, \ldots, P_{n-1}, O_n\}$, the length of its association path is $n$.

In general, the shorter the path connecting two associated instances, the more important their relationship. In some cases the opposite holds. For example, in a national foreign-exchange agency or a security agency, users expect to discover potential criminal suspects or terrorists through complex relationships, and the information of interest may be hidden in longer semantic associations; longer relationships should then be given higher relevance than ordinary shorter ones.
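The two preferences described above correspond to the two forms of $L_R$ given in the summary: $1/length(R)$ favors short paths and $1 - 1/length(R)$ favors long ones. A minimal Python sketch, assuming a path is stored as an alternating list of instance and property names and that $length(R)$ counts every element on the path (the text does not pin this down; counting only instances gives the same ordering, since both counts grow with the path). The function names are hypothetical:

```python
from typing import Sequence


def relationship_length(path: Sequence[str]) -> int:
    """length(R) for a path [O1, P1, O2, ..., On].

    ASSUMPTION: counts every instance and property on the path.
    """
    if len(path) % 2 == 0:
        raise ValueError("a path must start and end with an instance")
    return len(path)


def length_measure(path: Sequence[str], prefer_longer: bool = False) -> float:
    """L_R = 1/length(R) (short paths first) or 1 - 1/length(R) (long paths first)."""
    n = relationship_length(path)
    return 1 - 1 / n if prefer_longer else 1 / n


short = ["Alice", "advises", "Bob"]
long_ = ["Alice", "teaches", "DB101", "takes", "Carol", "knows", "Bob"]
print(length_measure(short), length_measure(long_))              # general case
print(length_measure(short, True), length_measure(long_, True))  # e.g. security use
```

The `prefer_longer` flag plays the role of the user-selected ranking criterion; the same paths swap order when it is turned on.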

3) Relationship frequency is the effect that the in-degrees and out-degrees of all instances appearing in a relationship have on the ranking of relationship results; it is denoted $F_R$.

Before computing relationship frequency, the concepts of an instance's in-degree and out-degree must be understood. As in the PageRank technique for ranking web pages, an instance with a larger in-degree and out-degree is more important. For example, in the education domain, "Tsinghua University" and "Yangtze University" are two specific instances of the concept "school"; in the RDF graph "Tsinghua University" has a larger in-degree and out-degree, which indicates that "Tsinghua University" is better known than "Yangtze University". [Part of this passage is illegible in source.] Relationships that contain "important" instances can be given higher weights when ranking.
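The in-degree and out-degree this measure relies on can be read directly off the triples. A minimal Python sketch, assuming triples are (subject, property, object) tuples; the sample data and the use of total degree as an importance proxy are illustrative only and are not the $F_R$ formula itself. The names `instance_degrees` and `Triple` are hypothetical:

```python
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject instance, property, object instance)


def instance_degrees(triples: Iterable[Triple]) -> Tuple[Counter, Counter]:
    """Return (in_degree, out_degree) counters keyed by instance."""
    in_degree: Counter = Counter()
    out_degree: Counter = Counter()
    for subject, _prop, obj in triples:
        out_degree[subject] += 1  # the edge leaves the subject
        in_degree[obj] += 1       # the edge enters the object
    return in_degree, out_degree


# Hypothetical education-domain data echoing the example in the text.
triples = [
    ("Alice", "worksAt", "Tsinghua University"),
    ("Bob", "worksAt", "Tsinghua University"),
    ("Carol", "studiesAt", "Tsinghua University"),
    ("Tsinghua University", "locatedIn", "Beijing"),
    ("Dave", "worksAt", "Yangtze University"),
]
in_deg, out_deg = instance_degrees(triples)
for school in ("Tsinghua University", "Yangtze University"):
    # Total degree as a simple, hypothetical importance proxy.
    print(school, in_deg[school] + out_deg[school])
```

With this data Tsinghua University has a total degree of 4 (in-degree 3, out-degree 1) and Yangtze University 1, mirroring the comparison in the text.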

For a search $Q = (O', O'')$, traversing all relationship paths between the two instances in the instance relationship graph yields the relationship set:

$$R_i = \{O_{1i}, P_{1i}, O_{2i}, P_{2i}, O_{3i}, \ldots, O_{(n-1)i}, P_{(n-1)i}, O_{ni}\}, \quad i = 1, 2, \ldots, m$$

Embodiment 1

Based on these three key factors that affect ranking, an embodiment of the present invention proposes a relationship ranking method which, as shown in Fig. 1, comprises the following steps:

Step 1: parse the triple information of each instance in the ontology, and construct the instance relationship graph from the triple information of each instance;

Step 2: for any two input instances of the ontology, traverse all relationship paths between the two instances in the instance relationship graph, and generate search result information for all relationships between the two instances;

Using the two instances entered by the user, the pre-built instance relationship graph is searched with a depth-first graph search, and all other instances and properties linking the two instances are saved during the search; in this way a graph search algorithm finds all relationships that exist between the two instances. The extracted triples are stored in an adjacency matrix, which is the data-structure representation of the graph of direct relationships among all instances in the ontology. In the depth-first graph search, the algorithm starts from a vertex v1 in the graph (the node corresponding to the start instance entered by the user) and visits it, then continues depth-first from each unvisited neighbor of v1 in turn, marking every visited vertex as visited so that it can be skipped when encountered again. When the vertex reached is exactly the required end point v2 (the node corresponding to the end instance entered by the user), a path from start point v1 to end point v2 has been found; the path is saved and the search continues until all paths connecting v1 and v2 have been found. If no path has been found after all vertices have been explored, there is no relationship between v1 and v2.

Step 3: compute the domain relevance and/or relationship length and/or relationship frequency from the search result information;

Step 4: rank the search result information by domain relevance, relationship length, or relationship frequency, or by any combination of domain relevance, relationship length, and relationship frequency.

In step 3 above, the domain relevance, association length and association frequency can be computed as follows.

(1) Compute the domain relevance $D_R$.

First, obtain the class of each instance in the association (a class is a generalized, abstract description of its instances). Then compare that class with the classes of the domain of interest selected by the user. A match means the class is one the user wants to emphasize. All instances and properties of such classes are collected from every association path, and the domain relevance is computed from them with the formula below.

For a search result association $R = \{O_1, P_1, O_2, P_2, \ldots, O_{n-1}, P_{n-1}, O_n\}$, let the user's domain of interest be $D$.

The set of instances and properties that belong to domain $D$ is:

$$Y_i = \{O_i \text{ or } P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \in D) \cap (P_i \in D)\}$$

The set of instances and properties that do not belong to domain $D$ is:

$$N_i = \{O_i \text{ or } P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \notin D) \cap (P_i \notin D)\}$$

The domain relevance $D_R$ of the association is then computed by Formula 1 [formula: see original document].

Here $\mathrm{length}(R)$ denotes the length of the association. $d$ is an adjustment factor set to avoid a degenerate value. Its value lies between 0 and 1, can be chosen freely, and is generally taken as $0 < d < 0.1$. The formula makes the domain relevance proportional to the number of instances and properties that belong to domain $D$, and inversely proportional to the number that do not. In other words, the more instances and properties of an association belong to $D$, the higher its domain relevance.
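The sketch below partitions an association into $Y$ and $N$. Formula 1 itself is not reproduced in the source, so `domain_relevance` is a hypothetical stand-in. It only mimics the behaviour stated above: the score grows with $|Y|$, shrinks as $|N|$ grows, and uses $d$ to keep the denominator non-zero. It is not the patent's formula. The set `domain_members` is assumed to hold the instances and properties already resolved to the user's classes of interest. All names in the example are hypothetical, apart from Li Ruixuan and Lu Zhengding, whom the fourth embodiment places in the "Teacher" class.

```python
def split_by_domain(path, domain_members):
    """Split the components of R = [O1, P1, ..., On] into Y (in D) and N (not in D)."""
    y = {c for c in path if c in domain_members}
    n = {c for c in path if c not in domain_members}
    return y, n


def domain_relevance(path, domain_members, d=0.05):
    """HYPOTHETICAL stand-in for Formula 1 (not reproduced in the source).

    Grows with |Y|, shrinks as |N| grows; d (0 < d < 1) keeps the
    denominator non-zero.
    """
    y, n = split_by_domain(path, domain_members)
    length = (len(path) + 1) // 2  # length(R) = n, the number of instances (claim 1)
    return (len(y) / length) / (len(n) / length + d)


# Hypothetical members of the user's class of interest ("Teacher").
teachers = {"LiRuixuan", "LuZhengding"}
path_a = ["SunXiaolin", "authorOf", "PaperA", "hasAuthor", "AuthorB"]
path_b = ["SunXiaolin", "authorOf", "PaperB", "hasAuthor", "LiRuixuan"]
print(domain_relevance(path_a, teachers), domain_relevance(path_b, teachers))
```

Under these assumptions, `path_b` scores higher than `path_a` because it passes through a member of the domain of interest.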

(2) Compute the semantic association length $L_R$.

$$L_R = \frac{1}{\mathrm{length}(R)} \quad \text{or} \quad L_R = 1 - \frac{1}{\mathrm{length}(R)} \qquad \text{(Formula 2)}$$
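Formula 2 is short enough to state directly in code. In this sketch, $\mathrm{length}(R)$ is taken as $n$, the number of instances, following claim 1 ($n = \mathrm{length}(R)$). The flag name `long_association_prior` is borrowed for illustration from the "Long Association Prior" option described in the fourth embodiment.

```python
def length_weight(path, long_association_prior=False):
    """Formula 2 for an association stored as [O1, P1, O2, ..., On]."""
    length = (len(path) + 1) // 2  # n, the number of instances in R
    if long_association_prior:
        return 1 - 1 / length  # longer paths score higher
    return 1 / length  # shorter paths score higher


direct = ["SunXiaolin", "advisor", "LuZhengding"]
print(length_weight(direct))  # 0.5
```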

Formula 2 covers two cases. The first makes a shorter semantic association path more valuable: the shorter the path, the larger $L_R$. The second expresses the opposite and gives longer semantic association paths a larger value. The user can choose whichever length formula suits the actual need.

(3) Compute the association frequency $F_R$.

The association frequency is computed from the relative in-degrees and out-degrees of the nodes in the graph. First, obtain the out-degree and in-degree of each node on the path, and compute each node's relative in-degree and out-degree from them. Also select the largest relative in-degree among all nodes. These values are then substituted into the formulas below.

First, define the absolute in-degree $AI_i$, absolute out-degree $AO_i$, relative in-degree $RI_i$ and relative out-degree $RS_i$ of an instance $O_i$. The relative degrees are defined in order to normalize the association frequency $F_R$.

In the RDF graph, suppose $k$ entities point to instance $O_i$ and $O_i$ points to $m$ other instances. The absolute in-degree of $O_i$ is then $AI_i = k$.

The absolute out-degree of $O_i$ is $AO_i = m$.

The relative in-degree of $O_i$ is defined as

$$RI_i = \sum_{j=1}^{k} \alpha_{ji},$$

where $\alpha_{ji}$ is the in-degree that an instance $A_j$ pointing to $O_i$ assigns to $O_i$:

$$\alpha_{ji} = \frac{1}{AO_{A_j}},$$

and $AO_{A_j}$ is the absolute out-degree of node $A_j$. The relative in-degree is therefore

$$RI_i = \sum_{j=1}^{k} \frac{1}{AO_{A_j}}.$$

The relative out-degree is defined as $RS_i = 1 - \frac{1}{AO_i}$, where $AO_i$ is the absolute out-degree of instance $O_i$. If $p$ instances appear in association $R$, the relative in-degree of $R$ can be defined on this basis.
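The instance-level definitions above are sketched below for a directed triple list. There are two assumptions. First, degrees count distinct neighbouring instances, reading "$O_i$ points to $m$ other instances" literally, so parallel triples between one pair count once. Second, a node with no out-links is given $RS_i = 0$, because $1 - 1/AO_i$ is undefined when $AO_i = 0$.

```python
from collections import defaultdict


def degree_tables(triples):
    """Absolute and relative in/out-degrees of every instance in a directed graph."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for s, _, o in triples:
        out_nb[s].add(o)
        in_nb[o].add(s)
    nodes = set(out_nb) | set(in_nb)
    ao = {v: len(out_nb.get(v, ())) for v in nodes}  # AO_i, absolute out-degree
    ai = {v: len(in_nb.get(v, ())) for v in nodes}   # AI_i, absolute in-degree
    # RI_i: each A_j pointing to O_i contributes alpha_ji = 1 / AO_{A_j}
    ri = {v: sum(1 / ao[a] for a in in_nb.get(v, ())) for v in nodes}
    # RS_i = 1 - 1 / AO_i; assumption: a node with no out-links gets 0
    rs = {v: (1 - 1 / ao[v]) if ao[v] else 0.0 for v in nodes}
    return ai, ao, ri, rs


ai, ao, ri, rs = degree_tables(
    [("a", "p", "c"), ("b", "p", "c"), ("b", "q", "d"), ("c", "q", "d")]
)
print(ri["c"], rs["b"])  # 1.5 0.5
```

Node `c` receives the full share of `a`, which has one out-link, plus half the share of `b`, which has two. This gives `c` a relative in-degree of 1.5.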

$$RI_R = \frac{1}{\mathrm{length}(R)} \sum_{i=1}^{p} \frac{RI_i}{\mathrm{Max}(I)},$$

where $\mathrm{Max}(I)$ is the maximum in-degree among all entities.

The relative out-degree of association $R$ is $RS_R = \frac{1}{\mathrm{length}(R)} \sum_{i=1}^{p} RS_i$, that is:

$$RS_R = \frac{1}{\mathrm{length}(R)} \sum_{i=1}^{p} \left(1 - \frac{1}{AO_i}\right).$$

The frequency weight $F_R$ of an association is determined jointly by the in-degrees and out-degrees of the entities in the association:

[formula: see original document, page 16]

Here $\mathrm{Max}(I)$ is the maximum in-degree among all entities, and $k$ entities point to entity $O_i$. $AO_{A_j}$ denotes the absolute out-degree of entity $A_j$, and $AO_i$ denotes the absolute out-degree of entity $O_i$.
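The sketch below aggregates per-instance values into $RI_R$ and $RS_R$ following the two formulas above. Following claim 1, it takes $p = \mathrm{length}(R)$ as the number of instances. The formula that combines them into $F_R$ is not reproduced in the source. The equal-weight average used here is an assumption, not the patent's formula.

```python
def association_frequency(instances, ri, rs, max_ri):
    """Association-level RI_R, RS_R and an assumed frequency weight F_R.

    instances: O_1..O_p of one association R.
    ri, rs: per-instance relative in-degree RI_i and relative out-degree RS_i.
    max_ri: Max(I), the largest in-degree value over all entities.
    """
    length = len(instances)  # length(R) = p, the number of instances
    ri_r = sum(ri[o] / max_ri for o in instances) / length
    rs_r = sum(rs[o] for o in instances) / length
    # ASSUMPTION: F_R's formula is missing from the source; average the two.
    f_r = (ri_r + rs_r) / 2
    return ri_r, rs_r, f_r


print(association_frequency(["a", "b"], {"a": 1.0, "b": 2.0}, {"a": 0.5, "b": 0.0}, 2.0))
# (0.75, 0.25, 0.5)
```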

In the sorting operation of step 4, any one of domain relevance, association length and association frequency can serve alone as the sorting criterion, or any combination of them can be used. In a preferred embodiment, all three are considered together, and the total weight of each association is computed as a weighted sum. The weighting coefficient of each factor can be set flexibly according to the user's actual needs, either by user input or as predefined by the system, as follows:

$$W_R = \alpha D_R + \beta L_R + \gamma F_R \qquad \text{(Formula 4)}$$

Here $\alpha + \beta + \gamma = 1$, and the user can assign these coefficients according to the actual needs of the application. $D_R$ denotes the domain relevance, $L_R$ the semantic association length and $F_R$ the association frequency. All association search results are then sorted in descending order of the total weight computed for each association.

Second Embodiment

An embodiment of the present invention also provides an apparatus for sorting association search results. As shown in Figure 2, it comprises the following modules:

- An ontology parsing module 1 parses the triple information of each instance of an ontology and builds an instance association graph from that information.
- An association search module 2 takes any two instances of the ontology that are entered. It traverses all association paths between the two instances in the instance association graph and generates search result information for all associations between them.
- An association sorting module 3 computes, from the search result information, the domain relevance and/or association length and/or association frequency. It sorts the search result information by domain relevance, association length or association frequency alone, or by any combination of the three.

The apparatus may also include an ontology loading module 4, which loads an ontology into the apparatus. The loaded ontology may be built by the user or obtained by searching the Internet, and once it has been loaded, the corresponding parsing can be performed.
Third Embodiment

Figure 3 is the first schematic diagram of the ontology structure. Analysis of the ontology identifies the following main classes:

- bank account (Account);
- book (Book);
- course taken by a student (Course);
- customer (Customer);
- employee of an organization (Employee);
- school teacher (Faculty), with the subclass university professor (Professor), which in turn has the subclass professor acting as adviser (Adviser);
- airline flight (Flight);
- leader of an organization (Leader);
- organization or institution (Organization);
- flight passenger (Passenger);
- payment type (Payment_Type), with two subclasses, credit card (Credit_Card) and frequent flier (Frequent_Flier);
- student (Student), who may be an undergraduate, a master's student, a doctoral student and so on, with the subclass graduate student (Grad_Student), which has the subclass teaching assistant (TA);
- flight ticket (Ticket).

Once these classes are built, instances can be added to each class. Some of the instances in the ontology are shown in Figure 4, the second schematic diagram of the ontology structure. It shows the relationships after instances are added to the classes Customer and Account. r1, r4, r5, r6, r8, r9 and r10 are instances of these two classes.
r1, r5, r6 and r9 are instances of Customer, and r4, r8 and r10 are instances of Account. The relationships among them are as follows:

- r1 is the account owner of r8 and r10. r1 can withdraw money from r10 and deposit money into account r8.
- r5 is an organization. r1 is a shareholder of r5, and r5 is the account owner of r4.
- The leader of organization r5 is r6. r6 is also an account owner of r8 and is r1's adviser.
- r9 is also an organization. r6 is a shareholder of r9 and also its leader.

The association graph shows that many different associations exist between instances. To search, the user enters a start instance and an end instance in the user interface shown in Figure 5. Here the two instances entered are r1 and r9, and the associations between r1 and r9 are searched. The user also enters weighting coefficients for domain relevance, association length and association frequency. With these and the computation method of the embodiment above, the total weight of each association between r1 and r9 is computed. The search results are then sorted by total weight, so that the associations the user wants are returned first. The sorted search results the user obtains are shown in Figure 6.

Fourth Embodiment

Another ontology is used below to illustrate the sorting method.
Figure 7 shows the user search interface of this embodiment. The associations between two instances, for example Sun Xiaolin and Lu Zhengding, are searched using the system's default sorting criteria. Figure 8 shows the search results. The first result is briefly explained below.

1. Sun Xiaolin [published] Research on Ontology-Based Multi-Domain Access Control Policy Integration [author] Wen Kunmei [published] paper005 [author] Li Ruixuan [published] Implementing Pluggable Authentication and Access Control in the Java2 Environment [author] Song Wei [advisor] Lu Zhengding

Total Value (total weight of the association): 0.508706

Here Sun Xiaolin is the start instance of the search. He published the paper "Research on Ontology-Based Multi-Domain Access Control Policy Integration", which has another author, Wen Kunmei. Wen Kunmei published another paper, paper005. The chain continues through relationships of this kind until it reaches the instance Lu Zhengding.

In most cases, however, the ranking produced by the system's default sorting criteria does not meet the user's needs. Suppose the user wants a more direct association between Sun Xiaolin and Lu Zhengding. Under the default ranking, the user might have to page through several, or even dozens of, result pages to find it. Embodiments of the present invention instead let users set the sorting criteria themselves. For example, Length weight (the weight of the association length) can be set to 0.7, Context weight (the weight of the domain relevance) to 0.2 and Node In&Out weight (the weight of the association frequency) to 0.1. The Long Association Prior option is unchecked, as shown in Figure 9.

Figure 10 shows the search results returned after sorting by the criteria set by the user.

The results show that the direct association between the two instances is ranked first. It states that Sun Xiaolin's advisor is Lu Zhengding. Other short associations are also ranked near the top, which meets the user's need.
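The sketch below applies Formula 4 with the weights used above: Context weight 0.2, Length weight 0.7 and Node In&Out weight 0.1. It ranks a direct two-instance association against the eight-instance path of the first default result. The $D_R$ and $F_R$ values are hypothetical placeholders. $L_R$ follows Formula 2, with $\mathrm{length}(R)$ taken as the number of instances.

```python
def total_weight(d_r, l_r, f_r, alpha, beta, gamma):
    """Formula 4: W_R = alpha*D_R + beta*L_R + gamma*F_R, alpha + beta + gamma = 1."""
    if abs(alpha + beta + gamma - 1) > 1e-9:
        raise ValueError("weighting coefficients must sum to 1")
    return alpha * d_r + beta * l_r + gamma * f_r


def rank(results, alpha, beta, gamma, long_association_prior=False):
    """Sort associations by total weight, largest first.

    results: (label, D_R, number of instances n, F_R) tuples, with D_R and
    F_R assumed to be precomputed.
    """
    scored = []
    for label, d_r, n, f_r in results:
        l_r = 1 - 1 / n if long_association_prior else 1 / n  # Formula 2
        scored.append((label, total_weight(d_r, l_r, f_r, alpha, beta, gamma)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical D_R and F_R values for the two associations.
results = [("eight-instance path", 0.3, 8, 0.6), ("direct advisor link", 0.3, 2, 0.4)]
print(rank(results, alpha=0.2, beta=0.7, gamma=0.1))
```

With Long Association Prior unchecked, the direct link ranks first. If `long_association_prior=True` is passed with the same weights, the eight-instance path ranks first instead.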

Next, the associations between the two instances Sun Xiaolin and Wen Kunmei are searched using the default sorting criteria, with Long Association Prior turned off. The search results are shown in Figure 11.

The first association shows that Wen Kunmei and Sun Xiaolin are both authors of the paper "Research on Multi-Domain Access Control Policy Integration" in the ontology. However, the user may be more interested in paths that connect the two instances through certain concepts such as "Teacher". In that case the user can emphasize these classes when setting the sorting criteria, that is, set the user's domain of interest, as shown in Figure 12.

The search results after the classes of interest to the user are added to the sorting criteria are shown in Figure 13.

This ranking differs from the previous one. The first few associations now include instances of the "Teacher" class, such as "Li Ruixuan" and "Lu Zhengding".

Some practical evaluation data follow. In practice there are many ways to express a semantic association, and users' sorting criteria are highly subjective. It is therefore hard to find a single standard for evaluating the ranking.

Five users tested the sorting method and evaluated the results. Semantic association queries were chosen at random, and a sorting criterion was provided for each query. Instances and associations of various types were also provided, so that users could judge whether an association was relevant to their domain of interest. Each user then sorted the associations according to the domain of interest and the sorting criterion. Different users are somewhat subjective when sorting results, so the average over all users can serve as a reference.

Five fairly representative sorting combinations were defined. Each test query emphasized two factors by giving them higher weights. Table 1 lists the sorting criteria and purpose of each search, based on the ontology structure of Figure 3.

Table 1
No. | Sorting criterion | Purpose
1 | Query the associations between instances of "Passenger" and "Organization"; emphasize short association length and paths containing transport-type classes (e.g. Ticket and Flight). | Test the ability of the sorting method to capture direct association paths and associations in certain domains of interest.
2 | Query the associations between two instances of the "Customer" class; emphasize long association length and paths containing organization classes (e.g. Organization). | Test the ability to find long paths and associations in certain domains of interest.
3 | Query the associations between two instances of type "Customer"; emphasize long association length and associations containing important nodes. | Test the ability of the system to find rare associations and associations that contain important nodes.
4 | Query the associations between instances of type "Customer" and type "Account"; emphasize associations containing organization classes (e.g. Organization) and important nodes (i.e. high association frequency). | Applicable to semantic analysis systems, e.g. uncovering links between customers and accounts for anti-money-laundering detection.
5 | Query the associations between two instances of type "Customer"; assign different weights to association length, association frequency and domain relevance. | Sorting results after the three sorting criteria are chosen according to the user's needs.

To demonstrate the effectiveness of the sorting method, the data are summarized graphically in Figure 14. It shows the approximate relationship between the system's sorting results (queries 1 to 5) and the manual sorting results, i.e. the ideal sorting results. "Ideal ranking" denotes the ideal case in which the system's ranking and the manual ranking agree exactly. The figure shows that the sorting method performs fairly well in these tests. The testers' rankings are close to the system's rankings, and some match them exactly. This is only a limited, preliminary evaluation, and the testers' sorting criteria differ somewhat. Even so, the results show that the sorting method is feasible. They also show that it is flexible enough to accommodate the varied preferences of multiple users and give them satisfactory search results.
Finally, the above embodiments only illustrate the technical solutions of the present invention and do not limit them. The present invention has been described in detail with reference to preferred embodiments. Nevertheless, those of ordinary skill in the art should understand that its technical solutions can still be modified or equivalently replaced. Such modifications or replacements do not take the modified technical solutions outside the spirit and scope of the technical solutions of the present invention.

Claims (8)

1. A method for sorting association search results, comprising:

- parsing the triple information of each instance of an ontology, and building an instance association graph from the triple information of each instance;
- for any two instances of the ontology that are entered, traversing all association paths between the two instances in the instance association graph, and generating search result information for all associations between the two instances;
- computing, from the search result information, the domain relevance, association length or association frequency; and
- sorting the search result information by domain relevance, association length or association frequency, or by any combination of domain relevance, association length and association frequency;

wherein the domain relevance $D_R$ of each piece of search result information is computed by the following formula [formula: see original document]. Here $R$ is the association corresponding to that piece of search result information, $R = \{O_1, P_1, O_2, P_2, O_3, \ldots, O_{n-1}, P_{n-1}, O_n\}$, with $n = \mathrm{length}(R)$, where $\mathrm{length}(R)$ is the path length of the association. $d$ is an adjustment factor, $0 < d < 1$. $Y_i$ is the set of instances $O_i$ and properties $P_i$, in the association corresponding to each piece of search result information, that belong to the user's domain of interest $D$:

$$Y_i = \{O_i \text{ or } P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \in D) \cap (P_i \in D)\}$$

$N_i$ is the set of instances $O_i$ and properties $P_i$, in the association corresponding to each piece of search result information, that do not belong to the user's domain of interest $D$:

$$N_i = \{O_i \text{ or } P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \notin D) \cap (P_i \notin D)\}$$

The association length $L_R$ of each piece of search result information is computed by

$$L_R = \frac{1}{\mathrm{length}(R)} \quad \text{or} \quad L_R = 1 - \frac{1}{\mathrm{length}(R)}.$$

The association frequency $F_R$ of each piece of search result information is computed by the following formula [formula: see original document], where $RS_R$ is the relative out-degree of association $R$ and $RI_R$ is the relative in-degree of association $R$.
2. The method according to claim 1, wherein sorting the search result information by any combination of domain relevance, association length and association frequency specifically comprises the following. A weighted sum of the domain relevance, association length and association frequency is computed, using weighting coefficients that are entered by the user or predefined by the system, to obtain an overall relevance. The search result information is then sorted by the magnitude of the overall relevance.
3. The method according to claim 1 or 2, wherein $RS_R$ is computed by

$$RS_R = \frac{1}{\mathrm{length}(R)} \sum_{i=1}^{p} RS_i,$$

where $\mathrm{length}(R)$ is the path length of the association and $RS_i$ is the relative out-degree of instance $O_i$. $RS_i$ is computed by

$$RS_i = 1 - \frac{1}{AO_i},$$

where $AO_i$ is the absolute out-degree of instance $O_i$, $i = 1, \ldots, p$, and $p$ is the number of instances in the association.
4. The method according to claim 1 or 2, wherein $RI_R$ is computed by

$$RI_R = \frac{1}{\mathrm{length}(R)} \sum_{i=1}^{p} \frac{RI_i}{\mathrm{Max}(I)},$$

where $\mathrm{length}(R)$ is the path length of the association, and $\mathrm{Max}(I)$ is the maximum relative in-degree among all entities. $RI_i$ is the relative in-degree of instance $O_i$, and $p$ is the number of instances in the association.
5. The method according to claim 4, wherein $RI_i$ is computed by

$$RI_i = \sum_{j=1}^{k} \alpha_{ji},$$

where $\alpha_{ji}$ is the absolute in-degree that an instance $A_j$ pointing to instance $O_i$ assigns to $O_i$, and $k$ is the absolute in-degree of instance $O_i$.
6. The method according to claim 5, wherein $\alpha_{ji}$ is computed by

$$\alpha_{ji} = \frac{1}{AO_{A_j}},$$

where $AO_{A_j}$ is the absolute out-degree of instance $A_j$.
7. An apparatus for sorting association search results, comprising:

- an ontology parsing module, configured to parse the triple information of each instance of an ontology and build an instance association graph from the triple information of each instance;
- an association search module, configured, for any two instances of the ontology that are entered, to traverse all association paths between the two instances in the instance association graph and generate search result information for all associations between the two instances; and
- an association sorting module, configured to compute, from the search result information, the domain relevance, association length or association frequency, and to sort the search result information by domain relevance, association length or association frequency, or by any combination of domain relevance, association length and association frequency;

wherein the domain relevance $D_R$ of each piece of search result information is computed by the following formula [formula: see original document]. Here $R$ is the association corresponding to that piece of search result information, $R = \{O_1, P_1, O_2, P_2, O_3, \ldots, O_{n-1}, P_{n-1}, O_n\}$, with $n = \mathrm{length}(R)$, where $\mathrm{length}(R)$ is the path length of the association. $d$ is an adjustment factor, $0 < d < 1$. $Y_i$ is the set of instances $O_i$ and properties $P_i$, in the association corresponding to each piece of search result information, that belong to the user's domain of interest $D$:

$$Y_i = \{O_i \text{ or } P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \in D) \cap (P_i \in D)\}$$

$N_i$ is the set of instances $O_i$ and properties $P_i$, in the association corresponding to each piece of search result information, that do not belong to the user's domain of interest $D$:

$$N_i = \{O_i \text{ or } P_i \mid (O_i \in R) \cap (P_i \in R) \cap (O_i \notin D) \cap (P_i \notin D)\}$$

The association length $L_R$ of each piece of search result information is computed by

$$L_R = \frac{1}{\mathrm{length}(R)} \quad \text{or} \quad L_R = 1 - \frac{1}{\mathrm{length}(R)}.$$

The association frequency $F_R$ of each piece of search result information is computed by the following formula [formula: see original document], where $RS_R$ is the relative out-degree of association $R$ and $RI_R$ is the relative in-degree of association $R$.
8. The apparatus according to claim 7, further comprising an ontology loading module configured to load an ontology into the apparatus.
CN 200710163152 2007-10-10 2007-10-10 Method and apparatus for ordering incidence relation search result CN100524317C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710163152 CN100524317C (en) 2007-10-10 2007-10-10 Method and apparatus for ordering incidence relation search result

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710163152 CN100524317C (en) 2007-10-10 2007-10-10 Method and apparatus for ordering incidence relation search result

Publications (2)

Publication Number Publication Date
CN101140588A true CN101140588A (en) 2008-03-12
CN100524317C true CN100524317C (en) 2009-08-05

Family

ID=39192540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710163152 CN100524317C (en) 2007-10-10 2007-10-10 Method and apparatus for ordering incidence relation search result

Country Status (1)

Country Link
CN (1) CN100524317C (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103615B (en) 2009-12-21 2014-03-26 北大方正集团有限公司 Three-segment sequential collecting method and system for retrieval results
CN102298591B (en) * 2010-06-28 2016-11-09 腾讯科技(深圳)有限公司 A relationship search method, apparatus and system for
JP5699744B2 (en) * 2011-03-30 2015-04-15 カシオ計算機株式会社 Search method, the search device, as well as, a computer program
CN102722503A (en) * 2011-03-31 2012-10-10 北京百度网讯科技有限公司 Method and device for sequencing search results
CN102508828A (en) * 2011-09-16 2012-06-20 浙江大学 Method for finding path relationship of graph based on multiple agent routes
US20130110830A1 (en) * 2011-10-31 2013-05-02 Microsoft Corporation Ranking of entity properties and relationships
CN102750375B (en) * 2012-06-21 2014-04-02 武汉大学 Service and tag recommendation method based on random walk
CN104376015A (en) * 2013-08-15 2015-02-25 腾讯科技(深圳)有限公司 Method and device for processing nodes in relational network
CN104731705B (en) * 2013-12-31 2017-09-01 北京理工大学 Based on a discovery dirty data channel complex network
CN104503978A (en) * 2014-11-26 2015-04-08 百度在线网络技术(北京)有限公司 Related entity recommending method and system
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
CN106294654B (en) * 2016-08-04 2018-01-19 首都师范大学 Method and system for ordering an ontological

Citations (3)

Publication number Priority date Publication date Assignee Title
US6480837B1 (en) 1999-12-16 2002-11-12 International Business Machines Corporation Method, system, and program for ordering search results using a popularity weighting
CN1489738A (en) 2001-01-26 2004-04-14 摩托罗拉公司 Storing data based on proximity
CN1659546A (en) 2001-03-19 2005-08-24 国际商业机器公司 Using continuous optimization for ordering categorical data sets in a data processing system

Also Published As

Publication number Publication date Type
CN101140588A (en) 2008-03-12 application

Similar Documents

Publication Publication Date Title
Marwick Knowledge management technology
Berners-Lee et al. Tabulator: Exploring and analyzing linked data on the semantic web
Jäschke et al. Tag recommendations in social bookmarking systems
Shen et al. Web service discovery based on behavior signatures
Xu et al. Exploring folksonomy for personalized search
Wang et al. Q2semantic: A lightweight keyword interface to semantic search
US7702685B2 (en) Querying social networks
Halpin et al. The complex dynamics of collaborative tagging
Liu et al. Combined mining of Web server logs and web contents for classifying user navigation patterns and predicting users’ future requests
Matsuo et al. POLYPHONET: an advanced social network extraction system from the web
US7065532B2 (en) System and method for evaluating information aggregates by visualizing associated categories
US20060155751A1 (en) System and method for document analysis, processing and information extraction
Luo et al. Building association link network for semantic link on web resources
Elmeleegy et al. Mashup advisor: A recommendation tool for mashup development
Hotho et al. BibSonomy: A social bookmark and publication sharing system
Sugumaran et al. Ontologies for conceptual modeling: their creation, use, and management
Mika Ontologies are us: A unified model of social networks and semantics
Wu et al. Harvesting social knowledge from folksonomies
US7103609B2 (en) System and method for analyzing usage patterns in information aggregates
Su et al. Semantic enrichment for ontology mapping
US7257569B2 (en) System and method for determining community overlap
US20120203734A1 (en) Automatic mapping of a location identifier pattern of an object to a semantic type using object metadata
Blake et al. A web service recommender system using enhanced syntactical matching
Wang et al. Recommendations based on semantically enriched museum collections
Wang et al. Ranking user's relevance to a topic through link analysis on web logs

Legal Events

Date Code Title Description
C06 Publication
C10 Request for substantive examination
C41 Transfer of patent application right or patent right
C14 Granted