CN104035917B - Knowledge graph management method and system based on semantic space mapping - Google Patents


Info

Publication number
CN104035917B
Authority
CN
China
Prior art keywords
semantic
vector
node
mapping
edge
Application number
CN201410253673.8A
Other languages
Chinese (zh)
Other versions
CN104035917A (en)
Inventor
王晓平
肖仰华
汪卫
Original Assignee
Fudan University (复旦大学)
Application filed by Fudan University
Priority to CN201410253673.8A
Publication of CN104035917A
Application granted
Publication of CN104035917B


Abstract

The invention belongs to the fields of text semantic processing and Semantic Web technology, and specifically relates to a knowledge graph management method and system based on semantic space mapping. The method comprises semantic vector construction, semantic space mapping, and knowledge graph management; knowledge graph management is further divided into three parts: semantic clustering, semantic deduplication, and semantic annotation. For each edge/node of the knowledge graph, the text units describing it are first projected into the semantic space, and its vector representation in that space is obtained by vector accumulation; on this basis, multiple knowledge graph management tasks are carried out. The system correspondingly comprises three modules: semantic vector construction, semantic space mapping, and knowledge graph management. The invention overcomes a drawback of traditional knowledge graph management methods, whose semantic comparisons are sensitive to word inflection, synonym substitution, changes of grammatical form and similar factors. Moreover, vector accumulation lets it easily handle descriptions with different numbers of words, making further knowledge graph management tasks such as semantic clustering, semantic deduplication and semantic annotation easy to implement.

Description

Knowledge graph management method and system based on semantic space mapping

Technical Field

[0001] The invention belongs to the fields of text semantic processing and Semantic Web technology, and specifically relates to a knowledge graph management method and system based on semantic space mapping.

Background

[0002] Building knowledge graphs is a major undertaking of the big-data era. A knowledge graph links disorganized data and organizes it into structured knowledge for users, which gives it important applications in many fields. For example, current search engines work by keyword matching; once a knowledge graph is in place, entering a keyword can return associated information such as the keyword's attributes, its category and its relationships with other entities, so users receive the information they need more accurately and completely. Knowledge graphs are the foundation of a range of applications such as semantic search, automatic question answering, Internet advertisement recommendation and personalized e-reading, and whether a knowledge graph can be managed effectively directly determines how much value it delivers in these areas.

[0003] However, current knowledge graph construction ultimately extracts a deterministic relation representation, and such deterministic descriptions adapt poorly to word inflection, synonym substitution and changes of grammatical form. For example, two semantically similar edges that are described with different words are treated as two completely different edges. This is not only unreasonable but also makes knowledge graph management tasks such as edge/node clustering, edge/node deduplication and edge/node annotation very difficult, which hampers the effective use of the knowledge graph.

Summary

[0004] To address the shortcomings of current knowledge graph management techniques, the invention proposes a knowledge graph management method and system based on semantic space mapping.

[0005] For each edge/node of the knowledge graph (i.e., an inter-entity relation or an entity), the text units describing it are first projected into the semantic space and accumulated, yielding the vector representation of that edge/node in the semantic space. Building on this semantic vectorization of text, several knowledge graph management tasks can then be performed: edges/nodes can be semantically clustered with a clustering method combined with a vector similarity measure, uncovering semantically similar relations/entities; on top of semantic clustering, semantic deduplication can be achieved by computing a representative edge/node that replaces each cluster; and relations/entities can be annotated automatically according to the semantic distance between a newly added edge/node and the annotated edge/node models.

[0006] The knowledge graph management method based on semantic space mapping proposed by the invention comprises the following steps: semantic vector construction, semantic space mapping, and knowledge graph management, where:

[0007] (1) Semantic vector construction proceeds as follows:

[0008] A semantic vector library is built from a corpus so that text units are mapped to vectors in the semantic space. The advantage is that the semantic similarity between text units can be compared through the distance between their vectors in the semantic space: semantically close words have semantic vectors that lie close together, which overcomes the effects of word inflection, synonym substitution and changes of grammatical form that hamper direct word-to-word comparison.

[0009] Semantic vectors can be computed by various methods, such as Word2Vec, ESA (Explicit Semantic Analysis), LSA (Latent Semantic Analysis), or co-occurrence word frequency features; preferably, the Word2Vec method is used (https://code.google.com/p/word2vec/; see also references [1, 2, 3]).

[0010] The training data for the semantic vectors is chosen on the principle of using a large-scale, encyclopedia-type corpus to ensure high coverage and domain independence. Preferably, the Wikipedia knowledge base (http://www.wikipedia.org/) is used as the corpus for training semantic vectors with Word2Vec, and the training results are used to build the semantic vector library that the other modules use during semantic mapping.
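The patent lists several ways to obtain semantic vectors and prefers Word2Vec trained on Wikipedia. Training Word2Vec does not fit in a short dependency-free example, so the sketch below instead illustrates the simpler co-occurrence word frequency option listed in [0009]: each word becomes a vector of counts of the words appearing near it. The two-sentence corpus, the function name `build_cooccurrence_library` and the window size of 2 are illustrative assumptions, not the patent's settings.

```python
from collections import Counter, defaultdict


def build_cooccurrence_library(sentences, window=2):
    """Build a toy semantic vector library from co-occurrence counts.

    Each word is represented by how often every vocabulary word appears
    within `window` positions of it (an illustrative stand-in for the
    Word2Vec vectors the patent prefers).
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: k for k, w in enumerate(vocab)}
    counts = defaultdict(Counter)
    for toks in tokenized:
        for pos, word in enumerate(toks):
            lo, hi = max(0, pos - window), min(len(toks), pos + window + 1)
            for ctx in range(lo, hi):
                if ctx != pos:
                    counts[word][toks[ctx]] += 1
    # Dense vectors with one dimension per vocabulary word.
    library = {}
    for word in vocab:
        vec = [0.0] * len(vocab)
        for ctx_word, n in counts[word].items():
            vec[index[ctx_word]] = float(n)
        library[word] = vec
    return library


corpus = [
    "shanghai is a large city in china",
    "new york is a large metropolis in the usa",
]
library = build_cooccurrence_library(corpus)
print(len(library["city"]))  # → 12 (one dimension per vocabulary word)
```

Raw co-occurrence vectors have one dimension per vocabulary word, so on a Wikipedia-scale corpus they become very large; the embodiment below instead uses 500-dimensional Word2Vec vectors.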

[0011] (2) Semantic space mapping

[0012] This step maps the text representing edges and nodes of the knowledge graph to vectors in the semantic space, as follows:

[0013] (2.1) Filter the words of each knowledge graph edge/node (inter-entity relation/entity), removing stop words that carry no semantics;

[0014] (2.2) For each word retained after the previous step, look up its projection vector in the semantic space from the pre-built semantic vector library, then sum the semantic vectors of these words to obtain the overall semantic vector characterizing the edge/node.
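Steps (2.1) and (2.2) amount to stop-word filtering followed by a component-wise vector sum. A minimal sketch, assuming a tiny hand-made stop-word list and a toy three-dimensional library (the patent specifies neither; the embodiment uses 500-dimensional Word2Vec vectors); the function name `map_to_semantic_space` is hypothetical:

```python
# Hypothetical stop-word list; the patent does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "in", "is"}


def map_to_semantic_space(text, library):
    """Map an edge/node description to a single semantic vector.

    Step (2.1): remove stop words. Step (2.2): look up each remaining word
    in the semantic vector library and sum the vectors component-wise.
    Words missing from the library are skipped (an assumption).
    """
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    vectors = [library[w] for w in words if w in library]
    if not vectors:
        return None  # nothing left to represent the edge/node
    return [sum(components) for components in zip(*vectors)]


# Hypothetical 3-d library standing in for trained Word2Vec vectors.
toy_library = {
    "large": [0.2, 0.1, 0.0],
    "city": [0.7, 0.3, 0.1],
    "metropolis": [0.8, 0.4, 0.1],
}
print(map_to_semantic_space("a large city", toy_library))
```

Because the word vectors are summed, "large city" (two content words) and "metropolis" (one) both yield a single vector of the same dimension, which is what lets later steps compare descriptions of different lengths.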

[0015] (3) Knowledge graph management is divided into three sub-steps: semantic clustering, semantic deduplication, and semantic annotation;

[0016] (3.1) Semantic clustering is further semantic mining on top of the constructed knowledge graph and is very important for managing it; it covers edge clustering (relation clustering) and node clustering (entity clustering). For edge clustering, one can cluster edges that connect different node pairs to find entity pairs with similar semantic relations; cluster the multiple edges of a single node to uncover the main categories of entities related to that node; or even cluster the multiple edges connecting the same node pair to uncover the main categories of relations between them. Node clustering finds semantically similar entities.

[0017] Semantic clustering proceeds as follows:

[0018] For the set of edges/nodes to be clustered, semantic space mapping is first performed using the pre-built semantic vector library, and the resulting semantic vectors are then clustered. Various clustering methods can be used, such as hierarchical clustering or k-means; preferably, hierarchical clustering is used. Various similarity measures can be used, such as cosine, city block, Euclidean, Mahalanobis, Minkowski, or Chebychev; preferably, cosine similarity is used:

[0019]

$$\mathrm{Sim}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$

[0020] where x and y are the two vectors being compared and Sim is the resulting cosine similarity.
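As a sketch of [0018] to [0020], the code below implements cosine similarity and a small agglomerative (hierarchical) clustering that cuts the dendrogram at a distance threshold. The patent does not state the linkage method or how distance is derived from similarity, so average linkage and distance = 1 - Sim are assumptions here, as are the function names, the toy vectors and the threshold value.

```python
import math


def cosine(x, y):
    """Cosine similarity Sim(x, y) = x.y / (|x| |y|); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def hierarchical_cluster(vectors, threshold):
    """Agglomerative clustering on cosine distance (1 - Sim), average linkage.

    Every vector starts in its own cluster; the two closest clusters are
    merged repeatedly until the closest pair is farther apart than
    `threshold` (i.e. the dendrogram is cut at that height).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average cosine distance over all cross-cluster pairs.
        dists = [1.0 - cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


# Hypothetical 2-d semantic vectors: items 0 and 2 point one way, 1 and 3 another.
vectors = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [0.1, 0.9]]
print(hierarchical_cluster(vectors, threshold=0.3))  # → [[0, 2], [1, 3]]
```

Because the linkage criterion is assumed, a threshold such as the embodiment's 0.8 will not necessarily reproduce the patent's clusters with this code. SciPy's `scipy.cluster.hierarchy` module provides the same procedure with several linkage choices.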

[0021] (3.2) Semantic deduplication

[0022] Knowledge graphs built from big data commonly exhibit the following situation: many different edges/nodes differ in their concrete representation (the text describing the relation/entity) yet express very similar or even identical semantic content, so redundant information grows along with the size of the knowledge graph. From a data-cleaning perspective, giving these edges/nodes a unified representation, i.e., performing semantic deduplication (edge deduplication and node deduplication), reduces the number of semantic edges/nodes (i.e., relations/entities) and yields a more compact representation of the knowledge graph.

[0023] Semantic deduplication proceeds as follows:

[0024] Given the semantic clustering results, for each set of edges/nodes grouped into the same cluster, a representative edge/node is computed to replace the original cluster members, reducing the redundancy of the semantic information. The representative is selected according to:

[0025]

$$\arg\max_{i} \mathrm{Sim}(v_i, V), \qquad V = \sum_{i} v_i$$

[0026] Here, $v_i$ is the semantic vector of the i-th relation/entity in the set to be merged, $V$ is the accumulated semantic vector of all relations/entities in the set, and Sim(a, b) denotes the similarity between vectors a and b. Various similarity measures can be used here, such as cosine, city block, Euclidean, Mahalanobis, Minkowski, or Chebychev; preferably, cosine similarity is used.

[0027] Deduplicating relations/entities by computationally selecting representative edges/nodes effectively reduces the storage space of the knowledge graph and gives it a compact representation without losing representativeness.
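The selection rule in [0024] to [0026] sums the cluster's vectors into V and keeps the member most similar to V. A minimal sketch using cosine similarity (the preferred measure); the function name `select_representative` and the three toy vectors are hypothetical, and indices are 0-based, unlike the 1-based numbering in the embodiment:

```python
import math


def cosine(x, y):
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def select_representative(cluster_vectors):
    """Return the index i that maximizes Sim(v_i, V), with V = sum of all v_i."""
    total = [sum(components) for components in zip(*cluster_vectors)]  # V
    return max(range(len(cluster_vectors)), key=lambda i: cosine(cluster_vectors[i], total))


# Hypothetical semantic vectors for three edges grouped into one cluster.
cluster = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8]]
print(select_representative(cluster))  # → 1
```

The member closest in direction to the accumulated vector acts as a centroid-like prototype, but unlike a plain centroid it is an existing edge/node, so the compacted graph still uses real relation/entity text.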

[0028] (3.3) Semantic annotation

[0029] The semantic similarity between an input edge/node and the known edge/node models is compared to determine which model the input corresponds to, and the input is then given the matching label from a predefined range of known types. This facilitates unified representation and management of the edges/nodes in the knowledge graph. Semantic annotation proceeds as follows:

[0030] (3.3.1) Edge/node model construction:

[0031] For the clustered edges/nodes, an edge/node model (i.e., a relation/entity model) is built from each cluster's set of semantic vectors. Various methods can be used to build the model, such as a mean vector model, a Gaussian model, an artificial neural network, or a support vector machine; preferably, the mean vector model is used. In addition, each relation/entity class is manually assigned its corresponding type label.

[0032]

$$\bar{v}_i = \frac{1}{m} \sum_{j=1}^{m} v_{i,j}$$

[0033] where $v_{i,j}$ denotes the j-th vector in class i, m is the number of samples in the class, and $\bar{v}_i$ is the mean vector.

[0034] Once built, the model is added to the edge/node model library.
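Building a mean vector model, the preferred option in [0031], is a component-wise average over a class's m vectors, after which the model is stored under its manually assigned label. A sketch assuming the model library is a plain dictionary from type label to mean vector; the function name and the vectors are toy, hypothetical values:

```python
def build_mean_vector_model(class_vectors):
    """Mean vector model: component-wise average of the m vectors in one class."""
    m = len(class_vectors)
    return [sum(components) / m for components in zip(*class_vectors)]


# Hypothetical model library mapping a manually assigned type label to its model.
model_library = {}
model_library["metropolitan area"] = build_mean_vector_model(
    [[0.8, 0.4], [0.6, 0.2]]  # toy vectors for two edges of one cluster
)
print(model_library)
```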

[0035] (3.3.2) Edge/node recognition

[0036] For an edge/node to be queried, after its semantic vector representation is obtained through the steps of the semantic space mapping module, the vector is compared in turn with each edge/node model in the model library. For example, with the mean vector model or the Gaussian model, one can directly compare the similarity between vectors or compute the probability that the input vector belongs to the model, and after traversing all models output the class with the highest value; with an artificial neural network or a support vector machine, the corresponding class is output directly.

[0037] Taking the mean vector model as an example, the output class Class is:

[0038]

$$\mathrm{Class} = \arg\max_{i \in \{1, 2, \ldots, N\}} \mathrm{Sim}(V, \bar{v}_i)$$

[0039] where V is the semantic vector to be recognized, $\bar{v}_i$ is the mean vector of edge/node class i, i ∈ {1, 2, …, N}, N is the number of models in the edge/node model library, and Sim(a, b) denotes the similarity between vectors a and b. Various similarity measures can be used here, such as cosine, city block, Euclidean, Mahalanobis, Minkowski, or Chebychev; preferably, cosine similarity is used.

[0040] (3.3.3) Edge/node semantic annotation

[0041] For the class output by the previous step, the corresponding pre-annotated type label is retrieved from the edge/node model library and assigned to the input edge/node, completing the semantic annotation process.
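Recognition and labeling with mean vector models reduce to an argmax over cosine similarities followed by a label lookup. The sketch below also applies a similarity threshold before assigning a label, following the 0.8 threshold used in the embodiment; returning no label below the threshold is an assumption, as are the function name `annotate`, the toy model vectors and the second label "head of state":

```python
import math


def cosine(x, y):
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def annotate(vector, model_library, threshold=0.8):
    """Recognize and label one edge/node vector against mean vector models.

    Class = argmax_i Sim(V, mean_i); the class label is returned only when
    the best similarity reaches `threshold`, otherwise None.
    """
    best_label, best_sim = None, float("-inf")
    for label, mean_vector in model_library.items():
        sim = cosine(vector, mean_vector)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return None, best_sim
    return best_label, best_sim


# Hypothetical library; the second label is invented for illustration.
library = {"metropolitan area": [0.7, 0.3], "head of state": [0.1, 0.9]}
label, sim = annotate([0.9, 0.35], library)
print(label, round(sim, 4))
```

For a Gaussian model the similarity would be replaced by the probability that the vector belongs to each model; for a neural network or support vector machine the classifier outputs the class directly, so no loop over models is needed.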

[0042] The invention further provides a knowledge graph management system based on semantic space mapping that corresponds to the method above. The system consists of three modules: a semantic vector construction module, a semantic space mapping module, and a knowledge graph management module. The knowledge graph management module in turn comprises three submodules: a semantic clustering submodule, a semantic deduplication submodule, and a semantic annotation submodule.

[0043] Details are as follows:

[0044] (1) Semantic vector construction module:

[0045] This module builds a semantic vector library from a corpus so that text units are mapped to vectors in the semantic space. The advantage is that the semantic similarity between text units can be compared through the distance between their vectors in the semantic space: semantically close words have semantic vectors that lie close together, which overcomes the effects of word inflection, synonym substitution and changes of grammatical form that hamper direct word-to-word comparison.

[0046] The training data for the semantic vectors is chosen on the principle of using a large-scale, encyclopedia-type corpus to ensure high coverage and domain independence. Preferably, the Wikipedia knowledge base (http://www.wikipedia.org/) is used as the corpus for training semantic vectors with Word2Vec, and the training results are used to build the semantic vector library that the other modules use during semantic mapping.

[0047] (2) Semantic space mapping module, as follows:

[0048] This module maps the text representing edges and nodes of the knowledge graph to vectors in the semantic space:

[0049] (2.1) Filter the words of each knowledge graph edge/node (inter-entity relation/entity), removing stop words that carry no semantics;

[0050] (2.2) For each word retained after the previous step, look up its projection vector in the semantic space from the pre-built semantic vector library, then sum the semantic vectors of these words to obtain the overall semantic vector characterizing the edge/node.

[0051] (3) Knowledge graph management module, as follows:

[0052] This module performs knowledge graph management and comprises three submodules: a semantic clustering submodule, a semantic deduplication submodule, and a semantic annotation submodule, corresponding respectively to the three sub-steps of the knowledge graph management step;

[0053] (3.1) Semantic clustering submodule

[0054] Semantic clustering is further semantic mining on top of the constructed knowledge graph and is very important for managing it; it covers edge clustering (relation clustering) and node clustering (entity clustering). For edge clustering, one can cluster edges that connect different node pairs to find entity pairs with similar semantic relations; cluster the multiple edges of a single node to uncover the main categories of entities related to that node; or even cluster the multiple edges connecting the same node pair to uncover the main categories of relations between them. Node clustering finds semantically similar entities;

[0055] (3.2) Semantic deduplication submodule

[0056] Knowledge graphs built from big data commonly exhibit the following situation: many different edges/nodes differ in their concrete representation (the text describing the relation/entity) yet express very similar or even identical semantic content, so redundant information grows along with the size of the knowledge graph. From a data-cleaning perspective, giving these edges/nodes a unified representation, i.e., performing semantic deduplication (edge deduplication and node deduplication), reduces the number of semantic edges/nodes (i.e., relations/entities) and yields a more compact representation of the knowledge graph.

[0057] Semantic deduplication works as follows:

[0058] Given the semantic clustering results, for each set of edges/nodes grouped into the same cluster, a representative edge/node is computed to replace the original cluster members, reducing the redundancy of the semantic information. The representative is selected according to:

[0059]

$$\arg\max_{i} \mathrm{Sim}(v_i, V), \qquad V = \sum_{i} v_i$$

[0060] Here, $v_i$ is the semantic vector of the i-th relation/entity in the set to be merged, $V$ is the accumulated semantic vector of all relations/entities in the set, and Sim(a, b) denotes the similarity between vectors a and b. Various similarity measures can be used here, such as cosine, city block, Euclidean, Mahalanobis, Minkowski, or Chebychev; preferably, cosine similarity is used;

[0061] Deduplicating relations/entities by computationally selecting representative edges/nodes effectively reduces the storage space of the knowledge graph and gives it a compact representation without losing representativeness;

[0062] (3.3) Semantic annotation submodule

[0063] This submodule compares the semantic similarity between an input edge/node and the known edge/node models to determine which model the input corresponds to, and then gives it the matching label from a predefined range of known types. This facilitates unified representation and management of the edges/nodes in the knowledge graph. The submodule works as follows:

[0064] (3.3.1) Edge/node model construction:

[0065] For the clustered edges/nodes, an edge/node model (i.e., a relation/entity model) is built from each cluster's set of semantic vectors. Various methods can be used to build the model, such as a mean vector model, a Gaussian model, an artificial neural network, or a support vector machine; preferably, the mean vector model is used. In addition, each relation/entity class is manually assigned its corresponding type label.

[0066]

$$\bar{v}_i = \frac{1}{m} \sum_{j=1}^{m} v_{i,j}$$

[0067] where $v_{i,j}$ denotes the j-th vector in class i, m is the number of samples in the class, and $\bar{v}_i$ is the mean vector.

[0068] Once built, the model is added to the edge/node model library.

[0069] (3.3.2) Edge/node recognition

[0070] For an edge/node to be queried, after its semantic vector representation is obtained through the steps of the semantic space mapping module, the vector is compared in turn with each edge/node model in the model library. For example, with the mean vector model or the Gaussian model, one can directly compare the similarity between vectors or compute the probability that the input vector belongs to the model, and after traversing all models output the class with the highest value; with an artificial neural network or a support vector machine, the corresponding class is output directly.

[0071] Taking the mean vector model as an example, the output class Class is:

[0072]

$$\mathrm{Class} = \arg\max_{i \in \{1, 2, \ldots, N\}} \mathrm{Sim}(V, \bar{v}_i)$$

[0073] where V is the semantic vector to be recognized, $\bar{v}_i$ is the mean vector of edge/node class i, i ∈ {1, 2, …, N}, N is the number of models in the edge/node model library, and Sim(a, b) denotes the similarity between vectors a and b. Various similarity measures can be used here, such as cosine, city block, Euclidean, Mahalanobis, Minkowski, or Chebychev; preferably, cosine similarity is used.

[0074] (3.3.3) Edge/node semantic annotation

[0075] For the class output by the previous step, the corresponding pre-annotated type label is retrieved from the edge/node model library and assigned to the input edge/node, completing the semantic annotation process.

[0076] Advantageous effects of the invention

[0077] By mapping the text that represents knowledge graph edges/nodes to semantic vectors, the invention overcomes the drawback of traditional knowledge graph management methods whose semantic comparisons are sensitive to word inflection, synonym substitution, changes of grammatical form and similar factors. Vector accumulation also lets it easily handle descriptions with different numbers of words, making further knowledge graph management tasks such as semantic clustering, semantic deduplication and semantic annotation easy to implement; this increases processing flexibility and improves the accuracy of semantic comparison.

Brief Description of the Drawings

[0078] Figure 1: System module diagram.

[0079] Figure 2: Hierarchical clustering result (edge clustering). The horizontal axis is the index of the entity pair; the vertical axis is the inter-cluster distance.

[0080] Figure 3: Hierarchical clustering result (node clustering). The horizontal axis is the entity index; the vertical axis is the inter-cluster distance.

[0081] Figure 4: Semantic deduplication, representative edge selection. The horizontal axis is the entity index; the vertical axis is the similarity.

Detailed Description of Embodiments

[0082] The following example demonstrates a specific embodiment of the invention; the system modules process the data in turn as follows:

[0083] (1) Semantic vector construction

[0084] Word2Vec was trained on the text corpus of the entire English Wikipedia (http://www.wikipedia.org/), producing 500-dimensional output vectors.

[0085] (2) Semantic space mapping

[0086] For the words of each edge/node, after stop words are removed, the corresponding semantic vectors are retrieved from the trained semantic vector library and summed, giving the semantic vector representation of that edge/node.

[0087] (3) Semantic clustering

[0088] (3.1) Edge semantic clustering

[0089] Example input, in the format:

[0090] Index: {node 1}, {edge}, {node 2}

[0091] 1: {Shanghai}, {large city}, {China}

[0092] 2: {ipad}, {product}, {Apple}

[0093] 3: {Barack Obama}, {president}, {USA}

[0094] 4: {Kindle}, {manufacture}, {Amazon}

[0095] 5: {New York}, {metropolis}, {USA}

[0096] 6: {Dmitry Medvedev}, {Prime Minister}, {Russia}

[0097] The hierarchical clustering result (edge clustering) is shown in Figure 2.

[0098] With a threshold of 0.8, the clustering result is as follows:

[0099] Cluster 1: 2, 4

[0100] Cluster 2: 1, 5

[0101] Cluster 3: 3, 6

[0102] The clustering result is correct.

[0103] (3.2) Node semantic clustering

[0104] Six input nodes:

[0105] 1: {tuna}

[0106] 2: {tiger}

[0107] 3: {leopard}

[0108] 4: {car}

[0109] 5: {fish}

[0110] 6: {train}

[0111] The hierarchical clustering result (node clustering) is shown in Figure 3.

[0112] With a threshold of 0.8, the clustering result is as follows:

[0113] Cluster 1: 1, 5

[0114] Cluster 2: 2, 3

[0115] Cluster 3: 4, 6

[0116] The clustering result is correct.

[0117] (4) Semantic deduplication

[0118] For example, for two nodes of the knowledge graph, {Bill Gates} and {Microsoft}, the following edges between them were grouped into the same cluster by semantic clustering:

[0119] 1 : {CEO} [0119] 1: {CEO}

[0120] 2: {executives} [0120] 2: {executives}

[0121] 3: {president} [0121] 3: {president}

[0122] 4: {chief executive officer} [0122] 4: {chief executive officer}

[0123] 5: {current chairman} [0123] 5: {current chairman}

[0124] 6: {chairman} [0124] 6: {chairman}

[0125] 7 : {chair} [0125] 7: {chair}

[0126] 语义去重-典型边选取,见图4所示。 [0126] Semantic deduplication - selecting edges Typically, as shown in Figure 4.

[0127] 将所有这些边的语义向量累加后得到总体语义表征向量,然后依次计算各条边与该总体语义表征向量的相似度,并选取相似度最大的为典型边,序号为6,即{chairman},这样,仅用1条典型边就取代了原先被聚成同一类的7条边,达到了知识图谱精简表示、减少存储空间且不失代表性的目的。 [0127] After all of these sides semantic vector obtained cumulative overall semantic characterization vector, followed by calculating the similarity to the respective sides of the overall semantic characterization vector, and selecting the greatest similarity as a typical edge, number 6, i.e., { chairman}, so that, only one side will typically replace the original is clustered into the same class, edge 7, to map the compact representation of knowledge, to reduce the storage space and yet representative object.
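The typical-edge rule in [0127], Typical = argmax over k of Sim(V_k, V) with V the sum of all V_k, can be sketched as follows. Cosine similarity is assumed for Sim, the two-dimensional vectors are invented so that the rule's behaviour is easy to check by hand, and `select_typical` is a hypothetical name.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_typical(edges: List[Tuple[str, List[float]]]) -> str:
    """Return the label whose vector is most similar to the summed vector V."""
    # V: element-wise sum of every semantic vector in the cluster.
    total = [sum(column) for column in zip(*(vec for _, vec in edges))]
    # argmax_k Sim(V_k, V); ties resolve to the first such edge.
    best_label, _ = max(edges, key=lambda item: cosine(item[1], total))
    return best_label


# Hypothetical vectors; only the selection rule is illustrated.
cluster = [
    ("executives", [1.0, 0.0]),
    ("chair", [0.0, 1.0]),
    ("chairman", [1.0, 1.0]),
]
print(select_typical(cluster))  # chairman
```

Because cosine similarity ignores vector length, using the sum V or the mean V/n as the reference gives the same choice of typical edge.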

[0128] (5) Semantic annotation

[0129] For example, consider the set of edges of one relation class produced by clustering:

[0130] 1: {large city}

[0131] 2: {metropolis}

[0132] 3: {megacity}

[0133] 4: {major city}

[0134] 5: {big cities}

[0135] 6: {megacities}

[0136] 7: {mega cities}

[0137] A mean vector model is built from the corresponding set of semantic vectors, and the model's type label is set to "metropolitan area".

[0138] For a newly input edge {big city}, the similarity between its semantic vector and the edge model is computed:

[0139] Sim = 0.8434

[0140] With the threshold set to 0.8, the input edge is considered to express the same semantics as this class of edges, so the model type label "metropolitan area" is assigned to the input edge, completing the semantic annotation process. The benefit is that, by comparing how similar the input edge is to the edge models, each input edge receives a label from a predefined range of known types, which facilitates the unified representation and management of edges in the knowledge graph.
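The mean vector model and the 0.8 threshold test of [0137] to [0140] can be sketched as below, assuming cosine similarity for Sim. The helper names `mean_vector` and `annotate` and all vectors are hypothetical; the sketch does not reproduce the patent's value Sim = 0.8434, which depends on real Word2Vec vectors.

```python
import math
from typing import List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Mean vector model: element-wise average of one class's vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def annotate(vector: List[float], model: List[float], label: str,
             threshold: float = 0.8) -> Optional[str]:
    """Return `label` if the input is at least `threshold`-similar to the model."""
    return label if cosine(vector, model) >= threshold else None


# Hypothetical vectors for one clustered edge class.
metro_model = mean_vector([[1.0, 0.0], [0.0, 1.0]])
print(metro_model)                                              # [0.5, 0.5]
print(annotate([1.0, 1.0], metro_model, "metropolitan area"))   # metropolitan area
print(annotate([1.0, 0.0], metro_model, "metropolitan area"))   # None (cos is about 0.707)
```

Returning `None` below the threshold leaves the edge unlabelled; how the patent treats such edges is not described in this section.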

[0141] References

[0142] [1] Tomas Mikolov, et al. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.

[0143] [2] Tomas Mikolov, et al. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.

[0144] [3] Tomas Mikolov, et al. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.

Claims (3)

1. A knowledge graph management method based on semantic space mapping, characterized in that it comprises the steps of semantic vector construction, semantic space mapping, and knowledge graph management, wherein:
(1) Semantic vector construction: a semantic vector library is built from a corpus so that text units are mapped to vectors in the semantic space. The Wikipedia knowledge base is used as the training corpus for learning semantic vectors with the Word2Vec method, and the training results are used to build the semantic vector library;
(2) Semantic space mapping: the text representing the edges/nodes of the knowledge graph is mapped to vectors in the semantic space, as follows:
(2.1) the words in each edge/node of the knowledge graph are filtered to remove stop words that carry no semantics;
(2.2) for each word retained after step (2.1), its projection vector in the semantic space is retrieved from the semantic vector library already built, and the semantic vectors of these words are summed to obtain the overall semantic vector characterizing the edge/node;
(3) Knowledge graph management is divided into three sub-steps: semantic clustering, semantic deduplication, and semantic annotation;
(3.1) Semantic clustering: for the set of edges/nodes to be clustered, semantic space mapping is first performed using the constructed semantic vector library, and the resulting semantic vectors are then clustered;
(3.2) Semantic deduplication: for the result of semantic clustering, within each set of edges/nodes grouped into the same class, a typical edge/typical node is computed to replace the original elements of the class, reducing the redundancy of semantic information. It is selected according to:
$$\mathrm{Typical} = \arg\max_{k} \operatorname{Sim}(V_k, V)$$
that is, Typical is the k at which the function attains its maximum, and denotes the selected typical edge or typical node. Here $V_k$ is the semantic vector of the k-th relation/entity in the set to be merged, $V$ is the accumulated semantic vector of all relations/entities in the set to be merged, and $\operatorname{Sim}(a, b)$ denotes the similarity between vectors a and b;
(3.3) Semantic annotation, as follows:
(3.3.1) Edge/node model construction: for the clustered edges/nodes, an edge/node model is built from the corresponding set of semantic vectors; at the same time, a type label is manually assigned to each class of relations/entities. For the mean vector model:
$$\bar{v}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} v_{ij}$$
where $\bar{v}_i$ is the mean vector of the i-th class of edges/nodes, $i \in \{1, 2, \dots, N\}$, N is the number of models in the edge/node model library, $v_{ij}$ is the j-th vector in class i, and $n_i$ is the number of samples in that class. Once a model is built, it is added to the edge/node model library;
(3.3.2) Edge/node identification: for an edge/node to be queried, after its semantic vector representation has been obtained through the semantic space mapping step, the vector is compared in turn with the edge/node models in the model library. For mean vector models and Gaussian models, the similarity between vectors, or the probability that the input vector belongs to the model, can be computed directly, and after all models have been traversed the class with the highest value is output. For artificial neural networks and support vector machines, the corresponding class is output directly;
(3.3.3) Edge/node semantic annotation: for the class output in step (3.3.2), the corresponding pre-assigned type label is retrieved from the edge/node model library and assigned to the input edge/node, completing the semantic annotation process.
2. The knowledge graph management method based on semantic space mapping according to claim 1, characterized in that in step (3.3.2), when mean vector models are used, the output class is:
$$\hat{i} = \arg\max_{i \in \{1, \dots, N\}} \operatorname{Sim}(V, \bar{v}_i)$$
that is, the class i at which $\operatorname{Sim}(V, \bar{v}_i)$ attains its maximum, where $V$ is the semantic vector to be identified and $\operatorname{Sim}(a, b)$ denotes the similarity between vectors a and b.
3. A knowledge graph management system based on semantic space mapping that uses the method of claim 1, characterized in that it consists of the following three modules: a semantic vector construction module for performing step (1), a semantic space mapping module for performing step (2), and a knowledge graph management module for performing step (3), wherein the knowledge graph management module comprises three sub-modules: a semantic clustering sub-module for performing step (3.1), a semantic deduplication sub-module for performing step (3.2), and a semantic annotation sub-module for performing step (3.3).
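The identification rule of claim 2, which traverses the model library and outputs the class whose mean vector is most similar to the query vector, followed by the label lookup of step (3.3.3), can be sketched as below. Cosine similarity is assumed for Sim; the dictionary keyed by type label is a hypothetical stand-in for the edge/node model library, and both labels other than "metropolitan area" and all vectors are invented for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(vector: List[float], model_library: Dict[str, List[float]]) -> str:
    """Return the type label of the mean-vector model maximising Sim(V, v_i).

    Keys play the role of the pre-assigned type labels (step 3.3.3), so the
    argmax over models directly yields the label to attach to the input.
    """
    return max(model_library, key=lambda label: cosine(vector, model_library[label]))


# Hypothetical model library of mean vectors.
library = {
    "metropolitan area": [0.5, 0.5, 0.0],
    "head of state": [0.0, 0.1, 1.0],
}
print(identify([0.6, 0.4, 0.1], library))  # metropolitan area
```

Unlike the single-model threshold test in the description, this argmax always returns some class; combining it with a minimum-similarity cut-off would be a design choice outside what claim 2 states.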

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410253673.8A CN104035917B (en) 2014-06-10 2014-06-10 Map of knowledge management method and system based on semantic space mapping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410253673.8A CN104035917B (en) 2014-06-10 2014-06-10 Map of knowledge management method and system based on semantic space mapping

Publications (2)

Publication Number Publication Date
CN104035917A CN104035917A (en) 2014-09-10
CN104035917B true CN104035917B (en) 2017-07-07

Family

ID=51466688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410253673.8A CN104035917B (en) 2014-06-10 2014-06-10 Map of knowledge management method and system based on semantic space mapping

Country Status (1)

Country Link
CN (1) CN104035917B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462506A (en) * 2014-12-19 2015-03-25 北京奇虎科技有限公司 Method and device for establishing knowledge graph based on user annotation information
CN104794163B (en) * 2015-03-25 2018-07-13 中国人民大学 Entity sets extended method
CN104866593B (en) * 2015-05-29 2018-05-22 中国电子科技集团公司第二十八研究所 A kind of database search method of knowledge based collection of illustrative plates
CN105550190B (en) * 2015-06-26 2019-03-29 许昌学院 Cross-media retrieval system towards knowledge mapping
CN105335519A (en) * 2015-11-18 2016-02-17 百度在线网络技术(北京)有限公司 Model generation method and device as well as recommendation method and device
CN105740329B (en) * 2016-01-21 2019-04-05 浙江万里学院 A kind of contents semantic method for digging of unstructured high amount of traffic
CN105808931B (en) * 2016-03-03 2019-05-07 北京大学深圳研究生院 A kind of the acupuncture decision support method and device of knowledge based map
CN105787105B (en) * 2016-03-21 2019-04-19 浙江大学 A kind of Chinese encyclopaedic knowledge map classification system construction method based on iterative model
CN105824802B (en) * 2016-03-31 2018-10-30 清华大学 It is a kind of to obtain the method and device that knowledge mapping vectorization indicates
CN105956052A (en) * 2016-04-27 2016-09-21 青岛海尔软件有限公司 Building method of knowledge map based on vertical field
CN106446148B (en) * 2016-09-21 2019-08-09 中国运载火箭技术研究院 A kind of text duplicate checking method based on cluster
CN106776564A (en) * 2016-12-21 2017-05-31 张永成 Semantic recognition method and system based on knowledge graph
CN106874378A (en) * 2017-01-05 2017-06-20 北京工商大学 Method for constructing knowledge graph based on entity extraction and relationship mining of rule model
CN107038261B (en) * 2017-05-28 2019-09-20 海南大学 A kind of processing framework resource based on data map, Information Atlas and knowledge mapping can Dynamic and Abstract Semantic Modeling Method
CN107103100B (en) * 2017-06-10 2019-07-30 海南大学 A kind of fault-tolerant intelligent semantic searching method based on map framework
CN110059271A (en) * 2019-06-19 2019-07-26 达而观信息科技(上海)有限公司 With the searching method and device of label knowledge network

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101079072A (en) * 2007-06-22 2007-11-28 中国科学院研究生院 Text clustering element study method and device
CN102646113A (en) * 2012-02-17 2012-08-22 清华大学 Method for semantic relativity between measurement concepts based on Wikipedia

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
DK200301926A (en) * 2003-12-23 2005-06-24 Eskebaek Thomas Knowledge Operating system with ontologibaserede methods for extracting information and sogen for knowledge


Also Published As

Publication number Publication date
CN104035917A (en) 2014-09-10

Similar Documents

Publication Publication Date Title
Botha et al. Compositional morphology for word representations and language modelling
CN101449271B (en) Annotate search
Augenstein et al. Lodifier: Generating linked data from unstructured text
TW201222291A (en) Method and device for providing text segmentation results with multiple granularity levels
Sun et al. Modeling mention, context and entity with neural networks for entity disambiguation
CN101510221A (en) Enquiry statement analytical method and system for information retrieval
EP2469421A1 (en) Method and apparatus for processing electronic data
CN104615589A (en) Named-entity recognition model training method and named-entity recognition method and device
US9613024B1 (en) System and methods for creating datasets representing words and objects
CN104834747A (en) Short text classification method based on convolution neutral network
JP5936698B2 (en) Word semantic relation extraction device
CN101901235A (en) Method and system for document processing
US20130326325A1 (en) Annotating Entities Using Cross-Document Signals
Vu et al. An experiment in integrating sentiment features for tech stock prediction in twitter
CN103679462A (en) Comment data processing method and device and searching method and system
US20120243789A1 (en) Fast image classification by vocabulary tree based image retrieval
Ren et al. Label noise reduction in entity typing by heterogeneous partial-label embedding
CN104376406A (en) Enterprise innovation resource management and analysis system and method based on big data
Devika et al. Sentiment analysis: a comparative study on different approaches
Gao et al. Stability analysis of learning algorithms for ontology similarity computation
CN102903008A (en) Method and system for computer question answering
Nagy et al. Multiagent ontology mapping framework for the semantic web
CN107257970A (en) The problem of being carried out from structuring and unstructured data sources answer
CN103049435A (en) Text fine granularity sentiment analysis method and text fine granularity sentiment analysis device
CN103455562A (en) Text orientation analysis method and product review orientation discriminator on basis of same

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
GR01