WO2023087463A1 - Knowledge base completion method, terminal device, and computer storage medium - Google Patents

Knowledge base completion method, terminal device, and computer storage medium Download PDF

Info

Publication number
WO2023087463A1
WO2023087463A1 PCT/CN2021/138475 CN2021138475W
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge base
entities
entity
class
completion method
Prior art date
Application number
PCT/CN2021/138475
Other languages
English (en)
French (fr)
Inventor
杨之乐
郭媛君
王猛
吴承科
王尧
冯伟
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院 filed Critical 深圳先进技术研究院
Publication of WO2023087463A1 publication Critical patent/WO2023087463A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation

Definitions

  • the present application relates to the technical field of natural language processing, in particular to a knowledge base completion method, a terminal device and a computer storage medium.
  • Constraint management addresses factors, such as materials, equipment, labor, and permits, that restrict the smooth progress of construction projects. In every respect, good constraint management can effectively improve the efficiency with which construction projects proceed.
  • the present application provides a knowledge base completion method, a terminal device and a computer storage medium.
  • the present application provides a method for completing a knowledge base, and the method for completing a knowledge base includes:
  • obtaining project data; establishing a mapping relationship between entities and ontology classes in the project data; constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes; and iteratively restoring the information of entities in the knowledge representation to form a complete knowledge base.
  • after the mapping relationship between entities and ontology classes in the project data is established, the method further includes: establishing a mapping relationship between attributes and attribute classes in the project data; and constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes and the mapping relationship between the attributes and the attribute classes;
  • the attribute classes include a numerical class and a Boolean class.
  • after the mapping relationship between entities and ontology classes in the project data is established, the method further includes: obtaining newly added project data; calculating the similarities between the entities in the newly added project data and all existing ontology classes; and adding the entities in the newly added project data to the existing ontology class with the highest similarity.
  • s represents the similarity between an entity in the newly added project data and an existing ontology class, calculated from the entity vector in the newly added project data and the vector of the existing ontology class.
  • the iterative recovery of entity information in the knowledge representation includes:
  • obtaining the position, in the knowledge representation, of the target node corresponding to the entity; sampling the neighborhood nodes of the target node based on an attention mechanism to obtain neighborhood node information; and iterating the target node with the neighborhood node information to restore the information of the entity corresponding to the target node.
  • the knowledge base completion method also includes:
  • the neighborhood node information is obtained by taking a weighted average of the neighborhood nodes of the target node according to their different attention values.
  • the knowledge base completion method also includes:
  • the triplets whose feature scores are higher than or equal to the preset threshold are retained, and the triplets whose feature scores are lower than the preset threshold are eliminated.
  • the relationship between the entities includes constraint relationship, task relationship and/or attribute relationship.
  • the present application also provides a terminal device, where the terminal device includes a memory and a processor, wherein the memory is coupled to the processor;
  • the memory is used for storing program data
  • the processor is used for executing the program data to realize the above knowledge base completion method.
  • the present application also provides a computer storage medium, the computer storage medium is used for storing program data, and when the program data is executed by a processor, it is used to implement the above knowledge base completion method.
  • the terminal device obtains project data; establishes the mapping relationship between entities and ontology classes in project data; constructs knowledge representation based on the mapping relationship between entities and ontology classes; iteratively restores entity information in knowledge representation to form a complete knowledge base.
  • the knowledge base completion method of the present application restores the lost entity information, completes the project knowledge base, and improves the management efficiency of project data.
  • Fig. 1 is a schematic flow chart of an embodiment of the knowledge base completion method provided by the present application
  • FIG. 2 is a schematic flowchart of another embodiment of the knowledge base completion method provided by the present application.
  • FIG. 3 is a schematic flowchart of another embodiment of the knowledge base completion method provided by the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of a terminal device provided by the present application.
  • FIG. 5 is a schematic structural diagram of another embodiment of a terminal device provided by the present application.
  • Fig. 6 is a schematic structural diagram of an embodiment of a computer storage medium provided by the present application.
  • the knowledge base completion method provided in this application can be applied to the field of construction management, that is, project data of construction management is managed through a knowledge base. In other embodiments, it can also be applied to project management in other fields, such as aviation data management, transportation data management, etc. In the following embodiments, the knowledge base completion method of the present application will be described in detail using project data of construction management.
  • This application proposes a knowledge base based on graph deep learning; the graph deep learning framework has strong self-learning ability, generalization ability, and robustness.
  • Given that construction projects are gradually becoming more complex and that people need unified and efficient management of materials, equipment, and tasks on the construction site, this method can effectively improve the efficiency with which people manage construction projects.
  • FIG. 1 is a schematic flowchart of an embodiment of a knowledge base completion method provided by the present application.
  • the knowledge base completion method of the embodiment of the present application is suitable for knowledge bases that are incomplete or lack certain features.
  • a knowledge base completion model based on graph deep learning is used to introduce domain knowledge and ontology rules, so as to enrich data sources and improve model accuracy.
  • A graph neural network is first introduced in the word-embedding stage to encode entities and the relationships between entities, and a decoder based on a graph convolutional neural network is then introduced to identify missing entity-relationship-entity triple information.
  • by introducing two kinds of graph neural networks and other auxiliary information to build the model, the knowledge base completion method of this application improves the accuracy of the model on related data sets and effectively shortens the time for constructing a knowledge base in the construction field; it has good self-learning and generalization abilities for large construction-field knowledge bases, as well as good scalability.
  • the knowledge base completion method of the present application is applied to a terminal device, wherein the terminal device of the present application may be a server, or may be a system in which a server and a terminal device cooperate with each other.
  • various parts included in the terminal device such as various units, subunits, modules, and submodules, may all be set in the server, or may be set in the server and the terminal device separately.
  • the above server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules, for example software or software modules used to provide a distributed server, or as a single piece of software or a single software module, which is not specifically limited here.
  • the knowledge base completion method in the embodiment of the present application may be implemented in a manner in which a processor invokes computer-readable instructions stored in a memory.
  • the knowledge base completion method in the embodiment of the present application specifically includes the following steps:
  • Step S11 Obtain project data.
  • the project data may be specific data of an engineering project in the construction field, such as floor height, indoor area, stair position, and the like.
  • Step S12 Establish the mapping relationship between entities and ontology classes in the project data.
  • the terminal device maps entities in the project data to domain classes, i.e., ontology classes.
  • the purpose of class mapping is to establish a belonging relationship between entities and ontology classes.
  • the initial ontology in the knowledge base only has three categories: constraints, tasks, and project participants.
  • the embodiment of this application can also introduce attribute classes into the ontology.
  • because attribute data is relatively sparse, adding all of the attribute data to the model would degrade the performance of the knowledge base completion model.
  • the embodiment of the present application only introduces two subclasses of the attribute class, namely the numerical class and the Boolean class.
  • the class mapping of attribute data is straightforward; for example, regular expressions are used to directly extract attributes containing numerical values, such as dates, as well as true/false characters, and these attributes are mapped to the numerical class and the Boolean class respectively.
  • the class mapping of other entities relies on semantic similarity; the terminal device first needs to extract word-embedding representations of the entities and the ontology classes to form a vector space model (VSM). Then, the terminal device creates a list for each ontology class, which includes the class name and its synonyms obtained from an engineering dictionary. For the relationships between entities, the terminal device can search for the representation of each name in the VSM, and finally judge the semantic relationship between two entities by calculating the cosine similarity between the vectors corresponding to the entity and the class.
  • the knowledge base completion method in the embodiment of the present application specifically includes the following steps:
  • Step S21 Obtain newly added item data.
  • the newly added project data is project data whose entity or class names cannot be found in the VSM.
  • the terminal device splits the names of these project data into several parts through tokenization, looks up the embedding of each part in the VSM, and finally averages these embeddings to obtain the final result, as detailed below:
  • Step S22 Calculate the similarities between the entities in the newly added item data and all existing ontology classes.
  • the terminal device needs to calculate the similarity between each existing ontology class in the current knowledge base and the entity in the newly added project data.
  • the specific similarity calculation formula is as follows:
  • s represents the similarity between an entity in the newly added project data and an existing ontology class, calculated from the entity vector in the newly added project data and the vector of the existing ontology class.
  • Step S23 Add the entity in the newly added item data to the existing ontology class with the highest similarity.
  • the terminal device compares the similarity between each existing ontology class and the entity in the newly added item data, and adds the entity to the existing ontology class with the highest similarity.
  • Step S13 Construct knowledge representation based on the mapping relationship between entities and ontology classes.
  • rule-based data enrichment first expresses all rules with basic relations, and then finds the rule bodies that satisfy the conditions for building triples; these rule bodies can be used to build new triples and enrich the semantics of relations, and the newly inferred triples are also fed back into the semantic enrichment process.
  • the terminal device can construct an encoder based on the knowledge base representation of the graph neural network, and embed entities and relationships into the model.
  • the detailed process is as follows:
  • the terminal device defines the knowledge representation KB = (E, R, T), which is composed of entities E, relations R, and valid triples T.
  • the goal of the knowledge representation model is to predict missing triples by classifying potential triples.
  • Step S14 Iteratively restore entity information in knowledge representation to form a complete knowledge base.
  • the terminal device adopts the neighborhood node sampling based on the attention mechanism.
  • the attention mechanism is used to calculate the importance of all neighborhood nodes of each target node before they are sampled, which can effectively improve the accuracy of the knowledge representation model.
  • the terminal device adopts multi-head information aggregation: after each iteration, the embedding of a node is calculated by summing the embeddings of all triples of the node set, weighted according to the attention values. Furthermore, multiple attention values are used to stabilize the encoding process and collect more neighborhood information. The embodiment of the present application averages multiple attention results so as to merge the information and save computing power.
  • when the terminal device collects information through the graph neural network, the information in the original embeddings of the nodes may be lost.
  • this loss has a negative impact on the knowledge representation model, especially when the original embeddings are not randomly initialized. Therefore, the terminal device can transform the original embedding of each node with a matrix and then add it to the entity after the last iteration, so as to recover the information of that entity.
  • the specific formula is as follows:
  • W0 represents the transformation matrix, and the other terms in the formula represent the neighborhood node information and the target node, respectively.
  • FIG. 3 is a schematic flowchart of another embodiment of the knowledge base completion method provided by the present application.
  • the knowledge base completion method in the embodiment of the present application specifically includes the following steps:
  • Step S31 Construct a triple based on the entities in the project data and the relationships between the entities.
  • the relationship between entities used to construct triples includes but not limited to the following relationships: constraint relationship, task relationship and attribute relationship.
  • Step S32 Calculate feature scores of triplets.
  • the decoder ConvKB uses a two-dimensional convolutional neural network filter to scan triplets to extract features embedded in triplets.
  • the decoder ConvKB presents the extracted features in the form of scores, and this feature score reflects the possibility that the triplet is valid.
  • the specific formula of feature score is as follows:
  • Concat represents the concatenation function; Relu represents the activation function used to extract the vector features of the triples; τ represents the convolution kernel; L represents the number of convolution kernels; the remaining vectors represent the i-th entity, the j-th entity, and the relation between them; K represents the neighborhood order; and v represents the transpose of the result vector of the concatenation function.
  • Step S33 retain the triplets whose feature scores are higher than or equal to the preset threshold, and remove the triplets whose feature scores are lower than the preset threshold.
  • the terminal device retains the triples whose feature scores are higher than or equal to the preset threshold as valid triples, and eliminates the triples whose feature scores are lower than the preset threshold as invalid triples, thereby completing the construction of the knowledge representation.
  • This application provides a knowledge base completion model based on graph deep learning, which includes the following steps: the two subclasses of the attribute class, the numerical class and the Boolean class, are added to the model to enrich the data in the model; a belonging relationship is then established between entities and ontology classes, the entities are mapped into the VSM, and their semantic relationships are judged by the cosine similarity between vectors; a knowledge representation is defined to predict missing triples by classifying potential triples; the nodes in the knowledge representation are then sampled based on the attention mechanism and embedded into the model, where an information update matrix is used to preserve the original node information and prevent its loss; finally, two-dimensional convolution is performed on the obtained triples to extract features, and the features are quantized in the form of scores to reflect the validity of the triples.
  • the terminal device acquires project data; establishes the mapping relationship between entities and ontology classes in the project data; builds a knowledge representation based on the mapping relationship between the entities and the ontology classes; and iteratively restores entity information in the knowledge representation to form a complete knowledge base.
  • the knowledge base completion method of the present application can automatically identify the missing triples by restoring the lost entity information, better support the downstream management function of the constraint management method based on the software package, and complete the project knowledge base. Improve the management efficiency of project data.
  • the order in which the steps are written does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • FIG. 4 is a schematic structural diagram of an embodiment of the terminal device provided in the present application.
  • the terminal device 400 of the embodiment of the present application includes an acquisition module 41, a mapping module 42, a construction module 43, and a completion module 44; wherein,
  • the acquiring module 41 is configured to acquire project data.
  • the mapping module 42 is configured to establish a mapping relationship between entities and ontology classes in the project data.
  • the construction module 43 is configured to construct knowledge representation based on the mapping relationship between the entities and ontology classes.
  • the completion module 44 is configured to iteratively restore entity information in the knowledge representation to form a complete knowledge base.
  • FIG. 5 is a schematic structural diagram of another embodiment of the terminal device provided in the present application.
  • the terminal device 500 in this embodiment of the present application includes a memory 51 and a processor 52, where the memory 51 and the processor 52 are coupled.
  • the memory 51 is used to store program data
  • the processor 52 is used to execute the program data to implement the knowledge base completion method described in the above-mentioned embodiments.
  • the processor 52 may also be referred to as a CPU (Central Processing Unit).
  • the processor 52 may be an integrated circuit chip with signal processing capabilities.
  • the processor 52 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general purpose processor may be a microprocessor or the processor 52 may be any conventional processor or the like.
  • the present application also provides a computer storage medium.
  • the computer storage medium 600 is used to store program data 61.
  • when the program data 61 is executed by a processor, it is used to implement the knowledge base completion method described in the above-mentioned embodiments.
  • the present application also provides a computer program product, wherein the computer program product includes a computer program, and the computer program is operable to cause a computer to execute the knowledge base completion method as described in the embodiment of the present application.
  • the computer program product may be a software installation package.
  • when implemented in the form of a software functional unit and sold or used as an independent product, the knowledge base completion method described in the above embodiments of the present application may be stored in a device, such as a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A knowledge base completion method, a terminal device, and a computer storage medium, relating to the technical field of natural language processing. The knowledge base completion method comprises: obtaining project data (S11); establishing a mapping relationship between entities and ontology classes in the project data (S12); constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes (S13); and iteratively restoring the information of entities in the knowledge representation to form a complete knowledge base (S14). By restoring lost entity information, the knowledge base completion method completes the project knowledge base and improves the efficiency of project data management.

Description

A knowledge base completion method, terminal device, and computer storage medium
Technical Field
The present application relates to the technical field of natural language processing, and in particular to a knowledge base completion method, a terminal device, and a computer storage medium.
Background Art
China is a major infrastructure-building country, and a large number of construction projects are generated during infrastructure development; how to manage these construction projects reasonably and efficiently has become a problem that urgently needs to be solved. A large number of studies have found that roughly one third of construction projects fail to meet their targets in terms of time and money, and the cause of this phenomenon is the difficulty of constraint management. Constraint management refers to factors, such as materials, equipment, labor, and permits, that restrict the smooth progress of construction projects. In every respect, good constraint management can effectively improve the efficiency with which construction projects proceed.
One of the effective ways to achieve good constraint management is to adopt a modern, software-package-based management method. This method analyzes the construction project and its constraints before the project starts, then divides the construction project into several parts and resolves their constraints separately. A large amount of experimental data has proved that the software-package-based management method can effectively improve the efficiency of managing construction projects. Deep learning, a field of machine learning research inspired by the study of artificial neural networks, is very widely applied, but many problems remain unsolved when it is combined with natural language processing. In the construction of knowledge bases for construction project management, knowledge bases with incomplete information seriously affect management efficiency.
Summary of the Invention
The present application provides a knowledge base completion method, a terminal device, and a computer storage medium.
The present application provides a knowledge base completion method, which includes:
obtaining project data;
establishing a mapping relationship between entities and ontology classes in the project data;
constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes;
iteratively restoring the information of entities in the knowledge representation to form a complete knowledge base.
Wherein, after the establishing of the mapping relationship between entities and ontology classes in the project data, the method further includes:
establishing a mapping relationship between attributes and attribute classes in the project data;
constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes and the mapping relationship between the attributes and the attribute classes;
wherein the attribute classes include a numerical class and a Boolean class.
Wherein, after the establishing of the mapping relationship between entities and ontology classes in the project data, the method further includes:
obtaining newly added project data;
respectively calculating the similarities between the entities in the newly added project data and all existing ontology classes;
adding the entities in the newly added project data to the existing ontology class with the highest similarity.
Wherein, the formula for calculating the similarity between an entity in the newly added project data and an existing ontology class is as follows:
Figure PCTCN2021138475-appb-000001
where s represents the similarity between the entity in the newly added project data and the existing ontology class, and the two vectors in the formula represent the entity vector in the newly added project data and the vector of the existing ontology class, respectively.
Wherein, the iteratively restoring the information of entities in the knowledge representation includes:
obtaining the position, in the knowledge representation, of the target node corresponding to the entity;
sampling the neighborhood nodes of the target node based on an attention mechanism to obtain neighborhood node information;
iterating the target node with the neighborhood node information to restore the information of the entity corresponding to the target node.
Wherein, the knowledge base completion method further includes:
obtaining the neighborhood node information by taking a weighted average of the neighborhood nodes of the target node according to their different attention values.
Wherein, the knowledge base completion method further includes:
constructing triples based on the entities in the project data and the relationships between the entities;
calculating feature scores of the triples;
retaining the triples whose feature scores are higher than or equal to a preset threshold, and eliminating the triples whose feature scores are lower than the preset threshold.
Wherein, the relationships between the entities include a constraint relationship, a task relationship, and/or an attribute relationship.
The present application further provides a terminal device, the terminal device including a memory and a processor, wherein the memory is coupled to the processor;
wherein the memory is used for storing program data, and the processor is used for executing the program data to implement the above knowledge base completion method.
The present application further provides a computer storage medium, the computer storage medium being used for storing program data; when the program data is executed by a processor, it is used to implement the above knowledge base completion method.
The beneficial effects of the present application are as follows: the terminal device obtains project data; establishes a mapping relationship between entities and ontology classes in the project data; constructs a knowledge representation based on the mapping relationship between the entities and the ontology classes; and iteratively restores the information of entities in the knowledge representation to form a complete knowledge base. In this way, the knowledge base completion method of the present application restores lost entity information, completes the project knowledge base, and improves the efficiency of project data management.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort. Among them:
Fig. 1 is a schematic flowchart of an embodiment of the knowledge base completion method provided by the present application;
Fig. 2 is a schematic flowchart of another embodiment of the knowledge base completion method provided by the present application;
Fig. 3 is a schematic flowchart of a further embodiment of the knowledge base completion method provided by the present application;
Fig. 4 is a schematic structural diagram of an embodiment of the terminal device provided by the present application;
Fig. 5 is a schematic structural diagram of another embodiment of the terminal device provided by the present application;
Fig. 6 is a schematic structural diagram of an embodiment of the computer storage medium provided by the present application.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The knowledge base completion method provided by the present application can be applied to the field of construction management, that is, the project data of construction management is managed by means of a knowledge base. In other embodiments, it can also be applied to project management in other fields, such as aviation data management and transportation data management. In the following embodiments, the knowledge base completion method of the present application is described in detail using the project data of construction management.
Further, because traditional knowledge bases in the construction field are affected by missing information nodes and a lack of features, important information cannot be found to complete the related construction project management. For this, the present application proposes a knowledge base based on graph deep learning; the graph deep learning framework has strong self-learning ability, generalization ability, and robustness. Given that construction projects are gradually becoming more complex and that people need unified and efficient management of materials, equipment, and tasks on the construction site, this method can effectively improve the efficiency with which people manage construction projects.
Please refer to Fig. 1, which is a schematic flowchart of an embodiment of the knowledge base completion method provided by the present application.
The knowledge base completion method of the embodiment of the present application is suitable for knowledge bases that are incomplete or lack certain features. A knowledge base completion model based on graph deep learning introduces domain knowledge and ontology rules to enrich data sources and improve model accuracy: a graph neural network is first introduced in the word-embedding stage to encode entities and the relationships between entities, and a decoder based on a graph convolutional neural network is then introduced to identify missing information on entity-relationship-entity triples. By introducing two kinds of graph neural networks and other auxiliary information to build the model, the knowledge base completion method of the present application improves the accuracy of the model on related data sets and effectively shortens the time for constructing a knowledge base in the construction field; it has good self-learning and generalization abilities for massive construction-field knowledge bases, as well as good scalability.
The knowledge base completion method of the present application is applied to a terminal device, where the terminal device of the present application may be a server, or may be a system in which a server and a terminal device cooperate with each other. Correspondingly, the various parts included in the terminal device, such as the various units, sub-units, modules, and sub-modules, may all be provided in the server, or may be provided separately in the server and the terminal device.
Further, the above server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules, for example software or software modules used to provide a distributed server, or as a single piece of software or a single software module, which is not specifically limited here. In some possible implementations, the knowledge base completion method of the embodiments of the present application may be implemented by a processor invoking computer-readable instructions stored in a memory.
Specifically, as shown in Fig. 1, the knowledge base completion method of the embodiment of the present application includes the following steps:
Step S11: Obtain project data.
In the embodiment of the present application, the project data may be specific data of an engineering project in the construction field, such as floor height, indoor area, and stair position.
Step S12: Establish a mapping relationship between entities and ontology classes in the project data.
In the embodiment of the present application, the terminal device maps the entities in the project data to domain classes, i.e., ontology classes; the purpose of class mapping is to establish a belonging relationship between the entities and the ontology classes.
Generally, the initial ontology in the knowledge base has only three classes: constraints, tasks, and project participants. In order to better enrich the data for the model, the embodiment of the present application may also introduce attribute classes into the ontology. However, because attribute data is relatively sparse, adding all of the attribute data to the model would degrade the performance of the knowledge base completion model, so the embodiment of the present application introduces only two subclasses of the attribute class, namely the numerical class and the Boolean class.
Specifically, class mapping of attribute data is straightforward; for example, regular expressions are used to directly extract attributes containing numerical values, such as dates, as well as true/false characters, and these attributes are mapped to the numerical class and the Boolean class respectively, as illustrated in the sketch below.
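The following is a minimal, illustrative sketch of this regular-expression-based attribute mapping; the concrete patterns and the map_attribute helper are assumptions introduced for the example and are not part of the published method.

```python
import re
from typing import Optional

# Hypothetical patterns: date-like or purely numeric values go to the numerical
# class, true/false tokens go to the Boolean class.
NUMERIC_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$|^\d+(\.\d+)?$")
BOOLEAN_PATTERN = re.compile(r"^(true|false|yes|no)$", re.IGNORECASE)

def map_attribute(value: str) -> Optional[str]:
    """Map a raw attribute value to an attribute class, or None if neither fits."""
    value = value.strip()
    if NUMERIC_PATTERN.match(value):
        return "numerical"
    if BOOLEAN_PATTERN.match(value):
        return "boolean"
    return None

if __name__ == "__main__":
    for v in ["2021-11-17", "3.5", "true", "staircase"]:
        print(v, "->", map_attribute(v))  # the last value maps to neither class
```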
Class mapping of other entities relies on semantic similarity. The terminal device first extracts word-embedding representations of the entities and the ontology classes to form a vector space model (VSM). Then, the terminal device creates a list for each ontology class, which includes the class name and its synonyms obtained from an engineering dictionary. For the relationships between entities, the terminal device can search for the representation of each name in the VSM, and finally judge the semantic relationship between two entities by calculating the cosine similarity between the vectors corresponding to the entity and the class.
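As a sketch of the cosine-similarity judgment described above, the snippet below assigns an entity to the ontology class whose listed names are most similar in the VSM; the toy embedding table and synonym lists are illustrative assumptions rather than data from the application.

```python
import numpy as np

# Hypothetical VSM: word -> embedding (in practice, pre-trained word embeddings).
vsm = {
    "concrete": np.array([0.9, 0.1, 0.0]),
    "material": np.array([0.8, 0.2, 0.1]),
    "inspection": np.array([0.1, 0.9, 0.2]),
    "task": np.array([0.2, 0.8, 0.3]),
}

# Each ontology class lists its name and synonyms from an engineering dictionary.
class_synonyms = {"constraint": ["material"], "task": ["task", "inspection"]}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def map_entity_to_class(entity):
    """Return the ontology class whose synonym vector is closest to the entity."""
    e_vec = vsm[entity]
    best_class, best_sim = None, -1.0
    for cls, names in class_synonyms.items():
        for name in names:
            sim = cosine(e_vec, vsm[name])
            if sim > best_sim:
                best_class, best_sim = cls, sim
    return best_class, best_sim

print(map_entity_to_class("concrete"))  # expected to fall into "constraint"
```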
Further, for the names of some entities or classes that cannot be found in the VSM, an embodiment of the present application provides another knowledge base completion method. Please refer to Fig. 2, which is a schematic flowchart of another embodiment of the knowledge base completion method provided by the present application.
Specifically, as shown in Fig. 2, the knowledge base completion method of the embodiment of the present application includes the following steps:
Step S21: Obtain newly added project data.
In the embodiment of the present application, the newly added project data is project data whose entity or class names cannot be found in the VSM. For this part of the project data, the terminal device splits the names of these project data into several parts through tokenization, looks up the embedding of each part in the VSM, and finally averages these embeddings to obtain the final result. The details are as follows:
Step S22: Calculate the similarities between the entities in the newly added project data and all existing ontology classes.
In the embodiment of the present application, the terminal device needs to calculate the similarity between each existing ontology class in the current knowledge base and the entity in the newly added project data. The specific similarity calculation formula is as follows:
Figure PCTCN2021138475-appb-000004
where s represents the similarity between the entity in the newly added project data and the existing ontology class, and the two vectors in the formula represent the entity vector in the newly added project data and the vector of the existing ontology class, respectively.
Step S23: Add the entities in the newly added project data to the existing ontology class with the highest similarity.
In the embodiment of the present application, the terminal device compares the similarity between each existing ontology class and the entity in the newly added project data, and adds the entity to the existing ontology class with the highest similarity.
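A sketch of the fallback of steps S21 to S23 for names missing from the VSM: tokenize the name, average the token embeddings, and assign the entity to the existing class with the highest cosine similarity. The tokenizer and the class-vector dictionary are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed_name(name, vsm):
    """Split an out-of-vocabulary name into parts, look each part up in the VSM,
    and average the embeddings that were found (step S21)."""
    parts = name.lower().replace("-", " ").replace("_", " ").split()
    vectors = [vsm[p] for p in parts if p in vsm]
    if not vectors:
        raise KeyError("no token of %r is covered by the VSM" % name)
    return np.mean(vectors, axis=0)

def assign_to_class(entity_vec, class_vectors):
    """Steps S22 and S23: compute the similarity s against every existing ontology
    class and return the class with the highest similarity."""
    return max(class_vectors, key=lambda c: cosine(entity_vec, class_vectors[c]))
```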
Step S13: Construct a knowledge representation based on the mapping relationship between the entities and the ontology classes.
In the embodiment of the present application, rule-based data enrichment first expresses all rules with basic relations, and then finds among them the rule bodies that satisfy the conditions for constructing triples; these rule bodies can be used to construct new triples and enrich the semantics of relations, and the newly inferred triples are also fed into the semantic enrichment process.
Specifically, the terminal device can construct an encoder for the graph-neural-network-based knowledge base representation and embed the entities and relations into the model. The detailed process is as follows:
The terminal device defines the knowledge representation. The knowledge representation of the embodiment of the present application, KB = (E, R, T), is represented by the entities E, the relations R, and the valid triples T. The goal of the knowledge representation model is to predict missing triples by classifying potential triples.
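Read literally, the knowledge representation KB = (E, R, T) can be held as plain sets of entities, relations, and valid triples; the snippet below is only a data-structure sketch, and the sample triple is invented for illustration.

```python
from typing import NamedTuple, Set

class Triple(NamedTuple):
    head: str      # an entity from E
    relation: str  # a relation from R
    tail: str      # an entity from E

entities: Set[str] = {"concrete_delivery", "pour_foundation"}
relations: Set[str] = {"constrains"}
valid_triples: Set[Triple] = {Triple("concrete_delivery", "constrains", "pour_foundation")}

# The completion model scores candidate triples that are not yet in valid_triples
# and classifies them as valid (missing) or invalid.
```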
Step S14: Iteratively restore the information of entities in the knowledge representation to form a complete knowledge base.
In the embodiment of the present application, the terminal device adopts neighborhood node sampling based on an attention mechanism. Specifically, the knowledge representation contains different types of nodes and relations, and a node can play different roles in its neighborhood. In the embodiment of the present application, the attention mechanism is used to calculate the importance of all neighborhood nodes of each target node before they are sampled, which can effectively improve the accuracy of the knowledge representation model.
In addition, the terminal device adopts multi-head information aggregation: after each iteration, the embedding of a node is calculated by summing the embeddings of all triples of the node set, weighted according to the attention values. Furthermore, multiple attention values are used to stabilize the encoding process and collect more neighborhood information. The embodiment of the present application averages multiple attention results so as to merge the information and save computing power.
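A minimal sketch of the attention-based neighborhood aggregation with several heads averaged; it assumes a simple dot-product attention with random projections, whereas the actual encoder is a trained graph neural network whose exact parameterization is not given in the text.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def aggregate_neighborhood(target, neighbor_triple_embs, num_heads=4, seed=0):
    """Weight every neighboring triple embedding by an attention value, sum them,
    and average the result over several heads to stabilize the encoding."""
    rng = np.random.default_rng(seed)
    dim = target.shape[0]
    head_outputs = []
    for _ in range(num_heads):
        # Each head uses its own projection (random here, learned in practice).
        w = rng.normal(size=(dim, dim)) / np.sqrt(dim)
        scores = neighbor_triple_embs @ (w @ target)      # importance of each neighbor
        alpha = softmax(scores)                            # attention values
        head_outputs.append(alpha @ neighbor_triple_embs)  # attention-weighted sum
    return np.mean(head_outputs, axis=0)                   # average over the heads
```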
When the terminal device collects information through the graph neural network, the information in the original embeddings of the nodes may be lost. This loss has a negative impact on the knowledge representation model, especially when the original embeddings are not randomly initialized. Therefore, the terminal device can transform the original embedding of each node with a matrix and then add it to the entity after the last iteration, so as to restore the information of that entity. The specific formula is as follows:
Figure PCTCN2021138475-appb-000007
where W0 represents the transformation matrix, and the other two terms in the formula represent the neighborhood node information and the target node, respectively.
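The recovery step described above, transforming the original embedding with the matrix W0 and adding it back after the final iteration, amounts to a residual connection; the sketch below assumes that reading of the formula, with W0 as a learned parameter of the encoder.

```python
import numpy as np

def recover_entity_information(original_emb, aggregated_emb, w0):
    """Add the matrix-transformed original embedding of the target node to the
    neighborhood information aggregated in the last iteration, so that the
    information carried by the initial embedding is not lost."""
    return np.asarray(w0) @ np.asarray(original_emb) + np.asarray(aggregated_emb)
```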
Specifically, regarding the validity of the triples in the above step S13, a decoder ConvKB based on a convolutional knowledge representation can be constructed to evaluate the validity of each embedded triple. For the specific process, please refer to Fig. 3, which is a schematic flowchart of a further embodiment of the knowledge base completion method provided by the present application.
Specifically, as shown in Fig. 3, the knowledge base completion method of the embodiment of the present application includes the following steps:
Step S31: Construct triples based on the entities in the project data and the relationships between the entities.
In the embodiment of the present application, the relationships between entities used to construct triples include, but are not limited to, the following: constraint relationships, task relationships, and attribute relationships.
Step S32: Calculate the feature scores of the triples.
In the embodiment of the present application, the decoder ConvKB uses a two-dimensional convolutional neural network filter to scan the triples so as to extract the features embedded in the triples. The decoder ConvKB presents the extracted features in the form of scores, and the feature score reflects the likelihood that a triple is valid. The specific formula of the feature score is as follows:
Figure PCTCN2021138475-appb-000010
where Concat represents the concatenation function; Relu represents the activation function, which is used to extract the vector features of the triples; τ represents the convolution kernel; L represents the number of convolution kernels; the remaining vectors represent the i-th entity, the j-th entity, and the relation between them; K represents the neighborhood order; and v represents the transpose of the result vector of the concatenation function.
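A simplified, framework-free sketch of a ConvKB-style feature score: the three triple embeddings are stacked into a matrix, scanned with L convolution kernels, passed through Relu, concatenated, and projected to a scalar with the weight vector v. The kernel values and dimensions are placeholders that a trained decoder would learn, and the neighborhood order K from the original formula is not modeled here.

```python
import numpy as np

def convkb_score(h, r, t, kernels, v):
    """Score one (head, relation, tail) triple.

    h, r, t : embeddings of dimension d
    kernels : array of shape (L, 3), one 1x3 filter per kernel, slid over the d rows
    v       : array of shape (L * d,), projecting the concatenated feature maps
    """
    m = np.stack([h, r, t], axis=1)                            # d x 3 matrix for the triple
    feature_maps = [np.maximum(m @ k, 0.0) for k in kernels]   # Relu(conv), each of length d
    features = np.concatenate(feature_maps)                    # Concat over the L kernels
    return float(v @ features)                                 # v transposed times the features

def filter_triples(scored_triples, threshold):
    """Step S33: keep the triples whose score reaches the preset threshold."""
    return [triple for triple, score in scored_triples if score >= threshold]
```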
Step S33: Retain the triples whose feature scores are higher than or equal to a preset threshold, and eliminate the triples whose feature scores are lower than the preset threshold.
In the embodiment of the present application, the terminal device retains the triples whose feature scores are higher than or equal to the preset threshold as valid triples, and eliminates the triples whose feature scores are lower than the preset threshold as invalid triples, thereby completing the construction of the knowledge representation.
The present application provides a knowledge base completion model based on graph deep learning, which includes the following steps: the two subclasses of the attribute class, the numerical class and the Boolean class, are added to the model to enrich the data in the model; a belonging relationship is then established between entities and ontology classes, the entities are mapped into the VSM, and their semantic relationships are judged by the cosine similarity between vectors; a knowledge representation is defined, and missing triples are predicted by classifying potential triples; the nodes in the knowledge representation are then sampled based on the attention mechanism and embedded into the model, where an information update matrix is used to preserve the original node information and prevent its loss; finally, two-dimensional convolution is performed on the obtained triples to extract features, and the features are quantized in the form of scores to reflect the validity of the triples.
In the present application, the terminal device obtains project data; establishes a mapping relationship between entities and ontology classes in the project data; constructs a knowledge representation based on the mapping relationship between the entities and the ontology classes; and iteratively restores the information of entities in the knowledge representation to form a complete knowledge base. In this way, by restoring lost entity information, the knowledge base completion method of the present application can automatically identify missing triples, better support the downstream management functions of the software-package-based constraint management method, complete the project knowledge base, and improve the efficiency of project data management.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
To implement the knowledge base completion method of the above embodiments, the present application further proposes a terminal device. Please refer to Fig. 4, which is a schematic structural diagram of an embodiment of the terminal device provided by the present application.
The terminal device 400 of the embodiment of the present application includes an acquisition module 41, a mapping module 42, a construction module 43, and a completion module 44, wherein:
the acquisition module 41 is configured to obtain project data;
the mapping module 42 is configured to establish a mapping relationship between entities and ontology classes in the project data;
the construction module 43 is configured to construct a knowledge representation based on the mapping relationship between the entities and the ontology classes;
the completion module 44 is configured to iteratively restore the information of entities in the knowledge representation to form a complete knowledge base.
To implement the knowledge base completion method of the above embodiments, the present application further proposes another terminal device. Please refer to Fig. 5, which is a schematic structural diagram of another embodiment of the terminal device provided by the present application.
The terminal device 500 of the embodiment of the present application includes a memory 51 and a processor 52, where the memory 51 and the processor 52 are coupled.
The memory 51 is used for storing program data, and the processor 52 is used for executing the program data to implement the knowledge base completion method described in the above embodiments.
In this embodiment, the processor 52 may also be referred to as a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capabilities. The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 52 may be any conventional processor or the like.
The present application further provides a computer storage medium. As shown in Fig. 6, the computer storage medium 600 is used for storing program data 61; when the program data 61 is executed by a processor, it is used to implement the knowledge base completion method described in the above embodiments.
The present application further provides a computer program product, wherein the computer program product includes a computer program, and the computer program is operable to cause a computer to execute the knowledge base completion method described in the embodiments of the present application. The computer program product may be a software installation package.
When the knowledge base completion method described in the above embodiments of the present application is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a device, for example a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is only an implementation of the present application and does not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (10)

  1. A knowledge base completion method, characterized in that the knowledge base completion method comprises:
    obtaining project data;
    establishing a mapping relationship between entities and ontology classes in the project data;
    constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes;
    iteratively restoring the information of entities in the knowledge representation to form a complete knowledge base.
  2. The knowledge base completion method according to claim 1, characterized in that
    after the establishing of the mapping relationship between entities and ontology classes in the project data, the method further comprises:
    establishing a mapping relationship between attributes and attribute classes in the project data;
    constructing a knowledge representation based on the mapping relationship between the entities and the ontology classes and the mapping relationship between the attributes and the attribute classes;
    wherein the attribute classes comprise a numerical class and a Boolean class.
  3. The knowledge base completion method according to claim 1, characterized in that
    after the establishing of the mapping relationship between entities and ontology classes in the project data, the method further comprises:
    obtaining newly added project data;
    respectively calculating similarities between the entities in the newly added project data and all existing ontology classes;
    adding the entities in the newly added project data to the existing ontology class with the highest similarity.
  4. The knowledge base completion method according to claim 3, characterized in that
    the formula for calculating the similarity between an entity in the newly added project data and an existing ontology class is as follows:
    Figure PCTCN2021138475-appb-100001
    wherein s represents the similarity between the entity in the newly added project data and the existing ontology class, and the two vectors in the formula represent the entity vector in the newly added project data and the vector of the existing ontology class, respectively.
  5. The knowledge base completion method according to claim 1, characterized in that
    the iteratively restoring the information of entities in the knowledge representation comprises:
    obtaining the position, in the knowledge representation, of a target node corresponding to the entity;
    sampling neighborhood nodes of the target node based on an attention mechanism to obtain neighborhood node information;
    iterating the target node with the neighborhood node information to restore the information of the entity corresponding to the target node.
  6. The knowledge base completion method according to claim 5, characterized in that
    the knowledge base completion method further comprises:
    obtaining the neighborhood node information by taking a weighted average of the neighborhood nodes of the target node according to their different attention values.
  7. The knowledge base completion method according to claim 1, characterized in that
    the knowledge base completion method further comprises:
    constructing triples based on the entities in the project data and the relationships between the entities;
    calculating feature scores of the triples;
    retaining the triples whose feature scores are higher than or equal to a preset threshold, and eliminating the triples whose feature scores are lower than the preset threshold.
  8. The knowledge base completion method according to claim 7, characterized in that
    the relationships between the entities comprise a constraint relationship, a task relationship, and/or an attribute relationship.
  9. A terminal device, characterized in that the terminal device comprises a memory and a processor, wherein the memory is coupled to the processor;
    wherein the memory is used for storing program data, and the processor is used for executing the program data to implement the knowledge base completion method according to any one of claims 1 to 8.
  10. A computer storage medium, characterized in that the computer storage medium is used for storing program data, and when the program data is executed by a processor, it is used to implement the knowledge base completion method according to any one of claims 1 to 8.
PCT/CN2021/138475 2021-11-17 2021-12-15 Knowledge base completion method, terminal device, and computer storage medium WO2023087463A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111364269.4 2021-11-17
CN202111364269.4A CN114168741A (zh) 2021-11-17 2021-11-17 Knowledge base completion method, terminal device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2023087463A1 true WO2023087463A1 (zh) 2023-05-25

Family

ID=80479468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138475 WO2023087463A1 (zh) 2021-11-17 2021-12-15 Knowledge base completion method, terminal device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN114168741A (zh)
WO (1) WO2023087463A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045933A (zh) * 2015-09-08 2015-11-11 中国人民解放军海军工程大学 Method for mapping between the relational database schema of ship equipment maintenance support information and an ontology
US20190087724A1 (en) * 2017-09-21 2019-03-21 Foundation Of Soongsil University Industry Cooperation Method of operating knowledgebase and server using the same
CN111291139A (zh) * 2020-03-17 2020-06-16 中国科学院自动化研究所 Attention-mechanism-based method for completing long-tail relations in a knowledge graph
CN112699248A (zh) * 2020-12-24 2021-04-23 厦门市美亚柏科信息股份有限公司 Knowledge ontology construction method, terminal device, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045933A (zh) * 2015-09-08 2015-11-11 中国人民解放军海军工程大学 Method for mapping between the relational database schema of ship equipment maintenance support information and an ontology
US20190087724A1 (en) * 2017-09-21 2019-03-21 Foundation Of Soongsil University Industry Cooperation Method of operating knowledgebase and server using the same
CN111291139A (zh) * 2020-03-17 2020-06-16 中国科学院自动化研究所 Attention-mechanism-based method for completing long-tail relations in a knowledge graph
CN112699248A (zh) * 2020-12-24 2021-04-23 厦门市美亚柏科信息股份有限公司 Knowledge ontology construction method, terminal device, and storage medium

Also Published As

Publication number Publication date
CN114168741A (zh) 2022-03-11

Similar Documents

Publication Publication Date Title
CN112434169B (zh) 一种知识图谱的构建方法及其系统和计算机设备
JP7468929B2 (ja) 地理知識取得方法
WO2022205833A1 (zh) 无线网络协议知识图谱构建分析方法、系统、设备及介质
CN111444410B (zh) 一种基于知识图谱的关联交易挖掘识别方法及装置
WO2020093761A1 (zh) 一种面向软件缺陷知识的实体、关系联合抽取方法
CN111563192B (zh) 实体对齐方法、装置、电子设备及存储介质
CN111666350B (zh) 一种基于bert模型的医疗文本关系抽取的方法
CN108399268B (zh) 一种基于博弈论的增量式异构图聚类方法
WO2023274059A1 (zh) 交替序列生成模型训练方法、从文本中抽取图的方法
CN112052940B (zh) 基于向量压缩与重构的社交网络特征动态提取方法
CN112650833A (zh) Api匹配模型建立方法及跨城市政务api匹配方法
Chen et al. Distribution knowledge embedding for graph pooling
CN112015890B (zh) 电影剧本摘要的生成方法和装置
WO2023087463A1 (zh) 一种知识库补全方法、终端设备以及计算机存储介质
CN114219089B (zh) 一种新一代信息技术产业知识图谱的构建方法及设备
CN112115261B (zh) 基于对称和互逆关系统计的知识图谱数据扩展方法
CN114385827A (zh) 面向会议知识图谱的检索方法
Nguyen-Xuan et al. Sketch recognition using lstm with attention mechanism and minimum cost flow algorithm
CN111078886B (zh) 基于dmcnn的特殊事件提取系统
Mahmoud et al. Using semantic web technologies to improve the extract transform load model
CN114036319A (zh) 一种电力知识抽取方法、系统、装置及存储介质
WO2021072892A1 (zh) 基于神经网络混合模型的法律条文检索方法及相关设备
CN110688446B (zh) 一种句义数学空间表示方法、系统、介质和设备
WO2021185171A1 (zh) 特征量化模型训练、特征量化、数据查询方法及系统
CN112685574B (zh) 领域术语层次关系的确定方法、装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21964608

Country of ref document: EP

Kind code of ref document: A1