CN111782769A - Intelligent knowledge graph question-answering method based on relation prediction - Google Patents


Info

Publication number
CN111782769A
Authority
CN
China
Prior art keywords
entity
question
kgs
relation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010628423.3A
Other languages
Chinese (zh)
Other versions
CN111782769B (en)
Inventor
赵芬
李银国
侯杰
李俊
王新恒
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202010628423.3A priority Critical patent/CN111782769B/en
Publication of CN111782769A publication Critical patent/CN111782769A/en
Application granted granted Critical
Publication of CN111782769B publication Critical patent/CN111782769B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a knowledge graph intelligent question-answering method based on relation prediction, and belongs to the field of natural language processing. The method comprises the following steps: S1: input a question Q and preprocess it; S2: identify an entity e_question in the question using entity recognition techniques, and map e_question to the corresponding entity e_KGs in the KGs; S3: query the KGs for the class c of entity e_KGs, and replace entity e_question in question Q with class c, the result being denoted Q_c; S4: map a relation r from Q_c; S5: in the KGs, check whether entity e_KGs and relation r are connected; S6: learn a new vector representation of the central entity e_KGs; S7: infer the relations hidden in the KGs based on existing related triples; S8: obtain an answer A through knowledge graph reasoning based on the entity and the relation. The invention can find the correspondence between question entities and knowledge graph entities, and between the natural language description of the question and the semantic relations of the knowledge graph.

Description

Intelligent knowledge graph question-answering method based on relation prediction
Technical Field
The invention belongs to the field of natural language processing, and relates to a knowledge graph intelligent question-answering method based on relation prediction.
Background
The traditional keyword-based search mode of search engines lacks semantic analysis and semantic understanding of natural language, and is increasingly unable to meet people's requirements. For the user, an interaction mode that conforms to the expression of human natural language is best, and a question-answering system can meet this requirement only when it shows sufficient intelligence. Google proposed the concept of the Knowledge Graph (KGs) in 2012, further pushing question-answering systems toward intelligence. With the development of knowledge graph technology, intelligent question-answering systems show new prospects. The rise of social networking sites provides a large amount of real question-answer corpora in different fields for the research of intelligent question-answering systems, and offers great convenience for machines to understand natural language questions at the data level. The rapid development of large-scale knowledge graphs such as Freebase and DBpedia provides high-quality structured knowledge sources for intelligent question-answering systems. Knowledge-graph-based intelligent question answering is an important direction in current question-answering research. It has great application prospects in the artificial intelligence era, and provides technical and theoretical support for the mobile internet and the information entrance of human society.
Disclosure of Invention
In view of this, the present invention aims to provide a knowledge graph intelligent question-answering method based on relation prediction, which infers the hidden relations of the KGs by constructing an attention-based graph embedding method, thereby completing the missing relations of the KGs and improving the accuracy of the knowledge-graph-based question-answering system (KGs-QA).
In order to achieve the purpose, the invention provides the following technical scheme:
a knowledge graph intelligent question-answering method based on relation prediction comprises the following steps:
S1: input a question Q and preprocess it;
S2: identify an entity e_question in the question using entity recognition techniques, and map e_question to the corresponding entity e_KGs in the KGs;
S3: query the KGs for the class c of entity e_KGs, and replace entity e_question in question Q with class c, the result being denoted Q_c;
S4: map a relation r from Q_c;
S5: in the KGs, check whether entity e_KGs and relation r are connected;
S6: learn a new vector representation of the central entity e_KGs;
S7: infer the relations hidden in the KGs based on existing related triples;
S8: obtain an answer A through knowledge graph reasoning based on the entity and the relation.
Optionally, step S1 specifically includes: the text is segmented into words or phrases by the CRF parser and the maximum entropy dependency parser in HanLP and the Stanford parser, and quantitative descriptions of part of speech, word order, keywords, and dependency relations are obtained.
Optionally, step S2 specifically includes: predicting whether each word in the question is an entity by using a bidirectional long short-term memory (Bi-LSTM) network model;
processing the input sequence with a forward LSTM unit and a backward LSTM unit, and finally concatenating the output vectors of the two LSTM units;
the output vector of the model is y = (y_1, y_2, ..., y_n), where n is the length of the input sequence, so the length of the model output vector is consistent with that of the input sequence; y_i is the label of the i-th word in the input question: if it is "1", the word is the sought entity; otherwise it is not.
Optionally, step S3 specifically includes: conceptualizing the entities in the question by using a latent Dirichlet allocation (LDA) topic model, so as to facilitate understanding of the entities and increase their interpretability;
a corpus-based, context-dependent conceptualization framework is developed by capturing the semantic relations between words, combining the LDA topic model with a large-scale probabilistic KGs.
Optionally, step S4 specifically includes: introducing a convolutional neural network (CNNs) model into the relation linking task; extracting the semantic information about the relation in the question through the deep neural network model, processing all relations of the candidate entities with the same model, and performing similarity matching between the obtained question attribute vector and the knowledge graph attribute vectors to obtain the final, correctly linked relation.
Optionally, step S5 specifically includes: based on the entity identified in step S2 and the relation linked in step S4, simplifying the knowledge graph reasoning based on the entity and the relation into a sub-graph matching problem;
in the knowledge graph, if no match is found, entity e_KGs and relation r lack a connection, and the next relation prediction task is performed.
Optionally, step S6 specifically includes: in order to solve the problem that the information hidden in the neighborhood around a triple cannot be acquired, optimizing the vector learning model and proposing an attention-based feature embedding method, which captures the entity features and relation features in the neighborhood of any given entity;
encapsulating relation clusters and multi-hop relations in the model; to obtain a new vector representation of a central entity, the feature vector of each set of related facts existing in the neighborhood of the central entity is learned by linear transformation.
Optionally, step S7 specifically includes: inferring the initially hidden relations by identifying existing related fact triples (h, r, t), where h denotes the head semantic entity, r the semantic relation, and t the tail semantic entity; that is, the multi-hop entities and multi-hop relations in the neighborhood of the central entity are learned, and auxiliary edges are introduced between n-hop neighborhoods to realize the relation prediction task.
Optionally, step S8 specifically includes: KGs-QA supports binary factoid question answering by learning 27 million templates covering 2,782 intents.
The invention has the beneficial effects that: the invention uses a knowledge graph method to reason about the entities and relations in a natural language question, thereby finding the correspondence between question entities and knowledge graph entities and between the natural language description of the question and the semantic relations of the knowledge graph, and obtaining the corresponding answers through template-based knowledge graph reasoning, so that the natural language understanding function of the invention not only understands literal meaning, but is also capable of logical reasoning and understanding deep meaning.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the construction of an intelligent knowledge-graph-based question-answering system according to the present invention;
FIG. 2 is a schematic diagram of the Bi-LSTM-based entity recognition model of the present invention;
FIG. 3 is a schematic diagram of a CNNs-based relational link model according to the present invention;
FIG. 4 is a schematic diagram of knowledge-graph based entity identification and relationship linking according to the present invention;
FIG. 5 is a schematic diagram of a sub-graph of a knowledge graph according to the present invention;
FIG. 6 is a schematic diagram of knowledge graph inference based on entities and relationships according to the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components. In the description of the present invention, it should be understood that terms indicating an orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientation or positional relationship shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; therefore, such terms are illustrative only, are not to be construed as limiting the present invention, and their specific meaning may be understood by those skilled in the art according to the specific situation.
FIG. 1 is a flow chart of the construction of the knowledge-graph-based intelligent question-answering system of the present invention. The natural language question-answer corpus of Yahoo! Answers serves as the semantic knowledge resource, and the knowledge graph serves as the semantic representation method. Through the data representation of a knowledge graph, the entities of natural language are described to complete the question-answering task, so that the natural language understanding function of the model not only understands literal meaning but is also capable of logical reasoning and understanding deep meaning. An embodiment of the intelligent question-answering system using a knowledge graph is provided below to further illustrate the invention.
As shown in FIG. 1, the details of the present invention are as follows:
1. Input a question Q and preprocess it. The text is segmented into words or phrases by the CRF parser and the maximum entropy dependency parser in HanLP and the Stanford parser, and quantitative descriptions of part of speech, word order, keywords, and dependency relations are obtained.
2. Identify an entity e_question in the question using entity recognition techniques, and map e_question to the corresponding entity e_KGs in the KGs. Whether each word in the question is an entity is predicted by using a bidirectional long short-term memory (Bi-LSTM) model, as shown in fig. 2. The input sequence is processed by a forward LSTM unit and a backward LSTM unit, and the output vectors of the two LSTM units are finally concatenated. Compared with the LSTM model, the Bi-LSTM model keeps its advantages while also taking context information into account by training the forward and backward sequences separately, so deeper semantic information can be extracted. The output vector of the model is y = (y_1, y_2, ..., y_n), where n is the length of the input sequence; it can be seen that the length of the model output vector is consistent with that of the input sequence. y_i is the label of the i-th word in the input question: if it is "1", the word is the sought entity; otherwise it is not.
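The tagging scheme just described can be sketched as follows. For brevity, a plain tanh RNN cell stands in for the LSTM cell, and all weights are random placeholders; what the sketch shows is the wiring of the method — a forward pass, a backward pass, concatenation of the two hidden states, and a sigmoid scoring layer producing one 0/1 entity tag per word.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 8, 16                             # embedding size, hidden size

W_f = rng.normal(0, 0.1, (d_h, d_in + d_h))   # forward cell weights (placeholder)
W_b = rng.normal(0, 0.1, (d_h, d_in + d_h))   # backward cell weights (placeholder)
w_out = rng.normal(0, 0.1, 2 * d_h)           # sigmoid scoring layer
b_out = 0.0

def rnn_pass(xs, W):
    """Run a simple recurrent cell over the sequence, returning all hidden states."""
    h = np.zeros(d_h)
    out = []
    for x in xs:
        h = np.tanh(W @ np.concatenate([x, h]))
        out.append(h)
    return out

def tag_question(word_vecs, threshold=0.5):
    """Return one 0/1 entity tag per input word."""
    fwd = rnn_pass(word_vecs, W_f)                  # left-to-right pass
    bwd = rnn_pass(word_vecs[::-1], W_b)[::-1]      # right-to-left pass, realigned
    tags = []
    for hf, hb in zip(fwd, bwd):
        h = np.concatenate([hf, hb])                # h_t = [h_forward ; h_backward]
        y = 1.0 / (1.0 + np.exp(-(w_out @ h + b_out)))   # sigmoid layer
        tags.append(1 if y > threshold else 0)
    return tags

# A toy question of 5 word vectors stands in for a segmented question.
question = [rng.normal(0, 1, d_in) for _ in range(5)]
tags = tag_question(question)
print(len(tags))   # one tag per input word
```

With trained weights, the tag sequence would mark the entity positions; here the random weights only demonstrate that the output length matches the input sequence, as the patent states.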
3. Query the class c of entity e_KGs, and replace entity e_question in question Q with class c, the result being denoted Q_c. The entities in the question are conceptualized by using a latent Dirichlet allocation (LDA) topic model, which facilitates understanding the entities and increases their interpretability. A corpus-based, context-dependent conceptualization framework is developed by capturing the semantic relations between words, combining the LDA topic model with a large-scale probabilistic KGs. The model automatically disambiguates the input (e.g., the term "apple" in the question "Where is the headquarters of Apple?" is conceptualized as a company rather than a fruit). The conceptualization mechanism itself is based on a large semantic network of millions of concepts, so there is enough granularity to represent a wide variety of questions.
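The end result of this step — forming Q_c by substituting the class for the entity — can be sketched minimally. The concept lookup table below is a hypothetical stand-in for the probabilistic KGs / LDA conceptualization framework described above; the real system resolves the class from context rather than a fixed dictionary.

```python
# Hypothetical entity-to-class table; the actual framework disambiguates
# "apple" as company vs. fruit from the question context.
concept_of = {
    "apple": "company",
    "china": "country",
}

def conceptualize(question_tokens, entity):
    """Replace the recognized entity with its class c, yielding Q_c."""
    c = concept_of[entity]
    return [c if tok == entity else tok for tok in question_tokens]

q = ["where", "is", "the", "headquarters", "of", "apple"]
q_c = conceptualize(q, "apple")
print(q_c)  # ['where', 'is', 'the', 'headquarters', 'of', 'company']
```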
4. Map the relation r from Q_c. A convolutional neural network (CNNs) model is introduced into the relation linking task, as shown in fig. 3. The semantic information about the relation in the question is extracted through the deep neural network model, all relations of the candidate entities are processed with the same model, and similarity matching is performed between the obtained question attribute vector and the knowledge graph attribute vectors to obtain the final, correctly linked relation.
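The relation-linking pipeline can be sketched with a 1-D convolution over word vectors followed by max-pooling, which yields a fixed-size semantic vector for the question and for each candidate relation; the candidate with the highest cosine similarity is linked. Filter weights, dimensions, and relation names here are illustrative placeholders, not the patent's trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, f = 8, 3, 16                 # word dim, filter width, number of filters
filters = rng.normal(0, 0.1, (f, k * d))   # placeholder convolution filters

def semantic_vec(word_vecs):
    """Convolve each window of k consecutive words, then max-pool per filter."""
    feats = []
    for i in range(len(word_vecs) - k + 1):
        window = np.concatenate(word_vecs[i:i + k])
        feats.append(np.tanh(filters @ window))
    return np.max(feats, axis=0)   # max over positions -> fixed-size vector

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy question (6 word vectors) and hypothetical candidate relations
# (each a short sequence of word vectors).
question = [rng.normal(0, 1, d) for _ in range(6)]
candidates = {r: [rng.normal(0, 1, d) for _ in range(4)]
              for r in ("capital_of", "born_in", "brother")}

q_vec = semantic_vec(question)
best = max(candidates, key=lambda r: cosine(q_vec, semantic_vec(candidates[r])))
print(best)   # candidate relation whose semantic vector best matches the question
```

The same `semantic_vec` model is applied to the question and to every candidate relation, matching the parameter-sharing design described in the text.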
5. In the KGs, check whether entity e_KGs and relation r are connected. Knowledge reasoning based on entities and relations is simplified into a sub-graph matching problem. In the knowledge graph, if no match is found, i.e., entity e_KGs and relation r lack a connection, as shown in fig. 4, the next relation prediction task is performed.
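At its simplest, the sub-graph matching check asks whether a triple (e_KGs, r, ?) exists in the graph. A minimal sketch over a toy triple store (the triples are illustrative placeholders drawn from the Bob example later in the description):

```python
# Toy knowledge graph as a set of (head, relation, tail) triples.
kgs = {
    ("Bob", "brother", "Andy"),
    ("Andy", "born_in", "Washington"),
}

def match(entity, relation):
    """Return the tail entities connected to `entity` via `relation`."""
    return [t for (h, r, t) in kgs if h == entity and r == relation]

print(match("Bob", "brother"))   # ['Andy'] -> connection exists
print(match("Bob", "born_in"))   # []       -> connection missing: predict it
```

An empty result is exactly the "no match found" case that triggers the relation prediction task of the following steps.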
6. Learn a new vector representation of the central entity e_KGs. In order to solve the problem that traditional methods cannot acquire the information hidden in the neighborhood around a triple, the vector learning model is optimized and an attention-based feature embedding method is proposed. Furthermore, relation clusters and multi-hop relations are encapsulated in the model. To obtain a new vector representation of a central entity, the feature vector of each set of related facts existing in the neighborhood of the central entity is learned by linear transformation.
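The aggregation step can be sketched as follows: each neighboring fact (relation, tail entity) is linearly transformed together with the central entity into a feature vector, an attention weight is computed per fact, and the new central-entity vector is the attention-weighted sum. Dimensions and weight matrices are illustrative placeholders, not the patent's learned parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(0, 0.1, (d, 3 * d))   # linear transform of a (head, rel, tail) fact
a = rng.normal(0, 0.1, d)            # attention scoring vector (placeholder)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def new_entity_vec(center, neighbors):
    """neighbors: list of (relation_vec, tail_vec) facts around the center."""
    feats = [np.tanh(W @ np.concatenate([center, r, t])) for r, t in neighbors]
    att = softmax(np.array([a @ f for f in feats]))   # one attention weight per fact
    return sum(w * f for w, f in zip(att, feats))     # attention-weighted sum

center = rng.normal(0, 1, d)
neighbors = [(rng.normal(0, 1, d), rng.normal(0, 1, d)) for _ in range(3)]
vec = new_entity_vec(center, neighbors)
print(vec.shape)   # same dimensionality as the original entity embedding
```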
7. Infer the relations hidden in the KGs based on existing related triples. The initially hidden relations are inferred by finding already stored related fact triples (h, r, t), where h denotes the head semantic entity, r the semantic relation, and t the tail semantic entity. More specifically, the multi-hop entities and multi-hop relations in the neighborhood of the central entity are learned, and auxiliary edges are introduced between n-hop neighborhoods to implement the relation prediction task, as shown in fig. 5.
8. Obtain an answer A through knowledge graph reasoning based on entities and relations, as shown in fig. 6. Based on large-scale KGs and a large amount of question-answer corpora, a new question representation is designed: the template. 27 million templates covering 2,782 intents are learned. Based on these templates, KGs-QA effectively supports binary factoid question answering.
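The template-based answering step can be sketched minimally: the conceptualized question Q_c is matched against a learned template whose intent names a KGs predicate, and the answer is read off the matching triple. The single template and triple below are toy stand-ins for the 27-million-template corpus mentioned above.

```python
# Hypothetical template table: conceptualized question pattern -> KGs predicate.
templates = {
    ("where", "is", "the", "capital", "of", "country"): "capital_of",
}
kgs = {("China", "capital_of", "Beijing")}

def answer(q_c, entity):
    """Look up the template's predicate, then query the KGs triple."""
    predicate = templates[tuple(q_c)]
    for h, r, t in kgs:
        if h == entity and r == predicate:
            return t
    return None

q_c = ["where", "is", "the", "capital", "of", "country"]
print(answer(q_c, "China"))   # Beijing
```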
FIG. 2 is a schematic diagram of the Bi-LSTM-based entity recognition model of the present invention. The method regards entity recognition as a sequence labeling problem and uses a Bi-LSTM model to predict whether each word in the question is an entity. For example, for the question "Where is the capital of China?", the word segmentation result is "China / 's / capital / is / where", and since the entity is "China", the tag sequence is (1, 0, 0, 0, 0). In the sequence, "1" indicates that "China" is the entity of the question. In practice, this process performs data processing such as word segmentation and dictionary construction on the natural language question, and puts all possible entities, as candidates, into the candidate entity set.
The forward and backward LSTMs respectively process the input vector $(x_1, x_2, \ldots, x_{t-1}, x_t)$ to obtain the output vector $h_t$, i.e.

$h_t = [\overrightarrow{h_t};\ \overleftarrow{h_t}]$

where $\overrightarrow{h_t}$ is the output of the forward sequence and $\overleftarrow{h_t}$ is the output of the backward sequence. The output of the Bi-LSTM is then fed to a sigmoid layer for processing, i.e.

$y_i = \sigma(\omega \cdot h_t + b)$

The output vector of the model is y = (y_1, y_2, ..., y_n), where n is the length of the input sequence; it can be seen that the length of the model output vector is consistent with that of the input sequence. y_i is the label of the i-th word in the input question: if it is "1", the word is the sought entity; otherwise it is not. In the present invention, the mean square error is used as the loss function of the model, i.e.

$L = \frac{1}{n}\sum_{i=1}^{n}(y_i - z_i)^2 + \lambda \lVert \omega \rVert_2^2$

where ω is a weight, b is a bias, y_i is the prediction of the model, z_i is the target value, λ is the hyper-parameter controlling the regularization, and $\lVert \omega \rVert_2^2$ is the L2 norm.
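A small numeric sketch of this loss — mean squared error between predicted tags y_i and target tags z_i, plus an L2 penalty on the weights (all values below are made-up illustrations):

```python
y = [0.9, 0.1, 0.2]    # model predictions for a 3-word question
z = [1.0, 0.0, 0.0]    # target tags
w = [0.5, -0.5]        # weight values (placeholder)
lam = 0.01             # regularization hyper-parameter

mse = sum((yi - zi) ** 2 for yi, zi in zip(y, z)) / len(y)   # (0.01+0.01+0.04)/3
loss = mse + lam * sum(wi * wi for wi in w)                  # + 0.01 * 0.5
print(round(loss, 4))  # 0.025
```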
Fig. 3 is a schematic diagram of the CNNs-based relation linking model of the present invention. The relation linking process essentially measures the relevance between the relation expressed in the question and each candidate relation. Based on this idea, the CNNs model shown in fig. 3 is employed. It should be noted that in the input question vector the entity is replaced by its concept, so the influence of the named entity itself on the mapping result can be avoided. As can be seen from the figure, the CNNs perform convolution on the question vector and the candidate relation vector respectively, to obtain the semantic vectors corresponding to the question and the relation. Finally, relevance is computed between the obtained question and relation semantic vectors to obtain the linking result. After being processed by the CNNs model, the semantic vector of the question and the semantic vector of the candidate relation are obtained, and their semantic similarity is computed as the cosine similarity, i.e.,

$\cos\theta = \frac{\sum_{i=1}^{n} Q_i R_i}{\sqrt{\sum_{i=1}^{n} Q_i^2}\,\sqrt{\sum_{i=1}^{n} R_i^2}}$

where θ is the angle between vector Q and vector R, Q_i is the i-th element of the question semantic vector, and R_i is the i-th element of the candidate relation vector. The cosine distance measures the cosine of the angle between the two vectors: in the same vector space, the smaller the angle, the closer the two vectors are.
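The cosine-similarity computation can be checked numerically; with a toy pair of parallel vectors the similarity is maximal:

```python
import math

Q = [1.0, 2.0, 3.0]
R = [2.0, 4.0, 6.0]   # same direction as Q, so the angle is zero

num = sum(q * r for q, r in zip(Q, R))                       # dot product
den = math.sqrt(sum(q * q for q in Q)) * math.sqrt(sum(r * r for r in R))
cos_theta = num / den
print(round(cos_theta, 6))   # 1.0 -> identical direction, maximal similarity
```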
FIG. 5 is a schematic diagram of a sub-graph of the knowledge graph according to the present invention. In the figure, solid lines represent existing relations and dashed lines represent introduced auxiliary relations (hidden relations). Consider the example "Where was Bob born?". The analysis process includes an entity recognition step and a relation linking step, as shown in FIG. 5. After finding the entities and relations in the KGs, we note that the KGs may be incomplete (i.e., a relation between entities is missing), as shown in fig. 5. The relation prediction task is realized by assigning different attention weights to nearby entities; the attention is propagated iteratively through the layers, and as the number of iterations increases, the contribution of more distant entities becomes smaller and smaller. One promising solution to this problem is the combination of relations, achieved by introducing an auxiliary edge (dashed line) between n-hop neighbors; in this example, n = 2, as shown in fig. 5. From learning, we note that the path <Bob, brother, Andy> + <Andy, born in, Washington> has the greatest importance. In our model, an auxiliary relation (dashed line) between two entities that are 2-hop neighbors is introduced. The vector representation of the auxiliary relation is the sum of the vector representations of all the relevant relations (solid lines). In this example, the auxiliary relation (dashed line) may be understood as <Bob, brother, Andy> plus <Andy, born in, Washington>.
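The auxiliary-edge construction in the Bob example can be sketched directly: the vector of the introduced 2-hop auxiliary relation is the sum of the vectors of the relations along the path (brother, then born in). The relation embeddings below are random illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
# Placeholder embeddings for the two relations on the 2-hop path.
rel = {"brother": rng.normal(0, 1, 4), "born_in": rng.normal(0, 1, 4)}

def auxiliary_relation(path):
    """Vector of the auxiliary edge = sum of the relation vectors on the path."""
    return sum(rel[r] for r in path)

aux = auxiliary_relation(["brother", "born_in"])
# By construction, the auxiliary vector equals the sum of the path relations.
assert np.allclose(aux, rel["brother"] + rel["born_in"])
print(aux.shape)   # same dimensionality as a single relation embedding
```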
FIG. 6 is a schematic diagram of knowledge graph inference based on entities and relations according to the present invention. The concept of word embedding is introduced to convert the acquired knowledge graph training samples into low-dimensional space vectors, so that knowledge reasoning is converted into the problem of processing a natural language question through constructed templates, and the correspondences between question entities and knowledge graph entities, and between the natural language description of the question and the semantic relations of the knowledge graph, are found. Deep-learning-based entity recognition is studied: Bi-LSTM makes full use of context information to locate the candidate entity positions in the question. Deep-learning-based relation linking is studied: the CNNs extract deep semantic features and incorporate an attention mechanism to obtain the semantic vector of the relevant relation in the question, and a parameter sharing mechanism feeds the candidate relations in the triples into the same model to obtain their semantic vectors. Finally, the best candidate relation is selected by cosine similarity. In the KGs, whether entity e_KGs and relation r are connected is checked, as shown in fig. 4, and then the next relation prediction task is performed. First, a new vector representation of the central entity e_KGs is learned. In order to solve the problem that traditional methods cannot acquire the information hidden in the neighborhood around a triple, the vector learning model is optimized and an attention-based feature embedding method is proposed. Furthermore, relation clusters and multi-hop relations are encapsulated in the model.
To obtain a new vector representation of a central entity, the feature vector of each set of related facts existing in the neighborhood of the central entity is learned by linear transformation. The relations hidden in the KGs are then inferred based on the existing relevant triples. The initially hidden relations are inferred by finding already stored related fact triples (h, r, t), where h denotes the head semantic entity, r the semantic relation, and t the tail semantic entity. More specifically, the multi-hop entities and multi-hop relations in the neighborhood of the central entity are learned, and auxiliary edges are introduced between n-hop neighborhoods to implement the relation prediction task, as shown in fig. 5. Finally, knowledge graph reasoning based on the entities and relations is realized, and the answer A is obtained by querying the knowledge graph.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (9)

1. A relation-prediction-based knowledge graph intelligent question-answering method, characterized by comprising the following steps:
S1: input a question Q and preprocess it;
S2: identify an entity e_question in the question using entity recognition techniques, and map e_question to the corresponding entity e_KGs in the KGs;
S3: query the KGs for the class c of entity e_KGs, and replace entity e_question in question Q with class c, the result being denoted Q_c;
S4: map a relation r from Q_c;
S5: in the KGs, check whether entity e_KGs and relation r are connected;
S6: learn a new vector representation of the central entity e_KGs;
S7: infer the relations hidden in the KGs based on existing related triples;
S8: obtain an answer A through knowledge graph reasoning based on the entity and the relation.
2. The relation-prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that step S1 specifically includes: the text is segmented into words or phrases by the CRF parser and the maximum entropy dependency parser in HanLP and the Stanford parser, and quantitative descriptions of part of speech, word order, keywords, and dependency relations are obtained.
3. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S2 specifically includes: predicting whether each word in the question is an entity by using a bidirectional long short-term memory network (Bi-LSTM) model;
processing the input sequence with a forward LSTM unit and a backward LSTM unit, and finally concatenating the output vectors of the two LSTM units;
the output vector of the model is y = (y1, y2, ..., yn), where n is the length of the input sequence, so that the length of the model output vector is consistent with that of the input sequence; yi is the label corresponding to the i-th word in the input question: if yi is 1, the i-th word is part of the sought entity, otherwise it is not.
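How the 0/1 output vector is read off can be shown with a minimal sketch. The Bi-LSTM itself is omitted; the question and the tagging output below are hand-written stand-ins for the model's prediction, not part of the patent.

```python
# Illustrative sketch: interpreting the Bi-LSTM's per-token 0/1 labels.
# y has the same length as the input question; y_i == 1 marks the i-th
# token as part of the sought entity. The label vector here is hand-made.
question = ["where", "is", "Chongqing", "University", "located"]
y = [0, 0, 1, 1, 0]  # stand-in for the model output

entity_tokens = [w for w, tag in zip(question, y) if tag == 1]
e_question = " ".join(entity_tokens)
print(e_question)  # Chongqing University
```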
4. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S3 specifically includes: conceptualizing the entity in the question by using a latent Dirichlet allocation (LDA) topic model, so as to facilitate understanding of the entity and increase its interpretability;
a corpus-based, context-dependent conceptualization framework is built by combining the LDA topic model with a large-scale probabilistic KGs to capture the semantic relationships between words.
5. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S4 specifically includes: introducing a convolutional neural network (CNN) model into the relation linking task; extracting the semantic information about the relation in the question through the deep neural network model, processing all the relations of the candidate entities with the same model, and performing similarity matching between the obtained question attribute vector and the knowledge graph attribute vectors to obtain the final, correctly linked relation.
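The similarity-matching step can be sketched as follows. The question vector stands in for the CNN encoder's output, and the candidate relation vectors are made-up examples; only the cosine-similarity ranking itself is shown.

```python
# Minimal sketch: rank candidate relations by cosine similarity to the
# question's vector (a stand-in for the CNN encoder output) and link the
# best-scoring one. All vectors are illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

question_vec = [0.9, 0.1, 0.3]
candidate_relations = {
    "located_in": [0.8, 0.2, 0.4],
    "founded_by": [0.1, 0.9, 0.2],
}

best = max(candidate_relations, key=lambda r: cosine(question_vec, candidate_relations[r]))
print(best)  # located_in
```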
6. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S5 specifically includes: based on the entity identified in step S2 and the relation linked in step S4, reducing knowledge graph reasoning over the entity and relation to a subgraph matching problem;
if no match is found in the knowledge graph, the connection between entity e_KGs and relation r is missing, and the subsequent relation prediction task is performed.
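The control flow of this step, falling through to relation prediction when the edge is absent, can be sketched in a few lines. The graph content and entity names are illustrative assumptions.

```python
# Hedged sketch of the S5 decision: look up (e_KGs, r, ?) in the graph;
# if the edge is absent, hand off to the relation-prediction task.
kg = {
    ("Chongqing_University", "located_in", "Chongqing"),
}

def answer_or_predict(kg, entity, relation):
    tails = [t for h, r, t in kg if h == entity and r == relation]
    if tails:
        return ("answer", tails)
    return ("predict_relation", None)  # connection missing -> prediction task

print(answer_or_predict(kg, "Chongqing_University", "located_in"))
print(answer_or_predict(kg, "Chongqing_University", "founded_in"))
```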
7. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S6 specifically includes: to solve the problem that the information hidden in the neighborhood around a triple cannot be captured, optimizing the vector learning model and proposing an attention-based feature embedding method, which captures the entity features and relation features in the neighborhood of any given entity;
relation clusters and multi-hop relations are encapsulated in the model; to obtain a new vector representation of the central entity, a feature vector for each set of related facts in the neighborhood of the central entity is learned by linear transformation.
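The attention-based neighborhood embedding can be sketched generically: each neighboring fact is linearly transformed, scored, and the central entity's new vector is the softmax-weighted sum. All matrices, dimensions, and the scoring form below are made up for illustration; this is a generic sketch, not the patented model.

```python
# Illustrative numpy sketch of attention-based neighborhood aggregation:
# each neighbor fact (r, t) is linearly transformed into a feature vector,
# attention weights come from a softmax over scores, and the central
# entity's new vector is the weighted sum of neighbor features.
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, 2 * d))  # linear transform of [r; t]
a = rng.standard_normal(d)           # attention scoring vector

# three neighboring facts, each a (relation_vec, tail_entity_vec) pair
neighbors = [(rng.standard_normal(d), rng.standard_normal(d)) for _ in range(3)]

feats = np.stack([W @ np.concatenate([r, t]) for r, t in neighbors])  # (3, d)
scores = feats @ a
weights = np.exp(scores - scores.max())
weights /= weights.sum()             # softmax attention weights
h_new = weights @ feats              # new central-entity vector, shape (d,)
print(h_new.shape)  # (4,)
```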
8. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S7 specifically includes: inferring the initially hidden relationship by identifying existing related fact triples (h, r, t), where h denotes the head semantic entity, r the semantic relation, and t the tail semantic entity; that is, the multi-hop entities and multi-hop relations in the neighborhood of the central entity are learned, and auxiliary edges are introduced between n-hop neighborhoods to carry out the relation prediction task.
9. The relation prediction-based knowledge graph intelligent question-answering method according to claim 1, characterized in that: the step S8 specifically includes: the KGs-QA supports binary factoid question answering by learning 27 million templates covering 2782 intents.
CN202010628423.3A 2020-07-01 2020-07-01 Intelligent knowledge graph question-answering method based on relation prediction Active CN111782769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010628423.3A CN111782769B (en) 2020-07-01 2020-07-01 Intelligent knowledge graph question-answering method based on relation prediction

Publications (2)

Publication Number Publication Date
CN111782769A (en) 2020-10-16
CN111782769B (en) 2022-07-08

Family

ID=72758091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010628423.3A Active CN111782769B (en) 2020-07-01 2020-07-01 Intelligent knowledge graph question-answering method based on relation prediction

Country Status (1)

Country Link
CN (1) CN111782769B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160019280A1 (en) * 2013-03-15 2016-01-21 Google Inc. Identifying question answerers in a question asking system
CN109492077A (en) * 2018-09-29 2019-03-19 北明智通(北京)科技有限公司 The petrochemical field answering method and system of knowledge based map
CN110322959A (en) * 2019-05-24 2019-10-11 山东大学 A kind of Knowledge based engineering depth medical care problem method for routing and system
CN110888946A (en) * 2019-12-05 2020-03-17 电子科技大学广东电子信息工程研究院 Entity linking method based on knowledge-driven query
CN111291156A (en) * 2020-01-21 2020-06-16 同方知网(北京)技术有限公司 Question-answer intention identification method based on knowledge graph

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FEN ZHAO et al.: "Improving question answering over incomplete knowledge graphs with relation prediction", Neural Computing and Applications *
ZHANG, Jie: "Research on Fine-Grained Retrieval Technology for Structured Documents", China Masters' Theses Full-text Database, Information Science and Technology *
CHENG, Xianyi et al.: "Research on Opinion Sentence Recognition Algorithms Based on Knowledge Graphs", Computer Science *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487827A (en) * 2020-12-28 2021-03-12 科大讯飞华南人工智能研究院(广州)有限公司 Question answering method, electronic equipment and storage device
CN112579795A (en) * 2020-12-28 2021-03-30 重庆邮电大学 Intelligent question-answering method based on knowledge graph embedded representation
CN112487827B (en) * 2020-12-28 2024-07-02 科大讯飞华南人工智能研究院(广州)有限公司 Question answering method, electronic equipment and storage device
CN112765312A (en) * 2020-12-31 2021-05-07 湖南大学 Knowledge graph question-answering method and system based on graph neural network embedding matching
CN113590782A (en) * 2021-07-28 2021-11-02 北京百度网讯科技有限公司 Training method, reasoning method and device of reasoning model
CN113590782B (en) * 2021-07-28 2024-02-09 北京百度网讯科技有限公司 Training method of reasoning model, reasoning method and device
CN113792132A (en) * 2021-09-24 2021-12-14 泰康保险集团股份有限公司 Target answer determination method, device, equipment and medium
CN113792132B (en) * 2021-09-24 2023-11-17 泰康保险集团股份有限公司 Target answer determining method, device, equipment and medium
CN114186068A (en) * 2021-11-04 2022-03-15 国网天津市电力公司 Audit system basis question-answering method based on multi-level attention network
CN114860877A (en) * 2022-04-29 2022-08-05 华侨大学 Problem chain generation method and system based on knowledge graph relation prediction
CN118520957A (en) * 2024-07-23 2024-08-20 成都信通信息技术有限公司 Intelligent question-answering method based on deep learning
CN118520957B (en) * 2024-07-23 2024-10-01 成都信通信息技术有限公司 Intelligent question-answering method based on deep learning

Also Published As

Publication number Publication date
CN111782769B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN111782769B (en) Intelligent knowledge graph question-answering method based on relation prediction
CN108733792B (en) Entity relation extraction method
CN112015868B (en) Question-answering method based on knowledge graph completion
CN113535917A (en) Intelligent question-answering method and system based on travel knowledge map
CN112115238A (en) Question-answering method and system based on BERT and knowledge base
US20240233877A1 (en) Method for predicting reactant molecule, training method, apparatus, and electronic device
CN112766507B (en) Complex problem knowledge base question-answering method based on embedded and candidate sub-graph pruning
CN111339407B (en) Implementation method of information extraction cloud platform
CN112417170B (en) Relationship linking method for incomplete knowledge graph
CN113707339A (en) Method and system for concept alignment and content inter-translation among multi-source heterogeneous databases
CN114004237A (en) Intelligent question-answering system construction method based on bladder cancer knowledge graph
Sharath et al. Question answering over knowledge base using language model embeddings
Huo et al. Deep Learning Approaches for Improving Question Answering Systems in Hepatocellular Carcinoma Research
CN117973519A (en) Knowledge graph-based data processing method
Li et al. Using context information to enhance simple question answering
Zulfiqar et al. Logical layout analysis using deep learning
CN117609436A (en) College scientific research management question-answering system combining knowledge graph and large language model
Yang et al. Named entity recognition of power substation knowledge based on transformer-BiLSTM-CRF network
CN116680407A (en) Knowledge graph construction method and device
Liao et al. The sg-cim entity linking method based on bert and entity name embeddings
Deng et al. Covidia: COVID-19 Interdisciplinary Academic Knowledge Graph
Wu et al. A Text Emotion Analysis Method Using the Dual‐Channel Convolution Neural Network in Social Networks
Deroy et al. Question Generation: Past, Present & Future
Japa et al. Question answering over knowledge base using language model embeddings
CN113779211B (en) Intelligent question-answering reasoning method and system based on natural language entity relationship

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant