CN112860916A

CN112860916A - Film- and television-oriented multi-level knowledge graph generation method

Info

Publication number: CN112860916A
Application number: CN202110254580.7A
Authority: CN
Inventors: 孙涛; 翟娇娇; 赵晶; 王新刚
Original assignee: Qilu University of Technology
Current assignee: Qilu University of Technology
Priority date: 2021-03-09
Filing date: 2021-03-09
Publication date: 2021-05-28
Anticipated expiration: 2041-03-09
Also published as: CN112860916B

Abstract

The present disclosure provides a film- and television-oriented multi-level knowledge graph generation method. The method acquires information data at different levels related to the film and television content to be processed. It performs relation extraction on the acquired information data to obtain triple data and constructs multiple single-level knowledge graphs from that triple data. It then uses the relation triple data and attribute triple data within the triple data for structure embedding and attribute embedding. Finally, it combines the results of structure embedding and attribute embedding to perform entity alignment, and integrates the aligned single-level knowledge graphs into a multi-level knowledge graph.

Description

A multi-level knowledge graph generation method for film and television

Technical Field

The present disclosure relates to the technical fields of data mining and intelligent information processing. In particular, it relates to a method for generating a multi-level knowledge graph for film and television.

Background

The statements in this section merely provide background related to the present disclosure and do not necessarily constitute prior art.

Film and television data come from many sources. They are massive in volume, diverse in form, and complex in structure. Different film and television knowledge graphs store different knowledge, and that knowledge both overlaps heavily and complements itself. Some researchers have therefore proposed integrating the individual knowledge graphs into a multi-level knowledge graph. A basic problem in forming such a graph is aligning entities that appear in different film and television knowledge graphs but denote the same thing.

Alignment methods fall into two categories: traditional alignment methods and embedding-based alignment methods. The former mainly use supervised machine learning models and align entities by matching attribute similarity. The latter are based on representation learning: the entities and relations of the film and television knowledge graphs are mapped into a low-dimensional vector space, and similarities between entities are computed there to support calculation and reasoning. Most of these methods focus only on encoding relation triples better and ignore attribute triples. This matters most for entities with few relations: if entities are aligned using relation triples alone, the resulting knowledge graphs are poor in both comprehensiveness and accuracy.

Summary of the Invention

To address these deficiencies of the prior art, the present disclosure provides a method for generating a multi-level knowledge graph for film and television. Single-level movie knowledge graphs contain both relation triples and attribute triples. The method uses both kinds of triples to perform alignment from the structural perspective and from the attribute perspective, respectively, and finally forms a multi-level knowledge graph.

To achieve the above object, the present disclosure adopts the following technical solutions:

A first aspect of the present disclosure provides a method for generating a multi-level knowledge graph for film and television.

A method for generating a multi-level knowledge graph for film and television comprises the following steps:

Obtaining information data at different levels related to the film and television content to be processed;

Performing relation extraction on the acquired information data to obtain triple data, and constructing multiple single-level knowledge graphs from the triple data;

Performing structure embedding and attribute embedding using the relation triple data and the attribute triple data in the triple data;

Combining the results of structure embedding and attribute embedding to perform entity alignment, and integrating the multiple aligned single-level knowledge graphs into a multi-level knowledge graph.

A second aspect of the present disclosure provides a multi-level knowledge graph generation system for film and television.

A multi-level knowledge graph generation system for film and television comprises:

a data acquisition module, configured to acquire information data at different levels related to the film and television content to be processed;

a relation extraction module, configured to perform relation extraction on the acquired information data to obtain triple data, and to construct multiple single-level knowledge graphs from the triple data;

an embedding module, configured to perform structure embedding and attribute embedding using the relation triple data and the attribute triple data in the triple data;

a knowledge graph integration module, configured to combine the results of structure embedding and attribute embedding to perform entity alignment, and to integrate the multiple aligned single-level knowledge graphs into a multi-level knowledge graph.

A third aspect of the present disclosure provides a film and television query method based on a multi-level knowledge graph.

A film and television query method based on a multi-level knowledge graph comprises the following steps:

Obtaining the text to be queried;

Parsing the query text to obtain a parsing result;

Querying movie information data according to the parsing result and the multi-level knowledge graph constructed with the generation method of the first aspect of the present disclosure, to obtain a query result.

Further, the movie query result is displayed.

A fourth aspect of the present disclosure provides a computer-readable storage medium storing a program. When executed by a processor, the program implements the steps of the method for generating a multi-level knowledge graph for film and television according to the first aspect of the present disclosure.

A fifth aspect of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the method for generating a multi-level knowledge graph for film and television according to the first aspect of the present disclosure.

Compared with the prior art, the present disclosure has the following beneficial effects:

1. The generation method, system, medium, and electronic device of the present disclosure use both the relation triples and the attribute triples present in single-level movie knowledge graphs. They perform alignment from the structural and attribute perspectives, respectively, and finally form a multi-level knowledge graph. This overcomes the structural and relational defects of earlier multi-level knowledge graphs and ensures that the generated multi-level knowledge graph is comprehensive and accurate.

2. The film and television query method of the present disclosure draws on the comprehensive and accurate multi-level knowledge graph constructed as above. It enables fast queries of relevant film and television information and improves query accuracy.

Description of Drawings

The accompanying drawings form part of the present disclosure and are provided for a further understanding of it. The exemplary embodiments of the present disclosure and their descriptions serve to explain the disclosure and do not unduly limit it.

FIG. 1 is a schematic flowchart of the method for generating a multi-level knowledge graph for film and television provided in Embodiment 1 of the present disclosure.

FIG. 2 is a schematic diagram of the multi-level knowledge graph structure provided in Embodiment 1 of the present disclosure.

FIG. 3 is a schematic diagram of the attribute value and attribute type embedding process based on the Pseudo-Siamese Neural Network provided in Embodiment 1 of the present disclosure.

Detailed Description

The present disclosure is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and intended to provide further explanation of the present disclosure. Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art to which this disclosure belongs.

It should also be noted that the terminology used herein is for describing specific embodiments only and is not intended to limit the exemplary embodiments of the present disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. Furthermore, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

Where there is no conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with each other.

Embodiment 1:

As shown in FIG. 1, Embodiment 1 of the present disclosure provides a method for generating a multi-level knowledge graph for film and television, comprising the following steps:

Specifically, the method includes the following:

S1: Build single-level knowledge graphs

S1.1: First, analyze the different levels of film knowledge. Film types include suspense, horror, romance, martial arts, and art-house films. Film genres include the surrealist and minimalist schools. Technically, films are divided into black-and-white, ordinary color, and 3D films. Film actors also come from different countries and regions of the world. At this stage, data are obtained for each of these levels.

S1.2: Perform relation extraction according to the characteristics of the acquired data. The data acquired here fall into three types: text, data, and tables. For text, triples are extracted using dependency parsing and HanLP. For data and tables, the Pearson correlation coefficient method is used to find moderately and highly correlated pairs of entities, and triples are extracted from those pairs. A triple has the form <h, r, t>, where h is the head entity, t is the tail entity, and r is the relation between the two entities.
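The Pearson step for data and tables can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes each table column is a numeric series keyed by an entity name. It treats |r| ≥ 0.5 as "moderate to high" correlation, which is a hypothetical cut-off because the text gives no value. The relation label `correlated_with` is also hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlation_triples(table, threshold=0.5, relation="correlated_with"):
    """Emit <h, r, t> triples for column pairs whose |r| reaches the threshold.

    `table` maps a (hypothetical) entity name to its numeric column;
    `threshold` and `relation` are illustrative assumptions.
    """
    triples = []
    for head, tail in combinations(sorted(table), 2):
        if abs(pearson(table[head], table[tail])) >= threshold:
            triples.append((head, relation, tail))
    return triples


# Hypothetical table: one numeric column per entity.
table = {
    "budget": [10, 20, 30, 40],
    "box_office": [15, 33, 41, 62],
    "release_month": [3, 11, 2, 7],
}
print(correlation_triples(table))  # [('box_office', 'correlated_with', 'budget')]
```

Using the absolute value keeps strong negative correlations as well. Whether the patent does the same is not stated.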

S1.3: Construct single-level knowledge graphs from the extracted triples.

S2.1: Multi-source data fusion

Knowledge is constantly being updated. To judge whether newly added knowledge is authentic, the collected information is fused using a multi-source data fusion model. Only data judged authentic and credible are added to the knowledge graph.

The multi-source data fusion model works as follows:

(1) Data blocking: In the relation extraction stage, knowledge is extracted around entity keywords. Data from different sources are therefore aggregated into blocks according to the entity keywords of each level, and these blocks serve as candidate matching knowledge. During fusion, data are first fused with the data of their own level. This avoids traversing the entire knowledge base and reduces computational complexity.

(2) Multi-source data fusion coefficient K: The candidate matching knowledge in each block is matched against the knowledge in the original knowledge base using the multi-source data fusion coefficient K. If K is greater than a set threshold, the candidate matching knowledge is considered correct and may be added to the knowledge base.

The multi-source data fusion coefficient K is defined as follows:

K consists of two parts. The first is confidence, a confidence score; the second is the average of the entity similarity and the relation similarity. The confidence itself consists of two parts, M and cf. M is the confidence of the data source: authoritative websites or knowledge bases such as Baidu Baike and HowNet have higher M values. cf is a confidence computed for each pair of entities. It is based on the distance between the two entities and the distance between an entity and the relation expression.

Entity_sim is the textual similarity between entities, and Relationship_sim is the relation similarity. Their average is taken as the similarity of the knowledge. If this similarity exceeds the set threshold of 0.5, the knowledge is considered relatively credible. In the formula, L denotes an entity position and R denotes the relation position. $L_i - L_j$ is the distance between entity 1 and entity 2, and $L_i - R$ is the distance between entity 1 and the relation. The larger these distances, the less likely a semantic relationship exists between the two entities or between an entity and the relation, and the lower the confidence.
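The formula for K is not reproduced in this text. The sketch below therefore only mirrors the structure described above and should not be read as the patent's formula. The following are hypothetical choices:

- `difflib.SequenceMatcher` as the text-similarity measure;
- averaging head and tail similarity for Entity_sim;
- cf = 1 / (1 + |L_i − L_j| + |L_i − R|) as one distance-decreasing confidence;
- K = M · cf · mean(Entity_sim, Relationship_sim);
- a K threshold of 0.1.

Only the 0.5 similarity threshold comes from the text.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def text_sim(a, b):
    """Character-level similarity in [0, 1]; assumed stand-in for Entity_sim / Relationship_sim."""
    return SequenceMatcher(None, a, b).ratio()


def block_by_keyword(triples):
    """Data blocking: group triples by their head-entity keyword."""
    blocks = defaultdict(list)
    for triple in triples:
        blocks[triple[0]].append(triple)
    return blocks


def fusion_coefficient(cand, known, source_conf, positions):
    """Hypothetical K = M * cf * mean(Entity_sim, Relationship_sim).

    source_conf plays the role of M; positions = (L_i, L_j, R) are the token
    positions of entity 1, entity 2 and the relation in the source sentence.
    """
    l_i, l_j, r_pos = positions
    cf = 1.0 / (1.0 + abs(l_i - l_j) + abs(l_i - r_pos))  # larger distance -> lower cf
    entity_sim = (text_sim(cand[0], known[0]) + text_sim(cand[2], known[2])) / 2
    relation_sim = text_sim(cand[1], known[1])
    similarity = (entity_sim + relation_sim) / 2
    if similarity <= 0.5:  # similarity threshold stated in the text
        return 0.0
    return source_conf * cf * similarity


def accept_candidates(candidates, knowledge_base, source_conf, positions, k_threshold=0.1):
    """Keep candidates whose best K within their own block exceeds k_threshold (hypothetical)."""
    kb_blocks = block_by_keyword(knowledge_base)
    accepted = []
    for keyword, block in block_by_keyword(candidates).items():
        for cand in block:
            # Only knowledge in the same block is compared, never the whole base.
            k = max((fusion_coefficient(cand, known, source_conf, positions[cand])
                     for known in kb_blocks.get(keyword, [])), default=0.0)
            if k > k_threshold:
                accepted.append(cand)
    return accepted


kb = [("Hero", "directed_by", "Zhang Yimou")]
c1 = ("Hero", "directed by", "Zhang Yimou")
c2 = ("Red Sorghum", "directed_by", "Zhang Yimou")  # no matching block in kb
print(accept_candidates([c1, c2], kb, 0.9, {c1: (0, 2, 1), c2: (0, 2, 1)}))
```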

S3: Build the multi-level knowledge graph

After multi-source data fusion, the knowledge is still spread across separate single-level knowledge graphs. These graphs are integrated through entity alignment to build a multi-level knowledge graph. Knowledge exists in the form of triples, which are usually divided into relation triples and attribute triples. Examples are the relation triple (Zhang Ziyi, Husband, Wang Feng) and the attribute triple (Zhang Ziyi, age, 41).
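The two example triples above can be separated with a simple rule. The sketch below assumes, as a common convention rather than something the text states, two things: a set of known entities is available, and a triple whose tail is a known entity is a relation triple, while any other triple is an attribute triple.

```python
def split_triples(triples, entities):
    """Separate relation triples (tail is an entity) from attribute triples (tail is a literal)."""
    relation_triples, attribute_triples = [], []
    for h, r, t in triples:
        if t in entities:
            relation_triples.append((h, r, t))
        else:
            attribute_triples.append((h, r, t))
    return relation_triples, attribute_triples


triples = [("Zhang Ziyi", "Husband", "Wang Feng"), ("Zhang Ziyi", "age", 41)]
entities = {"Zhang Ziyi", "Wang Feng"}
rel, attr = split_triples(triples, entities)
print(rel)   # [('Zhang Ziyi', 'Husband', 'Wang Feng')]
print(attr)  # [('Zhang Ziyi', 'age', 41)]
```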

Using relation triples and attribute triples for structure and attribute embedding makes full use of both entity semantic information and attribute information, which further improves alignment accuracy. The specific method is as follows:

S3.1: Merge the triples of the single-level knowledge graphs by predicate similarity using a unified naming method, with 0.95 as the similarity threshold. This allows entities and relations to be embedded into the same vector space.

The same entity is represented differently in different knowledge graphs, so this step performs a simple name unification for the same entity across knowledge graphs. After unification, the subsequent alignment is performed in the vector space.
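A minimal sketch of predicate unification with the 0.95 threshold follows. The similarity measure in the text is unspecified, so the following are assumptions: `difflib.SequenceMatcher` as that measure, a lower-casing and separator-normalizing step before comparison, and the first-seen predicate serving as the canonical name. The same mapping could be applied to entity names.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and unify separators (an assumed pre-processing step)."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def unify_predicates(predicates, threshold=0.95):
    """Map each predicate to a canonical name; predicates whose normalized
    similarity reaches `threshold` share the first-seen name."""
    canonical, mapping = [], {}
    for p in predicates:
        for c in canonical:
            if SequenceMatcher(None, normalize(p), normalize(c)).ratio() >= threshold:
                mapping[p] = c
                break
        else:  # no sufficiently similar canonical name yet
            canonical.append(p)
            mapping[p] = p
    return mapping


def merge_triples(triples, threshold=0.95):
    """Rewrite predicates to canonical names and drop the resulting duplicates."""
    mapping = unify_predicates([r for _, r, _ in triples], threshold)
    return sorted({(h, mapping[r], t) for h, r, t in triples})


triples = [
    ("Hero", "directed_by", "Zhang Yimou"),
    ("Hero", "Directed By", "Zhang Yimou"),
    ("Hero", "starring", "Jet Li"),
]
print(merge_triples(triples))
# [('Hero', 'directed_by', 'Zhang Yimou'), ('Hero', 'starring', 'Jet Li')]
```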

S3.2: Perform structure alignment and attribute alignment using relation triples and attribute triples, respectively.

(A) Structure embedding: Structure embedding is performed on the predicate-aligned triples, using the relation triples and the training set to learn vector representations of entities and relations. For a relation triple $T_t = (h, r, t)$, the model expects $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. To measure plausibility, the model optimizes a margin-based ranking loss that makes positive triples score lower than negative triples:

Here, Tr denotes the set of all positive triples, which are rated by the score function. Tr′ denotes the set of corrupted negative triples, generated by replacing the head or the tail (but not both) of a positive triple with a random entity. α is a hyperparameter in [0, 1] that weights positive triples against negative triples. In this way, approximate vector representations of entities in the KGs (knowledge graphs) are learned. The entity similarity after structure embedding is:

$$\mathrm{Sim}(h_i, h_j) = \cos(\mathbf{h}_i, \mathbf{h}_j), \quad h_i \in G_1,\ h_j \in G_2$$

Here, G1 and G2 are knowledge graph 1 and knowledge graph 2, h_i is an entity in knowledge graph 1, and h_j is an entity in knowledge graph 2. cos is the cosine function. Sim is the similarity after structure embedding, equal to the cosine of the two entities' vectors in the vector space.
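The loss formula itself is not reproduced here, so the sketch below uses the standard TransE score (claim 5 names TransE) with a standard hinge-style margin ranking loss. The margin value is hypothetical, and the α weighting is not modeled. The sketch also shows the head-or-tail corruption described above and the cosine similarity Sim(h_i, h_j).

```python
import math
import random


def transe_score(h, r, t):
    """TransE score ||h + r - t||_2; lower means the triple is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss pushing a positive score below a negative score by `margin` (assumed value)."""
    return max(0.0, margin + pos_score - neg_score)


def corrupt(triple, entities, rng):
    """Build a negative triple by replacing the head OR the tail (never both)."""
    h, r, t = triple
    if rng.random() < 0.5:
        h = rng.choice([e for e in entities if e != h])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, r, t)


def cosine(u, v):
    """Sim(h_i, h_j) = cos(h_i, h_j) for entity vectors from G1 and G2."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 2-d embeddings (hypothetical values) where h + r == t holds exactly.
emb = {"Zhang Ziyi": [1.0, 0.0], "Wang Feng": [1.0, 1.0], "Hero": [0.0, 3.0]}
rel = {"Husband": [0.0, 1.0]}
pos = ("Zhang Ziyi", "Husband", "Wang Feng")
neg = corrupt(pos, list(emb), random.Random(0))
loss = margin_ranking_loss(transe_score(emb[pos[0]], rel[pos[1]], emb[pos[2]]),
                           transe_score(emb[neg[0]], rel[neg[1]], emb[neg[2]]))
print(neg, loss)
```

In training, this loss would be summed over many (positive, negative) pairs and minimized by gradient descent. The sketch only evaluates it once.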

(B) Attribute value and attribute type embedding based on the Pseudo-Siamese Neural Network

Attribute value embedding: A bidirectional gated recurrent unit (Bi-GRU) network encodes each attribute value into a single embedding by reading it in both directions, as follows:

$$Z_i = \sigma\left(W_Z [C_i, h_{i-1}]\right)$$

$$R_i = \sigma\left(W_r [C_i, h_{i-1}]\right)$$

Here, Z and R are the update gate and reset gate of the GRU unit, respectively, and $W_Z$, $W_h$, $W_r$ are weight matrices. The Bi-GRU consists of a forward GRU and a backward GRU. The forward GRU reads the input character embeddings from left to right, and the backward GRU reads them in reverse. When the attribute value $v = (c_0, c_1, \ldots, c_n)$ is read, their outputs are, respectively:

The initial state of the Bi-GRU is set to the zero vector. After the character embeddings are read, the attribute value embedding is finally formed:
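Only the two gate equations survive above. In the sketch below, the candidate state and the state update follow one common convention of the standard GRU formulation, which is an assumption. Joining the final forward and backward states by concatenation is also an assumption, since the patent's combination formula is not reproduced. Weights are plain nested lists so the example stays dependency-free.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gru_step(c, h_prev, W_Z, W_r, W_h):
    """One GRU step on character embedding c with previous hidden state h_prev."""
    x = c + h_prev                                   # concatenation [C_i, h_{i-1}]
    z = [sigmoid(v) for v in matvec(W_Z, x)]         # update gate Z_i
    r = [sigmoid(v) for v in matvec(W_r, x)]         # reset gate R_i
    x_cand = c + [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(v) for v in matvec(W_h, x_cand)]  # candidate state (assumed form)
    return [(1 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]


def bigru_value_embedding(chars, forward_weights, backward_weights, hidden):
    """Encode an attribute value v = (c_0, ..., c_n) from both directions."""
    h_fwd = [0.0] * hidden                            # zero initial state
    for c in chars:                                   # left to right
        h_fwd = gru_step(c, h_fwd, *forward_weights)
    h_bwd = [0.0] * hidden
    for c in reversed(chars):                         # right to left
        h_bwd = gru_step(c, h_bwd, *backward_weights)
    return h_fwd + h_bwd                              # assumed: concatenate final states


# Toy sizes: 2-d character embeddings, hidden size 2 -> each W is 2 x (2 + 2).
W = [[0.1, -0.2, 0.3, 0.05], [0.2, 0.1, -0.1, 0.4]]
chars = [[1.0, 0.0], [0.0, 1.0]]
print(bigru_value_embedding(chars, (W, W, W), (W, W, W), hidden=2))
```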

Attribute type embedding: To learn how important each attribute is to an entity, attribute types and attribute values share one attention weight. Given the m attribute values of entity e, an attention weight is computed for each of them. The weight of each attribute value should be consistent with the weight of its attribute type, as follows:

Concatenating the attribute type embedding with the attribute value embedding gives the final entity attribute embedding:

The attribute embedding process exploits the fact that the attributes of entities to be aligned are usually highly similar. A Pseudo-Siamese Neural Network is trained on the attribute information of the two KGs, and trained metrics are then used to evaluate the similarity between entity embeddings. Each entity embedding is the concatenation of the entity's attribute type embedding and attribute value embedding.
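The sketch below illustrates the shared-attention idea. The weights come from the value embeddings and are reused for the corresponding type embeddings, and the two pooled vectors are then concatenated. Dot-product scoring against a query vector `query` is a hypothetical choice standing in for the unreproduced weight formula. In a real model, `query` would be a learned parameter.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors, weights):
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(len(vectors[0]))]


def entity_attribute_embedding(type_embs, value_embs, query):
    """Attention weights computed from the m value embeddings are shared with the
    matching attribute type embeddings; the pooled vectors are concatenated."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in value_embs]
    weights = softmax(scores)
    embedding = weighted_sum(type_embs, weights) + weighted_sum(value_embs, weights)
    return embedding, weights


types = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical type embeddings (e.g. "age", "nationality")
values = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical value embeddings
emb, w = entity_attribute_embedding(types, values, query=[0.0, 0.0])
print(emb, w)  # [0.5, 0.5, 1.0, 1.0] [0.5, 0.5]
```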

S3.3: Integrate the multi-level knowledge graph

Finally, the aligned single-level knowledge graphs are integrated into a unified multi-level knowledge graph.

The attribute embedding and structure embedding processes yield a similarity matrix over entity pairs. An optimal bipartite graph matching algorithm is applied to this matrix, and the matched pairs serve as new entity pairs for the next iteration. Iterating in this way finally finds the optimally matching entities, which completes entity alignment.
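One matching round can be sketched as follows. The sketch makes three hypothetical choices:

- The structure and attribute similarity matrices are combined by a weighted average, with weight 0.5.
- The optimal bipartite matching is found by exhaustive search, which is only viable for tiny matrices. A real system would use the Hungarian algorithm, for example SciPy's `linear_sum_assignment` with `maximize=True`.
- Only pairs above a minimum similarity become seeds for the next iteration.

```python
from itertools import permutations


def combine(struct_sim, attr_sim, weight=0.5):
    """Hypothetical weighted average of structure and attribute similarity matrices."""
    return [[weight * s + (1 - weight) * a for s, a in zip(srow, arow)]
            for srow, arow in zip(struct_sim, attr_sim)]


def optimal_matching(sim):
    """Maximum-weight bipartite matching by exhaustive search (rows <= columns)."""
    n_rows, n_cols = len(sim), len(sim[0])
    best, best_total = None, float("-inf")
    for cols in permutations(range(n_cols), n_rows):
        total = sum(sim[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best, best_total = cols, total
    return [(i, j) for i, j in enumerate(best)]


def new_seed_pairs(sim, min_sim=0.5):
    """Matched pairs confident enough to seed the next iteration (min_sim is hypothetical)."""
    return [(i, j) for i, j in optimal_matching(sim) if sim[i][j] >= min_sim]


# Greedy matching would take (0, 0) = 0.9 and be left with (1, 1) = 0.1 (total 1.0);
# the optimal matching (0, 1) + (1, 0) reaches 1.65.
s = [[0.9, 0.8], [0.85, 0.1]]
print(optimal_matching(combine(s, s)))  # [(0, 1), (1, 0)]
```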

Embodiment 2:

Embodiment 2 of the present disclosure provides a multi-level knowledge graph generation system for film and television. It comprises the data acquisition module, relation extraction module, embedding module, and knowledge graph integration module described in the second aspect.

The system works in the same way as the method for generating a multi-level knowledge graph for film and television provided in Embodiment 1, so the details are not repeated here.

Embodiment 3:

At present, people mainly search for film and television knowledge through search engines. A search engine collects web page information from the Internet according to the keywords the user enters, organizes and processes that information, and presents the retrieved web pages to the user. Search engines serve information retrieval across all domains rather than queries in one specific domain, but they have the following drawbacks. First, they often match keywords to web page information mechanically and cannot understand the semantics of the user's sentence. They therefore place demands on the user's input: the more precise the keywords, the closer the results are to what the user wants. Second, they return relevant web pages rather than exact answers, so users must browse many pages to find the information they need.

In view of this, Embodiment 3 of the present disclosure provides a film and television query method based on a multi-level knowledge graph, comprising the following steps:

Obtaining the text to be queried;

Parsing the query text to obtain a parsing result;

Querying movie information data according to the parsing result and the multi-level knowledge graph constructed with the generation method of Embodiment 1, to obtain a query result;

Displaying the movie query result.

The data of the multi-level knowledge graph are obtained by crawling movie text data from web pages.

The parsing method is as follows:

An intent recognition model based on an enhanced convolutional neural network parses the query text to obtain the parsing result. Specifically:

Constructing a dictionary tree (trie) from the entities and attributes in the knowledge graph and the collected film and television terminology;

Using the intent recognition model, matching the characters of the query text against the film and television terms in the dictionary tree, and taking the matched terms as the word information sequence;

Fusing all matched film and television terms to form the parsing result.
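The dictionary-tree step can be sketched as follows. The enhanced-CNN intent model itself is not shown. The sketch only illustrates how a character trie returns every dictionary term occurring in a query, which produces the word information sequence. The terms are hypothetical examples: 张艺谋 is Zhang Yimou and 英雄 is *Hero*.

```python
END = "<END>"  # multi-character key, so it can never collide with a query character


def build_trie(terms):
    """Character trie over entities, attributes and domain terms."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = term
    return root


def match_terms(query, trie):
    """Return (start, end, term) for every dictionary term occurring in the query."""
    matches = []
    for start in range(len(query)):
        node = trie
        for end in range(start, len(query)):
            node = node.get(query[end])
            if node is None:
                break
            if END in node:
                matches.append((start, end + 1, node[END]))
    return matches


trie = build_trie(["张艺谋", "张艺", "英雄", "导演"])
print(match_terms("张艺谋导演的英雄", trie))
# [(0, 2, '张艺'), (0, 3, '张艺谋'), (3, 5, '导演'), (6, 8, '英雄')]
```

All overlapping matches are kept, as in lattice-style word information. Deciding which of them to fuse into the parsing result is left to the model.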

Embodiment 4:

Embodiment 4 of the present disclosure provides a computer-readable storage medium storing a program. When executed by a processor, the program implements the steps of the method for generating a multi-level knowledge graph for film and television according to Embodiment 1 of the present disclosure.

Embodiment 5:

Embodiment 5 of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the method for generating a multi-level knowledge graph for film and television according to Embodiment 1 of the present disclosure.

Those skilled in the art will appreciate that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.

The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the disclosure. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner. The instructions stored in that memory then produce an article of manufacture comprising instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce a computer-implemented process. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above are only preferred embodiments of the present disclosure and are not intended to limit it; those skilled in the art may make various modifications and changes to it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within its protection scope.

Claims

1. A movie-oriented multi-level knowledge map generation method is characterized by comprising the following steps: the method comprises the following steps:

acquiring information data of different layers related to a movie to be processed;

extracting the relation of the acquired information data to obtain triple data, and constructing a plurality of single-level knowledge maps according to the triple data;

embedding the structure and the attribute by utilizing the relation ternary group data and the attribute ternary group data in the ternary group data;

and combining the results of structure embedding and attribute embedding to carry out entity alignment, and integrating a plurality of single-level knowledge maps after entity alignment to obtain a multi-level knowledge map.

2. The method for generating a multi-level knowledge-graph for movies according to claim 1, characterized in that:

the information data includes text, data and tables, the text uses dependency syntax analysis and HanLP to extract triples, and the data and tables extract triples by Pearson's correlation coefficient method.

3. The film and television-oriented multi-level knowledge graph generation method according to claim 1, characterized in that:

the method further comprises performing multi-source data fusion on the acquired information data and judging the authenticity of the data, which comprises:

grouping the data from different sources into blocks according to the entity keywords of each level, the blocked data serving as candidate matching knowledge;

matching the candidate matching knowledge within the same block against the knowledge in the original knowledge base using a multi-source data fusion coefficient; if the multi-source data fusion coefficient is greater than a set threshold, the candidate matching knowledge is considered correct and is added to the knowledge base; otherwise, it is not added.
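The claim names a "multi-source data fusion coefficient" without defining it. The sketch below assumes a hypothetical definition: the reliability-weighted share of claims that support a value, where an existing knowledge-base value counts as one extra source. Source reliabilities, `kb_weight`, and `threshold` are all invented parameters. Blocking follows the claim: candidates are grouped by the entity keyword they mention, and only candidates within a block are compared.

```python
from collections import defaultdict


def block_by_keyword(candidates, keywords):
    """Group (source, subject, predicate, object) candidates by the first keyword in the subject."""
    blocks = defaultdict(list)
    for cand in candidates:
        for kw in keywords:
            if kw in cand[1]:
                blocks[kw].append(cand)
                break  # candidates matching no keyword are not blocked at all
    return blocks


def fusion_coefficient(value, claims, reliability, kb_value=None, kb_weight=1.0):
    """Hypothetical coefficient: reliability-weighted share of claims supporting `value`.

    claims: list of (source, value) for one (subject, predicate). The existing
    knowledge-base value, if any, counts as one more source of weight kb_weight.
    """
    support = sum(reliability[s] for s, v in claims if v == value)
    total = sum(reliability[s] for s, _ in claims)
    if kb_value is not None:
        total += kb_weight
        if kb_value == value:
            support += kb_weight
    return support / total if total else 0.0


def fuse(candidates, keywords, reliability, kb, threshold=0.6):
    """Return the (subject, predicate, object) triples whose coefficient exceeds threshold."""
    accepted = set()
    for block in block_by_keyword(candidates, keywords).values():
        groups = defaultdict(list)
        for source, s, p, o in block:
            groups[(s, p)].append((source, o))
        for (s, p), claims in groups.items():
            for value in {o for _, o in claims}:
                if fusion_coefficient(value, claims, reliability, kb.get((s, p))) > threshold:
                    accepted.add((s, p, value))
    return accepted
```

The strict `>` comparison mirrors the claim's "greater than a set threshold".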

4. The film and television-oriented multi-level knowledge graph generation method according to claim 1, characterized in that:

merging the triples in the single-level knowledge graphs under a unified naming scheme based on predicate similarity, and embedding the entities and relations into the same vector space;

in the vector space, performing structure alignment using the relation triples and attribute alignment using the attribute triples;

integrating the aligned single-level knowledge graphs into a unified multi-level knowledge graph.
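The predicate-merging step can be illustrated with a small string-similarity sketch. It assumes that "predicate similarity" means `difflib.SequenceMatcher.ratio()` on predicate names with case and separators removed, plus a hypothetical list of canonical names and a hypothetical `threshold`. The patent may use a different similarity measure.

```python
import re
from difflib import SequenceMatcher


def normalize_predicate(pred):
    """Lower-case and drop separators so 'directedBy' and 'directed_by' compare equal."""
    return re.sub(r"[\W_]+", "", pred.lower())


def unify_predicates(triples, canonical, threshold=0.85):
    """Rename each predicate to its most similar canonical name, merging duplicate triples."""
    cache = {}

    def canonical_name(pred):
        if pred not in cache:
            ratio, name = max(
                (SequenceMatcher(None, normalize_predicate(pred),
                                 normalize_predicate(c)).ratio(), c)
                for c in canonical
            )
            # Predicates with no close canonical match keep their own name.
            cache[pred] = name if ratio >= threshold else pred
        return cache[pred]

    return {(h, canonical_name(r), t) for h, r, t in triples}
```

Because the result is a set, triples that differ only in predicate spelling collapse into one after renaming.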

5. The film and television-oriented multi-level knowledge graph generation method according to claim 1, characterized in that:

performing structure embedding of the relation triples based on TransE, comprising: combining the predicate-aligned triples, performing structure embedding using the relation triples and a training set to learn vector representations of the entities and relations such that positive triples score lower than negative triples, and thereby obtaining an entity similarity measure after structure embedding.
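TransE models a relation as a translation, so a true triple should satisfy h + r ≈ t. The sketch below uses the standard TransE dissimilarity ||h + r − t||₂ and the margin ranking loss max(0, γ + f(pos) − f(neg)), which is what "positive triples score lower than negative triples" describes. It trains with plain SGD and random head/tail corruption. The dimension, learning rate, margin, and epoch count are assumed values, not the patent's.

```python
import math
import random


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    n = l2(v)
    return [x / n for x in v] if n else v


def score(h, r, t):
    """TransE dissimilarity ||h + r - t||_2: lower means the triple is more plausible."""
    return l2([a + b - c for a, b, c in zip(h, r, t)])


def margin_loss(pos, neg, gamma=1.0):
    """Hinge loss max(0, gamma + f(pos) - f(neg)) pushing positives below negatives."""
    return max(0.0, gamma + pos - neg)


def train_transe(triples, dim=8, epochs=200, lr=0.01, gamma=1.0, seed=0):
    rng = random.Random(seed)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    relations = sorted({r for _, r, _ in triples})

    def rand_vec():
        return normalize([rng.uniform(-1, 1) for _ in range(dim)])

    E = {e: rand_vec() for e in entities}
    R = {r: rand_vec() for r in relations}
    for _ in range(epochs):
        E = {e: normalize(v) for e, v in E.items()}  # TransE renormalizes entities
        for h, r, t in triples:
            # Build a negative triple by corrupting the head or the tail.
            if rng.random() < 0.5:
                h2, t2 = rng.choice(entities), t
            else:
                h2, t2 = h, rng.choice(entities)
            pos = score(E[h], R[r], E[t])
            neg = score(E[h2], R[r], E[t2])
            if margin_loss(pos, neg, gamma) == 0.0:
                continue  # margin already satisfied, no gradient
            # d||x||/dx = x / ||x|| for the positive and negative residuals.
            gp = [(a + b - c) / (pos or 1.0) for a, b, c in zip(E[h], R[r], E[t])]
            gn = [(a + b - c) / (neg or 1.0) for a, b, c in zip(E[h2], R[r], E[t2])]
            # loss = gamma + pos - neg; each entry is (table, key, gradient, sign).
            for table, key, g, sign in ((E, h, gp, 1), (E, t, gp, -1), (R, r, gp, 1),
                                        (R, r, gn, -1), (E, h2, gn, -1), (E, t2, gn, 1)):
                table[key] = [x - lr * sign * gi for x, gi in zip(table[key], g)]
    return {e: normalize(v) for e, v in E.items()}, R
```

After training, a structural similarity between two entities could, for instance, be taken as the cosine of their vectors. The claim does not give the exact similarity expression.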

6. The film and television-oriented multi-level knowledge graph generation method according to claim 1, characterized in that:

embedding attribute values and attribute types based on a pseudo-Siamese neural network, comprising:

encoding each attribute value into a single embedding from both directions using a bidirectional gated recurrent unit (BiGRU) network, letting the attribute types and the attribute values share attention weights, and concatenating the attribute type embedding and the attribute value embedding to obtain the final attribute embedding.
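The following is a forward-pass-only sketch of one branch of the attribute encoder. In a pseudo-Siamese setup, each knowledge graph would get its own instance with the same architecture but unshared weights. Several choices here are assumptions, not the patent's specification. Attribute values are encoded character by character with a GRU run forwards and backwards, using the standard GRU gate equations with biases omitted. Character and attribute-type embeddings are random lookup tables. The shared attention score for each attribute is a hypothetical query vector dotted with the concatenated [type; value] embedding. Training is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class GRUCell:
    """Minimal untrained GRU cell (biases omitted) with random weights."""

    def __init__(self, in_dim, hid_dim, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.hid = hid_dim
        self.Wz, self.Uz = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wr, self.Ur = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)
        self.Wn, self.Un = mat(hid_dim, in_dim), mat(hid_dim, hid_dim)

    def run(self, xs):
        """Consume a sequence of input vectors and return the final hidden state."""
        h = [0.0] * self.hid
        for x in xs:
            z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
            r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
            n = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(self.Wn, x), r, matvec(self.Un, h))]
            h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        return h


class AttributeEncoder:
    """One branch of a pseudo-Siamese attribute encoder (each KG gets its own instance)."""

    def __init__(self, char_dim=4, hid_dim=3, seed=0):
        self.rng = random.Random(seed)
        self.char_dim, self.dim = char_dim, 2 * hid_dim
        self.fwd = GRUCell(char_dim, hid_dim, self.rng)
        self.bwd = GRUCell(char_dim, hid_dim, self.rng)
        self.query = [self.rng.uniform(-1, 1) for _ in range(2 * self.dim)]
        self.chars, self.types = {}, {}  # lazily filled random lookup tables

    def _lookup(self, table, key, dim):
        if key not in table:
            table[key] = [self.rng.uniform(-1, 1) for _ in range(dim)]
        return table[key]

    def encode_value(self, value):
        """BiGRU: concatenate the final forward and backward states over the characters."""
        xs = [self._lookup(self.chars, ch, self.char_dim) for ch in value]
        return self.fwd.run(xs) + self.bwd.run(xs[::-1])

    def encode_entity(self, attributes):
        """attributes: non-empty list of (attribute_type, attribute_value) strings."""
        vals = [self.encode_value(v) for _, v in attributes]
        types = [self._lookup(self.types, a, self.dim) for a, _ in attributes]
        # One attention weight per attribute, shared by its type and its value.
        scores = [sum(q * x for q, x in zip(self.query, t + v)) for t, v in zip(types, vals)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        type_emb = [sum(a * t[k] for a, t in zip(alpha, types)) for k in range(self.dim)]
        value_emb = [sum(a * v[k] for a, v in zip(alpha, vals)) for k in range(self.dim)]
        return type_emb + value_emb, alpha  # series connection of type and value embeddings
```

In practice this would be built with a deep-learning framework (for example a bidirectional GRU layer) and trained end to end. The pure-Python version only makes the data flow explicit.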

7. A film and television-oriented multi-level knowledge graph generation system, characterized by comprising:

a data acquisition module configured to acquire information data of different levels related to the film and television content to be processed;

a relation extraction module configured to extract relations from the acquired information data to obtain triple data, and to construct a plurality of single-level knowledge graphs from the triple data;

an embedding module configured to perform structure embedding and attribute embedding using the relation triple data and the attribute triple data in the triple data;

a knowledge graph integration module configured to perform entity alignment by combining the results of the structure embedding and the attribute embedding, and to integrate the plurality of aligned single-level knowledge graphs into a multi-level knowledge graph.

8. A film and television query method based on a multi-level knowledge graph, characterized by comprising the following steps:

acquiring a text to be queried;

analyzing the text to be queried to obtain an analysis result;

querying film and television information according to the analysis result and the multi-level knowledge graph constructed by the generation method according to any one of claims 1 to 6, to obtain a query result.
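Claim 8 leaves the analysis of the query text unspecified. The toy sketch below assumes naive analysis: entity names are found as substrings of the query, and predicates are found through a hypothetical keyword-to-predicate dictionary. The matching triples are then looked up in the merged graph. Substring matching will over-match (for example, "Hero" inside "Heroes"). Chinese queries would need word segmentation (for example with HanLP) instead of the `\w+` tokenizer used here.

```python
import re


def parse_query(text, entities, predicate_keywords):
    """Naive analysis: entity names found as substrings, predicates found via keyword tokens."""
    lowered = text.lower()
    tokens = set(re.findall(r"\w+", lowered))
    found_entities = {e for e in entities if e.lower() in lowered}
    found_predicates = {p for kw, p in predicate_keywords.items() if kw in tokens}
    return found_entities, found_predicates


def answer(text, triples, predicate_keywords):
    """Return the sorted objects of triples matching the parsed entities and predicates."""
    entities = {h for h, _, _ in triples}
    ents, preds = parse_query(text, entities, predicate_keywords)
    return sorted(t for h, r, t in triples if h in ents and r in preds)
```

For example, with a triple ("Hero", "directed_by", "Zhang Yimou") and the keyword "directed" mapped to `directed_by`, the query "Who directed Hero?" returns `["Zhang Yimou"]`.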

9. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps of the film and television-oriented multi-level knowledge graph generation method according to any one of claims 1 to 6.

10. An electronic device, comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the film and television-oriented multi-level knowledge graph generation method according to any one of claims 1 to 6.