WO2021098491A1 - Knowledge graph generating method, apparatus, and terminal, and storage medium

Publication number
WO2021098491A1
WO2021098491A1 PCT/CN2020/125592 CN2020125592W WO2021098491A1 WO 2021098491 A1 WO2021098491 A1 WO 2021098491A1 CN 2020125592 W CN2020125592 W CN 2020125592W WO 2021098491 A1 WO2021098491 A1 WO 2021098491A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
translated
name
target
relationship
Application number
PCT/CN2020/125592
Other languages
French (fr)
Chinese (zh)
Inventor
陈开济
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03: Data mining
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

An artificial intelligence-based knowledge graph generating method, apparatus, and terminal, and a storage medium. The method comprises: determining, in a target language, the translation name of each alias name of a target entity, and according to the alias name and the translation name, generating a translation relationship of the target entity (S101); by means of a preset corpus, separately generating a co-occurrence relationship of each alias name of the target entity (S102); and constructing a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all the target entities (S103). The method, apparatus, and terminal, and the storage medium can construct a knowledge graph supporting multiple languages, and improve the association ability of each knowledge node in the knowledge graph, and the breadth and depth of the knowledge graph, thereby improving the accuracy of an artificial intelligence output result and improving the quality of a service response.

Description

Method, apparatus, terminal, and storage medium for generating a knowledge graph
This application claims priority to Chinese Patent Application No. 201911156483.3, filed with the State Intellectual Property Office on November 22, 2019 and entitled "Method, apparatus, terminal, and storage medium for generating a knowledge graph", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular to a method, apparatus, terminal, and storage medium for generating a knowledge graph based on artificial intelligence (AI).
Background
A knowledge graph, also called a semantic network, uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interconnections between pieces of knowledge. As information technology has developed, knowledge graphs have been used as carriers that aggregate diverse knowledge resources and provide knowledge references for AI decision-making. The depth and accuracy of the knowledge resources in a knowledge graph therefore directly affect the accuracy of AI processing results. Existing methods for generating knowledge graphs mainly build on a single language, and the knowledge graphs of different languages are independent of one another, which reduces the depth of the knowledge graph. When another language is used as AI input, the accuracy of the processing result drops sharply and the quality of the service response suffers.
Summary of the Invention
Embodiments of this application provide a method, apparatus, terminal, and storage medium for generating a knowledge graph, which can solve the following problem of existing knowledge graph generation technologies: different vehicle service requests are all handed to the same server for processing, which easily causes conflicts in processing logic, lengthens the service response time, and lowers the success rate of service responses.
According to a first aspect, an embodiment of this application provides a method for generating a knowledge graph, including:
determining the translated name, in a target language, of each alias name of a target entity, and generating a translation relationship of the target entity according to the alias names and the translated names;
generating, by using a preset corpus, a co-occurrence relationship for each alias name of the target entity; and
constructing a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all target entities.
For example, based on the co-occurrence relationship of an alias name, the number of occurrences of each co-occurring entity associated with the alias name is counted, and high-frequency co-occurring entities are selected based on those counts. An AI-based natural language generation (NLG) algorithm then combines the alias name with each high-frequency co-occurring entity to obtain a source language sentence.
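The counting step in the example above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `high_frequency_cooccurring`, the `top_k` cut-off, and the sample observations are all assumptions, and the NLG step that turns the selected entities into a sentence is not shown.

```python
from collections import Counter


def high_frequency_cooccurring(cooccurrences, top_k=2):
    """Return the top_k entities that co-occur most often with an alias name.

    `cooccurrences` is a flat list with one item per observed co-occurrence.
    `top_k` is a hypothetical cut-off; the source does not say how
    "high frequency" is decided.
    """
    counts = Counter(cooccurrences)
    return [entity for entity, _ in counts.most_common(top_k)]


# Hypothetical observations for one alias name.
observed = ["juice", "tree", "juice", "peel", "juice", "tree"]
top_entities = high_frequency_cooccurring(observed)
print(top_entities)
```

A fixed `top_k` is only one option; a minimum count or a relative-frequency threshold would fit the description equally well.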
In a possible implementation of the first aspect, the determining the translated name, in a target language, of each alias name of a target entity, and generating a translation relationship of the target entity according to the alias names and the translated names includes:
obtaining a source language sentence containing each alias name;
outputting, according to a translation model between the source language and the target language, the target language sentence corresponding to each source language sentence;
extracting, from each target language sentence, the translated name of the alias name in the target language; and
establishing the translation relationship between the alias name and the translated name.
In a possible implementation of the first aspect, the obtaining a source language sentence containing each alias name includes:
obtaining, according to the entity type of the target entity, a sentence template associated with the entity type; and
importing each alias name into the sentence template to generate the source language sentence.
For example, if there are multiple sentence templates, one sentence template may be assigned to each alias name by a random allocation algorithm, thereby generating multiple source language sentences.
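The template step can be sketched as below. The template texts, the `"fruit"` entity type, and the names `TEMPLATES_BY_TYPE` and `build_source_sentences` are hypothetical; English templates are used only for readability, whereas in the patent's example the source language is Chinese. Uniform random choice stands in for the unspecified "random allocation algorithm".

```python
import random

# Hypothetical sentence templates keyed by entity type.
TEMPLATES_BY_TYPE = {
    "fruit": ["I like to eat {name}.", "This {name} tastes sweet."],
}


def build_source_sentences(aliases, entity_type, seed=None):
    """Assign one randomly chosen template to each alias name and fill it in."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    templates = TEMPLATES_BY_TYPE[entity_type]
    return {alias: rng.choice(templates).format(name=alias) for alias in aliases}


sentences = build_source_sentences(["桔子", "橘子", "柑橘"], "fruit", seed=0)
for alias, sentence in sentences.items():
    print(alias, "->", sentence)
```

Each generated sentence would then be passed to the translation model described in the preceding implementation.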
In a possible implementation of the first aspect, the extracting, from each target language sentence, the translated name of the alias name in the target language includes:
if it is detected that the target language sentence contains a phrase corresponding to the target entity, identifying the target language sentence as a valid sentence; and
identifying the phrase in the valid sentence that corresponds to the target entity as the translated name.
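A minimal sketch of the validity check and extraction follows. It assumes that a list of target-language phrases known to denote the entity (`known_phrases`) is available and uses case-insensitive substring matching; the source does not specify how the phrase is detected, so both choices are assumptions.

```python
def extract_translated_name(target_sentence, known_phrases):
    """Return the phrase for the target entity found in the sentence, or None.

    A sentence that contains one of the known phrases counts as a valid
    sentence; any other sentence is discarded by returning None.
    """
    lowered = target_sentence.lower()
    # Try longer phrases first so a short phrase does not shadow a longer one.
    for phrase in sorted(known_phrases, key=len, reverse=True):
        if phrase.lower() in lowered:
            return phrase
    return None


phrases = ["orange", "tangerine", "citrus"]
valid = extract_translated_name("I like to eat oranges.", phrases)
invalid = extract_translated_name("I like to eat it.", phrases)
print(valid, invalid)
```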
In a possible implementation of the first aspect, the generating, by using a preset corpus, a co-occurrence relationship for each alias name of the target entity includes:
extracting, from the corpus, target text containing the target entity;
identifying associated entities in the target text other than the target entity; and
obtaining, according to the alias name under which the target entity appears in the target text, the co-occurrence relationship between that alias name and the associated entities.
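The three steps above can be sketched as follows. Plain substring matching against a known entity list stands in for real entity recognition, and the corpus sentences, entity list, and function name `build_cooccurrence` are all hypothetical.

```python
from collections import defaultdict


def build_cooccurrence(corpus, aliases, known_entities):
    """Map each alias name to the other entities that appear in the same text."""
    relation = defaultdict(set)
    for text in corpus:
        # Naive entity recognition: an entity is present if its name occurs.
        present = [entity for entity in known_entities if entity in text]
        for alias in aliases:
            if alias in text:
                # Record associated entities, excluding the target's own aliases.
                relation[alias].update(e for e in present if e not in aliases)
    return dict(relation)


corpus = ["桔子可以榨成果汁", "柑橘长在果树上"]
aliases = ["桔子", "柑橘"]
known_entities = ["桔子", "柑橘", "果汁", "果树"]
cooccurrence = build_cooccurrence(corpus, aliases, known_entities)
print(cooccurrence)
```

Because the relation is keyed by alias name rather than by entity, two aliases of the same entity can end up with different associated entities, which is the property the later disambiguation step relies on.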
In a possible implementation of the first aspect, the method for generating a knowledge graph further includes:
receiving a sentence to be translated in the source language, and identifying the entities to be translated contained in the sentence, so as to construct the entity relationship of the sentence to be translated;
extracting, from the knowledge graph, the translation relationship of each entity to be translated for the target language, where the translation relationship contains at least one translated name of the entity to be translated;
calculating, according to the entity relationship and the co-occurrence relationship of each translated name, the matching degree between the sentence to be translated and the translated name; and
determining, based on the matching degree, the target translated name of the entity to be translated from among all the translated names, and outputting, according to all the target translated names, the translation of the sentence in the target language.
In a possible implementation of the first aspect, the calculating, according to the entity relationship and the co-occurrence relationship of each translated name, the matching degree between the sentence to be translated and the translated name includes:
importing the entity relationship and the co-occurrence relationship of the translated name into a preset matching degree calculation function to calculate the matching degree, where the matching degree calculation function is specifically:
\[ \mathrm{Sim}(E_1, E_2) = \sum_{e_i \in \mathrm{Context}(E_1),\; e_j \in \mathrm{Context}(E_2)} \max \, \mathrm{sim}_{\mathrm{entity}}(e_i, e_j) \]
\[ \mathrm{sim}_{\mathrm{entity}}(e_i, e_j) = \sum_{p \in \mathrm{Prop}(e_i) \cap \mathrm{Prop}(e_j)} \omega_p \, \mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p], e_j[p]) \]
where $\mathrm{Sim}(E_1, E_2)$ is the matching degree between the entity to be translated and the translated name; $\mathrm{Context}(E_1)$ is the set of associated entities contained in the co-occurrence relationship that corresponds to the entity to be translated $E_1$ in the knowledge graph; $\mathrm{Context}(E_2)$ is the set of associated entities contained in the co-occurrence relationship of the translated name $E_2$; $e_i$ is the i-th associated entity in the co-occurrence relationship of $E_1$; $e_j$ is the j-th associated entity in the co-occurrence relationship of $E_2$; $\mathrm{Prop}(e_i)$ is the entity type of the i-th associated entity in the co-occurrence relationship of $E_1$; $\mathrm{Prop}(e_j)$ is the entity type of the j-th associated entity in the co-occurrence relationship of $E_2$; $\omega_p$ is the weight corresponding to the entity type; $\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p], e_j[p])$ is the matching degree function corresponding to the entity type; $e_i[p]$ is the parameter value of the entity type of the i-th associated entity in the co-occurrence relationship of $E_1$; and $e_j[p]$ is the parameter value of the entity type of the j-th associated entity in the co-occurrence relationship of $E_2$.
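The matching degree function can be illustrated with the sketch below, under several stated assumptions: each associated entity is represented as a dict from property name p to value; the `max` is read as "for each e_i, take the best-matching e_j" (the extracted formula does not make the range of the max explicit); and the per-type similarity is a hypothetical exact-match function. The sample contexts and weights are invented for illustration.

```python
def sim_entity(ei, ej, weights, similarity):
    """Weighted similarity over the properties shared by two associated entities."""
    shared = ei.keys() & ej.keys()  # Prop(ei) ∩ Prop(ej)
    return sum(weights.get(p, 1.0) * similarity(p, ei[p], ej[p]) for p in shared)


def sim(context_e1, context_e2, weights, similarity):
    """Sum, over each e_i in Context(E1), of its best match in Context(E2)."""
    if not context_e2:
        return 0.0
    return sum(
        max(sim_entity(ei, ej, weights, similarity) for ej in context_e2)
        for ei in context_e1
    )


def exact_match(prop, a, b):
    """Hypothetical per-type similarity: 1.0 when the values are equal."""
    return 1.0 if a == b else 0.0


weights = {"category": 2.0, "color": 1.0}
context_source = [{"category": "drink", "color": "orange"}]
contexts_by_name = {
    "orange": [{"category": "drink", "color": "orange"}, {"category": "plant"}],
    "citrus": [{"category": "plant", "color": "green"}],
}
scores = {
    name: sim(context_source, ctx, weights, exact_match)
    for name, ctx in contexts_by_name.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

Selecting the translated name with the highest score corresponds to the final step of the translation flow, where the target translated name is chosen based on the matching degree.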
In a possible implementation of the first aspect, the method for generating a knowledge graph further includes:
receiving a keyword input by a user, and querying the knowledge graph for the co-occurrence relationship corresponding to the keyword; and
outputting recommendation information for the user according to the co-occurrence relationship.
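A minimal sketch of the keyword lookup, assuming the co-occurrence relationships are stored as a mapping from name to co-occurrence counts. The storage layout, the ranking by count, and the `limit` parameter are assumptions; the source only says that recommendations are output according to the co-occurrence relationship.

```python
def recommend(keyword, cooccurrence_graph, limit=3):
    """Return the entities that co-occur most often with the keyword."""
    related = cooccurrence_graph.get(keyword, {})
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in ranked[:limit]]


# Hypothetical co-occurrence counts stored in the knowledge graph.
graph = {"orange": {"juice": 5, "vitamin C": 3, "peel": 1, "tree": 2}}
suggestions = recommend("orange", graph, limit=2)
print(suggestions)
```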
According to a second aspect, an embodiment of this application provides an apparatus for generating a knowledge graph, including:
a translation relationship establishment unit, configured to establish the translation relationship, for a target language, of multiple alias names of a target entity;
a co-occurrence relationship generation unit, configured to generate, by using a preset corpus, a co-occurrence relationship for each alias name of the target entity; and
a knowledge graph construction unit, configured to construct a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all target entities.
According to a third aspect, an embodiment of this application provides a terminal device including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, where the processor, when executing the computer program, implements the method for generating a knowledge graph according to any implementation of the first aspect.
According to a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method for generating a knowledge graph according to any implementation of the first aspect.
According to a fifth aspect, an embodiment of this application provides a computer program product that, when run on a terminal device, causes the terminal device to perform the method for generating a knowledge graph according to any implementation of the first aspect.
It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect; details are not repeated here.
Compared with the prior art, the embodiments of this application have the following beneficial effects:
The embodiments of this application obtain the translated names, in other languages, of each alias name of a target entity, where the target entity can be regarded as a knowledge node, and generate the translation relationship of the target entity for the target language according to the correspondence between each alias name and its translated names. A corpus is then used to establish the co-occurrence relationship of each alias name of the target entity, so as to mine the associations between each alias name and other entities and to extend the association depth of every knowledge node in the knowledge graph. A knowledge graph that supports multiple languages is built from the translation relationships and co-occurrence relationships of all target entities. Compared with existing knowledge graph technologies, the embodiments of this application can establish a translation relationship for each knowledge node (that is, each target entity) in the knowledge graph, so as to connect knowledge nodes across languages, and can extend the knowledge depth of each knowledge node by constructing co-occurrence relationships rather than being limited to the attributes of the target entity itself. This improves the associative ability of each knowledge node and the breadth and depth of the knowledge graph, thereby improving the accuracy of AI output and the quality of service responses.
Description of the Drawings
FIG. 1 is an implementation flowchart of a method for generating a knowledge graph according to a first embodiment of this application;
FIG. 2 is an entity diagram of the translation relationship of a target entity according to an embodiment of this application;
FIG. 3 is a schematic diagram of a co-occurrence relationship according to an embodiment of this application;
FIG. 4 is a specific implementation flowchart of S101 of a method for generating a knowledge graph according to a second embodiment of this application;
FIG. 5 is a structural block diagram of a neural machine translation model according to an embodiment of this application;
FIG. 6 is a specific implementation flowchart of S1011 of a method for generating a knowledge graph according to a third embodiment of this application;
FIG. 7 is a specific implementation flowchart of S1013 of a method for generating a knowledge graph according to a fourth embodiment of this application;
FIG. 8 is a specific implementation flowchart of S102 of a method for generating a knowledge graph according to a fifth embodiment of this application;
FIG. 9 is a specific implementation flowchart of a method for generating a knowledge graph according to a sixth embodiment of this application;
FIG. 10 is a flowchart of knowledge-graph-based translation according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of a knowledge-graph-based translation system according to an embodiment of this application;
FIG. 12 is a flowchart of the interaction between the units of an apparatus for generating a knowledge graph when responding to a translation operation, according to an embodiment of this application;
FIG. 13 is a specific implementation flowchart of a method for generating a knowledge graph according to a seventh embodiment of this application;
FIG. 14 is a structural block diagram of a device for generating a knowledge graph according to an embodiment of this application;
FIG. 15 is a schematic diagram of a terminal device according to another embodiment of this application.
Detailed Description
In the following description, specific details such as particular system structures and technologies are set out for the purpose of illustration rather than limitation, so that the embodiments of this application can be thoroughly understood. However, it should be clear to those skilled in the art that this application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of this application.
It should be understood that, when used in the specification and the appended claims of this application, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the term "and/or" used in the specification and the appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in the specification and the appended claims of this application, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the description of the specification and the appended claims of this application, the terms "first", "second", "third", and the like are used only to distinguish between descriptions and cannot be understood as indicating or implying relative importance.
A reference to "one embodiment" or "some embodiments" in the specification of this application means that a specific feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of this application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like that appear in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
The method for generating a knowledge graph provided in the embodiments of this application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA), and can also be applied to databases, servers, and service response systems based on terminal artificial intelligence. The embodiments of this application place no restriction on the specific type of terminal device.
In the embodiments of this application, the procedure is executed by an apparatus for generating a knowledge graph. By way of example and not limitation, the apparatus may specifically be a database server that receives knowledge resources input by users or obtained from other databases and generates a knowledge graph from all the received knowledge data, in order to support the logic operations of terminal artificial intelligence. FIG. 1 shows an implementation flowchart of the method for generating a knowledge graph provided by the first embodiment of this application, which is detailed as follows:
In S101, the translated name, in a target language, of each alias name of a target entity is determined, and a translation relationship of the target entity is generated according to the alias names and the translated names.
In this embodiment, an entity, also called an object, may specifically be an objectively existing physical object, concept, or virtual object that can be interacted with and operated on. For example, computers, mobile phones, and servers are objectively existing physical objects, while virtual objects in the field of electronic information, such as databases, middleware, and software programs, can also be entities. Depending on the usage scenario, an entity may have multiple alias names, all of which denote the same entity object. For example, the entity "桔子" has other alias names that denote the same entity, such as "柑橘" and "橘子"; that is, the entity "桔子" has three alias names. The generating apparatus can obtain the alias names of each entity through user input, database download, corpus-based intelligent learning, and similar means. As another feasible embodiment, a name list can be established for each entity, and the alias names of the target entity are stored in that name list.
All alias names in one name list belong to the same language. For example, "柑橘", "橘子", and "桔子" above are alias names in Chinese, whereas in English the entity "桔子" can be expressed in three different ways, "orange", "tangerine", and "citrus", and these three alias names form the English name list of the entity "桔子". The generating apparatus can set a certain language as the source language and obtain the name list of each entity in the source language; that name list contains all the alias names of the entity in the source language.
In this embodiment, when establishing the translation relationship, the apparatus for generating the knowledge graph can select a language other than the source language as the target language and determine the translated name of each alias name in the target language. The translated name of an alias name can be obtained by using a preset translation algorithm between the source language and the target language to determine the translated names associated with the alias name.
As another optional embodiment of this application, the apparatus for generating the knowledge graph can obtain multiple reference texts that contain the alias name, obtain the target-language translation of each reference text, locate in each translation the phrase that corresponds to the alias name, and identify that phrase as a candidate translated name of the alias name. It then counts the number of occurrences of each candidate translated name across all the translations and identifies the translated name of the alias name according to those counts. For example, the candidate translated names whose occurrence probability exceeds a preset probability threshold are selected as translated names of the alias name; alternatively, the single candidate translated name with the highest occurrence probability is selected as the translated name of the alias name. Consequently, one alias name in the source language can have multiple translated names in the target language, and, conversely, different alias names can map to the same translated name in the target language. The generating apparatus can take each alias name as a node, establish a mapping between each alias name and its associated translated names, and build the translation relationship of the target entity from all of these mappings.
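The two selection rules described above (probability threshold, or single most frequent candidate) can be sketched as follows. The candidate list, the threshold value 0.3, and the function names are hypothetical; "occurrence probability" is taken here to mean a candidate's share of all candidate occurrences, which is one plausible reading.

```python
from collections import Counter


def select_translated_names(candidates, threshold=0.3):
    """Keep candidates whose share of all occurrences exceeds the threshold."""
    counts = Counter(candidates)
    total = sum(counts.values())
    if total == 0:
        return []
    return [name for name, n in counts.most_common() if n / total > threshold]


def most_likely_translated_name(candidates):
    """Alternative rule: keep only the most frequent candidate."""
    counts = Counter(candidates)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical candidates located in ten translated reference texts.
candidates = ["tangerine"] * 5 + ["citrus"] * 4 + ["mandarin"] * 1
selected = select_translated_names(candidates)
top = most_likely_translated_name(candidates)
print(selected, top)
```

The threshold rule lets one alias name keep several translated names, while the second rule always yields at most one.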
It should be noted that existing knowledge graphs are built at entity granularity. In a multilingual scenario, each node therefore merges the aliases of all languages into a single node, so the mapping between individual aliases cannot be determined, which reduces the accuracy of the output in scenarios such as translation or semantic analysis. In contrast to the prior art, this application creates an independent knowledge node for each alias and records the corresponding translated names in that node, thereby establishing the mapping between translated names and aliases.
By way of example, FIG. 2 shows a diagram of the translation relationship of a target entity provided by an embodiment of this application. As shown in FIG. 2, the entity “桔子” has three different aliases in Chinese: “桔子”, “橘子” and “柑橘”. Big-data analysis shows that in most translation scenarios “桔子” and “橘子” are translated as “orange”, whereas “柑橘” has two translated names, “tangerine” and “citrus”. From these correspondences, the Chinese-to-English mapping of each alias can be established, and aggregating all of the mappings yields the translation relationship of the target entity. FIG. 2 makes clear that the mappings in this application are established per alias, so the translated name of each alias can be obtained accurately; in translation scenarios in particular, this can greatly improve translation accuracy and the readability of the text.
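The FIG. 2 example can be pictured as a small many-to-many mapping. The following Python sketch is an assumption about one possible in-memory representation (a dictionary from alias to a set of translated names); the helper names `build_translation_relation` and `aliases_for` are hypothetical.

```python
from collections import defaultdict


def build_translation_relation(pairs):
    """Aggregate (alias, translated name) pairs into one relation per entity."""
    relation = defaultdict(set)
    for alias, translated in pairs:
        relation[alias].add(translated)
    return dict(relation)


def aliases_for(relation, translated):
    """Reverse lookup: which aliases map to a given translated name."""
    return {alias for alias, names in relation.items() if translated in names}


# Mappings taken from the FIG. 2 example in the text.
orange_relation = build_translation_relation([
    ("桔子", "orange"),
    ("橘子", "orange"),
    ("柑橘", "tangerine"),
    ("柑橘", "citrus"),
])
```

Here `orange_relation["柑橘"]` holds two translated names, while a reverse lookup for "orange" returns two aliases, so both directions of the mapping are preserved per alias.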
In S102, a co-occurrence relationship is generated for each alias of the target entity by using a preset corpus.
In this embodiment, the corpus may be stored in the knowledge graph generation apparatus. In that case, the generation apparatus can read the text data pre-stored in the corpus through local calls and generate the co-occurrence relationship from that text data. The corpus may also be stored on a separate database server. In that case, the generation apparatus establishes a communication connection with the corpus server, generates a data query instruction for the target entity, and sends it to the corpus server; on receiving the instruction, the corpus server extracts all text data that contains the target entity and returns it to the generation apparatus. Optionally, if a piece of text data is large, for example a text stored in the corpus in book form and therefore containing many paragraphs, the corpus server may extract only the sentences or paragraphs that contain the target entity and return those, without sending paragraphs or sentences that do not contain the target entity. This improves the accuracy of the subsequent construction of the co-occurrence relationship.
In this embodiment, the knowledge graph generation apparatus obtains training sentences containing the target entity from the corpus, uses an entity tagging algorithm to identify the associated entities contained in each training sentence, and, according to the alias under which the target entity appears in the current training sentence, establishes an association between that alias and each associated entity, thereby generating the co-occurrence relationship of the alias. Note that the training sentences extracted from the corpus may refer to the target entity by any of its aliases, so the target entity is not expressed consistently across them. When generating the co-occurrence relationship, the training sentences can therefore be divided into sentence groups by alias, such that all sentences in one group use the same alias for the target entity; the co-occurrence relationship of that alias is then determined from its group.
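The grouping of training sentences by alias might look like the following minimal Python sketch. Plain substring matching stands in for whatever alias detection the apparatus actually uses, which is an assumption of this sketch; it would misfire if one alias were a substring of another.

```python
def group_by_alias(sentences, aliases):
    """Split training sentences into one group per alias of the target entity.

    A sentence that mentions several aliases is placed in every matching group.
    """
    groups = {alias: [] for alias in aliases}
    for sentence in sentences:
        for alias in aliases:
            if alias in sentence:  # simplistic stand-in for alias detection
                groups[alias].append(sentence)
    return groups
```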
By way of example, FIG. 3 shows a schematic diagram of a co-occurrence relationship provided by an embodiment of this application. As shown in FIG. 3, the target entity is “国家体育馆” (National Stadium), which has two aliases, “国家体育馆” and “鸟巢” (Bird's Nest). The corpus stores the training sentence “鸟巢位于水立方对面,是2008年北京奥运会的体育馆” (“The Bird's Nest is opposite the Water Cube and is the stadium of the 2008 Beijing Olympic Games”). An entity tagging algorithm identifies the entities in this sentence other than “鸟巢” as “水立方” (Water Cube), “体育馆” (stadium), “北京” (Beijing) and “奥运会” (Olympic Games). Accordingly, for the target entity “国家体育馆”, a co-occurrence relationship is established between the alias “鸟巢” and “水立方”, “体育馆”, “北京” and “奥运会”. The co-occurrence relationship can be represented in the manner shown in FIG. 3.
In this embodiment, as in S101, the knowledge graph generation apparatus builds the co-occurrence relationship per alias, that is, it keeps the co-occurrence relationships of different aliases separate. Doing so makes it possible to determine the typical usage scenarios of each alias and the other entities associated with it. Besides improving the accuracy of translation, this is valuable in fields such as information recommendation and word association, because the entities associated with each alias can be mined, which increases the depth of the knowledge graph.
Optionally, as another embodiment of this application, because the corpus contains many training sentences and an associated entity may appear in several of them, the knowledge graph generation apparatus may, when establishing the co-occurrence relationship between the target entity and each associated entity, count the number of sentences in which the associated entity appears together with the target entity. This number is the co-occurrence count, and a corresponding association weight is configured for each associated entity based on it. Continuing with FIG. 3, by way of example and not limitation, the co-occurrence count may be marked on the edge connecting the target entity and the associated entity.
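As a hedged illustration of the co-occurrence count, the sketch below counts, for one alias, the number of sentences in which each other entity appears alongside it. The list `known_entities` and substring matching are placeholders for the entity tagging algorithm, which the application does not specify; using the raw count directly as the association weight is likewise an assumption.

```python
from collections import Counter


def cooccurrence_counts(sentences, alias, known_entities):
    """Count sentences in which each known entity appears together with `alias`."""
    counts = Counter()
    for sentence in sentences:
        if alias not in sentence:
            continue  # only sentences using this particular alias are considered
        for entity in known_entities:
            if entity != alias and entity in sentence:
                counts[entity] += 1  # one increment per sentence, not per mention
    return counts
```

For the sentence from the FIG. 3 example plus one further sentence mentioning “鸟巢” and “北京”, the count for “北京” would be 2 and the count for “水立方” would be 1; these numbers are what would be written on the edges of the graph.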
In S103, a knowledge graph is constructed according to the translation relationships and co-occurrence relationships of all target entities.
In this embodiment, the knowledge graph generation apparatus may perform S101 and S102 for all target entities to establish the translation relationship of each target entity and the co-occurrence relationship of each of its aliases. In the preset knowledge graph, on the page at alias granularity, it creates an independent knowledge node for each alias and adds the co-occurrence relationship and translated names of the alias to that node. It then encapsulates the alias nodes into the knowledge node of the corresponding target entity, creates the knowledge node of the target entity on the page at entity granularity, and constructs the knowledge graph according to the associations between the target entities.
Optionally, the knowledge graph contains at least two levels: a first graph level at entity granularity and a second graph level at alias granularity. When the user clicks a target entity at the first graph level, the knowledge graph switches to the second graph level and displays the semantic network of the aliases under that target entity.
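One possible way to hold the two levels in memory is sketched below in Python. The class names `AliasNode` and `EntityNode`, their fields, and the `expand` helper are illustrative assumptions; the application does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class AliasNode:
    """Second-level node: one alias with its own translations and co-occurrences."""
    alias: str
    translated_names: set = field(default_factory=set)
    cooccurrence: dict = field(default_factory=dict)  # associated entity -> count


@dataclass
class EntityNode:
    """First-level node: a target entity that encapsulates its alias nodes."""
    name: str
    aliases: dict = field(default_factory=dict)  # alias -> AliasNode

    def add_alias(self, alias, translated_names=(), cooccurrence=None):
        self.aliases[alias] = AliasNode(
            alias, set(translated_names), dict(cooccurrence or {})
        )


def expand(graph, entity_name):
    """Switch from the entity level to the alias level for one entity."""
    return list(graph[entity_name].aliases.values())
```

Selecting an entity at the first level then corresponds to calling `expand` on it, which returns the alias nodes to be displayed at the second level.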
As can be seen from the above, the knowledge graph generation method provided by the embodiments of this application obtains the translated names, in other languages, of each alias of a target entity (where the target entity can be regarded as a knowledge node) and generates the translation relationship of the target entity for the target language from the correspondence between aliases and translated names. It also builds, from a corpus, the co-occurrence relationship of each alias of the target entity, which mines the associations between each alias and other entities and extends the association depth of each knowledge node. A multilingual knowledge graph is then constructed from the translation relationships and co-occurrence relationships of all target entities. Compared with existing knowledge graph technology, the embodiments of this application establish a translation relationship for each knowledge node (target entity) to connect knowledge nodes across languages, and build co-occurrence relationships so that the knowledge of each node is not limited to the attributes of the target entity itself. This improves the associative ability of each knowledge node and the breadth and depth of the knowledge graph, which in turn improves the accuracy of artificial intelligence output and the quality of service responses.
FIG. 4 shows a flowchart of a specific implementation of S101 of a knowledge graph generation method provided by the second embodiment of this application. Referring to FIG. 4, relative to the embodiment of FIG. 1, S101 in the knowledge graph generation method provided by this embodiment includes S1011 to S1014, detailed as follows:
In S1011, source-language sentences containing each of the aliases are obtained.
In this embodiment, the knowledge graph generation apparatus may extract source-language sentences containing each alias from a corpus of the source language, in which case each source-language sentence comes from historical text data. Optionally, the generation apparatus may instead be provided with sentence templates, insert each alias into a sentence template, and output the resulting source-language sentence for each alias.
Optionally, as another embodiment of this application, the knowledge graph generation apparatus may count, from the co-occurrence relationship of an alias, the number of occurrences of each co-occurring entity associated with the alias, select high-frequency co-occurring entities based on these counts, and combine the alias with the high-frequency co-occurring entities by means of an artificial-intelligence-based natural language generation (NLG) algorithm to obtain a source-language sentence. Because entities that frequently co-occur with an alias characterize the contexts in which the alias is commonly used, the generated source-language sentence is highly representative. In the subsequent translation step, the translated name of the alias in its common context can thus be determined, which improves the accuracy of the translation relationship.
In S1012, a target-language sentence corresponding to each source-language sentence is output according to a translation model between the source language and the target language.
In this embodiment, the knowledge graph generation apparatus may select any language other than the source language as the target language and obtain a translation model between the source language and the target language. The translation model may be generated based on a machine translation (MT) algorithm. MT uses automated means such as computer programs or computer-readable instructions to translate text in one natural language (the source language) into text in another natural language (the target language). With the continued development of artificial intelligence, neural machine translation (NMT) has become the mainstream approach in the translation field. An NMT translation model can be built with a long short-term memory recurrent neural network (LSTM-RNN). Such a model is well suited to modeling natural language: it converts a sentence of arbitrary length into a floating-point vector of a given dimension, turning text data into vector data so that a computer program can “understand” the semantics of the text and translate the sentence based on those semantics. The generation apparatus feeds the obtained source-language sentences into the translation model, which outputs the corresponding target-language sentences.
Specifically, if the knowledge graph generation apparatus uses an NMT model as the translation model, the target-language sentence may be output as follows: divide the source-language sentence into multiple phrases; feed each phrase into the encoding module of the NMT model to obtain its encoded value and generate a sentence vector for the source-language sentence; then obtain the decoding module for the target language and use the sentence vector as the input vector of the decoding module to generate the target-language sentence. FIG. 5 shows a structural block diagram of a neural machine translation model provided by an embodiment of this application. As shown in FIG. 5, the NMT model includes a source-language encoding module (Encoder) and a target-language decoding module (Decoder). The encoder maps each word of the source-language sentence to a vector value according to its meaning, and the decoder identifies the target-language word associated with that vector value, which completes the translation.
In S1013, the translated name of the alias in the target language is extracted from each of the target-language sentences.
In this embodiment, the knowledge graph generation apparatus may use an entity tagging algorithm for the target language to mark the phrase corresponding to each entity in a target-language sentence, and select the phrase corresponding to the target entity as the translated name of the alias in the target language. Compared with feeding the bare alias into the translation model to compute a translation of the isolated name, identifying the translated name within a specific linguistic context yields a name that is derived from the semantics of the whole sentence and matches the context. This improves translation accuracy; in particular, when the target entity has several translated names in the target language, the translated name associated with the alias currently being translated can be determined accurately.
In S1014, the translation relationship between the alias and the translated name is established.
In this embodiment, after determining the translated name associated with an alias, the knowledge graph generation apparatus may establish the translation relationship between the two.
In the embodiments of this application, by producing source-language sentences that contain each alias, the translated name of an alias can be determined from its context and actual usage, and the translation relationship established on that basis is more accurate.
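Steps S1011 to S1014 can be read as a small pipeline. The Python sketch below wires the four steps together under stated assumptions: `make_sentence`, `translate` and `extract_name` are caller-supplied callables, and the toy implementations shown (a lookup table instead of an NMT model, a fixed word list instead of a target-language entity tagger) exist only so the sketch runs; none of them reflects the actual models of the application.

```python
def translate_aliases_in_context(aliases, make_sentence, translate, extract_name):
    """Run S1011 to S1014 for each alias and return alias -> set of translated names."""
    relation = {}
    for alias in aliases:
        source_sentence = make_sentence(alias)        # S1011: sentence containing the alias
        target_sentence = translate(source_sentence)  # S1012: whole-sentence translation
        translated = extract_name(target_sentence)    # S1013: locate the entity phrase
        if translated is not None:
            relation.setdefault(alias, set()).add(translated)  # S1014: record mapping
    return relation


# Toy stand-ins (hypothetical): a lookup table plays the role of the translation model.
TOY_TRANSLATIONS = {
    "这是一棵桔子树": "This is an orange tree",
    "这是一棵柑橘树": "This is a tangerine tree",
}


def toy_make_sentence(alias):
    return "这是一棵" + alias + "树"


def toy_translate(sentence):
    return TOY_TRANSLATIONS.get(sentence, "")


def toy_extract_name(sentence):
    for name in ("orange", "tangerine", "citrus"):
        if name in sentence.lower():
            return name
    return None
```

Because the translated name is read out of a translated sentence rather than produced from the bare alias, an alias whose sentence yields no recognizable entity phrase is simply left out of the relation in this sketch.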
FIG. 6 shows a flowchart of a specific implementation of S1011 of a knowledge graph generation method provided by the third embodiment of this application. Referring to FIG. 6, relative to the embodiment of FIG. 4, S1011 in the knowledge graph generation method provided by this embodiment includes S601 to S602, detailed as follows:
Further, the obtaining of source-language sentences containing each of the aliases includes:
In S601, a sentence template associated with the entity type of the target entity is obtained.
In this embodiment, sentence templates may be configured manually for different entity types in the knowledge graph generation apparatus, and a sentence template library is built from them. Optionally, the generation apparatus may use a distant supervision algorithm to identify the entities contained in each training text in the corpus and determine the entity type of each entity, select multiple training texts that share an entity type, and identify the sentence structure of each. Any sentence structure whose number of occurrences exceeds a preset occurrence threshold is taken as a common structure for that entity type, and at least one sentence template for the entity type is generated from the common structures.
In this embodiment, the knowledge graph generation apparatus extracts from the sentence template library the sentence templates that match the entity type of the target entity associated with the alias. There may be one such template or several. Optionally, if there are several templates and more templates than the target entity has aliases, as many templates as there are aliases may be extracted and a separate template configured for each alias, so that no two aliases are assigned the same template.
In S602, each of the aliases is inserted into the sentence template to generate the source-language sentences.
In this embodiment, the sentence template has a slot for the entity type. The knowledge graph generation apparatus inserts the alias into this preset slot, producing a complete, meaningful sentence, namely the source-language sentence described above.
Optionally, if there is a single sentence template, every alias may be inserted into that same template, producing multiple source-language sentences that differ only in the alias. By way of example, suppose the sentence template is “这是一棵[水果类型实体]树” (“This is a [fruit-type entity] tree”) and the target entity is “桔子”. Its entity type is fruit, which matches the template, and it has three aliases: “桔子”, “橘子” and “柑橘”. Inserting the three aliases into the slot [水果类型实体] yields “这是一棵[桔子]树”, “这是一棵[橘子]树” and “这是一棵[柑橘]树”.
Optionally, if there are multiple sentence templates, one template may be assigned to each alias by a random allocation algorithm, producing multiple source-language sentences. By way of example, suppose there are three sentence templates for fruit-type entities: “这是一棵[水果类型实体]树” (“This is a [fruit-type entity] tree”), “吃点[水果类型实体]” (“Have some [fruit-type entity]”) and “买个[水果类型实体]” (“Buy a [fruit-type entity]”). Inserting each of the three aliases of the target entity “桔子” into one of these templates may yield “这是一棵[橘子]树”, “吃点[柑橘]” and “买个[桔子]”.
Preferably, the other entities contained in each sentence template are identified, and the number of occurrences of each of them is looked up in the co-occurrence relationship of the alias. The matching degree between each sentence template and the alias is calculated from these counts, the template with the highest matching degree is selected as the template for the alias, and the alias is inserted into it to generate the source-language sentence.
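The application does not state how the matching degree is computed from the occurrence counts. Purely as an assumption, the sketch below uses the sum of the co-occurrence counts of the entities found in a template, with substring matching against a hypothetical `known_entities` list standing in for entity recognition.

```python
def best_template(templates, cooccurrence, known_entities):
    """Pick the template whose contained entities co-occur most with the alias.

    cooccurrence: associated entity -> co-occurrence count for one alias.
    The matching degree (assumed here) is the sum of those counts.
    """
    def matching_degree(template):
        return sum(
            cooccurrence.get(entity, 0)
            for entity in known_entities
            if entity in template
        )

    return max(templates, key=matching_degree)
```

For an alias that frequently co-occurs with “树” (tree), a template containing “树” would score higher than one that does not and would therefore be chosen.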
Optionally, if there are multiple sentence templates, multiple source-language sentences may be output for each alias by inserting the same alias into every template. By way of example, with M sentence templates and N aliases, M*N source-language sentences can be output.
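The template-filling variants described above (a shared template, one randomly assigned template per alias, and all M*N combinations) might be sketched as follows. The slot marker `[ENTITY]` and the function names are assumptions of this sketch; unlike the examples in the text, the square brackets are not kept around the inserted alias.

```python
import random
from itertools import product

SLOT = "[ENTITY]"  # hypothetical slot marker standing in for "[水果类型实体]"


def fill(template, alias):
    """Insert one alias into the slot of one template."""
    return template.replace(SLOT, alias)


def all_sentences(templates, aliases):
    """Every alias in every template: M templates x N aliases = M*N sentences."""
    return [fill(t, a) for t, a in product(templates, aliases)]


def assign_distinct(templates, aliases, seed=0):
    """Randomly give each alias its own template (needs len(templates) >= len(aliases))."""
    rng = random.Random(seed)
    chosen = rng.sample(templates, len(aliases))  # sampling without replacement
    return {alias: fill(t, alias) for alias, t in zip(aliases, chosen)}
```

With three templates and three aliases, `all_sentences` returns nine sentences, while `assign_distinct` returns three sentences that each use a different template.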
In the embodiments of this application, the entity type of the target entity is identified, the sentence templates corresponding to that entity type are selected, and the aliases are inserted into the templates to generate source-language sentences. Multiple natural-language sentences are thus produced automatically, which improves the efficiency of generating source-language sentences.
FIG. 7 shows a flowchart of a specific implementation of S1013 of a knowledge graph generation method provided by the fourth embodiment of this application. Referring to FIG. 7, relative to the embodiment of FIG. 4, S1013 in the knowledge graph generation method provided by this embodiment includes S701 to S702, detailed as follows:
Further, the extracting of the translated name of the alias in the target language from each of the target-language sentences includes:
In S701, if the target-language sentence is detected to contain a phrase corresponding to the target entity, the target-language sentence is identified as a valid sentence.
In this embodiment, before identifying translated names, the knowledge graph generation apparatus may filter the generated target-language sentences, deleting those that do not contain the target entity and identifying translated names only in those that do, which improves the accuracy of translated-name identification. When a source-language sentence is translated into the target language, the alias may combine with adjacent characters of the sentence template to form a new word. This makes the source-language sentence ambiguous during translation and causes errors when it is converted into vector codes, so the output target-language sentence may not contain the target entity.
By way of example, suppose an alias of a target entity is “语句” (sentence) and inserting it into a sentence template produces “生成语句” (generate sentence). During translation, “成语” (idiom), formed from the second character of “生成” and the first character of “语句”, may be recognized as a word, which splits the target entity “语句”; the translated target-language sentence then contains no target entity.
In this embodiment, the knowledge graph generation apparatus may identify the entities contained in each target-language sentence. If a target-language sentence does not contain the target entity, it is identified as an invalid sentence; otherwise, it is identified as a valid sentence and the phrase corresponding to the target entity in it is marked.
Optionally, the knowledge graph generation apparatus may identify the source-language sentence that produced an invalid sentence and determine the alias it contains. If multiple sentence templates exist, a new source-language sentence is generated for that alias from a template other than the one previously used, and the translated name of the alias is identified again.
In S702, the phrase corresponding to the target entity in the valid sentence is identified as the translated name.
In this embodiment, the knowledge graph generation apparatus takes the phrase corresponding to the target entity in the valid sentence as the translated name of the alias and establishes the mapping between the alias and the translated name.
In the embodiments of this application, checking the validity of the target-language sentences before identifying translated names makes the identification more accurate, which improves the accuracy of the translation relationship.
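The validity check of S701 and S702, together with the optional retry using a different template, could be sketched as below. The list `candidate_names` (known target-language names of the entity) is a simplifying assumption that replaces the target-language entity tagger, and `translate` is a caller-supplied callable rather than a real translation API.

```python
def find_entity_phrase(target_sentence, candidate_names):
    """Return the phrase for the target entity, or None if the sentence is invalid."""
    lowered = target_sentence.lower()
    for name in candidate_names:
        if name.lower() in lowered:
            return name
    return None


def translated_name_with_retry(alias, templates, translate, candidate_names):
    """Try templates in order until one yields a valid target-language sentence."""
    for template in templates:
        source_sentence = template.replace("[ENTITY]", alias)
        phrase = find_entity_phrase(translate(source_sentence), candidate_names)
        if phrase is not None:  # valid sentence: S702 takes this phrase
            return phrase
        # invalid sentence: fall through and regenerate with the next template
    return None
```

If every template produces an invalid sentence, the sketch returns `None` rather than guessing a translated name.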
FIG. 8 shows a flowchart of a specific implementation of S102 of a knowledge graph generation method provided by the fifth embodiment of this application. Referring to FIG. 8, relative to the embodiment of FIG. 1, S102 in the knowledge graph generation method provided by this embodiment includes S1021 to S1023, detailed as follows:
进一步地,所述通过预设的语料库,分别生成所述目标实体内各个所述别名名称的共现关系,包括:Further, the respectively generating the co-occurrence relationship of each of the alias names in the target entity through a preset corpus includes:
在S1021中,从所述语料库提取包含所述目标实体的目标文本。In S1021, extract the target text containing the target entity from the corpus.
在本实施例中,语料库内可以存储从多个不同渠道采集的训练文本。举例性地,该语料库可以接收用户输入的文本数据,例如用户导入的文章、社交应用的交互记录(包括聊天记录以及互动信息),还可以从互联网自动下载文本数据。知识图谱的生成装置在获取得到一个训练文本后,可以识别该训练文本包含的实体,并建立实体与训练文本之间的对应关系,建立实体索引表。知识图谱的生成装置可以基于上述实体索引表从语料库中提取包含目标实体的目标文本。In this embodiment, training texts collected from multiple different channels can be stored in the corpus. For example, the corpus can receive text data input by the user, such as articles imported by the user, interaction records of social applications (including chat records and interaction information), and can also automatically download text data from the Internet. After obtaining a training text, the generating device of the knowledge graph can identify the entities contained in the training text, establish the corresponding relationship between the entities and the training text, and establish the entity index table. The device for generating the knowledge graph can extract the target text containing the target entity from the corpus based on the above-mentioned entity index table.
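One way to picture the entity index table is as an inverted index from entity to text identifiers. The sketch below assumes a simple lexicon-based `find_entities` helper and a three-sentence toy corpus; both are hypothetical and stand in for a real entity recognizer and a real corpus.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def build_entity_index(
    corpus: Dict[str, str],  # text id -> training text
    find_entities: Callable[[str], Iterable[str]],
) -> Dict[str, Set[str]]:
    """Map each entity to the ids of the training texts that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for text_id, text in corpus.items():
        for entity in find_entities(text):
            index[entity].add(text_id)
    return dict(index)


def target_texts(
    corpus: Dict[str, str], index: Dict[str, Set[str]], entity: str
) -> List[str]:
    """Look up the target texts for one entity (S1021) via the index."""
    return [corpus[text_id] for text_id in sorted(index.get(entity, set()))]


# Hypothetical lexicon-based recognizer and toy corpus.
LEXICON = ["Bird's Nest", "Water Cube", "Beijing"]


def find_entities(text: str) -> List[str]:
    return [name for name in LEXICON if name in text]


corpus = {
    "t1": "The Bird's Nest is opposite the Water Cube.",
    "t2": "The Water Cube hosted swimming events.",
    "t3": "Beijing hosted the 2008 Olympic Games at the Bird's Nest.",
}
index = build_entity_index(corpus, find_entities)
nest_texts = target_texts(corpus, index, "Bird's Nest")
```

Building the index once when texts are ingested means that extracting target texts later is a dictionary lookup rather than a scan of the whole corpus.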
In S1022, associated entities other than the target entity are identified in the target text.
In this embodiment, the knowledge graph generating device may locate the entities contained in the target text using a named entity recognition (NER) algorithm, and identify the entities other than the target entity as associated entities of the target entity.
For example, suppose a target text reads "The Bird's Nest (鸟巢) is located opposite the Water Cube (水立方) and is a stadium of the 2008 Beijing Olympic Games", and the target entity is "Bird's Nest". The NER algorithm identifies the entities contained in this target text as "Bird's Nest" (鸟巢), "Water Cube" (水立方), "Beijing" (北京), "Olympic Games" (奥运会), and "stadium" (体育馆). The entities other than "Bird's Nest" are therefore the associated entities of the target entity "Bird's Nest". Note that the association between associated entities is bidirectional: "Water Cube" is an associated entity of "Bird's Nest", and "Bird's Nest" is likewise an associated entity of "Water Cube".
In S1023, the co-occurrence relationship between the alias name and the associated entities is obtained according to the alias name by which the target entity appears in the target text.
In this embodiment, the knowledge graph generating device may identify the alias name that the target text uses, in the source language, for the target entity, create a name node for that alias name, and create the co-occurrence relationship between that alias name and the associated entities. If an alias name appears in multiple target texts, all associated entities recorded in those target texts may be added to the co-occurrence relationship of that name node.
In this embodiment of the present application, target texts containing the alias names are extracted from the text data recorded in the corpus, and the co-occurrence relationship of each alias name is established from the associated entities recorded in the target texts. Co-occurrence relationships are thus built at name granularity, so the context and scenario in which each alias name is used can be identified accurately, which improves the accuracy of responses from artificial intelligence services.
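Steps S1022 and S1023 can be illustrated with a short sketch that builds one co-occurrence counter per alias name. The lexicon-based `find_entities` helper is a hypothetical substitute for a real NER model, and the second sample text is invented for illustration; the first is the Bird's Nest sentence from the example above in its source-language form.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List


def build_cooccurrence(
    texts: Iterable[str],
    aliases: Iterable[str],
    find_entities: Callable[[str], Iterable[str]],
) -> Dict[str, Counter]:
    """Return alias name -> Counter of associated entities seen with it."""
    alias_set = set(aliases)
    cooc: Dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        entities = set(find_entities(text))
        used_aliases = entities & alias_set   # how the target entity is named here
        associated = entities - alias_set     # every other entity in the text
        for alias in used_aliases:
            cooc[alias].update(associated)    # accumulate across target texts
    return dict(cooc)


# Hypothetical lexicon standing in for an NER model.
LEXICON = ["鸟巢", "国家体育场", "水立方", "北京", "奥运会", "体育馆"]


def find_entities(text: str) -> List[str]:
    return [word for word in LEXICON if word in text]


texts = [
    "鸟巢位于水立方对面,是2008年北京奥运会的体育馆",
    "国家体育场位于北京",
]
cooc = build_cooccurrence(texts, ["鸟巢", "国家体育场"], find_entities)
```

Keeping one counter per alias name, rather than one per entity, is what lets later steps distinguish the contexts in which each alias is used.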
FIG. 9 shows a detailed implementation flowchart of a method for generating a knowledge graph provided by the sixth embodiment of the present application. Referring to FIG. 9, compared with any of the embodiments described in FIG. 1, FIG. 4, FIG. 6, FIG. 7, and FIG. 8, the method provided by this embodiment further includes S901 to S904, detailed as follows:
Further, after the knowledge graph is constructed according to the translation relationships and the co-occurrence relationships corresponding to all the target entities, the method further includes:
In S901, a sentence to be translated in the source language is received, and the entities to be translated contained in the sentence are identified, so as to construct the entity relationship of the sentence to be translated.
In this embodiment, as one application of the knowledge graph, after the knowledge graph generating device has constructed a knowledge graph containing multiple target entities, the knowledge graph can be used to support a translation service and thereby improve translation quality. A commonly used translation technique is a neural machine translation (NMT) model based on LSTM-RNN. The NMT model may adopt an end-to-end translation scheme: in an encoder-decoder model, the encoder converts the source-language sentence into a hidden state vector, and the target-language decoder then converts the hidden state vector into natural-language text in the target language.
For example, FIG. 10 shows a flowchart of knowledge-graph-based translation provided by an embodiment of the present application. Referring to FIG. 10, after the text data to be translated is received, it is first preprocessed: the text data is fed into the translation preprocessing module, which identifies the source language of the text data and the target language into which it is to be translated. Once the source and target languages are determined, the preprocessing module sends this information to the knowledge graph module, which switches the knowledge graph to the detection mode corresponding to the source language, that is, selects the natural language understanding (NLU) algorithm corresponding to the source language. The knowledge graph module performs NLU analysis on the text data in combination with its knowledge data, marks the entities contained in the text data, determines from the generated knowledge graph the name of each entity in the target language, and returns these names to the preprocessing module. Based on the entity list returned by the knowledge graph module, the preprocessing module removes the entities from the text data and replaces them with agreed special characters, which may be determined according to entity type. The text data with the special characters substituted is sent to the NMT module for the standard translation process, and the translation result is obtained; the substituted special characters are preserved in the result so that the correspondence between entities in the text data and entities in the translated text can be determined. Finally, the entity translations returned by the knowledge graph are merged with the text translation returned by the NMT module to obtain the final translation result.
It follows that if the knowledge graph is built at entity granularity, then when the translated name of each entity in the text data is obtained in the target language, the translated names corresponding to different alias names are not distinguished, which reduces translation accuracy. For this reason, the present application builds the translation relationship between alias names and translated names at alias-name granularity. The alias name used for an entity in the text data can thus be identified and the translated name corresponding to that alias name in the current text data determined, so that the translated name matches the current context and grammatical conventions and the translation is more accurate.
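The placeholder flow of FIG. 10 can be sketched in a few lines. Everything named here is an assumption made for illustration: the `<TYPE_index>` placeholder format, the `stub_nmt` function (a lookup table standing in for a real NMT model), and the sample entity translations.

```python
from typing import Callable, Dict, List, Tuple


def mask_entities(
    text: str, entities: List[Tuple[str, str]]  # (surface form, entity type)
) -> Tuple[str, Dict[str, str]]:
    """Replace each entity with a type-based placeholder; return text and slots."""
    slots: Dict[str, str] = {}
    for i, (surface, entity_type) in enumerate(entities):
        placeholder = f"<{entity_type}_{i}>"  # hypothetical placeholder format
        text = text.replace(surface, placeholder, 1)
        slots[placeholder] = surface
    return text, slots


def translate_with_graph(
    text: str,
    entities: List[Tuple[str, str]],
    entity_translations: Dict[str, str],  # surface form -> translated name
    nmt: Callable[[str], str],
) -> str:
    """Mask entities, translate the remainder, then merge entity translations."""
    masked, slots = mask_entities(text, entities)
    translated = nmt(masked)  # placeholders are expected to survive translation
    for placeholder, surface in slots.items():
        translated = translated.replace(placeholder, entity_translations[surface])
    return translated


# Stub standing in for the NMT module (illustrative only).
def stub_nmt(masked: str) -> str:
    table = {"<LOC_0>位于<LOC_1>对面": "The <LOC_0> is located opposite the <LOC_1>."}
    return table[masked]


result = translate_with_graph(
    "鸟巢位于水立方对面",
    [("鸟巢", "LOC"), ("水立方", "LOC")],
    {"鸟巢": "Bird's Nest", "水立方": "Water Cube"},
    stub_nmt,
)
```

The sketch replaces entities in list order, so it assumes no entity's surface form is a substring of another's; a production system would mask by character offsets instead.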
In this embodiment, the knowledge graph generating device may perform semantic analysis on the sentence to be translated, identify the entities to be translated that it contains using the NLU algorithm, and construct the entity relationship of the sentence from all the identified entities.
For example, suppose the sentence to be translated is "The National Centre for the Performing Arts of China was designed by the French architect Paul Andreu and is the largest theatre complex in Asia" (中国国家大剧院由法国建筑师保罗·安德鲁主持设计,是亚洲最大的剧院综合体). The NLU algorithm identifies the entities to be translated as "China" (中国), "National Centre for the Performing Arts" (国家大剧院), "France" (法国), "architect" (建筑师), "Asia" (亚洲), "theatre" (剧院), and "complex" (综合体). A co-occurrence relationship is established among these entities, and this co-occurrence relationship is the entity relationship of the sentence to be translated.
In S902, the translation relationship of the entity to be translated with respect to the target language is extracted from the knowledge graph; the translation relationship contains at least one translated name of the entity to be translated.
In this embodiment, after determining the entities to be translated contained in the sentence, the knowledge graph generating device may query the knowledge graph for the entity node corresponding to each entity to be translated and extract the corresponding translation relationship from that entity node. The translation relationship records at least one translated name of the entity to be translated.
Optionally, if the knowledge graph records the translation relationships between the alias names and the translated names of the entity to be translated, the knowledge graph generating device may identify the alias name used in the sentence to be translated and, from the translation relationship between alias names and translated names, determine the target translated name of the entity in that sentence without performing the matching degree calculation of S903. If the knowledge graph does not record translation relationships between the alias names and the translated names of the entity, or if one alias name corresponds to multiple translated names, S903 is performed to determine which translated name to use in the sentence to be translated.
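The branching just described (use the alias-level translation relationship when it is unambiguous, otherwise fall back to the S903 matching degree) might look like the sketch below. The function name, the data layout, and the sample scores are hypothetical; `score` stands in for whatever matching degree S903 produces.

```python
from typing import Callable, Dict, List


def resolve_translated_name(
    alias: str,
    alias_to_names: Dict[str, List[str]],  # alias name -> translated names
    entity_names: List[str],               # all translated names of the entity
    score: Callable[[str], float],         # matching degree from S903
) -> str:
    names = alias_to_names.get(alias, [])
    if len(names) == 1:
        # Unambiguous alias-level translation relationship: S903 is skipped.
        return names[0]
    # No alias-level record, or several candidates: rank by matching degree.
    candidates = names if names else entity_names
    return max(candidates, key=score)


# Illustrative data only.
alias_to_names = {
    "鸟巢": ["Bird's Nest"],
    "国家体育场": ["National Stadium", "Beijing National Stadium"],
}
entity_names = ["Bird's Nest", "National Stadium", "Beijing National Stadium"]
scores = {"Bird's Nest": 0.2, "National Stadium": 0.9, "Beijing National Stadium": 0.5}

direct = resolve_translated_name("鸟巢", alias_to_names, entity_names, scores.get)
ranked = resolve_translated_name("国家体育场", alias_to_names, entity_names, scores.get)
fallback = resolve_translated_name("体育场", alias_to_names, entity_names, scores.get)
```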
In S903, the matching degree between the sentence to be translated and each translated name is calculated according to the entity relationship and the co-occurrence relationship of the translated name.
In this embodiment, the knowledge graph generating device may determine the matching degree between each translated name and the current sentence to be translated according to the entity relationship and the co-occurrence relationship of each translated name of the entity to be translated. Because different translated names are used in different contexts, the matching degree between each translated name and the sentence must be determined in the context of that sentence, so that the translated name that best fits the context is selected and translation accuracy is improved.
Optionally, the matching degree between the sentence to be translated and a translated name may be calculated as follows. The knowledge graph generating device identifies the entity to be translated that corresponds to the translated name as the base entity, and identifies the other entities in the entity relationship as reference entities. It then checks whether each reference entity is present in the co-occurrence relationship of the translated name; if so, it determines from the co-occurrence relationship the number of times that reference entity co-occurs with the translated name. The matching degree between the sentence and the translated name is determined from the co-occurrence counts between the translated name and all reference entities, together with the number of reference entities that have a co-occurrence relationship with it.
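The optional count-based approach can be sketched as follows. The text above names the two inputs (co-occurrence counts and the number of reference entities that co-occur) but not how they are combined, so this sketch makes an explicit assumption: candidates are compared first by the number of shared reference entities and then by total co-occurrence count. All data is hypothetical.

```python
from typing import Dict, List, Tuple


def match_key(reference_entities: List[str], cooc: Dict[str, int]) -> Tuple[int, int]:
    """Return (shared reference entities, total co-occurrence count).

    Comparing these tuples lexicographically is an assumed combination rule,
    not one stated in the source text.
    """
    shared = [e for e in reference_entities if cooc.get(e, 0) > 0]
    return len(shared), sum(cooc[e] for e in shared)


# Reference entities: everything in the sentence except the base entity.
reference = ["China", "France", "architect", "Asia", "theatre", "complex"]

# Hypothetical co-occurrence counts for two candidate translated names.
candidates = {
    "National Centre for the Performing Arts": {"China": 12, "architect": 5, "Asia": 2, "opera": 9},
    "National Grand Theatre": {"China": 3, "tourist": 7},
}

keys = {name: match_key(reference, cooc) for name, cooc in candidates.items()}
best = max(keys, key=keys.get)
```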
Further, as another embodiment of the present application, S903 may specifically be:
The entity relationship and the co-occurrence relationship of the translated name are input into a preset matching degree calculation function to calculate the matching degree. The matching degree calculation function is:
$$\mathrm{Sim}(E_1, E_2) = \sum_{e_i \in \mathrm{Context}(E_1)} \; \max_{e_j \in \mathrm{Context}(E_2)} \mathrm{sim}_{entity}(e_i, e_j)$$
$$\mathrm{sim}_{entity}(e_i, e_j) = \sum_{p \in \mathrm{Prop}(e_i) \cap \mathrm{Prop}(e_j)} \omega_p \, \mathrm{Similarity}_{type(p)}(e_i[p], e_j[p])$$
Here, $\mathrm{Sim}(E_1, E_2)$ is the matching degree between the entity to be translated and the translated name; $\mathrm{Context}(E_1)$ is the set of associated entities in the co-occurrence relationship of the entity to be translated $E_1$ in the knowledge graph; $\mathrm{Context}(E_2)$ is the set of associated entities in the co-occurrence relationship of the translated name $E_2$; $e_i$ is the i-th associated entity in the co-occurrence relationship of $E_1$; $e_j$ is the j-th associated entity in the co-occurrence relationship of $E_2$; $\mathrm{Prop}(e_i)$ denotes the entity type(s) of $e_i$; $\mathrm{Prop}(e_j)$ denotes the entity type(s) of $e_j$; $\omega_p$ is the weight corresponding to entity type $p$; $\mathrm{Similarity}_{type(p)}(e_i[p], e_j[p])$ is the matching degree function corresponding to that entity type; $e_i[p]$ is the parameter value of $e_i$ for entity type $p$; and $e_j[p]$ is the parameter value of $e_j$ for entity type $p$.
In this embodiment, $E_1$ is the entity to be translated in the source language, and $E_2$ is a translated name of that entity in the target language. For each entity in the entity set of the co-occurrence relationship of $E_1$, the knowledge graph generating device calculates its similarity to every entity in the co-occurrence relationship of the translated name and takes the maximum as the feature matching degree of that entity. Summing all feature matching degrees gives the matching degree between the translated name and the entity to be translated in the sentence.
The matching degree between individual entities is calculated with the function $\mathrm{sim}_{entity}(e_i, e_j)$. The knowledge graph generating device calculates similarity only between two entities of the same entity type: if an entity in the entity relationship and an entity in the co-occurrence relationship of the translated name are of different types, their similarity is not calculated, which avoids a large number of meaningless similarity calculations. The device selects the similarity calculation model according to the entity type, namely $\mathrm{Similarity}_{type(p)}(e_i[p], e_j[p])$. For example, if the two entities are "elderly person" (老人) and "teenager", whose entity type is "age", the age similarity calculation model is used to calculate the similarity between them. In this function, $e_i[p]$ is the parameter value of the i-th associated entity of $E_1$ for the entity type, and $e_j[p]$ is the parameter value of the j-th associated entity of $E_2$ for the entity type. Continuing with "elderly person" and "teenager": "elderly person" corresponds to an age of 70 or above, so its parameter value for the entity type may be set to 70, while "teenager" corresponds to an age of 18 to 30, so its parameter value may be set to 20. Feeding these two parameter values into the age similarity calculation model yields the similarity between the two entities.
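As an illustration of the matching degree function, the sketch below sums, over each associated entity of E1, the best type-wise similarity to any associated entity of E2. The linear `age_similarity` model, the weight of 1.0, and the dictionary representation of entities are assumptions made for this example; the source does not specify them. For simplicity the similarity model is looked up by property name rather than by a separate type(p) mapping.

```python
from typing import Callable, Dict, List

Entity = Dict[str, float]  # property (entity type) -> parameter value


def age_similarity(a: float, b: float) -> float:
    """Hypothetical age model: 1.0 for equal ages, falling linearly to 0.0."""
    return max(0.0, 1.0 - abs(a - b) / 100.0)


# Assumed per-type similarity models and weights (omega_p).
SIMILARITY_BY_TYPE: Dict[str, Callable[[float, float], float]] = {"age": age_similarity}
WEIGHTS: Dict[str, float] = {"age": 1.0}


def sim_entity(ei: Entity, ej: Entity) -> float:
    """Weighted similarity over the entity types both entities share."""
    shared = set(ei) & set(ej)  # Prop(ei) ∩ Prop(ej); empty means no comparison
    return sum(WEIGHTS[p] * SIMILARITY_BY_TYPE[p](ei[p], ej[p]) for p in shared)


def sim(context_e1: List[Entity], context_e2: List[Entity]) -> float:
    """Sum over ei of the maximum sim_entity(ei, ej) over ej."""
    return sum(
        max((sim_entity(ei, ej) for ej in context_e2), default=0.0)
        for ei in context_e1
    )


elderly = {"age": 70.0}            # "elderly person", parameter value 70
teenager = {"age": 20.0}           # "teenager", parameter value 20
city = {"population": 21_000_000}  # a different entity type: never compared

pair_score = sim_entity(elderly, teenager)
total = sim([elderly], [teenager, city])
```

Because entities of different types share no property, `sim_entity` returns 0 for them without invoking any similarity model, which mirrors the skipped calculations described above.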
In S904, based on the matching degree, the target translated name of the entity to be translated is determined from among all the translated names, and the translated sentence of the sentence to be translated in the target language is output according to all the target translated names.
In this embodiment, after the matching degree between each translated name and the sentence to be translated has been calculated, the translated name with the highest matching degree may be selected as the target translated name of the entity to be translated in this translation operation. Each target translated name is then inserted at the corresponding position in the entity-free translation output by the NMT algorithm, yielding the translated sentence in the target language and completing the sentence translation.
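A minimal sketch of S904 follows, assuming that the entity-free NMT output marks each entity position with a placeholder such as `<ORG_0>` and that matching degrees have already been computed. The placeholder format and the scores are hypothetical.

```python
from typing import Dict


def choose_target_names(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each placeholder, keep the translated name with the highest matching degree."""
    return {
        placeholder: max(candidates, key=candidates.get)
        for placeholder, candidates in scores.items()
    }


def fill_slots(template: str, chosen: Dict[str, str]) -> str:
    """Insert each target translated name at its position in the NMT output."""
    for placeholder, name in chosen.items():
        template = template.replace(placeholder, name)
    return template


# Hypothetical matching degrees for one entity to be translated.
scores = {
    "<ORG_0>": {
        "National Centre for the Performing Arts": 2.5,
        "National Grand Theatre": 1.0,
    }
}
chosen = choose_target_names(scores)
sentence = fill_slots("The <ORG_0> is the largest theatre complex in Asia.", chosen)
```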
Optionally, after determining the target translated name of the entity to be translated in the sentence, the knowledge graph generating device may, based on the alias under which the entity appears in the sentence, establish a translation relationship between that alias and the target translated name and add it to the knowledge graph, so that translation relationships are learned automatically.
In this embodiment of the present application, the entity relationship of the sentence to be translated is obtained, and the translated name of each entity to be translated in the current context is determined from the entity relationship and the co-occurrence relationship of each translated name. The knowledge graph thus supports the translation decision and improves translation accuracy.
FIG. 11 shows a schematic structural diagram of a knowledge-graph-based translation system provided by an embodiment of the present application. Referring to FIG. 11, the knowledge-graph-based translation system includes a translation service cloud system 111, a knowledge graph generating device 112, an intelligent annotation server 113, a cloud database server 114, a user terminal 115, and a third-party application platform 116.
The translation service cloud system 111 includes a text retrieval module, a translation service response module, and a data access module. The data access module sends data to and receives data from the other devices. The translation service response module encapsulates the data for a translation request received from the user terminal, obtains the translation result, and returns it to the user terminal. The text retrieval module extracts the text data from the translation request and preprocesses it.
The knowledge graph generating device 112 includes a translation error correction module, a knowledge graph module, a translation module, and a data management module. The translation error correction module detects whether the sentence to be translated carried in the translation request contains content that needs correcting, corrects the sentence through terminology correction, personal-name correction, whole-sentence correction, and the like, and sends the corrected sentence to the translation module, which performs the translation. For the specific translation process, see the process shown in FIG. 10, which is not repeated here. The data management module may cache the received data and mask sensitive fields containing user identity information, thereby protecting the user's private information.
The intelligent annotation server 113 includes a login authentication module, a web module, and a service module. The login authentication module performs identity verification and determines whether a service request is valid; the web module displays the data tables of the cloud database server; and the service module updates the data stored in the cloud database server with newly collected data.
The cloud database server 114 may include a database built on MySQL, a database built on the Hadoop framework, and so on. The cloud database server may store the cloud data required for translation operations, such as corpora learned from various channels, historical translation records initiated by user terminals, and the knowledge required to construct the knowledge graph.
The user terminal 115 may initiate a service request through a built-in application, and an intelligent translation engine may determine the translation channel to be used for that request. For speech translation, the speech data of the translated word may be obtained through the corresponding third-party platform; for word text translation, the translation data of the word may be obtained through the corresponding third-party platform; for sentence text translation, the translated sentence may be output by the translation module built into the knowledge graph generating device. In other words, the intelligent translation engine determines the translation response path according to the type of translation request.
The third-party application platform 116 may include multiple different third-party translation applications that support part of the translation operations of the overall translation system, such as word translation and word pronunciation lookup.
The workflow of the translation system is described below using a sentence translation request initiated by a user. The user terminal 115 receives the sentence translation request through the application, and the intelligent translation engine of the user terminal 115 then determines the translation channel required for this operation. Because the operation is a sentence translation, it must be supported by the knowledge graph generating device 112, so the sentence translation request, carrying a channel identifier, is sent to the translation service cloud system 111. The translation service cloud system 111 receives the sentence translation request through the data access module and forwards it to the knowledge graph generating device 112. The knowledge graph generating device 112 performs preliminary error correction on the sentence to be translated through the translation error correction module and feeds the corrected sentence into the preprocessing unit of the translation module. The preprocessing unit identifies the source language and target language of the sentence; the knowledge graph is used to identify the alias names used in the sentence and, according to the translation relationship, to determine the translated name of each alias name in the target language. The translated names are fed back to the translation unit, which outputs the translated sentence. The translated sentence is preprocessed and returned through the data management module to the data access module of the translation service cloud system 111. The translation service response module in the translation service cloud system encapsulates the translation result and returns it to the user terminal.
FIG. 12 shows an interaction flowchart of the units in a knowledge graph generating device, provided by an embodiment of the present application, when responding to a translation operation. The knowledge graph generating device may include a translation preprocessing unit, a knowledge graph service unit, a knowledge graph index unit, and a knowledge graph engine unit. After receiving a translation request, the knowledge graph generating device may extract the sentence to be translated from the request and send it to the translation preprocessing unit, which identifies the source language and target language of the sentence and sends the preprocessed sentence together with these two parameters to the knowledge graph service unit. The knowledge graph service unit selects the NLP model corresponding to the source language and uses it to perform NER on the sentence, determining each entity to be translated that the sentence contains. It sends these entities to the knowledge graph index unit, which locates the entity node of each entity to be translated in the knowledge graph and determines the list of names associated with each entity node, thereby obtaining the translated names of each entity to be translated in the target language. The knowledge graph service unit sends a co-occurrence relationship query request to the knowledge graph engine unit to determine the associated entities that have a co-occurrence relationship with each translated name. The knowledge graph engine unit returns the co-occurrence relationships found to the knowledge graph service unit, which selects the target translated name from among the translated names corresponding to the different alias names, generates the translated sentence from all the target translated names, and returns it to the translation preprocessing unit, which outputs the translation result.
FIG. 13 shows a specific implementation flowchart of a method for generating a knowledge graph provided by the seventh embodiment of the present application. Referring to FIG. 13, relative to the embodiment described in any one of FIG. 1, FIG. 4, FIG. 6, FIG. 7, and FIG. 8, the method for generating a knowledge graph provided by this embodiment further includes S1301 to S1302, which are detailed as follows:
Further, after the constructing of a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all the target entities, the method further includes:
In S1301, a keyword input by a user is received, and the co-occurrence relationship corresponding to the keyword is queried from the knowledge graph.
In this embodiment, as an application example of the knowledge graph, after constructing a knowledge graph containing multiple target entities, the apparatus for generating a knowledge graph can use the knowledge graph to provide technical support for a recommendation service. Because the knowledge graph records, based on the corpus, the co-occurrence relationship of each alias name of a target entity, the depth of the knowledge graph is mined further: on top of the entity, the co-occurrence relationships of its different alias names are mined, so that the differences between the associated objects of different aliases can be determined and the precision of the recommendation information can be improved. For example, the entity "粉" has two different alias names, "米粉" (mifen) and "米线" (mixian), and different alias names are often paired with different other entities, such as "肥肠米粉" (mifen with pork intestine) and "过桥米线" (crossing-the-bridge mixian). From the entities paired with each alias name, the tastes, eating habits, and similar traits associated with the user can be identified. Compared with determining recommendation information at the granularity of the "entity", co-occurrence relationships established at the granularity of the "alias name" yield recommendation information of higher precision.
In this embodiment, the apparatus for generating a knowledge graph can receive a keyword input by the user, identify the entity corresponding to the keyword and the alias name used in the keyword, obtain the knowledge node associated with that alias name in the knowledge graph, and extract the co-occurrence relationship of the alias name from the knowledge node.
In S1302, recommendation information for the user is output according to the co-occurrence relationship.
In this embodiment, the apparatus for generating a knowledge graph can select corresponding recommended entities according to the number of co-occurrences of each associated entity in the co-occurrence relationship, and output recommendation information based on the recommended entities. The recommendation information can take different forms in different scenarios. For example, in a search scenario, associated keywords of the input keyword can be output, where an associated keyword is the keyword corresponding to an entity with a high number of co-occurrences, and search results containing associated keywords are displayed in earlier positions; that is, the display order is determined from the number of associated keywords contained in each search result and the number of co-occurrences between each associated keyword and the input keyword, and the results are output in that order. In a product purchase scenario, associated product keywords can be determined from the keyword input by the user, recommended products can be determined from the product keywords, and a product recommendation list can be generated, where the associated product keywords are obtained from the co-occurrence relationship corresponding to the alias name used in the input keyword. As a further example, in a user profile scenario, multiple co-occurring entities can be identified from the co-occurrence relationship according to the keyword input by the user, and user tags for the user can be output according to the co-occurring entities and the keyword.
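As an illustration of S1301 and S1302 only, the following minimal Python sketch assumes that the alias-level co-occurrence relationships are already available as an in-memory mapping. The store `CO_OCCURRENCE`, its counts, the pinyin spellings, and the function names are hypothetical and are not part of the present application; substring matching stands in for the entity and alias recognition described above.

```python
from collections import Counter

# Hypothetical alias-level store: alias name -> co-occurrence counts of associated entities.
CO_OCCURRENCE = {
    "mifen": Counter({"feichang": 12, "lajiao": 5}),
    "mixian": Counter({"guoqiao": 20, "jitang": 7}),
}


def query_co_occurrence(keyword, store=CO_OCCURRENCE):
    """S1301 (sketch): find the alias used in the keyword and return its counts."""
    for alias, counts in store.items():
        if alias in keyword:  # assumption: substring match replaces real entity recognition
            return alias, counts
    return None, Counter()


def recommend(keyword, store=CO_OCCURRENCE, top_n=2):
    """S1302 (sketch): rank associated entities by co-occurrence count, highest first."""
    _, counts = query_co_occurrence(keyword, store)
    return [entity for entity, _ in counts.most_common(top_n)]
```

Calling `recommend("mixian")` ranks the entities associated with the alias "mixian" rather than with the whole entity "粉", which is the alias-level granularity this embodiment relies on.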
In the embodiments of the present application, constructing a knowledge graph at the granularity of the "name" can further improve the accuracy of recommendation information in the field of intelligent recommendation.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the method for generating a knowledge graph described in the foregoing embodiments, FIG. 14 shows a structural block diagram of an apparatus for generating a knowledge graph provided by an embodiment of the present application. For ease of description, only the parts related to the embodiments of the present application are shown.
Referring to FIG. 14, the apparatus for generating a knowledge graph includes:
a translation relationship establishment unit 141, configured to establish translation relationships, in a target language, of multiple alias names of a target entity;
a co-occurrence relationship generation unit 142, configured to generate, through a preset corpus, a co-occurrence relationship for each of the alias names of the target entity;
a knowledge graph construction unit 143, configured to construct a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all the target entities.
Optionally, the translation relationship establishment unit 141 includes:
a source language sentence acquisition unit, configured to acquire source language sentences that respectively contain each of the alias names;
a target language sentence acquisition unit, configured to output, according to a translation model between the source language and the target language, the target language sentence corresponding to each source language sentence;
a translated name recognition unit, configured to extract, from each target language sentence, the translated name of the alias name in the target language;
a translation relationship determination unit, configured to establish the translation relationship between the alias name and the translated name.
Optionally, the source language sentence acquisition unit includes:
a sentence template acquisition unit, configured to acquire, according to the entity type of the target entity, a sentence template associated with the entity type; and a sentence template import unit, configured to import each of the alias names into the sentence template to generate the source language sentences.
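The template-based generation of source language sentences can be sketched as follows. This is a minimal illustration under stated assumptions: the template table `SENTENCE_TEMPLATES`, the `{name}` placeholder convention, and the function name are hypothetical; the present application does not specify how templates are stored.

```python
# Hypothetical templates keyed by entity type; "{name}" marks the slot for the alias name.
SENTENCE_TEMPLATES = {
    "food": ["I would like to order {name}.", "{name} is a popular dish."],
    "person": ["{name} gave a speech yesterday."],
}


def build_source_sentences(entity_type, alias_names, templates=SENTENCE_TEMPLATES):
    """Fill every alias name into every template associated with the entity type."""
    type_templates = templates.get(entity_type, [])  # unknown type -> no sentences
    return {
        alias: [template.format(name=alias) for template in type_templates]
        for alias in alias_names
    }
```

Each alias name thus appears in sentences whose context matches the entity type, which gives the translation model enough context to translate the alias rather than an isolated word.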
Optionally, the translated name recognition unit includes:
a valid sentence selection unit, configured to identify the target language sentence as a valid sentence if it is detected that the target language sentence contains a phrase corresponding to the target entity; and a keyword phrase recognition unit, configured to identify the phrase corresponding to the target entity in the valid sentence as the translated name.
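A minimal sketch of the valid-sentence check and name extraction follows. It assumes that candidate target-language phrases for the entity are supplied from elsewhere (for example, a bilingual dictionary); that source, the case-insensitive substring matching, and the longest-match rule are illustrative assumptions, not details given in the present application.

```python
def extract_translated_name(target_sentence, candidate_phrases):
    """Return the phrase for the target entity found in the sentence, or None.

    A sentence containing no candidate phrase is treated as invalid (None).
    When several candidates match, the longest one is returned (an assumption).
    """
    lowered = target_sentence.lower()
    hits = [phrase for phrase in candidate_phrases if phrase.lower() in lowered]
    return max(hits, key=len) if hits else None
```

Sentences for which the function returns `None` would be discarded, so a poor machine translation does not introduce a spurious translated name.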
Optionally, the co-occurrence relationship generation unit 142 includes:
a target text extraction unit, configured to extract target text containing the target entity from the corpus; an associated entity recognition unit, configured to identify associated entities in the target text other than the target entity; and a co-occurrence relationship establishment unit, configured to obtain, according to the alias name by which the target entity appears in the target text, the co-occurrence relationship between that alias name and the associated entities.
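The three steps above can be sketched in Python as follows. This is an illustration only: plain substring matching stands in for the entity recognition that an actual implementation would perform, and the function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict


def build_co_occurrence(corpus, alias_names, known_entities):
    """Count, for each alias name, the other known entities sharing a text with it."""
    relations = defaultdict(Counter)
    for text in corpus:
        # Step 1: a text is a target text only if it mentions some alias of the target entity.
        present_aliases = [alias for alias in alias_names if alias in text]
        if not present_aliases:
            continue
        # Step 2: associated entities are known entities in the text other than the target's aliases.
        associated = [e for e in known_entities if e in text and e not in alias_names]
        # Step 3: attribute the co-occurrences to the specific alias used in this text.
        for alias in present_aliases:
            relations[alias].update(associated)
    return relations
```

Keying the counts by alias name, rather than by the entity as a whole, is what lets later steps distinguish the typical context of one alias from that of another.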
Optionally, the apparatus for generating a knowledge graph further includes: an entity-to-be-translated recognition unit, configured to receive a sentence to be translated in a source language and identify the entities to be translated that the sentence contains, so as to construct the entity relationship of the sentence to be translated; a translation relationship extraction unit, configured to extract, from the knowledge graph, the translation relationship of the entity to be translated for the target language, where the translation relationship contains at least one translated name of the entity to be translated; a matching degree calculation unit, configured to calculate the matching degree between the sentence to be translated and each translated name according to the entity relationship and the co-occurrence relationship of the translated name; and a translated sentence output unit, configured to determine, based on the matching degree, the target translated name of the entity to be translated from all the translated names, and to output, according to all the target translated names, the translated sentence of the sentence to be translated in the target language.
Optionally, the matching degree calculation unit is specifically configured to:
import the entity relationship and the co-occurrence relationship of the translated name into a preset matching degree calculation function to calculate the matching degree, where the matching degree calculation function is specifically:
$$\mathrm{Sim}(E_1, E_2) = \sum_{e_i \in \mathrm{Context}(E_1),\; e_j \in \mathrm{Context}(E_2)} \max\, \mathrm{sim}_{\mathrm{entity}}(e_i, e_j);$$
$$\mathrm{sim}_{\mathrm{entity}}(e_i, e_j) = \sum_{p \in \mathrm{Prop}(e_i) \cap \mathrm{Prop}(e_j)} \omega_p \, \mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p], e_j[p])$$
where $\mathrm{Sim}(E_1, E_2)$ is the matching degree between the entity to be translated and the translated name; $\mathrm{Context}(E_1)$ is the set of associated entities contained in the co-occurrence relationship corresponding to the entity to be translated $E_1$ in the knowledge graph; $\mathrm{Context}(E_2)$ is the set of associated entities contained in the co-occurrence relationship of the translated name $E_2$; $e_i$ is the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j$ is the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\mathrm{Prop}(e_i)$ is the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $\mathrm{Prop}(e_j)$ is the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\omega_p$ is the weight value corresponding to the entity type; $\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p], e_j[p])$ is the matching degree function corresponding to the entity type; $e_i[p]$ is the parameter value of the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; and $e_j[p]$ is the parameter value of the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$.
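The matching degree function can be sketched in Python under one possible reading of the formula, in which each associated entity ei of E1 contributes its best match over the associated entities ej of E2; the text does not state the scope of the max operator explicitly, so this reading is an assumption. Representing each associated entity as a dictionary from property name to value, defaulting missing weights to 1.0, and falling back to exact-match similarity are likewise illustrative assumptions.

```python
def exact_match(a, b):
    """Fallback similarity (assumption): 1.0 for equal values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def sim_entity(ei, ej, weights, similarity_fns):
    """Weighted sum of per-property similarities over the properties shared by ei and ej."""
    shared = set(ei) & set(ej)
    return sum(
        weights.get(p, 1.0) * similarity_fns.get(p, exact_match)(ei[p], ej[p])
        for p in shared
    )


def sim(context_e1, context_e2, weights, similarity_fns):
    """Assumed reading: sum over ei of the maximum sim_entity(ei, ej) over ej."""
    if not context_e2:
        return 0.0  # nothing to match against
    return sum(
        max(sim_entity(ei, ej, weights, similarity_fns) for ej in context_e2)
        for ei in context_e1
    )
```

Under this reading, the translated name whose co-occurrence context yields the highest value would be chosen as the target translated name for the entity to be translated.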
Optionally, the apparatus for generating a knowledge graph further includes:
a keyword receiving unit, configured to receive a keyword input by a user and query the co-occurrence relationship corresponding to the keyword from the knowledge graph;
a recommendation information output unit, configured to output recommendation information for the user according to the co-occurrence relationship.
Therefore, the apparatus for generating a knowledge graph provided by the embodiments of the present application can likewise establish a translation relationship for each knowledge node in the knowledge graph, that is, each target entity, so as to connect knowledge nodes across different languages, and can extend the knowledge depth of each knowledge node by constructing co-occurrence relationships, so that the node is no longer limited to the attributes of the target entity itself. This improves the associative capability of each knowledge node and the breadth and depth of the knowledge graph, thereby improving the accuracy of artificial intelligence output and the quality of service responses.
FIG. 15 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 15, the terminal device 15 of this embodiment includes at least one processor 150 (only one is shown in FIG. 15), a memory 151, and a computer program 152 stored in the memory 151 and executable on the at least one processor 150. When executing the computer program 152, the processor 150 implements the steps in any of the foregoing embodiments of the method for generating a knowledge graph.
The terminal device 15 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 150 and the memory 151. Those skilled in the art will understand that FIG. 15 is merely an example of the terminal device 15 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components, and may for example further include input/output devices and network access devices.
The processor 150 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the memory 151 may be an internal storage unit of the terminal device 15, such as a hard disk or internal memory of the terminal device 15. In other embodiments, the memory 151 may be an external storage device of the terminal device 15, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the terminal device 15. Further, the memory 151 may include both an internal storage unit and an external storage device of the terminal device 15. The memory 151 is used to store an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 151 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the information exchange and execution processes between the foregoing apparatuses/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments and are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the foregoing division of functional units and modules is used as an example. In practical applications, the foregoing functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the foregoing system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
An embodiment of the present application further provides a network device, which includes at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, where the processor implements the steps in any of the foregoing method embodiments when executing the computer program.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the steps in each of the foregoing method embodiments are implemented when the computer program is executed by a processor.
An embodiment of the present application provides a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in each of the foregoing method embodiments.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.
Finally, it should be noted that the foregoing is only the specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

  1. A method for generating a knowledge graph, characterized by comprising:
    determining the translated name, in a target language, of each alias name of a target entity, and generating a translation relationship of the target entity according to the alias names and the translated names;
    generating, through a preset corpus, a co-occurrence relationship for each of the alias names of the target entity;
    constructing a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all the target entities.
  2. The generating method according to claim 1, characterized in that the determining the translated name, in a target language, of each alias name of a target entity, and generating a translation relationship of the target entity according to the alias names and the translated names, comprises:
    acquiring source language sentences that respectively contain each of the alias names;
    outputting, according to a translation model between the source language and the target language, the target language sentence corresponding to each source language sentence;
    extracting, from each target language sentence, the translated name of the alias name in the target language;
    establishing the translation relationship between the alias name and the translated name.
  3. The generating method according to claim 2, characterized in that the acquiring source language sentences that respectively contain each of the alias names comprises:
    acquiring, according to the entity type of the target entity, a sentence template associated with the entity type;
    importing each of the alias names into the sentence template to generate the source language sentences.
  4. The generating method according to claim 2, characterized in that the extracting, from each target language sentence, the translated name of the alias name in the target language comprises:
    if it is detected that the target language sentence contains a phrase corresponding to the target entity, identifying the target language sentence as a valid sentence;
    identifying the phrase corresponding to the target entity in the valid sentence as the translated name.
  5. The generating method according to claim 1, characterized in that the generating, through a preset corpus, a co-occurrence relationship for each of the alias names of the target entity comprises:
    extracting target text containing the target entity from the corpus;
    identifying associated entities in the target text other than the target entity;
    obtaining, according to the alias name by which the target entity appears in the target text, the co-occurrence relationship between the alias name and the associated entities.
  6. The generating method according to any one of claims 1 to 5, wherein, after constructing the knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities, the method further comprises:
    receiving a sentence to be translated in a source language, and identifying the entity to be translated contained in the sentence to be translated, so as to construct an entity relationship of the sentence to be translated;
    extracting, from the knowledge graph, the translation relationship of the entity to be translated for the target language, the translation relationship containing at least one translated name of the entity to be translated;
    calculating a matching degree between the sentence to be translated and each translated name according to the entity relationship and the co-occurrence relationship of the translated name;
    determining, based on the matching degree, a target translated name of the entity to be translated from all the translated names, and outputting, according to all the target translated names, a translation of the sentence to be translated in the target language.
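The selection step can be illustrated with a short sketch. Several things here are assumptions rather than part of the claim: the matching degree is replaced by a simple count of shared associated entities, the entities of the source sentence are assumed to be already mapped to the same identifiers as those in the knowledge graph, and the names `select_target_names`, `entity_relation`, `translation_relations` and `cooccurrence` are hypothetical.

```python
def select_target_names(entity_relation, translation_relations, cooccurrence):
    """Pick, for each entity to be translated, the best-matching translated name.

    entity_relation: entity to be translated -> other entities in the sentence.
    translation_relations: entity to be translated -> candidate translated names.
    cooccurrence: translated name -> associated entities from the knowledge graph.
    """
    targets = {}
    for entity, context in entity_relation.items():
        candidates = translation_relations[entity]

        def matching_degree(name):
            # Stand-in score: number of associated entities shared with the sentence.
            return len(context & cooccurrence.get(name, set()))

        # The candidate with the highest matching degree becomes the target name.
        targets[entity] = max(candidates, key=matching_degree)
    return targets


targets = select_target_names(
    entity_relation={"苹果": {"iPhone"}},
    translation_relations={"苹果": ["apple", "Apple Inc."]},
    cooccurrence={
        "apple": {"fruit", "orchard"},
        "Apple Inc.": {"iPhone", "Tim Cook"},
    },
)
```

Because the sentence also mentions "iPhone", the candidate whose co-occurrence relationship contains "iPhone" is chosen; the chosen names would then be substituted into the translated sentence.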
  7. The generating method according to claim 6, wherein calculating the matching degree between the sentence to be translated and the translated name according to the entity relationship and the co-occurrence relationship of the translated name comprises:
    importing the entity relationship and the co-occurrence relationship of the translated name into a preset matching degree calculation function to calculate the matching degree, the matching degree calculation function being:
    $$\mathrm{Sim}(E_1, E_2) = \sum_{e_i \in \mathrm{Context}(E_1),\ e_j \in \mathrm{Context}(E_2)} \max\ \mathrm{sim}_{\mathrm{entity}}(e_i, e_j);$$
    $$\mathrm{sim}_{\mathrm{entity}}(e_i, e_j) = \sum_{p \in \mathrm{Prop}(e_i) \cap \mathrm{Prop}(e_j)} \omega_p \, \mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p], e_j[p])$$
    where $\mathrm{Sim}(E_1, E_2)$ is the matching degree between the entity to be translated and the translated name; $\mathrm{Context}(E_1)$ is the set of associated entities contained in the co-occurrence relationship corresponding to the entity to be translated $E_1$ in the knowledge graph; $\mathrm{Context}(E_2)$ is the set of associated entities contained in the co-occurrence relationship of the translated name $E_2$; $e_i$ is the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j$ is the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\mathrm{Prop}(e_i)$ is the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $\mathrm{Prop}(e_j)$ is the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\omega_p$ is the weight value corresponding to the entity type; $\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p], e_j[p])$ is the matching degree function corresponding to the entity type; $e_i[p]$ is the parameter value of the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j[p]$ is the parameter value of the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$.
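One possible reading of the matching degree calculation function is sketched below. The interpretation is an assumption: the sum is taken over each associated entity of E1 and the maximum over the associated entities of E2, each associated entity is modelled as a dictionary from property name p to parameter value, and Prop(e) is taken to be the set of keys of that dictionary. The per-type similarity functions, the weights, and all identifiers are hypothetical.

```python
def exact_match(a, b):
    """Hypothetical similarity for string-typed properties."""
    return 1.0 if a == b else 0.0


def numeric_closeness(a, b):
    """Hypothetical similarity for number-typed properties."""
    return 1.0 / (1.0 + abs(a - b))


SIMILARITY_BY_TYPE = {"string": exact_match, "number": numeric_closeness}
PROP_TYPE = {"category": "string", "founded": "number"}  # type(p), assumed
WEIGHTS = {"category": 0.75, "founded": 0.25}            # omega_p, assumed


def sim_entity(ei, ej):
    """Weighted similarity over the properties shared by two associated entities."""
    shared = ei.keys() & ej.keys()  # Prop(ei) ∩ Prop(ej)
    return sum(
        WEIGHTS[p] * SIMILARITY_BY_TYPE[PROP_TYPE[p]](ei[p], ej[p])
        for p in shared
    )


def sim(context_e1, context_e2):
    """Sum, over each associated entity of E1, of its best match in Context(E2)."""
    return sum(
        max((sim_entity(ei, ej) for ej in context_e2), default=0.0)
        for ei in context_e1
    )


context_e1 = [{"category": "company", "founded": 1976}]
context_e2 = [{"category": "company", "founded": 1976}, {"category": "fruit"}]
score = sim(context_e1, context_e2)
```

In this example the first entity of `context_e2` agrees with the entity of `context_e1` on both properties, so it supplies the maximum and the second entity does not affect the score.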
  8. The generating method according to any one of claims 1 to 5, further comprising:
    receiving a keyword input by a user, and querying the knowledge graph for the co-occurrence relationship corresponding to the keyword;
    outputting recommendation information for the user according to the co-occurrence relationship.
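As a rough illustration of this claim, the lookup and recommendation could look like the sketch below. It assumes that the knowledge graph stores, for each keyword, a co-occurrence count per associated entity; the counts, the ranking rule, and the names `recommend` and `limit` are hypothetical and are not stated in the claim.

```python
def recommend(keyword, cooccurrence, limit=3):
    """Return the associated entities that co-occur most often with the keyword."""
    related = cooccurrence.get(keyword, {})
    # Rank by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(related.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Illustrative counts only.
graph_cooccurrence = {
    "NYC": {"Broadway": 5, "Wall Street": 3, "Central Park": 5},
}
suggestions = recommend("NYC", graph_cooccurrence, limit=2)
```

A keyword that is absent from the knowledge graph simply yields an empty recommendation list.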
  9. An apparatus for generating a knowledge graph, comprising:
    a translation relationship establishing unit, configured to establish a translation relationship, based on a target language, for multiple alias names of a target entity;
    a co-occurrence relationship generating unit, configured to generate, through a preset corpus, a co-occurrence relationship for each of the alias names in the target entity;
    a knowledge graph constructing unit, configured to construct a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities.
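The three units of the apparatus can be pictured as three methods of one class. This is a sketch under assumptions, not the claimed implementation: the class name `KnowledgeGraphGenerator`, the `translator` callable, the substring-based entity matching, and the nested dictionary used as the graph are all hypothetical.

```python
class KnowledgeGraphGenerator:
    def __init__(self, translator, corpus, known_entities):
        self.translator = translator          # callable: alias -> translated names
        self.corpus = corpus                  # preset corpus, a list of texts
        self.known_entities = known_entities  # entities recognisable in the corpus

    def establish_translation(self, aliases):
        """Translation relationship establishing unit."""
        return {alias: self.translator(alias) for alias in aliases}

    def generate_cooccurrence(self, aliases):
        """Co-occurrence relationship generating unit."""
        result = {alias: set() for alias in aliases}
        for text in self.corpus:
            related = {e for e in self.known_entities if e in text}
            for alias in aliases:
                if alias in text:
                    result[alias] |= related - set(aliases)
        return result

    def construct(self, target_entities):
        """Knowledge graph constructing unit: combine both relationships."""
        return {
            entity: {
                "translation": self.establish_translation(aliases),
                "cooccurrence": self.generate_cooccurrence(aliases),
            }
            for entity, aliases in target_entities.items()
        }


# Illustrative lexicon and corpus only.
lexicon = {"纽约": ["New York"], "大苹果": ["Big Apple"]}
generator = KnowledgeGraphGenerator(
    translator=lambda alias: lexicon.get(alias, []),
    corpus=["纽约有百老汇。", "大苹果是华尔街的所在地。"],
    known_entities=["百老汇", "华尔街"],
)
graph = generator.construct({"New York City": ["纽约", "大苹果"]})
```

Each target entity ends up with one translation relationship and one co-occurrence relationship per alias, which is the information the constructing unit combines into the graph.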
  10. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 8.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.
PCT/CN2020/125592 2019-11-22 2020-10-30 Knowledge graph generating method, apparatus, and terminal, and storage medium WO2021098491A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911156483.3A CN112836057B (en) 2019-11-22 2019-11-22 Knowledge graph generation method, device, terminal and storage medium
CN201911156483.3 2019-11-22

Publications (1)

Publication Number Publication Date
WO2021098491A1 (en) 2021-05-27

Family

ID=75921937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125592 WO2021098491A1 (en) 2019-11-22 2020-10-30 Knowledge graph generating method, apparatus, and terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN112836057B (en)
WO (1) WO2021098491A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204651A (en) * 2021-05-28 2021-08-03 华侨大学 Multi-source knowledge graph fusion method and device in Chinese education field

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233656A1 (en) * 2006-03-31 2007-10-04 Bunescu Razvan C Disambiguation of Named Entities
CN105677913A (en) * 2016-02-29 2016-06-15 哈尔滨工业大学 Machine translation-based construction method for Chinese semantic knowledge base
CN106598947A (en) * 2016-12-15 2017-04-26 山西大学 Bayesian word sense disambiguation method based on synonym expansion
CN107038158A (en) * 2016-02-01 2017-08-11 松下知识产权经营株式会社 Paginal translation language material storage preparation method, device, program and machine translation system
CN108460026A (en) * 2017-02-22 2018-08-28 华为技术有限公司 A kind of interpretation method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840397A (en) * 2009-03-20 2010-09-22 日电(中国)有限公司 Word sense disambiguation method and system
CN108170662A (en) * 2016-12-07 2018-06-15 富士通株式会社 The disambiguation method of breviaty word and disambiguation equipment
US20190188324A1 (en) * 2017-12-15 2019-06-20 Microsoft Technology Licensing, Llc Enriching a knowledge graph

Also Published As

Publication number Publication date
CN112836057B (en) 2024-03-26
CN112836057A (en) 2021-05-25

Similar Documents

Publication Publication Date Title
US11227118B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
WO2022022045A1 (en) Knowledge graph-based text comparison method and apparatus, device, and storage medium
CN107992585B (en) Universal label mining method, device, server and medium
US10120861B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
CN110019732B (en) Intelligent question answering method and related device
WO2020108063A1 (en) Feature word determining method, apparatus, and server
KR20200094627A (en) Method, apparatus, device and medium for determining text relevance
CN108304375A (en) A kind of information identifying method and its equipment, storage medium, terminal
JP2020027649A (en) Method, apparatus, device and storage medium for generating entity relationship data
CN110347790B (en) Text duplicate checking method, device and equipment based on attention mechanism and storage medium
CN111831911A (en) Query information processing method and device, storage medium and electronic device
US9940355B2 (en) Providing answers to questions having both rankable and probabilistic components
CN111459977B (en) Conversion of natural language queries
CN110619051A (en) Question and sentence classification method and device, electronic equipment and storage medium
CN109063184A (en) Multilingual newsletter archive clustering method, storage medium and terminal device
CN111178076A (en) Named entity identification and linking method, device, equipment and readable storage medium
CN107832447A (en) User feedback error correction method, device and its equipment for mobile terminal
WO2021098491A1 (en) Knowledge graph generating method, apparatus, and terminal, and storage medium
US20230112385A1 (en) Method of obtaining event information, electronic device, and storage medium
CN110990451A (en) Data mining method, device and equipment based on sentence embedding and storage device
WO2018214956A1 (en) Machine translation method and apparatus, and storage medium
CN112069267A (en) Data processing method and device
WO2021135103A1 (en) Method and apparatus for semantic analysis, computer device, and storage medium
CN115544218A (en) Information searching method, device and storage medium
CN115828915B (en) Entity disambiguation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20888914

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20888914

Country of ref document: EP

Kind code of ref document: A1