WO2021098491A1 - Knowledge graph generating method, apparatus, and terminal, and storage medium

Info

Publication number: WO2021098491A1
Application number: PCT/CN2020/125592
Authority: WO
Inventors: 陈开济
Original assignee: 华为技术有限公司
Priority date: 2019-11-22
Filing date: 2020-10-30
Publication date: 2021-05-27
Also published as: CN112836057B; CN112836057A

Abstract

An artificial intelligence-based knowledge graph generating method, apparatus, and terminal, and a storage medium. The method comprises: determining, in a target language, the translation name of each alias name of a target entity, and according to the alias name and the translation name, generating a translation relationship of the target entity (S101); by means of a preset corpus, separately generating a co-occurrence relationship of each alias name of the target entity (S102); and constructing a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all the target entities (S103). The method, apparatus, and terminal, and the storage medium can construct a knowledge graph supporting multiple languages, and improve the association ability of each knowledge node in the knowledge graph, and the breadth and depth of the knowledge graph, thereby improving the accuracy of an artificial intelligence output result and improving the quality of a service response.

Description

Method, device, terminal and storage medium for generating knowledge graph

This application claims priority to Chinese patent application No. 201911156483.3, filed with the State Intellectual Property Office on November 22, 2019 and entitled "Method, device, terminal and storage medium for knowledge graph generation", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a method, device, terminal, and storage medium for generating a knowledge graph based on artificial intelligence (AI).

Background technique

A knowledge graph, also known as a semantic network, uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interconnections between items of knowledge. With the development of information technology, the knowledge graph serves as a carrier that gathers diverse knowledge resources and provides knowledge references for artificial intelligence decision-making. The depth and accuracy of each knowledge resource in the knowledge graph therefore directly affect the accuracy of artificial intelligence processing results. Existing knowledge graph generation methods are mainly built on a single language, and the knowledge graphs of different languages are independent of each other, which reduces the depth of the knowledge graph. When another language is used as the input of the artificial intelligence, the accuracy of the processing results drops considerably, which affects the quality of the service response.

Summary of the invention

The embodiments of the application provide a method, device, terminal and storage medium for generating a knowledge graph, which can solve a problem of the existing knowledge graph generation technology: when different vehicle service requests are processed, they are all handled by the same server, which easily leads to processing logic conflicts, increases the service response time, and reduces the success rate of service responses.

In the first aspect, an embodiment of the present application provides a method for generating a knowledge graph, including:

Determining the translated name, in the target language, of each alias name of the target entity, and generating the translation relationship of the target entity according to the alias name and the translated name;

Generating, through a preset corpus, the co-occurrence relationship of each of the alias names of the target entity;

Constructing a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities.

Exemplarily, according to the co-occurrence relationship corresponding to the alias name, the number of appearances of each co-occurring entity associated with the alias name is counted, high-frequency co-occurring entities are selected based on the number of appearances, and an artificial-intelligence-based natural language generation (NLG) algorithm combines the alias name with each high-frequency co-occurring entity to obtain the source language sentence.

In a possible implementation of the first aspect, the determining the translated name of each alias name of the target entity in the target language, and generating the translation relationship of the target entity according to the alias name and the translated name, includes:

Obtaining the source language sentences containing each of the alias names respectively;

Outputting a target language sentence corresponding to each source language sentence according to a translation model between the source language and the target language;

Extracting the translated name of the alias name in the target language from each target language sentence;

Establishing the translation relationship between the alias name and the translated name.

In a possible implementation manner of the first aspect, the separately obtaining source language sentences containing each of the alias names includes:

Obtaining a sentence template associated with the entity type according to the entity type of the target entity;

Import each of the alias names into the sentence template to generate the source language sentence.

Exemplarily, if there are multiple sentence templates, one sentence template may be configured for each alias name based on a random allocation algorithm, thereby generating multiple source language sentences.
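As an illustrative and non-limiting sketch, the template step above could look as follows in Python. The template strings, the entity type key `"fruit"`, and the function name `build_source_sentences` are hypothetical and chosen only for illustration; the application does not specify them.

```python
import random

# Hypothetical sentence templates keyed by entity type; "{name}" marks the slot
# into which an alias name is imported.
SENTENCE_TEMPLATES = {
    "fruit": ["I like to eat {name}.", "The {name} is a kind of fruit."],
}

def build_source_sentences(entity_type, alias_names, rng=None):
    """Fill one randomly allocated template of the entity type with each alias name."""
    rng = rng or random.Random()
    templates = SENTENCE_TEMPLATES[entity_type]
    return {alias: rng.choice(templates).format(name=alias) for alias in alias_names}

# A seeded generator makes the random allocation reproducible.
sentences = build_source_sentences("fruit", ["orange", "tangerine"], random.Random(0))
```

Each alias name receives exactly one source language sentence here; drawing several templates per alias name would be an equally plausible variant.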

In a possible implementation of the first aspect, the extracting the translated names of the alias names in the target language from each of the target language sentences respectively includes:

If it is detected that the target language sentence contains the phrase corresponding to the target entity, identifying the target language sentence as a valid sentence;

Identify the phrase corresponding to the target entity in the valid sentence as the translated name.
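The validity check and extraction above can be sketched as follows. This is only a minimal illustration: it assumes that the phrases corresponding to the target entity in the target language are already available as a set (`ENTITY_PHRASES`, a hypothetical name), and it uses plain substring matching where a real system would likely use word alignment or entity recognition.

```python
def extract_translated_name(target_sentence, entity_phrases):
    """Return the entity phrase found in the sentence, or None if the sentence is not valid."""
    lowered = target_sentence.lower()
    # Check longer phrases first so that a longer name wins over a name it contains.
    for phrase in sorted(entity_phrases, key=len, reverse=True):
        if phrase.lower() in lowered:
            return phrase
    return None

# Assumed, illustrative set of target-language phrases for the target entity.
ENTITY_PHRASES = {"orange", "tangerine", "citrus"}

valid = extract_translated_name("I like to eat tangerine.", ENTITY_PHRASES)
invalid = extract_translated_name("I like to eat it.", ENTITY_PHRASES)
```

A sentence for which the function returns `None` is treated as invalid and contributes no translated name.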

In a possible implementation manner of the first aspect, the separately generating the co-occurrence relationship of each of the alias names in the target entity through a preset corpus includes:

Extracting the target text containing the target entity from the corpus;

Identifying related entities in the target text other than the target entity;

According to the alias name corresponding to the target entity in the target text, the co-occurrence relationship between the alias name and the associated entity is obtained.
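The three steps above can be sketched as follows, under stated assumptions: entity identification is replaced by a naive lookup in a lexicon of known entity names (`known_entities`, hypothetical), and the corpus sentences are invented for illustration.

```python
from collections import defaultdict

def co_occurrence_relations(corpus, alias_names, known_entities):
    """Map each alias name to the set of other entities appearing in the same text."""
    relations = defaultdict(set)
    for text in corpus:
        for alias in alias_names:
            if alias not in text:
                continue  # not a target text for this alias name
            # Naive stand-in for entity identification: substring lookup in a lexicon.
            related = {e for e in known_entities if e in text and e not in alias_names}
            relations[alias] |= related
    return dict(relations)

corpus = [
    "Bird's Nest is located opposite the Water Cube in Beijing.",
    "The weather is fine today.",
]
relations = co_occurrence_relations(
    corpus,
    alias_names={"National Stadium", "Bird's Nest"},
    known_entities={"Water Cube", "Beijing", "Olympics", "Bird's Nest"},
)
```

Because the relation is keyed by alias name rather than by entity, the alias name "National Stadium" receives no associated entities from a text that only mentions "Bird's Nest".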

In a possible implementation of the first aspect, the method for generating the knowledge graph further includes:

Receiving the sentence to be translated based on the source language, and identifying the entity to be translated included in the sentence to be translated, so as to construct the entity relationship of the sentence to be translated;

Extracting, from the knowledge graph, a translation relationship corresponding to the entity to be translated based on the target language; the translation relationship includes at least one translated name of the entity to be translated;

Calculating the degree of matching between the sentence to be translated and the translated name according to the entity relationship and the co-occurrence relationship of the translated name;

Based on the matching degree, determine the target translated name of the entity to be translated from all the translated names, and output the translated sentence based on the target language of the sentence to be translated according to all the target translated names.
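The selection step above can be illustrated with a short sketch. The matching degree used here, `overlap`, is a deliberately simple stand-in (the number of shared associated entities) and is not the matching degree function of this application; the candidate names and their contexts are invented for illustration.

```python
def choose_translated_name(sentence_entities, candidates, match):
    """Pick the candidate translated name whose co-occurrence context best fits the sentence."""
    return max(candidates, key=lambda name: match(sentence_entities, candidates[name]))

def overlap(sentence_entities, context):
    # Stand-in matching degree: number of associated entities shared by both sides.
    return len(set(sentence_entities) & set(context))

# Hypothetical translated names with the associated entities of their co-occurrence relationships.
candidates = {
    "orange": {"juice", "peel"},
    "citrus": {"genus", "botany"},
}
best = choose_translated_name({"juice", "breakfast"}, candidates, overlap)
```

Passing the matching function as a parameter keeps the selection logic independent of how the matching degree is actually computed.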

In a possible implementation of the first aspect, the calculating the matching degree between the sentence to be translated and the translated name according to the entity relationship and the co-occurrence relationship of the translated name includes:

Importing the entity relationship and the co-occurrence relationship of the translated name into a preset matching degree calculation function to calculate the matching degree; the matching degree calculation function is specifically:

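$$\mathrm{Sim}(E_1,E_2)=\sum_{e_i\in \mathrm{Context}(E_1),\; e_j\in \mathrm{Context}(E_2)} \max\, \mathrm{sim}_{\mathrm{entity}}(e_i,e_j);$$

$$\mathrm{sim}_{\mathrm{entity}}(e_i,e_j)=\sum_{p\in \mathrm{Prop}(e_i)\cap \mathrm{Prop}(e_j)} \omega_p\,\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p],e_j[p])$$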

Wherein, $\mathrm{Sim}(E_1,E_2)$ is the degree of matching between the entity to be translated and the translated name; $\mathrm{Context}(E_1)$ is the associated entities included in the co-occurrence relationship of the entity to be translated $E_1$ in the knowledge graph; $\mathrm{Context}(E_2)$ is the associated entities included in the co-occurrence relationship of the translated name $E_2$; $e_i$ is the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j$ is the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\mathrm{Prop}(e_i)$ is the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $\mathrm{Prop}(e_j)$ is the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\omega_p$ is the weight value corresponding to the entity type; $\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p],e_j[p])$ is the matching degree function corresponding to the entity type; $e_i[p]$ is the parameter value of the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j[p]$ is the parameter value of the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$.
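One possible reading of the matching degree function can be sketched as follows. Several points are assumptions made only for this sketch: each associated entity is represented as a dictionary from property p to parameter value, the outer sum is read as "for every associated entity of E1, take the best match among the associated entities of E2", a missing weight defaults to 1.0, and `exact` is a hypothetical per-property similarity function.

```python
def sim_entity(ei, ej, weights, similarity_by_prop):
    """Weighted sum of per-property similarities over the properties both entities share."""
    shared = ei.keys() & ej.keys()
    return sum(weights.get(p, 1.0) * similarity_by_prop[p](ei[p], ej[p]) for p in shared)

def sim(context_e1, context_e2, weights, similarity_by_prop):
    """Sum, over the entities in Context(E1), of the best match found in Context(E2)."""
    return sum(
        max((sim_entity(ei, ej, weights, similarity_by_prop) for ej in context_e2), default=0.0)
        for ei in context_e1
    )

def exact(a, b):
    # Hypothetical similarity function: 1.0 for identical values, 0.0 otherwise.
    return 1.0 if a == b else 0.0

weights = {"type": 0.5, "name": 1.0}
similarity_by_prop = {"type": exact, "name": exact}

context_sentence = [{"type": "drink", "name": "juice"}]
context_a = [{"type": "drink", "name": "juice"}, {"type": "place", "name": "market"}]
context_b = [{"type": "color", "name": "paint"}]

score_a = sim(context_sentence, context_a, weights, similarity_by_prop)
score_b = sim(context_sentence, context_b, weights, similarity_by_prop)
```

With these invented contexts, the first candidate scores higher than the second, so it would be preferred as the target translated name.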

Receiving keywords input by the user, and querying the co-occurrence relationship corresponding to the keywords from the knowledge graph;

Output the recommendation information of the user according to the co-occurrence relationship.
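A keyword query of this kind can be sketched as follows. The store `CO_OCCURRENCE`, its counts, and the ranking by co-occurrence count are assumptions made for illustration; the application does not state how the recommendation information is ordered.

```python
# Hypothetical co-occurrence store: alias name -> {associated entity: co-occurrence count}.
CO_OCCURRENCE = {
    "Bird's Nest": {"Water Cube": 5, "Beijing": 8, "Olympics": 3},
}

def recommend(keyword, top_n=2):
    """Return the associated entities that co-occur most often with the keyword."""
    related = CO_OCCURRENCE.get(keyword, {})
    ranked = sorted(related.items(), key=lambda item: item[1], reverse=True)
    return [entity for entity, _ in ranked[:top_n]]

suggestions = recommend("Bird's Nest")
```

A keyword that has no node in the knowledge graph simply yields an empty recommendation list.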

In the second aspect, an embodiment of the present application provides an apparatus for generating a knowledge graph, including:

The translation relationship establishment unit is used to establish the translation relationship of multiple alias names of the target entity based on the target language;

The co-occurrence relationship generation unit is configured to generate the co-occurrence relationship of each of the alias names in the target entity through a preset corpus;

The knowledge graph construction unit is configured to construct a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities.

In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for generating the knowledge graph in any one of the above-mentioned first aspects.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program, wherein, when the computer program is executed by a processor, the method for generating the knowledge graph in any one of the above-mentioned first aspects is implemented.

In a fifth aspect, the embodiments of the present application provide a computer program product, which when the computer program product runs on a terminal device, causes the terminal device to execute the method for generating the knowledge graph in any one of the above-mentioned first aspects.

It can be understood that, for the beneficial effects of the second aspect to the fifth aspect described above, reference may be made to the related description in the first aspect described above, and details are not repeated here.

Compared with the prior art, the embodiments of this application have the following beneficial effects:

The embodiment of the application obtains the translated names of each alias name of the target entity in other languages, where the target entity can be identified as a knowledge node; generates, according to the correspondence between each alias name and its translated names, the translation relationship of the target entity with respect to the target language; and establishes the co-occurrence relationship of each alias name of the target entity through the corpus, so as to mine the associations between each alias name of the target entity and other entities and expand the depth of association of each knowledge node in the knowledge graph. A knowledge graph that supports multiple languages is then constructed according to the translation relationships and co-occurrence relationships of all target entities. Compared with existing knowledge graph technology, the embodiment of this application can establish a translation relationship for each knowledge node in the knowledge graph, that is, each target entity, so as to connect the knowledge nodes of different languages, and can expand each knowledge node by constructing co-occurrence relationships, so that the depth of knowledge of a knowledge node is not limited to the attributes of the target entity itself. This improves the associative ability of each knowledge node and the breadth and depth of the knowledge graph, thereby improving the accuracy of artificial intelligence output results and the quality of service responses.

Description of the drawings

FIG. 1 is an implementation flowchart of a method for generating a knowledge graph provided by the first embodiment of the present application;

FIG. 2 is an entity diagram of the translation relationship of the target entity provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a co-occurrence relationship provided by an embodiment of the present application;

FIG. 4 is a specific implementation flowchart of a method S101 for generating a knowledge graph provided by the second embodiment of the present application;

FIG. 5 is a structural block diagram of a neural machine translation model provided by an embodiment of the present application;

FIG. 6 is a specific implementation flowchart of a method S1011 for generating a knowledge graph provided by the third embodiment of the present application;

FIG. 7 is a specific implementation flowchart of a method S1013 for generating a knowledge graph provided by the fourth embodiment of the present application;

FIG. 8 is a specific implementation flowchart of a method S102 for generating a knowledge graph provided by the fifth embodiment of the present application;

FIG. 9 is a specific implementation flowchart of a method for generating a knowledge graph provided by the sixth embodiment of the present application;

FIG. 10 is a flowchart of translation based on a knowledge graph provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a translation system based on a knowledge graph provided by an embodiment of the present application;

FIG. 12 is a corresponding interaction flowchart of each unit in the apparatus for generating a knowledge graph provided by an embodiment of the present application when responding to a translation operation;

FIG. 13 is a specific implementation flowchart of a method for generating a knowledge graph provided by a seventh embodiment of the present application;

FIG. 14 is a structural block diagram of a device for generating a knowledge graph provided by an embodiment of the present application;

FIG. 15 is a schematic diagram of a terminal device provided by another embodiment of the present application.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application.

It should be understood that, when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements and/or components, but does not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any combination of one or more of the associated listed items and all possible combinations, and includes these combinations.

As used in the specification and appended claims of this application, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".

In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

Reference to "one embodiment" or "some embodiments" in the specification of this application means that one or more embodiments of this application include a specific feature, structure, or characteristic described in combination with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "including", "comprising", "having" and their variations all mean "including but not limited to", unless specifically emphasized otherwise.

The method for generating the knowledge graph provided by the embodiments of this application can be applied to terminal devices such as mobile phones, tablet computers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks and personal digital assistants (PDA), and can also be applied to databases, servers, and service response systems based on terminal artificial intelligence. The embodiments of this application place no restriction on the specific type of terminal device.

In the embodiment of the present application, the execution subject of the process is the device for generating the knowledge graph. As an example and not a limitation, the device for generating the knowledge graph may specifically be a database server that receives knowledge resources input by users or obtained from other databases, and generates a knowledge graph based on all the received knowledge data to support the related logic operations of terminal artificial intelligence. FIG. 1 shows an implementation flowchart of the method for generating a knowledge graph provided by the first embodiment of the present application, and the details are as follows:

In S101, the translated name of each alias name of the target entity in the target language is determined, and the translation relationship of the target entity is generated according to the alias name and the translated name.

In this embodiment, an entity, also referred to as an object, may be an objectively existing object, a concept, or a virtual object that can be interacted with and operated on. For example, computers, mobile phones and servers are objectively existing objects, while virtual objects in the field of electronic information, such as databases, middleware and software programs, are also entities. An entity may have multiple alias names depending on the usage scenario, all of which indicate the same entity object. For example, for the entity "orange", there are other alias names used to indicate the same entity, such as "citrus" and "orange", that is, the entity "orange" has three alias names. The generating device can obtain the alias names corresponding to each entity through user input, database download, corpus-based intelligent learning, and the like. As another feasible embodiment, a name list can be established for each entity, and the name list stores the alias names of the target entity. All the alias names in one name list belong to the same language. For example, the above "citrus", "mandarin" and "mandarin orange" are alias names in Chinese, whereas in English the entity "Orange" can be expressed by three different terms, "orange", "tangerine" and "citrus", and these three alias names form the English name list of the entity "Orange". The generating device can set a certain language as the source language and obtain the name list of each entity in the source language; the name list contains all the alias names of the entity in the source language.

In this embodiment, when the device for generating the knowledge graph establishes the translation relationship, a language different from the source language can be selected as the target language, and the translated name corresponding to each alias name in the target language can be determined. The translated name of an alias name may be obtained by determining the translated name associated with the alias name through a preset translation algorithm between the source language and the target language.

As another optional embodiment of the present application, the device for generating the knowledge graph can obtain multiple reference texts containing the alias name, obtain the translated text of each reference text in the target language, locate the phrase corresponding to the alias name in each translated text, and identify that phrase as a candidate translated name of the alias name. The device then counts the number of occurrences of each candidate translated name across all the translated texts and identifies the translated name corresponding to the alias name according to the number of occurrences: for example, it selects each candidate translated name whose occurrence probability is greater than a preset probability threshold as a translated name of the alias name, or it selects the candidate translated name with the highest occurrence probability as the translated name corresponding to the alias name. Accordingly, one alias name in the source language can have multiple translated names in the target language, and different alias names can correspond to the same translated name when mapped to the target language. The generating device may take each alias name as a node, establish a mapping relationship between each alias name and its associated translated names, and construct the translation relationship of the target entity from all the mapping relationships so established.
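The two selection variants described above (probability threshold, or highest occurrence probability) can be sketched as follows. The function name `select_translated_names`, the threshold value, and the candidate list are hypothetical and serve only as an illustration.

```python
from collections import Counter

def select_translated_names(candidates, threshold=None):
    """Pick translated names by occurrence probability among the candidate phrases."""
    counts = Counter(candidates)
    total = sum(counts.values())
    if total == 0:
        return []
    if threshold is None:
        # Variant 2: keep only the candidate with the highest occurrence probability.
        return [counts.most_common(1)[0][0]]
    # Variant 1: keep every candidate whose occurrence probability exceeds the threshold.
    return [name for name, n in counts.items() if n / total > threshold]

# Candidate translated names located in six hypothetical translated texts.
candidates = ["tangerine", "citrus", "tangerine", "orange", "citrus", "tangerine"]

above_threshold = select_translated_names(candidates, threshold=0.3)
most_frequent = select_translated_names(candidates)
```

The threshold variant is what allows one alias name to keep several translated names, whereas the highest-probability variant always yields exactly one.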

It should be noted that existing knowledge graphs are constructed at the granularity of entities. In a multilingual scenario, each node in such a knowledge graph therefore combines the alias names of all languages into the same node, so the mapping relationships between different alias names cannot be determined, which reduces the accuracy of the output in scenarios such as translation or semantic analysis. Unlike the prior art, this application can establish an independent knowledge node for each alias name and record its corresponding translated names in that knowledge node, thereby constructing the mapping relationship between the translated names and the alias name.

For example, FIG. 2 shows an entity diagram of a translation relationship of a target entity provided by an embodiment of the present application. As shown in FIG. 2, the entity "Orange" has three different alias names in Chinese, namely "Orange", "Orange" and "Citrus". Through big data analysis, it can be determined that in most translation scenarios "Orange" and "Orange" are translated as "orange", while "Orange" has two translated names, namely "tangerine" and "citrus". According to these correspondences, the mapping relationship between Chinese and English can be established for each alias name, and all the mapping relationships are aggregated to obtain the translation relationship corresponding to the target entity. As can be seen from FIG. 2, the mapping relationship in this application is established on alias names, so the translated name corresponding to each alias name can be obtained accurately. In translation scenarios in particular, this can greatly improve the accuracy of the translation and the readability of the translated text.

In S102, the co-occurrence relationship of each of the alias names in the target entity is generated through a preset corpus.

In this embodiment, the corpus can be stored in the knowledge graph generating device, in which case the generating device can obtain the text data pre-stored in the corpus by local calls and generate the co-occurrence relationship from that text data. The corpus can also be stored on another database server, in which case the knowledge graph generating device can establish a communication connection with the corpus server, generate a data query instruction about the target entity, and send it to the corpus server; after receiving the data query instruction, the corpus server can extract all the text data containing the target entity and feed it back to the knowledge graph generating device. Optionally, if the amount of text data is large, for example when a piece of text data is stored in the corpus in the format of a book and thus contains multiple paragraphs, the corpus server can extract from the text data only the sentences or paragraphs that contain the target entity and feed them back to the generating device, without sending the paragraphs or sentences that do not contain the target entity, thereby improving the accuracy of the subsequent co-occurrence relationship establishment.

In this embodiment, the device for generating the knowledge graph obtains training sentences containing the target entity from the corpus, identifies the associated entities contained in each training sentence through an entity labeling algorithm, and creates an association between each associated entity and the alias name under which the target entity appears in the current training sentence, thereby generating the co-occurrence relationship of the alias name. It should be noted that the training sentences extracted from the corpus may contain the target entity under any of its alias names, so the target entity is not expressed consistently across the extracted training sentences. Therefore, in the process of generating the co-occurrence relationship, the training sentences can be divided into different sentence groups according to the alias names, such that the target entity is referred to by the same alias name within one sentence group, and the co-occurrence relationship corresponding to each alias name can then be determined from its sentence group.

For example, FIG. 3 shows a schematic diagram of a co-occurrence relationship provided by an embodiment of the present application. As shown in FIG. 3, a certain target entity is "National Stadium", which has two alias names, namely "National Stadium" and "Bird's Nest". A training sentence stored in the corpus reads "Bird's Nest is located opposite the Water Cube and is the 2008 Beijing Olympic Stadium". Through the entity labeling algorithm, the entities other than "Bird's Nest" in the training sentence can be identified as "Water Cube", "Gymnasium", "Beijing" and "Olympics". Therefore, for the target entity "National Stadium", the co-occurrence relationship between the alias name "Bird's Nest" and "Water Cube", "Gymnasium", "Beijing" and "Olympics" is established. The co-occurrence relationship can be represented in the manner shown in FIG. 3.

In this embodiment, similar to S101, the device for generating the knowledge graph also constructs the co-occurrence relationship on the basis of alias names, that is, it distinguishes the co-occurrence relationships of different alias names. Distinguishing the co-occurrence relationships of different alias names makes it possible to determine the common usage scenarios of each alias name together with other associated entities. Besides improving the accuracy of translation operations, this has high application value in fields such as information recommendation and word association, since the associated entities of each alias name can be mined, which increases the depth of the knowledge graph.

Optionally, as another embodiment of the present application, since the corpus contains multiple training sentences and an associated entity may appear in several of them, the device for generating the knowledge graph, when establishing the co-occurrence relationship between the target entity and each associated entity, can count the number of sentences in which each associated entity appears together with the target entity, that is, the number of co-occurrences, and configure a corresponding association weight for each associated entity based on the number of co-occurrences. Continuing to refer to FIG. 3, as an example and not a limitation, the number of co-occurrences may be marked on the connecting line between the target entity and the associated entity.
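Counting co-occurrences per sentence can be sketched as follows. The training sentences and the lexicon of known entities are invented for illustration, and substring matching again stands in for the entity labeling algorithm.

```python
from collections import Counter

def co_occurrence_counts(sentences, alias, known_entities):
    """Count, per associated entity, the sentences in which it appears with the alias name."""
    counts = Counter()
    for sentence in sentences:
        if alias in sentence:
            # Each sentence contributes at most 1 per entity, however often the entity repeats.
            counts.update(e for e in known_entities if e != alias and e in sentence)
    return counts

sentences = [
    "Bird's Nest is opposite the Water Cube in Beijing.",
    "Beijing built Bird's Nest for the Olympics.",
    "Beijing is a large city.",
]
counts = co_occurrence_counts(sentences, "Bird's Nest", {"Water Cube", "Beijing", "Olympics"})
```

The resulting counts could serve directly as the association weights, or be normalized first; the application leaves the exact mapping from count to weight open.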

In S103, construct a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities.

In this embodiment, the device for generating the knowledge graph can perform the operations of S101 and S102 on all target entities, establish the translation relationship of each target entity, and the co-occurrence relationship of each alias name of the target entity, and set it in the preset In the knowledge graph, the alias name is used as the granularity page to create an independent knowledge node for each alias name, and the co-occurrence relationship and the translated name corresponding to the alias name are added to the knowledge node corresponding to the alias name, and the knowledge node corresponding to each alias name is added Encapsulate the knowledge node of the corresponding target entity, and create the knowledge node corresponding to the target entity on the page with the granularity of the entity, and construct the knowledge graph according to the association relationship between each target entity.

Optionally, the knowledge graph includes at least two levels: a first graph level with the entity as the granularity, and a second graph level with the alias name as the granularity. The user can click on any target entity on the first graph level, and the knowledge graph will switch to the second graph level and display the semantic network of each alias name under that target entity.

It can be seen from the above that the method for generating a knowledge graph provided by the embodiments of the present application obtains the translated names of each alias name of the target entity in other languages, where the target entity can be identified as a knowledge node, and generates the translation relationship of the target entity with respect to the target language according to the correspondence between each alias name and its translated name. It also establishes the co-occurrence relationship of each alias name of the target entity through the corpus, so as to mine the association relationships between each alias name of the target entity and other entities and expand the depth of association of each knowledge node in the knowledge graph. A knowledge graph that supports multiple languages is thus constructed according to the translation relationships and co-occurrence relationships of all target entities. Compared with existing knowledge graph technology, the embodiments of this application establish a translation relationship for each knowledge node in the knowledge graph, that is, each target entity, so as to connect knowledge nodes across different languages, and expand the depth of knowledge of each knowledge node by constructing co-occurrence relationships, so that a knowledge node is not limited to the attributes of the target entity itself. This improves the associative ability of each knowledge node and the breadth and depth of the knowledge graph, thereby improving the accuracy of artificial intelligence output results and the quality of service responses.

FIG. 4 shows a specific implementation flow chart of a method S101 for generating a knowledge graph provided by the second embodiment of the present application. Referring to FIG. 4, with respect to the embodiment described in FIG. 1, S101 in a method for generating a knowledge graph provided by this embodiment includes: S1011 to S1014, which are detailed as follows:

In S1011, source language sentences containing each of the alias names are obtained respectively.

In this embodiment, the device for generating the knowledge graph can extract source language sentences containing each alias name from the corpus corresponding to the source language, that is, each source language sentence is recorded in the historical text data. Optionally, the generating device may also be provided with a sentence template, import each alias name into the sentence template, and output the source language sentence corresponding to each alias name.

Optionally, as another embodiment of the present application, the device for generating a knowledge graph can count, according to the co-occurrence relationship corresponding to the alias name, the number of occurrences of each co-occurring entity associated with the alias name, and select high-frequency co-occurring entities based on the number of occurrences. The source language sentence is then obtained by combining the alias name with each high-frequency co-occurring entity through an artificial intelligence-based natural language generation (NLG) algorithm. Because the high-frequency co-occurring entities appear frequently together with the alias name, they better represent the common context of the alias name, so the output source language sentence is more representative, and the subsequent translation process can determine the translated name of the alias name in its common context, which improves the accuracy of the translation relationship.

In S1012, according to the translation model between the source language and the target language, a target language sentence corresponding to each source language sentence is output.

In this embodiment, the device for generating the knowledge graph can select any language other than the source language as the target language, and obtain a translation model between the source language and the target language. The translation model can be generated based on a machine translation (MT) algorithm, which uses computer programs or computer-readable instructions to translate text in one natural language (the source language) into text in another natural language (the target language). With the continuous development of artificial intelligence, the Neural Machine Translation (NMT) algorithm has become the mainstream translation method. NMT can construct a translation model through a Long Short-Term Memory-Recurrent Neural Network (LSTM-RNN). Such a model is good at modeling natural language: it transforms a sentence of any length into a floating-point vector of a specific dimension, converting text data into vector data so that computer programs can "understand" the semantics of the text and translate sentences based on those semantics. The generating device can import the obtained source language sentence into the translation model, and output the corresponding target language sentence.

Specifically, if the device for generating the knowledge graph adopts the NMT model as the translation model, the target language sentence can be output as follows: divide the source language sentence into multiple phrases, import each phrase into the encoding module of the NMT model to obtain the encoding value corresponding to each phrase, and generate a sentence vector for the source language sentence; then obtain the decoding module of the target language and use the generated sentence vector as the input vector of the decoding module to generate the target language sentence. FIG. 5 shows a structural block diagram of a neural machine translation model provided by an embodiment of the present application. As shown in FIG. 5, the NMT model includes an encoding module (Encoder) based on the source language and a decoding module (Decoder) based on the target language. The encoding module maps each word in the source language to a corresponding vector value according to its meaning, and the decoding module associates the vector value with a word in the target language to complete the translation operation.

In S1013, extract the translated name of the alias name in the target language from each of the target language sentences.

In this embodiment, the device for generating the knowledge graph can mark the phrase corresponding to each entity contained in the target language sentence through the entity tagging algorithm corresponding to the target language, and select the phrase corresponding to the target entity as the translated name of the alias name in the target language. Compared with directly importing the alias name into the translation model to obtain the translation of an isolated name, placing the alias name in a specific language context means that the translated name is output based on the semantics of the entire sentence and matches the current context, which improves translation accuracy. In particular, when the target entity has multiple translated names in the target language, the translated name associated with the alias name currently being translated can be determined accurately.

In S1014, the translation relationship between the alias name and the translated name is established.

In this embodiment, after the device for generating the knowledge graph determines the translated name associated with the alias name, the translation relationship between the above two can be established.

In the embodiment of the present application, by outputting a source language sentence containing each alias name, the translated name corresponding to the alias name can be determined based on the context and the actual use context, and the translation relationship can be established, which can improve the accuracy of the translation relationship.

FIG. 6 shows a specific implementation flowchart of a method S1011 for generating a knowledge graph provided by the third embodiment of the present application. Referring to FIG. 6, compared with the embodiment described in FIG. 4, S1011 in a method for generating a knowledge graph provided by this embodiment includes: S601 to S602, which are detailed as follows:

Further, the obtaining the source language sentences containing each of the alias names respectively includes:

In S601, according to the entity type of the target entity, a sentence template associated with the entity type is obtained.

In this embodiment, corresponding sentence templates can be manually configured for different entity types in the device for generating the knowledge graph to build a sentence template library. Optionally, the device for generating the knowledge graph may use a remote supervision algorithm to identify the entities contained in each training text in the corpus, determine the entity type of each entity, select multiple training texts with the same entity type, and identify the sentence structure of each training text. A sentence structure whose number of occurrences is greater than a preset occurrence threshold is selected as a common structure corresponding to the entity type, and at least one sentence template for the entity type is generated based on the common structure.

In this embodiment, the device for generating the knowledge graph extracts sentence templates matching the entity type from the sentence template library according to the entity type of the target entity associated with the alias name. The number of sentence templates can be one or more. Optionally, if there are multiple sentence templates and their number is greater than the number of alias names of the target entity, as many sentence templates as there are alias names can be extracted and a separate sentence template configured for each alias name, so that each alias name is assigned a different template.

In S602, import each of the alias names into the sentence template to generate the source language sentence.

In this embodiment, the sentence template is provided with an import area for the entity type, and the knowledge graph generation device can import the alias name into the preset import area of the sentence template, thereby generating a sentence with complete meaning, that is, the aforementioned source language sentence.

Optionally, if there is a single sentence template, each alias name can be imported into the same sentence template to generate multiple source language sentences that differ only in the alias name. For example, suppose a sentence template is "this is a [fruit type entity] tree" and the target entity is "orange". The entity type of the target entity is the fruit type, so it matches the sentence template above. The target entity has three alias names, namely "Orange", "Orange" and "Citrus". The three alias names can therefore each be imported into the sentence template, that is, into the area corresponding to [fruit type entity], to obtain "this is a [orange] tree", "this is a [orange] tree" and "this is a [citrus] tree".

Optionally, if there are multiple sentence templates, one sentence template may be configured for each alias name based on a random allocation algorithm, thereby generating multiple source language sentences. For example, if there are three sentence templates for fruit type entities, namely "this is a [fruit type entity] tree", "eat some [fruit type entity]" and "buy a [fruit type entity]", then importing each of the three alias names of the target entity "Orange" into one of these sentence templates yields "this is a [orange] tree", "eat some [citrus]" and "buy an [orange]".

Preferably, the other entities included in each sentence template are identified, the number of occurrences of each of these entities is looked up in the co-occurrence relationship corresponding to the alias name, and the matching degree between the sentence template and the alias name is calculated based on these numbers. The sentence template with the highest matching degree is selected as the sentence template associated with the alias name, and the alias name is imported into it to generate a source language sentence.

Optionally, if there are multiple sentence templates, multiple source language sentences can be output for each alias name, that is, the same alias name is imported into each sentence template to generate multiple source language sentences of the alias name. For example, if the number of sentence templates is M and the number of alias names is N, then M*N source language sentences can be output.
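The three allocation strategies above (one shared template, one randomly assigned template per alias name, and every template for every alias name) can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, the alias names are placeholder values chosen for readability, and the import area is assumed to be marked by the literal text "[fruit type entity]" used in the examples.

```python
import random
from itertools import product

SLOT = "[fruit type entity]"  # import area, as written in the example templates

TEMPLATES = [
    "this is a [fruit type entity] tree",
    "eat some [fruit type entity]",
    "buy a [fruit type entity]",
]
ALIASES = ["orange", "mandarin", "citrus"]  # placeholder alias names

def fill(template, alias):
    """Import one alias name into the import area of a template."""
    return template.replace(SLOT, alias)

def same_template(template, aliases):
    """Single template: sentences differ only in the alias name."""
    return [fill(template, alias) for alias in aliases]

def random_template_per_alias(templates, aliases, seed=None):
    """Randomly assign one template per alias, distinct when enough templates exist."""
    rng = random.Random(seed)
    if len(templates) >= len(aliases):
        chosen = rng.sample(templates, len(aliases))
    else:
        chosen = [rng.choice(templates) for _ in aliases]
    return [fill(t, a) for t, a in zip(chosen, aliases)]

def every_combination(templates, aliases):
    """M templates and N alias names give M*N source language sentences."""
    return [fill(t, a) for a, t in product(aliases, templates)]

sentences = every_combination(TEMPLATES, ALIASES)
```

No grammatical agreement (for example "a" versus "an") is attempted here; a real template library would need to handle it.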

In the embodiment of the present application, the entity type of the target entity is identified, the sentence template corresponding to the entity type is selected, and the alias name is imported into the sentence template to generate the source language sentence. This automatically outputs multiple natural-language sentences and improves the efficiency of generating source language sentences.

FIG. 7 shows a specific implementation flowchart of a method S1013 for generating a knowledge graph provided by the fourth embodiment of the present application. Referring to FIG. 7, compared with the embodiment described in FIG. 4, S1013 in a method for generating a knowledge graph provided by this embodiment includes: S701 to S702, which are detailed as follows:

Further, extracting the translated names of the alias names in the target language from each of the target language sentences respectively includes:

In S701, if it is detected that the target language sentence contains the phrase corresponding to the target entity, then the target language sentence is identified as a valid sentence.

In this embodiment, before identifying the translated name, the generating device of the knowledge graph can filter the generated target language sentences, delete those that do not contain the target entity, and perform translated-name recognition only on the target language sentences that contain the target entity, so as to improve the accuracy of translated-name recognition. In the process of translating a source language sentence into a target language sentence, the alias name may combine with adjacent characters in the sentence template to form a new word, which makes the source language sentence ambiguous. An error then occurs when the sentence is converted into vector codes, and the output target language sentence may not contain the target entity.

For example, suppose the alias name of a target entity is "sentence", and importing "sentence" into a sentence template yields "generating sentence". When this phrase is translated, "idiom" may be recognized as a phrase, which splits the target entity "sentence", so that the translated sentence in the target language does not contain the target entity.

In this embodiment, the device for generating the knowledge graph can identify the entities contained in each target language sentence. If the target language sentence does not contain the target entity, it is identified as an invalid sentence; otherwise, if the target language sentence contains the target entity, it is identified as a valid sentence, and the phrase corresponding to the target entity in the target language sentence is marked.

Optionally, the device for generating the knowledge graph can identify the source language sentence corresponding to an invalid sentence and determine the alias name corresponding to that source language sentence. If there are multiple sentence templates, a new source language sentence is generated for that alias name from a sentence template different from the one used previously, and the translated name corresponding to the alias name is identified again.

In S702, the phrase corresponding to the target entity in the valid sentence is recognized as the translated name.

In this embodiment, the generating device of the knowledge graph uses the phrase corresponding to the target entity in the valid sentence as the translated name of the alias name, and establishes the mapping relationship between the alias name and the translated name.
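Steps S701 and S702 can be sketched as below. The sketch assumes a toy entity tagger that looks phrases up in a small target-language lexicon; a real system would use the entity tagging algorithm of the target language. The lexicon, the entity identifiers and the sample sentences are illustrative, and all function names are hypothetical.

```python
def tag_entities(sentence, lexicon):
    """Toy tagger: return (phrase, entity_id) pairs found by case-insensitive lookup."""
    lowered = sentence.lower()
    return [(phrase, eid) for phrase, eid in lexicon.items() if phrase.lower() in lowered]

def extract_translated_names(translated, target_entity_id, lexicon):
    """Split translated sentences into valid and invalid ones (S701) and read the
    translated name of each alias from the valid ones (S702).

    `translated` maps each alias name to its target language sentence.
    Returns (alias -> translated name, list of aliases whose sentence was invalid).
    """
    relation, invalid = {}, []
    for alias, sentence in translated.items():
        hits = [p for p, eid in tag_entities(sentence, lexicon) if eid == target_entity_id]
        if hits:
            relation[alias] = max(hits, key=len)  # prefer the longest matching phrase
        else:
            invalid.append(alias)  # candidate for regeneration from another template
    return relation, invalid

# Illustrative data only.
lexicon = {"Bird's Nest": "E1", "National Stadium": "E1", "Water Cube": "E2"}
translated = {
    "Niaochao": "This is the Bird's Nest.",
    "Guojia Tiyuchang": "This is the National Stadium.",
    "alias_3": "This is a nest for birds.",
}
relation, invalid = extract_translated_names(translated, "E1", lexicon)
```

The aliases returned in `invalid` are the ones for which a new source language sentence could be generated from a different template.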

In the embodiment of the present application, by validating the target language sentence before recognizing the translated name, the recognition of the translated name can be made more accurate, thereby improving the accuracy of the translation relationship.

FIG. 8 shows a specific implementation flowchart of a method S102 for generating a knowledge graph provided by the fifth embodiment of the present application. Referring to FIG. 8, with respect to the embodiment described in FIG. 1, S102 in a method for generating a knowledge graph provided by this embodiment includes: S1021 to S1023, which are detailed as follows:

Further, the respectively generating the co-occurrence relationship of each of the alias names in the target entity through a preset corpus includes:

In S1021, extract the target text containing the target entity from the corpus.

In this embodiment, training texts collected from multiple different channels can be stored in the corpus. For example, the corpus can receive text data input by the user, such as articles imported by the user, interaction records of social applications (including chat records and interaction information), and can also automatically download text data from the Internet. After obtaining a training text, the generating device of the knowledge graph can identify the entities contained in the training text, establish the corresponding relationship between the entities and the training text, and establish the entity index table. The device for generating the knowledge graph can extract the target text containing the target entity from the corpus based on the above-mentioned entity index table.

In S1022, identify associated entities in the target text other than the target entity.

In this embodiment, the device for generating the knowledge graph can locate the entities contained in the target text through a named entity recognition (NER) algorithm, and recognize entities other than the target entity as related entities of the target entity.

For example, suppose a target text is "The Bird's Nest is located opposite the Water Cube, which is the stadium of the 2008 Beijing Olympic Games", and the target entity is "Bird's Nest". The NER algorithm can identify the entities contained in the target text as "Bird's Nest", "Water Cube", "Beijing", "Olympics" and "stadium". The entities other than "Bird's Nest" can therefore be determined to be related entities of the target entity "Bird's Nest". It should be noted that the relationship between related entities is bidirectional, that is, the "Water Cube" is a related entity of the "Bird's Nest", and the "Bird's Nest" is also a related entity of the "Water Cube".

In S1023, the co-occurrence relationship between the alias name and the associated entity is obtained according to the alias name corresponding to the target entity in the target text.

In this embodiment, the device for generating the knowledge graph can identify the alias name used by the target entity in the target text based on the source language, create a name node for the alias name, and create a co-occurrence relationship between the alias name and the associated entity. If there are multiple target texts for an alias name, all associated entities recorded in each target text can be added to the co-occurrence relationship corresponding to the name node.
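The flow of S1021 to S1023 can be illustrated with the sketch below, which builds an entity index table and then derives the co-occurrence relationship per alias name. The gazetteer-based recognizer stands in for a real NER algorithm, and the entity identifiers, texts and function names are assumptions made for the example.

```python
from collections import defaultdict

# Toy gazetteer: surface form -> entity identifier (illustrative only).
GAZETTEER = {
    "Bird's Nest": "stadium",
    "National Stadium": "stadium",
    "Water Cube": "aquatics",
    "Beijing": "beijing",
}

def recognize(text):
    """Stand-in for NER: return (surface form, entity id) pairs found in the text."""
    return [(surface, eid) for surface, eid in GAZETTEER.items() if surface in text]

def build_entity_index(corpus):
    """Entity index table: entity id -> ids of the training texts that mention it."""
    index, mentions = defaultdict(set), {}
    for text_id, text in enumerate(corpus):
        mentions[text_id] = recognize(text)
        for _, eid in mentions[text_id]:
            index[eid].add(text_id)
    return index, mentions

def alias_cooccurrence(target_id, index, mentions):
    """For each alias used in a target text, collect the associated entities."""
    relation = defaultdict(set)
    for text_id in sorted(index[target_id]):              # S1021: target texts
        found = mentions[text_id]
        others = {eid for _, eid in found if eid != target_id}   # S1022
        for alias, eid in found:
            if eid == target_id:                          # S1023: alias actually used
                relation[alias] |= others
    return dict(relation)

corpus = [
    "The Bird's Nest is located opposite the Water Cube.",
    "The National Stadium is in Beijing.",
    "Beijing hosted the games.",
]
index, mentions = build_entity_index(corpus)
relation = alias_cooccurrence("stadium", index, mentions)
```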

In the embodiment of the present application, the target text containing the alias name is extracted from the text data recorded in the corpus, and the co-occurrence relationship of the alias name is established based on the associated entities recorded in the target text. This constructs the co-occurrence relationship at the granularity of the name, so that the context and scene in which each alias name is used can be accurately identified, thereby improving the accuracy of the response of the artificial intelligence service.

FIG. 9 shows a specific implementation flowchart of a method for generating a knowledge graph provided by the sixth embodiment of the present application. Referring to FIG. 9, with respect to any of the embodiments described in FIG. 1, FIG. 4, FIG. 6, FIG. 7 and FIG. 8, the method for generating a knowledge graph provided by this embodiment further includes: S901 to S904, which are detailed as follows :

Further, after the constructing a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities, the method further includes:

In S901, a sentence to be translated based on the source language is received, and the entity to be translated included in the sentence to be translated is identified to construct an entity relationship of the sentence to be translated.

In this embodiment, as an application example of the knowledge graph, after the knowledge graph generating device constructs a knowledge graph containing multiple target entities, it can use the knowledge graph to provide technical support for translation services, thereby improving translation quality. The commonly used translation technology is the NMT model based on LSTM-RNN. The NMT model can adopt an end-to-end translation scheme: in the encoding module-decoding module model, the encoding module converts the source language sentence into a hidden state vector, and the decoding module of the target language then converts the hidden state vector into natural language text in the target language.

For example, FIG. 10 shows a translation flow chart based on a knowledge graph provided by an embodiment of the present application. As shown in FIG. 10, after the text data to be translated is received, it is first preprocessed: the text data is imported into the translation preprocessing module, which identifies the source language of the text data and the target language into which it needs to be translated. After determining the source language and the target language, the preprocessing module sends this information to the knowledge graph module so that the knowledge graph is switched to the detection mode corresponding to the source language, that is, the Natural Language Understanding (NLU) algorithm corresponding to the source language is selected. The knowledge graph module combines the knowledge data to perform NLU analysis on the text data, marks the entities contained in the text data, determines in the generated knowledge graph the entity name corresponding to each entity in the target language, and returns the result to the preprocessing module. The preprocessing module removes the entities from the text data according to the entity list returned by the knowledge graph module and replaces them with agreed special characters, which can be determined according to the entity type. The text data with the special characters is then sent to the NMT module, which performs the standard translation process and produces the translation result. The special characters are retained in the result so that the correspondence between the entities in the text data and the entities in the translated text can be determined. Finally, the entity translation result returned by the knowledge graph and the original translation result returned by NMT are merged to obtain the final translation result.

It can be seen that if the knowledge graph is constructed with the entity as the granularity, the translated names corresponding to different alias names are not distinguished when the translated name of each entity in the text data is obtained in the target language, which reduces the accuracy of the translation operation. For this reason, this application constructs the translation relationship between alias name and translated name at the granularity of the alias name, so that the alias name used by the entity in the text data can be identified and the translated name corresponding to that alias name can be determined. The translated name then matches the current context and grammatical habits, making the translation more accurate.
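The replace-translate-merge flow of FIG. 10 can be sketched as follows. This is only an illustration under stated assumptions: the placeholder format `__ENTn__` is invented for the example (the text only requires agreed special characters, possibly chosen by entity type), the entity translations are passed in as a plain dictionary instead of being queried from a knowledge graph, and `str.upper` stands in for the NMT module simply to show that the placeholders survive the translation step.

```python
def mask_entities(text, entity_translations):
    """Replace each known entity with a placeholder.

    Returns the masked text and a mapping placeholder -> translated name.
    Longer surface forms are replaced first so that nested names are not split.
    """
    masked, mapping = text, {}
    ordered = sorted(entity_translations.items(), key=lambda kv: -len(kv[0]))
    for i, (surface, translated_name) in enumerate(ordered):
        if surface in masked:
            token = f"__ENT{i}__"
            masked = masked.replace(surface, token)
            mapping[token] = translated_name
    return masked, mapping

def unmask(translated_text, mapping):
    """Merge the entity translations back into the NMT output."""
    for token, name in mapping.items():
        translated_text = translated_text.replace(token, name)
    return translated_text

def translate_with_graph(text, entity_translations, nmt):
    masked, mapping = mask_entities(text, entity_translations)
    return unmask(nmt(masked), mapping)

# Illustrative data; str.upper is a stand-in for a real NMT call.
entity_translations = {"Niaochao": "Bird's Nest", "Shuilifang": "Water Cube"}
result = translate_with_graph("Niaochao is opposite Shuilifang", entity_translations, str.upper)
```

In this toy run the entity names come from the dictionary while the rest of the sentence goes through the stand-in translator, which mirrors the division of work between the knowledge graph module and the NMT module.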

In this embodiment, the device for generating the knowledge graph can perform semantic analysis on the sentence to be translated, identify the translation entities contained in the sentence to be translated through the NLU algorithm, and construct the entity relationship of all the identified translation entities with respect to the sentence to be translated.

For example, if a sentence to be translated is "The National Grand Theater of China was designed by the French architect Paul Andrew and is the largest theater complex in Asia", the NLU algorithm can identify the translation entities "China", "National Grand Theatre", "France", "architect", "Asia", "theatre" and "complex", and establish the co-occurrence relationship of these translation entities. This co-occurrence relationship is the entity relationship of the sentence to be translated.

In S902, a translation relationship corresponding to the entity to be translated based on the target language is extracted from the knowledge graph; the translation relationship includes at least one translated name of the entity to be translated.

In this embodiment, after determining the translation entities included in the sentence to be translated, the knowledge graph generating device can query the knowledge graph for the entity node corresponding to each translation entity, and extract the corresponding translation relationship from the entity node. The translation relationship records at least one translated name of the translation entity.

Optionally, if the translation relationship between each alias name of the translation entity and its translated name is recorded in the knowledge graph, the generating device of the knowledge graph can identify the alias name used in the sentence to be translated and, based on the translation relationship between that alias name and its translated name, determine the target translated name corresponding to the translation entity in the sentence to be translated, without performing the matching degree calculation of S903. If the translation relationship between each alias name of the translation entity and its translated name is not recorded in the knowledge graph, or one alias name corresponds to multiple translated names, the operation of S903 is performed to determine the specific translated name to use in the sentence to be translated.

In S903, the degree of matching between the sentence to be translated and the translated name is calculated according to the entity relationship and the co-occurrence relationship of the translated name.

In this embodiment, the device for generating the knowledge graph can determine the degree of matching between each translated name and the current sentence to be translated based on the entity relationship and the co-occurrence relationship of each translated name corresponding to the translation entity. Since different translated names are used in different contexts, it is necessary to determine the degree of matching between each translated name and the sentence to be translated in the context of that sentence, so as to select the translated name that best fits the context and thereby improve the accuracy of translation operations.

Optionally, the matching degree between the sentence to be translated and the translated name may be calculated as follows: the knowledge graph generating device identifies the entity to be translated that corresponds to the translated name as the base entity, and identifies the entities in the entity relationship other than the base entity as reference entities. It then judges whether each reference entity exists in the co-occurrence relationship of the translated name. If it does, the co-occurrence relationship is used to determine the number of co-occurrences between that reference entity and the translated name. The matching degree between the sentence to be translated and the translated name is determined based on the number of co-occurrences between the translated name and all reference entities and on the number of reference entities that have a co-occurrence relationship with it.
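A minimal sketch of this count-based matching is given below. The text names the two inputs (total number of co-occurrences and number of reference entities that co-occur with the translated name) but not how they are combined, so the product of coverage and a logarithm of the total used here is purely an assumption. The sketch also assumes that entities can be compared across languages through shared identifiers; the names and sample counts are illustrative.

```python
import math

def matching_degree(reference_entities, cooccurrence):
    """Score one translated name against the reference entities of a sentence.

    `cooccurrence` maps associated entity -> number of co-occurrences with the
    translated name. The combination below is one assumed choice.
    """
    matched = [e for e in reference_entities if e in cooccurrence]
    if not matched:
        return 0.0
    total = sum(cooccurrence[e] for e in matched)
    coverage = len(matched) / len(reference_entities)
    return coverage * math.log1p(total)

def best_translated_name(reference_entities, candidates):
    """Pick the translated name with the highest matching degree."""
    return max(candidates, key=lambda name: matching_degree(reference_entities, candidates[name]))

# Illustrative data only.
reference_entities = ["juice", "tree", "market"]
candidates = {
    "name_a": {"juice": 5, "tree": 3},
    "name_b": {"paint": 4, "market": 1},
}
best = best_translated_name(reference_entities, candidates)
```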

Further, as another embodiment of the present application, S903 may specifically be:

$$\mathrm{Sim}(E_1, E_2) = \sum_{e_i \in \mathrm{Context}(E_1)} \; \max_{e_j \in \mathrm{Context}(E_2)} \mathrm{sim\_entity}(e_i, e_j)$$

In this embodiment, E1 is the entity to be translated based on the source language, and E2 is a translated name of the entity to be translated based on the target language. Context(E1) is the entity set corresponding to the co-occurrence relationship of the entity to be translated in the source language, and Context(E2) is the entity set corresponding to the co-occurrence relationship of the translated name. For each entity in Context(E1), the generating device of the knowledge graph calculates its similarity to each entity in Context(E2) and selects the maximum value as the feature matching degree. The feature matching degrees are then accumulated to obtain the matching degree between the translated name and the entity to be translated in the sentence to be translated.

The matching degree between different entities can be calculated with the sim_entity(ei, ej) function. The knowledge graph generation device only calculates the similarity between two entities of the same entity type. If an entity in the entity relationship and an entity in the co-occurrence relationship of the translated name are of different types, the similarity between the two is not calculated, which avoids a large number of invalid similarity calculation operations. The generating device of the knowledge graph selects the corresponding similarity calculation model according to the entity type, namely Similarity_type(p)(ei[p], ej[p]). For example, if the two entities are "old man" and "teenager" and the entity type of both is "age", the age similarity calculation model is obtained to calculate the similarity between them. In the above function, ei[p] is the parameter value of the entity type of the i-th entity to be translated, and ej[p] is the parameter value of the entity type of the j-th associated entity. Continuing with "old man" and "teenager": the age corresponding to "old man" is 70 or above, so its parameter value for the entity type can be set to 70, while the age corresponding to "teenager" is 18 to 30, so its parameter value can be set to 20. The two parameter values can then be imported into the age similarity calculation model to calculate the similarity between the two entities.
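The type-restricted similarity can be sketched as follows, reading Sim(E1, E2) as a sum, over the entities in Context(E1), of the best similarity found in Context(E2). Entities are represented as (entity type, parameter value) pairs, and the linear age model with a scale of 100 is an assumption introduced only to make the example concrete; the application does not specify the similarity models.

```python
def age_similarity(a, b, scale=100.0):
    """Assumed age model: similarity falls linearly with the age gap."""
    return max(0.0, 1.0 - abs(a - b) / scale)

# Entity type -> similarity calculation model (only one toy model here).
SIMILARITY_BY_TYPE = {"age": age_similarity}

def sim_entity(ei, ej):
    """Similarity of two (entity_type, parameter_value) pairs.

    Pairs of different types are skipped and contribute 0.
    """
    type_i, value_i = ei
    type_j, value_j = ej
    if type_i != type_j:
        return 0.0
    model = SIMILARITY_BY_TYPE.get(type_i)
    return model(value_i, value_j) if model else 0.0

def sim(context_e1, context_e2):
    """Accumulate, for each entity in Context(E1), its best match in Context(E2)."""
    return sum(
        max((sim_entity(ei, ej) for ej in context_e2), default=0.0)
        for ei in context_e1
    )

old_man = ("age", 70)    # parameter value from the example above
teenager = ("age", 20)
pair_similarity = sim_entity(old_man, teenager)
```

With the assumed model, the "old man" and "teenager" entities obtain a similarity of 0.5, and an entity of any other type contributes nothing to the sum.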

In S904, based on the matching degree, determine the target translated name of the entity to be translated from all the translated names, and output the translated sentence based on the target language of the sentence to be translated according to all the target translated names.

In this embodiment, after the matching degree between each translated name and the sentence to be translated has been calculated, the translated name with the highest matching degree can be selected as the target translated name of the entity to be translated in this translation operation. Each target translated name is then inserted into the corresponding position in the entity-free translation output by the NMT algorithm, which yields the translated sentence of the sentence to be translated in the target language and completes the sentence translation.
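A minimal sketch of this selection and insertion step is shown below. It assumes that the entity-free translation marks each entity position with a placeholder such as `<ENT0>`; that placeholder format, the function names, and the example scores are hypothetical and are not taken from the application.

```python
from typing import Dict


def pick_target_name(scores: Dict[str, float]) -> str:
    """Return the translated name with the highest matching degree."""
    if not scores:
        raise ValueError("no candidate translated names")
    return max(scores, key=scores.get)


def fill_translation(template: str, targets: Dict[str, str]) -> str:
    """Insert each target translated name at its placeholder in the entity-free translation."""
    sentence = template
    for placeholder, name in targets.items():
        sentence = sentence.replace(placeholder, name)
    return sentence


# Hypothetical matching degrees for the candidate translated names of one entity.
scores = {"rice noodles": 0.8, "rice vermicelli": 0.3}
target = pick_target_name(scores)
print(fill_translation("I would like a bowl of <ENT0>.", {"<ENT0>": target}))
```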

Optionally, after determining the target translated name of the entity to be translated in the sentence to be translated, the knowledge graph generating device may establish a translation relationship between the alias to be translated, as it appears in the sentence to be translated, and the target translated name, and add this translation relationship to the knowledge graph, so that translation relationships are learned automatically.
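One way to picture this learning step is an index keyed by alias and target language, sketched below. The in-memory dictionary and the `learn_translation` helper are assumptions made for illustration; the application does not specify how the translation relationship is stored.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# Hypothetical store: (alias name, target language) -> set of translated names.
TranslationIndex = DefaultDict[Tuple[str, str], Set[str]]


def learn_translation(index: TranslationIndex, alias: str,
                      target_lang: str, translated_name: str) -> bool:
    """Record a translation relationship; return True only if it was not known before."""
    names = index[(alias, target_lang)]
    if translated_name in names:
        return False
    names.add(translated_name)
    return True


index: TranslationIndex = defaultdict(set)
learn_translation(index, "alias-in-source-language", "en", "rice noodles")
```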

In the embodiment of the present application, the entity relationship of the sentence to be translated is obtained, and the translated name of the entity to be translated in the current context is determined according to the entity relationship and the co-occurrence relationship of each translated name. The knowledge graph thus supports the translation decision and improves the accuracy of translation.

FIG. 11 shows a schematic structural diagram of a translation system based on a knowledge graph provided by an embodiment of the present application. As shown in FIG. 11, the knowledge graph-based translation system includes: a translation service cloud system 111, a knowledge graph generating device 112, an intelligent annotation server 113, a cloud database server 114, a user terminal 115, and a third-party application platform 116.

The translation service cloud system 111 includes a text retrieval module, a translation service response module, and a data access module. The data access module is used to exchange data with the other devices. The translation service response module is used to receive the translation service request sent by the user terminal, obtain the translation result, encapsulate it, and return it to the user terminal. The text retrieval module is used to extract the text data in the translation request and preprocess it.

The knowledge graph generating device 112 includes a translation error correction module, a knowledge graph module, a translation module, and a data management module. The translation error correction module is used to detect whether the sentence to be translated in the translation request contains content that needs correction, and to correct the sentence through terminology correction, name correction, whole-sentence correction, and so on. The corrected sentence is sent to the translation module, which performs the translation operation. For the specific translation process, refer to the translation process shown in FIG. 10, which is not repeated here. The data management module can be used to cache the received data and mask sensitive fields containing the user's identity information, so as to protect the user's private information.

The intelligent annotation server 113 includes a login authentication module, a web module, and a service module. The login authentication module verifies the identity of the requester and determines the validity of the service request; the web module displays the data tables of the cloud database server; and the service module updates the data stored in the cloud database server with the collected data.

The cloud database server 114 may include a database based on the MySQL framework, a database based on the Hadoop framework, and so on. The cloud database server can be used to store the cloud data required for translation operations, such as corpora collected from various channels, historical translation records initiated by user terminals, and the knowledge required to construct the knowledge graph.

The user terminal 115 can initiate a service request through a built-in application, and the intelligent translation engine determines the translation channel required for the service request. For voice translation, the voice data of the translated word can be obtained through the corresponding third-party platform; for word text translation, the translation data of the word can be obtained through the corresponding third-party platform; for sentence text translation, the translated sentence of the sentence to be translated can be output by the built-in translation module of the knowledge graph generating device. That is, for different types of translation request, the intelligent translation engine determines the corresponding translation response path.

The third-party application platform 116 may include multiple different third-party translation applications to support part of the translation operations of the entire translation system, such as word translation, word voice query, and so on.

The workflow of the translation system is illustrated below using a sentence translation request initiated by the user. The user terminal 115 receives the sentence translation request initiated by the user through the application, and the intelligent translation engine of the user terminal 115 determines the translation channel required for this translation operation. Because this operation is sentence translation, it must be supported by the knowledge graph generating device 112, so the sentence translation request carrying the channel identifier is sent to the translation service cloud system 111. The translation service cloud system 111 obtains the sentence translation request through the data access module and forwards it to the knowledge graph generating device 112. The knowledge graph generating device 112 uses the translation error correction module to perform preliminary error correction on the sentence to be translated in the request, and imports the corrected sentence into the preprocessing unit of the translation module. The preprocessing unit identifies the source language and target language of the sentence to be translated, uses the knowledge graph to identify the alias names in the sentence, and, according to the translation relationship, determines the translated name of each alias name in the target language. The translated names are fed back to the translation unit, which outputs the translated sentence of the sentence to be translated. The translated sentence is preprocessed and returned through the data management module to the data access module of the translation service cloud system 111, where the translation service response module encapsulates the translation result and returns it to the user terminal.

FIG. 12 shows the interaction flow between the units of the knowledge graph generating device provided by an embodiment of the present application when responding to a translation operation. The knowledge graph generating device may include a translation preprocessing unit, a knowledge graph service unit, a knowledge graph index unit, and a knowledge graph engine unit. After receiving a translation request, the knowledge graph generating device extracts the sentence to be translated from the request and sends it to the translation preprocessing unit. The translation preprocessing unit identifies the source language and target language of the sentence and sends the preprocessed sentence, together with these two parameters, to the knowledge graph service unit. The knowledge graph service unit selects the NLP model corresponding to the source language, uses it to perform NER on the sentence to determine each entity to be translated contained in the sentence, and sends each entity to be translated to the knowledge graph index unit. The knowledge graph index unit locates the entity node of each entity to be translated in the knowledge graph and, from the associated name list of each entity node, obtains the translated names of each entity to be translated in the target language. The knowledge graph service unit then sends a co-occurrence relationship query request to the knowledge graph engine unit to determine the associated entities that have a co-occurrence relationship with each translated name. The knowledge graph engine unit returns the co-occurrence relationships obtained by the query to the knowledge graph service unit, which selects the target translated name from the translated names corresponding to the different alias names and generates the translated sentence of the sentence to be translated according to all the target translated names. The translated sentence is returned to the translation preprocessing unit, and the translation result is output.

FIG. 13 shows a specific implementation flowchart of a method for generating a knowledge graph provided by the seventh embodiment of the present application. Referring to FIG. 13, with respect to any one of the embodiments described in FIG. 1, FIG. 4, FIG. 6, FIG. 7 and FIG. 8, the method for generating a knowledge graph provided by this embodiment further includes S1301 to S1302, which are detailed as follows:

In S1301, the keyword input by the user is received, and the co-occurrence relationship corresponding to the keyword is queried from the knowledge graph.

In this embodiment, as an application example of the knowledge graph, after the knowledge graph generating device has constructed a knowledge graph containing multiple target entities, the knowledge graph can provide technical support for a recommendation service. Because the knowledge graph determines the co-occurrence relationship of each alias name of the target entity from the corpus, the depth of the knowledge graph is extended: beyond the entity itself, the co-occurrence relationships of different alias names can be explored to determine how different aliases differ in their associated objects, which improves the accuracy of the recommendation information. For example, the entity "fen" has two different alias names, "rice noodles" and "rice noodles", and different alias names often pair with different other entities, as in "fatchang rice noodles" and "crossing bridge rice noodles". The entities paired with each alias name can reveal the user's associated tastes, eating habits, and so on. Compared with determining the recommendation information at the granularity of the entity, the co-occurrence relationships established at the granularity of the alias name yield more accurate recommendation information.

In this embodiment, the knowledge graph generating device can receive the keyword input by the user, identify the entity corresponding to the keyword and the alias name used by the keyword, obtain the knowledge node associated with that alias name in the knowledge graph, and extract the co-occurrence relationship of the alias name from the knowledge node.

In S1302, recommendation information for the user is output according to the co-occurrence relationship.

In this embodiment, the knowledge graph generating device can select a recommended entity according to the number of co-occurrences of each associated entity in the co-occurrence relationship, and output recommendation information based on the recommended entity. The recommendation information can take different forms in different scenarios. For example, in a search scenario, associated keywords of the input keyword can be output, where an associated keyword is the keyword corresponding to an entity with a larger number of co-occurrences, and search results containing the associated keywords are displayed earlier; that is, the display order is determined by the number of associated keywords contained in each search result and the number of co-occurrences between each associated keyword and the input keyword, and the results are output in that order. In a product purchase scenario, associated product keywords can be determined from the keyword entered by the user, recommended products can be determined from the product keywords, and a product recommendation list can be generated; the product keywords are obtained from the co-occurrence relationship corresponding to the alias name used by the input keyword. In a user portrait scenario, multiple co-occurring entities can be identified from the co-occurrence relationship based on the keywords entered by the user, and the user tag of the user is output based on the co-occurring entities and the keywords.
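The selection by co-occurrence count and the ordering of search results can be sketched as follows. The function names, the `top_k` parameter, and the scoring rule (summing the co-occurrence counts of the associated keywords that a result contains) are assumptions chosen for illustration; the application only states that both the number of associated keywords and their co-occurrence counts determine the order.

```python
from typing import Dict, List


def recommend_entities(cooccurrence: Dict[str, int], top_k: int = 3) -> List[str]:
    """Return the associated entities with the most co-occurrences, most frequent first."""
    # Ties are broken alphabetically so the output is deterministic.
    ranked = sorted(cooccurrence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


def rank_results(results: List[str], cooccurrence: Dict[str, int]) -> List[str]:
    """Order search results by the summed co-occurrence counts of the keywords they contain."""
    def score(text: str) -> int:
        return sum(count for keyword, count in cooccurrence.items() if keyword in text)
    return sorted(results, key=score, reverse=True)


# Hypothetical co-occurrence counts for the alias name used in the input keyword.
cooc = {"chili oil": 12, "broth": 7, "peanuts": 7, "tofu": 1}
print(recommend_entities(cooc))
print(rank_results(["tofu salad", "broth with chili oil", "peanuts"], cooc))
```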

In the embodiment of the present application, by constructing a knowledge graph with the granularity of "name", the accuracy of recommended information can be further improved in the field of intelligent recommendation.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution. The execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

Corresponding to the method for generating a knowledge graph described in the above embodiment, FIG. 14 shows a structural block diagram of a device for generating a knowledge graph provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.

Referring to Figure 14, the device for generating the knowledge graph includes:

The translation relationship establishment unit 141 is configured to establish a translation relationship of multiple alias names of the target entity based on the target language;

The co-occurrence relationship generating unit 142 is configured to generate the co-occurrence relationship of each of the alias names in the target entity through a preset corpus;

The knowledge graph construction unit 143 is configured to construct a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities.

Optionally, the translation relationship establishment unit 141 includes:

The source language sentence acquiring unit is used to separately acquire the source language sentence including each of the alias names;

The target language sentence acquiring unit is configured to output the target language sentence corresponding to each source language sentence according to the translation model between the source language and the target language;

The translated name recognition unit is configured to extract the translated name of the alias name in the target language from each sentence in the target language;

The translation relationship determining unit is used to establish the translation relationship between the alias name and the translated name.

Optionally, the source language sentence acquisition unit includes:

The sentence template obtaining unit is used to obtain the sentence template associated with the entity type according to the entity type of the target entity;

The sentence template importing unit is used to import each of the alias names into the sentence template to generate the source language sentences.
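The behaviour of these two units can be sketched as a template lookup followed by a fill step. The template table, its entity types, and the `{name}` slot syntax below are hypothetical; English templates are used only for readability.

```python
from typing import Dict, List

# Hypothetical sentence templates keyed by entity type; "{name}" marks the alias slot.
SENTENCE_TEMPLATES: Dict[str, List[str]] = {
    "food": ["I would like to eat {name}.", "{name} is delicious."],
    "person": ["{name} gave a speech yesterday."],
}


def build_source_sentences(entity_type: str, aliases: List[str]) -> Dict[str, List[str]]:
    """Generate the source language sentences for every alias name of one target entity."""
    templates = SENTENCE_TEMPLATES.get(entity_type, [])
    return {alias: [t.format(name=alias) for t in templates] for alias in aliases}


print(build_source_sentences("food", ["rice noodles"]))
```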

Optionally, the translated name recognition unit includes:

The valid sentence selection unit is used to identify the target language sentence as a valid sentence if it is detected that the target language sentence contains the phrase corresponding to the target entity;

The keyword group recognition unit is used to identify the phrase corresponding to the target entity in the valid sentence as the translated name.

Optionally, the co-occurrence relationship generation unit 142 includes:

The target text extraction unit is used to extract the target text containing the target entity from the corpus;

The associated entity recognition unit is used to identify the associated entities other than the target entity in the target text;

The co-occurrence relationship establishment unit is used to obtain the co-occurrence relationship between the alias name and the associated entity according to the alias name corresponding to the target entity in the target text.
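The three units above can be sketched as a single pass over the corpus. Plain substring matching stands in for entity recognition here, and the function name, the `known_entities` list, and the example texts are assumptions made for illustration rather than details from the application.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def build_cooccurrence(
    corpus: Iterable[str],
    aliases: List[str],
    known_entities: List[str],
) -> Dict[str, Counter]:
    """Count, per alias name, the associated entities that appear in the same text."""
    cooccurrence: Dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        # Target text extraction: keep only texts that mention an alias of the target entity.
        present = [alias for alias in aliases if alias in text]
        if not present:
            continue
        # Associated entity recognition: other known entities mentioned in the same text.
        related = [e for e in known_entities if e in text and e not in aliases]
        # Co-occurrence is recorded against the alias actually used, not the entity as a whole.
        for alias in present:
            cooccurrence[alias].update(related)
    return dict(cooccurrence)


corpus = [
    "Roast the aubergine with garlic and tahini.",
    "Layer eggplant with parmesan and garlic.",
    "Toast the bread with butter.",
]
result = build_cooccurrence(corpus, ["aubergine", "eggplant"], ["garlic", "tahini", "parmesan"])
print(result)
```

Keeping one counter per alias name is what allows two aliases of the same entity to end up with different associated entities.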

Optionally, the device for generating the knowledge graph further includes:

The to-be-translated entity recognition unit is used to receive a sentence to be translated in the source language and identify the entity to be translated contained in the sentence to be translated, so as to construct the entity relationship of the sentence to be translated;

The translation relationship extraction unit is used to extract, from the knowledge graph, the translation relationship of the entity to be translated based on the target language, where the translation relationship includes at least one translated name of the entity to be translated;

The matching degree calculation unit is used to calculate the matching degree between the sentence to be translated and the translated name according to the entity relationship and the co-occurrence relationship of the translated name;

The translated sentence output unit is used to determine, based on the matching degree, the target translated name of the entity to be translated from all the translated names, and to output the translated sentence of the sentence to be translated in the target language according to all the target translated names.

Optionally, the matching degree calculation unit is specifically configured to calculate the matching degree according to the following function:

Sim(E1, E2) = ∑_{ei∈Context(E1), ej∈Context(E2)} max sim_entity(ei, ej);

Optionally, the device for generating the knowledge graph further includes:

The keyword receiving unit is configured to receive keywords input by the user, and query the co-occurrence relationship corresponding to the keywords from the knowledge graph;

The recommendation information output unit is configured to output the recommendation information of the user according to the co-occurrence relationship.

Therefore, the device for generating a knowledge graph provided by the embodiment of the present application can likewise establish a translation relationship for each knowledge node in the knowledge graph, that is, each target entity, so as to connect knowledge nodes across different languages, and can extend the depth of knowledge of each knowledge node by constructing co-occurrence relationships, so that a knowledge node is not limited to the attributes of the target entity itself. This improves the associative ability of each knowledge node and the breadth and depth of the knowledge graph, thereby improving the accuracy of artificial intelligence output results and the quality of service responses.

FIG. 15 is a schematic structural diagram of a terminal device provided by an embodiment of this application. As shown in FIG. 15, the terminal device 15 of this embodiment includes: at least one processor 150 (only one is shown in FIG. 15), a memory 151, and a computer program 152 stored in the memory 151 and executable on the at least one processor 150. When the processor 150 executes the computer program 152, the steps in any of the above-mentioned embodiments of the method for generating the knowledge graph are implemented.

The terminal device 15 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, a processor 150 and a memory 151. Those skilled in the art can understand that FIG. 15 is only an example of the terminal device 15 and does not constitute a limitation on the terminal device 15. It may include more or fewer components than shown in the figure, a combination of certain components, or different components; for example, it may also include input and output devices, network access devices, and so on.

The processor 150 may be a central processing unit (CPU). The processor 150 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

In some embodiments, the memory 151 may be an internal storage unit of the terminal device 15, such as a hard disk or a memory of the terminal device 15. In other embodiments, the memory 151 may also be an external storage device of the terminal device 15, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 15. Further, the memory 151 may include both an internal storage unit of the terminal device 15 and an external storage device. The memory 151 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 151 can also be used to temporarily store data that has been output or will be output.

It should be noted that the information interaction and execution processes between the above-mentioned devices/units are based on the same concept as the method embodiments of this application. For their specific functions and technical effects, refer to the method embodiment section; they are not repeated here.

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, each unit can exist alone physically, or two or more units can be integrated into one unit. The integrated units can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are used only to distinguish them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.

An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor. When the processor executes the computer program, the steps in any of the foregoing method embodiments are implemented.

The embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be realized.

The embodiments of the present application provide a computer program product. When the computer program product runs on a mobile terminal, the mobile terminal implements the steps in the foregoing method embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it implements the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a floppy disk, or a CD-ROM. In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electrical carrier signals or telecommunications signals.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the device/network device embodiments described above are only illustrative. The division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the application, and should be included within the scope of protection of this application.

Finally, it should be noted that the above are only specific implementations of this application, and the scope of protection of this application is not limited to them. Any changes or substitutions within the technical scope disclosed in this application shall be covered by the scope of protection of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A method for generating a knowledge graph, which is characterized in that it includes:

Determining the translated name of each alias name of the target entity in the target language, and generating the translation relationship of the target entity according to the alias name and the translated name;

Respectively generating the co-occurrence relationship of each of the alias names in the target entity through a preset corpus;

Constructing a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities.
The generating method according to claim 1, wherein said determining the translated name of each alias name of the target entity in the target language, and generating the translation relationship of the target entity based on the alias name and the translated name, comprises:

Obtaining the source language sentences containing each of the alias names respectively;

Outputting a target language sentence corresponding to each source language sentence according to a translation model between the source language and the target language;

Extracting the translated name of the alias name in the target language from each of the target language sentences;

Establishing the translation relationship between the alias name and the translated name.
The generating method according to claim 2, wherein said separately obtaining source language sentences containing each of said alias names comprises:

Obtaining a sentence template associated with the entity type according to the entity type of the target entity;

Importing each of the alias names into the sentence template to generate the source language sentence.
The generating method according to claim 2, wherein the extracting the translated name of the alias name in the target language from each of the target language sentences respectively comprises:

If it is detected that the target language sentence contains the phrase corresponding to the target entity, identifying the target language sentence as a valid sentence;

Identifying the phrase corresponding to the target entity in the valid sentence as the translated name.
The generating method according to claim 1, wherein the generating of the co-occurrence relationship of each of the alias names of the target entity through a preset corpus comprises:

Extracting the target text containing the target entity from the corpus;

Identifying associated entities in the target text other than the target entity;

Obtaining the co-occurrence relationship between the alias name and the associated entities according to the alias name by which the target entity appears in the target text.
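The three steps above might be sketched as follows. Plain substring matching stands in for a real entity recognizer, which is an assumption of this sketch, as are the function name and the sample corpus.

```python
from collections import Counter


def cooccurrence_relations(corpus, alias_names, known_entities):
    """Count, for each alias name, the other known entities found in the same text."""
    relations = {alias: Counter() for alias in alias_names}
    aliases = set(alias_names)
    for text in corpus:
        # A text is a target text only if it mentions at least one alias name.
        present = [alias for alias in alias_names if alias in text]
        if not present:
            continue
        # Associated entities: known entities in the text other than the target entity.
        related = [e for e in known_entities if e in text and e not in aliases]
        for alias in present:
            relations[alias].update(related)
    return relations


# Illustrative data only.
corpus = [
    "Apple released a new iPhone.",
    "An apple is a fruit rich in fibre.",
    "The iPhone is popular.",
]
relations = cooccurrence_relations(corpus, ["Apple", "apple"], ["iPhone", "fruit"])
```

Keeping a separate counter per alias name, rather than one per entity, is what lets different alias names of the same entity carry different contexts.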
The generating method according to any one of claims 1 to 5, characterized in that, after constructing a knowledge graph according to the translation relationship and the co-occurrence relationship corresponding to all the target entities, the method further comprises:

Receiving a sentence to be translated in the source language, and identifying the entity to be translated included in the sentence to be translated, so as to construct the entity relationship of the sentence to be translated;

Extracting, from the knowledge graph, the translation relationship of the entity to be translated in the target language, wherein the translation relationship includes at least one translated name of the entity to be translated;

Calculating the degree of matching between the sentence to be translated and the translated name according to the entity relationship and the co-occurrence relationship of the translated name;

Determining, based on the matching degree, the target translated name of the entity to be translated from among all the translated names, and outputting a translated sentence of the sentence to be translated in the target language according to all the target translated names.
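A rough sketch of the selection step, assuming the entity relationship of the sentence is reduced to a set of entity names and each candidate translated name comes with the set of entities it co-occurs with. The default score used here, the size of the overlap, is a simple stand-in for the matching degree and not the function recited in the claims. All names are hypothetical.

```python
def overlap_score(sentence_entities, related_entities):
    """Stand-in matching degree: number of shared entities."""
    return len(sentence_entities & related_entities)


def choose_translation(sentence_entities, candidates, score=overlap_score):
    """Pick the translated name whose co-occurring entities best match the sentence.

    candidates maps each translated name to the set of entities it co-occurs with.
    """
    return max(candidates, key=lambda name: score(sentence_entities, candidates[name]))


# Illustrative data only.
candidates = {
    "Apple": {"iPhone", "Mac"},
    "apple": {"fruit", "pie"},
}
chosen = choose_translation({"iPhone", "store"}, candidates)
```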
The generating method according to claim 6, wherein the calculating the matching degree between the sentence to be translated and the translated name according to the entity relationship and the co-occurrence relationship of the translated name comprises:

Importing the entity relationship and the co-occurrence relationship of the translated name into a preset matching degree calculation function to calculate the matching degree, the matching degree calculation function being:

$$\mathrm{Sim}(E_1,E_2)=\sum_{e_i\in \mathrm{Context}(E_1),\ e_j\in \mathrm{Context}(E_2)} \max\ \mathrm{sim}_{\mathrm{entity}}(e_i,e_j);$$

$$\mathrm{sim}_{\mathrm{entity}}(e_i,e_j)=\sum_{p\in \mathrm{Prop}(e_i)\cap \mathrm{Prop}(e_j)} \omega_p\,\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p],e_j[p])$$

Wherein $\mathrm{Sim}(E_1,E_2)$ is the degree of matching between the entity to be translated and the translated name; $\mathrm{Context}(E_1)$ is the associated entities included in the co-occurrence relationship of the entity to be translated $E_1$ in the knowledge graph; $\mathrm{Context}(E_2)$ is the associated entities included in the co-occurrence relationship of the translated name $E_2$; $e_i$ is the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j$ is the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\mathrm{Prop}(e_i)$ is the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $\mathrm{Prop}(e_j)$ is the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$; $\omega_p$ is the weight value corresponding to the entity type; $\mathrm{Similarity}_{\mathrm{type}(p)}(e_i[p],e_j[p])$ is the matching degree function corresponding to the entity type; $e_i[p]$ is the parameter value of the entity type of the i-th associated entity in the co-occurrence relationship of the entity to be translated $E_1$; $e_j[p]$ is the parameter value of the entity type of the j-th associated entity in the co-occurrence relationship of the translated name $E_2$.
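The matching degree calculation could be sketched as below, under several stated assumptions: each associated entity is represented as a dictionary from property name to value, Prop(e) is read as the set of keys of that dictionary, the similarity function is looked up by property name rather than by type(p), and the maximum is taken over the associated entities of E2 for each associated entity of E1. The last point is one reading of the formula, not something the text confirms.

```python
def exact_match(a, b):
    """Default per-property similarity (assumption): 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def sim_entity(ei, ej, weights, similarity_by_prop):
    """Weighted sum of per-property similarities over the shared properties."""
    shared = ei.keys() & ej.keys()
    return sum(
        weights.get(p, 1.0) * similarity_by_prop.get(p, exact_match)(ei[p], ej[p])
        for p in shared
    )


def sim(context_e1, context_e2, weights, similarity_by_prop):
    """For each e_i in Context(E1), add its best match among Context(E2)."""
    return sum(
        max(
            (sim_entity(ei, ej, weights, similarity_by_prop) for ej in context_e2),
            default=0.0,
        )
        for ei in context_e1
    )


# Illustrative data only.
context_e1 = [{"type": "phone", "brand": "Apple"}]
context_e2 = [{"type": "phone", "brand": "Apple"}, {"type": "fruit"}]
weights = {"type": 0.5, "brand": 2.0}
score = sim(context_e1, context_e2, weights, {})
```

In this example the first associated entity of E2 shares both properties with the entity of E1 and scores 0.5 + 2.0, while the second shares only `type` with a different value and scores 0, so the result is 2.5.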
The generating method according to any one of claims 1 to 5, further comprising:

Receiving keywords input by the user, and querying the co-occurrence relationship corresponding to the keywords from the knowledge graph;

Outputting recommendation information for the user according to the co-occurrence relationship.
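As a sketch only, the query step could rank the entities that co-occur with the keyword by how often they co-occur. The count-based ranking, the function name, and the sample data are assumptions; the claim does not specify how the recommendation information is derived.

```python
def recommend(keyword, cooccurrence, limit=3):
    """Return up to `limit` co-occurring entities, most frequent first.

    cooccurrence maps a keyword to {associated entity: co-occurrence count}.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = cooccurrence.get(keyword, {})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Illustrative data only.
cooccurrence = {"iPhone": {"charger": 5, "case": 8, "screen protector": 5}}
suggestions = recommend("iPhone", cooccurrence, limit=2)
```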
A device for generating a knowledge graph, characterized in that it comprises:

A translation relationship establishment unit, configured to establish the translation relationship of multiple alias names of a target entity based on the target language;

A co-occurrence relationship generation unit, configured to generate the co-occurrence relationship of each of the alias names of the target entity through a preset corpus;

A knowledge graph construction unit, configured to construct a knowledge graph according to the translation relationships and the co-occurrence relationships corresponding to all the target entities.
A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 8.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.