CN117972109A - Knowledge graph generation method, device, equipment, storage medium and program product - Google Patents


Info

Publication number
CN117972109A
Authority
CN
China
Prior art keywords
vector, target, relation, cluster, relationship
Legal status
Pending
Application number
CN202410139316.2A
Other languages
Chinese (zh)
Inventor
许猛
陈永录
张飞燕
Current Assignee
Industrial and Commercial Bank of China Ltd (ICBC)
Original Assignee
Industrial and Commercial Bank of China Ltd (ICBC)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a knowledge graph generation method, apparatus, device, storage medium and program product, relating to the technical field of big data. The method comprises the following steps: receiving a graph generation instruction sent by a control terminal; acquiring a triplet data set corresponding to the graph generation instruction; and inputting the triplet data set into a graph generation model to obtain a knowledge graph output by the graph generation model, wherein the graph generation model is obtained by graph-completion training. The method addresses the low accuracy and incompleteness of current knowledge graph drawing methods.

Description

Knowledge graph generation method, device, equipment, storage medium and program product
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for generating a knowledge graph.
Background
The current explosive growth of internet information, characterized by large scale, diversity and a loose organizational structure, makes it challenging for people to acquire information and knowledge effectively. Against this background, the knowledge graph, as a novel way of managing massive information, has gradually gained favor with large internet enterprises and other fields, and is widely applied in search, e-commerce, social networking and other fields.
Currently, the prior art generally draws knowledge graphs with the TransE (Translating Embeddings) model.
However, the inventor has found that the prior art has at least the following technical problem: the accuracy of current knowledge graph drawing methods is low.
Disclosure of Invention
The application provides a knowledge graph generation method, apparatus, device, storage medium and program product, to solve the technical problem that current knowledge graph drawing methods have low accuracy.
In a first aspect, the present application provides a knowledge graph generation method, comprising: receiving a graph generation instruction sent by a control terminal; acquiring a triplet data set corresponding to the graph generation instruction; and inputting the triplet data set into a graph generation model to obtain a knowledge graph output by the graph generation model, wherein the graph generation model is obtained by graph-completion training.
In one possible implementation, the graph-completion training process includes: acquiring a training triplet data set, wherein the training triplet data set comprises at least one piece of triplet data, and the triplet data comprises a head entity vector, a relation vector and a tail entity vector; inputting the triplet data into a path reasoning model to obtain each path vector from a target head entity vector to a target tail entity vector and the confidence corresponding to each path vector; if the confidence corresponding to any path vector is greater than a preset confidence threshold, determining that path vector as a candidate vector; calculating the sum of the candidate vectors to obtain a target path vector; calculating a first score according to the target head entity vector, the target tail entity vector and the target path vector; determining a second score corresponding to the target head entity vector and the target tail entity vector using the triplet data set; adding the first score and the second score to obtain a total score; if the total score is greater than a preset score threshold, determining the triplet data corresponding to the target head entity vector and the target tail entity vector as a training triplet; creating negative-example triplets using the triplet data set; and performing model training with at least one training triplet and the negative-example triplets to obtain the graph generation model.
In one possible implementation, determining the second score corresponding to the target head entity vector and the target tail entity vector using the triplet data set includes: clustering all the relation vectors in the triplet data set to obtain at least one relation cluster, wherein the relation cluster corresponds to at least one relation vector; calculating a relation cluster vector corresponding to the relation cluster; calculating a sub-relation vector corresponding to the triplet data in the triplet data set; clustering the sub-relationship vectors to obtain at least one sub-relationship cluster; calculating a sub-relationship cluster vector corresponding to the sub-relationship cluster; determining a comprehensive relation vector according to the target relation vector, the relation cluster vector and the sub-relation cluster vector, wherein the target relation vector is a relation vector corresponding to a target head entity vector and a target tail entity vector; and determining a second score according to the target head entity vector, the comprehensive relation vector and the target tail entity vector.
In one possible implementation, determining the integrated relationship vector from the target relationship vector, the relationship cluster vector, and the sub-relationship cluster vector includes: determining a target relation cluster vector and a target sub-relation cluster vector corresponding to the target relation vector according to the target relation vector, the relation cluster vector and the sub-relation cluster vector, wherein the target relation vector is a relation vector corresponding to a target head entity vector and a target tail entity vector; and adding the target relation vector, the target relation cluster vector and the target sub-relation cluster vector to obtain a comprehensive relation vector.
In one possible implementation manner, determining a target relationship cluster vector and a target sub-relationship cluster vector corresponding to the target relationship vector according to the target relationship vector, the relationship cluster vector and the sub-relationship cluster vector, includes: calculating a first distance between the target relation vector and each relation cluster vector; determining a relation cluster vector with the smallest corresponding first distance as a target relation cluster vector; calculating a second distance between the target relation vector and each sub-relation cluster vector; and determining the sub-relation cluster vector with the smallest corresponding second distance as a target sub-relation cluster vector.
In one possible implementation manner, calculating a relationship cluster vector corresponding to the relationship cluster includes: and calculating the average value of the relation vectors corresponding to the relation clusters to obtain the relation cluster vectors.
In a second aspect, the present application provides a knowledge graph generation apparatus, comprising: a receiving module, configured to receive a graph generation instruction sent by a control terminal; an acquisition module, configured to acquire a triplet data set corresponding to the graph generation instruction; and a generation module, configured to input the triplet data set into the graph generation model to obtain the knowledge graph output by the graph generation model, wherein the graph generation model is obtained by graph-completion training.
In a third aspect, the present application provides an electronic device comprising: a processor, a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the method as described in the first aspect.
In a fourth aspect, the application provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, are adapted to carry out the method as described in the first aspect.
In a fifth aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the first aspect.
According to the knowledge graph generation method, apparatus, device, storage medium and program product provided by the application, after the graph generation instruction sent by the control terminal is received, the triplet data set corresponding to the instruction is acquired, and the knowledge graph is generated from the triplet data set by the graph generation model obtained through graph-completion training. Using a model trained with graph completion thereby improves the accuracy and completeness of the generated graph.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram of an application scenario of a map generating method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a map generation method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a map generating apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.
It should be noted that the knowledge graph generating method, apparatus, device, storage medium and program product of the present application may be used in the technical field of big data, and may also be used in any field other than the technical field of big data.
With the continuous rapid growth of internet information, the large-scale, diversified and loose organization structure characteristics of the internet provide great challenges for people to effectively acquire and absorb knowledge. In this context, knowledge maps are increasingly favored by large internet enterprises and other fields as an innovative information management method. The graph technology is widely applied to the fields of searching, electronic commerce, social contact and the like, and becomes an important tool for organizing and managing massive information.
In the prior art, the TransE model is a common method for drawing knowledge graphs. The model embeds entities and relations in a low-dimensional space and treats each relation as a translation: the embedding of the head entity plus the embedding of the relation should be close to the embedding of the tail entity. However, the TransE model ignores part of the relational information, and the accuracy of the knowledge graph it draws is low.
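The translation idea behind TransE can be illustrated with a short sketch. This is not part of the patent's disclosure; the embedding values are purely illustrative:

```python
import numpy as np

def transe_score(h, r, t):
    # TransE plausibility score: the L2 distance ||h + r - t||.
    # Smaller is better, since TransE models a relation as a
    # translation in embedding space, h + r ≈ t.
    return np.linalg.norm(h + r - t)

# Toy 3-dimensional embeddings (illustrative values only).
h = np.array([0.1, 0.2, 0.3])
r = np.array([0.5, 0.0, -0.1])
t = np.array([0.6, 0.2, 0.2])

print(transe_score(h, r, t))  # ≈ 0 for this well-fit triple
```

A poorly fitting triple (for example, swapping in an unrelated tail vector) would yield a noticeably larger score, which is how TransE ranks candidate triples.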
In order to solve the technical problems, the inventor proposes the following technical ideas: model training is carried out by adopting data after information complementation, and knowledge graph drawing is carried out by adopting a model obtained by training.
The application provides a knowledge graph generation method, aiming to solve the above technical problems in the prior art.
Fig. 1 is a schematic diagram of an application scenario of the graph generation method according to an embodiment of the present application. As shown in fig. 1, this scenario includes: a control terminal 101 and a server 102.
In a specific implementation process, the control terminal 101 may include a computer, a server, a tablet, a mobile phone, a personal digital assistant (PDA), a notebook, etc., which can perform data input and data transmission.
The server 102 may be implemented by a single server or a cluster of servers with high processing power and security, and may, where feasible, be replaced by a sufficiently powerful computer, notebook or the like.
The connection between the server 102 and the control terminal 101 may be wired or wireless.
And the server 102 is configured to receive the instruction sent by the control terminal 101, and process the triplet data set according to the instruction, so as to obtain a knowledge graph.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
It should be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the method of generating the map. In other possible embodiments of the present application, the architecture may include more or less components than those illustrated, or some components may be combined, some components may be split, or different component arrangements may be specifically determined according to the actual application scenario, and the present application is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
Fig. 2 is a schematic flow chart of a map generation method according to an embodiment of the present application. The execution subject of the embodiment of the present application may be the server 102 in fig. 1, or may be a computer and/or a mobile phone, which is not particularly limited in this embodiment. As shown in fig. 2, the method includes:
s201: and receiving an image generation instruction sent by the control terminal.
In this step, the image generation instruction may be received by receiving a packet or a message.
S202: and acquiring a triplet data set corresponding to the image generation instruction.
In this step, the image generation instruction may include feature information such as a storage address or an identifier of the triplet data set, and the corresponding triplet data set is read according to the feature information. The triplet data set may contain entity data, relationship data, triplet data, etc.
S203: inputting the triplet data set into a pattern generation model to obtain a knowledge pattern output by the pattern generation model, wherein the pattern generation model is obtained by adopting complement pattern training.
In this step, the atlas-generating model may be pre-trained. The triplet data set can be input into the atlas generating model by adopting a preset instruction, or the atlas generating model can be operated to read the triplet data set.
As can be seen from the description of the above embodiment, in the embodiment of the present application, after receiving the image generation instruction sent by the control terminal, the triplet data set corresponding to the image generation instruction is obtained, and the knowledge graph is generated according to the triplet data set by using the graph generation model obtained by the complement graph training, so that the effect of increasing the accuracy and integrity of the generated graph is achieved.
In one possible implementation, the graph-completion training process in step S203 includes:
S301: a training triplet data set is obtained, wherein the training triplet data set comprises at least one triplet data, and the triplet data comprises a head entity vector, a relation vector and a tail entity vector.
In this step, the triplet data set may be preconfigured by the staff member.
S302: and inputting the triplet data into a path reasoning model to obtain each path vector from the target head entity vector to the target tail entity vector and the confidence corresponding to each path vector.
In this step, the target head entity vector h is any head entity vector, and the target tail entity vector t is any tail entity vector. The Path inference model may be a PCRA (Path-constraint resource allocation, path constrained resource allocation) algorithm model. The PCRA algorithm measures the reliability of the path vector p by measuring the amount of resources that ultimately flow from the target head entity vector h to the target tail entity vector t. If the amount of resources eventually flowing to the target tail entity t is larger, we can consider this path vector p to be more reliable, as it is dominated by more fixed resources. The magnitude of this resource amount may be used as a weight value for the path vector p to help determine the reliability relationship (confidence) between the target head entity vector h and the target tail entity vector t.
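One plausible reading of this resource-allocation scheme can be sketched as follows. This is an illustration, not the patent's exact PCRA implementation: the toy graph, the entity names, and the equal-split rule at each hop are all assumptions.

```python
from collections import defaultdict

def path_confidence(graph, head, path):
    """Propagate 1 unit of resource from `head` along `path`.

    graph: relation -> {entity: [successor entities]}.
    path: sequence of relation names.
    Returns the resource amount arriving at each end entity;
    the share reaching the tail serves as the path's confidence.
    """
    resource = {head: 1.0}
    for rel in path:
        nxt = defaultdict(float)
        for node, amount in resource.items():
            successors = graph[rel].get(node, [])
            if successors:
                share = amount / len(successors)  # equal split per hop
                for s in successors:
                    nxt[s] += share
        resource = dict(nxt)
    return resource

# Illustrative toy graph: bob's resource splits between paris and lyon,
# but only paris continues along capital_of, so half is stranded.
graph = {
    "born_in": {"alice": ["paris"], "bob": ["paris", "lyon"]},
    "capital_of": {"paris": ["france"]},
}
print(path_confidence(graph, "bob", ["born_in", "capital_of"]))
# {'france': 0.5}
```

A head entity whose resource all reaches the tail (here, "alice") yields confidence 1.0, matching the intuition that such a path is more reliable.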
S303: if the confidence coefficient corresponding to any path vector is larger than a preset confidence coefficient threshold value, determining any path vector as a candidate vector.
In this step, the confidence threshold may be preset by the staff.
For example, if the confidence threshold is 0.01, if the confidence corresponding to the path vector a is 0.02, determining the path vector a as a candidate vector; if the confidence coefficient corresponding to the path vector B is 0.015, determining the path vector B as a candidate vector; if the confidence corresponding to the path vector C is 0.004, the path vector C is not determined as a candidate vector. Confidence thresholds of 0.015, 0.02, etc. again
S304: and calculating the sum of the candidate vectors to obtain the target path vector.
In this step, the candidate vectors may be input into a preset program or script, and added to obtain the target path vector. It may also involve computing a sum of candidate vectors using pre-written code segments to arrive at the target path vector.
For example, if the current candidate vector includes candidate vector A, B, C, D, E, the sum of these several vectors is calculated to obtain the target path vector.
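Steps S303-S304, thresholding path vectors by confidence and summing the survivors, can be sketched together. The vectors and confidence values below are illustrative:

```python
import numpy as np

def target_path_vector(paths, threshold=0.01):
    """paths: list of (path_vector, confidence) pairs.

    Keeps vectors whose confidence exceeds the threshold (S303)
    and returns their sum as the target path vector (S304).
    """
    candidates = [v for v, c in paths if c > threshold]
    return np.sum(candidates, axis=0) if candidates else None

paths = [
    (np.array([1.0, 0.0]), 0.02),   # kept
    (np.array([0.0, 2.0]), 0.015),  # kept
    (np.array([5.0, 5.0]), 0.004),  # filtered out
]
print(target_path_vector(paths))  # [1. 2.]
```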
S305: and calculating a first score according to the target head entity vector, the target tail entity vector and the target path vector.
In this step, adding the target head entity vector to the target path vector and subtracting the target tail entity vector may be included to obtain the first score.
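One plausible concrete form of this first score, borrowing TransE's residual norm ||h + p - t||, can be sketched as follows. The choice of the L2 norm is an assumption; the patent text does not specify how the resulting vector is reduced to a scalar:

```python
import numpy as np

def first_score(h, p, t):
    # Residual of "head + path - tail"; small when the path
    # vector explains the head-to-tail translation well.
    return np.linalg.norm(h + p - t)

# Illustrative 2-dimensional vectors chosen so h + p == t exactly.
h = np.array([0.25, 0.25])
p = np.array([0.25, 0.25])
t = np.array([0.5, 0.5])
print(first_score(h, p, t))  # 0.0
```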
S306: and determining a second score corresponding to the target head entity vector and the target tail entity vector by adopting the triplet data set.
In this step, inputting the triplet data set into a preset second score calculating program or script, so that the second score calculating program or script calculates a second score corresponding to each head entity vector and each tail entity vector in common.
S307: and adding the first fraction and the second fraction to obtain a total fraction.
In this step, for example, the first score is 10, the second score is 20, and the total score is 30; for another example, the first score is 40 and the second score is 30, then the total score is 70.
S308: if the total score is greater than a preset score threshold, determining the triplet data corresponding to the target head entity vector and the target tail entity vector as training triples.
In this step, if the total score is greater than a preset score threshold, determining the triplet data including the target head entity vector and the target tail entity vector as a training triplet.
For example, if the current target head entity vector is h1 and the target tail entity vector is t1, then among the current triplet data (h1, r1, t1), (h1, r2, t1) and (h1, r3, t2), the triples (h1, r1, t1) and (h1, r2, t1) are determined as training triples. For another example, if the current target head entity vector is h2 and the target tail entity vector is t2, then among the current triplet data (h2, r1, t2), (h1, r2, t3) and (h2, r3, t2), the triples (h2, r1, t2) and (h2, r3, t2) are determined as training triples.
The score threshold may be preset by a worker according to experimental data or empirical parameters.
S309: a negative example triplet is created using the triplet dataset.
In this step, at least one of the head entity vector, the relation vector and the tail entity vector in a piece of triplet data may be randomly replaced to obtain a negative-example triplet, and at least one piece of triplet data may be so replaced to obtain at least one negative-example triplet.
For example, given the triplet data (h1, r1, t1), (h2, r2, t2) and (h3, r3, t3), h1 may be replaced with a random vector hx to obtain the negative-example triplet (hx, r1, t1), and r3 in (h3, r3, t3) may be replaced with a random vector rx to obtain the negative-example triplet (h3, rx, t3).
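Negative-example creation by random replacement can be sketched as follows. The entity and relation pools, and the uniform choice of which slot to corrupt, are illustrative assumptions:

```python
import random

def corrupt(triple, entities, relations, rng=random):
    """Return a negative-example triple by randomly replacing
    exactly one slot (head, relation, or tail) of `triple`."""
    h, r, t = triple
    slot = rng.choice(["head", "relation", "tail"])
    if slot == "head":
        return (rng.choice(entities), r, t)
    if slot == "relation":
        return (h, rng.choice(relations), t)
    return (h, r, rng.choice(entities))

# Illustrative pools of entity and relation identifiers.
entities = ["h1", "h2", "t1", "t2", "hx"]
relations = ["r1", "r2", "rx"]
neg = corrupt(("h1", "r1", "t1"), entities, relations)
print(neg)  # a triple with one randomly replaced slot
```

Note that the random draw may coincidentally pick the original value; a production implementation would typically re-sample until the corrupted triple is absent from the positive set.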
S310: and performing model training by adopting at least one training triplet and negative example triplet to obtain a map generation model.
In this step, it may include computing the distances of the entity vector, the relationship vector, and the negative triplet by the minimum loss function, again updating the vector representation of the entity and relationship. The method can also comprise the steps of taking training triples and negative-example triples as training parameters, inputting a model to be trained to obtain an output spectrum, calculating the error between the output spectrum and a standard spectrum, and optimizing the model to be trained by adopting the error until the error is smaller than a preset error threshold value to obtain a spectrum generation model.
The standard atlas can be preset by a worker or can be contained in the training triplet data set.
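The first training variant described above can be sketched as a minimal TransE-style update that minimizes a margin ranking loss between a training triple and its negative counterpart. The dimensions, margin, learning rate, and single-pair setup are illustrative assumptions, not the patent's actual training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, margin, lr = 4, 1.0, 0.05
# One positive triple (h, r, t) and one corrupted head h_neg.
emb = {name: rng.normal(size=dim) for name in ["h", "r", "t", "h_neg"]}

def score(h, r, t):
    # TransE-style distance: lower means more plausible.
    return np.linalg.norm(emb[h] + emb[r] - emb[t])

for _ in range(200):
    # Margin ranking loss: push the positive triple's score at least
    # `margin` below the negative triple's score.
    loss = margin + score("h", "r", "t") - score("h_neg", "r", "t")
    if loss <= 0:
        break
    pos = emb["h"] + emb["r"] - emb["t"]
    neg = emb["h_neg"] + emb["r"] - emb["t"]
    g_pos = pos / (np.linalg.norm(pos) + 1e-9)  # gradient of ||pos||
    g_neg = neg / (np.linalg.norm(neg) + 1e-9)  # gradient of ||neg||
    emb["h"] -= lr * g_pos          # shrink the positive residual
    emb["t"] += lr * g_pos
    emb["r"] -= lr * (g_pos - g_neg)
    emb["h_neg"] += lr * g_neg      # grow the negative residual

print(score("h", "r", "t") < score("h_neg", "r", "t"))  # True
```

After training, the positive triple scores lower (better) than its negative counterpart, which is the ranking property the loss enforces.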
As can be seen from the above embodiment, the embodiment of the present application acquires a training triplet data set and inputs the triplet data into a path reasoning model to obtain each path vector from the target head entity vector to the target tail entity vector together with its confidence; the path vectors are screened by confidence to obtain candidate vectors, whose sum gives the target path vector; a first score and a second score are then calculated for the target head entity vector, the target tail entity vector and the target path vector, and added to obtain a total score. If the total score is greater than the preset score threshold, the triplet data corresponding to the target head entity vector and the target tail entity vector is determined as a training triplet, and model training is performed with the training triples and negative-example triples to obtain the graph generation model.
In a possible implementation manner, in the step S306, determining the second score corresponding to the target head entity vector and the target tail entity vector using the triplet data set includes:
S3061: and clustering all the relation vectors in the triplet data set to obtain at least one relation cluster, wherein the relation cluster corresponds to at least one relation vector.
In this step, it may include inputting TransE the triplet dataset into a model, resulting in all relationship vectors. All the relation vectors in the triplet data set are input into the k-means model, so that the k-means model outputs a relation cluster, and the relation cluster can be composed of at least one relation vector.
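The clustering in steps S3061-S3062 can be sketched with a minimal k-means that takes each cluster's mean as its cluster vector. The deterministic "first k points" initialization and the toy data are simplifications for illustration; in practice a library implementation (e.g. scikit-learn's KMeans) would be used:

```python
import numpy as np

def kmeans(X, k, iters=10):
    """Minimal k-means: returns (labels, centers).

    The centers are the cluster means, i.e. the relation
    cluster vectors of step S3062.
    """
    centers = X[:k].copy()  # simplistic deterministic init (sketch only)
    for _ in range(iters):
        # Assign each vector to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its members.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Illustrative relation vectors forming two well-separated groups.
relations = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 5.1]])
labels, cluster_vectors = kmeans(relations, k=2)
print(cluster_vectors)  # [[0.1  0.05] [5.1  5.05]]
```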
S3062: and calculating a relation cluster vector corresponding to the relation cluster.
In this step, the calculation method of the relation cluster vector may include calculating an average value of all relation vectors in the relation cluster to obtain the relation cluster vector.
S3063: and calculating the sub-relation vector corresponding to the triplet data in the triplet data set.
In this step, subtracting the head entity vector in any triplet data from the tail entity vector in the triplet data may be included to obtain a corresponding sub-relationship vector. The formula is as follows: the sub-relationship vector corresponding to the triplet data "(h, r, t)" is r '=t-h, where r' represents the sub-relationship vector, h represents the head entity vector, and t represents the tail entity vector.
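The formula r' = t - h is a single vector subtraction; a one-line check with illustrative values:

```python
import numpy as np

def sub_relation(h, t):
    # Sub-relation vector of a triple (h, r, t): r' = t - h.
    return t - h

h = np.array([0.25, 0.5])
t = np.array([0.75, 0.625])
print(sub_relation(h, t))  # [0.5   0.125]
```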
S3064: clustering the sub-relationship vectors to obtain at least one sub-relationship cluster.
Similar to step S3061, the sub-relationship cluster may also include at least one sub-relationship vector, which is not described herein.
S3065: and calculating sub-relationship cluster vectors corresponding to the sub-relationship clusters.
This step is similar to step S3062 described above, and will not be described again here.
S3066: and determining a comprehensive relation vector according to the target relation vector, the relation cluster vector and the sub-relation cluster vector, wherein the target relation vector is a relation vector corresponding to the target head entity vector and the target tail entity vector.
In this step, the comprehensive relation vector may be obtained by a weighted sum of the target relation vector, the relation cluster vector and the sub-relation cluster vector. Alternatively, the relation cluster vector and the sub-relation cluster vector closest to the target relation vector may be determined as the target relation cluster vector and the target sub-relation cluster vector, and the sum of the target relation vector, the target relation cluster vector and the target sub-relation cluster vector taken as the comprehensive relation vector.
The weights corresponding to the target relation vector, the relation cluster vector and the sub-relation cluster vector can be preset by a worker according to experimental data or experience parameters.
S3067: and determining a second score according to the target head entity vector, the comprehensive relation vector and the target tail entity vector.
This step is similar to step S305 described above, and will not be described again here.
As can be seen from the above embodiment, the embodiment of the present application clusters the relation vectors into relation clusters and the sub-relation vectors into sub-relation clusters, determines the relation cluster vector and sub-relation cluster vector for each cluster, determines the comprehensive relation vector from the target relation vector, the relation cluster vector and the sub-relation cluster vector, and determines the second score by combining the target head entity vector, the comprehensive relation vector and the target tail entity vector. This evaluates how well the relation vector fits the hierarchical structure, which helps select more suitable triples for subsequent model training.
In a possible implementation manner, in step S3066, determining the integrated relationship vector according to the target relationship vector, the relationship cluster vector and the sub-relationship cluster vector includes:
S601: and determining a target relation cluster vector and a target sub-relation cluster vector corresponding to the target relation vector according to the target relation vector, the relation cluster vector and the sub-relation cluster vector, wherein the target relation vector is a relation vector corresponding to a target head entity vector and a target tail entity vector.
In this step, the method may include calculating a distance between the target relation vector and each relation cluster vector, determining a relation cluster vector with the closest corresponding distance as the target relation cluster vector, and similarly obtaining a target sub-relation cluster vector.
S602: and adding the target relation vector, the target relation cluster vector and the target sub-relation cluster vector to obtain a comprehensive relation vector.
In this step, for example, if the target relation vector is r_y, the target relation cluster vector is r_n and the target sub-relation cluster vector is r_m, then the comprehensive relation vector is r_total = r_y + r_n + r_m.
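Steps S601-S602, together with the nearest-cluster selection they rely on, can be sketched with illustrative vectors (the cluster vectors below are assumptions for the example, not values from the patent):

```python
import numpy as np

def nearest(target, candidates):
    # Return the candidate vector at the smallest L2 distance
    # from the target (the selection rule of S6012/S6014).
    dists = [np.linalg.norm(target - c) for c in candidates]
    return candidates[int(np.argmin(dists))]

r_y = np.array([1.0, 1.0])  # target relation vector
cluster_vecs = [np.array([0.875, 1.125]), np.array([4.0, 4.0])]
sub_cluster_vecs = [np.array([3.0, 0.0]), np.array([1.125, 0.875])]

r_n = nearest(r_y, cluster_vecs)      # target relation cluster vector
r_m = nearest(r_y, sub_cluster_vecs)  # target sub-relation cluster vector
r_total = r_y + r_n + r_m             # comprehensive relation vector
print(r_total)  # [3. 3.]
```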
As can be seen from the above embodiment, the embodiment of the present application finds the target relation cluster vector and the target sub-relation cluster vector corresponding to the target relation vector and adds the three together to obtain the comprehensive relation vector, so that the comprehensive relation vector combines the target relation vector itself with the two additional levels of relational information corresponding to it, increasing the accuracy of the vector.
In a possible implementation manner, in step S601, determining a target relationship cluster vector and a target sub-relationship cluster vector corresponding to the target relationship vector according to the target relationship vector, the relationship cluster vector and the sub-relationship cluster vector includes:
S6011: and calculating a first distance between the target relation vector and each relation cluster vector.
In this step, the first distance between the target relationship vector and each relationship cluster vector may be calculated using a preset program or script; alternatively, the target relation vector and each relation cluster vector may be input into a preset formula to obtain the corresponding first distance.
S6012: and determining the relation cluster vector with the smallest corresponding first distance as a target relation cluster vector.
In this step, for example, there are currently 4 relationship cluster vectors A, B, C, D, and the corresponding distances are 4, 6, 12, and 9, respectively, and then the relationship cluster vector a is determined as the target relationship cluster vector; for example, there are currently 5 relationship cluster vectors E, F, G, H, I, and the corresponding distances are 7, 2, 7, 6, and 3, respectively, and then the relationship cluster vector F is determined as the target relationship cluster vector.
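Steps S6011 and S6012 amount to an argmin over distances. The sketch below is a hypothetical illustration assuming Euclidean distance (the application does not fix the distance measure); the cluster vectors are constructed so that their distances to the target are 4, 6, 12 and 9, matching the A/B/C/D example in this step.

```python
import numpy as np

def nearest_cluster(target, cluster_vectors):
    """Return the name of the cluster vector closest to `target`
    (Euclidean distance; the application does not fix the metric)."""
    return min(cluster_vectors,
               key=lambda name: np.linalg.norm(target - cluster_vectors[name]))

target = np.zeros(2)
clusters = {"A": np.array([4.0, 0.0]),   # distance 4
            "B": np.array([0.0, 6.0]),   # distance 6
            "C": np.array([12.0, 0.0]),  # distance 12
            "D": np.array([0.0, 9.0])}   # distance 9
print(nearest_cluster(target, clusters))  # A
```

The same function applies unchanged to the sub-relation cluster vectors in steps S6013 and S6014.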
S6013: and calculating a second distance between the target relation vector and each sub-relation cluster vector.
This step is similar to step S6011 described above, and will not be described again here.
S6014: and determining the sub-relation cluster vector with the smallest corresponding second distance as a target sub-relation cluster vector.
This step is similar to step S6012 described above, and will not be described again here.
As can be seen from the description of the above embodiment, the embodiment of the present application calculates the distances between the target relationship vector and each relationship cluster vector and each sub-relationship cluster vector, and selects the closest relationship cluster vector and sub-relationship cluster vector. This facilitates the subsequent summation of vectors at different levels and captures the deeper relationships between vectors.
In a possible implementation manner, in step S3062, calculating a relationship cluster vector corresponding to the relationship cluster includes:
S30621: and calculating the average value of the relation vectors corresponding to the relation clusters to obtain the relation cluster vectors.
In this step, the relationship vectors corresponding to the relationship cluster are added, and the sum is divided by the number of relationship vectors in the cluster to obtain the relationship cluster vector.
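The averaging described in this step (sum the member relation vectors, then divide by their count) is simply an element-wise mean; the member vectors below are hypothetical:

```python
import numpy as np

# Hypothetical relation vectors assigned to one relationship cluster.
cluster_members = np.array([[1.0, 2.0],
                            [3.0, 4.0],
                            [5.0, 6.0]])

# Sum the vectors and divide by their number, i.e. the element-wise mean.
cluster_vector = cluster_members.sum(axis=0) / len(cluster_members)
# Equivalently: cluster_members.mean(axis=0)
print(cluster_vector)  # [3. 4.]
```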
As can be seen from the description of the above embodiment, the embodiment of the present application computes the relationship cluster vector so that the relationship cluster closest to the target relationship vector can subsequently be found, which facilitates capturing the deeper relationships between entity vectors.
Fig. 3 is a schematic structural diagram of a map generating apparatus according to an embodiment of the present application. As shown in fig. 3, the map generating apparatus 300 includes: a receiving module 301, an acquiring module 302 and a generating module 303.
The receiving module 301 is configured to receive an image generation instruction sent by the control terminal.
And the acquiring module 302 is configured to acquire a triplet data set corresponding to the image generating instruction.
The generating module 303 is configured to input the triplet data set into a map generation model to obtain a knowledge map output by the map generation model, where the map generation model is obtained by adopting complement map training.
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In one possible implementation, the map generating apparatus 300 further includes: training module 304.
A training module 304, configured to obtain a training triplet data set, where the training triplet data set includes at least one triplet data, and the triplet data includes a head entity vector, a relationship vector, and a tail entity vector; inputting the triplet data into a path reasoning model to obtain each path vector from the target head entity vector to the target tail entity vector and the confidence corresponding to each path vector; if the confidence corresponding to any path vector is greater than a preset confidence threshold, determining that path vector as a candidate vector; calculating the sum of the candidate vectors to obtain a target path vector; calculating a first score according to the target head entity vector, the target tail entity vector and the target path vector; determining a second score corresponding to the target head entity vector and the target tail entity vector by adopting the triplet data set; adding the first score and the second score to obtain a total score; if the total score is greater than a preset score threshold, determining the triplet data corresponding to the target head entity vector and the target tail entity vector as a training triplet; creating a negative example triplet by adopting the triplet data set; and performing model training by adopting at least one training triplet and the negative example triplet to obtain a map generation model.
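The triplet-selection logic of the training module can be sketched as follows. This is an assumption-laden illustration: the first-score formula below is a TransE-style distance chosen for concreteness (the application does not disclose its exact scoring formula), and all thresholds and vectors are hypothetical.

```python
import numpy as np

def select_triplet(head, tail, paths, confidences, second_score,
                   conf_threshold=0.5, score_threshold=-2.0):
    """Return True if the triplet qualifies as a training triplet."""
    # Keep only path vectors whose confidence exceeds the threshold.
    candidates = [p for p, c in zip(paths, confidences) if c > conf_threshold]
    if not candidates:
        return False
    # Sum the candidate vectors to obtain the target path vector.
    target_path = np.sum(candidates, axis=0)
    # First score: an assumed TransE-style closeness of head + path to tail.
    first_score = -np.linalg.norm(head + target_path - tail)
    # Accept the triplet when the total score clears the threshold.
    return bool(first_score + second_score > score_threshold)

head = np.array([0.0, 0.0])
tail = np.array([1.0, 1.0])
paths = [np.array([1.0, 1.0]), np.array([5.0, 5.0])]
print(select_triplet(head, tail, paths, [0.9, 0.1], second_score=0.0))  # True
```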
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In a possible implementation manner, the training module 304 is configured to cluster all the relationship vectors in the triplet data set to obtain at least one relationship cluster, where the relationship cluster corresponds to at least one relationship vector; calculating a relation cluster vector corresponding to the relation cluster; calculating a sub-relation vector corresponding to the triplet data in the triplet data set; clustering the sub-relationship vectors to obtain at least one sub-relationship cluster; calculating a sub-relationship cluster vector corresponding to the sub-relationship cluster; determining a comprehensive relation vector according to the target relation vector, the relation cluster vector and the sub-relation cluster vector, wherein the target relation vector is the relation vector corresponding to the target head entity vector and the target tail entity vector; and determining a second score according to the target head entity vector, the comprehensive relation vector and the target tail entity vector.
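The application does not name a clustering algorithm; a k-means-style sketch of the two clustering steps (assign each relation vector to its nearest centroid, then recompute each centroid as the mean of its members) might look like this, with hypothetical 2-D vectors:

```python
import numpy as np

def kmeans_step(vectors, centroids):
    """One assignment-and-update iteration of a k-means-style clustering.
    Assumes every cluster keeps at least one member."""
    labels = [int(np.argmin([np.linalg.norm(v - c) for c in centroids]))
              for v in vectors]
    new_centroids = [np.mean([v for v, l in zip(vectors, labels) if l == k],
                             axis=0)
                     for k in range(len(centroids))]
    return labels, new_centroids

# Two well-separated hypothetical relation-vector groups.
vectors = [np.array([0.0, 0.1]), np.array([0.1, 0.0]),
           np.array([5.0, 5.1]), np.array([5.1, 5.0])]
labels, centroids = kmeans_step(vectors, [np.zeros(2), np.full(2, 5.0)])
print(labels)  # [0, 0, 1, 1]
```

The recomputed centroid of each cluster is exactly the per-cluster mean described for the relation cluster vector.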
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In one possible implementation manner, the training module 304 is configured to determine a target relationship cluster vector and a target sub-relationship cluster vector corresponding to the target relationship vector according to the target relationship vector, the relationship cluster vector and the sub-relationship cluster vector, where the target relationship vector is a relationship vector corresponding to a target head entity vector and a target tail entity vector; and adding the target relation vector, the target relation cluster vector and the target sub-relation cluster vector to obtain a comprehensive relation vector.
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In one possible implementation, the training module 304 is configured to calculate a first distance between the target relationship vector and each relationship cluster vector; determining a relation cluster vector with the smallest corresponding first distance as a target relation cluster vector; calculating a second distance between the target relation vector and each sub-relation cluster vector; and determining the sub-relation cluster vector with the smallest corresponding second distance as a target sub-relation cluster vector.
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In one possible implementation, the training module 304 is specifically configured to calculate an average value of the relationship vectors corresponding to the relationship clusters, to obtain the relationship cluster vector.
The device provided in this embodiment may be used to implement the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
To implement the above embodiments, an embodiment of the present application further provides an electronic device.
Referring to fig. 4, there is shown a schematic structural diagram of an electronic device 400 suitable for implementing an embodiment of the present application, where the electronic device 400 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (Portable Android Device, PAD), a portable multimedia player (PMP) or a vehicle-mounted terminal (e.g., a car navigation terminal), and a fixed terminal such as a digital TV or a desktop computer. The electronic device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
As shown in fig. 4, the electronic device 400 may include a processor 401 (e.g., a central processing unit or a graphics processor) and a memory 402 communicatively coupled to the processor; the memory may be a read-only memory (ROM). The processor 401 may perform various suitable actions and processes according to a program or computer-executable instructions stored in the memory 402, or a program loaded from a storage device 408 into a random access memory (Random Access Memory, RAM) 403, implementing the … … method in any of the above embodiments. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400. The processor 401, the memory 402, and the RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 408 including, for example, magnetic tape, hard disk, etc.; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate with other devices wirelessly or by wire to exchange data. While fig. 4 shows an electronic device 400 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer means may alternatively be implemented or provided.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communications device 409, or from storage 408, or from memory 402. The above-described functions defined in the method of the embodiment of the present application are performed when the computer program is executed by the processing means 401.
The computer readable storage medium of the present application may be a computer readable signal medium or a computer storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable storage medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer-readable storage medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (Local Area Network, LAN for short) or a wide area network (Wide Area Network, WAN for short), or may be connected to an external computer (e.g., through the internet using an internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented in software or in hardware. Where the names of the units do not constitute a limitation on the module itself in some cases, for example, the training module may also be described as a "model training module".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
The present application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the technical solution of the map generation method in any of the above embodiments. The implementation principle and beneficial effects are similar to those of the map generation method and will not be described herein again.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The application also provides a computer program product, comprising a computer program which, when executed by a processor, implements the technical solution of the map generation method in any of the above embodiments. The implementation principle and beneficial effects are similar to those of the map generation method and will not be described herein again.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in the present application is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A map generation method, comprising:
receiving an image generation instruction sent by a control terminal;
acquiring a triplet data set corresponding to the image generation instruction;
Inputting the triplet data set into a map generation model to obtain a knowledge map output by the map generation model, wherein the map generation model is obtained by adopting complement map training.
2. The method of claim 1, wherein the training with the complement atlas comprises:
Obtaining a training triplet data set, wherein the training triplet data set comprises at least one triplet data, and the triplet data comprises a head entity vector, a relation vector and a tail entity vector;
inputting the triplet data into a path reasoning model to obtain each path vector from a target head entity vector to a target tail entity vector and the confidence corresponding to each path vector;
if the confidence coefficient corresponding to any path vector is larger than a preset confidence coefficient threshold value, determining any path vector as a candidate vector;
Calculating the sum of the candidate vectors to obtain a target path vector;
Calculating a first score according to the target head entity vector, the target tail entity vector and the target path vector;
Determining a second score corresponding to the target head entity vector and the target tail entity vector by adopting the triplet data set;
adding the first score and the second score to obtain a total score;
if the total score is greater than a preset score threshold, determining the triplet data corresponding to the target head entity vector and the target tail entity vector as a training triplet;
Creating a negative example triplet by adopting the triplet data set;
And performing model training by adopting at least one training triplet and the negative example triplet to obtain the map generation model.
3. The method of claim 2, wherein determining the second score corresponding to the target head entity vector and the target tail entity vector using the triplet data set comprises:
Clustering all relation vectors in the triplet data set to obtain at least one relation cluster, wherein the relation cluster corresponds to at least one relation vector;
calculating a relation cluster vector corresponding to the relation cluster;
Calculating a sub-relation vector corresponding to the triplet data in the triplet data set;
Clustering the sub-relationship vectors to obtain at least one sub-relationship cluster;
Calculating a sub-relationship cluster vector corresponding to the sub-relationship cluster;
Determining a comprehensive relationship vector according to a target relationship vector, the relationship cluster vector and the sub-relationship cluster vector, wherein the target relationship vector is a relationship vector corresponding to the target head entity vector and the target tail entity vector;
And determining a second score according to the target head entity vector, the comprehensive relation vector and the target tail entity vector.
4. The method of claim 3, wherein the determining the composite relationship vector from the target relationship vector, the relationship cluster vector, and the sub-relationship cluster vector comprises:
determining a target relation cluster vector and a target sub-relation cluster vector corresponding to the target relation vector according to the target relation vector, the relation cluster vector and the sub-relation cluster vector, wherein the target relation vector is a relation vector corresponding to the target head entity vector and the target tail entity vector;
and adding the target relation vector, the target relation cluster vector and the target sub-relation cluster vector to obtain a comprehensive relation vector.
5. The method of claim 4, wherein determining the target relationship cluster vector and the target sub-relationship cluster vector corresponding to the target relationship vector based on the target relationship vector, the relationship cluster vector and the sub-relationship cluster vector comprises:
calculating a first distance between the target relation vector and each relation cluster vector;
determining a relation cluster vector with the smallest corresponding first distance as a target relation cluster vector;
calculating a second distance between the target relation vector and each sub-relation cluster vector;
and determining the sub-relation cluster vector with the smallest corresponding second distance as a target sub-relation cluster vector.
6. The method of claim 3, wherein the calculating a relationship cluster vector corresponding to the relationship cluster comprises:
and calculating the average value of the relation vectors corresponding to the relation clusters to obtain the relation cluster vectors.
7. A map generating apparatus, comprising:
the receiving module is used for receiving an image generation instruction sent by the control terminal;
the acquisition module is used for acquiring a triplet data set corresponding to the image generation instruction;
the generation module is used for inputting the triplet data set into a map generation model to obtain a knowledge map output by the map generation model, wherein the map generation model is obtained by adopting complement map training.
8. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
The processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 6.
9. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 6.
CN202410139316.2A 2024-01-31 2024-01-31 Knowledge graph generation method, device, equipment, storage medium and program product Pending CN117972109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410139316.2A CN117972109A (en) 2024-01-31 2024-01-31 Knowledge graph generation method, device, equipment, storage medium and program product


Publications (1)

Publication Number Publication Date
CN117972109A true CN117972109A (en) 2024-05-03

Family

ID=90857637


Country Status (1)

Country Link
CN (1) CN117972109A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination