CN114610899A - Representation learning method and system of knowledge graph

Publication number: CN114610899A
Application number: CN202210222332.9A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: sample, nodes, vector, graph, knowledge
Legal status: Pending
Inventors: 巢林林, 王太峰, 褚崴
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210222332.9A
Publication of CN114610899A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists


Abstract

The present disclosure relates to the field of graph data processing, and in particular to a representation learning method and system for a knowledge graph. The method comprises one or more rounds of iterative updating to obtain vector representations of the nodes and/or edges in the knowledge graph, wherein one round of iterative updating comprises: obtaining one or more positive samples based on one or more triples in the knowledge graph; obtaining one or more negative samples based on one or more triples that do not exist in the knowledge graph; determining a predicted value corresponding to each sample based on the vector representations in the sample; determining a loss function value, wherein the loss function value reflects the difference between the predicted value corresponding to each sample and its label value; and adjusting the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements of the dictionary matrix to reduce the difference between the predicted value corresponding to each sample and its label value.

Description

Representation learning method and system of knowledge graph
Technical Field
The present disclosure relates to the field of graph data processing, and in particular, to a method and a system for representation learning of a knowledge graph.
Background
In various scenarios, a large amount of data is stored in the form of graph data (i.e., data comprising nodes and edges). The nodes in the graph data may be commodities, devices, or users, or items such as images, text, or audio data, and the edges represent the relations between the nodes. However, graph data cannot be used directly: before further analysis or prediction work can be performed, the information in the graph data needs to be represented numerically, and the quality of this representation determines how faithfully the graph data is restored.
Therefore, a need exists for an efficient representation of graph data.
Disclosure of Invention
One of the embodiments of the present specification provides a representation learning method for a knowledge graph, the method comprising performing one or more rounds of iterative updating to obtain vector representations of the nodes and/or edges in the knowledge graph, wherein one round of iterative updating comprises: obtaining one or more positive samples based on one or more triples in the knowledge graph; obtaining one or more negative samples based on one or more triples that do not exist in the knowledge graph; wherein a triple comprises two nodes and an edge between the two nodes, the positive and negative samples each comprise the vector representations of the nodes and the vector representation of the edge in the corresponding triple, and the vector representation of a node is generated based on the first vector representation of the node and a dictionary matrix, so that the vector representations of the nodes share the dictionary matrix; determining a predicted value corresponding to each sample based on the vector representations in the sample; and adjusting the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements of the dictionary matrix to reduce the difference between the predicted value corresponding to each sample and its label value.
One of the embodiments of the present specification provides a representation learning system for a knowledge graph, the system being configured to perform one or more rounds of iterative updating to obtain vector representations of the nodes and/or edges in the knowledge graph, wherein in one round of iterative updating: a positive sample acquisition module is used to acquire one or more positive samples based on one or more triples in the knowledge graph; a negative sample acquisition module is used to acquire one or more negative samples based on one or more triples that do not exist in the knowledge graph; wherein a triple comprises two nodes and an edge between the two nodes, the positive and negative samples each comprise the vector representations of the nodes and the vector representation of the edge in the corresponding triple, and the vector representation of a node is generated based on the first vector representation of the node and a dictionary matrix, so that the vector representations of the nodes share the dictionary matrix; a predicted value determining module is used to determine the predicted value corresponding to each sample based on the vector representations in the sample; and an adjusting module is used to adjust the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements of the dictionary matrix so as to reduce the difference between the predicted value corresponding to each sample and its label value.
One of the embodiments of the present specification provides a representation learning apparatus for a knowledge graph, comprising a processor configured to execute the above representation learning method of the knowledge graph.
One of the embodiments of the present specification provides a computer-readable storage medium storing computer instructions that, when read by a computer, cause the computer to execute the above representation learning method of the knowledge graph.
Drawings
The present description will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic illustration of graph data shown in accordance with some embodiments herein;
FIG. 2 is an exemplary flow diagram illustrating a round of iterative updating in a representation learning method of a knowledge-graph in accordance with some embodiments of the present description;
FIG. 3 is a modular schematic diagram of a representation learning system of a knowledge graph according to some embodiments of the present description.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only examples or embodiments of the present description, and that for a person skilled in the art, the present description can also be applied to other similar scenarios on the basis of these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "apparatus", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used in this description to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the operations are not necessarily performed exactly in the order shown; the steps may instead be processed in reverse order or simultaneously. Moreover, other operations may be added to these processes, or one or more steps may be removed from them.
In various scenarios there are various entities (e.g., companies, cities, users, devices, goods, etc.) or information items (e.g., user social accounts, images, text, or audio data). In some embodiments, the entities or information items may come from sources including, but not limited to, the financial industry, the insurance industry, the internet industry, the automotive industry, the catering industry, the telecommunications industry, the energy industry, the entertainment industry, the sports industry, the logistics industry, the medical industry, the security industry, and the like. In some embodiments, entities or information items may have multiple attributes or characteristics, and there may be associations between entities or information items, so graph data, such as a knowledge graph, may be constructed based on multiple entities or information items and the associations between them. The graph data may include a plurality of nodes and edges, wherein a node may correspond to the aforementioned entity or information item, an attribute or feature of an entity or information item may be regarded as an attribute or feature of the corresponding node, and an edge between nodes may represent an association between the two corresponding entities or information items. Depending on the type of entity or information item, the graph data may be a social graph (where nodes correspond to users and edges between nodes represent relationships between users), a device network graph (where nodes correspond to network devices and edges between nodes represent communication relationships between devices), a transfer graph (where nodes correspond to user accounts and edges between nodes represent fund flow relationships between users), and so on.
FIG. 1 is a schematic illustration of graph data shown in accordance with some embodiments of the present description.
FIG. 1 takes the social graph 100 as an example. The graph data is shown as the social graph 100, in which the nodes include Zhang San, Li Si, Wang Wu, and Xiao Li.
In some embodiments, edges may be determined based on relationships between nodes; an edge may represent either the existence of a relationship or a specific relationship. Illustratively, continuing with the social graph 100 as an example, Zhang San, Li Si, and Wang Wu are co-workers, so edges representing the co-worker relationship may be established pairwise between Zhang San, Li Si, and Wang Wu; Li Si and Xiao Li are in a parent-child relationship, so an edge representing the parent-child relationship is established between Li Si and Xiao Li. In some embodiments, an edge may also have a direction to represent a one-way relationship from the node the arrow points away from (the source node) to the node the arrow points to (the target node). Continuing with the foregoing example, the edge between Li Si and Xiao Li may point from Li Si to Xiao Li to indicate that Li Si is Xiao Li's father. In some embodiments, the edges may be processed numerically; for example, when an edge is used to indicate the existence of a relationship, it may take the value 0 or 1, and in some embodiments an edge may also take a value between 0 and 1 to indicate the strength of the relationship. In some embodiments, when edges are used to represent specific relationships, a parent-child relationship may be represented by "1" and a co-worker relationship by "2". In some embodiments, the type of an edge may be represented by any other encoding capable of converting a non-numerical value into a numerical value, and this specification imposes no limitation in this respect.
In some embodiments, a node may also include a plurality of attributes or characteristics. Continuing with the social graph 100 as an example, the Li Si node may include attribute or characteristic information such as age, job title, credit information, or company name; the Xiao Li node may include attribute or characteristic information such as age and school name. In some embodiments, attributes or characteristics of a node may be processed numerically; for example, "0" may indicate good credit and "1" may indicate credit risk. In some embodiments, the way attributes and characteristics are quantified is similar to that of edge types and may be any other encoding capable of converting non-numerical values into numerical values; this specification imposes no limitation in this respect.
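For illustration only, the Python sketch below shows one way the social graph 100 could be stored as numerically encoded triples, using "1" for the parent-child relationship and "2" for the co-worker relationship as described above; the node identifiers and relation codes are illustrative assumptions, not part of the embodiments.

```python
# Hypothetical sketch: social graph 100 stored as numerically encoded triples.
nodes = {"Zhang San": 0, "Li Si": 1, "Wang Wu": 2, "Xiao Li": 3}
relations = {"parent-child": 1, "co-worker": 2}

# Each triple is (source node id, relation id, target node id).
triples = [
    (nodes["Zhang San"], relations["co-worker"], nodes["Li Si"]),
    (nodes["Zhang San"], relations["co-worker"], nodes["Wang Wu"]),
    (nodes["Li Si"],     relations["co-worker"], nodes["Wang Wu"]),
    (nodes["Li Si"],     relations["parent-child"], nodes["Xiao Li"]),  # Li Si is Xiao Li's father
]
```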
Nodes and/or edges in the graph data are encoded or represented so that a computing device can further process the graph data; this process is also referred to as vectorized representation or embedding of the graph data. For example, initialized embedding vectors of the nodes and edges may be processed by a machine learning model (such as a graph convolutional neural network, GCN), a loss function may be constructed from a set prediction task, the parameters of the model may be adjusted based on the loss function, and the initialized embedding vectors of the nodes and edges may then be processed by the model based on the adjusted parameters to obtain the trained or learned vector representations of the nodes and edges. The vector representations of the nodes and edges obtained by representation learning can reflect the original information of the graph data to a certain extent and can serve as representatives of the nodes or edges in subsequent graph data computations.
In some embodiments, a larger amount of data in the graph leads to a larger number of model parameters. Generally, the number of elements in the vector representations corresponding to the nodes is the product of the number of nodes and the dimensionality of a node's vector representation, and the dimensionality of the node vector representations is positively correlated with the number of nodes, i.e., the scale of the graph data. Therefore, when the number of nodes is large, a huge number of vector-representation elements need to be trained or learned, which creates a bottleneck in computing or storage resources.
In view of the above, some embodiments provide a representation learning method for graphs that introduces a dictionary learning approach, which can effectively reduce the number of parameters while ensuring representation quality.
Some embodiments of the present disclosure provide a representation learning method for a knowledge graph that performs one or more rounds of iterative updating on the vector representations of the nodes and/or edges to obtain the vector representations of the nodes and edges in the knowledge graph. It can be understood that, as the changes in the elements of the vector representations between iterations become smaller and smaller, the vector representation results of the nodes and edges in the knowledge graph tend to stabilize or converge. Accordingly, some embodiments of the present description provide representation learning methods of a knowledge graph that include one or more iterative updates. It should be noted that the term "vector" in this specification is interchangeable with "embedding vector."
FIG. 2 is an exemplary flow diagram illustrating a round of iterative updating in a representation learning method of a knowledge-graph according to some embodiments of the present description.
In some embodiments, the process 200 of one of the one or more iterative updates may include the steps of:
step 210, one or more positive samples are obtained based on one or more triples in the knowledge-graph. In some embodiments, step 210 may be performed by the positive sample acquisition module 310.
In some embodiments, a triple includes two nodes and an edge between the nodes. In some embodiments, the triples may be extracted from the knowledge graph by the positive sample acquisition module 310. The triple corresponding to a positive sample actually exists in the knowledge graph; specifically, the two nodes, the edge, and the combination of the three in the triple corresponding to the positive sample all actually exist in the knowledge graph. Taking the social graph in FIG. 1 as an example, such a triple may be (Li Si, parent-child, Xiao Li).
In some embodiments, the positive samples obtained based on the triples include vector representations of nodes in the corresponding triples and vector representations of edges. Illustratively, a positive sample (h, r, t) may be obtained by representing the vector representation of two nodes in a triplet by h and t, respectively, and representing the vector representation of an edge between the nodes by r.
In some embodiments, the vector representation of a node is generated based on the first vector representation of the node and a dictionary matrix, so that the dictionary matrix is shared by the vector representations of the nodes. Any matrix can be regarded as comprising a plurality of row vectors, and the row vectors of the dictionary matrix can be regarded as the basic vectors composing it, also called atom vectors. In some embodiments, each node has a corresponding first vector representation, and an operation between the first vector representation of the node and the dictionary matrix, such as matrix multiplication, is equivalent to linearly combining the atom vectors of the dictionary matrix using the first vector representation of the node, thereby obtaining the vector representation of the node. With the dictionary matrix, the vector representations of all nodes can share a portion of the parameters, so that the number of parameters is effectively reduced without reducing the amount of information. In some embodiments, the length or dimension of an atom vector in the dictionary matrix may be a, and the number of atom vectors in the dictionary matrix may be d; in this case, the dimension of the dictionary matrix is d × a. In some embodiments, the vector representations of all nodes can be expressed by the nodes' first vectors and the dictionary matrix; the length of a node's first vector can be set to d, i.e., 1 × d, and since the dimension of the dictionary matrix is d × a, the dimension of the node's vector representation is 1 × a.
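As a non-limiting illustration of the parameter sharing described above, the following Python sketch (using NumPy; all sizes are assumed values, not prescribed by the embodiments) builds a node's vector representation by multiplying its first vector representation (1 × d) with a shared dictionary matrix (d × a), and compares the resulting parameter count with that of independent node embeddings.

```python
import numpy as np

# Illustrative sketch, not the claimed implementation. All sizes are assumed values.
num_nodes = 10_000
d, a = 32, 256                                   # number of atom vectors, atom vector dimension

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(d, a))             # shared dictionary matrix, d x a
first_vectors = rng.normal(size=(num_nodes, d))  # one first vector representation (1 x d) per node

# Vector representation of node k: (1 x d) @ (d x a) -> (1 x a),
# i.e. a linear combination of the d atom vectors weighted by the node's first vector.
k = 3
node_vec = first_vectors[k] @ dictionary
print(node_vec.shape)                            # (256,)

# Parameter comparison: independent 1 x a embeddings vs. shared dictionary.
independent_params = num_nodes * a               # 2,560,000 elements to learn
dictionary_params = num_nodes * d + d * a        # 328,192 elements to learn
print(independent_params, dictionary_params)
```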
In some embodiments, the number d of atom vectors in the dictionary matrix and the dimension a of the atom vectors may be preset, for example before the one or more rounds of iteration. Generally, the larger the number of atom vectors and the longer the atom vectors, the more information the vector representation of a node can carry and the more accurate the representation learning; at the same time, however, the increased number of elements increases the computation or storage cost. In some embodiments, the result of representation learning may be evaluated, and the number d of atom vectors and/or the length of the atom vectors in the dictionary matrix may be adjusted according to the evaluation result to find a better configuration, so as to reduce the scale of elements or parameters as much as possible while ensuring the representation quality. Specific reference may be made to the description below.
In some embodiments, the vector representations of the nodes and the vector representations of the edges may be derived based on the first vector representation of the nodes, the vector representations of the edges, and the dictionary matrix derived from a previous iteration. In some embodiments, when the current iteration update is the first iteration update (i.e., the first iteration update), the first vector representation of the node, the vector representation of the edge, and the dictionary matrix may be obtained by random initialization.
In some embodiments, when the knowledge graph also contains a large number of edges, the vector representation of an edge may likewise be generated based on a first vector of the edge and a dictionary matrix corresponding to the edges. The specific implementation is similar to that of the node vector representations and is not repeated here. The dictionary matrix corresponding to the nodes and the dictionary matrix corresponding to the edges may be different or the same.
Step 220, one or more negative examples are obtained based on one or more triples that do not exist in the knowledge-graph. In some embodiments, step 220 may be performed by negative example acquisition module 320.
In some embodiments, a negative sample is also determined based on one or more triples, the negative sample comprising the vector representations of the nodes and the vector representation of the edge in the corresponding triple. Unlike positive samples, the triples used to determine negative samples are triples that are not present in the knowledge graph. In some embodiments, a triple that is not present in the knowledge graph may mean that at least some of the elements in the triple are not present in the knowledge graph. Illustratively, the node h in the triple (h, r, t) of step 210 may be replaced by h', or the node t may be replaced by t'; in some embodiments, the relation r between the two nodes may be replaced by r', where h', t', and r' may be nodes or edges that do not exist in the knowledge graph. It should be noted that, in some embodiments, one, two, or all of the elements in the triple (h, r, t) may be replaced to obtain a triple for constructing a negative sample, for example the triple (h', r, t').
In some embodiments, a triple that is not present in the knowledge graph may instead mean that both the nodes and the edge in the triple come from the knowledge graph, but the combination of the three is not in the knowledge graph. Continuing with the triple (h', r, t') as an example, h' and t' differ from the original nodes h and t but come from the same knowledge graph. If the triple used to obtain the positive sample comes from the social graph in FIG. 1 and the triple (h, r, t) is (Li Si, parent-child, Xiao Li), then the triple (h', r, t') used to obtain a negative sample may be (Zhang San, parent-child, Wang Wu). In some embodiments, by restricting the elements of the triples corresponding to negative samples to the knowledge graph, the range of candidate nodes and edges can be narrowed, thereby speeding up the convergence of the parameters or element values.
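The following Python sketch illustrates one possible way to obtain such negative-sample triples by replacing the head, tail, or edge of a positive triple with another element from the same knowledge graph; the function name and arguments are illustrative assumptions, not the claimed implementation.

```python
import random

def corrupt_triple(triple, node_ids, relation_ids, existing, rng=random):
    """Replace the head, the tail, or the edge of a positive triple with another
    element from the same knowledge graph, and keep the result only if the new
    combination does not already exist in the graph (i.e., it is a valid negative)."""
    h, r, t = triple
    while True:
        which = rng.choice(("head", "relation", "tail"))
        if which == "head":
            candidate = (rng.choice(node_ids), r, t)
        elif which == "tail":
            candidate = (h, r, rng.choice(node_ids))
        else:
            candidate = (h, rng.choice(relation_ids), t)
        if candidate not in existing:
            return candidate

# Hypothetical usage with the earlier illustrative triples:
# negatives = [corrupt_triple(tr, list(nodes.values()), list(relations.values()), set(triples))
#              for tr in triples]
```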
Step 230, determining a prediction value corresponding to each sample based on the vector representation in each sample. In some embodiments, step 230 may be performed by the predictive value determination module 330.
In some embodiments, the prediction value corresponding to each sample may represent a probability that the triplet corresponding to the sample is true in the current round.
In some embodiments, for any sample, the vector representation of each node and the vector representation of the edge in the sample may be processed by a representation learning algorithm to obtain a score for the sample; the score reflects the probability that the sample is a positive sample. For example, the smaller the score, the larger the predicted value and the greater the probability that the sample is a positive sample. In some embodiments, the representation learning algorithm may be the TransE algorithm, the TransH algorithm, or the like. As an example, the representation learning algorithm may specifically include: summing the vector representation of the first node in the sample with the vector representation of the edge to obtain a sum vector; and calculating the similarity between the sum vector and the vector representation of the second node, the score being determined based on the similarity. Continuing with the triple (h, r, t) as an example, the vector representations of the node h and the edge r are summed to obtain the sum vector h + r, the similarity between the sum vector h + r and t is calculated, and the score or the predicted value is determined based on the similarity. For example, the greater the similarity, the smaller the score or the larger the predicted value, and the greater the likelihood that the sample is a positive sample. In some embodiments, the similarity between the sum vector and the second node's vector may be obtained by calculating the distance between the two vectors, where the distance is inversely related to the similarity, i.e., the greater the distance, the smaller the similarity. In some embodiments, the distance may include, but is not limited to, a cosine distance, a Euclidean distance, a Manhattan distance, a Mahalanobis distance, or a Minkowski distance. In some other embodiments, the similarity may be obtained in other ways; for example, the two vectors may be subtracted element-wise to obtain a difference vector, and the similarity may be characterized by a norm of the difference vector, where the similarity is negatively correlated with the norm, i.e., the larger the norm, the smaller the similarity.
In some embodiments, f_r(·) denotes the scoring function of the representation learning algorithm. Illustratively, the score of the triple (h, r, t) may be expressed as f_r(h, t) = ||h + r - t||_1, where ||·||_1 denotes the 1-norm; the smaller the score, the greater the probability that the triple is true.
In some embodiments, the first node may be a source node and the second node may be a destination node, with the edges having directions as described in fig. 1.
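As an illustration of the TransE-style scoring described above, the sketch below computes f_r(h, t) = ||h + r - t|| with NumPy; the 1-norm is used by default, which is an assumed choice, and any of the other distances mentioned above could be substituted.

```python
import numpy as np

def transe_score(h_vec, r_vec, t_vec, norm=1):
    """Score f_r(h, t) = ||h + r - t||: the smaller the score, the larger the
    predicted value, i.e., the more likely the triple is a positive sample."""
    return np.linalg.norm(h_vec + r_vec - t_vec, ord=norm)
```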
Step 240, adjusting the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements of the dictionary matrix to reduce the difference between the predicted value corresponding to each sample and its label value. In some embodiments, step 240 may be performed by the adjustment module 340.
In some embodiments, step 240 may further comprise determining a loss function value.
In some embodiments, the label value of a positive sample may be 1 and the label value of a negative sample may be 0, and the loss function value may be expressed as a function of the difference between the predicted value of each sample and its label value, where T denotes the knowledge graph, (h, r, t) ∈ T denotes a positive sample, T' denotes the set of triples that are not in the knowledge graph and are used as negative samples, and ŷ denotes the predicted value.
With reference to the foregoing example, f_r(·) denotes the scoring function of the representation learning algorithm, and the smaller the score, the larger the predicted value and the greater the probability that the sample is a positive sample. Therefore, the loss function value can be simply expressed as

    Σ_{(h,r,t)∈T} f_r(h, t) - Σ_{(h',r',t')∈T'} f_{r'}(h', t'),

which is equivalent to an expression in terms of predicted values and label values and likewise reflects the difference between the predicted value corresponding to each sample and its label value.
In some embodiments, the loss function value includes a first part and a second part: the first part reflects the difference between the predicted value of each sample and its label value and can be implemented by the formula above, and the second part represents a regularization constraint calculated based on the first vector representations of the nodes; the second part is optional.
In some embodiments, the second part of the loss function value reflects the sum of the elements of the first vector representations of the nodes in the samples. Illustratively, the second part of the loss function value may be expressed as

    Σ_{k∈{h,t}} Σ_i w_k^(i),

where {h, t} denotes the set of nodes in the positive and negative samples used in the current round of iteration, w_k^(i) denotes the i-th element of the first vector representation of node k, and the inner sum runs over all elements of that first vector representation.
Based on the first and second parts of the loss function value described above, in some embodiments the loss function value L may be expressed as

    L = Σ_{(h,r,t)∈T} Σ_{(h',r',t')∈T'} max(0, γ + f_r(h, t) - f_{r'}(h', t')) + α · Σ_{k∈{h,t}} Σ_i w_k^(i),

where γ is an interval parameter used to adjust the separation between the positive and negative samples, and α is a weight parameter representing the influence of the second part of the loss function value on the overall loss function value; the interval parameter and the weight parameter are both constants. In some embodiments, the interval parameter and/or the weight parameter may be omitted.
In some embodiments, reducing the difference between the predicted value corresponding to each sample and its label value means adjusting the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements of the dictionary matrix so as to reduce the loss function value described above. In some embodiments, the loss function value may be minimized by gradient descent. Minimizing the first part of the loss function value makes the vector representations of the nodes and/or edges more accurate, and minimizing the second part prevents overfitting; for example, it encourages the vector representation of a node to use as few atom vectors of the dictionary matrix as possible, so that the node's vector representation is expressed by the most relevant atom vectors.
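A minimal sketch of the loss computation follows, assuming the margin-based first part and the element-sum second part described above (written here with absolute values as an L1-style penalty, which is an assumption); the parameters would then be adjusted, for example by gradient descent, to reduce this value.

```python
import numpy as np

def loss_value(pos_scores, neg_scores, first_vectors, gamma=1.0, alpha=1e-3):
    """Assumed form of the loss described above: a margin-based first part comparing
    every positive score with every negative score, plus a second part summing the
    (absolute) elements of the first vector representations used in this round."""
    pos = np.asarray(pos_scores)[:, None]        # shape (P, 1)
    neg = np.asarray(neg_scores)[None, :]        # shape (1, N)
    first_part = np.maximum(0.0, gamma + pos - neg).sum()
    second_part = np.abs(np.asarray(first_vectors)).sum()
    return first_part + alpha * second_part
```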
In some embodiments, after the current round of iterative updating is completed, the current number of iteration rounds may be obtained and compared with a set threshold. If the number of rounds is smaller than the threshold, the next round of iterative updating is performed, and the parameters of the current round are used as the starting parameters of the next round. The set threshold may be a preset positive integer representing an upper limit on the number of iterations (for example, 5, 10, or 100). When the number of iteration rounds is not smaller than the set threshold, the iteration may be terminated, and the current vector representations of the nodes and edges are used as the final vector representations of the nodes and edges in the knowledge graph.
In some embodiments, after the current round of iterative updating is completed, it may also be determined whether the current loss function value, or the difference between the loss function values of two adjacent rounds, is smaller than a preset threshold. When the loss function value is smaller than the preset threshold, or the difference between the loss function values of two adjacent rounds is smaller than the preset threshold, the iteration may be terminated, and the current vector representations of the nodes and edges are used as the final vector representations of the nodes and edges in the knowledge graph.
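The sketch below shows one way the two stopping criteria described above (a maximum number of rounds and a small change in the loss between adjacent rounds) could be combined; step_fn is a hypothetical callable that performs one round of iterative updating and returns the loss value of that round.

```python
def run_training(step_fn, max_rounds=100, tol=1e-4):
    """Stop when the round count reaches max_rounds or when the loss changes by
    less than tol between two adjacent rounds (assumed convergence criteria)."""
    prev_loss = None
    loss = None
    for round_idx in range(max_rounds):
        loss = step_fn()                         # one round of iterative updating
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break                                # convergence criterion met
        prev_loss = loss
    return loss
```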
In some embodiments, since the number of elements in the dictionary matrix, such as the number d of atom vectors and/or the dimension a of the atom vectors, is preset before training, and its value may affect the vector representation quality and the parameter count of the nodes and edges, the process 200 may further include:
and step 250, adjusting the number of elements in the dictionary matrix, and performing one or more rounds of iterative updating again to obtain another set of vector representations of the nodes and/or edges in the knowledge graph.
Taking the adjustment of the number of atom vectors as an example: in some embodiments, after the vector representations of the nodes and edges in the final knowledge graph are obtained with the number d of atom vectors preset for the current round(s) of iterative updating, the representation quality of the nodes and edges is determined through experiments or evaluation (such as a Mean Reciprocal Rank (MRR) test), the value of the number d of atom vectors in the dictionary matrix is adjusted according to the results, and the process 200 is then executed again with the new number d of atom vectors for one or more new rounds of training, obtaining new vector representations of the nodes and edges in the knowledge graph, which are again tested or evaluated. This cycle is repeated one or more times until an optimal number d of atom vectors in the dictionary matrix is selected, so that the vector representation quality of the nodes and edges in the knowledge graph is ensured while the number of parameters is effectively reduced.
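The following sketch illustrates the outer loop described above for adjusting the number d of atom vectors; train_fn and evaluate_mrr are hypothetical callables standing in for one or more rounds of iterative updating with d atom vectors and for an MRR evaluation, and the 1% tolerance is an assumed selection rule, not part of the embodiments.

```python
def select_atom_count(candidate_ds, train_fn, evaluate_mrr):
    """Re-run representation learning for several candidate values of d and pick
    a setting that balances representation quality against parameter count."""
    results = {}
    for d in candidate_ds:
        model = train_fn(d)                      # vector representations learned with this d
        results[d] = evaluate_mrr(model)         # e.g., Mean Reciprocal Rank on held-out triples
    best_mrr = max(results.values())
    # Assumed rule: smallest d whose MRR is within 1% of the best observed MRR,
    # trading a little quality for a large reduction in parameter count.
    return min(d for d, mrr in results.items() if mrr >= 0.99 * best_mrr)
```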
It should be noted that the above description related to the flow 200 is only for illustration and description, and does not limit the applicable scope of the present specification. Various modifications and alterations to flow 200 will be apparent to those skilled in the art in light of this description. However, such modifications and variations are intended to be within the scope of the present description.
FIG. 3 is a modular schematic diagram of a representation learning system of a knowledge graph according to some embodiments of the present description.
In some embodiments, the representation learning system 300 of the knowledge-graph includes a positive sample acquisition module 310, a negative sample acquisition module 320, a predictor determination module 330, and an adjustment module 340. These modules may also be implemented as an application or a set of instructions that are read and executed by a processing engine. Further, a module may be any combination of hardware circuitry and applications/instructions. For example, a module may be part of a processor when a processing engine or processor executes an application/set of instructions.
A positive sample obtaining module 310 for obtaining one or more positive samples based on the one or more triples in the knowledge-graph; wherein a triplet comprises two nodes and an edge between the nodes, and the positive sample comprises a vector representation of a node in the corresponding triplet and a vector representation of an edge.
Further description of the positive samples can be found elsewhere in this specification (e.g., in step 210 and its related description), and will not be repeated herein.
The negative examples obtaining module 320 is configured to obtain one or more negative examples based on one or more triples that do not exist in the knowledge-graph.
More details about the negative examples can be found elsewhere in this specification (e.g., in step 220 and its related description), and are not repeated herein.
The predictor determining module 330 is configured to determine a predictor corresponding to each sample based on the vector representation in each sample.
Further description of the predicted values can be found elsewhere in this specification (e.g., in step 230 and its related description), and will not be described herein.
The adjusting module 340 is configured to adjust the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements of the dictionary matrix to reduce the difference between the predicted value corresponding to each sample and its label value.
More details about the adjustment vector representations and elements can be found elsewhere in this specification (e.g., in step 240 and its related description), and are not repeated herein.
It should be understood that the system and its modules shown in FIG. 3 may be implemented in a variety of ways. For example, in some embodiments, an apparatus and its modules may be implemented by hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may then be stored in a memory for execution by a suitable instruction execution device, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and apparatus described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided for example on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware) or a data carrier such as an optical or electronic signal carrier. The apparatus and modules thereof in this specification may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the knowledge graph representation learning system and its modules is for convenience of description only and does not limit the specification to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given an understanding of the principle of the system, modules may be combined arbitrarily or connected into sub-systems with other modules without departing from this principle. For example, in some embodiments the positive sample acquisition module 310 and the negative sample acquisition module 320 disclosed in FIG. 3 may be different modules in one system, or a single module may implement the functions of two or more of the modules described above. As another example, the modules may share one storage module, or each module may have its own storage module. Such variations are within the scope of the present disclosure.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The beneficial effects that may be brought by the embodiments of the present description include, but are not limited to: (1) by introducing the dictionary matrix, the parameter quantity in the training process is reduced, and the requirements on computing resources and storage resources are reduced; (2) the same dictionary matrix is used for representing each node, so that the nodes with fewer related edges in the graph can share part of parameters with other nodes, and better vector representation quality is obtained.
It is to be noted that different embodiments may produce different advantages, and in different embodiments, any one or combination of the above advantages may be produced, or any other advantages may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be regarded as illustrative only and not as limiting the present specification. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, though not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present specification and thus fall within the spirit and scope of the exemplary embodiments of the present specification.
Also, the description uses specific words to describe embodiments of the description. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the foregoing description of the embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, the claimed embodiments may have fewer than all of the features of a single embodiment disclosed above.
Some embodiments use numbers to describe quantities of components, attributes, and the like; it should be understood that such numbers used in describing the embodiments are, in some instances, qualified by the modifiers "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments the numerical parameters used in the specification and claims are approximations that may vary depending on the desired properties of the individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments are approximations, in specific examples such numerical values are set forth as precisely as practicable.
For each patent, patent application, patent application publication, and other material cited in this specification, such as articles, books, specifications, publications, and documents, the entire contents are hereby incorporated by reference into this specification. Application history documents that are inconsistent with or conflict with the contents of this specification are excluded, as are documents that limit the broadest scope of the claims of this specification (whether currently appended or later added). It should be noted that, if there is any inconsistency or conflict between the descriptions, definitions, and/or use of terms in the materials accompanying this specification and the contents of this specification, the descriptions, definitions, and/or use of terms in this specification shall prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (13)

1. A method of representation learning of a knowledge-graph, the method comprising performing one or more iterative updates to obtain vector representations of nodes and/or edges in the knowledge-graph, wherein an iterative update comprises:
obtaining one or more positive samples based on one or more triples in the knowledge-graph; obtaining one or more negative samples based on one or more triples that do not exist in the knowledge-graph; wherein a triple comprises two nodes and an edge between the two nodes, the positive and negative samples each comprise the vector representations of the nodes and the vector representation of the edge in the corresponding triple, and the vector representation of a node is generated based on the first vector representation of the node and a dictionary matrix, so that the vector representations of the nodes share the dictionary matrix;
determining a predicted value corresponding to each sample based on the vector representation in each sample;
and adjusting the first vector representation of the nodes in the samples, the vector representation of the edges and the elements in the dictionary matrix to reduce the difference between the predicted value corresponding to each sample and the label value thereof.
2. The method of claim 1, wherein one or more triples not present in the knowledge-graph means that the nodes and the edge in such a triple are from the knowledge-graph, but the combination of the three is not in the knowledge-graph.
3. The method of claim 1, wherein adjusting the first vector representations of the nodes in the samples, the vector representations of the edges, and the elements in the dictionary matrix to reduce the difference between the predicted value corresponding to each sample and its label value comprises:
determining a loss function value; the loss function value comprises a first part and a second part, wherein the first part reflects the difference between the predicted value corresponding to each sample and the label value thereof, and the second part represents the regularization constraint obtained by calculation based on the first vector of the node;
adjusting the first vector representation of the nodes in the sample, the vector representation of the edges, and the elements in the dictionary matrix to reduce the loss function values.
4. The method of claim 3, the second portion reflecting a sum of values of elements of the first vector representation of nodes in each sample.
5. The method of claim 1 or 4, wherein determining the predicted value corresponding to each sample based on the vector representations in each sample comprises, for any sample:
processing the vector representation of each node and the vector representation of the edge in the sample by using a representation learning algorithm to obtain a score of the sample; wherein the smaller the score, the greater the probability that the sample is a positive sample.
6. The method of claim 5, the representation learning algorithm comprising:
summing the vector representation of the first node in the sample with the vector representation of the edge to obtain a sum vector;
calculating the similarity of the sum vector and the vector representation of the second node, and determining the score based on the similarity.
7. The method of claim 6, wherein the score is positively correlated with the distance between the sum vector and the vector representation of the second node; and the first part is the sum of the scores of the positive samples minus the sum of the scores of the negative samples.
8. The method of claim 6, wherein the first node is a source node and the second node is a destination node.
9. The method of claim 1, further comprising adjusting a number of elements in the dictionary matrix and repeating the one or more iterative updates to obtain another set of vector representations of nodes and/or edges in the knowledge-graph.
10. The method of claim 1, wherein when the one iteration update is the first iteration update, the first vector representation of the node, the vector representation of the edge, and the dictionary matrix are all obtained by random initialization.
11. A representation learning system for a knowledge-graph, the system being configured to perform one or more iterative updates to obtain vector representations of nodes and/or edges in the knowledge-graph, wherein in one of the iterative updates:
a positive sample obtaining module for obtaining one or more positive samples based on the one or more triples in the knowledge-graph;
the negative sample acquisition module is used for acquiring one or more negative samples based on one or more triples which do not exist in the knowledge graph;
wherein a triple comprises two nodes and an edge between the two nodes, the positive and negative samples each comprise the vector representations of the nodes and the vector representation of the edge in the corresponding triple, and the vector representation of a node is generated based on the first vector representation of the node and a dictionary matrix, so that the vector representations of the nodes share the dictionary matrix;
the predicted value determining module is used for determining the predicted value corresponding to each sample based on the vector representation in each sample;
and the adjusting module is used for adjusting the first vector representation of the node in the sample, the vector representation of the edge and the element in the dictionary matrix so as to reduce the difference between the predicted value corresponding to each sample and the label value thereof.
12. A knowledge-graph representation learning apparatus comprising a processor, wherein the processor is configured to perform the knowledge-graph representation learning method according to any one of claims 1 to 10.
13. A computer readable storage medium storing computer instructions which, when read by a computer, cause the computer to perform a method of representation learning of a knowledge-graph as claimed in any one of claims 1 to 10.
CN202210222332.9A 2022-03-07 2022-03-07 Representation learning method and system of knowledge graph Pending CN114610899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210222332.9A CN114610899A (en) 2022-03-07 2022-03-07 Representation learning method and system of knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210222332.9A CN114610899A (en) 2022-03-07 2022-03-07 Representation learning method and system of knowledge graph

Publications (1)

Publication Number Publication Date
CN114610899A true CN114610899A (en) 2022-06-10

Family

ID=81861732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210222332.9A Pending CN114610899A (en) 2022-03-07 2022-03-07 Representation learning method and system of knowledge graph

Country Status (1)

Country Link
CN (1) CN114610899A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115600226A (en) * 2022-10-13 2023-01-13 厦门智康力奇数字科技有限公司(Cn) Method for encrypting warehouse pledge data



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination