CN114860886A

CN114860886A - Method for generating relation graph and method and device for determining matching relation

Info

Publication number: CN114860886A
Application number: CN202210583988.3A
Authority: CN
Inventors: 张正东; 陈俊; 代小亚; 黄海峰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-05-25
Filing date: 2022-05-25
Publication date: 2022-08-05
Anticipated expiration: 2042-05-25
Also published as: CN114860886B

Abstract

The disclosure provides a method for generating a relation graph and a method and a device for determining a matching relation, and relates to the field of artificial intelligence, in particular to the fields of deep learning, intelligent healthcare and knowledge graphs. The generation method is implemented as follows: for each class of objects among a first class of objects and a second class of objects, determining similarity information for that class of objects according to a predetermined association relationship among a plurality of objects belonging to that class, the similarity information indicating the similarity between the plurality of objects; determining association information between the first class of objects and the second class of objects according to a predetermined association relationship between a plurality of first objects belonging to the first class of objects and a plurality of second objects belonging to the second class of objects; determining initial adjacency information according to the similarity information for the first class of objects, the similarity information for the second class of objects and the association information; determining initial embedding information according to the association information; and generating a relation graph using a graph neural network according to the initial adjacency information and the initial embedding information.

Description

Method for generating relation graph and method and device for determining matching relation

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the fields of deep learning, intelligent healthcare and knowledge graphs, and more particularly, to a method for generating a relationship graph and a method and an apparatus for determining a matching relationship.

Background

With the development of computer technology and network technology, deep learning technology has been widely used in many fields. For example, deep learning techniques may be employed to extract text features, to predict associations between objects, and so on. In the prediction of the association between the objects, the accuracy of the prediction result depends on the completeness of the known knowledge.

Disclosure of Invention

The present disclosure is directed to a method for generating a relationship graph, a method for determining a matching relationship, an apparatus, an electronic device, and a storage medium, and aims to complete existing relationships and improve the accuracy of matching results.

According to an aspect of the present disclosure, there is provided a method of generating a relationship graph, including: for each class of objects among a first class of objects and a second class of objects, determining similarity information for that class of objects according to a predetermined association relationship among a plurality of objects belonging to that class, the similarity information indicating the degree of similarity between the plurality of objects; determining association information between a plurality of first objects belonging to the first class of objects and a plurality of second objects belonging to the second class of objects according to a predetermined association relationship between the first objects and the second objects; determining initial adjacency information for a relationship graph according to the similarity information for the first class of objects, the similarity information for the second class of objects and the association information; determining initial embedding information for the relationship graph according to the association information; and generating the relationship graph using a graph neural network according to the initial adjacency information and the initial embedding information, wherein the relationship graph indicates the association relationship between any two objects in an object set consisting of the plurality of first objects and the plurality of second objects.

According to an aspect of the present disclosure, there is provided a method of determining a matching relationship, including: for a first object in an object pair to be matched, querying a predetermined relationship graph according to the first object to obtain a first target object associated with the first object; for a second object in the object pair, determining a second target object associated with the second object according to the association relationship among a plurality of objects in the object class to which the second object belongs; for each of the first object and the second object, determining feature information for that object according to the target object of that object; and determining a matching relationship between the first object and the second object according to the feature information for the first object and the feature information for the second object, wherein the object class to which the first target object belongs includes the object class to which the second object belongs, and the predetermined relationship graph is a relationship graph generated by the method of generating a relationship graph provided by the present disclosure.

According to an aspect of the present disclosure, there is provided an apparatus for generating a relationship graph, including: a similarity determination module configured to determine, for each class of objects among a first class of objects and a second class of objects, similarity information for that class of objects according to a predetermined association relationship among a plurality of objects belonging to that class, the similarity information indicating the similarity between the plurality of objects; an association information determination module configured to determine association information between the first class of objects and the second class of objects according to a predetermined association relationship between a plurality of first objects belonging to the first class of objects and a plurality of second objects belonging to the second class of objects; an adjacency information determination module configured to determine initial adjacency information for the relationship graph according to the similarity information for the first class of objects, the similarity information for the second class of objects and the association information; an embedding information determination module configured to determine initial embedding information for the relationship graph according to the association information; and a graph generation module configured to generate the relationship graph using a graph neural network according to the initial adjacency information and the initial embedding information, wherein the relationship graph indicates the association relationship between any two objects in an object set consisting of the plurality of first objects and the plurality of second objects.

According to an aspect of the present disclosure, there is provided an apparatus for determining a matching relationship, including: a graph query module configured to query a predetermined relationship graph according to a first object in an object pair to be matched, to obtain a first target object associated with the first object; an object determination module configured to determine a second target object associated with a second object in the object pair according to the association relationship among a plurality of objects in the object class to which the second object belongs; a feature determination module configured to determine, for each of the first object and the second object, feature information for that object according to the target object of that object; and a relationship determination module configured to determine a matching relationship between the first object and the second object according to the feature information for the first object and the feature information for the second object, wherein the object class to which the first target object belongs includes the object class to which the second object belongs, and the predetermined relationship graph is a relationship graph generated by the apparatus for generating a relationship graph provided by the present disclosure.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of generating a relationship graph or a method of determining a matching relationship provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method of generating a relationship graph or a method of determining a matching relationship provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of generating a relationship graph or the method of determining matching relationships provided by the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic view of an application scenario of a method for generating a relationship diagram and a method and an apparatus for determining a matching relationship according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart diagram of a method of generating a relationship diagram in accordance with an embodiment of the present disclosure;

FIG. 3 is a schematic diagram illustrating the principle of a method of determining similarity between two objects according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a principle of generating a relationship graph according to an embodiment of the present disclosure;

FIG. 5 is a flow chart diagram of a method of determining a match relationship in accordance with an embodiment of the present disclosure;

FIG. 6 is a schematic diagram illustrating a method of determining a match relationship according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of a method of determining a match relationship according to another embodiment of the present disclosure;

FIG. 8 is a block diagram of an apparatus for generating a relationship graph according to an embodiment of the present disclosure;

FIG. 9 is a block diagram of an apparatus for determining a matching relationship according to an embodiment of the present disclosure; and

FIG. 10 is a block diagram of an electronic device used to implement a method of generating a relationship graph or a method of determining a matching relationship of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Manual review is typically employed to determine whether two objects match. For example, in the medical field, manual review can be used to determine whether the medicines in a prescription match the diagnosed diseases, so as to make medication more reasonable and reduce medical disputes. Manual review, however, is inefficient, and its results are easily affected by subjective judgment.

With the development of artificial intelligence technology, big data mining technology and deep learning technology have shown remarkable results in various fields. For example, deep learning techniques and big data mining techniques can be applied to the field of intelligent medicine for predicting the relationship between drugs and diagnostic results. It is to be understood that the intelligent medical field is only an example. Deep learning techniques and big data mining techniques can also be applied to any other field that involves two related classes of objects, in order to predict the relationship between a first object belonging to the first class and a second object belonging to the second class. For example, the objects of the first class may be actors, and the objects of the second class may be movies or television series. For convenience of understanding of the present disclosure, the method of relationship prediction will be described in detail below by taking the medical field as an example.

In one embodiment, the relationship may be predicted by means of knowledge retrieval. For example, machine-learning-based knowledge extraction techniques may be used to extract knowledge from authoritative medical books, drug package inserts, public de-identified diagnostic information, public de-identified medical order data, and the like, and a knowledge base may be constructed from the extracted knowledge. The knowledge base is then searched to determine whether there is a matching relationship between a drug and a diagnosed disease. This method avoids the interference of subjective human judgment and can ensure the accuracy of the prediction result. However, it depends on the completeness of the knowledge sources and the adequacy of the knowledge extraction strategy. When the drug name or the disease name whose relationship needs to be predicted does not exist in the knowledge base, the relationship cannot be effectively predicted. Moreover, the method places high demands on the authority of the knowledge relied upon when the knowledge base is constructed.

In one embodiment, an end-to-end implicit model can be employed to predict the relationship between drugs and diagnosed diseases. During training, drugs and diseases having a matching relationship can be extracted from authoritative books, and a text pair consisting of the drug name and the disease name is used as a positive sample. Meanwhile, for a drug and a disease that have no matching relationship, the text pair consisting of the drug name and the disease name can be used as a negative sample. The end-to-end implicit model is trained on a training set formed by the positive samples and the negative samples, and the loss is continuously optimized; the trained model is then used as a drug-disease relationship prediction model. In prediction, the name of the drug and the name of the disease to be predicted are input into the relationship prediction model, which outputs the probability that the drug and the disease match. The end-to-end implicit model may be, for example, a DNN, a CNN, an LSTM, or a Bidirectional Encoder Representations from Transformers (BERT) model. This method can encode the semantic features of the drug name and the disease name in text form. However, the accuracy of the trained model depends on the quantity and quality of the training data, and the method lacks interpretability.

In an embodiment, a label system can be established, the features of entity words are converted into labels by an entity-word-to-label mapping algorithm, and finally whether a matching relationship exists between two objects is judged according to the labels. This method can capture the semantic features of the text and has strong interpretability. However, the coverage of the label features is limited by the label system, and the method may fail to obtain valid label features from the names representing the objects, which limits the prediction of the matching relationship.

Text matching is an important fundamental technology in the field of natural language processing. Many practical problems in the field of natural language processing can be abstracted to the task of text matching. For example, a web search may be abstracted as a text matching problem between a search query and web page content. The present disclosure is directed to implementing relationship prediction between objects based on the idea of text matching to improve the generalization ability and interpretability of relationship prediction techniques and to fully understand semantic features of entity words representing objects.

An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.

Fig. 1 is a schematic application scenario diagram of a method for generating a relationship diagram, a method for determining a matching relationship, and an apparatus according to an embodiment of the present disclosure.

As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functionality, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so on.

The electronic device 110 may, for example, process the input first object name 121 and second object name 122. For example, the electronic device 110 may query the knowledge base according to the name of each object, obtain the association information of each object, and predict the matching relationship 130 between the first object and the second object according to the association information of the two objects. The matching relationship may be either a match or a mismatch.

In one embodiment, a knowledge base may be constructed based on the association relationships between objects extracted from authoritative books and the like, so that the electronic device 110 can perform a knowledge query when predicting the relationship between the first object and the second object. For example, a relationship graph G may be constructed from the extracted relationships, and the relationship graph G may be expressed as G = (V, E). The elements in V are vertices, each vertex indicating one object; the elements in E are edges, each edge connecting two vertices and indicating the association relationship between the two objects indicated by the two connected vertices.

In an embodiment, as shown in fig. 1, the application scenario 100 may further include a server 150, and the server 150 may be, for example, a background management server supporting the running of the client application in the electronic device 110. The electronic device 110 may be communicatively coupled to the server 150 via a network, which may include wired or wireless communication links.

For example, the server 150 may use a graph neural network model to complete the object relationships extracted from existing knowledge such as authoritative books, and generate the relationship graph 140 based on the completed relationships. The server 150 may also query the relationship graph 140 in response to a request from the electronic device 110, and feed back the query result to the electronic device 110. Alternatively, the server 150 may send the relationship graph 140 to the electronic device 110 for the electronic device 110 to query when predicting the relationship between objects. Alternatively, the electronic device 110 may send the received first object name 121 and second object name 122 to the server 150, and the server 150 predicts the matching relationship 130 between the first object and the second object according to the result obtained by querying the relationship graph 140.

It should be noted that the method for generating the relationship graph provided by the present disclosure may be executed by the server 150. Accordingly, the apparatus for generating the relationship diagram provided by the present disclosure may be disposed in the server 150. The method for determining the matching relationship provided by the present disclosure may be executed by the electronic device 110, and may also be executed by the server 150. Accordingly, the apparatus for determining the matching relationship provided by the present disclosure may be disposed in the electronic device 110, and may also be disposed in the server 150.

It should be understood that the number and type of electronic devices 110 and servers 150 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 150, as desired for an implementation.

The method for generating the relationship diagram provided by the present disclosure will be described in detail below with reference to FIGS. 2 to 4 in conjunction with FIG. 1.

Fig. 2 is a schematic flow diagram of a method of generating a relationship diagram according to an embodiment of the disclosure.

As shown in fig. 2, the method 200 of generating a relationship diagram of this embodiment may include operations S210 to S250.

In operation S210, for each of the first class of objects and the second class of objects, similarity information for each class of objects is determined according to a predetermined association relationship between a plurality of objects belonging to each class of objects.

According to embodiments of the present disclosure, the category of the object may be selected according to actual needs. For example, in the field of smart medicine, a first class of objects may be drug class objects, which may be represented by a drug name; the second class of objects may be disease class objects and symptom class objects, which may be represented by a disease name and a symptom name. For example, in the video field, the first type object may be an actor, and the first type object may be represented by an actor name; the second type object may be a movie, and the second type object may be represented by a name of the movie.

According to the embodiment of the present disclosure, a plurality of objects included in each class of objects and a predetermined association relationship between the plurality of objects may be acquired from a knowledge base. For example, in the video field, the association relationship between two objects in each class of objects can be obtained from an encyclopedia knowledge base. In the field of smart medicine, the association relationship between two objects in each class of objects may be obtained from a repository of authoritative books. For example, for the drug class, the association relationship between two objects can be extracted from the drug catalog for basic medical insurance, work-related injury insurance and maternity insurance. For the disease class and the symptom class, the association relationship between two objects can be extracted from the International Classification of Diseases.

The embodiment may determine, for each object belonging to each class of objects, a similarity between the object and itself and a similarity between the object and other objects in the plurality of objects according to a predetermined association relationship, and arrange all the similarities in a matrix form, thereby obtaining the similarity information for each class of objects. Here, for example, the similarity between two objects may be a character similarity between two object names representing the two objects. As such, the similarity information for each class of objects may indicate a similarity degree between a plurality of objects belonging to the each class of objects to each other.

If the number of first class objects extracted from the knowledge base is M, the similarity information for the first class objects can be represented as a first matrix with M rows and M columns, and the value of the element in the ith row and jth column of the first matrix represents the similarity between the ith object and the jth object among the M first class objects. Similarly, if the number of second class objects extracted from the knowledge base is N, the similarity information for the second class objects can be represented as a second matrix with N rows and N columns, and the value of the element in the kth row and lth column of the second matrix represents the similarity between the kth object and the lth object among the N second class objects. Here, M and N are integers greater than 1, i and j are any integers in the range [1, M], and k and l are any integers in the range [1, N].
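As an illustration of how such a similarity matrix might be assembled, the following minimal sketch scores every pair of object names by character similarity. The use of `difflib.SequenceMatcher`, the helper name `similarity_matrix` and the sample drug names are assumptions made for illustration only; the disclosure does not prescribe a particular similarity measure.

```python
from difflib import SequenceMatcher


def similarity_matrix(names):
    """Return a len(names) x len(names) matrix of character similarities."""
    size = len(names)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            # ratio() is 1.0 for identical strings and lower for dissimilar ones.
            score = SequenceMatcher(None, names[i], names[j]).ratio()
            matrix[i][j] = score
            matrix[j][i] = score  # mirror the value so the matrix is symmetric
    return matrix


# Hypothetical first-class objects (M = 3), represented by their names.
drug_names = ["amoxicillin", "ampicillin", "ibuprofen"]
S_r = similarity_matrix(drug_names)
```

The diagonal holds the similarity of each object with itself, and the same helper could be applied to the N second-class object names to obtain the second matrix.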

In operation S220, association information between the first class object and the second class object is determined according to a predetermined association relationship between a plurality of first objects belonging to the first class object and a plurality of second objects belonging to the second class object.

According to an embodiment of the present disclosure, in the field of smart medicine, the predetermined association between the first object and the second object may be extracted from a knowledge base composed of a drug description and/or public and desensitized case information, and the like. For example, for each of the M first objects, an object having an association relationship with each first object may be extracted from the knowledge base to obtain an object set. It is then determined whether there is a second object belonging to the set of objects among the aforementioned N second objects. If so, determining that the second object belonging to the object set has an association relation with each first object. Or, for each second object in the N second objects, the object having an association relationship with each second object may be extracted from the knowledge base to obtain an object set. It is then determined whether there is a first object belonging to the set of objects among the aforementioned M first objects. If so, determining that the first object belonging to the object set has an association relation with each second object.

If a first object and a second object have an association relationship, the association information between them is determined to be 1 or another non-zero value; otherwise, the association information between them is determined to be zero. Forming an information vector from the association information between each first object and the N second objects yields M information vectors. Splicing the M information vectors in the column direction yields the association information between the first class of objects and the second class of objects. Specifically, the association information may be represented by a matrix of M rows and N columns. The value of the element in the ith row and the jth column of the matrix represents the association information between the ith object of the M first objects and the jth object of the N second objects. If the value of the association information between a first object and a second object having an association relationship is 1, the association information may be represented by a binary matrix $\{0, 1\}^{M \times N}$. In this embodiment, i is any integer in the range [1, M], and j is any integer in the range [1, N].
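The construction of the binary association matrix can be sketched as follows. The function name `association_matrix`, the representation of known associations as a list of name pairs, and the sample names are hypothetical and serve only to illustrate the M-row, N-column layout described above.

```python
def association_matrix(first_objects, second_objects, known_pairs):
    """Build the M x N binary association matrix from known (first, second) pairs."""
    pair_set = set(known_pairs)
    # Each row is the information vector of one first object over the N second objects.
    return [
        [1 if (first, second) in pair_set else 0 for second in second_objects]
        for first in first_objects
    ]


# Hypothetical illustrative data: M = 2 first objects, N = 3 second objects.
drugs = ["amoxicillin", "ibuprofen"]
diseases = ["pneumonia", "headache", "fever"]
known_pairs = [
    ("amoxicillin", "pneumonia"),
    ("ibuprofen", "headache"),
    ("ibuprofen", "fever"),
]
A = association_matrix(drugs, diseases, known_pairs)
```

Stacking the M row vectors one below another is what the text calls splicing in the column direction.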

In operation S230, initial adjacency information for the relationship graph is determined according to the similarity information for the first class object, the similarity information for the second class object, and the association information.

According to an embodiment of the present disclosure, in the initial structure of the relationship graph G, V may include, for example, M + N vertices, each vertex indicating one object in the object set consisting of the M first objects and the N second objects. In the initial structure of the relationship graph G, the number of edges included in E may be equal to the number of non-zero elements in the association information.

According to an embodiment of the present disclosure, the initial adjacency information of the relationship graph may be, for example, an adjacency matrix of the initial structure of the relationship graph G. The initial adjacency matrix represents the relationship between the vertices in the initial structure of the relational graph G, and if the number of vertices in the graph is M + N, the size of the adjacency matrix is (M + N) × (M + N). The value of the element in the ith row and the jth column in the adjacency matrix can indicate whether a connected edge exists between the ith vertex and the jth vertex in the (M + N) vertices. In this embodiment, the values of i and j are any integers belonging to the value range [1, (M + N) ].

For example, the embodiment may determine the transpose information of the association information, that is, determine the transposed matrix $\{0, 1\}^{N \times M}$ of the binary matrix $\{0, 1\}^{M \times N}$ described above. The similarity information for the first class of objects, the similarity information for the second class of objects, the association information and the transpose information are then spliced into a matrix of size (M + N) × (M + N), and the spliced matrix is used as the initial adjacency information. For example, if the similarity information for the first class of objects is represented by a matrix $S^r$ of M rows and M columns, the similarity information for the second class of objects is represented by a matrix $S^d$ of N rows and N columns, the association information is represented by a matrix $A$ of M rows and N columns, and the initial adjacency information is represented by a matrix $A_H$, then the matrix $A_H$ can be obtained using the following equation (1):

$$A_H = \begin{bmatrix} S^r & A \\ A^{T} & S^d \end{bmatrix} \tag{1}$$
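The splicing step can be sketched in plain Python as below. The sketch assumes the block layout in which the two similarity matrices sit on the main diagonal and the association matrix and its transpose fill the off-diagonal blocks, which is the arrangement that makes the dimensions consistent; the helper names and the numeric values are hypothetical.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def initial_adjacency(S_r, S_d, A):
    """Splice S_r (M x M), A (M x N), A^T (N x M) and S_d (N x N) into one matrix."""
    A_T = transpose(A)
    top = [s_row + a_row for s_row, a_row in zip(S_r, A)]         # [S_r | A]
    bottom = [at_row + s_row for at_row, s_row in zip(A_T, S_d)]  # [A^T | S_d]
    return top + bottom


# Hypothetical values with M = 2 and N = 3.
S_r = [[1.0, 0.2], [0.2, 1.0]]
S_d = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]]
A = [[1, 0, 0], [0, 1, 1]]
A_H = initial_adjacency(S_r, S_d, A)
```

With symmetric similarity matrices the result is itself symmetric, so an edge between a first object and a second object is recorded in both directions.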

In operation S240, initial embedding information for the relationship graph is determined according to the association information.

According to an embodiment of the present disclosure, the embedded information of the relational graph includes an embedded representation of each vertex in the relational graph. The initial embedding information may be constituted by the embedded representation of all vertices in the initial structure of the relationship graph G. Assuming that the embedding representation for each vertex is represented by a vector of size K, the initial embedding information may be represented by a matrix of (M + N) rows and K columns. The initial embedded information may serve as initial state information for a hidden layer in the convolutional neural network.

In one embodiment, for example, an object set of M first objects and N second objects may be formed, with an embedded representation of the object name of each object as the embedded representation of the vertex representing the each object. And splicing the embedded representations of the (M + N) objects in the object set in the column direction to obtain initial embedded information.

In one embodiment, for example, the transposed information of the association information may be determined first. A matrix of (M + N) rows and (M + N) columns formed from the transposed information and the association information is then used as the initial embedded information. For example, in the initial embedded information, the association information and the transposed information may be located on the main diagonal or on the secondary diagonal. For example, with both blocks placed on the secondary diagonal, the initial embedded information $H^{(0)}$ can be expressed by the following formula (2):

$$H^{(0)} = \begin{bmatrix} 0 & A \\ A^{\mathsf T} & 0 \end{bmatrix} \tag{2}$$

By setting the matrix $H^{(0)}$ in this way, the embedded information output by the encoder in the graph neural network can characterize the association information between each of the (M + N) nodes and the other nodes.
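As a minimal sketch of this embodiment, the initial embedded information can be assembled from the association matrix and its transpose. Placing both blocks on the secondary diagonal is only one of the placements mentioned above, and the function name `build_initial_embedding` is hypothetical.

```python
import numpy as np

def build_initial_embedding(A):
    """Return an (M+N) x (M+N) matrix with A and A^T on the secondary diagonal and zeros elsewhere."""
    M, N = A.shape
    return np.block([[np.zeros((M, M)), A], [A.T, np.zeros((N, N))]])

# Toy association matrix (hypothetical): 2 first objects, 3 second objects.
A = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
H0 = build_initial_embedding(A)
```

Row i of `H0` then lists the known associations of vertex i with the vertices of the other class.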

In operation S250, a relational graph is generated using a graph neural network according to the initial adjacency information and the initial embedded information.

According to an embodiment of the present disclosure, a Graph Neural Network (GNN) learns Graph structure data using a Neural Network to extract and mine features in the Graph structure data. Features in the graph structure data may include associations between vertices in the graph. The association relationship between the vertexes in the graph is the association relationship between the objects indicated by the vertexes.

In one embodiment, the graph neural network may employ a Graph Convolutional Network (GCN), whose inputs include a feature matrix X and an adjacency matrix. The feature matrix X may be the initial embedded information described above, and the adjacency matrix may be the initial adjacency information described above. This embodiment may input the initial embedded information and the initial adjacency information into the graph convolutional network, whose output has the same size as the feature matrix X. For example, if the feature matrix X is the initial embedded information of (M + N) rows and (M + N) columns, the output of the graph convolutional network is also a matrix of (M + N) rows and (M + N) columns.

For example, the embodiment may perform forward propagation through the GCN and update the network parameters through back propagation, thereby predicting the relationships between the objects. The forward propagation may use the following GCN convolution formula (3):

$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}}\, G\, D^{-\frac{1}{2}}\, H^{(l)}\, W^{(l)}\right) \tag{3}$$

where $H^{(l+1)}$ and $H^{(l)}$ represent the embedded information obtained in the (l + 1)-th iteration and the l-th iteration, respectively; G is the heterogeneous graph, which can be represented by the initial adjacency matrix described above; and D is the degree matrix of the initial structure of the relationship graph. The degree matrix is a diagonal matrix whose diagonal elements are the degrees of the vertices in the initial structure of the relationship graph, the degree of a vertex being the number of edges connected to that vertex. σ(·) represents an activation function, which may be, for example, a ReLU function. $W^{(l)}$ is a weight matrix.
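One forward-propagation step of a graph convolutional layer can be sketched as follows. The sketch assumes the usual symmetric normalization of the adjacency matrix and uses the row sums of the (possibly weighted) adjacency matrix as vertex degrees; the function name `gcn_layer` and the toy graph are hypothetical.

```python
import numpy as np

def gcn_layer(G, H, W):
    """One propagation step: ReLU(D^{-1/2} G D^{-1/2} H W)."""
    deg = G.sum(axis=1)                      # (weighted) degree of each vertex
    safe = np.where(deg > 0, deg, 1.0)       # avoid dividing by zero for isolated vertices
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    G_norm = inv_sqrt[:, None] * G * inv_sqrt[None, :]
    return np.maximum(G_norm @ H @ W, 0.0)   # ReLU activation

# Toy graph (hypothetical): vertex 0 is connected to vertices 1 and 2.
rng = np.random.default_rng(0)
G = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
H = np.eye(3)
W = rng.normal(size=(3, 4))
H_next = gcn_layer(G, H, W)
```

In a trained network the weight matrix `W` would be learned by back propagation rather than drawn at random.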

It is understood that operation S250 is essentially a process of graph reconstruction. The embodiment may use the GCN as an encoder, use the output of the encoder as the input of a decoder, and have the decoder calculate the probability that an edge exists between every two vertices in the initial structure of the graph; the relationship graph is then obtained according to these probabilities. For example, if the probability that an edge exists between two vertices is greater than a probability threshold, the prediction result is determined to indicate that the two objects indicated by the two vertices have an association relationship. Connecting every two vertices whose indicated objects are predicted to have an association relationship yields the edge set E of the relationship graph G, and the vertices indicating the (M + N) objects form the vertex set V of the relationship graph G.

The embodiment of the disclosure may use the generated relationship graph as prior knowledge to serve as a query basis for predicting a matching relationship between two objects. According to the technical scheme, the information obtained by inquiring the knowledge base forms the initial structure of the graph, the graph neural network is adopted to reconstruct the structure of the graph, and the relational graph is obtained. Therefore, more complete priori knowledge is provided for the prediction of the matching relationship, and the accuracy of the matching relationship between the determined objects is improved.

Fig. 3 is a schematic diagram illustrating the principle of a method of determining similarity between two objects according to an embodiment of the present disclosure.

According to the embodiment of the disclosure, the similarity between two objects can be determined according to the association relationship between the two objects and other objects in each class of objects, the number of the same objects in the objects associated with the two objects, and the like. Compared with a method for determining the similarity of two objects only according to the character similarity between names, the method can improve the accuracy of the determined similarity to a certain extent and can more accurately dig out the association relationship between the two objects.

According to the embodiment of the present disclosure, a tree diagram for the plurality of objects belonging to each class may be constructed according to the association relationship between the objects. For example, for the plurality of objects of each class, a tree diagram 300 as shown in FIG. 3 can be constructed. Each node in the tree diagram 300 indicates one of the plurality of objects. For example, the root node 301 in the tree diagram 300 indicates drug A, and the child nodes of the root node include a node 302 indicating drug a. The child nodes of node 302 include nodes 303 to 306, which indicate drug a-1 to drug a-4, respectively. The child nodes of node 303 include a node 307, which indicates drug a-1-1. The child nodes of node 305 include nodes 308 to 310, which indicate drug a-3-1 to drug a-3-3, respectively. The child nodes of node 310 include node 311 and node 312, which indicate drug a-3-3-1 and drug a-3-3-2, respectively.

For a plurality of objects, the embodiment may further combine the plurality of objects two by two to form a plurality of object pairs. For example, for 12 drugs indicated by the dendrogram 300, 66 object pairs may be obtained by combining two by two. The operation of forming the plurality of object pairs may be performed in synchronization with the operation of constructing the tree diagram. After obtaining a plurality of object pairs and the tree diagram, for each object included in each object pair, a connection node between a node indicating each object in the tree diagram and a root node of the tree diagram is determined, and a set of parent nodes for each object is obtained, where the set of parent nodes includes the determined connection node and the root node. For example, for node 307 in the tree diagram 300 indicating drug a-1-1, the resulting set of parent nodes includes node 303, node 302, and node 301. For node 311, which indicates drug a-3-3-1 in the tree 300, the resulting set of parent nodes includes node 310, node 305, node 302, and node 301.

Then, for each object pair, the embodiment may determine the similarity between the two objects in the pair according to the intersection and the union of the two sets of parent nodes of the two objects. For example, the embodiment may count the number $m_1$ of nodes in the intersection of the two sets of parent nodes and the number $m_2$ of nodes in the union of the two sets of parent nodes, and then take the ratio of $m_1$ to $m_2$ as the similarity between the two objects included in the object pair. For example, for the object pair consisting of drug a-1-1 and drug a-3-3-1, the intersection of the two sets of parent nodes includes node 302 and node 301, and the union includes node 303, node 310, node 305, node 302 and node 301, so the similarity between drug a-1-1 and drug a-3-3-1 is 2/5 = 0.4.
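The pairing and the intersection-over-union computation can be sketched as follows, using the node numbers of tree diagram 300. The dictionary `PARENT` and the function names are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations

# Parent of each node in tree diagram 300; the root node 301 has no parent.
PARENT = {302: 301, 303: 302, 304: 302, 305: 302, 306: 302,
          307: 303, 308: 305, 309: 305, 310: 305, 311: 310, 312: 310}

def parent_set(node):
    """Return the nodes between `node` and the root, with the root included."""
    result = set()
    while node in PARENT:
        node = PARENT[node]
        result.add(node)
    return result

def jaccard_similarity(a, b):
    """Ratio of the intersection size to the union size of the two parent sets."""
    pa, pb = parent_set(a), parent_set(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0

nodes = [301] + sorted(PARENT)
pairs = list(combinations(nodes, 2))   # all unordered object pairs
sim = jaccard_similarity(307, 311)     # drug a-1-1 versus drug a-3-3-1
```

With the 12 nodes of the tree diagram, `pairs` contains 66 object pairs, and `sim` reproduces the value 0.4 from the example.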

In one embodiment, each node in the set of parent nodes of an object may be given a weight based on the distance between that node and the node indicating the object, and the similarity between two objects may be calculated based on the weights. The farther a node is from the node indicating the object, the smaller its weight. The weight may represent, for example, the similarity between the object indicated by the node and the object in question. In this way, the association relationships among the plurality of objects can be fully considered, so that the determined similarity reflects the tree-shaped connection relationship among the objects to a certain extent, which improves the accuracy of the determined similarity.

Specifically, for each object, the embodiment may determine, according to the set of parent nodes for that object and a predetermined similarity coefficient, a similarity value between the object and the objects indicated by its set of parent nodes in the tree diagram, as a first similarity value for that object. The predetermined similarity coefficient serves as the base used to give a weight to each node in the set of parent nodes, and may be any value smaller than 1. For example, the embodiment may calculate the similarity value $DV(d_i)$ between each object and the objects indicated by its set of parent nodes according to the following formula (4):

$$DV(d_i) = \sum_{k=1}^{n} \Delta^{k} \tag{4}$$

where $d_i$ represents the i-th object among the plurality of objects, Δ is the predetermined similarity coefficient, and n is the number of nodes in the set of parent nodes for that object. It can be understood that if, in the tree diagram, the node indicating an object can be connected to the root node via multiple paths, the nodes in the set of parent nodes for that object in this embodiment are the nodes on the shortest of those paths. It is to be understood that the above formula (4) for calculating the similarity value is only an example to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.

For example, with the predetermined similarity coefficient set to 0.5, the similarity value between drug a-1-1 and the drugs indicated by its set of parent nodes is 0.5 + 0.5² + 0.5³ = 0.875. For drug a-3-3-1, the similarity value between the drug and the drugs indicated by its set of parent nodes is 0.5 + 0.5² + 0.5³ + 0.5⁴ = 0.9375.

While determining the first similarity, the embodiment may also determine, for each node in the intersection of the two sets of parent nodes, a target node in the tree graph between the node and the node indicating each object. And then determining a similarity value between the object indicated by each node and each object according to the target node and a preset similarity coefficient, wherein the similarity value is used as a second similarity value aiming at each object.

For example, the similarity between the object indicated by each node and each object may be represented as $\Delta^{p+1}$, where p is the number of target nodes.

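For example, the similarity between the object indicated by each node and each object can also be calculated in a recursive manner, as shown in the following formula (5):

$$C_d(g) = \begin{cases} 1, & g = d \\ \Delta \cdot C_d(g'), & g \neq d \end{cases} \tag{5}$$

where g represents a node in the intersection of the two sets of parent nodes, g' is the child node of g on the path toward the node indicating the object d, $C_d(g)$ represents the similarity between the object indicated by node g and the object d, and Δ is the predetermined similarity coefficient. The recursion ends at the node indicating the object d itself, where the value is 1.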

For example, for drug a-1-1 and drug a-3-3-1, the intersection of the two sets of parent nodes includes node 302 and node 301. The target node between node 302 and node 307 indicating drug a-1-1 is node 303, and the target nodes between node 301 and node 307 are node 303 and node 302. By the above formula (5), the similarity between drug a indicated by node 302 and drug a-1-1 is 0.5 × 0.5 × 1 = 0.25, and the similarity between drug A indicated by node 301 and drug a-1-1 is 0.5 × 0.5 × 0.5 × 1 = 0.125.

According to the embodiment of the present disclosure, the similarity between the object indicated by the node in the intersection of the two sets of parent nodes and each object may be accumulated, and the accumulated value may be used as the second similarity value.

After obtaining the two first similarity values and the two second similarity values for the two objects, the similarity between the two objects may be determined according to these values. For example, the embodiment may take the ratio of the sum of the two second similarity values to the sum of the two first similarity values as the similarity between the two objects. For example, the similarity between an object $d_i$ and an object $d_j$, denoted here as $S(d_i, d_j)$, can be calculated by using the following formula (6):

$$S(d_i, d_j) = \frac{\sum_{g \in G(d_i) \cap G(d_j)} \left( C_{d_i}(g) + C_{d_j}(g) \right)}{DV(d_i) + DV(d_j)} \tag{6}$$

where $G(d_i)$ represents the set of parent nodes for the object $d_i$, and $G(d_j)$ represents the set of parent nodes for the object $d_j$. $C_{d_i}(g)$ represents the similarity value between the object indicated by node g and the object $d_i$, and $C_{d_j}(g)$ represents the similarity value between the object indicated by node g and the object $d_j$. $DV(d_i)$ represents the first similarity value for the object $d_i$, and $DV(d_j)$ represents the first similarity value for the object $d_j$.

For example, for drug a-1-1 and drug a-3-3-1, the similarity between the two is about 0.31. For drug a-3-3-1 and drug a-3-3-2, the similarity between the two is about 1.
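The two example values can be checked with a short sketch that weights each parent node by the similarity coefficient raised to its distance from the object. The sketch assumes a similarity coefficient of 0.5 and the node numbers of tree diagram 300; `PARENT`, `ancestors` and `tree_similarity` are hypothetical names.

```python
# Parent of each node in tree diagram 300; the root node 301 has no parent.
PARENT = {302: 301, 303: 302, 304: 302, 305: 302, 306: 302,
          307: 303, 308: 305, 309: 305, 310: 305, 311: 310, 312: 310}

def ancestors(node):
    """Parent nodes of `node`, nearest first, ending at the root."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

def tree_similarity(a, b, delta=0.5):
    """Sum of the second similarity values divided by the sum of the first similarity values."""
    # The k-th nearest parent node has k - 1 target nodes in between, so its weight is delta**k.
    w_a = {g: delta ** (k + 1) for k, g in enumerate(ancestors(a))}
    w_b = {g: delta ** (k + 1) for k, g in enumerate(ancestors(b))}
    dv_a, dv_b = sum(w_a.values()), sum(w_b.values())   # first similarity values
    shared = set(w_a) & set(w_b)                        # intersection of the parent sets
    second = sum(w_a[g] + w_b[g] for g in shared)       # sum of second similarity values
    denom = dv_a + dv_b
    return second / denom if denom else 0.0

s1 = tree_similarity(307, 311)   # drug a-1-1 versus drug a-3-3-1
s2 = tree_similarity(311, 312)   # drug a-3-3-1 versus drug a-3-3-2
```

Here `s1` evaluates to 0.5625 / 1.8125, which is about 0.31, and `s2` evaluates to 1 because the two drugs share all of their parent nodes.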

FIG. 4 is a schematic diagram of generating a relationship graph according to an embodiment of the disclosure.

As shown in fig. 4, in this embodiment 400, when generating the relationship graph 470, the similarity information 420 may be determined according to a predetermined association relationship between a plurality of first objects 401 in the first class of objects and a predetermined association relationship between a plurality of second objects 402 in the second class of objects. The similarity information 420 may be composed of similarity information for the first class of objects and similarity information for the second class of objects. Meanwhile, the embodiment may also determine the association information 410 according to a predetermined association relationship between the plurality of first objects 401 and the plurality of second objects 402.

Subsequently, a heterogeneous graph 430 for the relationship graph is constructed according to the association information 410 and the similar information 420, and the heterogeneous graph 430 can represent an initial structure of the relationship graph. The heterogeneous graph 430 may be represented by the initial adjacency information described above.

After obtaining the heterogeneous graph 430, the embodiment may use the heterogeneous graph 430 as an input of the encoder 440, so that the encoder 440 encodes the heterogeneous graph 430 with the initial embedded information described above as the initial value of the hidden-layer state, thereby obtaining the embedded information 450 for the heterogeneous graph 430. Subsequently, the embedded information 450 is input to the decoder 460, and after the embedded information 450 is processed by the decoder 460, a probability matrix having the same size as the adjacency matrix representing the initial adjacency information may be output. The probability matrix is binarized by comparing each of its elements with the probability threshold; the matrix obtained after binarization is the adjacency matrix of the relationship graph 470, and the relationship graph 470 can be reconstructed according to this adjacency matrix.

For example, the encoder may employ the GCN described above, and the decoder may employ the decoder of a Variational Graph Auto-Encoder (VGAE) model or a Graph Auto-Encoder (GAE) model.
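The decoding and binarization step can be sketched with the inner-product decoder commonly used in GAE and VGAE models, which scores each vertex pair by the sigmoid of the inner product of their embeddings. The threshold value 0.5, the toy embeddings and the function name `decode_edges` are assumptions.

```python
import numpy as np

def decode_edges(Z, threshold=0.5):
    """Inner-product decoder: edge probabilities sigmoid(Z Z^T), then binarization."""
    logits = Z @ Z.T
    prob = 1.0 / (1.0 + np.exp(-logits))     # probability matrix, same size as the adjacency matrix
    adj = (prob > threshold).astype(int)     # 1 where an edge is predicted
    return prob, adj

# Toy embeddings (hypothetical) for three vertices.
Z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]])
prob, adj = decode_edges(Z)
```

Vertices 0 and 1 have similar embeddings and are therefore connected in `adj`, whereas vertex 2 is connected to neither of them.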

In an embodiment, when constructing the heterogeneous graph 430 (i.e., when obtaining the initial adjacency information), the similar information of each class of objects may first be normalized, and the initial adjacency information may then be constructed according to the normalized similar information. This eliminates the influence of differing scales on the similar information of the two classes of objects, makes the similar information of the first class of objects comparable with that of the second class of objects, improves the expressive power of the resulting initial adjacency matrix, and improves the precision of the reconstructed relationship graph.

For example, the matrix $S^r$ representing the similar information of the first class of objects and the matrix $S^d$ representing the similar information of the second class of objects may be normalized using the following formulas (7) and (8), respectively:

$$\tilde{S}^r = D_r^{-\frac{1}{2}}\, S^r\, D_r^{-\frac{1}{2}} \tag{7}$$

$$\tilde{S}^d = D_d^{-\frac{1}{2}}\, S^d\, D_d^{-\frac{1}{2}} \tag{8}$$

where $\tilde{S}^r$ and $\tilde{S}^d$ represent the normalized similar information for the first class of objects and the normalized similar information for the second class of objects, respectively.

$D_r = \mathrm{diag}\left(\sum_j S^r_{ij}\right)$ is the diagonal matrix whose main diagonal elements are the elements of the vector formed by the sums $\sum_j S^r_{ij}$. The diag() function is used to construct a diagonal matrix in which all elements off the diagonal are 0.

$D_d = \mathrm{diag}\left(\sum_l S^d_{kl}\right)$ is the diagonal matrix whose main diagonal elements are the elements of the vector formed by the sums $\sum_l S^d_{kl}$.

$S^r_{ij}$ represents the similarity between the i-th object and the j-th object in the first class of objects.

$S^d_{kl}$ represents the similarity between the k-th object and the l-th object in the second class of objects.

After obtaining the normalized similar information, the embodiment may determine the initial adjacency information of the relationship graph according to the normalized similar information, the association information, and the transposed information of the association information. For example, in this embodiment, the matrix $A_H$ representing the initial adjacency information can be expressed by the following formula (9):

$$A_H = \begin{bmatrix} \tilde{S}^r & A \\ A^{\mathsf T} & \tilde{S}^d \end{bmatrix} \tag{9}$$

In an embodiment, a penalty factor may further be applied to the similar information for each class of objects to adjust the importance of the similar information in the generation of the relationship graph, so that the generated relationship graph better meets actual requirements. For example, the penalty factor may be an adjustable parameter in the training of the graph neural network, and its value may be determined according to test results obtained during that training.

For example, the normalized similar information for each class of objects may be adjusted according to the penalty factor to obtain adjusted normalized information for each class of objects. The adjusted normalized information for the first class of objects, the association information, the transposed information and the adjusted normalized information for the second class of objects are then spliced to obtain the initial adjacency information. For example, in this embodiment, the matrix $A_H$ representing the initial adjacency information can be expressed by the following formula (10):

$$A_H = \begin{bmatrix} \mu \tilde{S}^r & A \\ A^{\mathsf T} & \mu \tilde{S}^d \end{bmatrix} \tag{10}$$

where μ is the penalty factor.
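The normalization and the penalty factor can be combined in a short sketch. It assumes a symmetric normalization by the row sums of each similarity matrix, and the function names, the toy matrices and the value μ = 0.5 are hypothetical.

```python
import numpy as np

def normalize_similarity(S):
    """Symmetric normalization D^{-1/2} S D^{-1/2}, with D the diagonal matrix of row sums of S."""
    row_sum = S.sum(axis=1)
    safe = np.where(row_sum > 0, row_sum, 1.0)   # avoid dividing by zero for all-zero rows
    inv_sqrt = np.where(row_sum > 0, 1.0 / np.sqrt(safe), 0.0)
    return inv_sqrt[:, None] * S * inv_sqrt[None, :]

def build_adjacency(S_r, S_d, A, mu=1.0):
    """Splice the penalized, normalized similarity blocks with A and A^T."""
    return np.block([[mu * normalize_similarity(S_r), A],
                     [A.T, mu * normalize_similarity(S_d)]])

# Toy data (hypothetical): 2 first objects and 1 second object.
S_r = np.array([[1.0, 0.5], [0.5, 1.0]])
S_d = np.array([[1.0]])
A = np.array([[1.0], [0.0]])
A_H = build_adjacency(S_r, S_d, A, mu=0.5)
```

Setting `mu=1.0` gives the unpenalized adjacency matrix built from the normalized similarity blocks, while smaller values reduce the influence of the similarity blocks relative to the known associations.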

Based on the method for generating the relationship graph provided by the disclosure, the disclosure also provides a method for determining the matching relationship. The method of determining the matching relationship will be described in detail below with reference to fig. 5 to 7.

Fig. 5 is a flow chart diagram of a method of determining a matching relationship according to an embodiment of the present disclosure.

As shown in fig. 5, the method 500 of determining a matching relationship of this embodiment may include operations S510 to S540.

In operation S510, for a first object in the pair of objects to be matched, a predetermined relationship graph is queried according to the first object, and a first target object associated with the first object is obtained.

According to an embodiment of the present disclosure, the object classes to which the first target object belongs include the object class to which the second object belongs. Taking the intelligent medical field as an example, if the first object is a drug, the relationship graph is queried according to the first object to obtain symptoms or diagnosis results associated with the drug, and the diagnosis results may include diseases and the like.

According to an embodiment of the present disclosure, the predetermined relationship graph is a relationship graph generated by the method of generating a relationship graph described above. The embodiment may first determine a vertex in the predetermined relationship graph that indicates the first object. Subsequently, the symptom or disease indicated by the vertex connected via the edge with the vertex indicating the first object is taken as the first target object.

For example, if the objects indicated by the vertices connected via an edge to the vertex indicating the first object do not include a symptom or a disease, the embodiment may expand outward step by step along the edges, starting from the vertex indicating the first object, until a vertex indicating a symptom or a disease is reached, and take the symptom or disease indicated by that vertex as the first target object. In one embodiment, an upper limit may be set on the number of expansion steps, to avoid the case where the symptom or disease obtained by expansion has a low degree of association with the drug and disturbs the prediction of the matching relationship. The upper limit may be any value greater than 1, such as 2 or 3, and the disclosure is not limited thereto.
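The step-limited expansion can be sketched as a breadth-first search over the relationship graph. The adjacency-list representation, the `kinds` dictionary that labels each vertex with its object class, and the function name `find_target_objects` are assumptions made for illustration.

```python
def find_target_objects(graph, kinds, start, wanted, max_steps=2):
    """Expand from `start` one step at a time; return the first wanted vertices reached."""
    seen = {start}
    frontier = [start]
    for _ in range(max_steps):
        next_frontier = []
        for v in frontier:
            for u in graph.get(v, ()):
                if u not in seen:
                    seen.add(u)
                    next_frontier.append(u)
        hits = [u for u in next_frontier if kinds[u] in wanted]
        if hits:                      # stop at the nearest symptoms or diseases
            return hits
        frontier = next_frontier
    return []                         # nothing found within the step limit

# Toy relationship graph (hypothetical): drug_x - drug_y - fever.
graph = {"drug_x": ["drug_y"], "drug_y": ["drug_x", "fever"], "fever": ["drug_y"]}
kinds = {"drug_x": "drug", "drug_y": "drug", "fever": "symptom"}
targets = find_target_objects(graph, kinds, "drug_x", {"symptom", "disease"})
```

In this toy graph the symptom is reached in the second step, so it is returned with an upper limit of 2 but not with an upper limit of 1.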

It is understood that if the first object is a symptom or a disease, the drug serving as the first target object may be determined by a method similar to the above.

In operation S520, for a second object in the object pair, a second target object associated with the second object is determined according to the association relationship for a plurality of objects in the object class to which the second object belongs.

According to the embodiment of the present disclosure, the second target object may be determined according to the association relationship between the plurality of second objects described above. For example, if the second object is a disease, the tree for the second class of objects may be traversed to take as the second target object the symptoms in the information indicated by the nodes that are located in the same branch as the node indicating the disease.

It is to be understood that if the second object is a drug, the drug having an association relationship with the drug can be determined to be the second target object in a similar manner.

In an embodiment, when determining the second target object, for example, a pre-constructed mapping relationship may be used, and the mapping relationship may be a mapping relationship between a disease and a symptom. The embodiment may take a symptom having a mapping relation with the disease as the second target object.

Before determining the second target object according to the mapping relationship, the embodiment may first search the mapping relationship for the name of the second object. If the name of the second object is not found, the name of the second object is replaced with the name of a disease in the mapping relationship that matches it, and the symptom having a mapping relationship with the replaced name is then determined. This approach takes into account that the same disease may be named differently in different regions. When matching names, an object that contains or is contained in the second object may be determined according to the tree diagram, and the name of that object may be used as the matched name. If the tree diagram has no object that contains or is contained in the second object, the matched name can be determined according to the text similarity between object names. The text similarity may be represented by, for example, a Dice distance.
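The fallback name matching by text similarity can be sketched with the Dice coefficient computed over character bigrams; the corresponding Dice distance is one minus this coefficient. The choice of character bigrams, the example names and the function names are assumptions.

```python
def dice_similarity(a, b):
    """Dice coefficient over the sets of character bigrams of two names."""
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}
    x, y = bigrams(a), bigrams(b)
    if not x and not y:               # names too short to form any bigram
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def best_match(name, candidates):
    """Return the candidate name with the highest Dice coefficient."""
    return max(candidates, key=lambda c: dice_similarity(name, c))

# Hypothetical names: the queried name is absent from the mapping relationship.
matched = best_match("influenza a", ["influenza", "pneumonia"])
```

A minimum similarity could additionally be required before a match is accepted, so that unrelated names are not substituted.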

In operation S530, for each of the first object and the second object, feature information for each object is determined according to a target object of each object.

According to the embodiment of the present disclosure, for each object, the name of the target object associated with the object, as determined above, may be subjected to text processing, and the obtained text feature may be used as the feature information of the object. The text processing may include, for example, first performing embedded representation on the name of the target object to convert it into a low-dimensional dense semantic vector with global information. For example, a bag-of-words model or the like may be employed to embed the name of the target object. After the embedded representation is obtained, the low-dimensional dense semantic vector can be processed by a model such as a CNN (convolutional neural network) or an RNN (recurrent neural network) to obtain the feature information.

In one embodiment, the feature representation layer may be constructed using LSTM to process the embedded representation of the name to obtain feature information. When the text is processed, the hidden markov model can be adopted to perform word segmentation processing on the text, and then the text is encoded according to the result of word segmentation and the corresponding relation between the words and the index numbers in the predefined word list, so as to obtain the basic data of the embedding processing. The underlying data is then subjected to an embedding process to map the underlying data to a fixed dimension.

In operation S540, a matching relationship between the first object and the second object is determined according to the feature information for the first object and the feature information for the second object.

According to an embodiment of the present disclosure, a similarity between the feature information of the first object and the feature information of the second object may be calculated first, and the similarity may be represented by a cosine similarity, a Jaccard similarity coefficient, or the like. If the similarity between the two pieces of feature information is greater than the similarity threshold, it may be determined that the matching relationship between the first object and the second object is a match.
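The threshold comparison can be sketched with the cosine similarity of two feature vectors. The threshold value 0.8, the toy vectors and the function names are assumptions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def is_match(u, v, threshold=0.8):
    """The two objects match if the similarity exceeds the threshold."""
    return cosine_similarity(u, v) > threshold

# Toy feature vectors (hypothetical).
f_drug = np.array([1.0, 2.0, 0.0])
f_disease = np.array([2.0, 4.0, 0.1])
f_other = np.array([0.0, 0.0, 1.0])
```

With these vectors, `is_match(f_drug, f_disease)` holds because the two vectors point in nearly the same direction, while `is_match(f_drug, f_other)` does not.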

In one embodiment, before determining the similarity between the two pieces of feature information, the two pieces of feature information may, for example, be processed by a fully connected layer and an activation layer, and the similarity between the two processed pieces of feature information may then be calculated.

When the matching relationship between the two objects is determined, the relationship graph generated above is inquired, and the target object obtained through inquiry is taken as a consideration factor for determining the matching relationship to extract the characteristic information, so that the prediction result is not influenced by the subjective factor, the potential relationship between the two objects can be mined, and the accuracy of the determined matching relationship is improved.

Fig. 6 is a schematic diagram illustrating a method of determining a matching relationship according to an embodiment of the present disclosure.

According to the embodiment of the present disclosure, when determining the matching relationship, in addition to the target object associated with the object, for example, the description text of the object may be considered to more fully understand the object when determining the matching relationship, and thus improve the accuracy of the determined matching relationship.

Taking the intelligent medical field as an example, as shown in fig. 6, when determining the matching relationship between the drug and the disease represented by the entity pair < drug name, disease name >, the embodiment 600 may first query the above-described relationship graph according to the drug name 601 to obtain the associated diagnosis information 604 (i.e., the associated disease name) and the associated symptom information 605 associated with the drug. Meanwhile, pre-constructed drug-efficacy mapping information can be queried according to the drug name 601 to obtain the efficacy description information 603 (i.e., the efficacy description text of the drug). The embodiment 600 may then take the efficacy description information 603, the associated diagnosis information 604 and the associated symptom information 605 as input to the input layer 611, to obtain an embedded representation of this information via the input layer 611. The embedded representation is then input to the feature extraction layer 621 to obtain the feature information for the drug. The feature extraction layer 621 may be, for example, the LSTM described above.

The present disclosure may, for example, maintain in advance a description text library for the first type of object, the description text library including a plurality of description texts, each description text being indexed by a name of the first object that it describes. Thus, the embodiment may also query the description text library according to the name of the first object, so as to obtain the description text of the first object. When the first object is a drug, the descriptive text may be efficacy descriptive text.

It is to be understood that the present disclosure may employ, for example, a two-tower model to determine the matching relationship. The parameters of the input layer and the feature extraction layer are shared between the two branches of the two-tower model.

In this embodiment, in addition to determining the feature information of the disease from the symptom information 606 obtained as the second target object by the query, the disease name 602 may be considered. Therefore, the expression capability of the obtained characteristic information aiming at the second object can be improved, and the accuracy of the determined matching relation is favorably improved. Specifically, the symptom information 606 and the disease name 602 may be input to the input layer 612, the embedded representation of the symptom information 606 and the disease name 602 may be obtained by the input layer 612, and after the embedded representation is input to the feature extraction layer 622, the feature information for the second object may be output by the feature extraction layer 622.

The embodiment may then input the two characteristic information into the matching layer 630, with the matching layer 630 outputting a matching score 607, the matching score 607 representing the probability of matching the drug and the disease. If the matching score 607 is greater than the similarity threshold described above, then the matching relationship between the drug and the disease may be determined to be a match.

According to an embodiment of the present disclosure, the matching layer 630 may also include, for example, two branches, each of which includes the fully connected layer and the activation layer described above, with parameters shared between the two branches. The matching layer 630 may further include a processing layer for calculating similarity, which calculates the similarity between the feature information output by the two branches and obtains the matching score 607 after normalizing the similarity.

Fig. 7 is a schematic diagram illustrating a method of determining a matching relationship according to another embodiment of the present disclosure.

As shown in fig. 7, in an embodiment 700, after obtaining feature information 701 for a first object and feature information 702 for a second object, the embodiment may splice the feature information 701 and the feature information 702, then input the spliced information into a multi-layer perceptron network, and output a matching score 703 by the multi-layer perceptron network.

The multi-layer perceptron network may include a fully-connected layer 731, an activation layer 732, and a fully-connected layer 733. The fully-connected layer 731 serves as an input layer, which may be understood as a mapping layer, in which each input neuron is connected to at least one hidden layer in the activation layer 732 that represents latent variables. The fully-connected layer 733 serves as an output layer that maps the data output by the activation layer 732 to the matching score dimension.

It is understood that the matching score obtained in this embodiment 700 may represent a probability value that the first object and the second object match, and may also represent a similarity between the two pieces of feature information. If the matching score is greater than the score threshold, the matching relationship between the first object and the second object may be determined to be a match. Otherwise, the matching relationship between the first object and the second object is determined to be a mismatch.

In this embodiment, the two pieces of feature information are first spliced, and the matching relationship is then determined by the multi-layer perceptron. This makes the determination of the matching relationship more flexible, allows both pieces of feature information to be learned well, improves the fitting capability of the model, and thus improves the accuracy of the determined matching relationship.
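
The splice-then-perceptron scheme of embodiment 700 can be sketched as follows. This is an illustrative sketch only: the ReLU activation, the sigmoid applied to the output, the threshold value of 0.5, the layer sizes, and all names are assumptions not stated in the disclosure.

```python
import numpy as np

def init_mlp(feat_dim, hidden_dim, seed=0):
    # Parameters for two fully-connected layers (hypothetical sizes).
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (2 * feat_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, 1)),
        "b2": np.zeros(1),
    }

def mlp_matching_score(params, feat_first, feat_second):
    # Splice (concatenate) the two pieces of feature information.
    x = np.concatenate([feat_first, feat_second])
    # Fully-connected input layer followed by the activation layer (ReLU assumed).
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    # Fully-connected output layer maps to the one-dimensional score.
    logit = h @ params["W2"] + params["b2"]
    # A sigmoid (assumed) turns the output into a probability-like value.
    return float(1.0 / (1.0 + np.exp(-logit[0])))

def is_match(score, threshold=0.5):
    # Match only when the score is strictly greater than the score threshold.
    return score > threshold

params = init_mlp(feat_dim=4, hidden_dim=8)
score = mlp_matching_score(params, np.ones(4), np.zeros(4))
```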

According to the embodiment of the disclosure, the input layer, the feature extraction layer, and the matching layer used for determining the matching relationship may be integrated into one matching relationship prediction model. The matching relationship prediction model may be trained, for example, on a training set including positive and negative samples. The ratio of positive to negative samples in the training set may be 1:1, and the hyper-parameters used in the training process may be set according to actual requirements. For example, the hyper-parameters and their values during the training process may be as shown in the following table.

Name of hyper-parameter	Value of hyper-parameter
Iteration rounds	10
Learning rate	5e-4
batch_size	64
beta_1	0.9
beta_2	0.999
epsilon	1e-8
weight_decay	0
dropout_rate	0.5
emb_dim	128

beta_1 and beta_2 may be parameters used in back propagation, and batch_size is the number of samples used in a single iteration of the training process. epsilon is a parameter in the convergence condition of the model. weight_decay is the weight decay value, and emb_dim is the data dimension of the embedded representation obtained by the input layer. dropout_rate is the proportion of hidden nodes that are deactivated.
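
The hyper-parameters in the table can be gathered into a single configuration object, as in the sketch below. The values are taken from the table; the `TrainConfig` class and the `steps_per_epoch` helper are hypothetical names introduced only for illustration. The names beta_1, beta_2, and epsilon resemble the parameters of Adam-type optimizers, but the disclosure does not name the optimizer, so none is assumed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyper-parameter table.
    iteration_rounds: int = 10
    learning_rate: float = 5e-4
    batch_size: int = 64
    beta_1: float = 0.9
    beta_2: float = 0.999
    epsilon: float = 1e-8
    weight_decay: float = 0.0
    dropout_rate: float = 0.5
    emb_dim: int = 128

def steps_per_epoch(num_samples, cfg):
    # Number of batches needed to cover all samples once (ceiling division).
    return -(-num_samples // cfg.batch_size)

cfg = TrainConfig()
print(steps_per_epoch(1000, cfg))  # 16 batches of up to 64 samples
```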

Based on the method for generating a relationship graph provided by the present disclosure, the present disclosure also provides an apparatus for generating a relationship graph, which will be described in detail below with reference to fig. 8.

Fig. 8 is a block diagram of a structure of an apparatus for generating a relationship graph according to an embodiment of the present disclosure.

As shown in fig. 8, the apparatus 800 for generating a relationship graph of this embodiment may include a similarity determination module 810, an association information determination module 820, an adjacency information determination module 830, an embedded information determination module 840, and a graph generation module 850.

The similarity determining module 810 is configured to determine, for each of the first class of objects and the second class of objects, similarity information for each class of objects according to a predetermined association relationship between a plurality of objects belonging to each class of objects, where the similarity information indicates a similarity between the plurality of objects. In an embodiment, the similarity determining module 810 may be configured to perform the operation S210 described above, which is not described herein again.

The association information determining module 820 is configured to determine association information between a plurality of first objects belonging to the first class of objects and a plurality of second objects belonging to the second class of objects according to a predetermined association relationship between the first objects and the second objects. In an embodiment, the association information determining module 820 may be configured to perform the operation S220 described above, which is not described herein again.

The adjacency information determination module 830 is configured to determine initial adjacency information for the relationship graph according to the similarity information for the first class of objects, the similarity information for the second class of objects, and the association information. In an embodiment, the adjacency information determination module 830 may be configured to perform the operation S230 described above, which is not described herein again.

The embedded information determining module 840 is configured to determine initial embedded information for the relationship graph according to the association information. In an embodiment, the embedded information determining module 840 may be configured to perform the operation S240 described above, which is not described herein again.

The graph generation module 850 is configured to generate a relationship graph using a graph neural network according to the initial adjacency information and the initial embedded information. The relationship graph indicates an association relationship between any two objects in an object set composed of the plurality of first objects and the plurality of second objects. In an embodiment, the graph generation module 850 may be configured to perform the operation S250 described above, which is not described herein again.

According to an embodiment of the present disclosure, the similarity determination module 810 may include a tree graph determination submodule, an object pair forming submodule, a parent node obtaining submodule, and a similarity determination submodule. The tree graph determination submodule is used for determining a tree graph for a plurality of objects according to the association relationship among the plurality of objects; each node in the tree graph indicates one of the plurality of objects. The object pair forming submodule is used for combining the plurality of objects pairwise to form a plurality of object pairs. The parent node obtaining submodule is used for determining, for each object included in each of the plurality of object pairs, a connection node between the node indicating the object in the tree graph and the root node of the tree graph, to obtain a set of parent nodes for the object; the set of parent nodes includes the connection node and the root node. The similarity determination submodule is used for determining the similarity between the two objects included in each object pair according to the two sets of parent nodes of the two objects and the intersection of the two sets of parent nodes.

According to an embodiment of the present disclosure, the similarity determination submodule may include a first value determination unit, a target node determination unit, a second value determination unit, and a similarity determination unit. The first value determination unit is configured to determine, for each of the two objects, a similarity value between the object indicated by the set of parent nodes for each object and each object as a first similarity value for each object, based on the set of parent nodes for each object and a predetermined similarity coefficient. The target node determination unit is used for determining, for each node in the intersection of the two sets of parent nodes, a target node located in the tree graph between that node and the node indicating each object. The second value determination unit is configured to determine a similarity value between the object indicated by each node and each object as a second similarity value for each object, according to the target node and the predetermined similarity coefficient. The similarity determination unit is used for determining the similarity between the two objects according to the two first similarity values and the two second similarity values for the two objects.
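
The pairing of objects and the collection of parent-node sets described above can be sketched as follows. This is a simplified illustration: the tree is represented by a hypothetical child-to-parent dictionary, and a plain Jaccard ratio over the two parent-node sets stands in for the weighted computation based on the predetermined similarity coefficient, which is not reproduced here.

```python
from itertools import combinations

def parent_nodes(parent_of, node):
    # Collect the connection nodes between `node` and the root, plus the root.
    result = set()
    current = parent_of.get(node)
    while current is not None:
        result.add(current)
        current = parent_of.get(current)
    return result

def pairwise_similarity(parent_of, objects):
    # Combine the objects pairwise and score each pair from the intersection
    # of the two parent-node sets and the two sets themselves (Jaccard, assumed).
    sims = {}
    for a, b in combinations(objects, 2):
        pa, pb = parent_nodes(parent_of, a), parent_nodes(parent_of, b)
        union = pa | pb
        sims[(a, b)] = len(pa & pb) / len(union) if union else 0.0
    return sims

# Hypothetical tree: root -> {A, B}, A -> {a1, a2}, B -> {b1}.
tree = {"A": "root", "B": "root", "a1": "A", "a2": "A", "b1": "B"}
sims = pairwise_similarity(tree, ["a1", "a2", "b1"])
```

In this toy tree, `a1` and `a2` share every parent node, while `a1` and `b1` share only the root, so the first pair receives the higher similarity.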

According to an embodiment of the present disclosure, the adjacency information determination module 830 may include a normalization submodule, a first transposition submodule, and an adjacency information determination submodule. The normalization submodule is used for normalizing the similarity information for each class of objects to obtain normalized similarity information for each class of objects. The first transposition submodule is used for determining transposition information of the association information. The adjacency information determination submodule is used for determining initial adjacency information for the relationship graph according to the normalized similarity information for the first class of objects, the normalized similarity information for the second class of objects, the association information, and the transposition information.

According to an embodiment of the present disclosure, the adjacency information determination submodule may include an information adjustment unit and an information splicing unit. The information adjustment unit is used for adjusting the normalized similarity information for each class of objects according to a predetermined penalty factor to obtain adjusted normalized information for each class of objects. The information splicing unit is used for splicing the adjusted normalized information for the first class of objects, the association information, the adjusted normalized information for the second class of objects, and the transposition information to obtain the initial adjacency information.
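
The normalization, adjustment, and splicing steps described above can be sketched as follows. This is a sketch under stated assumptions: row normalization, a multiplicative penalty factor, and the block layout (similarity blocks on the main diagonal, association information and its transpose off the diagonal) are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def row_normalize(m):
    # Divide each row by its sum; rows that sum to zero stay zero.
    m = np.asarray(m, dtype=float)
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums != 0)

def initial_adjacency(sim_first, sim_second, assoc, penalty=0.5):
    assoc = np.asarray(assoc, dtype=float)
    # Adjust the normalized similarity information with the penalty factor.
    s1 = penalty * row_normalize(sim_first)
    s2 = penalty * row_normalize(sim_second)
    # Splice the four blocks into one square matrix over all objects.
    top = np.hstack([s1, assoc])
    bottom = np.hstack([assoc.T, s2])
    return np.vstack([top, bottom])

sim_first = np.array([[1.0, 1.0], [1.0, 3.0]])   # 2 first-class objects
sim_second = np.eye(3)                            # 3 second-class objects
assoc = np.array([[1, 0, 1], [0, 1, 0]])          # first-by-second associations
adj = initial_adjacency(sim_first, sim_second, assoc)
```

With two first objects and three second objects, `adj` is a 5 x 5 matrix whose rows and columns list the first objects followed by the second objects.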

According to an embodiment of the present disclosure, the embedded information determination module 840 may include a second transposition submodule and an information forming submodule. The second transposition submodule is used for determining transposition information of the association information. The information forming submodule is used for forming the initial embedded information according to the transposition information and the association information. In the initial embedded information, the transposition information and the association information are located on the secondary diagonal.
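
The formation of the initial embedded information can be sketched as a block matrix with the association information and its transpose on the secondary diagonal. Filling the main-diagonal blocks with zeros and placing the association information in the upper-right block are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def initial_embedding(assoc):
    assoc = np.asarray(assoc, dtype=float)
    n_first, n_second = assoc.shape
    # Main-diagonal blocks are zero (assumed); the association information and
    # its transpose occupy the secondary-diagonal blocks.
    top = np.hstack([np.zeros((n_first, n_first)), assoc])
    bottom = np.hstack([assoc.T, np.zeros((n_second, n_second))])
    return np.vstack([top, bottom])

assoc = np.array([[1, 0, 1], [0, 1, 0]])  # 2 first objects, 3 second objects
emb = initial_embedding(assoc)
```

Each row of `emb` then describes one object by its associations with the objects of the other class.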

Based on the method for determining the matching relationship provided by the present disclosure, the present disclosure also provides a device for determining the matching relationship, which will be described in detail below with reference to fig. 9.

Fig. 9 is a block diagram of a structure of an apparatus for determining a matching relationship according to an embodiment of the present disclosure.

As shown in fig. 9, the apparatus 900 for determining a matching relationship of this embodiment may include a graph query module 910, an object determination module 920, a feature determination module 930, and a relationship determination module 940.

The graph query module 910 is configured to query, for a first object in the pair of objects to be matched, the predetermined relationship graph according to the first object, so as to obtain a first target object associated with the first object. The object class to which the first target object belongs comprises an object class to which the second object belongs. The predetermined relational graph is a relational graph generated by the relational graph generating device provided by the disclosure. In an embodiment, the graph query module 910 may be configured to perform the operation S510 described above, which is not described herein again.

The object determining module 920 is configured to determine, for a second object in the object pair, a second target object associated with the second object according to the association relationship for a plurality of objects in the object class to which the second object belongs. In an embodiment, the object determining module 920 may be configured to perform the operation S520 described above, which is not described herein again.

The feature determination module 930 is configured to determine, for each of the first object and the second object, feature information for each object according to a target object of each object. In an embodiment, the feature determining module 930 may be configured to perform the operation S530 described above, which is not described herein again.

The relationship determination module 940 is configured to determine a matching relationship between the first object and the second object according to the feature information for the first object and the feature information for the second object. In an embodiment, the relationship determining module 940 may be configured to perform operation S540 described above, which is not described herein again.

According to an embodiment of the present disclosure, the apparatus 900 for determining a matching relationship may further include a description text query module, configured to query a description text library for an object class to which the first object belongs according to the first object, so as to obtain a description text of the first object. The above-mentioned feature determination module 930 may be configured to determine feature information for the first object according to the first target object and the description text of the first object in a case where each object is the first object.

According to an embodiment of the present disclosure, the above-mentioned feature determination module 930 may be configured to determine feature information for the second object according to the second target object and the second object in case that each object is the second object.

According to an embodiment of the present disclosure, the relationship determination module 940 may include a splicing submodule, a probability determination submodule, and a relationship determination submodule. The splicing submodule is used for splicing the feature information for the first object and the feature information for the second object to obtain spliced feature information. The probability determination submodule is used for determining, according to the spliced feature information, a probability value that the first object matches the second object. The relationship determination submodule is used for determining the matching relationship between the first object and the second object according to the probability value.

In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solutions of the present disclosure, the user's authorization or consent is obtained before the user's personal information is acquired or collected.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

Fig. 10 illustrates a schematic block diagram of an example electronic device 1000 that may be used to implement the method of generating a relationship graph or the method of determining a matching relationship of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 10, the device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

A number of components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1001 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 executes the respective methods and processes described above, such as the method of generating a relationship graph or the method of determining a matching relationship. For example, in some embodiments, the method of generating a relationship graph or the method of determining a matching relationship may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the method of generating a relationship graph or the method of determining a matching relationship described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of generating a relationship graph or the method of determining a matching relationship.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of generating a relationship graph, comprising:

for each class of objects in a first class of objects and a second class of objects, determining similarity information for each class of objects according to a predetermined association relationship among a plurality of objects belonging to the each class of objects, the similarity information indicating a similarity degree of the plurality of objects to each other;

determining association information between a plurality of first objects belonging to the first class of objects and a plurality of second objects belonging to the second class of objects according to a predetermined association relationship between the first objects and the second objects;

determining initial adjacency information for a relationship graph according to the similarity information for the first class of objects, the similarity information for the second class of objects, and the association information;

determining initial embedding information for the relationship graph according to the association information; and

generating the relationship graph by using a graph neural network according to the initial adjacency information and the initial embedding information,

wherein the relationship graph indicates an association relationship between any two objects in an object set composed of the plurality of first objects and the plurality of second objects.

2. The method of claim 1, wherein determining the similarity information for each class of objects according to the predetermined association relationship among the plurality of objects belonging to each class of objects comprises:

determining a tree graph for the plurality of objects according to the predetermined association relationship among the plurality of objects, each node in the tree graph indicating one of the plurality of objects;

combining the plurality of objects pairwise to form a plurality of object pairs;

for each object in each of the plurality of object pairs, determining a connection node between the node indicating the object in the tree graph and a root node of the tree graph, to obtain a set of parent nodes for the object, the set of parent nodes comprising the connection node and the root node; and

determining the similarity between the two objects included in each object pair according to the two sets of parent nodes of the two objects and the intersection of the two sets of parent nodes.

3. The method of claim 2, wherein said determining a similarity between two objects included in said each object pair according to an intersection of two sets of parent nodes of said two objects and said two sets of parent nodes comprises:

for each of the two objects:

determining a similarity value between the object indicated by the set of parent nodes for each object and each object according to the set of parent nodes for each object and a predetermined similarity coefficient, as a first similarity value for each object;

for each node in the intersection of the two sets of parent nodes, determining a target node in the tree graph between each node and the node indicating each object;

determining a similarity value between the object indicated by each node and each object according to the target node and the predetermined similarity coefficient, and taking the similarity value as a second similarity value for each object; and

determining a similarity between the two objects according to the two first similarity values and the two second similarity values for the two objects.

4. The method of claim 1, wherein the determining initial adjacency information for a relationship graph from the similarity information for the first class of objects, the similarity information for the second class of objects, and the association information comprises:

normalizing the similarity information for each class of objects to obtain normalized similarity information for each class of objects;

determining transposition information of the association information; and

determining initial adjacency information for a relational graph according to the normalized similarity information for the first class of objects, the normalized similarity information for the second class of objects, the association information, and the transposition information.

5. The method of claim 4, wherein the determining initial adjacency information for a relationship graph from the normalized similarity information for the first class of objects, the normalized similarity information for the second class of objects, the association information, and the transpose information comprises:

adjusting the normalized similarity information for each class of objects according to a predetermined penalty factor to obtain adjusted normalized information for each class of objects; and

splicing the adjusted normalized information for the first class of objects, the association information, the adjusted normalized information for the second class of objects, and the transposition information to obtain the initial adjacency information.

6. The method of claim 1, wherein the determining initial embedding information for the relationship graph from the association information comprises:

determining transposition information of the association information; and

constructing the initial embedding information according to the transposition information and the association information,

wherein, in the initial embedding information, the transposition information and the association information are located on the secondary diagonal.

7. A method of determining a matching relationship, comprising:

for a first object in an object pair to be matched, querying a predetermined relationship graph according to the first object to obtain a first target object associated with the first object;

for a second object in the object pair, determining a second target object associated with the second object according to the association relation of a plurality of objects in an object class to which the second object belongs;

for each of the first object and the second object, determining feature information for the each object according to a target object of the each object; and

determining a matching relationship between the first object and the second object according to the feature information for the first object and the feature information for the second object,

wherein the object class to which the first target object belongs comprises an object class to which the second object belongs; the predetermined relationship graph is a relationship graph generated according to the method of any one of claims 1-6.

8. The method of claim 7, further comprising:

querying, according to the first object, a description text library for the object class to which the first object belongs to obtain a description text of the first object,

wherein, in a case where the each object is the first object, determining the feature information for the each object according to a target object of the each object comprises:

determining the feature information for the first object according to the first target object and the description text of the first object.

9. The method of claim 7, wherein, in a case where the each object is the second object, determining the feature information for the each object according to a target object of the each object comprises:

determining feature information for the second object from the second target object and the second object.

10. The method of claim 7, wherein the determining a matching relationship between the first object and the second object according to the feature information for the first object and the feature information for the second object comprises:

splicing the feature information for the first object and the feature information for the second object to obtain spliced feature information;

determining a probability value that the first object matches the second object according to the spliced feature information; and

determining the matching relationship between the first object and the second object according to the probability value.
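
The three steps of this claim can be sketched as below. The sketch assumes the feature information is a pair of vectors, that the probability comes from a single logistic layer, and that the matching relationship is decided by a threshold. The claim fixes none of these: the names `match_probability` and `is_match`, the weights, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def match_probability(feat_first, feat_second, weights, bias=0.0):
    """Splice two feature vectors and map them to a probability in (0, 1)."""
    # Splicing is taken here to mean vector concatenation.
    spliced = np.concatenate([feat_first, feat_second])
    # Assumed scoring head: one linear layer followed by a sigmoid.
    logit = float(spliced @ weights) + bias
    return 1.0 / (1.0 + np.exp(-logit))

def is_match(prob, threshold=0.5):
    """Decide the matching relationship from the probability value."""
    return prob >= threshold
```

In practice the scoring head would be trained; the linear layer only stands in for whatever model produces the probability value.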

11. An apparatus to generate a relationship graph, comprising:

a similarity determination module configured to determine, for each class of objects among a first class of objects and a second class of objects, similar information for the each class of objects according to a predetermined association relationship among a plurality of objects belonging to the each class of objects, the similar information indicating a similarity between the plurality of objects;

an association information determination module configured to determine association information between the first class of objects and the second class of objects according to a predetermined association relationship between a plurality of first objects belonging to the first class of objects and a plurality of second objects belonging to the second class of objects;

an adjacency information determination module configured to determine initial adjacency information for the relationship graph according to the similar information for the first class of objects, the similar information for the second class of objects and the association information;

an embedding information determination module configured to determine initial embedding information for the relationship graph according to the association information; and

a graph generation module configured to generate the relationship graph by using a graph neural network according to the initial adjacency information and the initial embedding information,

12. The apparatus of claim 11, wherein the similarity determination module comprises:

a tree graph determination submodule configured to determine a tree graph for the plurality of objects according to the predetermined association relationship among the plurality of objects, each node in the tree graph indicating one of the plurality of objects;

an object pair forming submodule configured to combine the plurality of objects pairwise to form a plurality of object pairs;

a parent node obtaining submodule configured to determine, for each object included in each object pair of the plurality of object pairs, a connection node between a node indicating the each object in the tree graph and a root node of the tree graph, to obtain a set of parent nodes for the each object, the set of parent nodes comprising the connection node and the root node; and

a similarity determination submodule configured to determine a similarity between the two objects included in the each object pair according to the two sets of parent nodes for the two objects and an intersection of the two sets of parent nodes.
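
The parent-set computation described by these submodules can be sketched as follows. The tree is assumed to be given as a child-to-parent dictionary, and the similarity is computed as a plain Jaccard ratio of the two parent sets. The Jaccard ratio is a simple stand-in chosen for illustration; the claims only require that the similarity depend on the two sets and their intersection. The names `parent_set`, `pair_similarity` and `all_pair_similarities` are hypothetical.

```python
from itertools import combinations

def parent_set(node, parent_of):
    """Collect all nodes on the path from `node` up to and including the root."""
    ancestors = set()
    current = parent_of.get(node)
    while current is not None:
        ancestors.add(current)
        current = parent_of.get(current)
    return ancestors

def pair_similarity(a, b, parent_of):
    """Jaccard ratio of the two parent sets (an assumed similarity measure)."""
    pa, pb = parent_set(a, parent_of), parent_set(b, parent_of)
    union = pa | pb
    if not union:
        return 0.0
    return len(pa & pb) / len(union)

def all_pair_similarities(objects, parent_of):
    """Combine the objects pairwise and score every resulting pair."""
    return {(a, b): pair_similarity(a, b, parent_of)
            for a, b in combinations(objects, 2)}
```

Two leaves under the same branch share every ancestor and score 1.0, while leaves that share only the root score lower, which is the behavior a hierarchy-based similarity would be expected to show.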

13. The apparatus of claim 12, wherein the similarity determination submodule comprises:

a first value determination unit configured to determine, for each of the two objects, a similarity value between an object indicated by the set of parent nodes for the each object and the each object as a first similarity value for the each object, based on the set of parent nodes for the each object and a predetermined similarity coefficient;

a target node determination unit configured to determine, for each node in an intersection of the two sets of parent nodes, a target node located between the each node and a node indicating the each object in the tree graph;

a second value determination unit configured to determine, as a second similarity value for the each object, a similarity value between the object indicated by the each node and the each object according to the target node and the predetermined similarity coefficient; and

a similarity determining unit, configured to determine a similarity between the two objects according to the two first similarity values and the two second similarity values for the two objects.

14. The apparatus of claim 11, wherein the adjacency information determination module comprises:

a normalization submodule configured to normalize the similar information for each class of objects to obtain normalized similar information for the each class of objects;

a first transposition submodule configured to determine transposition information of the association information; and

an adjacency information determination submodule configured to determine the initial adjacency information for the relationship graph according to the normalized similar information for the first class of objects, the normalized similar information for the second class of objects, the association information and the transposition information.
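
The normalization step is not specified further in this claim. As one plausible choice, the sketch below applies the symmetric degree normalization commonly used with graph convolutional networks, scaling entry (i, j) by 1/sqrt(d_i * d_j), where d_i is the i-th row sum. Both the choice of normalization and the name `normalize_similarity` are assumptions.

```python
import numpy as np

def normalize_similarity(sim, eps=1e-12):
    """Symmetrically normalize a similarity matrix by its row sums."""
    sim = np.asarray(sim, dtype=float)
    degree = sim.sum(axis=1)
    # Guard against division by zero for objects with no similar neighbors.
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, eps))
    return sim * inv_sqrt[:, None] * inv_sqrt[None, :]
```

This normalization keeps a symmetric input symmetric, which matters if the normalized blocks are later spliced with the association information and its transposition into a symmetric adjacency matrix.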

15. The apparatus of claim 14, wherein the adjacency information determination sub-module comprises:

an information adjusting unit configured to adjust the normalized similar information for each class of objects according to a preset penalty factor to obtain adjusted normalized information for the each class of objects; and

an information splicing unit configured to splice the adjusted normalized information for the first class of objects, the association information, the adjusted normalized information for the second class of objects and the transposition information to obtain the initial adjacency information.

16. The apparatus of claim 11, wherein the embedded information determination module comprises:

a second transposition submodule configured to determine transposition information of the association information; and

an information construction submodule configured to construct the initial embedding information according to the transposition information and the association information,

17. An apparatus for determining a matching relationship, comprising:

a graph query module configured to query, for a first object in an object pair to be matched, a predetermined relationship graph according to the first object to obtain a first target object associated with the first object;

an object determination module configured to determine, for a second object in the object pair, a second target object associated with the second object according to an association relationship among a plurality of objects in an object class to which the second object belongs;

a feature determination module configured to determine, for each of the first object and the second object, feature information for the each object according to a target object of the each object; and

a relationship determination module for determining a matching relationship between the first object and the second object according to the feature information for the first object and the feature information for the second object,

wherein the object class to which the first target object belongs comprises an object class to which the second object belongs; the predetermined relationship graph is a relationship graph generated by the apparatus according to any one of claims 11-16.

18. The apparatus of claim 17, further comprising:

a description text query module configured to query, according to the first object, a description text library for the object class to which the first object belongs to obtain a description text of the first object,

wherein the feature determination module is configured to determine, in a case where the each object is the first object, the feature information for the first object according to the first target object and the description text of the first object.

19. The apparatus of claim 17, wherein:

the feature determination module is configured to determine, in a case where the each object is the second object, the feature information for the second object according to the second target object and the second object.

20. The apparatus of claim 17, wherein the relationship determination module comprises:

a splicing submodule configured to splice the feature information for the first object and the feature information for the second object to obtain spliced feature information;

a probability determination submodule configured to determine a probability value that the first object matches the second object according to the spliced feature information; and

a relationship determination submodule configured to determine the matching relationship between the first object and the second object according to the probability value.

21. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.

22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to any one of claims 1-10.

23. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.