CN113535976A - Path vectorization representation method and device, computing equipment and storage medium - Google Patents


Info

Publication number
CN113535976A
CN113535976A
Authority
CN
China
Prior art keywords
embedded
path
vector
dimension
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110775745.5A
Other languages
Chinese (zh)
Inventor
李钊
赵凯
邓晓雨
刘岩
宋慧驹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN202110775745.5A
Publication of CN113535976A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present application provide a path vectorization characterization method and apparatus, a computing device, and a storage medium. The method includes: determining a first input encoding of the field dimension of a path to be embedded and a second input encoding of the character dimension of the path to be embedded, where the first input encoding is determined from the input encodings of the fields in the path to be embedded (the fields comprising entity fields and/or relationship fields) and the second input encoding is determined from the input encodings of the characters in the path to be embedded; and inputting the first input encoding and the second input encoding into an embedding model with a dual-stream architecture to obtain a target vector, output by the embedding model, that characterizes the semantics of the path to be embedded. By fusing the features of the field dimension and the character dimension through the dual-stream embedding model and extracting semantics from both dimensions, the method obtains a target vector that accurately characterizes the semantics of the path to be embedded.

Description

Path vectorization representation method and device, computing equipment and storage medium
Technical Field
The embodiments of the present application relate to the field of knowledge graphs, and in particular to a path vectorization characterization method and apparatus, a computing device, and a storage medium.
Background
A knowledge graph is a knowledge base built on a directed graph structure that can describe real-world entities and the relationships between them. Knowledge graph embedding (i.e., the vectorized characterization of knowledge) maps the entities and relationships in knowledge into a vector space so that they can be applied to various tasks; for example, a knowledge graph may be used to support a question-answering engine.
In the related art, semantic information of the entity and relationship dimension of a path (a piece of knowledge) input by a user is extracted to obtain a vector corresponding to the path, and that vector is used to characterize the input path (i.e., to perform knowledge graph embedding).
However, this scheme extracts semantic information only at the entity/relationship dimension, so the resulting vector struggles to accurately characterize the semantics of the corresponding path.
Disclosure of Invention
The embodiments of the present application provide a path vectorization characterization method and apparatus, a computing device, and a storage medium, which are used to accurately vectorize a path.
In a first aspect, an embodiment of the present application provides a vectorization characterization method for a path, where the method includes:
determining a first input code of a field dimension of a path to be embedded and a second input code of a character dimension of the path to be embedded; wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all characters in the path to be embedded;
and inputting the first input encoding and the second input encoding into an embedding model with a dual-stream architecture to obtain a target vector, output by the embedding model, that characterizes the semantics of the path to be embedded.
In this technical scheme, the first input encoding of the field dimension and the second input encoding of the character dimension of the path to be embedded are determined; the dual-stream embedding model then fuses the features of the field and character dimensions and extracts semantics from both, so the resulting target vector can accurately characterize the semantics of the path to be embedded.
Optionally, the embedding model includes M layers of graph-embedding attention encoders, and inputting the first input encoding and the second input encoding into the dual-stream embedding model to obtain the target vector characterizing the semantics of the path to be embedded includes:
through the Nth-layer graph-embedding attention encoder, performing feature fusion on the vector of the (N-1)th field dimension and the vector of the (N-1)th character dimension to obtain the vector of the Nth field dimension and the vector of the Nth character dimension, where if N = 1, the vector of the (N-1)th field dimension is the first input encoding and the vector of the (N-1)th character dimension is the second input encoding;
if N is a positive integer between 1 and (M-1), outputting the vector of the Nth field dimension and the vector of the Nth character dimension to the (N+1)th-layer graph-embedding attention encoder; and if N equals M, selecting the sub-vector at a target position from the vector of the Nth field dimension as the target vector.
In this technical scheme, by stacking multiple layers of graph-embedding attention encoders, each layer can perform feature fusion between the vector of the field dimension and the vector of the character dimension, so semantics are fully extracted from both dimensions; the sub-vector at the target position of the field-dimension vector output by the last layer can therefore characterize the semantics of the path to be embedded more accurately.
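The layer-by-layer flow above can be sketched as a simple loop. This is an illustrative sketch, not the patented implementation: the `layers` callables, the `target_pos` default, and the list-of-lists vector format are hypothetical stand-ins.

```python
def run_encoder_stack(layers, first_input_code, second_input_code, target_pos=0):
    """Sketch of M stacked graph-embedding attention encoder layers.

    Each element of `layers` maps (field_vec, char_vec) to the next
    (field_vec, char_vec); after the last layer, the sub-vector at the
    target position of the field-dimension output is the target vector.
    """
    field_vec, char_vec = first_input_code, second_input_code  # layer-0 inputs
    for layer in layers:  # layers 1 .. M
        field_vec, char_vec = layer(field_vec, char_vec)
    return field_vec[target_pos]
```

Each intermediate layer's outputs feed the next layer; only the final layer's field-dimension output is read.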
Optionally, performing feature fusion on the vector of the (N-1)th field dimension and the vector of the (N-1)th character dimension to obtain the vector of the Nth field dimension and the vector of the Nth character dimension includes:
performing an attention-layer operation on the vector of the (N-1)th character dimension to obtain an intermediate feature of the Nth character dimension, and performing a fully-connected-layer operation on that intermediate feature to obtain the vector of the Nth character dimension; and
adjusting the vector of the (N-1)th field dimension based on a dimension-reduced vector, obtained by dimension-reducing the vector of the (N-1)th character dimension, to obtain an Nth adjustment vector; performing an attention-layer operation on the Nth adjustment vector to obtain an intermediate feature of the Nth field dimension; and performing a fully-connected-layer operation on that intermediate feature to obtain the vector of the Nth field dimension.
In this technical scheme, each layer of the graph-embedding attention encoder obtains the character-dimension vector it outputs to the next layer by applying attention-layer and fully-connected-layer operations to the character-dimension vector it receives; it obtains a dimension-reduced vector matching the shape of the field-dimension vector by dimension-reducing the received character-dimension vector, and adjusts the received field-dimension vector based on it, so that the attention-layer and fully-connected-layer operations run on the adjustment vector to realize feature fusion, yielding the fused field-dimension vector.
Optionally, the input code of any field in the path to be embedded is obtained by:
determining a field code corresponding to the field according to a preset corresponding relation between the field and the code;
and obtaining the input code of the field according to the field code of the field and the code representing the position of the field in the path to be embedded.
In the technical scheme, the field codes of the fields represent the semantics of the corresponding fields, and the position codes of the fields represent the positions of the corresponding fields in the path to be embedded, so that the input codes of the fields are obtained in the above manner, and the semantics of the fields and the positions in the path to be embedded are represented, so that the characteristics of the fields can be better learned subsequently.
Optionally, the input code of any character in the path to be embedded is obtained by:
and obtaining the input code of the character according to the code of the character in the natural language and the code representing the position of the character in the path to be embedded.
In the technical scheme, because the encoding of each character in the natural language represents the semantic meaning of the corresponding character, and the encoding of the position of each character represents the position of the corresponding character in the path to be embedded, the input encoding of the character is obtained in the above mode, and the semantic meaning and the position of the character in the path to be embedded are simultaneously represented, so that the character can be better learned subsequently.
Optionally, the path to be embedded is a single-hop path or a multi-hop path.
In the technical scheme, the path to be embedded can be a single-hop path or a multi-hop path, so that paths in different forms can be embedded more flexibly.
Optionally, after obtaining the target vector that is output by the embedding model and characterizes the semantics of the path to be embedded, the method further includes:
adding the path to be embedded into a target graph after determining that the similarity between the target vector and the semantic vector of each path in the target graph is below a similarity threshold.
In this technical scheme, the embedding of the path to be embedded is completed once the target vector is obtained, and the target vector characterizes its semantics. When a path is to be added to a target graph, it is added only after the similarity between its target vector and the semantic vector of every path already in the graph is determined to be below the similarity threshold, that is, only after its semantics are determined to differ from those of every existing path; this prevents semantically similar paths from being repeatedly added to the target graph.
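The de-duplication check above can be sketched as follows. The patent does not fix the similarity measure or threshold, so cosine similarity and the 0.9 default are assumptions, and `maybe_add_path` is a hypothetical name.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def maybe_add_path(graph_vectors, path, target_vector, threshold=0.9):
    """Add `path` to the graph only if its target vector is not similar to
    the semantic vector of any path already present."""
    if all(cosine(target_vector, v) < threshold for v in graph_vectors.values()):
        graph_vectors[path] = target_vector
        return True
    return False
```

A path whose target vector nearly duplicates an existing semantic vector is rejected, which is exactly the repeated-addition case the scheme guards against.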
In a second aspect, an embodiment of the present application further provides a device for vectorizing and characterizing a path, including:
the encoding determination module is used for determining a first input encoding of a field dimension of a path to be embedded and a second input encoding of a character dimension of the path to be embedded; wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all characters in the path to be embedded;
and a vector determination module, configured to input the first input encoding and the second input encoding into an embedding model with a dual-stream architecture to obtain a target vector, output by the embedding model, that characterizes the semantics of the path to be embedded.
In a third aspect, an embodiment of the present application provides a computing device, including at least one processor and at least one memory, where the memory stores a computer program, and when the program is executed by the processor, the processor is caused to execute the vectorization characterization method for a path according to any one of the above first aspects.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computing device, where the program, when executed on the computing device, causes the computing device to execute the vectorization characterization method for a path according to any one of the above first aspects.
In addition, for technical effects brought by any one implementation manner in the second to fourth aspects, reference may be made to technical effects brought by different implementation manners in the first aspect, and details are not described here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a first method for vectorization characterization of a path according to an embodiment of the present application;
FIG. 2 is a schematic input/output diagram of an embedding model according to an embodiment of the present disclosure;
FIG. 3 is a block diagram of an embedded model according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of a second method for vectorizing a representation of a path according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an embodiment of an atlas embedded attention encoder;
FIG. 6 is a schematic structural diagram of a vectorization characterization device for a path according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the embodiment of the present application, the term "and/or" describes an association relationship of associated objects, and means that there may be three relationships, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the description of the present application, it is to be noted that, unless otherwise explicitly stated or limited, the term "connected" is to be understood broadly, and may for example be directly connected, indirectly connected through an intermediate medium, or be a communication between two devices. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless otherwise specified.
The term "path" in the embodiments of the present application refers to a piece of knowledge;
the term "vectorized characterization" refers to the embedding of the corresponding path/knowledge;
the term "path to be embedded" refers to a path/knowledge that requires vectorized characterization.
Knowledge graph embedding (i.e., the vectorized characterization of knowledge) maps the entities and relationships in knowledge into a vector space so that they can be applied to various tasks; for example, a knowledge graph may be used to support a question-answering engine.
In some embodiments, semantic information of the entity and relationship dimension of a path (a piece of knowledge) input by a user is extracted to obtain a vector corresponding to the path, and that vector is used to characterize the path (i.e., to perform knowledge graph embedding). For example, if a user inputs the single-hop path (triple) "sun - star attribute - star", semantic information of "sun", "star attribute", and "star" is extracted to obtain the vector corresponding to the path.
However, this scheme extracts semantic information only at the entity/relationship dimension, so the resulting vector struggles to accurately characterize the semantics of the corresponding path.
In view of this, the embodiments of the present application provide a path vectorization characterization method and apparatus, a computing device, and a storage medium. The method includes: determining a first input encoding of the field dimension of a path to be embedded and a second input encoding of the character dimension of the path to be embedded, where the first input encoding is determined from the input encodings of the fields in the path to be embedded (the fields comprising entity fields and/or relationship fields) and the second input encoding is determined from the input encodings of the characters in the path to be embedded; and inputting the first input encoding and the second input encoding into an embedding model with a dual-stream architecture to obtain a target vector, output by the embedding model, that characterizes the semantics of the path to be embedded. By fusing the features of the field and character dimensions through the dual-stream embedding model and extracting semantics from both dimensions, the obtained target vector can accurately characterize the semantics of the path to be embedded.
The following describes the technical solutions of the present application and how to solve the above technical problems in detail with reference to the accompanying drawings and specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
The embodiment of the present application provides a first path vectorization characterization method, as shown in fig. 1, including the following steps:
step S101: determining a first input encoding of a field dimension of a path to be embedded and a second input encoding of a character dimension of the path to be embedded.
Wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all the characters in the path to be embedded.
In this embodiment, a specific implementation form of the path to be embedded is not limited, and the following examples are given:
1) The path to be embedded may be a single-hop path, i.e., implemented as a triple consisting of a head entity field, a middle relationship field, and a tail entity field. For example: "sun (entity field) - star attribute (relationship field) - star (entity field)".
2) The path to be embedded may also be a multi-hop path, i.e., implemented as a quintuple, septuple, nonuple, etc., consisting of a head entity field, multiple relationship fields, the intermediate entity fields those relationships connect, and a tail entity field. Taking a quintuple as an example, the path structure is "entity 1 - relation 1 - entity 2 - relation 2 - entity 3"; taking a septuple as an example, it is "entity 1 - relation 1 - entity 2 - relation 2 - entity 3 - relation 3 - entity 4". Further forms are not enumerated one by one here.
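Both forms can be modeled as an alternating entity/relation sequence. The tuple encoding and the `is_well_formed` check below are illustrative assumptions, not part of the patent.

```python
# Single-hop path (triple) and multi-hop path (quintuple), as alternating
# entity/relation sequences: entities at even indices, relations at odd.
single_hop = ("sun", "star attribute", "star")
multi_hop = ("entity 1", "relation 1", "entity 2", "relation 2", "entity 3")

def is_well_formed(path):
    """A path needs an odd number of at least three elements, so that it
    starts and ends with an entity field."""
    return len(path) >= 3 and len(path) % 2 == 1
```

Triples, quintuples, septuples, and so on all satisfy the same shape constraint, which is what lets a single embedding pipeline handle both single-hop and multi-hop paths.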
The first input encoding is determined from the input encodings of the fields in the path to be embedded; that is, the path is split at the field dimension (into a number of fields) and the first input encoding is determined from those fields' input encodings, for example by arranging the input encodings of the fields according to their positions in the path to be embedded.
The second input encoding is determined from the input encodings of the characters in the path to be embedded; that is, the path is split at the character dimension (into a number of characters) and the second input encoding is determined from those characters' input encodings, for example by arranging the input encodings of the characters according to their positions in the path to be embedded.
Step S102: inputting the first input encoding and the second input encoding into an embedding model with a dual-stream architecture to obtain a target vector, output by the embedding model, that characterizes the semantics of the path to be embedded.
Referring to fig. 2, the embedding model adopted in this embodiment has a dual-stream architecture: the first input encoding is fed to one stream of the embedding model and the second input encoding to the other, and the embedding model fuses the features of the field and character dimensions to output a target vector characterizing the semantics of the path to be embedded.
This embodiment does not limit the specific implementation of the embedding model; for example, a Transformer model with a dual-stream architecture may be used.
In this technical scheme, the first input encoding of the field dimension and the second input encoding of the character dimension of the path to be embedded are determined; the dual-stream embedding model then fuses the features of the field and character dimensions and extracts semantics from both, so the resulting target vector can accurately characterize the semantics of the path to be embedded.
This embodiment does not specifically limit how the fields and characters of the path to be embedded are obtained. Suppose the path to be embedded is a triple "entity 1 - relation 1 - entity 2", where entity 1 consists of characters 11 and 12, relation 1 of characters 21, 22, 23, and 24, and entity 2 of characters 31 and 32:
the triple is converted into a string P of the form "entity 1[seg1]relation 1[seg2]entity 2" by separating "entity-relation" with a first separator [seg1] and "relation-entity" with a second separator [seg2];
by locating the positions of the first and second separators in P, the fields entity 1, relation 1, and entity 2 are extracted in order; a [CLS] token is inserted at the head and a [SEP] token at the tail, giving an ordered set E containing all fields of the path to be embedded;
by locating the positions of the first and second separators in P, P is segmented in character order and characters 11, 12, 21, 22, 23, 24, 31, and 32 are extracted in turn; a [CLS] token is inserted at the head and a [SEP] token at the tail, giving an ordered set S containing all characters of the path to be embedded.
In some optional embodiments, the input encoding of any field in the path to be embedded may be obtained by, but is not limited to, the following:
determining a field code corresponding to the field according to a preset corresponding relation between the field and the code;
and obtaining the input code of the field according to the field code of the field and the code representing the position of the field in the path to be embedded.
The embodiment does not limit the specific implementation manner of the coding characterizing the position of the field in the path to be embedded, for example, the coding has the same dimension as the field coding and includes the order of the corresponding fields in the path to be embedded. In implementation, the input code of the corresponding field may be obtained according to the two codes according to an actual application scenario. The following are exemplary:
the fields of all paths in the graph are extracted in advance, and a correspondence between fields and codes is established; based on this correspondence, the field code of each field in the path to be embedded is determined, the field code of each field is added to the code of its position to obtain the field's input encoding, and the input encodings of the fields are arranged according to their positions in the path to be embedded to obtain the first input encoding E_0.
In the embodiment, the field codes of the fields represent the semantics of the corresponding fields, and the codes of the positions of the fields represent the positions of the corresponding fields in the path to be embedded, so that the input codes of the fields are obtained in the above manner, and the semantics of the fields and the positions of the fields in the path to be embedded are simultaneously represented, so that the characteristics of the fields can be better learned subsequently.
In some alternative embodiments, the input code of any character in the path to be embedded may be obtained by, but is not limited to:
and obtaining the input code of the character according to the code of the character in the natural language and the code representing the position of the character in the path to be embedded.
The embodiment does not limit the specific implementation manner of the code representing the position of the character in the path to be embedded, for example, the code has the same dimension as the character code and includes the order of the corresponding character in the path to be embedded. In implementation, the input code of the corresponding character can be obtained according to the two codes according to the actual application scene. The following are exemplary:
the correspondence between characters and codes in the natural language is acquired; based on this correspondence, the character code of each character in the path to be embedded is determined, the character code of each character is added to the code of its position to obtain the character's input encoding, and the input encodings of the characters are arranged according to their positions in the path to be embedded to obtain the second input encoding S_0.
In the embodiment, because the encoding of each character in the natural language represents the semantic meaning of the corresponding character, and the encoding of the position of each character represents the position of the corresponding character in the path to be embedded, the input encoding of the character is obtained by the method, and the semantic meaning and the position of the character in the path to be embedded are simultaneously represented, so that the character can be better learned subsequently.
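Both streams build their input encodings the same way: a token's own code plus the code of its position, arranged in path order. The sketch below assumes the additive combination described above; the lookup tables are hypothetical. Applied with a field-code table it yields E_0; with a natural-language character table, S_0.

```python
def build_input_codes(tokens, token_codes, position_codes):
    """Input encoding of each token = its code plus the code of its position;
    arranging the results in path order gives the stream's input encoding
    (E_0 for fields, S_0 for characters)."""
    return [
        [t + p for t, p in zip(token_codes[tok], position_codes[i])]
        for i, tok in enumerate(tokens)
    ]
```

The token code carries the token's semantics and the position code its place in the path, so each input encoding characterizes both at once, as the preceding paragraphs note.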
In some embodiments, the embedding model includes multiple layers of graph-embedding attention encoders; the embedding model shown in fig. 3 includes M such layers. Based on this embedding model, an embodiment of the present application provides a second path vectorization characterization method, as shown in fig. 4, including the following steps:
step S401: determining a first input encoding of a field dimension of a path to be embedded and a second input encoding of a character dimension of the path to be embedded.
Wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all the characters in the path to be embedded.
The specific implementation manner of step S401 may refer to the above embodiments, and is not described herein again.
Step S402: inputting the first input encoding and the second input encoding into the dual-stream embedding model and, through the Nth-layer graph-embedding attention encoder, performing feature fusion on the vector of the (N-1)th field dimension and the vector of the (N-1)th character dimension to obtain the vector of the Nth field dimension and the vector of the Nth character dimension.
If N is 1, the vector in the (N-1) th field dimension is the first input code and the vector in the (N-1) th character dimension is the second input code.
That is, the first input encoding is fed to the first-layer graph-embedding attention encoder as the field-dimension input, and the second input encoding as the character-dimension input; the first layer fuses the two to obtain the vector of the 1st field dimension and the vector of the 1st character dimension;
the second-layer graph-embedding attention encoder fuses the vectors of the 1st field and character dimensions to obtain the vectors of the 2nd field and character dimensions;
the remaining layers proceed in the same way and are not described again.
In some alternative embodiments, the Nth-layer map-embedded attention encoder may perform feature fusion in, but not limited to, the following manner:
performing attention layer operation on the vector of the (N-1) th character dimension to obtain an intermediate feature of the Nth character dimension; performing full-connection layer operation on the intermediate features of the Nth character dimension to obtain a vector of the Nth character dimension; and
adjusting the vector of the (N-1) th field dimension based on the dimension reduction vector obtained by performing dimension reduction processing on the vector of the (N-1) th character dimension to obtain an Nth adjustment vector; performing attention layer operation on the Nth adjustment vector to obtain an intermediate feature of the Nth field dimension; and performing full-connection layer operation on the intermediate features of the nth field dimension to obtain a vector of the nth field dimension.
Each map-embedded attention encoder layer has a dual-stream structure; see fig. 5 for the specific structure. The Nth-layer map-embedded attention encoder comprises a second attention layer and a second fully-connected layer corresponding to the character dimension, and a fusion fully-connected layer, a first attention layer, and a first fully-connected layer corresponding to the field dimension.
The following is an exemplary description of the feature fusion process of the nth layer map embedded attention encoder based on the architecture shown in fig. 5:
character dimension
1) The second attention layer performs an attention-layer operation on the vector S_(N-1) of the (N-1)th character dimension to obtain the intermediate feature S'_N of the Nth character dimension. Specifically, S'_N = S_(N-1) + Norm[SelfAtt(S_(N-1))], where SelfAtt denotes the self-attention operation and Norm denotes the normalization operation.
2) The second fully-connected layer performs a fully-connected-layer operation on S'_N to obtain the vector S_N of the Nth character dimension. Specifically, S_N = S'_N + Norm[FC(S'_N)], where FC denotes the fully-connected operation (a multi-layer perceptron) and Norm denotes the normalization operation.
Field dimension
1) The fusion fully-connected layer adjusts the vector E_(N-1) of the (N-1)th field dimension based on the dimension-reduced vector obtained by dimension-reduction of S_(N-1), yielding the Nth adjustment vector e_N. Specifically, e_N = E_(N-1) + Vec(S_(N-1)) * W_N + b_N, where Vec(S_(N-1)) is the above-mentioned dimension-reduced vector, W_N is the weight matrix corresponding to the Nth-layer map-embedded attention encoder, and b_N is the bias matrix corresponding to the Nth-layer map-embedded attention encoder.
Because each field in the path to be embedded corresponds to one or more characters, the field-dimension vector and the character-dimension vector have mismatched dimensions and cannot be fused directly. Therefore, in this embodiment, the character-dimension vector is first dimension-reduced so that its dimension matches that of the field-dimension vector, enabling feature fusion of the two.
2) The first attention layer performs an attention-layer operation on e_N to obtain the intermediate feature E'_N of the Nth field dimension. Specifically, E'_N = e_N + Norm[SelfAtt(e_N)].
3) The first fully-connected layer performs a fully-connected-layer operation on E'_N to obtain the vector E_N of the Nth field dimension. Specifically, E_N = E'_N + Norm[FC(E'_N)].
The above description of the feature fusion process of the Nth-layer map-embedded attention encoder is only exemplary, and the present application is not limited thereto.
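The five update formulas above can be sketched as a single dual-stream layer. This is a minimal illustration under stated assumptions: SelfAtt uses identity Q/K/V projections, Norm is layer normalization, FC is a two-layer perceptron, and Vec(.) is realized here as mean-pooling the characters of each field, since the patent leaves the dimension-reduction operation unspecified.

```python
import numpy as np

def norm(x):
    # layer normalization over the feature axis (Norm in the formulas above)
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-6)

def self_att(x):
    # plain self-attention with identity Q/K/V projections (SelfAtt; illustrative)
    w = np.exp(x @ x.T / np.sqrt(x.shape[-1]))
    w /= w.sum(-1, keepdims=True)
    return w @ x

def fc(x, w1, b1, w2, b2):
    # two-layer perceptron (FC in the formulas above)
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

def encoder_layer(E_prev, S_prev, char2field, p):
    """One Nth-layer map-embedded attention encoder (dual stream)."""
    # character stream: S'_N = S_(N-1) + Norm[SelfAtt(S_(N-1))]
    S_mid = S_prev + norm(self_att(S_prev))
    # S_N = S'_N + Norm[FC(S'_N)]
    S_new = S_mid + norm(fc(S_mid, *p["fc_s"]))
    # field stream: Vec(S_(N-1)) realized as mean-pooling each field's characters
    pooled = np.stack([S_prev[char2field == f].mean(axis=0)
                       for f in range(E_prev.shape[0])])
    # e_N = E_(N-1) + Vec(S_(N-1)) * W_N + b_N
    e = E_prev + pooled @ p["W"] + p["b"]
    # E'_N = e_N + Norm[SelfAtt(e_N)]
    E_mid = e + norm(self_att(e))
    # E_N = E'_N + Norm[FC(E'_N)]
    E_new = E_mid + norm(fc(E_mid, *p["fc_e"]))
    return E_new, S_new

rng = np.random.default_rng(0)
d, h = 4, 8                                   # feature and hidden sizes (toy)
mk_fc = lambda: (rng.normal(size=(d, h)), np.zeros(h),
                 rng.normal(size=(h, d)), np.zeros(d))
params = {"fc_s": mk_fc(), "fc_e": mk_fc(),
          "W": rng.normal(size=(d, d)), "b": np.zeros(d)}
E0 = rng.normal(size=(3, d))                  # first input code: 3 fields
S0 = rng.normal(size=(7, d))                  # second input code: 7 characters
char2field = np.array([0, 0, 0, 1, 1, 2, 2])  # each character's owning field
E1, S1 = encoder_layer(E0, S0, char2field, params)
print(E1.shape, S1.shape)  # (3, 4) (7, 4)
```

Stacking M such layers, each consuming the previous layer's (E, S) pair, reproduces the overall architecture; a trained model would learn all weight matrices rather than drawing them at random.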
Step S403: if N is a positive integer between 1 and (M-1), outputting the vector of the Nth field dimension and the vector of the Nth character dimension to an (N +1) th layer map-embedded attention encoder; and if N is equal to M, selecting a sub-vector of a target position from the vectors of the Nth field dimension as the target vector.
As described above, every map-embedded attention encoder except the first layer fuses the field-dimension and character-dimension vectors produced by the previous layer; accordingly, every encoder except the last layer outputs its fused field-dimension and character-dimension vectors to the next-layer map-embedded attention encoder.
In addition, the last-layer map-embedded attention encoder obtains a field-dimension vector through feature fusion; a sub-vector at the target position is then selected from this field-dimension vector, and this sub-vector accurately represents the semantics of the path to be embedded.
Illustratively, the first input encoding is input as a field dimension in a first-level map-embedded attention encoder, and the second input encoding is input as a character dimension in the first-level map-embedded attention encoder; the first-layer map embedded attention encoder performs feature fusion on the first input code and the second input code to obtain a 1 st field dimension vector and a 1 st character dimension vector, and outputs the 1 st field dimension vector and the 1 st character dimension vector to the second-layer map embedded attention encoder;
the second-layer map embedded attention encoder performs feature fusion on the vector of the 1 st field dimension and the vector of the 1 st character dimension to obtain a vector of the 2 nd field dimension and a vector of the 2 nd character dimension, and outputs the vector of the 2 nd field dimension and the vector of the 2 nd character dimension to the third-layer map embedded attention encoder;
The remaining map-embedded attention encoders process their inputs in a similar manner, until the Mth-layer map-embedded attention encoder performs feature fusion on the vector of the (M-1)th field dimension and the vector of the (M-1)th character dimension to obtain the vector of the Mth field dimension. The vector of the Mth field dimension is an m × n matrix, and the sub-vector at the target position (for example, the sub-vector corresponding to the [CLS] token, a 1 × n matrix) is selected as the target vector.
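Assuming the target position is the [CLS] token at the first field position (an illustrative convention; the patent does not fix the position), selecting the target vector from the Mth-layer output amounts to taking one row:

```python
import numpy as np

# E_M: the Mth field-dimension output, an m x n matrix (toy values here)
E_M = np.arange(12.0).reshape(4, 3)   # m = 4 field positions, n = 3
target_vector = E_M[0:1, :]           # 1 x n sub-vector at the assumed [CLS] row
print(target_vector.shape)  # (1, 3)
```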
It can be understood that the Mth-layer map-embedded attention encoder may have the same architecture as the other layers; alternatively, since its character-dimension output is not used, it may retain only the fusion fully-connected layer, the first attention layer, and the first fully-connected layer corresponding to the field dimension, omitting the second attention layer and the second fully-connected layer corresponding to the character dimension.
In the above technical solution, by providing multiple layers of map-embedded attention encoders, each layer performs feature fusion between the vectors of the field dimension and the vectors of the character dimension, so that semantics are fully extracted from both dimensions; as a result, the sub-vector at the target position of the field-dimension vector output by the last-layer map-embedded attention encoder represents the semantics of the path to be embedded more accurately.
After the target vector is obtained, the embedding process of the path to be embedded is completed, and in some optional embodiments, the target vector may be returned to a front-end Interface or an Application Programming Interface (API).
In addition, after path embedding is completed, the resulting vectors can be applied to various tasks; three specific embodiments are described below.
Embodiment 1, target map expansion:
After determining that the similarity between the target vector and the semantic vector of each existing path in the target map is below a similarity threshold, the path to be embedded is added to the target map.
Illustratively, the target map is a map requiring knowledge (path) expansion. After the path to be embedded has been embedded, if it is to be added to the target map, the similarity between the semantic vectors of the existing paths in the target map and the target vector is first compared to judge whether the path duplicates an existing path. For example, the target map already contains "diabetes-symptom-emaciation" and "diabetes-symptom-weight loss" is to be introduced; the similarity of their semantic vectors exceeds the similarity threshold, so, to avoid repeatedly adding semantically similar paths to the target map, "diabetes-symptom-weight loss" is not introduced.
In the above technical solution, the embedding process of the path to be embedded is completed once the target vector is obtained, and the target vector represents the semantics of the path to be embedded. Therefore, when a path is to be added to the target map, it is first determined that the similarity between the target vector corresponding to that path and the semantic vectors of the paths already in the target map is below the similarity threshold, that is, that the semantics of the path to be embedded differ from those of the existing paths; only then is the path added, which prevents paths with similar semantics from being repeatedly added to the target map.
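A minimal sketch of this dedup check, assuming cosine similarity and an illustrative threshold of 0.9 (the patent fixes neither the similarity measure nor the threshold value):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def maybe_add_path(target_vec, existing_vecs, threshold=0.9):
    """Add only if similarity to every existing path vector is below threshold."""
    if all(cosine(target_vec, v) < threshold for v in existing_vecs):
        return True   # path is semantically new: add it to the target map
    return False      # a semantically similar path already exists

# toy vectors: "diabetes-symptom-weight loss" nearly duplicates "...-emaciation"
existing = [np.array([1.0, 0.0, 0.0])]       # vector of the emaciation path
candidate = np.array([0.99, 0.05, 0.0])      # vector of the weight-loss path
print(maybe_add_path(candidate, existing))   # False: too similar, not added
```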
Embodiment 2, path truth determination based on map knowledge:
The semantic vector of the path to be judged is determined through the embedded model. That semantic vector may be input into a fully-connected layer and a softmax layer to perform 0/1 classification and directly determine whether the path is true or false. Alternatively, the semantic vector may be input into a scoring function to obtain the confidence of the path, and its truth determined against a confidence threshold: if the confidence exceeds the threshold, the path is determined to be true; if the confidence is below the threshold, the path is determined to be false.
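The fully-connected-plus-softmax route can be sketched as follows; the weights here are random stand-ins for trained parameters, and the 0.5 confidence threshold is an illustrative assumption.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def judge_path(sem_vec, W, b, conf_threshold=0.5):
    """0/1 classification via fully-connected layer + softmax.
    probs[1] is read as the confidence that the path is true."""
    probs = softmax(sem_vec @ W + b)
    return bool(probs[1] > conf_threshold), float(probs[1])

rng = np.random.default_rng(1)
sem_vec = rng.normal(size=8)            # semantic vector of the path to judge
W, b = rng.normal(size=(8, 2)), np.zeros(2)
is_true, conf = judge_path(sem_vec, W, b)
print(is_true, round(conf, 3))
```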
Embodiment 3, knowledge inference based on map knowledge:
For a path to be completed, part of an entity field or a relationship field of which has been deleted, the path is combined with the map knowledge to infer the deleted entity field or relationship field.
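One possible realization of this inference, under our own assumptions (the embodiment does not specify the mechanism): embed each candidate completion of the path and keep the candidate whose vector is most similar to knowledge already in the map. The additive toy `embed` stands in for the trained embedded model.

```python
import numpy as np

def infer_missing_field(embed, partial_path, candidates, known_vecs):
    """Fill the '?' slot with the candidate whose completed-path vector is
    most similar (cosine) to a semantic vector already in the map."""
    best, best_score = None, -np.inf
    for cand in candidates:
        vec = embed([cand if f == "?" else f for f in partial_path])
        score = max(float(vec @ k / (np.linalg.norm(vec) * np.linalg.norm(k)))
                    for k in known_vecs)
        if score > best_score:
            best, best_score = cand, score
    return best

# additive toy embedding standing in for the trained embedded model
rng = np.random.default_rng(2)
field_vecs = {f: rng.normal(size=6)
              for f in ["diabetes", "symptom", "thirst", "fracture"]}
embed = lambda path: sum(field_vecs[f] for f in path)

known_vecs = [embed(["diabetes", "symptom", "thirst"])]   # existing map knowledge
missing = infer_missing_field(embed, ["diabetes", "symptom", "?"],
                              ["thirst", "fracture"], known_vecs)
print(missing)  # thirst
```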
Based on the same inventive concept, the embodiment of the present application provides a vectorization characterization device for a path, and referring to fig. 6, the vectorization characterization device 600 for a path includes:
the encoding determining module 601 is configured to determine a first input encoding of a field dimension of a path to be embedded and a second input encoding of a character dimension of the path to be embedded; wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all characters in the path to be embedded;
a vector determining module 602, configured to input the first input code and the second input code into an embedded model of a dual-stream architecture, so as to obtain a target vector, which is output by the embedded model and represents the semantics of the path to be embedded.
Optionally, the embedded model comprises an M-layer atlas embedded attention encoder; the vector determination module 602 is specifically configured to:
embedding an attention encoder into an N-th layer map, and performing feature fusion on the vector of the (N-1) -th field dimension and the vector of the (N-1) -th character dimension to obtain a vector of the N-th field dimension and a vector of the N-th character dimension; if N is 1, the vector of the (N-1) th field dimension is the first input code and the vector of the (N-1) th character dimension is the second input code;
if N is a positive integer between 1 and (M-1), the vector determination module 602 is further configured to:
outputting the vector of the nth field dimension and the vector of the nth character dimension to an (N +1) th layer map-embedded attention encoder;
if N is equal to M, the vector determination module 602 is further configured to:
selecting a sub-vector of a target location from the vectors of the nth field dimension as the target vector.
Optionally, the vector determining module 602 is specifically configured to:
performing attention layer operation on the vector of the (N-1) th character dimension to obtain an intermediate feature of the Nth character dimension; performing full-connection layer operation on the intermediate features of the Nth character dimension to obtain a vector of the Nth character dimension; and
adjusting the vector of the (N-1) th field dimension based on the dimension reduction vector obtained by performing dimension reduction processing on the vector of the (N-1) th character dimension to obtain an Nth adjustment vector; performing attention layer operation on the Nth adjustment vector to obtain an intermediate feature of the Nth field dimension; and performing full-connection layer operation on the intermediate features of the nth field dimension to obtain a vector of the nth field dimension.
Optionally, the code determining module 601 is configured to obtain an input code of any field in the path to be embedded by:
determining a field code corresponding to the field according to a preset corresponding relation between the field and the code;
and obtaining the input code of the field according to the field code of the field and the code representing the position of the field in the path to be embedded.
Optionally, the code determining module 601 is configured to obtain an input code of any character in the path to be embedded by:
and obtaining the input code of the character according to the code of the character in the natural language and the code representing the position of the character in the path to be embedded.
Optionally, the path to be embedded is a single-hop path or a multi-hop path.
Optionally, the vector determining module 602, after obtaining the target vector output by the embedding model and representing the to-be-embedded path semantic, is further configured to:
and after the similarity between the target vector and the semantic vector of each path in the target map is determined to be lower than a similarity threshold value, adding the path to be embedded into the target map.
Since the apparatus corresponds to the method in the embodiments of the present application and solves the problem on a similar principle, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described here.
Based on the same technical concept, the embodiment of the present application further provides a computing device 700, as shown in fig. 7, including at least one processor 701 and a memory 702 connected to the at least one processor, where a specific connection medium between the processor 701 and the memory 702 is not limited in this embodiment, and the processor 701 and the memory 702 are connected through a bus 703 in fig. 7 as an example. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The processor 701 is a control center of the computing device, and may be connected to various parts of the computing device by using various interfaces and lines, and implement data processing by executing or executing instructions stored in the memory 702 and calling data stored in the memory 702. Optionally, the processor 701 may include one or more processing units, and the processor 701 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application program, and the like, and the modem processor mainly processes an issued instruction. It will be appreciated that the modem processor described above may not be integrated into the processor 701. In some embodiments, processor 701 and memory 702 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 701 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, or any combination thereof, configured to implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the path vectorization representation method disclosed in the embodiments of the present application may be embodied directly in a hardware processor for execution, or executed by a combination of hardware and software modules in the processor.
The memory 702, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 702 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, or an optical disk. The memory 702 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 702 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
In the embodiment of the present application, the memory 702 stores a computer program, which, when executed by the processor 701, causes the processor 701 to perform:
determining a first input code of a field dimension of a path to be embedded and a second input code of a character dimension of the path to be embedded; wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all characters in the path to be embedded;
and inputting the first input code and the second input code into an embedded model of a double-flow architecture to obtain a target vector which is output by the embedded model and represents the semantics of the path to be embedded.
Optionally, the embedded model comprises an M-layer atlas embedded attention encoder; the processor 701 specifically executes:
embedding an attention encoder into an N-th layer map, and performing feature fusion on the vector of the (N-1) -th field dimension and the vector of the (N-1) -th character dimension to obtain a vector of the N-th field dimension and a vector of the N-th character dimension; if N is 1, the vector of the (N-1) th field dimension is the first input code and the vector of the (N-1) th character dimension is the second input code;
if N is a positive integer between 1 and (M-1), the processor 701 specifically performs:
outputting the vector of the nth field dimension and the vector of the nth character dimension to an (N +1) th layer map-embedded attention encoder;
if N is equal to M, the processor 701 specifically executes:
selecting a sub-vector of a target location from the vectors of the nth field dimension as the target vector.
Optionally, the processor 701 specifically executes:
performing attention layer operation on the vector of the (N-1) th character dimension to obtain an intermediate feature of the Nth character dimension; performing full-connection layer operation on the intermediate features of the Nth character dimension to obtain a vector of the Nth character dimension; and
adjusting the vector of the (N-1) th field dimension based on the dimension reduction vector obtained by performing dimension reduction processing on the vector of the (N-1) th character dimension to obtain an Nth adjustment vector; performing attention layer operation on the Nth adjustment vector to obtain an intermediate feature of the Nth field dimension; and performing full-connection layer operation on the intermediate features of the nth field dimension to obtain a vector of the nth field dimension.
Optionally, the processor 701 further performs:
determining a field code corresponding to the field according to a preset corresponding relation between the field and the code;
and obtaining the input code of the field according to the field code of the field and the code representing the position of the field in the path to be embedded.
Optionally, the processor 701 further performs:
and obtaining the input code of the character according to the code of the character in the natural language and the code representing the position of the character in the path to be embedded.
Optionally, the path to be embedded is a single-hop path or a multi-hop path.
Optionally, after obtaining the target vector output by the embedding model and representing the to-be-embedded path semantic, the processor 701 further performs:
and after the similarity between the target vector and the semantic vector of each path in the target map is determined to be lower than a similarity threshold value, adding the path to be embedded into the target map.
Since the computing device is the computing device in the method in the embodiment of the present application, and the principle of the computing device to solve the problem is similar to that of the method, reference may be made to implementation of the method for implementation of the computing device, and repeated details are not described here.
Based on the same technical concept, the embodiment of the present application further provides a computer-readable storage medium storing a computer program executable by a computing device, and when the program runs on the computing device, the computer program causes the computing device to execute the steps of the vectorization characterization method for the path.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method for vectorized representation of a path, the method comprising:
determining a first input code of a field dimension of a path to be embedded and a second input code of a character dimension of the path to be embedded; wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all characters in the path to be embedded;
and inputting the first input code and the second input code into an embedded model of a double-flow architecture to obtain a target vector which is output by the embedded model and represents the semantics of the path to be embedded.
2. The method of claim 1, wherein the embedded model comprises an M-layer atlas embedded attention encoder; inputting the first input code and the second input code into an embedded model of a double-flow architecture to obtain a target vector which is output by the embedded model and represents the semantics of the path to be embedded, wherein the method comprises the following steps:
embedding an attention encoder into an N-th layer map, and performing feature fusion on the vector of the (N-1) -th field dimension and the vector of the (N-1) -th character dimension to obtain a vector of the N-th field dimension and a vector of the N-th character dimension; if N is 1, the vector of the (N-1) th field dimension is the first input code and the vector of the (N-1) th character dimension is the second input code;
if N is a positive integer between 1 and (M-1), outputting the vector of the Nth field dimension and the vector of the Nth character dimension to an (N +1) th layer map-embedded attention encoder; and if N is equal to M, selecting a sub-vector of a target position from the vectors of the Nth field dimension as the target vector.
3. The method of claim 2, wherein feature fusing the (N-1) th field dimension vector and the (N-1) th character dimension vector to obtain an nth field dimension vector and an nth character dimension vector comprises:
performing attention layer operation on the vector of the (N-1) th character dimension to obtain an intermediate feature of the Nth character dimension; performing full-connection layer operation on the intermediate features of the Nth character dimension to obtain a vector of the Nth character dimension; and
adjusting the vector of the (N-1) th field dimension based on the dimension reduction vector obtained by performing dimension reduction processing on the vector of the (N-1) th character dimension to obtain an Nth adjustment vector; performing attention layer operation on the Nth adjustment vector to obtain an intermediate feature of the Nth field dimension; and performing full-connection layer operation on the intermediate features of the nth field dimension to obtain a vector of the nth field dimension.
4. The method of claim 1, wherein the input encoding of any field in the path to be embedded is obtained by:
determining a field code corresponding to the field according to a preset corresponding relation between the field and the code;
and obtaining the input code of the field according to the field code of the field and the code representing the position of the field in the path to be embedded.
5. The method of claim 1, wherein the input encoding of any character in the path to be embedded is obtained by:
and obtaining the input code of the character according to the code of the character in the natural language and the code representing the position of the character in the path to be embedded.
6. The method of claim 1, wherein the path to be embedded is a single-hop path or a multi-hop path.
7. The method according to any one of claims 1 to 6, further comprising, after obtaining the target vector representing the path semantics to be embedded output by the embedding model, the following steps:
and after the similarity between the target vector and the semantic vector of each path in the target map is determined to be lower than a similarity threshold value, adding the path to be embedded into the target map.
8. An apparatus for vectorized characterization of a path, comprising:
the encoding determination module is used for determining a first input encoding of a field dimension of a path to be embedded and a second input encoding of a character dimension of the path to be embedded; wherein the first input code is determined according to the input codes of all fields in the path to be embedded, and the fields in the path to be embedded comprise entity fields and/or relationship fields; the second input code is determined according to the input codes of all characters in the path to be embedded;
and the vector determination module is used for inputting the first input code and the second input code into an embedded model of a double-flow architecture to obtain a target vector which is output by the embedded model and represents the semantics of the path to be embedded.
9. A computing device, comprising at least one processor and at least one memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program executable by a computing device, wherein the program, when run on the computing device, causes the computing device to perform the method of any one of claims 1 to 7.
CN202110775745.5A 2021-07-09 2021-07-09 Path vectorization representation method and device, computing equipment and storage medium Pending CN113535976A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110775745.5A CN113535976A (en) 2021-07-09 2021-07-09 Path vectorization representation method and device, computing equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113535976A (en) 2021-10-22

Family

ID=78098113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110775745.5A Pending CN113535976A (en) 2021-07-09 2021-07-09 Path vectorization representation method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113535976A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110113408A1 (en) * 2009-11-06 2011-05-12 Microsoft Corporation Partial on-demand lazy semantic analysis
CN108629046A (en) * 2018-05-14 2018-10-09 平安科技(深圳)有限公司 A kind of fields match method and terminal device
CN112084331A (en) * 2020-08-27 2020-12-15 清华大学 Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN112395882A (en) * 2020-12-07 2021-02-23 震坤行网络技术(南京)有限公司 Method, electronic device and storage medium for named entity recognition
CN112836046A (en) * 2021-01-13 2021-05-25 哈尔滨工程大学 Four-risk one-gold-field policy and regulation text entity identification method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU Maogui, WANG Hongxing: "Embedding in Depth: Principle Analysis and Application Practice" (《深入浅出EMBEDDING原理解析与应用实践》), vol. 1, Beijing: China Machine Press, pages: 1 - 4 *

Similar Documents

Publication Publication Date Title
KR101778679B1 (en) Method and system for classifying data consisting of multiple attribues represented by sequences of text words or symbols using deep learning
CN113902926A (en) General image target detection method and device based on self-attention mechanism
CN108985934B (en) Block chain modification method and device
CN111160140B (en) Image detection method and device
CN111143578B (en) Method, device and processor for extracting event relationship based on neural network
CN113570030B (en) Data processing method, device, equipment and storage medium
CN104679818A (en) Video keyframe extracting method and video keyframe extracting system
CN109657228B (en) Sensitive text determining method and device
CN113139387A (en) Semantic error correction method, electronic device and storage medium
US20120045130A1 (en) Electronic device and method for matching images
CN116303459A (en) Method and system for processing data table
KR20220045424A (en) Method and apparatus of compressing artificial neural network
CN113362157A (en) Abnormal node identification method, model training method, device and storage medium
CN108363962B (en) Face detection method and system based on multi-level feature deep learning
CN115344805A (en) Material auditing method, computing equipment and storage medium
CN114490302B (en) Threat behavior analysis method based on big data analysis and server
KR102305575B1 (en) Method and system for highlighting similar areas using similarity between images
US10719982B2 (en) Surface extrction method, apparatus, and non-transitory computer readable storage medium thereof
CN110705622A (en) Decision-making method and system and electronic equipment
US11276249B2 (en) Method and system for video action classification by mixing 2D and 3D features
CN111966836A (en) Knowledge graph vector representation method and device, computer equipment and storage medium
CN113535976A (en) Path vectorization representation method and device, computing equipment and storage medium
US7917459B2 (en) System and method for executing complex IF-THEN clauses
CN115982310A (en) Link table generation method with verification function and electronic equipment
CN115526391A (en) Method, device and storage medium for predicting enterprise risk

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination