CN111159423A

CN111159423A - Entity association method, device and computer readable storage medium

Info

Publication number: CN111159423A
Application number: CN201911378790.6A
Authority: CN
Inventors: 袁婧; 牟小峰
Original assignee: Beijing Mininglamp Software System Co ltd
Current assignee: Beijing Mininglamp Software System Co ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-05-15
Anticipated expiration: 2039-12-27
Also published as: CN111159423B

Abstract

An entity association method, an entity association device and a computer-readable storage medium are disclosed. The method comprises the following steps: acquiring a feature vector of a text to be processed; constructing entity pairs according to the relations between the entities in the text to be processed; for each entity pair, acquiring the maximum similarity between the feature vectors of the non-generalization relation paths of the entity pair in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the first similarity between the entity pair and the text to be processed; for each entity in the text to be processed, acquiring the maximum similarity between the feature vectors of the relation paths of the entity in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the second similarity between the entity and the text to be processed; acquiring an association score between each entity and the knowledge graph according to the first similarity corresponding to the entity pairs containing the entity and the second similarity corresponding to the entity; and, when the association score exceeds a preset threshold, associating the entity to the knowledge graph node corresponding to the second similarity. Because the relations among multiple entities are used, the association success rate of the entities is improved.

Description

Entity association method, device and computer readable storage medium

Technical Field

The present disclosure relates to knowledge graph technology, and more particularly, to a method and apparatus for entity association and a computer readable storage medium.

Background

Entity association refers to associating entities mentioned in the text to corresponding nodes in the knowledge graph.

In the related art, a common entity association method computes the similarity between the context semantic vector of an entity in the text and a vector composed of the attributes and relations of a node in the knowledge graph, ranks the similarity scores, and associates the entity to the knowledge graph node if the score exceeds a threshold; otherwise, no association is made.

However, this method can only associate entities whose context description information is closely related to the attribute and relation information of the nodes in the knowledge graph. It cannot associate entities whose context description information is only weakly related to that information, so the association success rate of the entities in the text is low.

Disclosure of Invention

The application provides an entity association method, an entity association device and a computer readable storage medium, which can associate entities whose context description information has a low degree of correlation with the attribute and relation information of the nodes in the knowledge graph, thereby improving the association success rate of the entities.

The application provides an entity association method, which comprises the following steps:

acquiring a feature vector of a text to be processed; the text to be processed comprises a plurality of entities to be associated to nodes in the knowledge graph;

constructing a plurality of entity pairs according to the relationship between the entities in the text to be processed;

acquiring the maximum similarity between the feature vector of the non-generalization relation path and the feature vector of the text to be processed in the knowledge graph aiming at each entity pair, and taking the maximum similarity as the first similarity between the entity pair and the text to be processed; wherein, when there is no non-generalization relationship path between the entity pair, the first similarity is 0;

acquiring the maximum similarity between the feature vector of the relationship path and the feature vector of the text to be processed in the knowledge graph aiming at each entity in the text to be processed, and taking the maximum similarity as the second similarity between the entity and the text to be processed;

acquiring an association score of an entity and the knowledge graph according to a first similarity corresponding to the entity pair containing the entity in the text to be processed and a second similarity corresponding to the entity;

and when the association score exceeds a preset threshold value, associating the entity to a node in the knowledge graph corresponding to the second similarity.

The obtaining of the feature vector of the text to be processed includes:

performing word segmentation on the text to be processed, and performing stop-word removal and deduplication on the obtained segmented words;

sorting the processed segmented words by word frequency, and counting the number x of processed segmented words;

calculating the number y of word vectors to be acquired according to the number x of segmented words, the preset minimum number a of feature words, the preset maximum number b of feature words and the average length t of the text to be processed;

and acquiring the word vectors of the y words ranked highest by word frequency, and adding the word vectors to obtain the feature vector of the text to be processed.

The constructing a plurality of entity pairs according to the relationship between the entities in the text to be processed comprises the following steps:

acquiring, as a first entity pair, an entity pair in the text to be processed whose probability of appearing together in the same sample file among a plurality of sample files exceeds a preset probability;

acquiring, as a second entity pair, the entity pair formed by the two closest entities in each sentence of the text to be processed, excluding the first entity pair;

acquiring, as a third entity pair, the entity pair formed by the two closest entities in the whole text to be processed, excluding the first entity pair and the second entity pair;

and combining the first entity pair, the second entity pair and the third entity pair to obtain a plurality of entity pairs constructed according to the relation among the entities in the text to be processed.

The obtaining, for each entity pair, the maximum similarity between the feature vector of the non-generalization relationship path and the feature vector of the text to be processed in the knowledge graph as the first similarity between the entity pair and the text to be processed includes:

each entity pair is acquired in turn, and each time an entity pair is acquired, the following operations are performed on the acquired entity pairs:

acquiring the feature vectors of all the non-generalization relation paths of the obtained entity pair;

calculating the similarity between the feature vector of each non-generalization relation path and the feature vector of the text to be processed;

and taking the maximum similarity as the first similarity of the obtained entity pair and the text to be processed.

The obtaining of the feature vectors of all non-generalization relationship paths of the obtained entity pair includes:

judging whether a path relation exists between nodes corresponding to the obtained entity pairs in the knowledge graph;

when a path relation exists between the nodes corresponding to the obtained entity pair, obtaining the existing path relation, and judging whether the obtained path relation is a generalization path relation;

acquiring each non-generalization path relation, and performing the following operations on the acquired non-generalization path every time one non-generalization path relation is acquired:

acquiring word vectors of the node names, the node attributes and the entity relationships of the obtained non-generalization path relation;

and performing addition operation on the obtained word vectors to obtain the feature vectors of the non-generalization path relation.

The judging whether a path relation exists between nodes corresponding to the obtained entity pairs in the knowledge graph comprises the following steps:

sequentially judging, in the knowledge graph, whether a 1-degree, 2-degree or 3-degree relation exists between the nodes corresponding to the obtained entity pair;

and when any one of the three relations is determined to exist between the nodes corresponding to the obtained entity pair, stopping the subsequent judgment process, and determining that a path relation exists between the nodes corresponding to the obtained entity pair.

The acquiring, in the knowledge graph, the maximum similarity between the feature vector of the relationship path and the feature vector of the text to be processed for each entity in the text to be processed as the second similarity between the entity and the text to be processed includes:

acquiring each entity in the text to be processed, and performing the following operations on the acquired entities each time one entity is acquired:

acquiring nodes with the same names as the acquired entities from the knowledge graph, and acquiring a feature vector of each node;

calculating the similarity between the feature vector of each node and the feature vector of the text to be processed;

and taking the maximum similarity as the second similarity of the obtained entity and the text to be processed.

The obtaining of the feature vector of each node includes:

and performing the following operations on each obtained node:

acquiring word vectors of the name, the attributes and all entity relations of the obtained node;

and performing addition operation on the obtained vectors to obtain the feature vectors of the obtained nodes.

The obtaining of the association score between the entity and the knowledge graph according to the first similarity corresponding to the entity pair containing the entity in the text to be processed and the second similarity corresponding to the entity comprises:

when the first similarity is 0, taking the second similarity as an association score of the entity and the knowledge graph;

and when the first similarity is not 0, acquiring the association score of the entity and the knowledge graph according to the first weight, the second weight, the first similarity and the second similarity.

The obtaining of the association score between the entity and the knowledge graph according to the first weight, the second weight, the first similarity and the second similarity comprises:

calculating a value (the first weight × the first similarity + the second weight × the second similarity), and using the obtained value as the association score of the entity with the knowledge graph; wherein the first weight is greater than the second weight, and the sum of the first weight and the second weight is 1.

The present application further provides an entity association apparatus, including:

the first acquisition module is used for acquiring a feature vector of a text to be processed; the text to be processed comprises a plurality of entities to be associated to nodes in a knowledge graph;

the construction module is used for constructing a plurality of entity pairs according to the relationship between the entities in the text to be processed;

a second obtaining module, configured to obtain, for each entity pair in a knowledge graph, a maximum similarity between a feature vector of a non-generalization relationship path and a feature vector of the text to be processed, as a first similarity between the entity pair and the text to be processed; wherein, when there is no non-generalized relationship path between the entity pair, the first similarity is 0;

a third obtaining module, configured to obtain, for each entity in the to-be-processed text, a maximum similarity between a feature vector of a relationship path and a feature vector of the to-be-processed text in the knowledge graph, where the maximum similarity is used as a second similarity between the entity and the to-be-processed text;

a fourth obtaining module, configured to obtain an association score between an entity and the knowledge graph according to a first similarity corresponding to an entity pair including the entity in the text to be processed and a second similarity corresponding to the entity;

and the processing module is used for associating the entity to the node in the knowledge graph corresponding to the second similarity when the association score exceeds a preset threshold value.

The present application further provides an entity association apparatus, including: a processor and a memory, wherein the memory has written therein the following commands executable by the processor:

The present application further provides a computer-readable storage medium, wherein the storage medium has stored thereon computer-executable instructions for performing the following steps:

Compared with the related art, the method of the present application comprises the following steps: acquiring a feature vector of a text to be processed, the text to be processed comprising a plurality of entities to be associated to nodes in a knowledge graph; constructing a plurality of entity pairs according to the relations between the entities in the text to be processed; for each entity pair, acquiring the maximum similarity between the feature vectors of the non-generalization relation paths of the entity pair in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the first similarity between the entity pair and the text to be processed, the first similarity being 0 when no non-generalization relation path exists between the entity pair; for each entity in the text to be processed, acquiring the maximum similarity between the feature vectors of the relation paths of the entity in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the second similarity between the entity and the text to be processed; acquiring an association score between an entity and the knowledge graph according to the first similarity corresponding to the entity pairs containing the entity and the second similarity corresponding to the entity; and, when the association score exceeds a preset threshold, associating the entity to the node in the knowledge graph corresponding to the second similarity. Because the relations among the multiple entities in the text to be processed are used, entities whose context description information has a low degree of correlation with the attribute and relation information of the nodes in the knowledge graph can still be associated, and the association success rate of the entities is improved.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not limit the disclosure.

Fig. 1 is a schematic flowchart of an entity association method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a knowledge graph in the related art;

Fig. 3 is a schematic flowchart of another entity association method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an entity association apparatus according to an embodiment of the present application.

Detailed Description

The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.

The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.

Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.

Assume that the text to be processed is as follows: It is well known that traditional Asian cultural elements continue to influence major European and American brand designers, and the "pajama style" is one example. At the Maple Leaf autumn/winter 2013 show, this style was successfully integrated into fashion design and became hugely popular with women in both East and West, so that the "pajama style" swept the world. At the Maple Leaf autumn/winter 2013 pajama show, Zhang San was a guest in the audience, saw that season's pajama series in person and showed great fondness for it at the time. Later, at the opening ceremony of a Maple Leaf boutique in a certain city, Zhang San again made a striking appearance in an elegant look, wearing a gold silk-satin slip dress from the Maple Leaf pajama series with a blue-gray printed long coat. Li Si wore a Maple Leaf bathrobe, and the distinctive updo and exquisite styling added a sense of nobility to the nightwear.

Entity recognition is performed on the text to be processed, and two person names are identified: Zhang San and Li Si. These are the entities to be associated. As can be seen from the text, the context semantic vectors of the entities relate to Maple Leaf and pajamas. In the knowledge graph there is a relationship (Zhang San, spokesperson, Maple Leaf) that is similar to the context in the text, so computing the similarity of the semantic vectors yields a high score, and the entity Zhang San can be associated to the node Zhang San in the knowledge graph. For the entity Li Si, however, the context semantic vector likewise relates to Maple Leaf and pajamas, but the attribute and relationship information of the Li Si node in the knowledge graph mostly concerns film and television works and awards received, and is unrelated to Maple Leaf or pajamas. Whichever attribute or relationship is used to obtain the semantic vector, its similarity to the text vector is low, so the entity cannot be associated.

An embodiment of the present application provides an entity association method, as shown in fig. 1, including:

Step 101, acquiring a feature vector of a text to be processed; the text to be processed comprises a plurality of entities to be associated to nodes in the knowledge graph.

In an exemplary instance, a knowledge graph is composed of nodes (entities) and edges (entity relationships), the entities having descriptive information such as names, attributes, etc., and the entity relationships also having names and attributes, and having directions. A schematic representation of a knowledge graph can be seen in fig. 2.
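
As a minimal sketch of the structure just described, the nodes and directed edges could be held in plain Python data classes. The class, field and method names (`Node`, `Edge`, `KnowledgeGraph`, `nodes_named`) and the sample identifiers are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph: a name plus free-form attributes."""
    node_id: str
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed entity relationship with its own name and attributes."""
    source: str  # node_id of the start node
    target: str  # node_id of the end node
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # list of Edge

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        # Both endpoints must already be present in the graph.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("edge endpoint is not a known node")
        self.edges.append(edge)

    def nodes_named(self, name):
        """Return every node whose name equals `name` (names may repeat)."""
        return [n for n in self.nodes.values() if n.name == name]


# Hypothetical sample data mirroring the (person, spokesperson, brand) triple.
kg = KnowledgeGraph()
kg.add_node(Node("n1", "Zhang San"))
kg.add_node(Node("n2", "Maple Leaf"))
kg.add_edge(Edge("n1", "n2", "spokesperson"))
```

Keeping nodes keyed by an identifier rather than by name allows several nodes to share one name, which matters later when an entity in the text must be matched against all same-named nodes.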

In one illustrative example, obtaining a feature vector of a text to be processed includes:

First, word segmentation is performed on the text to be processed, and stop-word removal and deduplication are performed on the obtained segmented words.

Second, the processed segmented words are sorted by word frequency, and the number x of processed segmented words is counted.

Then, the number y of word vectors to be acquired is calculated according to the number x of segmented words, the preset minimum number a of feature words, the preset maximum number b of feature words and the average length t of the text to be processed.

Finally, the word vectors of the y words ranked highest by word frequency are obtained and added to obtain the feature vector of the text to be processed.

In one illustrative example, Chinese word vectors pre-trained with fastText on encyclopedia data may be used, with a word vector dimension of 300.

In one illustrative example, the feature vector of the text to be processed may be represented as textVec:

$$\mathrm{textVec} = \sum_{i=1}^{y} V_i$$

where $V_i$ represents the word vector of the i-th word.
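
The steps for building textVec can be sketched as follows. This is only an illustration under stated assumptions: the text is already segmented into `tokens`, `word_vectors` is a plain dictionary standing in for a pre-trained embedding table, and the count `y` is passed in by the caller because the rule that derives it from x, a, b and t is not reproduced here. The function name is hypothetical.

```python
from collections import Counter


def text_feature_vector(tokens, stop_words, word_vectors, y):
    """Sum the word vectors of the y most frequent non-stop-word tokens.

    tokens       -- segmented words of the text, in order
    stop_words   -- set of words to discard
    word_vectors -- dict mapping a word to a list of floats (assumed table)
    y            -- number of top-frequency words to use (supplied by caller)
    """
    # Stop-word removal; counting keeps the frequency of each distinct word.
    counts = Counter(t for t in tokens if t not in stop_words)
    # Distinct words by descending frequency; ties keep first-seen order.
    ranked = [w for w, _ in counts.most_common()]
    # Keep the top y words, skipping any that have no vector.
    top = [w for w in ranked[:y] if w in word_vectors]
    if not top:
        return []
    vec = [0.0] * len(word_vectors[top[0]])
    for w in top:
        for i, value in enumerate(word_vectors[w]):
            vec[i] += value
    return vec


# Toy 2-dimensional vectors, purely illustrative.
vectors = {"maple": [1.0, 0.0], "pajamas": [0.0, 1.0], "show": [1.0, 1.0]}
tokens = ["maple", "pajamas", "the", "maple", "show", "pajamas", "maple"]
text_vec = text_feature_vector(tokens, {"the"}, vectors, y=2)
```

Here `len(counts)` corresponds to the number x of processed segmented words, and the two most frequent words ("maple" and "pajamas") contribute to the sum.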

Step 102, constructing a plurality of entity pairs according to the relationship between the entities in the text to be processed.

In one illustrative example, entity recognition may be performed on the text to be processed using a CRF entity recognition model.

In one illustrative example, constructing pairs of entities based on relationships between entities in text to be processed includes:

First, an entity pair in the text to be processed whose probability of appearing together in the same sample file among a plurality of sample files exceeds the preset probability is obtained as a first entity pair.

Second, the entity pair formed by the two closest entities in each sentence of the text to be processed, excluding the first entity pair, is obtained as a second entity pair.

Then, the entity pair formed by the two closest entities in the whole text to be processed, excluding the first entity pair and the second entity pair, is obtained as a third entity pair.

Finally, the first entity pair, the second entity pair and the third entity pair are combined to obtain the plurality of entity pairs constructed according to the relations among the entities in the text to be processed.
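
The three-stage pair construction can be sketched as below. Several details are assumptions made for the sake of a runnable example: each mention carries a position offset in the whole text, "closest" is measured as the absolute difference of those offsets, and the co-occurrence probabilities are supplied as a precomputed dictionary. The function names are hypothetical.

```python
from itertools import combinations


def closest_pair(mentions, exclude):
    """Return the pair of distinct entities with the smallest position gap,
    ignoring pairs already in `exclude`; None if no pair qualifies."""
    best, best_gap = None, None
    for (e1, p1), (e2, p2) in combinations(mentions, 2):
        pair = frozenset((e1, e2))
        if e1 == e2 or pair in exclude:
            continue
        gap = abs(p1 - p2)
        if best_gap is None or gap < best_gap:
            best, best_gap = pair, gap
    return best


def build_entity_pairs(sentences, cooccur_prob, min_prob):
    """sentences    -- list of sentences, each a list of (entity, position)
       cooccur_prob -- dict: frozenset({e1, e2}) -> co-occurrence probability
       min_prob     -- preset probability threshold
    """
    all_mentions = [m for s in sentences for m in s]
    entities = sorted({e for e, _ in all_mentions})

    # First pairs: entities that co-occur in sample files often enough.
    first = {frozenset(p) for p in combinations(entities, 2)
             if cooccur_prob.get(frozenset(p), 0.0) > min_prob}

    # Second pairs: the closest remaining pair inside each sentence.
    second = set()
    for sentence in sentences:
        pair = closest_pair(sentence, first)
        if pair is not None:
            second.add(pair)

    # Third pair: the closest remaining pair over the whole text.
    third = set()
    pair = closest_pair(all_mentions, first | second)
    if pair is not None:
        third.add(pair)

    return first | second | third


sentences = [[("A", 0), ("B", 5)], [("C", 20), ("D", 24), ("A", 40)]]
pairs = build_entity_pairs(sentences, {frozenset(("A", "B")): 0.8}, 0.5)
```

In this toy input, (A, B) qualifies by co-occurrence, (C, D) is the closest pair in the second sentence, and (B, C) is the closest pair in the whole text once the first two are excluded.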

Step 103, for each entity pair, acquiring the maximum similarity between the feature vectors of the non-generalization relation paths of the entity pair in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the first similarity between the entity pair and the text to be processed; when there is no non-generalization relation path between the entity pair, the first similarity is 0.

In an exemplary instance, acquiring, for each entity pair, a maximum similarity between a feature vector of a non-generalization relation path and a feature vector of a text to be processed in a knowledge graph as a first similarity between the entity pair and the text to be processed includes:

firstly, acquiring feature vectors of all non-generalization relation paths of the obtained entity pair.

And secondly, calculating the similarity between the feature vector of each non-generalization relation path and the feature vector of the text to be processed.

And finally, taking the maximum similarity as the first similarity of the obtained entity pair and the text to be processed.

In one illustrative example, obtaining feature vectors for all non-generalized relationship paths of an obtained entity pair comprises:

first, it is determined in the knowledge-graph whether a path relationship exists between nodes corresponding to the obtained entity pairs.

Second, when a path relation exists between the nodes corresponding to the obtained entity pair, the existing path relation is obtained, and whether the obtained path relation is a generalization path relation is judged.

And finally, acquiring each non-generalization path relation, and performing the following operations on the acquired non-generalization path every time one non-generalization path relation is acquired:

acquiring word vectors of the node names, the node attributes and the entity relationships of the obtained non-generalization path relation; and adding the obtained word vectors to obtain the feature vector of the non-generalization path relation.
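
A possible shape for this step is sketched below. It assumes each path has already been flattened into a list of words covering its node names, node attributes and relation names, and that the caller supplies the test for what counts as a generalization path through the hypothetical `is_generalized` hook. Treating an "is a" relation as generalized in the sample data is purely an illustration.

```python
def sum_vectors(vectors):
    """Element-wise sum of equal-length vectors; [] for empty input."""
    vectors = list(vectors)
    if not vectors:
        return []
    return [sum(column) for column in zip(*vectors)]


def path_feature_vectors(paths, word_vectors, is_generalized):
    """Return one feature vector (relVec) per non-generalization path.

    paths          -- list of paths; each path is a list of words covering
                      its node names, node attributes and relation names
    word_vectors   -- dict mapping a word to a list of floats (assumed table)
    is_generalized -- caller-supplied predicate (hypothetical hook)
    """
    result = []
    for path in paths:
        if is_generalized(path):
            continue  # generalization paths do not contribute
        known = [word_vectors[w] for w in path if w in word_vectors]
        result.append(sum_vectors(known))
    return result


# Toy 2-dimensional vectors, purely illustrative.
vectors = {
    "Zhang San": [1.0, 0.0],
    "spokesperson": [0.0, 1.0],
    "Maple Leaf": [1.0, 1.0],
    "person": [0.5, 0.5],
}
paths = [
    ["Zhang San", "spokesperson", "Maple Leaf"],
    ["Zhang San", "is a", "person"],
]
rel_vecs = path_feature_vectors(paths, vectors, lambda p: "is a" in p)
```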

In an illustrative example, the feature vector of a non-generalization path relation may be represented as relVec, and the vector similarity between textVec and each relVec is calculated as:

$$\mathrm{sim} = \frac{\mathrm{textVec} \cdot \mathrm{relVec}}{\|\mathrm{textVec}\| \, \|\mathrm{relVec}\|}$$

where ||x|| represents the norm of the vector x. The maximum value of sim is denoted pathSim (i.e., the first similarity in the above embodiment).
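
The similarity and its maximum can be sketched as follows, assuming the measure is cosine similarity, which the reference to the vector norm suggests. The function names are illustrative, and returning 0 for a zero-norm vector is a defensive choice of this sketch rather than something the text specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def path_sim(text_vec, rel_vecs):
    """First similarity: the maximum similarity over all path vectors,
    or 0.0 when the entity pair has no non-generalization path."""
    return max((cosine_similarity(text_vec, r) for r in rel_vecs), default=0.0)


best = path_sim([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

Passing an empty list of path vectors yields 0, matching the rule that the first similarity is 0 when no non-generalization relation path exists.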

In one illustrative example, determining whether a path relationship exists between nodes corresponding to the obtained entity pair in the knowledge-graph comprises:

First, whether a 1-degree, 2-degree or 3-degree relation exists between the nodes corresponding to the obtained entity pair is judged sequentially in the knowledge graph.

Second, when any one of the three relations is determined to exist between the nodes corresponding to the obtained entity pair, the subsequent judgment process is stopped, and it is determined that a path relation exists between the nodes corresponding to the obtained entity pair.

In an exemplary embodiment, if a 1-degree relation exists between the nodes corresponding to the obtained entity pair, the subsequent judgment process is stopped and it is determined that a path relation exists between those nodes; if no 1-degree relation exists but a 2-degree relation does, the subsequent judgment process is stopped and it is determined that a path relation exists between those nodes; if neither a 1-degree nor a 2-degree relation exists but a 3-degree relation does, it is determined that a path relation exists between those nodes; and if no 1-degree, 2-degree or 3-degree relation exists between the nodes corresponding to the obtained entity pair, it is determined that no path relation exists between them.
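
One way to realize the sequential 1-degree, 2-degree, 3-degree check is a breadth-first search that stops at the first degree where the second node is reached. The sketch below assumes the graph is available as an adjacency dictionary; whether edges are followed in one direction or both depends on how that dictionary is built. The function name is hypothetical.

```python
def relation_degree(adjacency, start, goal, max_degree=3):
    """Return the smallest degree (1..max_degree) at which `goal` is
    reachable from `start`, or None if no such path exists.

    adjacency -- dict mapping a node to an iterable of neighbouring nodes
    """
    frontier = {start}
    visited = {start}
    for degree in range(1, max_degree + 1):
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour == goal:
                    return degree  # stop at the first degree that matches
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return None


# A simple chain A - B - C - D - E stored with edges in both directions.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
degree_ad = relation_degree(adjacency, "A", "D")
```

A return value of `None` corresponds to the case where no path relation exists between the nodes, so the first similarity for that pair is 0.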

Step 104, for each entity in the text to be processed, acquiring the maximum similarity between the feature vectors of the relation paths of the entity in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the second similarity between the entity and the text to be processed.

In an exemplary instance, acquiring, in the knowledge graph, a maximum similarity between the feature vector of the relationship path and the feature vector of the text to be processed for each entity in the text to be processed as a second similarity between the entity and the text to be processed includes:

first, the nodes with the same name as the obtained entity are obtained in the knowledge graph, and the feature vector of each node is obtained.

And secondly, calculating the similarity between the feature vector of each node and the feature vector of the text to be processed.

And finally, taking the maximum similarity as the second similarity of the obtained entity and the text to be processed.

In one illustrative example, obtaining a feature vector for each node comprises:

and performing the following operations on each obtained node:

First, word vectors of the name, the attributes and all entity relations of the obtained node are acquired.

And secondly, adding the obtained vectors to obtain the feature vectors of the obtained nodes.

In an illustrative example, the feature vector of a node may be represented as relVec, and the maximum similarity of textVec and relVec may be represented as relSim.
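
The lookup of same-named nodes and the choice of the best one can be sketched as follows. The node representation (a dictionary with a `name` and a flat list of `words` covering the name, attributes and relation names), the use of cosine similarity and the function names are assumptions of this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def best_matching_node(entity_name, text_vec, nodes, word_vectors):
    """Return (node_id, relSim) for the same-named node most similar to
    the text, or (None, 0.0) if no same-named node has a usable vector.

    nodes -- dict: node_id -> {"name": str, "words": [str, ...]}
    """
    best_id, best_sim = None, 0.0
    for node_id, node in nodes.items():
        if node["name"] != entity_name:
            continue
        known = [word_vectors[w] for w in node["words"] if w in word_vectors]
        if not known:
            continue
        rel_vec = [sum(column) for column in zip(*known)]  # node feature vector
        sim = cosine(text_vec, rel_vec)
        if best_id is None or sim > best_sim:
            best_id, best_sim = node_id, sim
    return best_id, best_sim


# Toy data: two nodes share a name but describe different things.
vectors = {"film": [0.0, 1.0], "award": [0.0, 1.0], "pajamas": [1.0, 0.0]}
nodes = {
    "a": {"name": "Li Si", "words": ["Li Si", "film", "award"]},
    "b": {"name": "Li Si", "words": ["Li Si", "pajamas"]},
}
match = best_matching_node("Li Si", [1.0, 0.0], nodes, vectors)
```

Returning the node identifier together with the similarity keeps track of which node the second similarity belongs to, since that is the node the entity is later associated to.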

Step 105, acquiring the association score of the entity and the knowledge graph according to the first similarity corresponding to the entity pair containing the entity in the text to be processed and the second similarity corresponding to the entity.

In an exemplary instance, obtaining an association score between an entity and a knowledge graph according to a first similarity corresponding to an entity pair containing the entity in the text to be processed and a second similarity corresponding to the entity comprises:

first, when the first similarity is 0, the second similarity is used as the association score of the entity and the knowledge graph.

And secondly, when the first similarity is not 0, acquiring the association score of the entity and the knowledge graph according to the first weight, the second weight, the first similarity and the second similarity.

In one illustrative example, obtaining an association score of an entity with a knowledge-graph based on a first weight, a second weight, a first similarity, and a second similarity comprises:

calculating a value (first weight × first similarity + second weight × second similarity), and taking the obtained value as the association score of the entity and the knowledge graph; wherein the first weight is greater than the second weight, and the sum of the first weight and the second weight is 1.
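
The scoring rule can be written as a short function. The default weights 0.6 and 0.4 are illustrative assumptions that merely satisfy the stated constraints (first weight greater than second, sum equal to 1); the text does not give concrete values.

```python
def association_score(path_sim, rel_sim, w_path=0.6, w_rel=0.4):
    """Combine the first similarity (path_sim) and second similarity (rel_sim).

    w_path, w_rel -- first and second weights; the defaults are assumptions.
    """
    if not (w_path > w_rel and abs(w_path + w_rel - 1.0) < 1e-9):
        raise ValueError("weights must satisfy w_path > w_rel and sum to 1")
    if path_sim == 0:
        # No usable pair information: fall back to the second similarity.
        return rel_sim
    return w_path * path_sim + w_rel * rel_sim


score = association_score(0.9, 0.2)  # 0.6 * 0.9 + 0.4 * 0.2
```

With these sample values, an entity whose own node similarity is only 0.2 still reaches a score of about 0.62 because the pair similarity carries the larger weight.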

Step 106, when the association score exceeds a preset threshold value, associating the entity to the node in the knowledge graph corresponding to the second similarity.

According to the entity association method provided by the embodiment of the application, because the relations among the multiple entities in the text to be processed are used, entities whose context description information has a low degree of correlation with the attribute and relation information of the nodes in the knowledge graph can still be associated, and the association success rate of the entities is improved.

An embodiment of the present application further provides an entity association method, as shown in Fig. 3, including:

Step 201, obtaining a text to be processed.

Step 202, identify entities in the text to be processed.

Step 203, searching the knowledge-graph for the entity.

Step 204, when an entity is found, extracting features: segmenting the text into words, calculating word frequencies, sorting the words by word frequency, determining the number of words to use, acquiring the word vectors of those words, and adding them to form the text feature vector textVec.

Step 205, if a plurality of entities exist in the text, forming entity pairs from the entities, and searching the knowledge graph in turn for a relation between each pair within 1 degree, 2 degrees and 3 degrees, exiting the search once a relation is found.

Step 206, if there is only one entity in the text, go to step 211.

Step 207, if a relation exists, taking out the node names and attributes along each relation of the entity pair, segmenting them into words, and adding the obtained word vectors to form relVec.

Step 208, if no relation exists, jump to step 211.

Step 209, calculating the similarity of two vectors, namely textVec and relVec, for each relation.

Step 210, recording the maximum similarity as pathSim.

Step 211, for each entity, taking its corresponding node in the knowledge graph, the relations of that node, the nodes connected by those relations, and their attributes, and forming relVec from them.

Step 212, calculating the similarity between the two vectors textVec and relVec for each relation, and recording the maximum similarity as relSim.

Step 213, setting weights for pathSim and relSim, calculating a score, and determining whether the association threshold is reached; if so, performing the association.
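Steps 209 to 213 can be sketched as follows. This is a minimal illustration under assumptions: the text does not name the similarity measure, so cosine similarity is used here; the function names, the weight of 0.6 for pathSim and the threshold of 0.7 are hypothetical. When step 206 or step 208 skips the path comparison, the list of path vectors is empty and pathSim is treated as 0.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_similarity(text_vec, rel_vecs):
    """Largest similarity between textVec and any relVec; 0 if none exist."""
    return max((cosine(text_vec, rel_vec) for rel_vec in rel_vecs), default=0.0)


def decide_association(text_vec, path_rel_vecs, entity_rel_vecs,
                       path_weight=0.6, threshold=0.7):
    """Return (score, associate) for one entity of the text."""
    path_sim = max_similarity(text_vec, path_rel_vecs)   # steps 209-210
    rel_sim = max_similarity(text_vec, entity_rel_vecs)  # steps 211-212
    if path_sim == 0:
        # Single entity or no relation found: only relSim is available.
        score = rel_sim
    else:
        score = path_weight * path_sim + (1.0 - path_weight) * rel_sim
    # Step 213 speaks of the threshold being "reached", hence >= here.
    return score, score >= threshold
```

For example, `decide_association([1, 0], [[1, 0], [0, 1]], [[1, 1]])` compares the text vector against two path vectors and one entity vector, then applies the weighted score and the threshold.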

An embodiment of the present application further provides an entity association apparatus, as shown in fig. 4, where the entity association apparatus 3 includes:

a first obtaining module 31, configured to obtain a feature vector of a text to be processed; the text to be processed comprises a plurality of entities to be associated to nodes in the knowledge graph.

A constructing module 32, configured to construct a plurality of entity pairs according to the relationships between the entities in the text to be processed.

A second obtaining module 33, configured to obtain, for each entity pair, a maximum similarity between a feature vector of the non-generalization relationship path and a feature vector of the text to be processed in the knowledge graph, as a first similarity between the entity pair and the text to be processed; wherein, when there is no non-generalization relationship path between the entity pair, the first similarity is 0.

The third obtaining module 34 is configured to obtain, in the knowledge graph, a maximum similarity between the feature vector of the relationship path and the feature vector of the text to be processed as a second similarity between the entity and the text to be processed, for each entity in the text to be processed.

The fourth obtaining module 35 is configured to obtain an association score between an entity and a knowledge graph according to the first similarity corresponding to the entity pair including the entity in the text to be processed and the second similarity corresponding to the entity.

A processing module 36, configured to associate the entity with the node in the knowledge graph corresponding to the second similarity when the association score exceeds a preset threshold.

In an exemplary embodiment, the first obtaining module 31 is specifically configured to:

Perform word segmentation on the text to be processed, and remove stop words and duplicates from the obtained words.

Sort the processed words by word frequency, and count the number x of processed words.

Calculate the number y of word vectors to acquire according to the obtained number of words x, the preset minimum number of feature words a, the preset maximum number of feature words b and the average length t of the text to be processed.

Acquire the word vectors of the words ranked in the top y by word frequency, and add the word vectors to obtain the feature vector of the text to be processed.
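The feature-vector construction above can be sketched as follows. This is an illustration under assumptions: the text is taken as already segmented into `tokens`, `word_vectors` is a hypothetical dictionary mapping a word to its vector, and y is passed in directly because the rule that derives it from x, a, b and t is not given in this passage. Words without a vector are skipped, which is also an assumption.

```python
from collections import Counter


def text_feature_vector(tokens, word_vectors, stop_words, y):
    """Sum the word vectors of the y most frequent non-stop words."""
    # Counting frequencies also removes duplicates: each distinct word
    # appears once in the counter, together with its frequency.
    counts = Counter(t for t in tokens if t not in stop_words)

    # Rank the distinct words by descending frequency and keep the top y.
    ranked = [word for word, _ in counts.most_common()]
    selected = [w for w in ranked[:y] if w in word_vectors]
    if not selected:
        return []

    # Element-wise addition of the selected word vectors.
    dim = len(word_vectors[selected[0]])
    feature = [0.0] * dim
    for word in selected:
        for i, value in enumerate(word_vectors[word]):
            feature[i] += value
    return feature
```

Here `len(counts)` corresponds to the number x of processed words, which a caller could use when choosing y.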

in one illustrative example, the construction module 32 is specifically configured to:

Acquire, as a first entity pair, each entity pair in the text to be processed whose probability of appearing in the same sample file among a plurality of sample files exceeds a preset probability.

Acquire, as a second entity pair, the entity pair formed by the two closest entities in each sentence of the text to be processed, excluding the first entity pairs.

Acquire, as a third entity pair, the entity pair formed by the two closest entities in the text to be processed, excluding the first entity pairs and the second entity pairs.

Combine the first entity pairs, the second entity pairs and the third entity pairs to obtain the plurality of entity pairs constructed according to the relationships between the entities in the text to be processed.

In an exemplary embodiment, the second obtaining module 33 is specifically configured to:

Acquire the feature vectors of all non-generalization relation paths of the obtained entity pair.

Calculate the similarity between the feature vector of each non-generalization relation path and the feature vector of the text to be processed.

Judge whether a path relation exists between the nodes corresponding to the obtained entity pair in the knowledge graph.

When a path relation exists between the nodes corresponding to the obtained entity pair, obtain the existing path relations, and judge whether each obtained path relation is a generalization path relation.

Acquire the word vectors of the node names, node attributes and entity relations of the obtained non-generalization path relations.

Judge in turn whether a relation of 1 degree, 2 degrees or 3 degrees exists between the nodes corresponding to the obtained entity pair in the knowledge graph.
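The degree-by-degree path check can be sketched as follows. This is an illustration under assumptions: the knowledge graph is represented as a hypothetical adjacency dictionary mapping a node to a list of `(relation, neighbor)` pairs, edges are treated as directed, and the set of generalization relations is supplied by the caller because this passage does not define it.

```python
def paths_with_degree(graph, source, target, degree):
    """Relation sequences of every simple path with exactly `degree` edges."""
    results = []

    def walk(node, visited, relations):
        if len(relations) == degree:
            if node == target:
                results.append(list(relations))
            return
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                walk(neighbor, visited | {neighbor}, relations + [relation])

    walk(source, {source}, [])
    return results


def nearest_paths(graph, source, target, max_degree=3):
    """Try 1, 2, then 3 degrees and stop at the first degree with a path."""
    for degree in range(1, max_degree + 1):
        found = paths_with_degree(graph, source, target, degree)
        if found:
            return degree, found
    return None, []


def non_generalization(paths, generalized_relations):
    """Drop paths that use any relation marked as a generalization."""
    return [p for p in paths
            if not any(r in generalized_relations for r in p)]
```

Stopping at the first degree that yields a path keeps the search shallow, so distant and usually weaker connections are only considered when no closer one exists.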

In an exemplary embodiment, the third obtaining module 34 is specifically configured to:

Acquire the nodes in the knowledge graph that have the same name as the obtained entity, and acquire a feature vector of each node.

Calculate the similarity between the feature vector of each node and the feature vector of the text to be processed.

In an exemplary example, the third obtaining module 34 is further specifically configured to:

Perform the following operations on each obtained node:

Acquire the word vectors of the name, the attributes and all entity relations of the node.
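The second similarity can be sketched as follows. This is an illustration under assumptions: each candidate node is a hypothetical dictionary with `name`, `attributes` and `relations` lists of words, the node feature vector is taken to be the sum of the available word vectors, and cosine similarity is used because the text does not name the measure. The best node is returned together with its similarity, since the entity is later associated to the node corresponding to the second similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def node_feature_vector(node, word_vectors, dim):
    """Add the word vectors of the node name, attributes and relations."""
    words = [node["name"], *node["attributes"], *node["relations"]]
    feature = [0.0] * dim
    for word in words:
        # Words without a vector are skipped (an assumption).
        for i, value in enumerate(word_vectors.get(word, [])):
            feature[i] += value
    return feature


def second_similarity(text_vec, candidate_nodes, word_vectors):
    """Return (best_node, similarity) over nodes sharing the entity name."""
    best_node, best_sim = None, 0.0
    for node in candidate_nodes:
        vec = node_feature_vector(node, word_vectors, len(text_vec))
        sim = cosine(text_vec, vec)
        if best_node is None or sim > best_sim:
            best_node, best_sim = node, sim
    return best_node, best_sim
```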

In an exemplary embodiment, the fourth obtaining module 35 is specifically configured to:

When the first similarity is 0, take the second similarity as the association score between the entity and the knowledge graph.

In an exemplary example, the fourth obtaining module 35 is further configured to calculate the value of (first weight × first similarity + second weight × second similarity), and use the obtained value as the association score between the entity and the knowledge graph; wherein the first weight is greater than the second weight, and the sum of the first weight and the second weight is 1.

According to the entity association device provided by the embodiment of the application, because the relationships among the multiple entities in the text to be processed are utilized, entities whose context description information has a low degree of correlation with the attribute and relationship information of the nodes in the knowledge graph can still be associated, and the association success rate of the entities is improved.

In practical applications, the first obtaining module 31, the constructing module 32, the second obtaining module 33, the third obtaining module 34, the fourth obtaining module 35, and the processing module 36 may all be implemented by a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like, located in the entity association apparatus 3.

An embodiment of the present application further provides an entity association apparatus, including: a processor and a memory, wherein the memory has stored therein a computer program which, when executed by the processor, implements the processing of any of the methods described above.

Embodiments of the present application further provide a storage medium having stored thereon computer-executable instructions for performing the processes of any of the above-described methods.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims

1. An entity association method, comprising:

acquiring a feature vector of a text to be processed; the text to be processed comprises a plurality of entities to be associated to nodes in a knowledge graph;

acquiring, for each entity pair, the maximum similarity between the feature vector of the non-generalization relation path in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the first similarity between the entity pair and the text to be processed; when no non-generalization relation path exists between the entity pair, the first similarity is 0;

acquiring, for each entity in the text to be processed, the maximum similarity between the feature vector of the relation path in the knowledge graph and the feature vector of the text to be processed, and taking the maximum similarity as the second similarity between the entity and the text to be processed;

2. The method according to claim 1, wherein the obtaining the feature vector of the text to be processed comprises:

3. The method of claim 2, wherein the step of generating the second signal comprises generating a second signal based on the first signal and the second signal

4. The method of claim 1, wherein constructing a number of entity pairs according to relationships between entities in the text to be processed comprises:

5. The method according to claim 1, wherein the obtaining, for each entity pair, a maximum similarity between the feature vector of the non-generalization relation path and the feature vector of the text to be processed in the knowledge graph as a first similarity between the entity pair and the text to be processed comprises:

6. The method of claim 5, wherein obtaining the feature vectors of all non-generalization relationship paths of the obtained entity pair comprises:

7. The method of claim 6, wherein determining whether a path relationship exists between nodes corresponding to the obtained entity pairs in the knowledge-graph comprises:

8. The method according to claim 1, wherein the obtaining, in the knowledge graph, the maximum similarity between the feature vector of the relationship path and the feature vector of the text to be processed for each entity in the text to be processed as the second similarity between the entity and the text to be processed comprises:

9. The method of claim 8, wherein obtaining the feature vector of each node comprises:

and performing the following operations on each obtained node:

10. The method of claim 1, wherein obtaining the association score between the entity and the knowledge graph according to a first similarity corresponding to the entity pair containing the entity in the text to be processed and a second similarity corresponding to the entity comprises:

11. The method of claim 10, wherein obtaining the association score of the entity with the knowledge-graph according to the first weight, the second weight, the first similarity and the second similarity comprises:

12. An entity association apparatus, comprising:

a first obtaining module, configured to obtain a feature vector of a text to be processed; the text to be processed comprises a plurality of entities to be associated to nodes in a knowledge graph;

a second obtaining module, configured to obtain, for each entity pair in a knowledge graph, a maximum similarity between a feature vector of a non-generalization relationship path and a feature vector of the text to be processed, as a first similarity between the entity pair and the text to be processed; when no non-generalization relation path exists between the entity pair, the first similarity is 0;

a third obtaining module, configured to obtain, in the knowledge graph, a maximum similarity between a feature vector of a relationship path and a feature vector of the text to be processed for each entity in the text to be processed, where the maximum similarity is used as a second similarity between the entity and the text to be processed;

13. An entity association apparatus, comprising: a processor and a memory, wherein the memory has written therein the following commands executable by the processor:

14. A computer-readable storage medium having computer-executable instructions stored thereon for performing the steps of: