CN112836513A - Linking method, device and equipment of named entities and readable storage medium - Google Patents

Info

Publication number
CN112836513A
Authority
CN
China
Prior art keywords
entity
word
vector
linked
named
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110194227.4A
Other languages
Chinese (zh)
Inventor
付红雷
赵士郡
罗征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glodon Co Ltd
Original Assignee
Glodon Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glodon Co Ltd
Priority to CN202110194227.4A
Publication of CN112836513A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The invention relates to the technical field of natural language identification, and discloses a method, an apparatus, a device and a readable storage medium for linking named entities. The method comprises the following steps: acquiring a word set corresponding to a named entity to be linked; determining a first entity vector corresponding to the named entity to be linked based on that word set; calculating the similarity between the first entity vector and a second entity vector corresponding to each candidate entity; and determining a target link entity corresponding to the named entity to be linked based on the similarity. The method retains semantic information without requiring any precondition to be set, which solves the problem that preconditions and semantic information are difficult to take into account simultaneously and helps ensure the linking accuracy of named entities.

Description

Linking method, device and equipment of named entities and readable storage medium
Technical Field
The invention relates to the technical field of natural language identification, in particular to a method, a device and equipment for linking named entities and a readable storage medium.
Background
Named entity recognition typically comprises two sub-functions: entity recognition and entity linking. Entity recognition identifies information fragments in natural language that may represent entities, and entity linking maps those fragments to standard entities in an entity library. Named entity recognition is widely used in systems such as knowledge graphs and human-machine dialogue, so mapping candidate fragments to standard entities in the entity library is important. Currently, entity linking typically finds the standard entity in one of three ways: by the attribute similarity of entities in a knowledge graph, by entity vectors trained on labeled data, or by the longest common subsequence or edit distance. However, the first approach requires the knowledge graph to be constructed in advance; the second requires a large amount of labeled data for model training; and the third, although it needs no preconditions, loses semantic information. Existing methods for linking named entities therefore struggle to avoid preconditions and retain semantic information at the same time, and as a result their linking accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, a device, and a readable storage medium for linking named entities, so as to solve the problem that the linking accuracy of named entities is low because preconditions and semantic information are difficult to take into account at the same time.
According to a first aspect, an embodiment of the present invention provides a method for linking named entities, including the following steps: acquiring a word set corresponding to a named entity to be linked; determining a first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked; calculating the similarity of the first entity vector and a second entity vector corresponding to each candidate entity; and determining a target link entity corresponding to the named entity to be linked based on the similarity.
In the linking method of named entities provided by the embodiment of the present invention, a word set corresponding to the named entity to be linked is acquired, a first entity vector corresponding to the named entity to be linked is determined based on that word set, the similarity between the first entity vector and the second entity vector corresponding to each candidate entity is calculated, and the target link entity corresponding to the named entity to be linked is determined based on the calculated similarity. The target link entity is determined from the similarity between the first entity vector and the second entity vectors, and the entity vector calculated for a named entity retains the semantic information of that named entity without any precondition having to be set. The problem that preconditions and semantic information are difficult to take into account simultaneously is thereby solved, and the linking accuracy of named entities is ensured.
With reference to the first aspect, in a first implementation manner of the first aspect, the determining, based on the word set corresponding to the named entity to be linked, that the named entity to be linked corresponds to a first entity vector includes: acquiring a word weight dictionary and a word vector dictionary; traversing the word weight dictionary and the word vector dictionary, and determining the word weight value and the word vector of the word corresponding to the named entity to be linked; and calculating to obtain a first entity vector of the named entity to be linked based on the word weight value and the word vector of each word.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the calculating a first entity vector of the named entities to be linked based on the word weight value of each word and the word vector includes: judging whether the word corresponding to the named entity to be linked is in the word weight dictionary; when the word corresponding to the named entity to be linked is in the word weight dictionary, determining a word weight value of the word corresponding to the named entity to be linked from the word weight dictionary, and determining a word vector of the word corresponding to the named entity to be linked from the word vector dictionary; calculating the product of the word weight value and the word vector corresponding to the named entity to be linked; and adding the products of the weight values corresponding to the words and the word vectors, and taking the addition result as the first entity vector.
With reference to the first implementation manner of the first aspect, in a third implementation manner of the first aspect, the obtaining a word weight dictionary and a word vector dictionary includes: calculating the weight of each word in the word set of the standard entity library corresponding to the named entity to be linked to obtain the word weight dictionary; and calculating the vector of each word in the word set of the standard entity library corresponding to the named entity to be linked, and generating the word vector dictionary.
According to the linking method of the named entities, the first entity vector of the named entities to be linked is calculated in a mode of multiplying, accumulating and summing the word weight values of the words corresponding to the named entities to be linked and the word vectors, so that not only is the semantic information of the named entities to be linked retained, but also the importance degree of the words corresponding to the named entities to be linked is fully considered, and therefore the linking accuracy of the named entities to be linked is improved.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the calculating a similarity between the first entity vector and a second entity vector corresponding to each candidate entity includes: traversing each word in the word set corresponding to the named entity to be linked, and determining at least one standard entity set containing the words corresponding to the named entity to be linked; calculating a union set of at least one standard entity set to obtain a candidate entity set containing words corresponding to the named entities to be linked, and determining a second entity vector of each candidate entity in the candidate entity set; and calculating the similarity of the first entity vector and each second entity vector.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the calculating a similarity between the first entity vector and each of the second entity vectors includes: calculating an inner product of the first entity vector and the second entity vector and a product of the moduli of the first entity vector and the second entity vector; determining cosine similarity of the first entity vector and the second entity vector based on a product of the inner product and the modulus.
With reference to the fourth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the calculating a similarity between the first entity vector and each of the second entity vectors includes: and calculating the Euclidean distance between the first entity vector and the second entity vector, and representing the similarity by the Euclidean distance.
According to the linking method of the named entities, the candidate entity set containing the words corresponding to the named entities to be linked is determined by traversing each word in the word set corresponding to the named entities to be linked, the target linked entity corresponding to the named entities to be linked is determined from the candidate entity set by calculating the similarity between the first entity vector and the second entity vector, and the linking speed of the named entities to be linked is improved while the accurate linking of the target linked entity and the named entities to be linked is ensured.
With reference to the first aspect, in a seventh implementation manner of the first aspect, the determining, based on the similarity, a target link entity corresponding to the named entity to be linked includes: judging whether the similarity is greater than a preset threshold value or not; and when the similarity is greater than the preset threshold, determining the target link entity from the candidate entities with the similarity greater than the preset threshold.
In the linking method of named entities provided by the embodiment of the present invention, it is judged whether the similarity is greater than the preset threshold, and when it is, a candidate entity whose similarity is greater than the preset threshold is taken as the target link entity. This ensures that the entity to be linked has a counterpart in the candidate entity set and that the named entity is accurately linked to a candidate entity.
According to a second aspect, an embodiment of the present invention provides a linking apparatus for named entities, including: an acquisition module, configured to acquire a word set corresponding to the named entity to be linked; a first determining module, configured to determine a first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked; a calculation module, configured to calculate the similarity between the first entity vector and a second entity vector corresponding to each candidate entity; and a second determining module, configured to determine the target link entity corresponding to the named entity to be linked based on the similarity.
The linking apparatus for named entities provided in the embodiment of the present invention acquires a word set corresponding to the named entity to be linked, determines a first entity vector corresponding to the named entity to be linked based on that word set, calculates the similarity between the first entity vector and the second entity vector corresponding to each candidate entity, and determines the target link entity corresponding to the named entity to be linked based on the calculated similarity. The target link entity is determined from the similarity between the first entity vector and the second entity vectors, and the entity vector calculated for a named entity retains the semantic information of that named entity without any precondition having to be set. The problem that preconditions and semantic information are difficult to take into account simultaneously is thereby solved, and the linking accuracy of named entities is ensured.
According to a third aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, and the processor executing the computer instructions to perform the method for linking named entities according to the first aspect or any embodiment of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to cause a computer to execute the method for linking named entities according to the first aspect or any embodiment of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a method of linking named entities according to an embodiment of the invention;
FIG. 2 is another flow diagram of a method of linking named entities according to an embodiment of the invention;
FIG. 3 is another flow diagram of a method of linking named entities according to an embodiment of the invention;
FIG. 4 is a block diagram of a linking apparatus for named entities according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The existing named entity linking method generally uses the attribute similarity of entities in a knowledge graph to search for standard entities, and the knowledge graph needs to be constructed in advance; or training entity vectors on the marked data to search for standard entities, and carrying out model training by using a large amount of marked data; or finding the standard entity by the longest common subsequence or edit distance, while no preconditions are needed, semantic information is lost. Therefore, the existing linking method for the named entities is difficult to simultaneously take preconditions and semantic information into consideration, so that the linking accuracy of the named entities is low.
On this basis, the technical solution of the present invention calculates a first entity vector corresponding to the named entity to be linked, calculates the similarity between the first entity vector and the second entity vector corresponding to each candidate entity, and determines the target link entity corresponding to the named entity to be linked from the candidate entity set. The semantic information of the named entity is retained and no precondition needs to be set, so the problem that preconditions and semantic information are difficult to take into account simultaneously is solved, and the linking accuracy of named entities is ensured.
In accordance with an embodiment of the present invention, an embodiment of a method for linking named entities is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
In this embodiment, a method for linking named entities is provided, which can be used in electronic devices such as a mobile phone, a tablet computer, or a computer. FIG. 1 is a flowchart of a method for linking named entities according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
S11, acquiring a word set corresponding to the named entity to be linked.
The named entity to be linked is a named entity for which the corresponding target link entity needs to be found in the standard entity library. The word set corresponding to the named entity to be linked is the set formed by de-duplicating the words in the named entity to be linked. The electronic device identifies each word contained in the named entity to be linked, removes the duplicates, and puts the remaining words into a set. For example, if the named entity to be linked is "stainless steel plate", its word set is "no, rust, steel, plate" (each word here corresponds to one character of the original Chinese term). If the named entity to be linked is "black and non-ferrous metal", identifying it yields the words "black, color, presence, color, gold, metal", and after de-duplication its word set is "black, color, presence, gold, metal".
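The de-duplication in step S11 can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `word_set` is hypothetical, and it assumes that each "word" is a single character of the entity name, which is how the examples above segment the text. It returns an ordered list rather than an unordered set so that the result is reproducible.

```python
from typing import List


def word_set(entity_name: str) -> List[str]:
    """Return the distinct characters of an entity name in first-seen order.

    Assumption: one character is treated as one word, and whitespace is ignored.
    """
    seen = set()
    words = []
    for ch in entity_name:
        if ch.isspace():
            continue  # whitespace carries no entity information in this sketch
        if ch not in seen:
            seen.add(ch)
            words.append(ch)
    return words
```

A name that repeats a character, as "color" repeats in the second example above, therefore contributes that character only once.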
S12, determining the first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked.
According to the word set corresponding to the named entity to be linked, the word weight and the word vector of each word contained in the named entity to be linked are obtained, the product of the word weight and the word vector of each word is calculated, and the products are accumulated to obtain the first entity vector corresponding to the named entity to be linked. Specifically, if the word set corresponding to the named entity to be linked is "no, rust, steel, plate", the corresponding word weights are p1, p2, p3 and p4, and the corresponding word vectors are q1, q2, q3 and q4, then the products of word weight and word vector are p1 × q1, p2 × q2, p3 × q3 and p4 × q4, and the first entity vector P is calculated as: P = p1 × q1 + p2 × q2 + p3 × q3 + p4 × q4. Of course, the first entity vector may also be built from other features, such as the average number of occurrences of each word of the named entity to be linked across the named entities, as long as the resulting feature information represents the named entity to be linked and allows its target link entity to be determined from the standard entity library.
S13, calculating the similarity between the first entity vector and the second entity vector corresponding to each candidate entity.
The similarity may be the cosine similarity between the first entity vector and the second entity vector, or it may be represented by the Euclidean distance between the two vectors, where a smaller Euclidean distance indicates a greater similarity. The second entity vector is determined in the same way as the first entity vector. The second entity vector corresponding to each candidate entity may be pre-calculated and stored in the electronic device, or stored in a cloud space accessible to the electronic device.
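The two similarity measures named in step S13 can be sketched in plain Python as below. The function names are hypothetical, and returning 0.0 for a zero-length vector is an assumption made here to avoid division by zero; the source does not say how that case is handled.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of u and v divided by the product of their moduli."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_product = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm_product == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / norm_product


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between u and v; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Note the opposite orientation of the two measures: a larger cosine value means more similar, whereas a smaller Euclidean distance means more similar, so any threshold comparison has to be flipped when switching between them.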
S14, determining a target link entity corresponding to the named entity to be linked based on the similarity.
Calculating the similarity between the first entity vector and each second entity vector gives the similarity between the named entity to be linked and each of its candidate entities. The electronic device may preset a similarity threshold, select the candidate entities whose similarity is greater than the threshold, and from those take the candidate entity with the highest similarity to the named entity to be linked as the target link entity.
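The threshold-then-maximum selection of step S14 might look like the sketch below. The function name `pick_target` and the choice to return `None` when no candidate exceeds the threshold are assumptions for illustration; this passage does not state what happens in that case. The similarities are assumed to be "larger is better" scores such as cosine similarity.

```python
from typing import Dict, Optional


def pick_target(similarities: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the candidate with the highest similarity above the threshold.

    `similarities` maps each candidate entity name to its similarity score.
    """
    above = {name: score for name, score in similarities.items() if score > threshold}
    if not above:
        return None  # assumption: no link is made when nothing exceeds the threshold
    return max(above, key=above.get)
```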
In the linking method of named entities provided in this embodiment, a word set corresponding to the named entity to be linked is acquired, a first entity vector corresponding to the named entity to be linked is determined based on that word set, the similarity between the first entity vector and the second entity vector corresponding to each candidate entity is calculated, and the target link entity corresponding to the named entity to be linked is determined based on the calculated similarity. The target link entity is determined from the similarity between the first entity vector and the second entity vectors, and the entity vector calculated for a named entity retains the semantic information of that named entity without any precondition having to be set. The problem that preconditions and semantic information are difficult to take into account simultaneously is thereby solved, and the linking accuracy of named entities is ensured.
In this embodiment, a method for linking named entities is provided, which can be used in electronic devices such as a mobile phone, a tablet computer, or a computer. FIG. 2 is a flowchart of a method for linking named entities according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
S21, acquiring a word set corresponding to the named entity to be linked. For details, refer to the description of step S11 in the above embodiment; they are not repeated here.
S22, determining the first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked.
Specifically, the step S22 may include the following steps:
S221, acquiring a word weight dictionary and a word vector dictionary.
The word weight dictionary is a lookup dictionary formed by the word weight values of the words of each standard named entity contained in the standard entity library. The word vector dictionary is a lookup dictionary formed by the word vectors of the words of each standard named entity contained in the standard entity library. The word weight dictionary is used to look up the word weight value of each word contained in the named entity to be linked; the word vector dictionary is used to look up the word vector of each such word. The word weight dictionary and the word vector dictionary adopt an inverted index from words to standard entities, that is, a lookup table whose key is a word and whose value is the set of standard entities containing that word. Narrowing the search range for the named entity to be linked through this word-level inverted index improves the linking speed.
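The inverted index described above, together with the candidate-set union from the fourth implementation manner, can be sketched as follows. The function names are hypothetical, and the sketch assumes each standard entity is a string whose characters are its words.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_inverted_index(standard_entities: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word to the set of standard entities that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity in standard_entities:
        for word in set(entity):  # assumption: one character is one word
            index[word].add(entity)
    return index


def candidate_entities(words: Iterable[str], index: Dict[str, Set[str]]) -> Set[str]:
    """Union of the standard-entity sets of every word in the query's word set."""
    candidates: Set[str] = set()
    for word in words:
        candidates |= index.get(word, set())
    return candidates
```

Only entities sharing at least one word with the query are returned, so similarity needs to be computed against this subset rather than against the whole standard entity library.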
Specifically, the step S221 includes the following steps:
(1) Calculating the weight of each word in the word set of the standard entity library corresponding to the named entity to be linked to obtain a word weight dictionary.
If the named entity to be linked can be linked to the standard entity library, a word weight value can be found in the standard entity library for each word contained in the named entity to be linked; that is, the standard entity library has a word set corresponding to the named entity to be linked. The word weight value of each word in the word set of the standard entity library is calculated to obtain the word weight dictionary, where the word weight value of a word is: log(total number of standard named entities / number of standard named entities containing the word).
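The weight formula above has the form of an inverse document frequency, with standard entities in the role of documents. A minimal sketch follows; the function name is hypothetical, one character is assumed to be one word, and the natural logarithm is an assumption because the source does not state the base.

```python
import math
from typing import Dict, Sequence


def build_word_weights(standard_entities: Sequence[str]) -> Dict[str, float]:
    """weight(word) = log(total entities / entities containing the word)."""
    total = len(standard_entities)
    containing: Dict[str, int] = {}
    for entity in standard_entities:
        for word in set(entity):  # count each entity at most once per word
            containing[word] = containing.get(word, 0) + 1
    # Assumption: natural logarithm; another base only rescales all weights.
    return {word: math.log(total / count) for word, count in containing.items()}
```

Under this formula a word that occurs in every standard entity gets weight 0 and so contributes nothing to the entity vector, while rarer words get larger weights.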
(2) Calculating the vector of each word in the word set of the standard entity library corresponding to the named entity to be linked to generate a word vector dictionary.
The vector corresponding to each word is calculated to generate the word vector dictionary. Specifically, the electronic device may obtain a published word vector dictionary directly from the Internet, such as one trained with the BERT model. Alternatively, the published BERT model can be downloaded first and then retrained on a domain-specific corpus to obtain a fine-tuned word vector dictionary with more accurate word vectors.
S222, traversing the word weight dictionary and the word vector dictionary, and determining the word weight value and the word vector of the word corresponding to the named entity to be linked.
After obtaining the named entity to be linked, the electronic device may traverse the word weight dictionary and the word vector dictionary corresponding to the standard entity library, and determine the word weight value and the word vector of each word corresponding to the named entity to be linked. Specifically, if the word set corresponding to the named entity to be linked is "no, rust, steel, plate", traversing the word weight dictionary and the word vector dictionary yields the word weight values p1, p2, p3 and p4 of "no", "rust", "steel" and "plate", and the corresponding word vectors q1, q2, q3 and q4.
S223, calculating a first entity vector of the named entity to be linked based on the word weight value and the word vector of each word.
The word weight value of each word corresponding to the named entity to be linked is multiplied by that word's word vector, the products are accumulated, and the resulting sum is taken as the first entity vector. Specifically, if the word set corresponding to the named entity to be linked is "no, rust, steel, plate", with word weight values p1, p2, p3 and p4 and word vectors q1, q2, q3 and q4 obtained from the two dictionaries, the first entity vector is calculated as: P = p1 × q1 + p2 × q2 + p3 × q3 + p4 × q4.
Specifically, the step S223 includes the following steps:
(1) Judging whether the word corresponding to the named entity to be linked is in the word weight dictionary.
After acquiring the named entity to be linked, the electronic device identifies each word it contains, traverses the word weight dictionary, and judges whether the dictionary contains each of those words. Specifically, if the word set corresponding to the named entity to be linked is "no, rust, steel, plate", the word weight dictionary is traversed to judge whether it contains the words "no", "rust", "steel" and "plate". When the word corresponding to the named entity to be linked is in the word weight dictionary, step (2) is executed; otherwise, the word weight value of each such word contained in the named entity to be linked is set to a default value, which can be determined by a person skilled in the art according to actual needs and is not specifically limited here.
(2) Determining the word weight value of the word corresponding to the named entity to be linked from the word weight dictionary, and determining the word vector of the word corresponding to the named entity to be linked from the word vector dictionary.
When the word corresponding to the named entity to be linked is in the word weight dictionary, the word weight value corresponding to the word can be determined from the word weight dictionary. The entries of the word weight dictionary correspond one-to-one with those of the word vector dictionary, so the word vector corresponding to the word can be determined from the word vector dictionary.
(3) Calculating the product of the word weight value and the word vector corresponding to the named entity to be linked.
The product of the word weight value and the word vector is calculated as p_i × q_i, where p_i is the word weight value of the i-th word in the word set corresponding to the named entity to be linked, and q_i is the word vector of the i-th word in that word set.
(4) And adding the products of the weight values corresponding to the words and the word vectors, and taking the addition result as a first entity vector.
The calculation formula of the first entity vector is as follows:
Figure BDA0002945655780000101
wherein P is a first entity vector; pi is a word weight value of the ith word in the word set corresponding to the named entity to be linked; qi is a word vector of the ith word in the word set corresponding to the named entity to be linked, and n is the number of words included in the named entity to be linked.
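As a non-limiting illustration, the weighted summation of steps (1) to (4) can be sketched in Python as follows. The function name `first_entity_vector`, the default weight of 1.0, and the choice to skip words that have no entry in the word vector dictionary are assumptions made for this sketch; the embodiment leaves the default value open and does not say how a missing word vector is handled. The sketch also applies the default weight word by word, which is only one possible reading of the fallback described above.

```python
from typing import Dict, List, Sequence

DEFAULT_WEIGHT = 1.0  # assumed value; the embodiment does not fix the default


def first_entity_vector(
    words: Sequence[str],
    word_weights: Dict[str, float],
    word_vectors: Dict[str, List[float]],
    default_weight: float = DEFAULT_WEIGHT,
) -> List[float]:
    """Return the sum over all words of (word weight) * (word vector)."""
    if not word_vectors:
        raise ValueError("word vector dictionary is empty")
    dim = len(next(iter(word_vectors.values())))
    entity_vector = [0.0] * dim
    for word in words:
        if word not in word_vectors:
            # Assumption of this sketch: a word without a vector contributes nothing.
            continue
        # Fall back to the default weight when the word has no weight entry.
        weight = word_weights.get(word, default_weight)
        for k, component in enumerate(word_vectors[word]):
            entity_vector[k] += weight * component
    return entity_vector
```

With this sketch, a high-weight word pulls the entity vector toward its own word vector, which is how the importance of each word is reflected in the result.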
S23, calculating the similarity between the first entity vector and the second entity vector corresponding to each candidate entity. For a detailed description, refer to the related description of step S13 in the above embodiment, which is not repeated here.
S24, determining a target link entity corresponding to the named entity to be linked based on the similarity. For a detailed description, refer to the related description of step S14 in the above embodiment, which is not repeated here.
According to the linking method for named entities provided in this embodiment, the first entity vector of the named entity to be linked is calculated by multiplying the word weight value of each word corresponding to the named entity to be linked by its word vector and summing the products. This retains the semantic information of the named entity to be linked and also takes full account of the importance of each word, thereby improving the linking accuracy of the named entity to be linked.
In this embodiment, a method for linking named entities is provided, which can be used in electronic devices such as mobile phones, tablet computers, and computers. Fig. 3 is a flowchart of a method for linking named entities according to an embodiment of the present invention. As shown in fig. 3, the flow includes the following steps:
S31, acquiring a word set corresponding to the named entity to be linked. For a detailed description, refer to the related description of step S21 in the above embodiment, which is not repeated here.
S32, determining the first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked. For a detailed description, refer to the related description of step S22 in the above embodiment, which is not repeated here.
S33, calculating the similarity between the first entity vector and the second entity vector corresponding to each candidate entity.
Specifically, the step S33 may include the following steps:
S331, traversing each word in the word set corresponding to the named entity to be linked, and determining at least one standard entity set containing the words corresponding to the named entity to be linked.
Each word in the word set of the named entity to be linked is traversed, and for each word, one or more standard entities containing that word are determined in the standard entity library, so as to obtain at least one standard entity set corresponding to the named entity to be linked. Specifically, if the word set corresponding to the named entity to be linked is "stainless, rusty, steel, plate", one or more standard entities containing "stainless", "rusty", "steel", and "plate" are respectively determined in the standard entity library, and at least one standard entity set corresponding to the named entity to be linked "stainless steel plate" is thereby obtained.
S332, calculating a union set of at least one standard entity set to obtain a candidate entity set containing words corresponding to the named entities to be linked, and determining a second entity vector of each candidate entity in the candidate entity set.
The union of the standard entity sets of all words contained in the named entity to be linked is taken to obtain the candidate entity set corresponding to the named entity to be linked. The candidate entity set includes a plurality of candidate entities, and the second entity vector corresponding to each candidate entity is calculated in the same manner as the first entity vector, which is not repeated here. Of course, the second entity vectors may also be pre-stored in a storage space directly accessible by the electronic device or by other electronic devices, which is not limited here.
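As a non-limiting illustration, steps S331 and S332 can be sketched in Python as follows. The use of an inverted index from words to standard entities is an assumption of this sketch (the embodiment only states that the standard entities containing each word are determined in the standard entity library), and the names `build_inverted_index` and `candidate_entities` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(standard_entities: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each word to the set of standard entities whose word set contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity, words in standard_entities.items():
        for word in words:
            index[word].add(entity)
    return dict(index)


def candidate_entities(words: Iterable[str], index: Dict[str, Set[str]]) -> Set[str]:
    """Union of the standard entity sets found for each word (S331 and S332)."""
    candidates: Set[str] = set()
    for word in words:
        # S331: the standard entity set for this word (empty if the word is unknown).
        # S332: merge it into the running union.
        candidates |= index.get(word, set())
    return candidates
```

Restricting the later similarity calculation to this union means that standard entities sharing no word with the named entity to be linked are never compared.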
S333, calculating the similarity between the first entity vector and each second entity vector.
The similarity characterizes how close the first entity vector is to a second entity vector. The electronic device calculates the similarity between the first entity vector and each second entity vector respectively, so as to obtain the similarity between the named entity to be linked and each candidate entity.
Specifically, the step S333 may include the following steps:
(1) Calculating an inner product of the first entity vector and the second entity vector and a product of the moduli of the first entity vector and the second entity vector.
After obtaining the first entity vector and the second entity vector, the electronic device may calculate the inner product of the two vectors and the product of their moduli. Specifically, if the first entity vector is $A$ and the second entity vector is $B$, the inner product of the first entity vector and the second entity vector is $A \cdot B$, and the product of the moduli of the first entity vector and the second entity vector is $\|A\| \, \|B\|$.
(2) Determining the cosine similarity of the first entity vector and the second entity vector based on the inner product and the product of the moduli.
The cosine similarity evaluates the similarity between the named entity to be linked and the candidate entity by calculating the cosine of the included angle between the first entity vector and the second entity vector. Based on the inner product of the first entity vector and the second entity vector and the product of their moduli, the cosine similarity $S$ is calculated as: $S = \left( \frac{A \cdot B}{\|A\| \, \|B\|} + 1 \right) / 2$.
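As a non-limiting illustration, the cosine similarity above, which adds 1 to the cosine of the included angle and halves the result so that the value lies between 0 and 1, can be sketched in Python as follows. The function name `scaled_cosine_similarity` is hypothetical, and returning 0.0 when either vector has zero modulus is an assumption of this sketch, since the embodiment does not address that case.

```python
import math
from typing import Sequence


def scaled_cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return ((A . B) / (|A| * |B|) + 1) / 2, a value in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    inner = sum(x * y for x, y in zip(a, b))
    norm_product = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm_product == 0.0:
        # Assumption of this sketch: a zero vector is treated as not similar.
        return 0.0
    return (inner / norm_product + 1.0) / 2.0
```

Under this scaling, vectors pointing in the same direction score 1, orthogonal vectors score 0.5, and opposite vectors score 0.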
Alternatively, the step S333 may include: calculating the Euclidean distance between the first entity vector and the second entity vector, and representing the similarity by the Euclidean distance.
The first entity vector and the second entity vector are mapped to two points in Euclidean space. The Euclidean distance is the distance between these two points, i.e. the distance from the point corresponding to the first entity vector to the point corresponding to the second entity vector. The Euclidean distance between the named entity to be linked and each candidate entity is calculated respectively; the smaller the Euclidean distance, the greater the similarity between the named entity to be linked and that candidate entity. Specifically, the first entity vector and the second entity vector are mapped to two points p and q in Euclidean space, each having m coordinates. The Euclidean distance is obtained by subtracting the values on each coordinate, summing the squares of the differences, and taking the square root: $d(p, q) = \sqrt{\sum_{k=1}^{m} (p_k - q_k)^2}$.
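As a non-limiting illustration, the Euclidean distance variant can be sketched in Python as follows. The names `euclidean_distance` and `rank_by_distance` are hypothetical; ranking candidates in ascending order of distance is simply one way to express that a smaller distance represents a greater similarity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Subtract per coordinate, sum the squared differences, take the square root."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


def rank_by_distance(
    first_vector: Sequence[float],
    second_vectors: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (candidate, distance) pairs, closest candidate first."""
    distances = [
        (candidate, euclidean_distance(first_vector, vector))
        for candidate, vector in second_vectors.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])
```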
S34, determining a target link entity corresponding to the named entity to be linked based on the similarity.
Specifically, the step S34 may include the following steps:
S341, judging whether the similarity is greater than a preset threshold.
The preset threshold is a similarity threshold preset in the electronic device, i.e. the minimum similarity at which the named entity to be linked and a candidate entity are determined to be the same entity. The calculated similarity between the first entity vector and the second entity vector is compared with the preset threshold to determine their magnitude relation. When the similarity is greater than the preset threshold, step S342 is executed; otherwise, the named entity to be linked is deemed not to exist in the standard entity library.
S342, determining a target link entity from the candidate entities whose similarity is greater than the preset threshold.
When the similarity is greater than the preset threshold, each candidate entity whose similarity is greater than the preset threshold is taken as a most probable candidate link entity corresponding to the named entity to be linked. If there are a plurality of most probable candidate link entities, they are ranked by similarity, and the one with the maximum similarity is taken as the target link entity of the named entity to be linked.
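As a non-limiting illustration, steps S341 and S342 can be sketched in Python as follows. The function name `select_target_entity` and the use of `None` to signal that the named entity to be linked does not exist in the standard entity library are assumptions of this sketch.

```python
from typing import Dict, Optional


def select_target_entity(similarities: Dict[str, float], threshold: float) -> Optional[str]:
    """Pick the candidate with the highest similarity strictly above the threshold."""
    # S341: keep only candidates whose similarity is greater than the threshold.
    above = {entity: s for entity, s in similarities.items() if s > threshold}
    if not above:
        # No candidate passes: treated here as "not in the standard entity library".
        return None
    # S342: among the remaining candidates, take the one with maximum similarity.
    return max(above, key=lambda entity: above[entity])
```

The comparison is strict, following the wording "greater than a preset threshold"; a candidate whose similarity equals the threshold is not selected.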
In the linking method for named entities provided in this embodiment, a candidate entity set containing the words corresponding to the named entity to be linked is determined by traversing each word in the word set corresponding to the named entity to be linked, and the target link entity corresponding to the named entity to be linked is determined from the candidate entity set by calculating the similarity between the first entity vector and each second entity vector. Because the similarity is calculated only for the candidate entity set, the linking speed is improved while the named entity to be linked is still accurately linked to the target link entity. By judging whether the similarity is greater than the preset threshold and, when it is, determining the target link entity from the candidate entities whose similarity is greater than the preset threshold, the method confirms that the entity to be linked exists in the candidate entity set, thereby realizing accurate linking between the named entity and the candidate entity.
In this embodiment, a linking apparatus for named entities is further provided. The apparatus is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a linking device for named entities, as shown in fig. 4, including:
An obtaining module 41, configured to obtain a word set corresponding to the named entity to be linked. For a detailed description, refer to the corresponding description of the above method embodiments, which is not repeated here.
A first determining module 42, configured to determine a first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked. For a detailed description, refer to the corresponding description of the above method embodiments, which is not repeated here.
A calculating module 43, configured to calculate the similarity between the first entity vector and the second entity vector corresponding to each candidate entity. For a detailed description, refer to the corresponding description of the above method embodiments, which is not repeated here.
A second determining module 44, configured to determine, based on the similarity, a target link entity corresponding to the named entity to be linked. For a detailed description, refer to the corresponding description of the above method embodiments, which is not repeated here.
The linking device for named entities provided in the embodiments of the present invention obtains the word set corresponding to the named entity to be linked, determines a first entity vector corresponding to the named entity to be linked based on that word set, calculates the similarity between the first entity vector and the second entity vector corresponding to each candidate entity, and determines the target link entity corresponding to the named entity to be linked based on the calculated similarity. Because the target link entity is determined from the similarity between the first entity vector and the second entity vectors, and because the entity vector calculated for a named entity retains the semantic information of that named entity, no precondition needs to be set. This solves the problem that preconditions and semantic information are difficult to take into account at the same time, and further ensures the linking accuracy of the named entity.
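As a non-limiting illustration of a software implementation, the four modules of fig. 4 can be sketched in Python as interchangeable callables composed into one object. The class name `EntityLinkingDevice`, its field names, and the `link` method are hypothetical and serve only to show how the output of each module feeds the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]


@dataclass
class EntityLinkingDevice:
    # Obtaining module 41: named entity text -> word set.
    acquire_words: Callable[[str], Sequence[str]]
    # First determining module 42: word set -> first entity vector.
    determine_vector: Callable[[Sequence[str]], Vector]
    # Calculating module 43: first entity vector -> similarity per candidate entity.
    compute_similarities: Callable[[Vector], Dict[str, float]]
    # Second determining module 44: similarities -> target link entity (or None).
    determine_target: Callable[[Dict[str, float]], Optional[str]]

    def link(self, named_entity: str) -> Optional[str]:
        """Run the four modules in the order of the method steps."""
        words = self.acquire_words(named_entity)
        vector = self.determine_vector(words)
        similarities = self.compute_similarities(vector)
        return self.determine_target(similarities)
```

Keeping each module as a separate callable allows, for example, the cosine similarity calculation to be replaced by the Euclidean distance calculation without changing the other modules.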
The linking means of the named entities in this embodiment are presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and memory executing one or more software or fixed programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the linking apparatus for the named entity shown in fig. 4.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention. As shown in fig. 5, the electronic device may include: at least one processor 501, such as a CPU (Central Processing Unit), at least one communication interface 503, a memory 504, and at least one communication bus 502. The communication bus 502 is used to enable connection and communication between these components. The communication interface 503 may include a display (Display) and a keyboard (Keyboard); optionally, the communication interface 503 may also include a standard wired interface and a standard wireless interface. The memory 504 may be a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 504 may also be at least one storage device located remotely from the processor 501. The processor 501 may be combined with the apparatus described in fig. 4; an application program is stored in the memory 504, and the processor 501 calls the program code stored in the memory 504 to perform any of the above method steps.
The communication bus 502 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 502 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 504 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 504 may also comprise a combination of the above kinds of memory.
The processor 501 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 501 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
Optionally, the memory 504 is also used to store program instructions. Processor 501 may invoke program instructions to implement the linking method for named entities as shown in the embodiments of fig. 1-3 of the present application.
An embodiment of the present invention also provides a non-transitory computer storage medium storing computer-executable instructions that can execute the linking method for named entities in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (11)

1. A method for linking named entities, comprising the steps of:
acquiring a word set corresponding to a named entity to be linked;
determining a first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked;
calculating the similarity of the first entity vector and a second entity vector corresponding to each candidate entity;
and determining a target link entity corresponding to the named entity to be linked based on the similarity.
2. The method according to claim 1, wherein the determining a first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked comprises:
acquiring a word weight dictionary and a word vector dictionary;
traversing the word weight dictionary and the word vector dictionary, and determining the word weight value and the word vector of the word corresponding to the named entity to be linked;
and calculating to obtain a first entity vector of the named entity to be linked based on the word weight value and the word vector of each word.
3. The method of claim 2, wherein calculating the first entity vector of the named entity to be linked based on the word weight value and the word vector of each word comprises:
judging whether the word corresponding to the named entity to be linked is in the word weight dictionary;
when the word corresponding to the named entity to be linked is in the word weight dictionary, determining a word weight value of the word corresponding to the named entity to be linked from the word weight dictionary, and determining a word vector of the word corresponding to the named entity to be linked from the word vector dictionary;
calculating the product of the word weight value and the word vector corresponding to the named entity to be linked;
and adding the products of the weight values corresponding to the words and the word vectors, and taking the addition result as the first entity vector.
4. The method of claim 2, wherein obtaining the word weight dictionary and the word vector dictionary comprises:
calculating the weight of each word in the word set of the standard entity library corresponding to the named entity to be linked to obtain the word weight dictionary;
and calculating the vector of each word in the word set of the standard entity library corresponding to the named entity to be linked, and generating the word vector dictionary.
5. The method of claim 1, wherein the calculating the similarity between the first entity vector and the second entity vector corresponding to each candidate entity comprises:
traversing each word in the word set corresponding to the named entity to be linked, and determining at least one standard entity set containing the words corresponding to the named entity to be linked;
calculating a union set of at least one standard entity set to obtain a candidate entity set containing words corresponding to the named entities to be linked, and determining a second entity vector of each candidate entity in the candidate entity set;
and calculating the similarity of the first entity vector and each second entity vector.
6. The method of claim 5, wherein the calculating the similarity between the first entity vector and each of the second entity vectors comprises:
calculating an inner product of the first entity vector and the second entity vector and a product of the moduli of the first entity vector and the second entity vector;
determining cosine similarity of the first entity vector and the second entity vector based on a product of the inner product and the modulus.
7. The method of claim 5, wherein the calculating the similarity between the first entity vector and each of the second entity vectors comprises:
and calculating the Euclidean distance between the first entity vector and the second entity vector, and representing the similarity by the Euclidean distance.
8. The method according to claim 1, wherein the determining the target link entity corresponding to the named entity to be linked based on the similarity comprises:
judging whether the similarity is greater than a preset threshold value or not;
and when the similarity is greater than the preset threshold, determining the target link entity from the candidate entities with the similarity greater than the preset threshold.
9. A linking apparatus for named entities, comprising:
the acquisition module is used for acquiring a word set corresponding to the named entity to be linked;
the first determining module is used for determining a first entity vector corresponding to the named entity to be linked based on the word set corresponding to the named entity to be linked;
the calculation module is used for calculating the similarity of the first entity vector and a second entity vector corresponding to each candidate entity;
and the second determining module is used for determining the target link entity corresponding to the named entity to be linked based on the similarity.
10. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor performing the method of linking named entities according to any of claims 1-8 by executing the computer instructions.
11. A computer-readable storage medium storing computer instructions for causing a computer to perform the method for linking named entities according to any one of claims 1 to 8.
CN202110194227.4A 2021-02-20 2021-02-20 Linking method, device and equipment of named entities and readable storage medium Pending CN112836513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110194227.4A CN112836513A (en) 2021-02-20 2021-02-20 Linking method, device and equipment of named entities and readable storage medium

Publications (1)

Publication Number Publication Date
CN112836513A (en) 2021-05-25

Family

ID=75934040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110194227.4A Pending CN112836513A (en) 2021-02-20 2021-02-20 Linking method, device and equipment of named entities and readable storage medium

Country Status (1)

Country Link
CN (1) CN112836513A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115358241A (en) * 2022-10-20 2022-11-18 科大讯飞股份有限公司 Human-computer interaction-based labeling method, and related device, equipment and medium
WO2023048359A1 (en) * 2021-09-24 2023-03-30 삼성전자 주식회사 Speech recognition device and operation method therefor

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052262A1 (en) * 2006-08-22 2008-02-28 Serhiy Kosinov Method for personalized named entity recognition
US20130346421A1 (en) * 2012-06-22 2013-12-26 Microsoft Corporation Targeted disambiguation of named entities
US20140172754A1 (en) * 2012-12-14 2014-06-19 International Business Machines Corporation Semi-supervised data integration model for named entity classification
US20190130282A1 (en) * 2017-10-31 2019-05-02 Microsoft Technology Licensing, Llc Distant Supervision for Entity Linking with Filtering of Noise
CN109918669A (en) * 2019-03-08 2019-06-21 腾讯科技(深圳)有限公司 Entity determines method, apparatus and storage medium
CN111159485A (en) * 2019-12-30 2020-05-15 科大讯飞(苏州)科技有限公司 Tail entity linking method, device, server and storage medium
CN111597788A (en) * 2020-05-18 2020-08-28 腾讯科技(深圳)有限公司 Attribute fusion method, device and equipment based on entity alignment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
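Qi Guilin (漆桂林); Gao Huan (高桓); Wu Tianxing (吴天星): "Research Progress on Knowledge Graphs" (知识图谱研究进展), Technology Intelligence Engineering (情报工程), no. 01 *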

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination