CN111079437B - Entity identification method, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111079437B
Authority
CN
China
Prior art keywords
entity
character
vector
determining
suspected
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911324636.0A
Other languages
Chinese (zh)
Other versions
CN111079437A (en)
Inventor
王正魁
付霞
张毕涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cloudminds Robotics Co Ltd
Original Assignee
Cloudminds Shanghai Robotics Co Ltd
Application filed by Cloudminds Shanghai Robotics Co Ltd
Priority to CN201911324636.0A
Publication of CN111079437A
Application granted
Publication of CN111079437B
Legal status: Active
Anticipated expiration


Abstract

The embodiments of the invention relate to the field of data processing and disclose an entity identification method, an electronic device and a storage medium. In some embodiments of the present application, the entity identification method includes: determining suspected entities in a text to be recognized; determining an entity vector for each character of the text to be recognized, where the entity vector of a character is determined according to the entity type of the suspected entity corresponding to that character; determining a semantic vector of the character according to the character vector and the entity vector of the character; and inputting the semantic vectors of all characters of the text to be recognized into an entity recognition model to determine the entities in the text. The embodiments reduce the difficulty of entity identification and improve its accuracy.

Description

Entity identification method, electronic equipment and storage medium
Technical Field
The embodiments of the invention relate to the field of data processing, and in particular to an entity identification method, an electronic device and a storage medium.
Background
With technological progress, smart devices are becoming widespread. To interact better with humans, a smart device needs to perform semantic recognition on the speech or text it receives, and identifying the entities mentioned in that input is an important step of semantic recognition.
However, the inventors found at least the following problem in the prior art: entities of different types may have similar or even identical names, which easily leads to entity identification errors and limits the application of entity identification.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the invention aim to provide an entity identification method, an electronic device and a storage medium that reduce the difficulty of entity identification and improve its accuracy.
To solve the above technical problem, an embodiment of the present invention provides an entity identification method, including: determining suspected entities in a text to be recognized; determining an entity vector for each character of the text to be recognized, where the entity vector of a character is determined according to the entity type of the suspected entity corresponding to that character; determining a semantic vector of the character according to its character vector and its entity vector; and inputting the semantic vectors of all characters of the text to be recognized into an entity recognition model to determine the entities in the text.
An embodiment of the invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions enable the at least one processor to perform the entity identification method described in the above embodiments.
Compared with the prior art, the embodiments determine the semantic vector of each character in the text to be recognized based on the suspected entities of that text, which enriches the semantic representation of its characters. Because these semantic vectors are input into the entity recognition model, the model obtains information about the suspected entities of the text, which reduces the difficulty of entity recognition and improves its accuracy.
In addition, determining the entity vector of a character specifically includes: determining whether a suspected entity corresponding to the character exists; if so, determining the entity type of that suspected entity and determining the entity vector of the character according to the vector corresponding to the entity type. In this embodiment, the entity vector of the character carries entity-type information, so that the entity recognition model can obtain information about the suspected entities of the text to be recognized.
In addition, if no suspected entity corresponds to the character, the entity identification method further includes: determining the entity vector of the character to be a zero vector.
In addition, determining the entity vector of the character according to the vector corresponding to the entity type specifically includes: determining the entity vector of the character according to the vector corresponding to the entity type and the influence factor of the entity type. In this embodiment, the influence of entity types that are prone to causing misrecognition can be reduced, which improves the accuracy of entity identification.
In addition, determining the entity vector of the character according to the vector corresponding to the entity type and the influence factor of the entity type specifically includes: calculating the entity vector of the character according to formula A, where formula A is:
[Formula A (image not reproduced)]
where entity_vector_i denotes the entity vector of the i-th character; N denotes the number of entity types of the text to be recognized, which is determined according to the entity types to which the suspected entities belong; j denotes the j-th entity type; f_{i,j} indicates whether the i-th character belongs to the j-th entity type; α_j denotes the influence factor of the j-th entity type; t_j denotes the vector corresponding to the j-th entity type; and ε is a constant.
In addition, determining the suspected entities of the text to be recognized specifically includes: obtaining a pre-stored entity library; and determining the suspected entities in the text to be recognized according to the entity library.
In addition, determining the suspected entities in the text according to the entity library specifically includes: constructing a dictionary tree (trie) from the entity library; and determining the suspected entities in the text using the dictionary tree. In this embodiment, unnecessary string comparisons are minimized and query efficiency is improved.
In addition, determining the semantic vector of the character according to its character vector and its entity vector specifically includes: concatenating the character vector and the entity vector of the character to obtain the semantic vector of the character; or adding the character vector and the entity vector of the character to obtain the semantic vector of the character.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.
FIG. 1 is a flowchart of an entity identification method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a dictionary tree according to the first embodiment of the present invention;
FIG. 3 is a flowchart of an entity identification method according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an entity recognition apparatus according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the embodiments to help the reader better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these details and with various changes and modifications based on the following embodiments.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is the meaning of "including but not limited to".
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
The first embodiment of the invention relates to an entity identification method applied to an electronic device such as a terminal or a server. As shown in FIG. 1, the entity identification method includes the following steps.
Step 101: determine the suspected entities of the text to be recognized.
Specifically, the electronic device may obtain a pre-stored entity library and determine the suspected entities in the text to be recognized according to the entity library.
In one example, the electronic device constructs a dictionary tree from the entity library and uses it to determine the suspected entities in the text to be recognized. For example, if the current entity library contains eight entities of four entity types, namely [Li Bai (poetry)], [Li Bai (song)], [Li Zhi (singer)], [Zhou Jie (singer)], [Zhou Jielun (singer)], [Zhuqiong (singer)], [The Empty City Stratagem (song)] and [The Empty City Stratagem (Beijing opera)], the dictionary tree is as shown in FIG. 2.
It is worth mentioning that finding the suspected entities in the text through the dictionary tree minimizes unnecessary string comparisons and improves query efficiency.
It should be noted that, as those skilled in the art will understand, other similar data structures may be used instead of the dictionary tree in practical applications to store and look up the entities of the entity library; this embodiment is not limited in this respect.
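The dictionary-tree lookup of step 101 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The class and method names (`EntityTrie`, `find_suspected`) are hypothetical, and the sample entities assume that the glossed names above correspond to 周杰, 周杰伦 and 空城计. Every match is returned, including overlapping ones, because disambiguation is left to the entity recognition model.

```python
from typing import Dict, List, Set, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Entity types of the library entity that ends exactly at this node.
        self.types: Set[str] = set()


class EntityTrie:
    """Character-level dictionary tree built from an entity library (illustrative)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, entity: str, entity_type: str) -> None:
        node = self.root
        for ch in entity:
            node = node.children.setdefault(ch, TrieNode())
        node.types.add(entity_type)  # one surface form may carry several types

    def find_suspected(self, text: str) -> List[Tuple[int, int, str, Set[str]]]:
        """Return every library entity found in text as (start, end, surface, types).

        Overlapping matches are all kept; end is exclusive."""
        results: List[Tuple[int, int, str, Set[str]]] = []
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                child = node.children.get(text[end])
                if child is None:
                    break  # no library entity continues with this prefix
                node = child
                if node.types:
                    results.append((start, end + 1, text[start:end + 1], set(node.types)))
        return results


if __name__ == "__main__":
    trie = EntityTrie()
    for entity, entity_type in [
        ("周杰", "singer"),
        ("周杰伦", "singer"),
        ("空城计", "song"),
        ("空城计", "Beijing opera"),
    ]:
        trie.insert(entity, entity_type)
    for match in trie.find_suspected("唱一首周杰伦的空城计"):
        print(match)
```

Each scan stops as soon as no library entity shares the current prefix, which is where the savings in string comparisons come from. Adding or deleting an entity only touches the nodes along one path.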
Step 102: determine an entity vector for each character of the text to be recognized, and determine the semantic vector of the character according to its character vector and its entity vector.
Specifically, the entity vector of the character is determined according to the entity type of the suspected entity corresponding to the character.
In one embodiment, the characters are Chinese characters, and their character vectors can be obtained by training on a corpus with a character-vector generation method. For example, word2vec may be used to generate a vector for each Chinese character in the corpus.
It should be noted that, as those skilled in the art will appreciate, other methods may be used to generate the character vectors in practical applications; this embodiment is merely illustrative.
Step 103: input the semantic vectors of all characters of the text to be recognized into an entity recognition model to determine the entities in the text.
Specifically, because the semantic vector of each character is determined from both its character vector and its entity vector, the semantic representation of the character is enriched and the entity recognition model is informed of the suspected entities of the text, which reduces the difficulty of entity recognition and improves the accuracy of the result.
In one embodiment, the entity recognition model may be built on any model that can extract deep semantic and sentence features, such as a BiLSTM model, a BiGRU model or a Transformer model. Specifically, the entity recognition model extracts deep semantic and sentence features through a bidirectional long short-term memory (BiLSTM) model, a bidirectional gated recurrent unit (BiGRU) model or a Transformer model, and then uses a fully connected layer and a conditional random field (CRF) layer to recognize the entities.
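The description does not detail the CRF layer. As a hedged sketch of how a linear-chain CRF layer is commonly decoded at inference time, the following Viterbi routine selects the highest-scoring tag sequence. It takes per-character emission scores, such as those a fully connected layer might produce, together with tag-to-tag transition scores. The function name, the BIO tag set and all numbers in the usage example are assumptions for illustration.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Best tag sequence for a linear-chain CRF.

    emissions[i][k]: score of tag k at character i (e.g. fully connected layer output).
    transitions[a][b]: score of tag a being followed by tag b (learned by the CRF layer).
    """
    n, k = len(emissions), len(tags)
    if n == 0:
        return []
    score = list(emissions[0])  # best path score ending in each tag at position 0
    backpointers: List[List[int]] = []
    for i in range(1, n):
        new_score: List[float] = []
        pointers: List[int] = []
        for b in range(k):
            # Score of reaching tag b at position i from each previous tag a.
            candidates = [score[a] + transitions[a][b] for a in range(k)]
            best_prev = max(range(k), key=candidates.__getitem__)
            new_score.append(candidates[best_prev] + emissions[i][b])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(k), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[t] for t in path]


if __name__ == "__main__":
    tags = ["O", "B-singer", "I-singer"]
    transitions = [
        [0.0, 0.0, -10.0],  # O -> I-singer is strongly penalized
        [0.0, 0.0, 1.0],    # B-singer -> I-singer is encouraged
        [0.0, 0.0, 0.0],
    ]
    emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]
    print(viterbi_decode(emissions, transitions, tags))  # ['O', 'B-singer', 'I-singer']
```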
It is worth mentioning that, because the entity recognition model can extract deep semantic and sentence features, it can learn sentence patterns from the text to be recognized or from the corpus even when the entity library is incomplete. It can therefore recognize entities that do not exist in the entity library, which reduces the dependence of entity recognition on the entity library.
It should be noted that, in practical applications, the entities of the text to be recognized may also be determined from the semantic vectors of the characters by other entity recognition models; the models above are only examples, and the model actually used is not limited.
The inventors found that in many application scenarios a fairly complete entity library is relatively easy to obtain, for example the brand list of goods on an e-commerce platform or the drug catalog of a pharmacy in a medical institution. A human-machine dialogue system can likewise build offline an entity library of the entities likely to appear in conversations, construct a dictionary tree from it to find suspected entities, and then identify entities with predefined rules. However, relying only on the dictionary tree and predefined replacement rules (for example, replacement by entity length) is often insufficient for complex cases.
Table 1 lists the suspected entities found with the dictionary tree and the correct entities for the texts to be recognized "Give me Li Bai's Farewell", "Sing a song by Cheng Long" and "Sing a Beijing opera, The Empty City Stratagem". In the first example, "Li Bai" is both a poetry entity and a song title, and the following entity, "Farewell" (poetry), is needed to disambiguate it. In the second example, "Cheng Long" (Jackie Chan) is a singer and "Song of the Dragon" is a song title. If replacements are made in order of entity length from long to short, "Song of the Dragon" is wrongly recognized and "Cheng Long" (singer) is missed, so other textual features such as "sing" and "song" must be taken into account to obtain the correct result. In the third example, "The Empty City Stratagem" is both a song title and a Beijing opera title, and another textual feature ("Beijing opera") is required to disambiguate it and identify the entity correctly. For entity types with many entities and short entity names, such as songs, person names and place names, literal-matching approaches struggle with ambiguity and therefore achieve low accuracy.
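The failure of a longest-first replacement rule in the second example can be reproduced in a few lines of Python. The sentence 唱一首成龙的歌 and the two-entry lexicon are an assumed reconstruction of that example, with 成龙 (Cheng Long) as a singer and 龙的歌 as a song title. `longest_first_match` is a hypothetical helper, not a rule taken from the patent.

```python
from typing import Dict, List, Tuple


def longest_first_match(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical rule: accept dictionary matches from longest to shortest,
    skipping any match that overlaps one already accepted."""
    spans = [
        (start, end)
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
        if text[start:end] in lexicon
    ]
    # Longest span first; ties broken by leftmost position.
    spans.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    taken = [False] * len(text)
    accepted: List[Tuple[int, int]] = []
    for start, end in spans:
        if any(taken[start:end]):
            continue  # overlaps a longer match accepted earlier
        for i in range(start, end):
            taken[i] = True
        accepted.append((start, end))
    accepted.sort()  # report in reading order
    return [(text[s:e], lexicon[text[s:e]]) for s, e in accepted]


if __name__ == "__main__":
    lexicon = {"成龙": "singer", "龙的歌": "song"}  # hypothetical mini entity library
    # The singer 成龙 is lost because the longer 龙的歌 claims the character 龙 first.
    print(longest_first_match("唱一首成龙的歌", lexicon))  # [('龙的歌', 'song')]
```

Nothing in the rule looks at context words such as "sing" or "song", which is the gap the deep learning model is meant to fill.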
Therefore, in this embodiment, the suspected entities in the text to be recognized are determined first. The information of the text and the information of the suspected entities are then input together into a deep learning model, which integrates all of the input semantic information to eliminate ambiguity and gives the final recognition result. Because suspected entities are found first, recall is very high, and the deep learning model can concentrate on integrating information to resolve ambiguity. When an entity needs to be added or deleted, only the dictionary tree has to be modified, and the update can be completed within milliseconds, which makes the electronic device easy to maintain. In addition, the method combines the strengths of dictionary-based suspected-entity lookup and of the deep learning model, and therefore generalizes well. Finally, the method adds few extra trainable parameters to a conventional deep learning model, which makes it suitable for engineering applications.
TABLE 1
Text to be recognized | Suspected entities | Correct entities
Give me Li Bai's Farewell | Li Bai (song, poetry), Farewell (poetry) | Li Bai (poetry), Farewell (poetry)
Sing a song by Cheng Long | Cheng Long (singer), Song of the Dragon (song) | Cheng Long (singer)
Sing a Beijing opera, The Empty City Stratagem | The Empty City Stratagem (song), The Empty City Stratagem (Beijing opera) | The Empty City Stratagem (Beijing opera)
The foregoing is merely illustrative, and is not intended to limit the technical aspects of the present invention.
Compared with the prior art, the entity recognition method provided by the embodiment determines the semantic vector of each character in the text to be recognized based on the suspected entity of the text to be recognized, and enriches the semantic representation of the characters of the text to be recognized. The semantic vector of the character of the text to be recognized is input into the entity recognition model, so that the entity recognition model can acquire the information of the suspected entity of the text to be recognized, thereby reducing the difficulty of entity recognition and improving the accuracy of entity recognition.
A second embodiment of the present invention relates to an entity identification method. This embodiment elaborates on the first embodiment and specifically describes how the electronic device determines the semantic vector of a character.
Specifically, as shown in fig. 3, the electronic device performs the following operations for each character of the text to be recognized:
Step 201: determine whether a suspected entity corresponding to the character exists.
Specifically, the electronic device has determined the suspected entities in the text to be recognized in step 101. If the character is contained in a suspected entity, that suspected entity is considered to correspond to the character. If the electronic device determines that a suspected entity corresponds to the character being processed, step 202 is executed; otherwise, step 204 is executed.
Step 202: determine the entity type of the suspected entity corresponding to the character.
Specifically, the electronic device stores a correspondence between an entity and an entity type. The electronic device may determine, based on the correspondence, an entity type to which the suspected entity corresponding to the character belongs.
It should be noted that, in practical applications, a character may correspond to one or more suspected entities. If the character corresponds to one suspected entity, its entity vector is determined according to the entity type of that suspected entity; if it corresponds to several suspected entities, its entity vector is determined by combining the entity types of all of them.
It should be noted that, as will be understood by those skilled in the art, in practical applications, the number and the division of the entity types may be set according to needs, which is not limited herein.
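Steps 201 and 202 amount to marking, for each character, the entity types of every suspected entity that covers it. These marks are the indicators f_{i,j} that appear in formula A. The following minimal sketch assumes that suspected entities arrive as hypothetical (start, end, types) spans with an exclusive end. The function name and the sample spans are illustrative.

```python
from typing import List, Sequence, Set, Tuple

# (start, end, entity types) of one suspected entity; end is exclusive.
Span = Tuple[int, int, Set[str]]


def type_indicators(text_len: int, spans: Sequence[Span], entity_types: Sequence[str]) -> List[List[int]]:
    """f[i][j] = 1 if character i lies inside a suspected entity of type entity_types[j], else 0.

    A character covered by several suspected entities collects all of their types;
    a row of zeros means no suspected entity corresponds to the character (step 204)."""
    column = {t: j for j, t in enumerate(entity_types)}
    f = [[0] * len(entity_types) for _ in range(text_len)]
    for start, end, types in spans:
        for i in range(start, end):
            for t in types:
                f[i][column[t]] = 1
    return f


if __name__ == "__main__":
    # Hypothetical spans for 唱一首成龙的歌: 成龙 (singer) at 3..5, 龙的歌 (song) at 4..7.
    spans = [(3, 5, {"singer"}), (4, 7, {"song"})]
    for row in type_indicators(7, spans, ["singer", "song"]):
        print(row)
```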
Step 203: determine the entity vector of the character according to the vector corresponding to the entity type of the suspected entity, and then perform step 205.
Specifically, the electronic device may obtain the vector corresponding to each entity type in advance; these vectors are obtained by training on a corpus.
In one embodiment, the vectors corresponding to the entity types are obtained by corpus training as follows. First, an initial vector is randomly initialized for each entity type. Its dimension can be set as required, and in the random initialization phase each dimension may take a random number from the interval [-1, 1]. Then, the initial vectors are trained on the corpus with a cross-entropy loss function to obtain the final vector of each entity type.
It should be noted that other differentiable loss functions may also be used in training; this embodiment is merely illustrative.
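The random initialization of the entity-type vectors can be sketched as below. Only the initialization is shown. The cross-entropy training that follows depends on the full model and is omitted. The function name, the fixed seed and the dimension are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence


def init_type_vectors(entity_types: Sequence[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """One random vector per entity type, each component drawn uniformly from [-1, 1].

    These are only starting values; the embodiment then trains them on a corpus
    with a cross-entropy loss, which is not shown here."""
    rng = random.Random(seed)  # fixed seed so the initialization is reproducible
    return {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for t in entity_types}


if __name__ == "__main__":
    vectors = init_type_vectors(["song", "singer"], dim=4)
    for entity_type, vec in vectors.items():
        print(entity_type, [round(x, 3) for x in vec])
```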
In one embodiment, an influence factor is set for each entity type in the entity library. The electronic device then determines the entity vector of the character according to two quantities: the vector corresponding to the entity type of the suspected entity corresponding to the character, and the influence factor of that entity type.
It is worth mentioning that setting an influence factor for each entity type can weaken entity types that easily cause identification errors, thereby reducing the probability of such errors.
Specifically, the electronic device calculates the entity vector of the character according to formula A, where formula A is:
[Formula A (image not reproduced)]
where entity_vector_i denotes the entity vector of the i-th character; N denotes the number of entity types of the text to be recognized, which is determined according to the entity types to which the suspected entities belong; j denotes the j-th entity type; f_{i,j} indicates whether the i-th character belongs to the j-th entity type; α_j denotes the influence factor of the j-th entity type; t_j denotes the vector corresponding to the j-th entity type; and ε is a constant. If the suspected entity corresponding to character i belongs to the j-th entity type, f_{i,j} is 1; otherwise f_{i,j} is 0. ε may be a small positive number, such as 0.001, which prevents the denominator from being 0. α_j may be a floating-point number reflecting the degree of influence of the j-th entity type, and its value may satisfy the condition:
$$\sum_{j=1}^{N} \alpha_j = 1, \qquad 0 < \alpha_j < 1$$
It should be noted that, in practical applications, influence factors need not be set for the entity types. For example, the vectors corresponding to the entity types of all suspected entities corresponding to a character may simply be added to obtain the entity vector of the character. This embodiment does not limit how the entity vector of a character is obtained from the vectors corresponding to the entity types of its suspected entities.
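The image of formula A is not available here. The following sketch therefore assumes one plausible form consistent with the symbol definitions: an α-weighted average of the type vectors, entity_vector_i = Σ_j f_{i,j} α_j t_j / (Σ_j f_{i,j} α_j + ε), where ε keeps the denominator non-zero. This form is an assumption, not the patent's verified formula. The sketch also includes the unweighted alternative described in the preceding paragraph for comparison. All function names and type vectors are hypothetical.

```python
from typing import List, Sequence


def entity_vector(
    f_row: Sequence[int],
    alphas: Sequence[float],
    type_vectors: Sequence[Sequence[float]],
    eps: float = 0.001,
) -> List[float]:
    """Entity vector of character i, ASSUMING formula A is the alpha-weighted average
    sum_j f_ij * alpha_j * t_j / (sum_j f_ij * alpha_j + eps)."""
    dim = len(type_vectors[0])
    numerator = [0.0] * dim
    denominator = eps  # eps keeps the denominator non-zero
    for f_ij, alpha_j, t_j in zip(f_row, alphas, type_vectors):
        if f_ij:  # only types that character i actually belongs to contribute
            denominator += alpha_j
            for d in range(dim):
                numerator[d] += alpha_j * t_j[d]
    return [x / denominator for x in numerator]


def entity_vector_unweighted(f_row: Sequence[int], type_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Alternative without influence factors: add the vectors of all matching types."""
    total = [0.0] * len(type_vectors[0])
    for f_ij, t_j in zip(f_row, type_vectors):
        if f_ij:
            total = [a + b for a, b in zip(total, t_j)]
    return total


if __name__ == "__main__":
    alphas = [0.8, 0.2]                      # singer, song (values from the worked example)
    type_vectors = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical t_singer, t_song
    print(entity_vector([1, 1], alphas, type_vectors))  # covered by both types
    print(entity_vector([0, 0], alphas, type_vectors))  # no suspected entity: zeros
```

Under this assumed form, a character covered by no suspected entity gets a zero vector, which matches step 204. The influence factors mainly matter when one character is covered by suspected entities of several types.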
In one embodiment, to ensure that α_j meets the above condition, let
$$\alpha_j = \frac{e^{\beta_j}}{\sum_{k=1}^{N} e^{\beta_k}}$$
where e is the base of the natural logarithm, β is an N-dimensional vector of floating-point numbers, and β_j denotes the j-th element of β.
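The softmax parameterization of the influence factors can be checked numerically against the worked example later in this embodiment, which uses β_singer = 1.8 and β_song = 0.4. The sketch below is minimal, and the function name is hypothetical. Subtracting the maximum β before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math
from typing import Dict


def influence_factors(beta: Dict[str, float]) -> Dict[str, float]:
    """alpha_j = exp(beta_j) / sum_k exp(beta_k): each alpha_j lies in (0, 1) and they sum to 1."""
    m = max(beta.values())  # shift by the max for numerical stability; ratios are unchanged
    exps = {t: math.exp(b - m) for t, b in beta.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


if __name__ == "__main__":
    alpha = influence_factors({"singer": 1.8, "song": 0.4})
    print({t: round(a, 2) for t, a in alpha.items()})  # {'singer': 0.8, 'song': 0.2}
```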
In one embodiment, β may be trained on a corpus. Specifically, β may be initialized to a vector whose elements are all 1, i.e., all entity types are initially assumed to have the same influence. The value of each element is then adjusted automatically while the model is trained. Different entity types influence the prediction to different degrees. For example, the song entity type contains a large number of entities, many of them short, so it is relatively misleading for entity identification and its influence should be weak. After the model is trained, the α_j and β_j corresponding to the song type should therefore have small values.
It should be noted that, as those skilled in the art will understand, α_j may also be set empirically in practical applications; this embodiment is merely illustrative and does not limit how α_j is determined.
Step 204: the entity vector of the character is determined to be a zero vector.
Specifically, if a character has no corresponding suspected entity, every element of its entity vector is set to 0.
Step 205: and determining the semantic vector of the character according to the character vector of the character and the entity vector of the character.
In one embodiment, the electronic device concatenates the character vector of the character with its entity vector to obtain the semantic vector of the character. For example, char_reproduction = [char_vector, entity_vector], where char_reproduction denotes the semantic vector of the character, char_vector denotes the character vector, i.e., the vector of the character itself, and entity_vector denotes the entity vector of the character.
In another embodiment, the electronic device adds the character vector of the character to its entity vector to obtain the semantic vector of the character. For example, char_reproduction = char_vector + entity_vector, where the symbols have the same meanings as above; in this case the two vectors must have the same dimension.
It should be noted that, as those skilled in the art will understand, this embodiment is merely illustrative. In practical applications, the electronic device may combine the character vector and the entity vector of the character in other ways to obtain its semantic vector.
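The two combination options for step 205 can be sketched as follows; the function names and sample vectors are hypothetical. Concatenation lets the character vector and the entity vector have different dimensions and yields a longer semantic vector. Element-wise addition keeps the dimension but requires both vectors to be the same size.

```python
from typing import List, Sequence


def concat_representation(char_vector: Sequence[float], entity_vector: Sequence[float]) -> List[float]:
    """char_reproduction = [char_vector, entity_vector]: the dimensions add up."""
    return list(char_vector) + list(entity_vector)


def add_representation(char_vector: Sequence[float], entity_vector: Sequence[float]) -> List[float]:
    """char_reproduction = char_vector + entity_vector: element-wise, same dimension required."""
    if len(char_vector) != len(entity_vector):
        raise ValueError("element-wise addition needs vectors of equal dimension")
    return [c + e for c, e in zip(char_vector, entity_vector)]


if __name__ == "__main__":
    char_vec = [0.5, 0.25]  # hypothetical character vector
    ent_vec = [1.0, 0.0]    # hypothetical entity vector
    print(concat_representation(char_vec, ent_vec))  # [0.5, 0.25, 1.0, 0.0]
    print(add_representation(char_vec, ent_vec))     # [1.5, 0.25]
```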
The entity recognition method mentioned in this embodiment is exemplified below with reference to examples.
Let the text to be recognized be "sing a song now", and suppose that only two entity types, song and singer, are considered, i.e., N = 2 in formula A. The dictionary tree finds three suspected entities in the text: "the current", "Cheng Long" and "Song of the Dragon", whose entity types are song, singer and song respectively. Assume that, after model training is complete, the learned β vector gives β_singer = 1.8 and β_song = 0.4, from which the influence factors α_singer ≈ 0.8 and α_song ≈ 0.2 can be calculated. Table 2 shows the entity vector of each character of the text to be recognized, as determined by the entity identification method of this embodiment.
TABLE 2
[Table 2: entity vector of each character (image not reproduced)]
In the above table, f_song and f_singer correspond to f_{i,j} in formula A, and t_song and t_singer correspond to t_j in formula A, where j ∈ {singer, song}. The entity_vector of each character is added to or concatenated with its character vector to obtain the semantic vector char_reproduction of the character. The semantic vectors of all characters are then input into a BiLSTM+CRF, BiGRU+CRF or Transformer+CRF model, which recognizes the entity "Cheng Long" in the text to be recognized.
TABLE 3
[Table 3: comparison with other entity identification methods (image not reproduced)]
In one example, the inventors used a BiLSTM+CRF model to compare the entity identification method of this embodiment with other existing entity identification methods. The comparison results are shown in Table 3. According to Table 3, the method of this embodiment achieves higher entity identification accuracy and is more suitable for engineering applications.
The foregoing is merely illustrative, and is not intended to limit the technical aspects of the present invention.
Compared with the prior art, the entity recognition method provided by the embodiment determines the semantic vector of each character in the text to be recognized based on the suspected entity of the text to be recognized, and enriches the semantic representation of the characters of the text to be recognized. The semantic vector of the character of the text to be recognized is input into the entity recognition model, so that the entity recognition model can acquire the information of the suspected entity of the text to be recognized, thereby reducing the difficulty of entity recognition and improving the accuracy of entity recognition.
The above methods are divided into steps only for clarity of description. In implementation, steps may be combined into one step or split into multiple steps; as long as they contain the same logical relationship, they fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs without changing its core design, also falls within the protection scope of this patent.
A third embodiment of the present invention relates to an entity recognition apparatus which, as shown in FIG. 4, includes the following modules. A first determining module 401 is configured to determine the suspected entities of the text to be recognized. A second determining module 402 is configured to determine the entity vector of each character of the text to be recognized, where the entity vector is determined according to the entity type of the suspected entity corresponding to the character. Module 402 is also configured to determine the semantic vector of the character according to its character vector and its entity vector. A recognition module 403 is configured to input the semantic vectors of all characters of the text to be recognized into the entity recognition model and determine the entities in the text.
It should be noted that this embodiment is the apparatus embodiment corresponding to the first embodiment and can be implemented in cooperation with it. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the first embodiment.
It should be noted that each module in this embodiment is a logical module. In practical applications, a logical unit may be one physical unit, part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the present invention, units that are less closely related to solving the technical problem addressed by the invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
A fourth embodiment of the present invention relates to an electronic device, as shown in fig. 5. It includes at least one processor 501 and a memory 502 communicatively coupled to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501; when executed, these instructions enable the at least one processor 501 to perform the entity identification method described in the above embodiments.
The electronic device includes one or more processors 501 and a memory 502; fig. 5 takes one processor 501 as an example. The processor 501 and the memory 502 may be connected by a bus or in other ways; fig. 5 takes a bus connection as an example. As a non-volatile computer-readable storage medium, the memory 502 can store non-volatile software programs, non-volatile computer-executable programs, and modules. For example, the character vector of each character in the embodiments of the present application is stored in the memory 502. The processor 501 runs the non-volatile software programs, instructions, and modules stored in the memory 502. In doing so, it executes the various functional applications and data processing of the device, that is, it implements the entity identification method described above.
The memory 502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function, and the data storage area may store an option list and the like. In addition, the memory 502 may include high-speed random access memory. It may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some implementations, the memory 502 may optionally include memory located remotely from the processor 501, and such remote memory may be connected to an external device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 502 that, when executed by the one or more processors 501, perform the entity identification method in any of the method embodiments described above.
The electronic device can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in this embodiment, refer to the entity identification method provided by the embodiments of the present application.
A fifth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium. The program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention and that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (8)

1. A method of entity identification, comprising:
determining suspected entities of a text to be recognized;
determining, for each character of the text to be recognized, an entity vector of the character, wherein the entity vector of the character is determined according to the entity type of the suspected entity corresponding to the character; and determining a semantic vector of the character according to the character vector of the character and the entity vector of the character;
inputting the semantic vectors of the characters of the text to be recognized into an entity recognition model, and determining the entities in the text to be recognized;
calculating the entity vector of the character according to formula A:
formula A: [reproduced only as an image (FDA0004214290010000011) in the original publication]
where $\mathrm{entity\_vector}_i$ represents the entity vector of the $i$-th character; $N$ represents the number of entity types of the text to be recognized, the number of entity types being determined based on the entity types of the respective suspected entities; $j$ denotes the $j$-th entity type; $f_{i,j}$ characterizes whether the $i$-th character belongs to the $j$-th entity type; $\alpha_j$ represents an influence factor of the $j$-th entity type; $t_j$ represents the vector corresponding to the $j$-th entity type; and $\varepsilon$ is a constant.
2. The entity identification method according to claim 1, wherein determining the entity vector of the character specifically comprises:
determining whether there is a suspected entity corresponding to the character;
if yes, determining the entity type of the corresponding suspected entity, and determining the entity vector of the character according to the vector corresponding to the entity type.
3. The entity identification method according to claim 2, wherein, if it is determined that there is no suspected entity corresponding to the character, the entity identification method further comprises:
determining the entity vector of the character to be a zero vector.
4. The entity identification method according to any one of claims 1 to 3, wherein determining the suspected entities of the text to be recognized specifically comprises:
acquiring a pre-stored entity library;
determining the suspected entities in the text to be recognized according to the entity library.
5. The entity identification method according to claim 4, wherein determining the suspected entities in the text to be recognized according to the entity library specifically comprises:
constructing a dictionary tree according to the entity library;
determining the suspected entities in the text to be recognized by using the dictionary tree.
6. The entity identification method according to claim 1, wherein determining the semantic vector of the character according to the character vector of the character and the entity vector of the character specifically comprises:
splicing the character vector of the character and the entity vector of the character to obtain the semantic vector of the character; or
adding the character vector of the character and the entity vector of the character to obtain the semantic vector of the character.
7. An electronic device, comprising: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the entity identification method of any one of claims 1 to 6.
8. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the entity identification method of any one of claims 1 to 6.
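The concrete steps named in claims 1, 5 and 6 can be illustrated with a minimal Python sketch. It is written under assumptions that the claims do not settle:
- The entity library maps each entity string to a single type.
- Every dictionary-tree match is kept as a suspected entity, including overlapping and nested matches.
- Suspected entities are represented as (start, end_exclusive, type) spans.
- $f_{i,j}$ (f[i][j] in the code) is a 0/1 indicator.

All class and function names are hypothetical. Formula A is not reproduced in this text, so the sketch stops at its inputs (the N entity types and the matrix f) and does not compute the entity vector itself. The last function shows the two alternatives of claim 6.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type); assumed layout


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entity_type: Optional[str] = None  # set where a library entry ends


def build_trie(library: Dict[str, str]) -> TrieNode:
    """Claim 5: build a dictionary tree from an entity library {entity: type}."""
    root = TrieNode()
    for word, etype in library.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entity_type = etype
    return root


def find_suspected_entities(text: str, root: TrieNode) -> List[Span]:
    """Claim 5: report every library entry found in text, overlaps included."""
    found: List[Span] = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            nxt = node.children.get(text[end])
            if nxt is None:
                break  # no library entry continues with this character
            node = nxt
            if node.entity_type is not None:
                found.append((start, end + 1, node.entity_type))
    return found


def type_membership(text_len: int, spans: List[Span]) -> Tuple[List[str], List[List[int]]]:
    """Claim 1 inputs: the N entity types of the suspected entities and the 0/1
    matrix f, where f[i][j] == 1 if character i lies in an entity of types[j]."""
    types: List[str] = []
    for _, _, etype in spans:
        if etype not in types:  # first-seen order
            types.append(etype)
    column = {t: j for j, t in enumerate(types)}
    f = [[0] * len(types) for _ in range(text_len)]
    for start, end, etype in spans:
        for i in range(start, end):
            f[i][column[etype]] = 1
    return types, f


def semantic_vector(char_vec: List[float], ent_vec: List[float], mode: str = "concat") -> List[float]:
    """Claim 6: splice (concatenate) or add the two vectors."""
    if mode == "concat":
        return char_vec + ent_vec  # length = len(char_vec) + len(ent_vec)
    if mode == "add":
        if len(char_vec) != len(ent_vec):
            raise ValueError("addition needs vectors of equal length")
        return [a + b for a, b in zip(char_vec, ent_vec)]
    raise ValueError(f"unknown mode: {mode!r}")


text = "我在北京大学"
spans = find_suspected_entities(text, build_trie({"北京": "LOC", "北京大学": "ORG"}))
print(spans)  # [(2, 4, 'LOC'), (2, 6, 'ORG')]
types, f = type_membership(len(text), spans)
print(types, f[2], f[5])  # ['LOC', 'ORG'] [1, 1] [0, 1]
print(semantic_vector([0.5, 0.25], [1.0, 0.0], mode="add"))  # [1.5, 0.25]
```

For each start position, the trie walk goes no deeper than the longest library entry, so lookup cost does not grow with the number of entries in the library. Splicing widens the model input by the length of the entity vector. Addition keeps the width unchanged but requires both vectors to have the same dimension.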
CN201911324636.0A 2019-12-20 2019-12-20 Entity identification method, electronic equipment and storage medium Active CN111079437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911324636.0A CN111079437B (en) 2019-12-20 2019-12-20 Entity identification method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911324636.0A CN111079437B (en) 2019-12-20 2019-12-20 Entity identification method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111079437A CN111079437A (en) 2020-04-28
CN111079437B true CN111079437B (en) 2023-07-07

Family

ID=70316185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911324636.0A Active CN111079437B (en) 2019-12-20 2019-12-20 Entity identification method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111079437B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668334B (en) * 2020-12-16 2024-02-13 科大讯飞股份有限公司 Entity identification method, electronic equipment and storage device
CN113011186B (en) * 2021-01-25 2024-04-26 腾讯科技(深圳)有限公司 Named entity recognition method, named entity recognition device, named entity recognition equipment and computer readable storage medium
CN114692644A (en) * 2022-03-11 2022-07-01 粤港澳大湾区数字经济研究院(福田) Text entity labeling method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107797992A (en) * 2017-11-10 2018-03-13 北京百分点信息科技有限公司 Name entity recognition method and device
CN108536679B (en) * 2018-04-13 2022-05-20 腾讯科技(成都)有限公司 Named entity recognition method, device, equipment and computer readable storage medium
US20190370398A1 (en) * 2018-06-01 2019-12-05 SayMosaic Inc. Method and apparatus for searching historical data
CN109858041B (en) * 2019-03-07 2023-02-17 北京百分点科技集团股份有限公司 Named entity recognition method combining semi-supervised learning with user-defined dictionary

Also Published As

Publication number Publication date
CN111079437A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN111079437B (en) Entity identification method, electronic equipment and storage medium
CN104142915B (en) A kind of method and system adding punctuate
CN105095204B (en) The acquisition methods and device of synonym
CN110826335B (en) Named entity identification method and device
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
CN112487190B (en) Method for extracting relationships between entities from text based on self-supervision and clustering technology
CN110825949A (en) Information retrieval method based on convolutional neural network and related equipment thereof
CN111414757B (en) Text recognition method and device
CN111061840A (en) Data identification method and device and computer readable storage medium
CN111460814A (en) Sensitive information detection method, device, terminal and medium
CN114154487A (en) Text automatic error correction method and device, electronic equipment and storage medium
CN111797217B (en) Information query method based on FAQ matching model and related equipment thereof
CN111914564B (en) Text keyword determination method and device
CN111325033B (en) Entity identification method, entity identification device, electronic equipment and computer readable storage medium
CN112085091A (en) Artificial intelligence-based short text matching method, device, equipment and storage medium
CN110390104B (en) Irregular text transcription method and system for voice dialogue platform
CN112560425B (en) Template generation method and device, electronic equipment and storage medium
CN113204613B (en) Address generation method, device, equipment and storage medium
CN112541357B (en) Entity identification method and device and intelligent equipment
CN111858860B (en) Search information processing method and system, server and computer readable medium
CN114049642A (en) Text recognition method and computing device for form certificate image piece
JP2018010482A (en) Document concept base generation device, document concept search device, method and program
CN111967248A (en) Pinyin identification method and device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210207

Address after: 200245 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Applicant after: Dalu Robot Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: CLOUDMINDS (SHENZHEN) ROBOTICS SYSTEMS Co.,Ltd.

CB02 Change of applicant information

Address after: 200245 Building 8, No. 207, Zhongqing Road, Minhang District, Shanghai

Applicant after: Dayu robot Co.,Ltd.

Address before: 200245 2nd floor, building 2, no.1508, Kunyang Road, Minhang District, Shanghai

Applicant before: Dalu Robot Co.,Ltd.

GR01 Patent grant