CN112668334B - Entity identification method, electronic equipment and storage device - Google Patents

Publication number
CN112668334B
Authority
CN
China
Prior art keywords
entity
node
path
semantic
text
Prior art date
Legal status
Active
Application number
CN202011487574.8A
Other languages
Chinese (zh)
Other versions
CN112668334A (en)
Inventor
汪强兵
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses an entity identification method, an electronic device and a storage device. The entity identification method comprises: acquiring a text to be identified and determining the knowledge domain to which the text to be identified belongs; and determining, by using an entity tree matched with the knowledge domain, a target entity that matches the real intention of the text to be identified. The entity tree comprises a plurality of node entities, a preset relationship exists between two connected node entities, and the target entity is one of the plurality of node entities. This scheme can improve the accuracy of entity identification.

Description

Entity identification method, electronic equipment and storage device
Technical Field
The present invention relates to the field of natural language processing, and in particular, to an entity identification method, an electronic device, and a storage device.
Background
With the rapid development of information technology, natural language understanding has been widely applied to services such as knowledge question answering, voice assistants and online customer service. Responding to the user's real intention by identifying the target entity in the text to be identified reduces labor costs and improves the user experience. For example, by identifying the target entity "rear air conditioner" in the text to be identified "how do I turn on the rear air conditioner", the machine can automatically offer several ways of turning on the rear air conditioner.
However, in actual man-machine interaction there are still situations in which the machine cannot correctly identify the target entity. For example, for the colloquial text to be identified "how do I turn on the air conditioner in the rear row", the machine can only identify the entity "air conditioner" and cannot accurately identify the target entity "rear air conditioner" indicated by the user's real intention. In view of this, how to improve the accuracy of entity identification is a problem to be solved.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide an entity identification method, an electronic device and a storage device that can improve the accuracy of entity identification.
In order to solve the above technical problem, a first aspect of the present application provides an entity identification method, including: acquiring a text to be identified and determining the knowledge domain to which the text to be identified belongs; and determining, by using an entity tree matched with the knowledge domain, a target entity that matches the real intention of the text to be identified. The entity tree comprises a plurality of node entities, a preset relationship exists between two connected node entities, and the target entity is one of the plurality of node entities.
In order to solve the above technical problem, a second aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, where the memory stores program instructions and the processor is configured to execute the program instructions to implement the entity identification method of the first aspect.
In order to solve the above technical problem, a third aspect of the present application provides a storage device storing program instructions executable by a processor, the program instructions being used to implement the entity identification method of the first aspect.
According to the above scheme, the text to be identified is acquired and the knowledge domain to which it belongs is determined, and the entity tree matched with the knowledge domain is then used to determine the target entity that matches the real intention of the text to be identified. The entity tree comprises a plurality of node entities, a preset relationship exists between two connected node entities, and the target entity is one of the plurality of node entities. Because the entity tree is matched with the knowledge domain, the relationships among the node entities related to that knowledge domain are taken into account when identifying the target entity, so the real intention of the text to be identified can be fully mined and the accuracy of entity identification is improved.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for entity identification in the present application;
FIG. 2 is a schematic diagram of a framework of one embodiment of an entity tree;
FIG. 3 is a flowchart illustrating an embodiment of step S12 in FIG. 1;
FIG. 4 is a flowchart illustrating the step S12 of FIG. 1 according to another embodiment;
FIG. 5 is a schematic diagram of a framework of another embodiment of an entity tree;
FIG. 6 is a flowchart of a further embodiment of step S12 in FIG. 1;
FIG. 7 is a flowchart of a further embodiment of step S12 in FIG. 1;
FIG. 8 is a schematic framework diagram of an embodiment of an electronic device of the present application;
FIG. 9 is a schematic framework diagram of an embodiment of a storage device of the present application.
Detailed Description
The following describes the embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects and covers three cases; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it. Further, "a plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a flow chart illustrating an embodiment of an entity identification method. Specifically, the method may include the steps of:
Step S11: Acquire the text to be identified, and determine the knowledge domain to which the text to be identified belongs.
In one implementation scenario, the text to be identified may be obtained directly from text typed by the user. Specifically, during man-machine interaction the user may enter text directly, and the entered text can be used as the text to be identified. For example, the user may type "how do I turn on the air conditioner in the rear row", and this text is used as the text to be identified. Other cases can be deduced by analogy and are not enumerated here.
In another implementation scenario, the text to be identified may also be obtained by recognizing voice data entered by the user. Specifically, the user may interact with the machine by voice, and speech recognition is performed on the collected voice data to obtain the text to be identified. For example, the user may say "how do I turn on the air conditioner in the rear row"; by collecting and recognizing the voice data, the text to be identified "how do I turn on the air conditioner in the rear row" is obtained. Other cases can be deduced by analogy and are not enumerated here.
In addition, the knowledge domain of the text to be identified may be set according to the actual application and is not limited herein. For example, the knowledge domain may be the automotive domain, in which case an automobile manufacturer can provide services such as question answering and consultation to vehicle owners; or the knowledge domain may be the elevator domain, in which case an elevator manufacturer can provide such services to owners, property managers and the like; or the knowledge domain may be the banking domain, in which case a bank can provide such services to depositors and the like; or the knowledge domain may be the tax domain, in which case a tax authority can provide such services to enterprises, employees and the like. Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, the knowledge domain to which the text to be identified belongs may be selected by the user who submits the text, so that the knowledge domain is determined from the user's selection. Specifically, during man-machine interaction and before the text to be identified is acquired, the user may be prompted to select the knowledge domain to be consulted, and the selected knowledge domain is used directly as the knowledge domain of the text to be identified that is acquired subsequently. For example, before consulting on car-related questions, the user may be prompted to select a knowledge domain from options that may include, but are not limited to, air conditioner, engine, wiper and central control. If the user selects "air conditioner", then "air conditioner" is used directly as the knowledge domain of the text to be identified that the user enters afterwards (e.g., "how do I turn on the air conditioner in the rear row"). Other cases can be deduced by analogy and are not enumerated here.
In another implementation scenario, the knowledge domain to which the text to be identified belongs may also be determined from keywords recognized in the text to be identified. Specifically, during man-machine interaction and after the text to be identified is acquired, the text may be further analyzed to obtain several keywords, and the knowledge domain to which these keywords belong is used as the knowledge domain of the text to be identified. Still taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, the keyword "air conditioner" can be recognized in the text; because the knowledge domain of the keyword "air conditioner" is "air conditioner", "air conditioner" can be used directly as the knowledge domain of the text to be identified. Other cases can be deduced by analogy and are not enumerated here.
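The keyword-based determination of the knowledge domain can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the present application: the lookup table `KEYWORD_TO_DOMAIN`, the function `determine_domain` and the majority vote among keywords are hypothetical, and the keywords are assumed to have been extracted already.

```python
from collections import Counter

# Hypothetical lookup table: keyword -> knowledge domain it belongs to.
KEYWORD_TO_DOMAIN = {
    "air conditioner": "air conditioner",
    "air outlet": "air conditioner",
    "engine": "engine",
    "wiper": "wiper",
}

def determine_domain(keywords):
    """Return the domain shared by most keywords, or None if none is known.

    The majority vote is an assumption made for this sketch.
    """
    votes = Counter(
        KEYWORD_TO_DOMAIN[k] for k in keywords if k in KEYWORD_TO_DOMAIN
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Keywords assumed to come from "how do I turn on the air conditioner in the rear row".
print(determine_domain(["rear row", "air conditioner"]))
```

Keywords that are not in the table, such as "rear row", simply do not vote.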
It should be noted that the text to be identified in the embodiments of the present disclosure is not limited to an interrogative sentence such as the aforementioned "how do I turn on the air conditioner in the rear row"; it may also be, without limitation, an imperative sentence, a declarative sentence and so on, for example "please turn on the air conditioner in the rear row". This may be set according to the actual application scenario and is not limited herein. For example, in a usage scenario such as a voice assistant, the text to be identified may be an interrogative sentence (e.g., "how do I turn on the air conditioner in the rear row"), an imperative sentence (e.g., "please turn on the air conditioner in the rear row"), or a declarative sentence (e.g., "the air outlet of the air conditioner in the front row keeps squeaking"); in a usage scenario such as knowledge question answering, the text to be identified may be an interrogative sentence (e.g., "why does the air outlet of the air conditioner in the front row keep squeaking").
Step S12: Determine, by using the entity tree matched with the knowledge domain, a target entity that matches the real intention of the text to be identified.
In the embodiments of the present disclosure, the entity tree includes a plurality of node entities, a preset relationship exists between two connected node entities, and the target entity is one of the plurality of node entities. Specifically, the preset relationship may include, but is not limited to, a positional relationship, a parent-child relationship, and the like.
Referring to fig. 2 in combination, fig. 2 is a schematic framework diagram of an embodiment of an entity tree. Taking the knowledge domain "air conditioner" as an example, the entity tree matched with "air conditioner" shown in fig. 2 may be obtained. This entity tree includes the node entities "air conditioner", "front air conditioner", "rear air conditioner", "air outlet", "front air outlet" and "rear air outlet". The node entity "air conditioner" is connected to the node entities "front air conditioner" and "rear air conditioner" by solid lines, indicating a positional relationship: as shown in fig. 2, the positional relationship between "air conditioner" and "front air conditioner" is "front row", and that between "air conditioner" and "rear air conditioner" is "rear row". Likewise, the node entity "air outlet" is connected to the node entities "front air outlet" and "rear air outlet" by solid lines, indicating a positional relationship: the positional relationship between "air outlet" and "front air outlet" is "front row", and that between "air outlet" and "rear air outlet" is "rear row". In addition, the node entity "air conditioner" is connected to the node entity "air outlet" by a dotted line, indicating a composition relationship, that is, the "air outlet" is a part of the "air conditioner". Other cases can be deduced by analogy and are not enumerated here.
In addition, as shown in fig. 2, the entity tree includes several layers, and a parent-child relationship exists between an upper-layer node entity and each lower-layer node entity connected to it. For example, the child nodes of the node entity "air conditioner" are "front air conditioner", "rear air conditioner" and "air outlet"; conversely, the parent node of the node entities "front air conditioner", "rear air conditioner" and "air outlet" is "air conditioner". Other node entities can be treated by analogy and are not enumerated here.
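The entity tree of fig. 2 can be written down as a small data structure. The sketch below is one possible encoding, assumed for illustration: each non-root node entity maps to its parent and to the preset relationship on the connecting edge. The names `EDGES`, `parent`, `relation` and `children`, and the label "part of" for the composition relationship, are hypothetical.

```python
# child node entity -> (parent node entity, preset relationship on the edge).
# The root "air conditioner" has no entry because it has no parent.
EDGES = {
    "front air conditioner": ("air conditioner", "front row"),
    "rear air conditioner": ("air conditioner", "rear row"),
    "air outlet": ("air conditioner", "part of"),  # composition (dotted line)
    "front air outlet": ("air outlet", "front row"),
    "rear air outlet": ("air outlet", "rear row"),
}

def parent(node):
    """Return the parent node entity, or None for the root."""
    return EDGES[node][0] if node in EDGES else None

def relation(node):
    """Return the preset relationship between a node and its parent."""
    return EDGES[node][1] if node in EDGES else None

def children(node):
    """Return the child node entities of a node, in insertion order."""
    return [child for child, (par, _) in EDGES.items() if par == node]

print(children("air conditioner"))
print(parent("rear air outlet"), relation("rear air outlet"))
```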
In one implementation scenario, several entities may be extracted from language materials related to the knowledge domain, and the entity tree matched with the knowledge domain may be constructed from these entities.
In one specific implementation scenario, language materials related to the knowledge domain may include, but are not limited to, encyclopedia entries, books and literature, newspapers and magazines, and the like. Taking the knowledge domain "air conditioner" as an example, language materials related to "air conditioner" may include the encyclopedia entry for "air conditioner" and professional books, papers, magazines and the like related to "air conditioner", which are not limited herein. Other knowledge domains can be treated by analogy and are not enumerated here.
In another specific implementation scenario, an NER (Named Entity Recognition) tool such as NLTK or Stanford NLP may be used to extract the entities from the language materials. The specific NER tool may be chosen according to actual application needs and is not limited herein.
In another implementation scenario, entity trees matched with a plurality of knowledge domains may also be constructed in advance, so that once the knowledge domain to which the text to be identified belongs has been determined, the entity tree matched with that knowledge domain can be selected directly. For example, entity trees matched with the knowledge domains "air conditioner", "engine", "central control" and the like may be constructed in advance, so that when the knowledge domain of the text to be identified "how do I turn on the air conditioner in the rear row" is determined to be "air conditioner", the entity tree matched with the knowledge domain "air conditioner" can be selected directly.
In one implementation scenario, the text to be identified includes at least one keyword related to the real intention. In this case, the target entity may be determined based on candidate paths in the entity tree that contain a keyword, where a candidate path includes a plurality of sequentially connected node entities and the preset relationships between adjacent node entities. In this way, the keywords related to the real intention are obtained from the text to be identified and the target entity is determined from the candidate paths containing those keywords, which makes the search for the target entity in the entity tree more fine-grained and improves the accuracy of the target entity.
In one specific implementation scenario, taking the aforementioned text to be identified "how do I turn on the air conditioner in the rear row" as an example, the real intention of the text is "how to turn on the rear air conditioner", that is, the text contains the keywords "rear row" and "air conditioner" related to the real intention. Taking the text to be identified "the air outlet of the air conditioner in the front row keeps making a squeaking abnormal noise" as another example, the real intention of the text is "abnormal noise from the front air outlet", that is, the text contains the keywords "front row", "air conditioner", "air outlet" and "abnormal noise" related to the real intention. Other cases can be deduced by analogy and are not enumerated here.
In another specific implementation scenario, referring to fig. 2 and still taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, the keywords include "rear row" and "air conditioner" as described above. The candidate paths in the entity tree of fig. 2 that contain a keyword are therefore the following 5 candidate paths: "air conditioner-rear row-rear air conditioner", "air outlet-rear row-rear air outlet", "air conditioner-rear row-rear air conditioner", "air conditioner-front row-front air conditioner" and "air conditioner-air outlet". Other cases can be deduced by analogy and are not enumerated here.
In another specific implementation scenario, the keywords in the text to be identified may be obtained by performing entity recognition on the text with an NER tool such as the aforementioned NLTK or Stanford NLP. The specific NER tool may be chosen according to actual application needs and is not limited herein.
In yet another specific implementation scenario, one of the candidate paths containing a keyword may be selected as the target path, and one node entity in the target path may be selected as the target entity. For details, reference may be made to the embodiments disclosed below, which are not repeated here.
In another implementation scenario, the target entity may also be determined by using the semantic relationships between the text to be identified and the plurality of node entities. The semantic relationships may include, but are not limited to, semantic similarity, semantic difference, and the like. Determining the target entity in this way avoids searching for candidate paths by keyword, which reduces the computing resources needed for that search and speeds up locating the target entity in the entity tree; and because the target entity is found on the basis of deep semantics, the accuracy of the search can also be improved.
In one specific implementation scenario, to make the semantic relationship easy to compute, the target entity may be determined by calculating the semantic similarity between the text to be identified and each of the plurality of node entities. For example, the cosine similarity between the semantic representation of the text to be identified and the semantic representation of each node entity may be calculated, which is not limited herein. For details, reference may be made to the embodiments disclosed below, which are not repeated here.
In another specific implementation scenario, referring to fig. 2 and still taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, the semantic relationship between the text to be identified and each of the node entities "air conditioner", "front air conditioner", "rear air conditioner", "air outlet", "front air outlet" and "rear air outlet" in fig. 2 may be calculated, and one node entity is then selected on the basis of these semantic relationships as the target entity matching the real intention of the text to be identified. For example, based on semantic similarity, the node entity "rear air conditioner" may be selected as the target entity of the text to be identified "how do I turn on the air conditioner in the rear row". Other cases can be deduced by analogy and are not enumerated here.
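The similarity-based selection can be illustrated with a toy example. The text does not specify how the semantic representations are obtained, so the sketch below substitutes a plain bag-of-words count vector for a real semantic encoder; that substitution, and the names `embed`, `cosine` and `most_similar_entity`, are assumptions made only so that the cosine-similarity step is runnable.

```python
import math
from collections import Counter

NODE_ENTITIES = [
    "air conditioner", "front air conditioner", "rear air conditioner",
    "air outlet", "front air outlet", "rear air outlet",
]

def embed(text):
    """Toy stand-in for a semantic representation: word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def most_similar_entity(text, entities):
    """Return the node entity whose representation is closest to the text."""
    text_vec = embed(text)
    return max(entities, key=lambda e: cosine(text_vec, embed(e)))

query = "how do I turn on the air conditioner in the rear row"
print(most_similar_entity(query, NODE_ENTITIES))
```

With this toy representation the node entity "rear air conditioner" shares the most words with the query and therefore scores highest; a real system would replace `embed` with a learned sentence or entity encoder.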
In one implementation scenario, to make it easier to understand the real intention of the text to be identified afterwards, the target entity may be mapped to the corresponding entity in a knowledge base after it has been identified. Still taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, after the target entity "rear air conditioner" is identified, it may be mapped to the corresponding entity "rear air conditioner" in a knowledge base (e.g., a knowledge graph). The knowledge base may further contain an explanation of that entity, for example: "rear air conditioner" means that the rear seats, like the front seats, can have their own temperature set and their air volume adjusted. Other cases can be deduced by analogy and are not enumerated here.
According to the above scheme, the text to be identified is acquired and the knowledge domain to which it belongs is determined, and the entity tree matched with the knowledge domain is then used to determine the target entity that matches the real intention of the text to be identified. The entity tree comprises a plurality of node entities, a preset relationship exists between two connected node entities, and the target entity is one of the plurality of node entities. Because the entity tree is matched with the knowledge domain, the relationships among the node entities related to that knowledge domain are taken into account when identifying the target entity, so the real intention of the text to be identified can be fully mined and the accuracy of entity identification is improved.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of step S12 in fig. 1. The method specifically comprises the following steps:
Step S31: Search the entity tree for candidate paths that contain any keyword.
In one implementation scenario, as described in the foregoing embodiment, the keywords in the text to be identified may be obtained by performing entity recognition on the text with an NER tool such as the aforementioned NLTK or Stanford NLP. The specific NER tool may be chosen according to actual application needs and is not limited herein.
In one implementation scenario, each node entity and each connecting edge in the entity tree may be searched; a path in the entity tree in which a node entity or a connecting edge is a keyword is a candidate path. As described in the foregoing embodiment, still taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, the keywords include "rear row" and "air conditioner". On this basis, the candidate paths in the entity tree of fig. 2 that contain a keyword may be the following 5 candidate paths: "air conditioner-rear row-rear air conditioner", "air outlet-rear row-rear air outlet", "air conditioner-rear row-rear air conditioner", "air conditioner-front row-front air conditioner" and "air conditioner-air outlet". Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, to make it easier to compute the path depth of each candidate path afterwards, the last node entity of a candidate path is a leaf node of the entity tree, a leaf node being a node of the entity tree that has no child node. Further, the first node entity of a candidate path may be the root node of the entity tree, the root node being the node of the entity tree that has no parent node.
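The search of step S31 can be sketched as follows for the variant in which every candidate path runs from the root node to a leaf node. This is an illustration under assumptions, not the method of the present application: the encoding `EDGES`, the function names, the "part of" label for the composition relationship and the exact-name matching between keywords and node entities or edges are all hypothetical. Because only root-to-leaf paths are enumerated, the result need not coincide with the 5 candidate paths listed in the example above.

```python
# child node entity -> (parent node entity, preset relationship on the edge)
EDGES = {
    "front air conditioner": ("air conditioner", "front row"),
    "rear air conditioner": ("air conditioner", "rear row"),
    "air outlet": ("air conditioner", "part of"),
    "front air outlet": ("air outlet", "front row"),
    "rear air outlet": ("air outlet", "rear row"),
}

def root_to_leaf_paths():
    """Return every root-to-leaf path as [node, relation, node, ...]."""
    parents = {par for par, _ in EDGES.values()}
    leaves = [node for node in EDGES if node not in parents]
    paths = []
    for leaf in leaves:
        path, node = [leaf], leaf
        while node in EDGES:  # climb towards the root
            par, rel = EDGES[node]
            path = [par, rel] + path
            node = par
        paths.append(path)
    return paths

def candidate_paths(keywords):
    """Keep paths in which a node entity or an edge equals any keyword."""
    return [p for p in root_to_leaf_paths() if any(k in p for k in keywords)]

for path in candidate_paths(["rear row"]):
    print("-".join(path))
```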
Step S32: Based on the path depth of each candidate path, select one node entity from one of the candidate paths as the target entity.
It should be noted that the path depth is the number of layers of the entity tree spanned from the last node entity of the candidate path up to the root node of the entity tree. Referring to fig. 2 in combination and taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, the candidate path "air conditioner-rear row-rear air conditioner", in each of its two occurrences among the candidate paths, spans 2 layers of the entity tree, so its path depth is 2; the candidate path "air outlet-rear row-rear air outlet" spans 3 layers of the entity tree, so its path depth is 3. Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, the candidate path with the greatest path depth may be selected as the target path, and the last node entity of the target path may be used as the target entity. Referring to fig. 2 in combination and taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, among the 5 candidate paths found, the candidate path "air outlet-rear row-rear air outlet" has the greatest path depth, so this candidate path can be used as the target path and its last node entity "rear air outlet" can be used as the target entity. Other cases can be deduced by analogy and are not enumerated here.
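The path depth and the greatest-depth selection can be sketched as below. The sketch assumes the tree of fig. 2 encoded as a child-to-parent table and represents each candidate path by its last node entity; `PARENT`, `path_depth` and `pick_deepest` are hypothetical names.

```python
# child node entity -> parent node entity (tree of fig. 2, assumed encoding)
PARENT = {
    "front air conditioner": "air conditioner",
    "rear air conditioner": "air conditioner",
    "air outlet": "air conditioner",
    "front air outlet": "air outlet",
    "rear air outlet": "air outlet",
}

def path_depth(last_node):
    """Number of layers from the last node entity up to and including the root."""
    depth = 1
    while last_node in PARENT:
        last_node = PARENT[last_node]
        depth += 1
    return depth

def pick_deepest(last_nodes):
    """Return the last node entity of the candidate path with greatest depth."""
    return max(last_nodes, key=path_depth)

print(path_depth("rear air conditioner"))  # 2
print(path_depth("rear air outlet"))       # 3
print(pick_deepest(["rear air conditioner", "rear air outlet", "front air conditioner"]))
```

As in the example of the text, selecting purely by depth returns "rear air outlet" here, which shows why depth alone can pick a node entity that does not match the real intention.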
In another implementation scenario, to reduce interference as far as possible, candidate paths that do not contain all of the keywords may be removed before the target path is determined from the path depths. The target path is then selected from the remaining candidate paths on the basis of path depth, and the last node entity of the target path is used as the target entity. Removing the candidate paths that do not contain all of the keywords eliminates the interference that candidate paths only weakly related to the real intention of the text to be identified would otherwise cause in the subsequent search, which improves both the efficiency and the accuracy of finding the target entity.
In one specific implementation scenario, referring to fig. 2 and still taking the text to be identified "how do I turn on the air conditioner in the rear row" as an example, the aforementioned 5 candidate paths are: candidate path a "air conditioner-rear row-rear air conditioner", candidate path b "air outlet-rear row-rear air outlet", candidate path c "air conditioner-rear row-rear air conditioner", candidate path d "air conditioner-front row-front air conditioner" and candidate path e "air conditioner-air outlet". Only candidate path a and candidate path c contain all of the keywords, namely the keyword "rear row" and the keyword "air conditioner", so only candidate path a and candidate path c remain after the removal. Further, because the two are identical, their last node entity, namely the node entity "rear air conditioner", can be used directly as the target entity. Other cases can be deduced by analogy and are not enumerated here.
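The removal of candidate paths that lack some keyword, followed by depth-based selection, can be sketched as follows. The 5 candidate paths are copied from the example above; the list encoding of a path, exact-name matching of keywords, and the names `CANDIDATES`, `PARENT`, `depth` and `select_target` are assumptions made for this sketch.

```python
# child node entity -> parent node entity (tree of fig. 2, assumed encoding)
PARENT = {
    "front air conditioner": "air conditioner",
    "rear air conditioner": "air conditioner",
    "air outlet": "air conditioner",
    "front air outlet": "air outlet",
    "rear air outlet": "air outlet",
}

# Candidate paths a to e from the example, as [node, relation, node, ...].
CANDIDATES = [
    ["air conditioner", "rear row", "rear air conditioner"],    # a
    ["air outlet", "rear row", "rear air outlet"],              # b
    ["air conditioner", "rear row", "rear air conditioner"],    # c (same as a)
    ["air conditioner", "front row", "front air conditioner"],  # d
    ["air conditioner", "air outlet"],                          # e (as listed)
]

def depth(node):
    """Number of layers from a node entity up to and including the root."""
    layers = 1
    while node in PARENT:
        node = PARENT[node]
        layers += 1
    return layers

def select_target(paths, keywords):
    """Drop paths missing any keyword, then take the deepest path's last node."""
    kept = [p for p in paths if all(k in p for k in keywords)]
    if not kept:
        return None
    target_path = max(kept, key=lambda p: depth(p[-1]))
    return target_path[-1]

print(select_target(CANDIDATES, ["rear row", "air conditioner"]))
```

Only paths a and c survive the filter, so the sketch returns "rear air conditioner", in line with the example.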
Unlike the foregoing embodiment, this embodiment searches the entity tree for candidate paths that contain any keyword and then, based on the path depth of each candidate path, selects one node entity from one of the candidate paths as the target entity. Searching on the basis of candidate paths that contain any keyword makes the search more fine-grained, and selecting the target entity by the path depth of the candidate paths improves the accuracy of the target entity.
Referring to fig. 4, fig. 4 is a flowchart illustrating another embodiment of step S12 in fig. 1. The method specifically comprises the following steps:
Step S41: Remove the keywords that do not share a name with any node entity.
Referring to fig. 5 in combination, fig. 5 is a schematic framework diagram of another embodiment of an entity tree. As shown in fig. 5, take the text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example. The NER tool can recognize the keywords "front row", "air conditioner", "air outlet" and "abnormal sound" in the text to be identified. If candidate paths containing any of these keywords were searched for directly in the entity tree shown in fig. 5, the search would return not only candidate paths containing at least one of the preset relations "front row" and "abnormal sound", but also candidate paths containing at least one of the node entities "air conditioner" and "air outlet". A large number of candidate paths would be found, which may lead to candidate path explosion. Therefore, the keywords are screened before the candidate paths are searched, so as to remove keywords that contribute nothing to finding the target entity; this reduces the number of candidate paths and greatly alleviates the candidate path explosion problem.
In the embodiment of the disclosure, considering that a keyword sharing a name with a preset relation in the entity tree is necessarily attached to a node entity, the keywords that do not share a name with any node entity are removed directly, so as to eliminate their interference with the subsequent search for candidate paths. Taking the aforementioned text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, the keywords "front row" and "abnormal sound" can be removed, leaving only the keywords "air conditioner" and "air outlet". Other cases can be handled by analogy and are not enumerated here.
Step S42: Search the entity tree for candidate paths containing any of the keywords.
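A minimal sketch of this keyword screening is given below. The tree is a hypothetical reconstruction based on the description of fig. 5, stored as a mapping from each parent node entity to its (preset relation, child node entity) pairs; the relation between "air conditioner" and "air outlet" is not named in the description, so it is represented as `None`. The function names are illustrative only.

```python
# Hypothetical entity tree: parent node entity -> list of (preset relation, child).
# None marks an edge whose relation name is not given in the description.
ENTITY_TREE = {
    "air conditioner": [
        ("front row", "front air conditioner"),
        ("rear row", "rear air conditioner"),
        (None, "air outlet"),
    ],
    "air outlet": [
        ("front row", "front air outlet"),
        ("rear row", "rear air outlet"),
    ],
    "front air outlet": [
        ("abnormal sound", "front air outlet abnormal sound"),
    ],
}


def node_entity_names(tree):
    """Collect the names of all node entities (parents and children)."""
    names = set(tree)
    for edges in tree.values():
        names.update(child for _relation, child in edges)
    return names


def cull_keywords(keywords, tree):
    """Keep only the keywords that share a name with some node entity."""
    names = node_entity_names(tree)
    return [keyword for keyword in keywords if keyword in names]


keywords = ["front row", "air conditioner", "air outlet", "abnormal sound"]
remaining_keywords = cull_keywords(keywords, ENTITY_TREE)
print(remaining_keywords)
```

Keywords such as "front row" and "abnormal sound" appear in this tree only as relation names, so they are dropped and only the node-entity keywords remain.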
In the embodiment of the disclosure, the last node entity of the candidate path is a leaf node of the entity tree. Reference may be made specifically to the relevant steps in the foregoing disclosed embodiments, which are not described herein.
It should be noted that "any of the keywords" here refers to the keywords remaining after the removal. That is, in the embodiment of the disclosure, candidate paths containing any of the remaining keywords are searched for in the entity tree. Taking the aforementioned text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, the candidate paths containing either of the remaining keywords "air conditioner" and "air outlet" are searched for in the entity tree, which yields the following candidate paths: candidate path a "air conditioner-front row air conditioner", candidate path b "air conditioner-rear row air conditioner", candidate path c "air conditioner-air outlet", candidate path d "air conditioner-air outlet-front row-front air outlet-abnormal sound-front air outlet abnormal sound", candidate path e "air conditioner-air outlet-rear row-rear air outlet", and candidate path f "air outlet-front row-front air outlet-abnormal sound-front air outlet abnormal sound". Other cases can be handled by analogy and are not enumerated here.
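One possible way to enumerate such candidate paths is sketched below. It assumes that a candidate path starts at a node entity named by a keyword and runs downward to a leaf node, and it reuses a hypothetical reconstruction of the tree of fig. 5 (an unnamed relation is stored as `None` and omitted from the path). The exact set of paths it prints is a consequence of these assumptions and need not coincide with the list given above.

```python
# Hypothetical entity tree: parent node entity -> list of (preset relation, child).
ENTITY_TREE = {
    "air conditioner": [
        ("front row", "front air conditioner"),
        ("rear row", "rear air conditioner"),
        (None, "air outlet"),  # relation name not given in the description
    ],
    "air outlet": [
        ("front row", "front air outlet"),
        ("rear row", "rear air outlet"),
    ],
    "front air outlet": [
        ("abnormal sound", "front air outlet abnormal sound"),
    ],
}


def node_entity_names(tree):
    """Collect the names of all node entities (parents and children)."""
    names = set(tree)
    for edges in tree.values():
        names.update(child for _relation, child in edges)
    return names


def paths_to_leaves(node, tree):
    """Return every downward path from `node` to a leaf node."""
    edges = tree.get(node, [])
    if not edges:  # a node without children is a leaf node
        return [[node]]
    paths = []
    for relation, child in edges:
        step = [node] if relation is None else [node, relation]
        for tail in paths_to_leaves(child, tree):
            paths.append(step + tail)
    return paths


def search_candidate_paths(keywords, tree):
    """Collect the paths that start at any node entity named by a keyword."""
    names = node_entity_names(tree)
    candidates = []
    for keyword in keywords:
        if keyword in names:
            candidates.extend(paths_to_leaves(keyword, tree))
    return candidates


candidate_paths = search_candidate_paths(["air conditioner", "air outlet"], ENTITY_TREE)
for path in candidate_paths:
    print("-".join(path))
```

Starting the enumeration only at keyword-named nodes is what keeps the number of candidate paths bounded by the number of remaining keywords times the number of leaf nodes.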
Step S43: based on the path depth of each candidate path, one node entity is selected from one of the candidate paths as a target entity.
In one implementation scenario, as described in the foregoing disclosure embodiments, the candidate path with the greatest path depth may be selected as the target path, and the last node entity of the target path may be used as the target entity. Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
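Selecting the target entity by path depth can be sketched as follows. The sketch assumes that the path depth of a candidate path is the depth in the entity tree of its last node entity, counting the root as depth 1; this convention is an assumption chosen so that both a path starting at the root and a path starting lower in the tree reach depth 4 for "front air outlet abnormal sound". The `PARENT` mapping and function names are hypothetical.

```python
# Hypothetical child -> parent mapping reconstructed from the description of fig. 5.
PARENT = {
    "front air conditioner": "air conditioner",
    "rear air conditioner": "air conditioner",
    "air outlet": "air conditioner",
    "front air outlet": "air outlet",
    "rear air outlet": "air outlet",
    "front air outlet abnormal sound": "front air outlet",
}


def node_depth(node):
    """Depth of a node entity, counting the root as depth 1 (assumed convention)."""
    depth = 1
    while node in PARENT:
        node = PARENT[node]
        depth += 1
    return depth


def select_target_entity(candidate_paths):
    """Pick the deepest candidate path and return its last node entity."""
    target_path = max(candidate_paths, key=lambda path: node_depth(path[-1]))
    return target_path[-1]


candidate_paths = [
    ["air conditioner", "front row", "front air conditioner"],
    ["air conditioner", "air outlet", "rear row", "rear air outlet"],
    ["air outlet", "front row", "front air outlet",
     "abnormal sound", "front air outlet abnormal sound"],
]
target_entity = select_target_entity(candidate_paths)
print(target_entity)
```

When several candidate paths share the greatest depth, `max` returns the first of them; in the example discussed in this embodiment the tied paths end at the same node entity, so the choice does not affect the result.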
In another implementation scenario, as in the foregoing disclosed embodiment, to reduce interference as much as possible, candidate paths that do not contain all of the keywords may be removed before the target path is determined based on path depth, so that the target path is selected from the remaining candidate paths based on path depth and the last node entity of the target path is used as the target entity. It should be noted that "all of the keywords" here includes both the keywords that have been removed and the keywords that have not, that is, every keyword recognized in the text to be identified. Taking the aforementioned text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, all of the keywords are "front row", "air conditioner", "air outlet" and "abnormal sound", so only candidate path d and candidate path f contain all of the keywords. On this basis, the path depth of candidate path d is 4 and the path depth of candidate path f is 4, so their last node entity, "front air outlet abnormal sound", is taken as the target entity.
Different from the foregoing embodiment, the keywords that do not share a name with any node entity are removed first, candidate paths containing any of the remaining keywords are then searched for in the entity tree, and one node entity is selected from one of the candidate paths as the target entity based on the path depth of each candidate path. Because keywords that contribute nothing to finding the target entity are removed before the candidate paths are searched, the number of candidate paths is reduced and the candidate path explosion problem is greatly alleviated, which lowers the resource load of searching for the target entity and improves the search speed.
Referring to fig. 6, fig. 6 is a flowchart of another embodiment of step S12 in fig. 1. In the embodiment of the present disclosure, the preset relations in the entity tree include a parent-child relation, and the inclusion range of a parent node entity is larger than the inclusion range of its child node entity. For example, the inclusion range of the root node "air conditioner" in fig. 5 is larger than that of every other node entity; other cases can be handled by analogy and are not enumerated here. Specifically, embodiments of the present disclosure may include the following steps:
Step S61: Determine the inclusion range of each of the at least one keyword using the entity tree.
As described above, the inclusion range of a parent node entity is larger than that of its child node entity, so the size relationship between the inclusion ranges of the keywords can be determined on this basis. In addition, some keywords may share a name with a preset relation in the entity tree rather than with a node entity; such keywords may be removed first, and the inclusion ranges of the remaining keywords are then determined. Taking the text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, the NER tool can recognize the keywords "front row", "air conditioner", "air outlet" and "abnormal sound". Referring to the entity tree shown in fig. 5, the keywords "front row" and "abnormal sound" can be determined to share names with preset relations in the entity tree, so these two keywords can be removed first. Further, based on the entity tree shown in fig. 5, it can be determined that the inclusion range of the keyword "air conditioner" is larger than that of the keyword "air outlet". Other cases can be handled by analogy and are not enumerated here.
Step S62: Retain the keyword with the smallest inclusion range, and remove the keywords that are not retained.
Taking the text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, the keyword "air outlet", which has the smallest inclusion range, can be retained. Other cases can be handled by analogy and are not enumerated here.
Step S63: Search the entity tree for candidate paths containing any of the keywords.
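Steps S61 and S62 can be illustrated with the sketch below. The description does not state how an inclusion range is measured, so the sketch assumes, purely for illustration, that it can be approximated by the number of node entities in the subtree rooted at the keyword's node; a parent's subtree always contains its child's subtree, which is consistent with the parent having the larger inclusion range. The `CHILDREN` mapping is a hypothetical reconstruction of fig. 5.

```python
# Hypothetical parent -> children mapping reconstructed from the description of fig. 5.
CHILDREN = {
    "air conditioner": ["front air conditioner", "rear air conditioner", "air outlet"],
    "air outlet": ["front air outlet", "rear air outlet"],
    "front air outlet": ["front air outlet abnormal sound"],
}


def is_node_entity(name):
    """True if `name` is a node entity of the tree (as a parent or as a child)."""
    return name in CHILDREN or any(name in kids for kids in CHILDREN.values())


def subtree_size(node):
    """Number of node entities in the subtree rooted at `node` (assumed range proxy)."""
    return 1 + sum(subtree_size(child) for child in CHILDREN.get(node, []))


def narrowest_keyword(keywords):
    """Drop relation-named keywords, then keep the keyword with the smallest range."""
    entity_keywords = [k for k in keywords if is_node_entity(k)]
    if not entity_keywords:
        return None
    return min(entity_keywords, key=subtree_size)


keywords = ["front row", "air conditioner", "air outlet", "abnormal sound"]
retained = narrowest_keyword(keywords)
print(retained)
```

With this proxy, "air conditioner" covers the whole tree while "air outlet" covers only its own branch, so "air outlet" is the keyword that is retained.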
In the embodiment of the disclosure, the last node entity of the candidate path is a leaf node of the entity tree. Reference may be made specifically to the relevant steps in the foregoing disclosed embodiments, which are not described herein.
It should be noted that "any of the keywords" here refers to the keywords remaining after the removal. That is, in the embodiment of the disclosure, candidate paths containing any of the remaining keywords are searched for in the entity tree. Taking the text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, the candidate paths containing the remaining keyword "air outlet" can be searched for in the entity tree, which yields the following candidate paths: candidate path a "air conditioner-air outlet-front row-front air outlet-abnormal sound-front air outlet abnormal sound", candidate path b "air outlet-front row-front air outlet-abnormal sound-front air outlet abnormal sound", and candidate path c "air conditioner-air outlet-rear row-rear air outlet". Other cases can be handled by analogy and are not enumerated here. The number of candidate paths found is thus further reduced, which helps improve the speed of searching for the target entity.
Step S64: based on the path depth of each candidate path, one node entity is selected from one of the candidate paths as a target entity.
In one implementation scenario, as described in the foregoing disclosure embodiments, the candidate path with the greatest path depth may be selected as the target path, and the last node entity of the target path may be used as the target entity. Reference may be made specifically to the relevant descriptions in the foregoing disclosed embodiments, and details are not repeated here.
In another implementation scenario, as in the foregoing disclosed embodiment, to reduce interference as much as possible, candidate paths that do not contain all of the keywords may be removed before the target path is determined based on path depth, so that the target path is selected from the remaining candidate paths based on path depth and the last node entity of the target path is used as the target entity. It should be noted that "all of the keywords" here includes both the keywords that have been removed and the keywords that have not, that is, every keyword recognized in the text to be identified. Taking the aforementioned text to be identified "the front-row air conditioner air outlet keeps making a squeaking abnormal sound" as an example, all of the keywords are "front row", "air conditioner", "air outlet" and "abnormal sound", so only candidate path a and candidate path b contain all of the keywords. On this basis, the path depth of candidate path a is 4 and the path depth of candidate path b is 4, so their last node entity, "front air outlet abnormal sound", is taken as the target entity.
Different from the foregoing embodiment, the entity tree is used to determine the inclusion range of each of the at least one keyword, the keyword with the smallest inclusion range is retained and the others are removed, candidate paths containing any of the remaining keywords are then searched for in the entity tree, and one node entity is selected from one of the candidate paths as the target entity based on the path depth of each candidate path. Using the inclusion range takes the relations between node entities further into account while the target entity is searched for, which helps further reduce the number of candidate paths, lower the resource load of searching for the target entity, and improve the search speed.
Referring to fig. 7, fig. 7 is a flowchart of another embodiment of step S12 in fig. 1. The method specifically comprises the following steps:
Step S71: Obtain a first semantic representation of each leaf node of the entity tree, and obtain a second semantic representation of the text to be identified.
In one implementation scenario, as described in the foregoing disclosed embodiments, a leaf node of the entity tree is a node that has no child nodes. Referring to fig. 5, the node entities "front air conditioner", "rear air outlet" and "front air outlet abnormal sound" are all leaf nodes of the entity tree shown in fig. 5. Other cases can be handled by analogy and are not enumerated here.
In one implementation scenario, to improve the efficiency and accuracy of semantic extraction, a node semantic extraction network may be used to perform semantic extraction on a leaf node to obtain a first node semantic representation, and a path semantic extraction network may be used to perform semantic extraction on the entity path of the leaf node to obtain a first path semantic representation, where the entity path of the leaf node includes the node entities passed through from the leaf node to the root node of the entity tree and the preset relations between those node entities. The first node semantic representation and the first path semantic representation are then fused to obtain the first semantic representation. In this way, the first semantic representation contains not only the semantic information of the leaf node but also the semantic information of its entity path, which helps improve the accuracy of the first semantic representation.
In a specific implementation scenario, the node semantic extraction network and the path semantic extraction network may include, but are not limited to, BERT (Bidirectional Encoder Representations from Transformers), ELMo, GPT and the like, which is not limited here.
In another specific implementation scenario, as described in the foregoing disclosed embodiments, the root node is the node in the entity tree that has no parent node; referring to fig. 5, "air conditioner" is the root node of the entity tree shown in fig. 5. For the leaf node "front air outlet abnormal sound", the entity path is "air conditioner-air outlet-front row-front air outlet-abnormal sound-front air outlet abnormal sound"; for the leaf node "rear air outlet", the entity path is "air conditioner-air outlet-rear row-rear air outlet"; for the leaf node "rear air conditioner", the entity path is "air conditioner-rear row-rear air conditioner"; and for the leaf node "front air conditioner", the entity path is "air conditioner-front row-front air conditioner". Other cases can be handled by analogy and are not enumerated here.
In yet another specific implementation scenario, the first node semantic representation and the first path semantic representation may be added to obtain a first semantic representation. For example, if the first node semantic representation and the first path semantic representation are both 128-dimensional vectors, the corresponding position elements of the first node semantic representation and the first path semantic representation may be added to obtain the 128-dimensional first semantic representation, and other cases may be similar, which are not illustrated here. For convenience of description, the above node semantic extraction network may be denoted as Entity, the path semantic extraction network is denoted as Edge, the leaf node is denoted as node, and the Entity path of the leaf node is denoted as path, and the first semantic representation node_vec of the leaf node may be represented as:
$$\text{node\_vec} = \text{Entity}(\text{node}) + \text{Edge}(\text{path}) \tag{1}$$
In another implementation scenario, to improve the efficiency and accuracy of semantic extraction, the node semantic extraction network may be used to perform semantic extraction on the text to be identified to obtain a second node semantic representation, and the path semantic extraction network may be used to perform semantic extraction on the text to be identified to obtain a second path semantic representation; the second node semantic representation and the second path semantic representation are then fused to obtain the second semantic representation. In this way, the semantic information of the text to be identified is extracted at both the node level and the path level, which helps improve the accuracy of the second semantic representation.
In a specific implementation scenario, the second node semantic representation and the second path semantic representation may be added to obtain a second semantic representation. For example, if the second node semantic representation and the second path semantic representation are both 128-dimensional vectors, the corresponding position elements of the two may be added to obtain the 128-dimensional second semantic representation, and the other cases may be similar, which is not exemplified here. For convenience of description, the node semantic extraction network may be denoted as Entity, the path semantic extraction network is denoted as Edge, the text to be identified is denoted as query, and the second semantic representation query_vec of the text to be identified may be expressed as:
$$\text{query\_vec} = \text{Entity}(\text{query}) + \text{Edge}(\text{query}) \tag{2}$$
Step S72: semantic similarity between the second semantic representation and the first semantic representation of each leaf node is obtained respectively.
In one implementation scenario, the semantic similarity between the second semantic representation and the first semantic representation of each leaf node may be calculated by cosine similarity.
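For reference, cosine similarity between two vectors can be computed as in the following sketch. The vectors and leaf names are made-up illustrative values, not outputs of the semantic extraction networks, and returning 0.0 for a zero vector is a convention chosen here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Illustrative values only: a second semantic representation and the first
# semantic representations of two hypothetical leaf nodes.
query_vec = [0.2, 0.7, 0.1]
leaf_vecs = {"leaf A": [0.1, 0.8, 0.1], "leaf B": [0.9, 0.0, 0.1]}

scores = {name: cosine_similarity(query_vec, vec) for name, vec in leaf_vecs.items()}
best_leaf = max(scores, key=scores.get)
print(best_leaf)
```

Unlike a plain product of the two representations, cosine similarity is insensitive to vector length, so a leaf node is not favored merely because its representation has a large norm.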
In another implementation scenario, the second semantic representation may also be multiplied by the first semantic representation of each leaf node to obtain the semantic similarity between the second semantic representation and the first semantic representation of the corresponding leaf node. For ease of description, assume that the entity tree has N leaf nodes; the i-th leaf node may be denoted node_i, where 1 ≤ i ≤ N, and its first semantic representation may be denoted node_vec_i. The semantic similarity similarity(node_i, query) between the second semantic representation query_vec and the first semantic representation node_vec_i of the i-th leaf node can then be expressed as:
$$\text{similarity}(\text{node}_i, \text{query}) = \text{node\_vec}_i \cdot \text{query\_vec} \tag{3}$$
Step S73: Take the leaf node corresponding to the highest semantic similarity as the target entity.
After the semantic similarity between the second semantic representation of the text to be recognized and the first semantic representation of each leaf node is obtained, the leaf node corresponding to the highest semantic similarity can be used as the target entity.
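Steps S71 to S73 can be put together in the toy sketch below. The two encoders are deliberately simple bag-of-words stand-ins, not BERT or any trained network, and exist only so that formulas (1) to (3) can be executed end to end; the vocabulary, the 0.5 scaling, the leaf paths and all function names are assumptions made for illustration, and the product in formula (3) is read as an inner product.

```python
VOCAB = ["air", "conditioner", "outlet", "front", "rear", "abnormal", "sound"]


def bag_of_words(text):
    """Count how often each vocabulary word occurs in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def entity_encoder(text):
    """Toy stand-in for the node semantic extraction network (Entity)."""
    return bag_of_words(text)


def edge_encoder(text):
    """Toy stand-in for the path semantic extraction network (Edge)."""
    return [0.5 * x for x in bag_of_words(text)]


def fuse(u, v):
    """Fuse two representations by element-wise addition."""
    return [a + b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical leaf nodes and their entity paths, written as plain text.
LEAF_PATHS = {
    "front air conditioner": "air conditioner front row front air conditioner",
    "rear air conditioner": "air conditioner rear row rear air conditioner",
    "rear air outlet": "air conditioner air outlet rear row rear air outlet",
    "front air outlet abnormal sound":
        "air conditioner air outlet front row front air outlet "
        "abnormal sound front air outlet abnormal sound",
}


def identify_target_entity(query):
    query_vec = fuse(entity_encoder(query), edge_encoder(query))   # formula (2)
    scores = {}
    for node, path in LEAF_PATHS.items():
        node_vec = fuse(entity_encoder(node), edge_encoder(path))  # formula (1)
        scores[node] = dot(node_vec, query_vec)                    # formula (3)
    # Step S73: the leaf node with the highest similarity is the target entity.
    return max(scores, key=scores.get), scores


target, scores = identify_target_entity(
    "the front air conditioner air outlet keeps making an abnormal sound"
)
print(target)
```

No keyword extraction or path search is involved here: the text to be identified is compared directly against every leaf node, which is the property this embodiment relies on.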
In addition, to improve the accuracy of the node semantic extraction network and the path semantic extraction network, the two networks may be trained in advance. Specifically, a sample text may be obtained, the sample text being annotated with a corresponding sample target entity and with the knowledge domain to which the sample text belongs, so that a sample entity tree matching the knowledge domain can be obtained. On this basis, the node semantic extraction network may be used to perform semantic extraction on the leaf nodes of the sample entity tree to obtain first sample node semantic representations, and the path semantic extraction network may be used to perform semantic extraction on the entity paths of the leaf nodes to obtain first sample path semantic representations, where the entity path of a leaf node includes the node entities passed through from the leaf node to the root node of the sample entity tree and the preset relations between those node entities; the first sample node semantic representation and the first sample path semantic representation are then fused to obtain a first sample semantic representation. Likewise, the node semantic extraction network may be used to perform semantic extraction on the sample text to obtain a second sample node semantic representation, and the path semantic extraction network may be used to perform semantic extraction on the sample text to obtain a second sample path semantic representation; the two are fused to obtain a second sample semantic representation. The semantic similarity between the second sample semantic representation and the first sample semantic representation of each leaf node is then obtained. Finally, the semantic similarities and the sample target entity annotated for the sample text may be used to calculate a loss value of the node semantic extraction network and the path semantic extraction network, and the network parameters of the two networks are adjusted using the loss value.
Specifically, the semantic similarities between the second sample semantic representation and the first sample semantic representations of the leaf nodes may be normalized, for example using softmax, which is not limited here. On this basis, the normalized semantic similarities are taken as a vector whose length equals the number of leaf nodes. Taking a sample entity tree with 4 leaf nodes as an example, the vector may be expressed as [0.1 0.1 0.1 0.7], and the sample target entity is one-hot encoded; for example, if the 4th leaf node is the sample target entity, it may be encoded as [0 0 0 1]. The vector and the one-hot encoding may then be processed with a cross-entropy loss function to obtain the loss value; the specific calculation formula of the cross-entropy loss function is not repeated here.
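The normalization and loss computation can be sketched as follows, using the standard definitions of softmax and cross-entropy. The raw similarity values are made up for illustration; the normalized vector [0.1 0.1 0.1 0.7] and the one-hot code [0 0 0 1] are the ones from the example above. A real training setup would compute this inside a deep learning framework so that gradients can flow back to both networks.

```python
import math


def softmax(scores):
    """Normalize raw similarities into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    """One-hot code with a 1 at the position of the sample target entity."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def cross_entropy(probabilities, target):
    """Cross-entropy between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(p) for p, t in zip(probabilities, target) if t > 0)


# Illustrative raw similarities for a sample entity tree with 4 leaf nodes.
similarities = [1.0, 1.0, 1.0, 3.0]
probabilities = softmax(similarities)
loss = cross_entropy(probabilities, one_hot(3, 4))

# The normalized vector and one-hot code from the example in the text.
loss_from_text = cross_entropy([0.1, 0.1, 0.1, 0.7], [0, 0, 0, 1])
print(round(loss_from_text, 4))
```

With a one-hot target, the cross-entropy reduces to the negative logarithm of the probability assigned to the sample target entity, so the loss shrinks as the correct leaf node receives a higher normalized similarity.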
Different from the foregoing embodiment, the first semantic representation of each leaf node of the entity tree and the second semantic representation of the text to be identified are obtained, the semantic similarity between the second semantic representation and the first semantic representation of each leaf node is obtained, and the leaf node corresponding to the highest semantic similarity is taken as the target entity. The target entity can therefore be determined using only the first semantic representations of the leaf nodes of the entity tree and the second semantic representation of the text to be identified, which helps improve the speed of searching for the target entity; taking the leaf node with the highest semantic similarity as the target entity also helps improve the accuracy of the target entity.
Referring to fig. 8, fig. 8 is a schematic diagram of a frame of an embodiment of an electronic device 80 of the present application. The electronic device 80 comprises a memory 81 and a processor 82 coupled to each other, the memory 81 having stored therein program instructions, the processor 82 being adapted to execute the program instructions to implement the steps of any of the entity identification method embodiments described above. In particular, the electronic device 80 may include, but is not limited to: servers, desktop computers, notebook computers, cell phones, tablet computers, etc., are not limited herein.
In particular, the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the entity identification method embodiments described above. The processor 82 may also be referred to as a CPU (Central Processing Unit). The processor 82 may be an integrated circuit chip having signal processing capabilities. The processor 82 may also be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 82 may be jointly implemented by integrated circuit chips.
In the embodiment of the present disclosure, the processor 82 is configured to obtain a text to be identified and determine a knowledge domain to which the text to be identified belongs; the processor 82 is configured to determine a target entity matching the true intent of the text to be identified using the entity tree matching the knowledge domain; the entity tree includes a plurality of node entities, a preset relation exists between two connected node entities, and the target entity comes from the plurality of node entities.
In the above scheme, the text to be identified is obtained and the knowledge domain to which it belongs is determined, and the entity tree matching the knowledge domain is then used to determine the target entity matching the true intent of the text to be identified, where the entity tree includes a plurality of node entities, a preset relation exists between two connected node entities, and the target entity comes from the plurality of node entities. Because the entity tree matches the knowledge domain, the relations among the node entities related to the knowledge domain are taken into account while the target entity is identified, so that the true intent of the text to be identified can be fully mined and the accuracy of entity identification improved.
In some disclosed embodiments, the text to be identified includes at least one keyword related to the true intent, and the processor 82 is configured to perform any one of the following: determining the target entity based on candidate paths in the entity tree that contain the keywords, where a candidate path includes a plurality of sequentially connected node entities and the preset relations between adjacent node entities; or determining the target entity using the semantic relations between the text to be identified and the plurality of node entities.
Different from the foregoing embodiment, obtaining the keywords related to the true intent in the text to be identified and determining the target entity based on candidate paths in the entity tree that contain the keywords helps improve the fineness of searching for the target entity in the entity tree, and thus the accuracy of the target entity. Determining the target entity using the semantic relations between the text to be identified and the plurality of node entities avoids searching for candidate paths by keyword, which reduces the computing resources required and helps improve the speed of searching for the target entity in the entity tree; and because the target entity is searched for on the basis of deep semantics, the accuracy of the search can also be improved.
In some disclosed embodiments, the processor 82 is configured to search the entity tree for candidate paths containing any one of the keywords; the last node entity of the candidate path is a leaf node of the entity tree; the processor 82 is configured to select one node entity from one of the candidate paths as a target entity based on the path depth of each candidate path.
Different from the foregoing embodiment, candidate paths containing any of the keywords are searched for in the entity tree, and one node entity is selected from one of the candidate paths as the target entity based on the path depth of each candidate path. Searching on the basis of candidate paths that contain any keyword helps improve the fineness of the search, and selecting the target entity by path depth helps improve the accuracy of the target entity.
In some disclosed embodiments, the processor 82 is configured to cull candidate paths that do not contain all keywords.
Different from the foregoing embodiment, removing the candidate paths that do not contain all of the keywords before the target path is determined based on path depth eliminates interference from candidate paths that correlate poorly with the true intent of the text to be identified, which helps improve both the efficiency and the accuracy of searching for the target entity.
In some disclosed embodiments, the processor 82 is configured to remove the keywords that do not share a name with any node entity.
Different from the foregoing embodiment, the keywords that do not share a name with any node entity are removed first, candidate paths containing any of the remaining keywords are then searched for in the entity tree, and one node entity is selected from one of the candidate paths as the target entity based on the path depth of each candidate path. Because keywords that contribute nothing to finding the target entity are removed before the candidate paths are searched, the number of candidate paths is reduced and the candidate path explosion problem is greatly alleviated, which lowers the resource load of searching for the target entity and improves the search speed.
In some disclosed embodiments, the preset relationship includes a parent-child relationship, and the inclusion range of the parent node entity is greater than the inclusion range of the child node entity, and the processor 82 is configured to determine the inclusion range of the at least one keyword using the entity tree, respectively; the processor 82 is configured to retain the keyword with the smallest included range and reject the keyword that is not retained.
Different from the foregoing embodiment, the entity tree is used to determine the inclusion range of each of the at least one keyword, the keyword with the smallest inclusion range is retained and the others are removed, candidate paths containing any of the remaining keywords are then searched for in the entity tree, and one node entity is selected from one of the candidate paths as the target entity based on the path depth of each candidate path. Using the inclusion range takes the relations between node entities further into account while the target entity is searched for, which helps further reduce the number of candidate paths, lower the resource load of searching for the target entity, and improve the search speed.
In some disclosed embodiments, the processor 82 is configured to select a candidate path with a greatest path depth as the target path; the processor 82 is configured to take the last node entity of the target path as the target entity.
Different from the foregoing embodiment, by selecting the candidate path with the greatest path depth as the target path and using the last node entity of the target path as the target entity, the complexity of searching the target entity can be reduced.
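The selection step can be sketched in a few lines. In this sketch the path depth is assumed to be the number of node entities on a candidate path, and each candidate path is assumed to be a list ordered from its first node entity to its leaf node; both are illustrative assumptions, and the example paths are made up.

```python
def select_target_entity(paths):
    """Pick the deepest candidate path and return its last node entity."""
    if not paths:
        return None
    target_path = max(paths, key=len)  # path depth assumed to be node count
    return target_path[-1]


# Hypothetical candidate paths of different depths.
example_paths = [
    ["vehicle", "radio"],
    ["vehicle", "air conditioner", "temperature"],
]
target = select_target_entity(example_paths)
```

If several paths share the greatest depth, `max` returns the first of them; the embodiments do not say how such ties should be broken.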
In some disclosed embodiments, the processor 82 is configured to determine the target entity using semantic similarity between the text to be identified and the plurality of node entities, respectively.
Different from the foregoing embodiment, by determining the target entity by using semantic similarity between the text to be recognized and the plurality of node entities, the accuracy of the searched target entity can be advantageously improved.
In some disclosed embodiments, the processor 82 is configured to obtain a first semantic representation of a leaf node of the entity tree and obtain a second semantic representation of text to be identified; the processor 82 is configured to obtain semantic similarity between the second semantic representation and the first semantic representation of each leaf node; the processor 82 is configured to take the leaf node corresponding to the highest semantic similarity as the target entity.
Different from the foregoing embodiment, a first semantic representation of each leaf node of the entity tree and a second semantic representation of the text to be identified are acquired, the semantic similarity between the second semantic representation and the first semantic representation of each leaf node is acquired, and the leaf node corresponding to the highest semantic similarity is taken as the target entity. The target entity can therefore be determined from only the first semantic representations of the leaf nodes of the entity tree and the second semantic representation of the text to be identified, which improves the speed of searching for the target entity; taking the leaf node with the highest semantic similarity as the target entity also helps to improve the accuracy of the target entity.
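The similarity comparison can be illustrated with a short sketch. The embodiments do not name a similarity measure; cosine similarity is used here as an assumption, and the vectors are hand-written toy values rather than outputs of any real semantic extraction network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_leaf(text_repr, leaf_reprs):
    """Return the leaf node whose representation is most similar to the text's."""
    return max(leaf_reprs,
               key=lambda leaf: cosine_similarity(text_repr, leaf_reprs[leaf]))


# Toy first semantic representations of two hypothetical leaf nodes.
leaf_reprs = {
    "temperature": [0.9, 0.1, 0.0],
    "fan speed": [0.1, 0.9, 0.2],
}
# Toy second semantic representation of the text to be identified.
text_repr = [0.8, 0.2, 0.1]
target = most_similar_leaf(text_repr, leaf_reprs)
```

Only the leaf nodes are scored, so the number of comparisons grows with the number of leaf nodes rather than with the number of paths in the tree.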
In some disclosed embodiments, the processor 82 is configured to perform semantic extraction on the leaf node using the node semantic extraction network to obtain a first node semantic representation; and the processor 82 is configured to perform semantic extraction on the entity path of the leaf node using the path semantic extraction network to obtain a first path semantic representation; wherein the entity path of the leaf node comprises: the node entities traversed in connecting the leaf node to the root node of the entity tree, and the preset relations between the traversed node entities; the processor 82 is configured to fuse the first node semantic representation with the first path semantic representation to obtain the first semantic representation.
Different from the foregoing embodiment, the first node semantic representation of the leaf node is extracted through the node semantic extraction network, and the first path semantic representation of the entity path of the leaf node is extracted through the path semantic extraction network, and finally the first node semantic representation and the first path semantic representation are fused to obtain the first semantic representation, so that the first semantic representation not only contains the semantic information of the leaf node itself, but also contains the semantic information of the entity path, thereby being beneficial to improving the accuracy of the first semantic representation.
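A rough sketch of how an entity path might be assembled and how the two representations might be fused is given below. The fusion operation is not specified in this passage; concatenation is used here only as one plausible assumption. The `parent_of` mapping, the relation label, and the toy vectors are hypothetical, and the two extraction networks themselves are omitted.

```python
def entity_path(parent_of, leaf, relation="parent-child"):
    """Return the root-to-leaf node entities interleaved with their relations."""
    nodes = [leaf]
    while nodes[-1] in parent_of:          # climb until the root is reached
        nodes.append(parent_of[nodes[-1]])
    nodes.reverse()
    path = []
    for index, node in enumerate(nodes):
        if index:
            path.append(relation)          # relation between adjacent nodes
        path.append(node)
    return path


def fuse(node_repr, path_repr):
    """Assumed fusion: concatenate the node and path representations."""
    return list(node_repr) + list(path_repr)


# Hypothetical child -> parent links of a small entity tree.
parent_of = {"temperature": "air conditioner", "air conditioner": "vehicle"}
path = entity_path(parent_of, "temperature")

# Toy vectors standing in for the outputs of the two extraction networks.
first_repr = fuse([0.9, 0.1], [0.3, 0.7, 0.2])
```

With concatenation, the fused representation keeps the leaf node's own semantics and the path semantics in separate dimensions; other fusion choices such as weighted sums would mix them instead.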
In some disclosed embodiments, the processor 82 is configured to perform semantic extraction on the text to be identified using the node semantic extraction network to obtain a second node semantic representation; and, the processor 82 is configured to perform semantic extraction on the text to be identified by using the path semantic extraction network to obtain a second path semantic representation; the processor 82 is configured to fuse the second node semantic representation with the second path semantic representation to obtain a second semantic representation.
Different from the foregoing embodiment, the second node semantic representation of the text to be identified is extracted using the node semantic extraction network, the second path semantic representation of the text to be identified is extracted using the path semantic extraction network, and the second node semantic representation and the second path semantic representation are finally fused to obtain the second semantic representation. Semantic information of the text to be identified can thus be extracted at both the node level and the path level, which helps to improve the accuracy of the second semantic representation.
Referring to Fig. 9, Fig. 9 is a schematic framework diagram of an embodiment of a storage device 90 of the present application. The storage device 90 stores program instructions 901 executable by a processor, the program instructions 901 being used to implement the steps in any of the above-described entity identification method embodiments.
According to the scheme, the relation among the plurality of node entities related to the knowledge field can be considered in the process of identifying the target entity, so that the true intention in the text to be identified can be fully mined, and the accuracy of entity identification is improved.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing description of the various embodiments tends to emphasize the differences between them; for their same or similar parts, the embodiments may be referred to one another, and these parts are not repeated herein for the sake of brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

Claims (12)

1. A method of entity identification, comprising:
acquiring a text to be identified, and determining the knowledge field to which the text to be identified belongs;
determining a target entity matched with the real intention of the text to be identified by using an entity tree matched with the knowledge field; the entity tree comprises a plurality of node entities, a preset relation exists between two connected node entities, the target entity is from the plurality of node entities, and the text to be identified comprises at least one keyword related to the real intention; the determining, by using an entity tree matching the knowledge domain, a target entity matching the true intent of the text to be identified, including any one of:
determining the target entity based on candidate paths containing the key words in the entity tree; the candidate paths comprise a plurality of sequentially connected node entities and preset relations between adjacent node entities;
and determining the target entity by utilizing semantic relations between the text to be identified and the plurality of node entities respectively.
2. The method of claim 1, wherein the determining the target entity based on candidate paths in the entity tree that contain the key terms comprises:
searching for candidate paths containing any of the keywords in the entity tree; wherein the last node entity of the candidate path is a leaf node of the entity tree;
and selecting one node entity from one of the candidate paths as the target entity based on the path depth of each candidate path.
3. The method of claim 2, further comprising, prior to said selecting one of said node entities from one of said candidate paths as said target entity based on path depth of each of said candidate paths:
and eliminating the candidate paths which do not contain all the key words.
4. The method of claim 2, wherein prior to said searching for candidate paths in the entity tree that contain any of the keywords, the method further comprises:
and eliminating any keyword whose name differs from that of every node entity.
5. The method of claim 2, wherein the predetermined relationship comprises a parent-child relationship, and wherein the inclusion scope of a parent node entity is greater than the inclusion scope of a child node entity, the method further comprising, prior to the searching for candidate paths in the entity tree that contain any of the keywords:
determining the inclusion range of each of the at least one keyword by using the entity tree;
and reserving the key words with the smallest included range, and eliminating the key words which are not reserved.
6. The method of claim 2, wherein the selecting one of the node entities from one of the candidate paths as the target entity based on the path depth of each of the candidate paths comprises:
selecting a candidate path with the greatest path depth as the target path;
and taking the last node entity of the target path as the target entity.
7. The method of claim 1, wherein determining the target entity using semantic relationships between the text to be identified and the plurality of node entities, respectively, comprises:
and determining the target entity by utilizing semantic similarity between the text to be identified and the plurality of node entities respectively.
8. The method of claim 7, wherein determining the target entity using semantic similarity between the text to be identified and the plurality of node entities, respectively, comprises:
acquiring a first semantic representation of a leaf node of the entity tree, and acquiring a second semantic representation of the text to be identified;
respectively acquiring semantic similarity between the second semantic representation and the first semantic representation of each leaf node;
and taking the leaf node corresponding to the highest semantic similarity as the target entity.
9. The method of claim 8, wherein the obtaining a first semantic representation of a leaf node of the entity tree comprises:
carrying out semantic extraction on the leaf node by using a node semantic extraction network to obtain a first node semantic representation; and
carrying out semantic extraction on the entity path of the leaf node by using a path semantic extraction network to obtain a first path semantic representation; wherein the entity path of the leaf node comprises: the node entities traversed in connecting the leaf node to the root node of the entity tree, and the preset relations between the traversed node entities;
and fusing the first node semantic representation and the first path semantic representation to obtain the first semantic representation.
10. The method of claim 8, wherein the obtaining the second semantic representation of the text to be identified comprises:
carrying out semantic extraction on the text to be identified by using a node semantic extraction network to obtain a second node semantic representation; and
carrying out semantic extraction on the text to be identified by using a path semantic extraction network to obtain a second path semantic representation;
and fusing the second node semantic representation and the second path semantic representation to obtain the second semantic representation.
11. An electronic device comprising a memory and a processor coupled to each other, the memory storing program instructions, and the processor being configured to execute the program instructions to implement the entity identification method of any one of claims 1 to 10.
12. A storage device storing program instructions executable by a processor for implementing the entity identification method of any one of claims 1 to 10.
CN202011487574.8A 2020-12-16 2020-12-16 Entity identification method, electronic equipment and storage device Active CN112668334B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011487574.8A CN112668334B (en) 2020-12-16 2020-12-16 Entity identification method, electronic equipment and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011487574.8A CN112668334B (en) 2020-12-16 2020-12-16 Entity identification method, electronic equipment and storage device

Publications (2)

Publication Number Publication Date
CN112668334A CN112668334A (en) 2021-04-16
CN112668334B true CN112668334B (en) 2024-02-13

Family

ID=75405687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011487574.8A Active CN112668334B (en) 2020-12-16 2020-12-16 Entity identification method, electronic equipment and storage device

Country Status (1)

Country Link
CN (1) CN112668334B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116522935B (en) * 2023-03-29 2024-03-29 北京德风新征程科技股份有限公司 Text data processing method, processing device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297888A (en) * 2019-06-27 2019-10-01 四川长虹电器股份有限公司 A kind of domain classification method based on prefix trees and Recognition with Recurrent Neural Network
CN111079437A (en) * 2019-12-20 2020-04-28 深圳前海达闼云端智能科技有限公司 Entity identification method, electronic equipment and storage medium
CN111160041A (en) * 2019-12-30 2020-05-15 科大讯飞股份有限公司 Semantic understanding method and device, electronic equipment and storage medium
CN111813914A (en) * 2020-07-13 2020-10-23 龙马智芯(珠海横琴)科技有限公司 Question-answering method and device based on dictionary tree, recognition equipment and readable storage medium
CN111831830A (en) * 2020-07-01 2020-10-27 腾讯科技(深圳)有限公司 Knowledge graph entity domain conflict detection method and device and related equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9244909B2 (en) * 2012-12-10 2016-01-26 General Electric Company System and method for extracting ontological information from a body of text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Named Entity Extraction for Knowledge Graphs: A Literature Overview;TAREQ AL-MOSLMI等;IEEEAccess(第8期);32862-32881 *
融合多类型特征的特定领域实体识别研究;雷树杰等;计算机应用与软件;第36卷(第11期);210-217 *

Also Published As

Publication number Publication date
CN112668334A (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant