CN113139387B - Semantic error correction method, electronic device and storage medium - Google Patents

Info

Publication number
CN113139387B
Authority
CN
China
Prior art keywords
entity
text data
corrected
determining
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010461387.6A
Other languages
Chinese (zh)
Other versions
CN113139387A (en
Inventor
陆江
张文
张荣斐
谢光剑
陈浩
陆世民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2020/126494 (WO2021143299A1)
Publication of CN113139387A
Application granted
Publication of CN113139387B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application belongs to the technical field of computers, and in particular relates to a semantic error correction method based on artificial intelligence, an electronic device and a storage medium. The semantic error correction method comprises: acquiring text data; extracting, from the text data, a first entity, a second entity and a relation word that associates the first entity with the second entity; determining, from the first entity and the second entity and according to a preset knowledge graph and the relation word, an entity to be corrected and the corrected entity corresponding to the entity to be corrected; and determining error-corrected text data according to the entity to be corrected and the corrected entity. The text data is thereby corrected, its correct semantics are obtained, and the user intention is determined.

Description

Semantic error correction method, electronic device and storage medium
Technical Field
The application belongs to the technical field of computers, and in particular relates to a semantic error correction method based on artificial intelligence (AI), an electronic device and a storage medium.
Background
Existing text recognition methods generally determine the user's intention by hot word matching. Because hot words are time-sensitive, hot word matching cannot correct errors in time when the text data input by the user is wrong.
Disclosure of Invention
The embodiment of the application provides a semantic error correction method, electronic equipment and a storage medium, which can correct the acquired error text data.
In a first aspect, an embodiment of the present application provides a semantic error correction method, including: acquiring text data; extracting, from the text data, a first entity, a second entity and a relation word that associates the first entity with the second entity; determining, from the first entity and the second entity and according to a preset knowledge graph and the relation word, an entity to be corrected and a corrected entity corresponding to the entity to be corrected; and determining error-corrected text data according to the entity to be corrected and the corrected entity.
In the above embodiment, text data is acquired; the first entity, the second entity and the relation word that associates them are extracted from the text data; and the entity to be corrected and its corresponding corrected entity are determined from the first entity and the second entity according to the preset knowledge graph and the relation word. The error-corrected text data is then determined according to the entity to be corrected and the corrected entity, so that erroneous text data is corrected.
In a possible implementation manner of the first aspect, the determining, from the first entity and the second entity and according to a preset knowledge graph and the relation word, an entity to be corrected and a corrected entity corresponding to the entity to be corrected includes:
Determining a third entity associated with the first entity in the knowledge graph according to the relational word, and determining a fourth entity associated with the second entity in the knowledge graph; and determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected according to the third entity and the fourth entity.
The first entity and the third entity are associated in the knowledge graph through the relation words, the second entity and the fourth entity are associated in the knowledge graph through the relation words, namely, the entity and the relation words extracted from the text data are compared with the knowledge graph, the entity to be corrected and the entity after the correction corresponding to the entity to be corrected are determined, and the accuracy of the determined entity to be corrected and the entity after the correction is improved.
In a possible implementation manner of the first aspect, the determining, according to the third entity and the fourth entity, an entity to be corrected and an entity after error correction corresponding to the entity to be corrected, includes:
Calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity. The first similarity is calculated according to the edit distance between the first entity and the fourth entity, and the second similarity is calculated according to the edit distance between the second entity and the third entity.
If the first similarity is greater than the second similarity, that is, the first entity and the fourth entity are more similar, the first entity is in error, and the first entity and the fourth entity are taken as the entity to be corrected and the corrected entity, respectively;
If the second similarity is greater than the first similarity, that is, the second entity and the third entity are more similar, the second entity is in error, and the second entity and the third entity are taken as the entity to be corrected and the corrected entity, respectively.
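The comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and normalizing the edit distance by the longer string length is an assumption, since the text only states that the similarity is calculated according to the edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; the normalization is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def choose_correction(first, second, third, fourth):
    """Return (entity_to_correct, corrected_entity), or None on a tie."""
    first_sim = similarity(first, fourth)   # first entity vs. fourth entity
    second_sim = similarity(second, third)  # second entity vs. third entity
    if first_sim > second_sim:
        return first, fourth
    if second_sim > first_sim:
        return second, third
    return None  # the text does not specify the tie case
```

With the hypothetical entities `choose_correction("Moonlite", "Alice", "Bob", "Moonlight")`, the first similarity is the larger one, so the first entity is treated as erroneous and replaced by the fourth entity.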
In a possible implementation manner of the first aspect, the determining, according to the relational word, a third entity associated with the first entity in the knowledge-graph, and determining, according to the relational word, a fourth entity associated with the second entity in the knowledge-graph, includes:
If the relation word does not exist in the knowledge graph, acquiring a synonym corresponding to the relation word from a preset synonym word stock, determining a third entity associated with the first entity in the knowledge graph according to the synonym, and determining a fourth entity associated with the second entity in the knowledge graph, so that the application range of semantic error correction is enlarged, and the accuracy of semantic error correction is improved.
In a possible implementation manner of the first aspect, the first entity is located before the relation word in the text data and the second entity is located after the relation word in the text data, and the determining, according to the relation word, a third entity associated with the first entity in the knowledge graph and a fourth entity associated with the second entity in the knowledge graph includes:
If a preset reverse relation word library contains a reverse word corresponding to the relation word, determining the third entity associated with the first entity in the knowledge graph according to the relation word, and determining the fourth entity associated with the second entity in the knowledge graph according to the reverse word, so as to improve the efficiency of semantic error correction.
In a possible implementation manner of the first aspect, before the extracting the first entity, the second entity, and the relational term for associating the first entity and the second entity in the text data, the method further includes:
Carrying out semantic recognition on the text data;
correspondingly, the extracting the first entity, the second entity and the relation word for associating the first entity and the second entity in the text data comprises the following steps:
If the text data has a semantic error, extracting, from the text data, the first entity, the second entity and the relation word that associates the first entity with the second entity.
In the above embodiment, when the text data has a semantic error, the first entity, the second entity and the relation word that associates them are extracted from the text data in order to identify which of the first entity and the second entity is in error. The text data with the semantic error is thereby corrected, the user intention is determined according to the error-corrected text data, and the user experience is improved.
In a second aspect, an embodiment of the present application provides a semantic error correction apparatus, including:
the acquisition module is used for acquiring text data;
an extraction module, used for extracting, from the text data, a first entity, a second entity and a relation word that associates the first entity with the second entity;
a determining module, used for determining, from the first entity and the second entity and according to a preset knowledge graph and the relation word, an entity to be corrected and a corrected entity corresponding to the entity to be corrected;
and an error correction module, used for determining error-corrected text data according to the entity to be corrected and the corrected entity.
In a possible implementation manner of the second aspect, the determining module includes:
A first determining unit, configured to determine a third entity associated with the first entity in the knowledge graph according to the relational word, and determine a fourth entity associated with the second entity in the knowledge graph;
And the second determining unit is used for determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected according to the third entity and the fourth entity.
In a possible implementation manner of the second aspect, the second determining unit is specifically configured to:
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity;
if the first similarity is greater than the second similarity, the first entity and the fourth entity are respectively used as an entity to be corrected and an entity after correction;
and if the second similarity is greater than the first similarity, respectively using the second entity and the third entity as an entity to be corrected and an entity after correction.
In a possible implementation manner of the second aspect, the second determining unit is further configured to:
and calculating the first similarity according to the edit distance between the first entity and the fourth entity, and calculating the second similarity according to the edit distance between the second entity and the third entity.
In a possible implementation manner of the second aspect, the first determining unit is specifically configured to:
If the relation word does not exist in the knowledge graph, acquiring a synonym corresponding to the relation word from a preset synonym word stock;
and determining a third entity associated with the first entity in the knowledge graph according to the synonyms, and determining a fourth entity associated with the second entity in the knowledge graph.
In a possible implementation manner of the second aspect, the first determining unit is further configured to:
If a reverse word corresponding to the relation word exists in a preset reverse relation word library, determining a third entity associated with the first entity in the knowledge graph according to the relation word, and determining a fourth entity associated with the second entity in the knowledge graph according to the reverse word.
In a possible implementation manner of the second aspect, the semantic error correction apparatus further includes an identification module,
The identification module is used for carrying out semantic identification on the text data;
correspondingly, the extraction module is specifically configured to:
If the text data has a semantic error, extracting, from the text data, the first entity, the second entity and the relation word that associates the first entity with the second entity.
In a possible implementation manner of the second aspect, the error correction module is further configured to:
and determining the user intention according to the text data after error correction.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the semantic error correction method as described in the first aspect above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the semantic error correction method as described in the first aspect above.
In a fifth aspect, an embodiment of the present application provides a computer program product, which when run on a terminal device causes the terminal device to perform the semantic error correction method described in the first aspect above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a semantic error correction method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a knowledge graph according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a knowledge graph according to another embodiment of the present application;
FIG. 5 is a flow chart illustrating sub-steps of a semantic error correction method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a knowledge graph according to yet another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a semantic error correction device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined" or "in response to determining" or "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
The semantic error correction method provided by the embodiment of the application is applied to an electronic device. The electronic device may be a terminal device such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA) or a smart speaker. The electronic device may also be a server, where the server may be a server cluster formed by a plurality of servers, or a cloud computing service center. The embodiment of the application does not limit the specific type of the electronic device.
As shown in fig. 1, in a possible implementation manner, the semantic error correction method provided by the embodiment of the present application is executed entirely in the server 1, and the server 1 is in communication connection with the terminal device 2, for example a mobile phone. The terminal device 2 obtains voice data input by a user and sends the voice data to the server 1, and the server 1 converts the voice data into text data and identifies the semantics of the text data. If the text data has a semantic error, the server 1 extracts, from the text data, a first entity, a second entity and a relation word that associates the first entity with the second entity. A knowledge graph is stored on the server 1, and the entity to be corrected and the corrected entity corresponding to the entity to be corrected are determined from the first entity and the second entity according to the knowledge graph and the relation word. Because the knowledge graph is highly accurate, the entity to be corrected and the corrected entity determined according to the knowledge graph and the relation word are also highly accurate. The error-corrected text data is then determined according to the entity to be corrected and the corrected entity, so that the text data is corrected in time; the user intention is identified according to the error-corrected text data, and the corresponding operation is executed.
It should be noted that, in other possible implementation manners, the semantic error correction method provided by the embodiment of the present application may be executed entirely in the terminal device, or partly in the server and partly in the terminal device. For example, the knowledge graph is stored on the terminal device, and the terminal device converts the voice data into text data, determines the error-corrected text data, and executes the corresponding operation according to the error-corrected text data. Alternatively, the knowledge graph is stored on the server and the terminal device converts the voice data into text data; after the terminal device extracts the first entity, the second entity and the relation word from the text data, the server sends the corresponding knowledge graph to the terminal device according to the first entity, the second entity and the relation word, and the terminal device determines the error-corrected text data according to the knowledge graph and executes the corresponding operation according to the error-corrected text data.
The semantic error correction method provided by the embodiment of the present application is described in detail below, taking as an example the case where the method is executed entirely in a server.
As shown in fig. 2, the semantic error correction method provided by the embodiment of the application includes:
S101: text data is acquired.
The text data can be input into the terminal equipment by a user and sent to the server by the terminal equipment; or the terminal equipment collects the voice data, sends the voice data to the server, and then the server converts the voice data into text data. Specifically, after acquiring voice data sent by a terminal device, the server performs noise reduction processing on the voice data, extracts voice characteristics of the voice data, inputs the voice characteristics into a preset voice recognition model, and further recognizes text data.
S102: semantic recognition is performed on the text data.
Specifically, the server performs semantic recognition on the text data according to a preset semantic recognition model, recognizes the user intention corresponding to the text data, and executes corresponding operation according to the user intention. For example, if the identified text data is "what news exists today", the server searches for the news and then transmits the news to the terminal device, and the terminal device plays the corresponding news.
S103: if the text data has a semantic error, a first entity, a second entity and a relation word that associates the first entity with the second entity are extracted from the text data.
Specifically, if the text data has a semantic error, that is, the corresponding operation cannot be executed according to the identified semantics, a first entity, a second entity and a relation word are extracted from the text data with the semantic error, where one relation word associates one first entity with one second entity.
In one possible implementation manner, if the text data has a semantic error, word segmentation is first performed on the text data to separate out each word, and the first entity, the second entity and the relation word are extracted according to the part of speech of each word. The subject and the object among the segmented words are taken as entities, or a person name, article name, organization name, place name, song name or other proper noun among the segmented words is taken as an entity. A word that associates the first entity with the second entity is then taken as the relation word: a noun located between the first entity and the second entity is taken as the relation word, and if no noun exists between them, the relation word is determined according to the relationship between the first entity and the second entity. After the relation word and the entities are extracted, the entity located before the relation word in the text data is defined as the first entity, and the entity located after the relation word is defined as the second entity. For example, the text data is "play song Y sung by singer Z". The server cannot find a song Y corresponding to singer Z according to the identified semantics, and therefore determines that the text data has a semantic error. Word segmentation of the text data yields "singer Z" as a person name and "song Y" as a song name, so the first entity is "singer Z", the second entity is "song Y", and the relation word is "singing".
For another example, the text data is "play singer Z's song Y". The server cannot find a song Y corresponding to singer Z according to the identified semantics, and therefore determines that the text data has a semantic error. Word segmentation of the text data yields "singer Z" as a person name and "song Y" as a song name, so the first entity is "singer Z" and the second entity is "song Y", and the relation word is determined to be "singing" or "performing" according to the person name and the song name. For another example, the text data is "play song C sung by job celebrity B, the host of program A". The server cannot find the corresponding song according to the identified semantics, and therefore determines that the text data has a semantic error. Word segmentation of the text data yields "program A" as a program name, "job celebrity B" as a person name and "song C" as a song name, with "host" and "singing" as relation words. For the relation word "host", the first entity is "program A" and the second entity is "job celebrity B"; for the relation word "singing", the first entity is "job celebrity B" and the second entity is "song C".
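The pairing rule in this implementation (for each relation word, the nearest entity before it is the first entity and the nearest entity after it is the second entity) can be sketched as below. The sketch assumes segmentation and part-of-speech tagging have already been done by some upstream tool; the tag names `ENT`, `REL` and `O`, and the function name, are hypothetical and not taken from the application.

```python
def extract_triples(tokens):
    """tokens: list of (word, tag) pairs in text order.

    Tags are hypothetical: "ENT" marks an entity, "REL" a relation word,
    and anything else is ignored. Returns a list of
    (first_entity, relation_word, second_entity) tuples.
    """
    triples = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "REL":
            continue
        # Nearest entity located before the relation word.
        first = next((w for w, t in reversed(tokens[:i]) if t == "ENT"), None)
        # Nearest entity located after the relation word.
        second = next((w for w, t in tokens[i + 1:] if t == "ENT"), None)
        if first is not None and second is not None:
            triples.append((first, word, second))
    return triples


# Token order follows the third example above (program, host, person, singing, song).
tokens = [
    ("play", "O"),
    ("program A", "ENT"),
    ("host", "REL"),
    ("job celebrity B", "ENT"),
    ("singing", "REL"),
    ("song C", "ENT"),
]
triples = extract_triples(tokens)
```

For this input the sketch produces one triple per relation word, matching the two (first entity, relation word, second entity) groupings given in the example. Inferring a relation word when none appears in the text is not covered here.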
In another possible implementation manner, if the text data has a semantic error, the text data is input into a preset entity extraction model, and the entities and the relation word in the text data are extracted. The entity extraction model is obtained by training a classification model with a machine learning or deep learning algorithm, using text data together with its corresponding first entities, second entities and relation words as training samples. Given input text data, the entity extraction model can extract the first entity, the second entity and the relation word in the text data. For example, the text data is "which other programs has job celebrity B, the host of program A, hosted", and the terminal device determines that the text data has a semantic error because no corresponding result is returned for the recognized semantics. The first entity extracted by the entity extraction model is "program A", the second entity is "job celebrity B", and the relation word is "host".
S104: an entity to be corrected and a corrected entity corresponding to the entity to be corrected are determined from the first entity and the second entity according to a preset knowledge graph and the relation word.
The knowledge graph is a pre-established relationship graph that describes the associations between entities, and may be established for different fields. For example, fig. 3 shows a knowledge graph of "job celebrity L" and fig. 4 shows a knowledge graph of "Guangdong province". The two words at the two ends of each line segment in the knowledge graph are entities, and the word that associates the two entities is a relation word. For example, in fig. 3, "job celebrity L" and "actor, singer" are entities and "occupation" is the relation word between them; "job celebrity L" and "job celebrity Q" are entities and "wife" is the relation word between them. In fig. 4, "Guangdong province" and "Guangzhou" are entities and "province" is the relation word between them; "Guangdong province" and "Shenzhen" are entities and "Economy" is the relation word between them. It should be noted that fig. 3 and fig. 4 each show only a part of a knowledge graph.
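One simple way to hold such a graph in memory is a mapping from (entity, relation word) to the entities at the other end of the line segment. The class below is an illustrative sketch only: the class and method names are hypothetical, and storing each line segment in both directions under the same relation word is an assumption made so that either end can start a lookup. The sample edges are the two fig. 3 relations quoted above.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy knowledge graph: entities are nodes, relation words label edges."""

    def __init__(self):
        # (entity, relation word) -> entities at the other end of the edge
        self._edges = defaultdict(set)

    def add(self, entity_a, relation, entity_b):
        # Store the line segment in both directions (an assumption of this
        # sketch) so that either end can be used to start a lookup.
        self._edges[(entity_a, relation)].add(entity_b)
        self._edges[(entity_b, relation)].add(entity_a)

    def associated(self, entity, relation):
        """Entities linked to `entity` through `relation` (possibly empty)."""
        return set(self._edges.get((entity, relation), ()))

    def has_relation(self, relation):
        """True if any edge in the graph is labeled with `relation`."""
        return any(rel == relation for _, rel in self._edges)


graph = KnowledgeGraph()
graph.add("job celebrity L", "occupation", "actor, singer")
graph.add("job celebrity L", "wife", "job celebrity Q")
```

A production knowledge graph would normally live in a graph database or triple store; the dictionary here only illustrates the entity, relation word, entity structure.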
Comparing the first entity and the second entity extracted from the text data with the entities in the knowledge graph, determining the entity needing error correction in the first entity and the second entity, and determining the corresponding entity after error correction from the knowledge graph.
In one possible implementation, as shown in fig. 5, S104 includes S201 and S202.
S201: a third entity associated with the first entity in the knowledge graph is determined according to the relation word, and a fourth entity associated with the second entity in the knowledge graph is determined according to the relation word.
Specifically, searching the same entities as the first entity and the second entity in the knowledge graph, determining a third entity corresponding to the first entity and the relation word in the knowledge graph, and a fourth entity corresponding to the second entity and the relation word, namely, in the knowledge graph, the first entity and the third entity are positioned at two ends of a line segment, and the first entity and the third entity are associated through the relation word; in the knowledge graph, the second entity and the fourth entity are positioned at two ends of a line segment, and the second entity and the fourth entity are related through a relational word.
For example, the identified text data is "which other programs has job celebrity B, the host of program A, hosted", which has a semantic error. The extracted first entity is "program A", the second entity is "job celebrity B", and the relation word is "host". The knowledge graph shown in fig. 6 contains an entity identical to the first entity, namely "program A", and the third entity corresponding to the first entity "program A" and the relation word "host" in the knowledge graph is "job celebrity D". Similarly, the fourth entity corresponding to the second entity "job celebrity B" and the relation word "host" in the knowledge graph is determined to be "program L".
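The lookup in this example can be sketched with a small index over triples. Everything here is illustrative: the two triples are an assumed reading of fig. 6 based only on the entities named in the example, the function names are hypothetical, and returning the first match when several entities qualify is an assumption.

```python
def build_index(triples):
    """Index (entity, relation word) -> entities at the other end of the edge.

    Each edge is indexed from both ends, because the example looks up
    "program A" and "job celebrity B" through the same relation word.
    """
    index = {}
    for entity_a, relation, entity_b in triples:
        index.setdefault((entity_a, relation), []).append(entity_b)
        index.setdefault((entity_b, relation), []).append(entity_a)
    return index


def find_associated(index, entity, relation):
    """Return one entity associated with `entity` via `relation`, or None."""
    matches = index.get((entity, relation), [])
    return matches[0] if matches else None


# Assumed fragment of the knowledge graph in fig. 6.
index = build_index([
    ("program A", "host", "job celebrity D"),
    ("program L", "host", "job celebrity B"),
])

third_entity = find_associated(index, "program A", "host")
fourth_entity = find_associated(index, "job celebrity B", "host")
```

Under these assumptions the third entity is "job celebrity D" and the fourth entity is "program L", as in the example.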
In one possible implementation manner, if the relation word extracted from the text data does not exist in the knowledge graph, a synonym corresponding to the relation word is obtained from a preset synonym word stock. Illustratively, in the synonym word stock, "wife" and "lady" are synonyms, and "father" and "dad" are synonyms. For example, the identified text data is "news of king E's dad king G"; according to the identified semantics, the text data has a semantic error and no corresponding result is returned. The entities in the text data with the semantic error are extracted: the first entity is "king E", the second entity is "king G", and the relation word is "dad". A search of the knowledge graph finds that "dad" is not among the relation words of the entity "king E", which is identical to the first entity. Because the synonym word stock contains "dad" with the synonym "father", "dad" is replaced with "father", and "king H" is determined as the third entity in the knowledge graph according to the first entity "king E" and the relation word "father".
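A minimal sketch of the synonym fallback follows. The function name and the shape of the synonym word stock (a dictionary from a word to its synonyms) are assumptions for illustration; the application does not specify how the word stock is stored.

```python
def resolve_relation(relation, known_relations, synonyms):
    """Return a relation word usable in the knowledge graph, or None.

    known_relations: relation words available for the entity being looked up.
    synonyms: hypothetical synonym word stock, word -> list of synonyms.
    """
    if relation in known_relations:
        return relation
    # Fall back to the first synonym that the knowledge graph knows about.
    for candidate in synonyms.get(relation, ()):
        if candidate in known_relations:
            return candidate
    return None


SYNONYMS = {"dad": ["father"], "father": ["dad"], "lady": ["wife"], "wife": ["lady"]}

# Relation words assumed to be attached to "king E" in the knowledge graph.
resolved = resolve_relation("dad", {"father", "occupation"}, SYNONYMS)
```

Here `resolved` is `"father"`, so the lookup for the third entity proceeds with "father" in place of "dad".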
In one possible implementation, if a preset reverse word corresponding to the relation word exists in a reverse word library, the third entity associated with the first entity in the knowledge graph is determined according to the relation word, and the fourth entity associated with the second entity in the knowledge graph is determined according to the reverse word. Illustratively, in the reverse word library, "father" and "dad" are each a reverse relation word of "son", and "wife" and "husband" are reverse relation words of each other. For example, the identified text data is "news about king G, the dad of king E", and no corresponding result is returned according to the identified semantics; the extracted first entity is "king E", the second entity is "king G", and the relation word is "dad". The reverse word library contains the reverse word "son" corresponding to "dad". The third entity determined from the knowledge graph according to the first entity "king E" and the relation word "dad" is "king H", and the fourth entity determined from the knowledge graph according to the second entity "king G" and the reverse word "son" is "king T".
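A minimal sketch of the reverse-word lookup follows. The data structures and the function name `find_third_and_fourth` are hypothetical, and falling back to the relation word itself when no reverse word is registered is an assumption made for the example.

```python
from typing import Dict, Optional, Tuple

Graph = Dict[Tuple[str, str], str]


def find_third_and_fourth(
    graph: Graph,
    reverse_words: Dict[str, str],
    first: str,
    second: str,
    relation: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Look up the third entity with the relation word and the fourth entity
    with its reverse word (or the relation word itself if none is registered)."""
    third = graph.get((first, relation))
    reverse = reverse_words.get(relation, relation)
    fourth = graph.get((second, reverse))
    return third, fourth


# Hypothetical directed edges mirroring the example in the text.
graph = {
    ("king E", "dad"): "king H",
    ("king G", "son"): "king T",
}
reverse_words = {"dad": "son", "father": "son", "wife": "husband", "husband": "wife"}

third_entity, fourth_entity = find_third_and_fourth(
    graph, reverse_words, "king E", "king G", "dad"
)
```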
In other possible implementations, if the first entity and the second entity extracted from the text data do not exist in the knowledge graph, the third entity and the fourth entity may be determined by sub-graph matching. For example, the entity and relation word with the highest similarity to the first entity and the relation word are selected from the knowledge graph, and the other entity associated with the selected entity through that relation word is taken as the third entity; likewise, the entity and relation word with the highest similarity to the second entity and the relation word are selected from the knowledge graph, and the other entity associated with the selected entity through that relation word is taken as the fourth entity.
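The text does not specify how sub-graph matching scores candidates, so the following is only a stand-in sketch: it picks the stored (entity, relation word) pair whose strings are most similar to the query, using `difflib.SequenceMatcher` from the Python standard library as an assumed similarity measure. The function name `closest_associated` and the toy graph are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

Graph = Dict[Tuple[str, str], str]


def closest_associated(graph: Graph, entity: str, relation: str) -> Optional[str]:
    """Return the entity attached to the best-matching (entity, relation) key."""
    if not graph:
        return None

    def score(key: Tuple[str, str]) -> float:
        head, rel = key
        # Sum of string similarities of the entity and of the relation word.
        return (
            SequenceMatcher(None, entity, head).ratio()
            + SequenceMatcher(None, relation, rel).ratio()
        )

    best_key = max(graph, key=score)
    return graph[best_key]


graph = {
    ("program A", "host"): "celebrity D",
    ("program L", "director"): "celebrity K",
}

# "programme A" is not in the graph; the closest stored key is ("program A", "host").
third_entity = closest_associated(graph, "programme A", "host")
```

A production system would likely add a minimum-score threshold so that unrelated queries do not match anything; that is omitted here for brevity.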
S202: determining, according to the third entity and the fourth entity, an entity to be corrected and a corrected entity corresponding to the entity to be corrected.
Specifically, the entity to be corrected is determined from the first entity and the second entity according to the third entity and the fourth entity. If the first entity is the entity to be corrected, the fourth entity is the corrected entity; if the second entity is the entity to be corrected, the third entity is the corrected entity.
In one possible implementation, a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity are calculated. If the first similarity is greater than the second similarity, that is, the first entity is more similar to the fourth entity, the user probably intended the fourth entity but entered the first entity by mistake, so the first entity is taken as the entity to be corrected and the fourth entity as the corrected entity. If the second similarity is greater than the first similarity, that is, the second entity is more similar to the third entity, the user probably intended the third entity but entered the second entity by mistake, so the second entity is taken as the entity to be corrected and the third entity as the corrected entity. The first similarity and the second similarity are calculated according to an edit distance, which is a measure of the difference between two character strings; the smaller the edit distance, the higher the similarity.
For example, the identified text data is "which other programs has celebrity B, the host of program A, hosted", the extracted first entity is "program A", the second entity is "celebrity B", and the relation word is "host". According to the knowledge graph, the third entity is "celebrity D" and the fourth entity is "program L". The edit distance between the first entity "program A" and the fourth entity "program L", and the edit distance between the second entity "celebrity B" and the third entity "celebrity D", are calculated. If the difference between the character strings "celebrity B" and "celebrity D" is smaller than the difference between the character strings "program A" and "program L", the edit distance between the second entity and the third entity is smaller than the edit distance between the first entity and the fourth entity, that is, the second similarity is greater than the first similarity; the second entity "celebrity B" is then taken as the entity to be corrected, and the third entity "celebrity D" as the corrected entity.
For another example, the identified text data is "what is the population of Shenzhen, the special economic zone of Guangzhou", the extracted first entity is "Guangzhou", the second entity is "Shenzhen", and the relation word is "special economic zone". No entity corresponding to the first entity "Guangzhou" and the relation word "special economic zone" is found in the knowledge graph, that is, the third entity is "empty" or "no entity found", while the fourth entity determined according to the second entity "Shenzhen" and the relation word "special economic zone" is "Guangdong". Since the edit distance between the first entity "Guangzhou" and the fourth entity "Guangdong" is smaller than the edit distance between the second entity "Shenzhen" and the nonexistent third entity, that is, the first similarity is greater than the second similarity, the first entity "Guangzhou" is taken as the entity to be corrected, and the fourth entity "Guangdong" as the corrected entity.
As another example, the identified text data is "how is the climate in Los Angeles, the capital of the United States", the extracted first entity is "United States", the second entity is "Los Angeles", and the relation word is "capital". In the knowledge graph, the capital of the United States is Washington, and the second largest city of the United States is Los Angeles. Thus, according to the knowledge graph, the third entity corresponding to the first entity "United States" and the relation word "capital" is determined to be "Washington", and the fourth entity corresponding to the second entity "Los Angeles" and the relation word "capital" is "nonexistent". Since the first entity "United States" is less similar to the nonexistent fourth entity than the second entity "Los Angeles" is to the third entity "Washington", that is, the first similarity is smaller than the second similarity, the second entity "Los Angeles" is taken as the entity to be corrected, and the third entity "Washington" as the corrected entity.
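The similarity comparison can be sketched with a standard Levenshtein edit distance. The normalisation used in `similarity` (one minus the distance divided by the longer length), the treatment of a missing entity as similarity 0, and the `None` result on a tie are assumptions for illustration; the text only states that the similarities are calculated from the edit distance.

```python
from typing import Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: Optional[str]) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 if either is missing."""
    if not a or not b:
        return 0.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def choose_correction(
    first: str, second: str, third: Optional[str], fourth: Optional[str]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (entity to be corrected, corrected entity), or None on a tie."""
    first_similarity = similarity(first, fourth)
    second_similarity = similarity(second, third)
    if first_similarity > second_similarity:
        return first, fourth
    if second_similarity > first_similarity:
        return second, third
    return None


# Third entity missing, fourth entity "Guangdong": the first entity is corrected.
guangzhou_case = choose_correction("Guangzhou", "Shenzhen", None, "Guangdong")
# Fourth entity missing, third entity "Washington": the second entity is corrected.
capital_case = choose_correction("United States", "Los Angeles", "Washington", None)
```

Treating a missing entity as similarity 0 means that the side for which the knowledge graph found no counterpart can never win the comparison, which matches the outcome of both examples above.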
S105: determining the corrected text data according to the entity to be corrected and the corrected entity.
Specifically, the entity to be corrected is replaced by the corrected entity to obtain the corrected text data, and the user intention is determined according to the corrected text data. For example, the identified text data is "which other programs has celebrity B, the host of program A, hosted", the second entity "celebrity B" is the entity to be corrected, and the third entity "celebrity D" is the corrected entity; the corrected text data is then "which other programs has celebrity D, the host of program A, hosted". The server searches for the corresponding programs according to the corrected text data and sends the search result to the terminal device, and the terminal device outputs the search result in voice or text form. For another example, the identified text data is "what is the population of Shenzhen, the special economic zone of Guangzhou", the first entity "Guangzhou" is the entity to be corrected, and the fourth entity "Guangdong" is the corrected entity; the corrected text data is then "what is the population of Shenzhen, the special economic zone of Guangdong", and the server searches for the corresponding result according to the corrected text data and sends the search result to the terminal device.
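The replacement step itself is a plain string substitution, sketched below. The function name `apply_correction` is hypothetical, and replacing only the first occurrence of the entity is an assumption; the text does not say how repeated mentions are handled.

```python
def apply_correction(text: str, to_correct: str, corrected: str) -> str:
    """Replace the first occurrence of the entity to be corrected."""
    if to_correct not in text:
        return text  # nothing to replace; leave the text data unchanged
    return text.replace(to_correct, corrected, 1)


original = "what is the population of Shenzhen, the special economic zone of Guangzhou"
corrected_text = apply_correction(original, "Guangzhou", "Guangdong")
```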
In the above embodiment, the first entity, the second entity, and the relation word that associates the first entity and the second entity are extracted from the text data, and the entity to be corrected and the corrected entity corresponding to it are determined from the first entity and the second entity according to the preset knowledge graph and the relation word. Because the knowledge graph is highly accurate, the entity to be corrected and the corrected entity determined according to the knowledge graph and the relation word are also highly accurate. The corrected text data is then determined according to the entity to be corrected and the corrected entity, so that erroneous text data input by the user can be corrected in time and the user intention can be accurately identified.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
Corresponding to the semantic error correction method described in the above embodiments, fig. 7 shows a block diagram of the semantic error correction device according to an embodiment of the present application, and for convenience of explanation, only the portion related to the embodiment of the present application is shown.
Referring to fig. 7, the apparatus includes:
an acquisition module 10 for acquiring text data;
an extraction module 20, configured to extract a first entity, a second entity, and a relation word for associating the first entity and the second entity in the text data;
a determining module 30, configured to determine an entity to be corrected and a corrected entity corresponding to the entity to be corrected from the first entity and the second entity according to a preset knowledge graph and the relation word;
and an error correction module 40, configured to determine corrected text data according to the entity to be corrected and the corrected entity.
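How the four modules could be chained is sketched below. The class `SemanticErrorCorrector`, its callable fields, and the stub lambdas are hypothetical; real acquisition, extraction and determining modules would be backed by speech recognition, entity extraction and the knowledge graph respectively.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Triple = Tuple[str, str, str]   # (first entity, relation word, second entity)
Correction = Tuple[str, str]    # (entity to be corrected, corrected entity)


@dataclass
class SemanticErrorCorrector:
    acquire: Callable[[], str]                           # acquisition module 10
    extract: Callable[[str], Optional[Triple]]           # extraction module 20
    determine: Callable[[Triple], Optional[Correction]]  # determining module 30

    def run(self) -> str:
        """Error correction module 40: apply the correction, if any."""
        text = self.acquire()
        triple = self.extract(text)
        if triple is None:
            return text
        correction = self.determine(triple)
        if correction is None:
            return text
        to_correct, corrected = correction
        return text.replace(to_correct, corrected, 1)


# Stub modules standing in for the real components.
corrector = SemanticErrorCorrector(
    acquire=lambda: "how is the climate in Los Angeles, the capital of the United States",
    extract=lambda text: ("United States", "capital", "Los Angeles"),
    determine=lambda triple: ("Los Angeles", "Washington"),
)
result = corrector.run()
```

Passing the modules in as callables keeps the sketch independent of any particular extraction model or graph store.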
In one possible implementation, the determining module 30 includes:
a first determining unit, configured to determine a third entity associated with the first entity in the knowledge graph according to the relation word, and determine a fourth entity associated with the second entity in the knowledge graph;
and a second determining unit, configured to determine an entity to be corrected and a corrected entity corresponding to the entity to be corrected according to the third entity and the fourth entity.
In a possible implementation, the second determining unit is specifically configured to:
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity;
if the first similarity is greater than the second similarity, the first entity and the fourth entity are respectively used as an entity to be corrected and an entity after correction;
and if the second similarity is greater than the first similarity, respectively using the second entity and the third entity as an entity to be corrected and an entity after correction.
In a possible implementation, the second determining unit is further configured to:
and calculating the first similarity according to the edit distance between the first entity and the fourth entity, and calculating the second similarity according to the edit distance between the second entity and the third entity.
In one possible implementation manner, the first determining unit is specifically configured to:
If the relation word does not exist in the knowledge graph, acquiring a synonym corresponding to the relation word from a preset synonym word stock;
and determining a third entity associated with the first entity in the knowledge graph according to the synonyms, and determining a fourth entity associated with the second entity in the knowledge graph.
In a possible implementation manner, the first determining unit is further configured to:
If a preset reverse word corresponding to the relationship word exists in a reverse word library, determining a third entity associated with the first entity in the knowledge graph according to the relationship word, and determining a fourth entity associated with the second entity in the knowledge graph according to the reverse word.
In one possible implementation, the semantic error correction apparatus further comprises an identification module;
the identification module is configured to perform semantic recognition on the text data;
correspondingly, the extraction module 20 is specifically configured to:
if the text data has a semantic error, extract a first entity, a second entity and a relation word for associating the first entity and the second entity in the text data.
In one possible implementation, the error correction module 40 is further configured to:
and determining the user intention according to the text data after error correction.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic apparatus of this embodiment includes: a processor 11 (only one processor is shown in fig. 8), a memory 12 and a computer program 13 stored in the memory 12 and executable on the processor 11, the processor 11 implementing the steps in the above-described embodiments of the semantic error correction method when executing the computer program 13, such as steps S101 to S105 shown in fig. 2. Or the processor 11, when executing the computer program 13, performs the functions of the modules/units of the device embodiments described above, e.g. the functions of the modules 10 to 40 shown in fig. 7.
Illustratively, the computer program 13 may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 11 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program 13 in the terminal device.
It will be appreciated by those skilled in the art that fig. 8 is merely an example of an electronic device and is not meant to be limiting. The electronic device may include more or fewer components than shown, may combine certain components, or may use different components; for example, the electronic device may further include an input-output device, a network access device, a bus, and the like.
The processor 11 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 12 may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 12 may also be an external storage device of the terminal device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) provided on the terminal device. Further, the memory 12 may also include both an internal storage unit and an external storage device of the terminal device. The memory 12 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 12 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated modules/units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A semantic error correction method, comprising:
acquiring text data;
extracting a first entity, a second entity and a relation word for associating the first entity and the second entity in the text data;
determining an entity to be corrected and a corrected entity corresponding to the entity to be corrected from the first entity and the second entity according to a preset knowledge graph and the relation word;
determining corrected text data according to the entity to be corrected and the corrected entity;
wherein the determining, according to a preset knowledge graph and the relation word, an entity to be corrected and a corrected entity corresponding to the entity to be corrected from the first entity and the second entity, includes:
determining a third entity associated with the first entity in the knowledge graph according to the relation word, and determining a fourth entity associated with the second entity in the knowledge graph;
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity;
if the first similarity is greater than the second similarity, the first entity and the fourth entity are respectively used as an entity to be corrected and an entity after correction;
and if the second similarity is greater than the first similarity, respectively using the second entity and the third entity as an entity to be corrected and an entity after correction.
2. The semantic error correction method of claim 1, wherein the calculating a first similarity of the first entity to the fourth entity, a second similarity of the second entity to the third entity comprises:
and calculating the first similarity according to the edit distance between the first entity and the fourth entity, and calculating the second similarity according to the edit distance between the second entity and the third entity.
3. The semantic error correction method of claim 1, wherein the determining a third entity associated with the first entity in the knowledge-graph and a fourth entity associated with the second entity in the knowledge-graph based on the relational word comprises:
If the relation word does not exist in the knowledge graph, acquiring a synonym corresponding to the relation word from a preset synonym word stock;
and determining a third entity associated with the first entity in the knowledge graph according to the synonyms, and determining a fourth entity associated with the second entity in the knowledge graph.
4. The semantic error correction method of claim 1, wherein the first entity is located before the relational term in the text data, the second entity is located after the relational term in the text data, the determining a third entity associated with the first entity in the knowledge-graph according to the relational term, and determining a fourth entity associated with the second entity in the knowledge-graph comprise:
If a preset reverse word corresponding to the relationship word exists in a reverse word library, determining a third entity associated with the first entity in the knowledge graph according to the relationship word, and determining a fourth entity associated with the second entity in the knowledge graph according to the reverse word.
5. The semantic error correction method of claim 1, wherein prior to the extracting a first entity, a second entity, and a relational term for associating the first entity and the second entity in the text data, the method further comprises:
carrying out semantic recognition on the text data;
correspondingly, the extracting the first entity, the second entity and the relation word for associating the first entity and the second entity in the text data comprises the following steps:
if the text data has a semantic error, extracting a first entity, a second entity and a relation word for associating the first entity and the second entity in the text data.
6. The semantic error correction method according to claim 1, wherein after the determining of the error corrected text data from the entity to be error corrected and the error corrected entity, the method further comprises:
and determining the user intention according to the text data after error correction.
7. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the computer program.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 6.
CN202010461387.6A 2020-01-17 2020-05-27 Semantic error correction method, electronic device and storage medium Active CN113139387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/126494 WO2021143299A1 (en) 2020-01-17 2020-11-04 Semantic error correction method, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020100539578 2020-01-17
CN202010053957.8A CN111291571A (en) 2020-01-17 2020-01-17 Semantic error correction method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113139387A CN113139387A (en) 2021-07-20
CN113139387B true CN113139387B (en) 2024-06-14

Family

ID=71029073

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010053957.8A Pending CN111291571A (en) 2020-01-17 2020-01-17 Semantic error correction method, electronic device and storage medium
CN202010461387.6A Active CN113139387B (en) 2020-01-17 2020-05-27 Semantic error correction method, electronic device and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010053957.8A Pending CN111291571A (en) 2020-01-17 2020-01-17 Semantic error correction method, electronic device and storage medium

Country Status (2)

Country Link
CN (2) CN111291571A (en)
WO (1) WO2021143299A1 (en)

Also Published As

Publication number Publication date
WO2021143299A1 (en) 2021-07-22
CN111291571A (en) 2020-06-16
CN113139387A (en) 2021-07-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant