CN113139387A - Semantic error correction method, electronic device and storage medium - Google Patents

Publication number
CN113139387A
Authority
CN
China
Prior art keywords
entity
error correction
corrected
text data
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010461387.6A
Other languages
Chinese (zh)
Inventor
陆江
张文
张荣斐
谢光剑
陈浩
陆世民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2020/126494 (WO2021143299A1)
Publication of CN113139387A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application belongs to the technical field of computers, and in particular relates to a semantic error correction method based on artificial intelligence, an electronic device and a storage medium. The semantic error correction method comprises: obtaining text data; extracting, from the text data, a first entity, a second entity and a relation word that associates the first entity with the second entity; determining, from the first entity and the second entity according to a preset knowledge graph and the relation word, an entity to be corrected and a corrected entity corresponding to the entity to be corrected; and determining error-corrected text data according to the entity to be corrected and the corrected entity. Error correction of the text data is thereby achieved, the correct semantics of the text data are obtained, and the user intention is determined.

Description

Semantic error correction method, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a semantic error correction method based on Artificial Intelligence (AI), an electronic device, and a storage medium.
Background
Existing text recognition methods generally determine the user intention by hot-word matching. Because hot words are time-sensitive, hot-word matching cannot correct errors in time when the text data input by a user is wrong.
Disclosure of Invention
The embodiment of the application provides a semantic error correction method, electronic equipment and a storage medium, which can correct the acquired error text data.
In a first aspect, an embodiment of the present application provides a semantic error correction method, including: acquiring text data; extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data; determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected from the first entity and the second entity according to a preset knowledge graph and the relation words; and determining the text data after error correction according to the entity to be corrected and the entity after error correction.
In this embodiment, the text data is acquired, and the first entity, the second entity and the relation word associating the first entity with the second entity are extracted from the text data. The entity to be corrected and the corrected entity corresponding to the entity to be corrected are then determined from the first entity and the second entity according to the preset knowledge graph and the relation word. Because the knowledge graph is highly accurate, the entity to be corrected and the corrected entity determined from the knowledge graph and the relation word are also highly accurate. The error-corrected text data is determined according to the entity to be corrected and the corrected entity, so that the text data can be corrected in time when an error occurs and the real intention of the user can be recognized.
In a possible implementation manner of the first aspect, the determining, according to a preset knowledge graph and the relation term, an entity to be error-corrected and an error-corrected entity corresponding to the entity to be error-corrected from the first entity and the second entity includes:
determining a third entity in the knowledge-graph associated with the first entity and a fourth entity in the knowledge-graph associated with the second entity according to the relation words; and determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected according to the third entity and the fourth entity.
The first entity and the third entity are associated in the knowledge graph through the relation words, the second entity and the fourth entity are associated in the knowledge graph through the relation words, namely the entity extracted from the text data and the relation words are compared with the knowledge graph, the entity to be corrected and the entity after error correction corresponding to the entity to be corrected are determined, and the accuracy of the determined entity to be corrected and the entity after error correction is improved.
In a possible implementation manner of the first aspect, the determining, according to the third entity and the fourth entity, an entity to be error-corrected and an error-corrected entity corresponding to the entity to be error-corrected includes:
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity, wherein the first similarity is calculated according to the edit distance between the first entity and the fourth entity, and the second similarity is calculated according to the edit distance between the second entity and the third entity;
if the first similarity is greater than the second similarity, that is, the similarity between the first entity and the fourth entity is higher, the first entity is in error, and the first entity and the fourth entity are used as the entity to be corrected and the corrected entity, respectively;
and if the second similarity is greater than the first similarity, that is, the similarity between the second entity and the third entity is higher, the second entity is in error, and the second entity and the third entity are used as the entity to be corrected and the corrected entity, respectively.
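The comparison described above can be sketched in Python as follows. This is a minimal illustration and not the applicant's implementation: the normalisation of the edit distance into a similarity score, the function names, and the handling of a tie (returning None) are assumptions, since the text only states that the similarities are calculated according to the edit distance.

```python
from typing import Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; this normalisation is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def pick_correction(first: str, second: str,
                    third: str, fourth: str) -> Optional[Tuple[str, str]]:
    """Return (entity to be corrected, corrected entity), or None on a tie."""
    first_similarity = similarity(first, fourth)   # first entity vs fourth entity
    second_similarity = similarity(second, third)  # second entity vs third entity
    if first_similarity > second_similarity:
        return first, fourth
    if second_similarity > first_similarity:
        return second, third
    return None  # the tie case is not specified in the text
```

With hypothetical inputs, `pick_correction("Moonlight", "Alise", "Alice", "Sunrise")` selects the second entity "Alise" for correction to "Alice", because "Alise" is far closer to the third entity than "Moonlight" is to the fourth.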
In a possible implementation manner of the first aspect, the determining, according to the relation term, a third entity in the knowledge-graph that is associated with the first entity and a fourth entity in the knowledge-graph that is associated with the second entity includes:
if the relation words do not exist in the knowledge graph, obtaining synonyms corresponding to the relation words from a preset synonym thesaurus, determining a third entity related to the first entity in the knowledge graph according to the synonyms, and determining a fourth entity related to the second entity in the knowledge graph, so that the application range of semantic error correction is expanded, and the accuracy of semantic error correction is improved.
In a possible implementation manner of the first aspect, the determining, according to the relation word, a third entity in the knowledge graph associated with the first entity and a fourth entity in the knowledge graph associated with the second entity includes:
if a reverse word corresponding to the relation word exists in a preset reverse relation word library, determining a third entity in the knowledge graph associated with the first entity according to the relation word, and determining a fourth entity in the knowledge graph associated with the second entity according to the reverse word, so as to improve the efficiency of semantic error correction.
In a possible implementation manner of the first aspect, before the extracting the first entity, the second entity, and the relation word for associating the first entity and the second entity in the text data, the method further includes:
performing semantic recognition on the text data;
correspondingly, the extracting a first entity, a second entity and a relation word for associating the first entity and the second entity in the text data comprises:
and if the text data has semantic errors, extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data.
In the above embodiment, when the text data has a semantic error, the first entity, the second entity and the relation word for associating the first entity with the second entity in the text data are extracted to identify the erroneous entity in the first entity and the second entity, so that the text data having the semantic error is corrected, the user intention is determined according to the text data after error correction, and the user experience is improved.
In a second aspect, an embodiment of the present application provides a semantic error correction apparatus, including:
the acquisition module is used for acquiring text data;
the extraction module is used for extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data;
the determining module is used for determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected from the first entity and the second entity according to a preset knowledge graph and the relation words;
and the error correction module is used for determining the text data after error correction according to the entity to be error corrected and the entity after error correction.
In a possible implementation manner of the second aspect, the determining module includes:
the first determining unit is used for determining a third entity related to the first entity in the knowledge graph and determining a fourth entity related to the second entity in the knowledge graph according to the relation words;
and the second determining unit is used for determining an entity to be subjected to error correction and an entity subjected to error correction corresponding to the entity to be subjected to error correction according to the third entity and the fourth entity.
In a possible implementation manner of the second aspect, the second determining unit is specifically configured to:
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity;
if the first similarity is larger than the second similarity, the first entity and the fourth entity are respectively used as an entity to be corrected and an entity after error correction;
and if the second similarity is greater than the first similarity, the second entity and the third entity are respectively used as an entity to be corrected and an entity after error correction.
In a possible implementation manner of the second aspect, the second determining unit is further configured to:
and calculating the first similarity according to the edit distance between the first entity and the fourth entity, and calculating the second similarity according to the edit distance between the second entity and the third entity.
In a possible implementation manner of the second aspect, the first determining unit is specifically configured to:
if the relation words do not exist in the knowledge graph, obtaining synonyms corresponding to the relation words from a preset synonym word library;
determining a third entity in the knowledge-graph associated with the first entity and determining a fourth entity in the knowledge-graph associated with the second entity based on the synonyms.
In a possible implementation manner of the second aspect, the first determining unit is further configured to:
if a reverse word corresponding to the relation word exists in a preset reverse relation word library, determining a third entity in the knowledge graph associated with the first entity according to the relation word, and determining a fourth entity in the knowledge graph associated with the second entity according to the reverse word.
In one possible implementation manner of the second aspect, the semantic error correction apparatus further includes an identification module,
the recognition module is used for performing semantic recognition on the text data;
correspondingly, the extraction module is specifically configured to:
and if the text data has semantic errors, extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data.
In a possible implementation manner of the second aspect, the error correction module is further configured to:
and determining the user intention according to the corrected text data.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the semantic error correction method as described in the first aspect above when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program implements the semantic error correction method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the semantic error correction method according to the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
FIG. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a semantic error correction method provided in an embodiment of the present application;
FIG. 3 is a schematic illustration of a knowledge-graph provided by an embodiment of the present application;
FIG. 4 is a schematic illustration of a knowledge-graph provided in another embodiment of the present application;
FIG. 5 is a flow chart illustrating sub-steps of a semantic error correction method provided by an embodiment of the present application;
FIG. 6 is a schematic illustration of a knowledge-graph provided in accordance with yet another embodiment of the present application;
FIG. 7 is a schematic structural diagram of a semantic error correction apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The semantic error correction method provided by the embodiment of the application is applied to an electronic device. The electronic device may be a terminal device such as a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA) or a smart speaker. The electronic device may also be a server, which may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center. The embodiment of the present application does not set any limit to the specific type of the electronic device.
As shown in FIG. 1, in a possible implementation manner, the semantic error correction method provided in this embodiment is executed entirely in a server 1. The server 1 is in communication connection with a terminal device 2, such as a mobile phone. The terminal device 2 obtains voice data input by a user and sends the voice data to the server 1, and the server 1 converts the voice data into text data and recognizes the semantics of the text data. If the text data has a semantic error, the server 1 extracts a first entity, a second entity and a relation word associating the first entity with the second entity from the text data. A knowledge graph is stored on the server 1, and an entity to be corrected and a corrected entity corresponding to the entity to be corrected are determined from the first entity and the second entity according to the knowledge graph and the relation word. Because the knowledge graph is highly accurate, the entity to be corrected and the corrected entity determined from the knowledge graph and the relation word are also highly accurate. The error-corrected text data is then determined according to the corrected entity, so that erroneous text data is corrected in time, the intention of the user is identified according to the error-corrected text data, and the corresponding operation is executed.
It should be noted that, in other possible implementation manners, the semantic error correction method provided in the embodiment of the present application may also be executed entirely in the terminal device, or partially in the server and partially in the terminal device. For example, the knowledge graph is stored in the terminal device, and the terminal device converts the voice data into text data, determines the error-corrected text data, and performs the corresponding operation according to the error-corrected text data. Alternatively, the knowledge graph is stored on the server; the terminal device converts the voice data into text data and extracts the first entity, the second entity and the relation word from the text data; the server sends the corresponding knowledge graph to the terminal device according to the first entity, the second entity and the relation word; and the terminal device determines the error-corrected text data according to the knowledge graph and executes the corresponding operation according to the error-corrected text data.
The semantic error correction method provided by the embodiment of the present application is described in detail below, taking as an example the case in which the method is executed entirely in a server.
As shown in fig. 2, the semantic error correction method provided in the embodiment of the present application includes:
S101: acquiring text data.
The text data can be input into the terminal equipment by a user and sent to the server by the terminal equipment; or the terminal equipment collects voice data, the voice data are sent to the server, and the server converts the voice data into text data. Specifically, after acquiring voice data sent by the terminal device, the server performs noise reduction processing on the voice data, extracts voice features of the voice data, inputs the voice features into a preset voice recognition model, and then recognizes text data.
S102: performing semantic recognition on the text data.
Specifically, the server performs semantic recognition on the text data according to a preset semantic recognition model, recognizes a user intention corresponding to the text data, and executes corresponding operation according to the user intention. For example, if the identified text data is "what news is available today", after the server searches for the news, the server sends the news to the terminal device, and the terminal device plays the corresponding news.
S103: if the text data has a semantic error, extracting a first entity, a second entity and a relation word associating the first entity with the second entity from the text data.
Specifically, if the text data has a semantic error, the corresponding operation cannot be executed according to the identified semantics. In that case, a first entity, a second entity and a relation word are extracted from the text data having the semantic error, where each relation word is associated with one first entity and one second entity.
In a possible implementation manner, if the text data has a semantic error, the text data is first segmented into words, and the first entity, the second entity and the relation word are extracted according to the part of speech of each word. Exemplarily, the subject and the object among the segmented words are taken as entities, or a person name, an article name, an organization name, a place name, a song name or another proper noun among the segmented words is taken as an entity, and the word associating the first entity with the second entity is then taken as the relation word. Exemplarily, a noun between the first entity and the second entity is taken as the relation word; if there is no noun between the first entity and the second entity, the relation word is determined according to the relation between the first entity and the second entity. After the relation word and the entities are extracted, the entity located before the relation word in the text data is defined as the first entity, and the entity located after the relation word in the text data is defined as the second entity. For example, the text data is "play song Y sung by singer Z". The server cannot find a song Y corresponding to singer Z according to the identified semantics, so it determines that the text data has a semantic error. The text data is segmented, "singer Z" is identified as a person name and "song Y" as a song name, so the first entity is "singer Z", the second entity is "song Y", and the relation word is "singing".
For another example, the text data is "play singer Z's song Y". The server cannot find a song Y corresponding to singer Z according to the identified semantics, so it determines that the text data has a semantic error. The text data is segmented, "singer Z" is identified as a person name and "song Y" as a song name, so the first entity is "singer Z", the second entity is "song Y", and the relation word is determined to be "singing" or "performing" according to the person name and the song name. For another example, the text data is "play song C sung by celebrity B, the host of program A". The server cannot find the corresponding song according to the recognized semantics, so it determines that the text data has a semantic error. The text data is segmented: "program A" is a program name, "celebrity B" is a person name, "song C" is a song name, and "host" and "singing" are relation words. For the relation word "host", the first entity is "program A" and the second entity is "celebrity B"; for the relation word "singing", the first entity is "celebrity B" and the second entity is "song C".
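The rule that the entity before a relation word is the first entity and the entity after it is the second entity can be sketched as below. The sketch assumes that word segmentation and tagging have already been done; the tag names "ENT", "REL" and "O", the token order, and the function name are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag); the tags "ENT", "REL", "O" are hypothetical


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """For each relation word, pair the nearest entity before it (first entity)
    with the nearest entity after it (second entity)."""
    triples = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "REL":
            continue
        before = [w for w, t in tokens[:i] if t == "ENT"]
        after = [w for w, t in tokens[i + 1:] if t == "ENT"]
        if before and after:
            triples.append((before[-1], word, after[0]))
    return triples


# Assumed segmentation of the three-entity example, in source word order.
tokens = [
    ("play", "O"),
    ("program A", "ENT"),
    ("host", "REL"),
    ("celebrity B", "ENT"),
    ("singing", "REL"),
    ("song C", "ENT"),
]
triples = extract_triples(tokens)
```

Each resulting tuple has the form (first entity, relation word, second entity), so one sentence with two relation words yields two tuples that share the middle entity.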
In another possible implementation manner, if the text data has a semantic error, the text data is input into a preset entity extraction model, and the entities and relation words in the text data are extracted. The preset entity extraction model is obtained by training a classification model with a machine learning or deep learning algorithm, using text data and the corresponding first entity, second entity and relation word as training samples, so that the entity extraction model can extract the first entity, the second entity and the relation word from input text data. For example, the terminal device recognizes from the voice that the text data is "which other programs are hosted by celebrity B, the host of program A". No corresponding result is returned according to the recognized semantics, so it is determined that the text data has a semantic error. The first entity extracted by the entity extraction model is "program A", the second entity is "celebrity B", and the relation word is "host".
S104: and determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected from the first entity and the second entity according to a preset knowledge graph and the relation words.
The knowledge graph is a pre-established relationship graph describing the associations among entities, and different fields have different knowledge graphs. For example, FIG. 3 shows a knowledge graph of "celebrity L", and FIG. 4 shows a knowledge graph of "Guangdong Province". The two terms at the two ends of each line segment in the knowledge graph are entities, and the term associating the two entities is a relation word. For example, in FIG. 3, "celebrity L" and "actor, singer" are entities and "profession" is the relation word; "celebrity L" and "celebrity Q" are entities and "wife" is the relation word. In FIG. 4, "Guangdong Province" and "Guangzhou" are entities and "provincial capital" is the relation word; "Guangdong Province" and "Shenzhen" are entities and "special economic zone" is the relation word. It should be noted that FIG. 3 and FIG. 4 each show only a part of a knowledge graph.
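One simple way to hold such a graph in memory is a list of (entity, relation word, entity) triples with an index built from both ends of each line segment. The sketch below is illustrative only: the storage layout and the names `TRIPLES` and `build_index` are assumptions, and the entity and relation labels are paraphrased from the FIG. 3 and FIG. 4 examples.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (entity, relation word, entity)

# Edges paraphrased from the FIG. 3 and FIG. 4 examples.
TRIPLES: List[Triple] = [
    ("celebrity L", "profession", "actor, singer"),
    ("celebrity L", "wife", "celebrity Q"),
    ("Guangdong Province", "provincial capital", "Guangzhou"),
    ("Guangdong Province", "special economic zone", "Shenzhen"),
]


def build_index(triples: List[Triple]) -> Dict[str, Dict[str, Set[str]]]:
    """Index every edge from both ends: entity -> relation word -> linked entities."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for left, relation, right in triples:
        index[left][relation].add(right)
        index[right][relation].add(left)
    return index


index = build_index(TRIPLES)
relations_of_l = sorted(index["celebrity L"])  # relation words attached to one entity
```

Indexing both ends lets a lookup start from either entity of a line segment, which matches the way the later steps look up the entity at the other end of a relation word.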
And comparing the first entity and the second entity extracted from the text data with the entities in the knowledge graph to determine the entities needing error correction in the first entity and the second entity, and determining the corresponding entities after error correction from the knowledge graph.
In one possible implementation, as shown in fig. 5, S104 includes S201 and S202.
S201: determining a third entity in the knowledge-graph associated with the first entity according to the relation words, and determining a fourth entity in the knowledge-graph associated with the second entity.
Specifically, the knowledge graph is searched for entities identical to the first entity and the second entity, and a third entity corresponding to the first entity and the relation word and a fourth entity corresponding to the second entity and the relation word are determined in the knowledge graph. That is, in the knowledge graph, the first entity and the third entity are located at the two ends of one line segment and are associated through the relation word; likewise, the second entity and the fourth entity are located at the two ends of one line segment and are associated through the relation word.
For example, the recognized text data is "which other programs are hosted by celebrity B, the host of program A", which has a semantic error. The extracted first entity is "program A", the extracted second entity is "celebrity B", and the relation word is "host". An entity identical to the first entity, namely "program A", exists in the knowledge graph shown in FIG. 6, and the third entity corresponding to the first entity "program A" and the relation word "host" in the knowledge graph is "celebrity D". Similarly, the fourth entity corresponding to the second entity "celebrity B" and the relation word "host" in the knowledge graph is determined to be "program L".
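The lookup in this example can be sketched as follows. The two triples are only the edges that the text attributes to FIG. 6; the triple-list representation and the helper name `associated_entity` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, relation word, entity)

# Only the two edges of FIG. 6 that the example mentions.
FIG6_TRIPLES: List[Triple] = [
    ("program A", "host", "celebrity D"),
    ("program L", "host", "celebrity B"),
]


def associated_entity(triples: List[Triple], entity: str,
                      relation: str) -> Optional[str]:
    """Return the entity at the other end of a line segment labelled `relation`."""
    for left, rel, right in triples:
        if rel != relation:
            continue
        if left == entity:
            return right
        if right == entity:
            return left
    return None  # the entity or relation word is not in the graph


third = associated_entity(FIG6_TRIPLES, "program A", "host")     # from the first entity
fourth = associated_entity(FIG6_TRIPLES, "celebrity B", "host")  # from the second entity
```

Here `third` is "celebrity D" and `fourth` is "program L", the two candidates that step S202 then compares against the extracted entities.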
In a possible implementation manner, if the relation word in the text data does not exist in the knowledge graph, a synonym corresponding to the relation word is obtained from a preset synonym library. Illustratively, in the synonym library, "wife" and "husband" are synonyms, and "father" and "dad" are synonyms. For example, the recognized text data is "news about King G, the dad of King E"; according to the recognized semantics, the text data has a semantic error and no corresponding result is returned. The entities in the text data having the semantic error are extracted: the first entity is "King E", the second entity is "King G", and the relation word is "dad". The entity "King E", identical to the first entity, is found in the knowledge graph, but none of the relation words attached to "King E" is "dad". The synonym "father" of "dad" exists in the synonym library, so "dad" is replaced with "father", and the third entity determined in the knowledge graph according to the first entity "King E" and the relation word "father" is "King H".
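The synonym fallback can be sketched as below: the relation word from the text is tried first, and its synonyms are tried only if that lookup fails. The dictionary-based graph, the contents of `SYNONYMS`, and the function name are hypothetical stand-ins for the preset knowledge graph and synonym library.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical graph keyed by (entity, relation word) -> associated entity.
GRAPH: Dict[Tuple[str, str], str] = {("King E", "father"): "King H"}

# Hypothetical synonym library.
SYNONYMS: Dict[str, List[str]] = {"dad": ["father"], "father": ["dad"]}


def lookup_with_synonyms(graph: Dict[Tuple[str, str], str],
                         synonyms: Dict[str, List[str]],
                         entity: str,
                         relation: str) -> Optional[Tuple[str, str]]:
    """Try the relation word itself, then each of its synonyms.
    Returns (associated entity, relation word actually matched) or None."""
    for candidate in [relation, *synonyms.get(relation, [])]:
        if (entity, candidate) in graph:
            return graph[(entity, candidate)], candidate
    return None


result = lookup_with_synonyms(GRAPH, SYNONYMS, "King E", "dad")
```

In this example `result` is `("King H", "father")`: the graph has no "dad" edge for "King E", so the synonym "father" is used instead.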
In a possible implementation manner, if a reverse word corresponding to the relation word exists in a preset reverse relation word library, the third entity associated with the first entity in the knowledge graph is determined according to the relation word, and the fourth entity associated with the second entity in the knowledge graph is determined according to the reverse word. Illustratively, in the reverse relation word library, "father" and "dad" are each the reverse of "son", and "wife" and "husband" are reverse relation words of each other. For example, the recognized text data is "news of dad king G of king E", no corresponding result is returned according to the recognized semantics, the extracted first entity is "king E", the second entity is "king G", and the relation word is "dad". The reverse word "son" corresponding to "dad" exists in the reverse relation word library. The third entity determined from the knowledge graph according to the first entity "king E" and the relation word "dad" is "king H", and the fourth entity determined from the knowledge graph according to the second entity "king G" and the reverse word "son" is "king T".
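A minimal sketch of the reverse-word lookup follows, assuming dictionary-based toy data; `KG`, `REVERSE_WORDS`, and `resolve_pair` are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data following the "king E" / "king G" example.
KG: Dict[Tuple[str, str], str] = {
    ("king E", "dad"): "king H",
    ("king G", "son"): "king T",
}
REVERSE_WORDS: Dict[str, str] = {
    "dad": "son",
    "father": "son",
    "wife": "husband",
    "husband": "wife",
}


def resolve_pair(first: str, second: str, relation: str) -> Tuple[Optional[str], Optional[str]]:
    """Look up the third entity with the relation word and the fourth entity
    with its reverse word (falling back to the relation word itself)."""
    reverse = REVERSE_WORDS.get(relation, relation)
    third = KG.get((first, relation))
    fourth = KG.get((second, reverse))
    return third, fourth


third_entity, fourth_entity = resolve_pair("king E", "king G", "dad")
```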
In other possible implementation manners, if the first entity and the second entity extracted from the text data do not exist in the knowledge graph, the third entity and the fourth entity may be determined according to a sub-graph matching method. For example, the entity and relation word with the highest similarity to the first entity and the relation word are selected from the knowledge graph, and the other entity associated with the selected entity through the selected relation word is used as the third entity. Similarly, the entity and relation word with the highest similarity to the second entity and the relation word are selected from the knowledge graph, and the other entity associated with the selected entity through the selected relation word is used as the fourth entity.
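The highest-similarity selection can be approximated as below. The application does not specify the similarity measure used for sub-graph matching, so this sketch assumes a simple string ratio from Python's standard `difflib`; `KG`, `similarity`, and `closest_related_entity` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical toy knowledge graph: (entity, relation word) -> associated entity.
KG: Dict[Tuple[str, str], str] = {
    ("program A", "host"): "celebrity D",
    ("program L", "host"): "celebrity B",
    ("program A", "director"): "celebrity X",
}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; an assumed stand-in for the matching score."""
    return SequenceMatcher(None, a, b).ratio()


def closest_related_entity(entity: str, relation: str) -> Optional[str]:
    """Pick the (entity, relation word) pair in the graph that is most similar
    to the query pair and return the entity on the other end of that edge."""
    if not KG:
        return None
    best_key = max(
        KG,
        key=lambda key: similarity(entity, key[0]) + similarity(relation, key[1]),
    )
    return KG[best_key]


# "programme A" and "hosts" are not in the graph, so the closest pair is used.
third_entity = closest_related_entity("programme A", "hosts")
```

Summing the two scores weights the entity and the relation word equally, which is a design choice of this sketch rather than something stated in the application.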
S202: Determine, according to the third entity and the fourth entity, an entity to be corrected and an error-corrected entity corresponding to the entity to be corrected.
Specifically, the entity to be corrected is determined from the first entity and the second entity according to the third entity and the fourth entity. If the first entity is the entity to be corrected, the fourth entity is the error-corrected entity; if the second entity is the entity to be corrected, the third entity is the error-corrected entity.
In a possible implementation manner, a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity are calculated. If the first similarity is greater than the second similarity, that is, the first entity is more similar to the fourth entity, the user probably intended the fourth entity but incorrectly input the first entity; the first entity and the fourth entity are then used as the entity to be corrected and the error-corrected entity, respectively. If the second similarity is greater than the first similarity, that is, the second entity is more similar to the third entity, the user probably intended the third entity but incorrectly input the second entity; the second entity and the third entity are then used as the entity to be corrected and the error-corrected entity, respectively. The first similarity and the second similarity are calculated from an edit distance, which is a quantity representing the difference between two character strings, so a smaller edit distance corresponds to a higher similarity.
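The comparison can be sketched as follows. The application states only that the similarities are calculated from an edit distance; the Levenshtein distance, the length normalization, and the treatment of a missing entity as similarity 0 are assumptions of this sketch, and all function names are hypothetical.

```python
from typing import Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: Optional[str], b: Optional[str]) -> float:
    """Assumed mapping from edit distance to a similarity in [0, 1];
    a missing ("nonexistent") entity is given similarity 0."""
    if a is None or b is None:
        return 0.0
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def choose_correction(
    first: str, second: str, third: Optional[str], fourth: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return (entity to be corrected, error-corrected entity), or None on a tie."""
    first_similarity = similarity(first, fourth)
    second_similarity = similarity(second, third)
    if first_similarity > second_similarity:
        return first, fourth
    if second_similarity > first_similarity:
        return second, third
    return None


decision = choose_correction("Guangzhou", "Shenzhen", None, "Guangdong")
```

The application does not say what happens when the two similarities are equal; this sketch returns `None` in that case so the caller can leave the text unchanged.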
For example, the recognized text data is "which other programs does celebrity B, the host of program A, host", the extracted first entity is "program A", the second entity is "celebrity B", and the relation word is "host". According to the knowledge graph, the third entity is "celebrity D" and the fourth entity is "program L". The edit distance between the first entity "program A" and the fourth entity "program L" and the edit distance between the second entity "celebrity B" and the third entity "celebrity D" are calculated. If the difference between the character string "celebrity B" and the character string "celebrity D" is smaller than the difference between the character string "program A" and the character string "program L", the edit distance between the second entity "celebrity B" and the third entity "celebrity D" is smaller than the edit distance between the first entity "program A" and the fourth entity "program L", that is, the second similarity is greater than the first similarity. The second entity "celebrity B" is therefore taken as the entity to be corrected, and the third entity "celebrity D" is taken as the error-corrected entity.
For another example, the recognized text data is "what is the population of Shenzhen, the special economic zone of Guangzhou", the extracted first entity is "Guangzhou", the second entity is "Shenzhen", and the relation word is "special economic zone". No entity corresponding to the first entity "Guangzhou" and the relation word "special economic zone" exists in the knowledge graph, that is, the determined third entity is "empty" or "nonexistent", and the fourth entity determined according to the second entity "Shenzhen" and the relation word "special economic zone" is "Guangdong". Because the edit distance between the first entity "Guangzhou" and the fourth entity "Guangdong" is smaller than the edit distance between the second entity "Shenzhen" and the nonexistent third entity, that is, the first similarity is greater than the second similarity, the first entity "Guangzhou" is taken as the entity to be corrected, and the fourth entity "Guangdong" is taken as the error-corrected entity.
For another example, the recognized text data is "what is the climate like in Los Angeles, the capital of the United States", the extracted first entity is "United States", the second entity is "Los Angeles", and the relation word is "capital". In the knowledge graph, the capital of the United States is Washington, and the second largest city in the United States is Los Angeles. Thus, based on the knowledge graph, it is determined that the third entity corresponding to the first entity "United States" and the relation word "capital" is "Washington", and that the fourth entity corresponding to the second entity "Los Angeles" and the relation word "capital" is "nonexistent". Because the edit distance between the first entity "United States" and the nonexistent fourth entity is greater than the edit distance between the second entity "Los Angeles" and the third entity "Washington", that is, the first similarity is smaller than the second similarity, the second entity "Los Angeles" is taken as the entity to be corrected, and the third entity "Washington" is taken as the error-corrected entity.
S105: Determine the error-corrected text data according to the entity to be corrected and the error-corrected entity.
Specifically, the entity to be corrected is replaced with the error-corrected entity to obtain the error-corrected text data, and the user intention is determined according to the error-corrected text data. For example, if the recognized text data is "which other programs does celebrity B, the host of program A, host", the second entity "celebrity B" is the entity to be corrected, and the third entity "celebrity D" is the error-corrected entity, then the error-corrected text data is "which other programs does celebrity D, the host of program A, host". The server searches for the corresponding programs according to this text data and sends them to the terminal device, and the terminal device outputs the corresponding search result in voice or text form. For another example, if the recognized text data is "what is the population of Shenzhen, the special economic zone of Guangzhou", the first entity "Guangzhou" is the entity to be corrected, and the fourth entity "Guangdong" is the error-corrected entity, then the error-corrected text data is "what is the population of Shenzhen, the special economic zone of Guangdong". The server searches for the corresponding result according to this text data and sends the search result to the terminal device.
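The replacement step itself is a plain string substitution. A minimal sketch, assuming the entity to be corrected appears verbatim in the text (the function name `apply_correction` is hypothetical):

```python
def apply_correction(text: str, entity_to_correct: str, corrected_entity: str) -> str:
    """Replace the first occurrence of the entity to be corrected;
    return the text unchanged if that entity does not appear in it."""
    if entity_to_correct not in text:
        return text
    return text.replace(entity_to_correct, corrected_entity, 1)


corrected_text = apply_correction(
    "which other programs does celebrity B, the host of program A, host",
    "celebrity B",
    "celebrity D",
)
```

Only the first occurrence is replaced here; whether later occurrences should also change is not addressed in the application.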
In the above embodiment, the first entity, the second entity, and the relation word associating the first entity with the second entity are extracted from the text data, and the entity to be corrected and the corresponding error-corrected entity are determined from the first entity and the second entity according to the preset knowledge graph and the relation word. Because the knowledge graph is highly accurate, the entity to be corrected and the error-corrected entity determined according to the knowledge graph and the relation word are also highly accurate. The error-corrected text data is then determined according to the entity to be corrected and the error-corrected entity, so that erroneous text data input by the user is corrected in time and the intention of the user is accurately identified.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the semantic error correction method described in the foregoing embodiment, fig. 7 shows a structural block diagram of the semantic error correction apparatus provided in the embodiment of the present application, and for convenience of description, only the part related to the embodiment of the present application is shown.
Referring to fig. 7, the apparatus includes:
an obtaining module 10, configured to obtain text data;
an extracting module 20, configured to extract a first entity, a second entity, and a relation word for associating the first entity and the second entity in the text data;
a determining module 30, configured to determine, according to a preset knowledge graph and the relation term, an entity to be error-corrected and an error-corrected entity corresponding to the entity to be error-corrected from the first entity and the second entity;
and an error correction module 40, configured to determine the text data after error correction according to the entity to be error corrected and the entity after error correction.
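One way to picture how the four modules cooperate is the sketch below. It is an illustration only: the class `SemanticErrorCorrector`, its field names, and the stub functions passed to it are hypothetical, and each module is reduced to a callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SemanticErrorCorrector:
    # Role of obtaining module 10: supply the text data.
    obtain: Callable[[], str]
    # Role of extracting module 20: return (first entity, second entity, relation word).
    extract: Callable[[str], Tuple[str, str, str]]
    # Role of determining module 30: return (entity to be corrected, corrected entity) or None.
    determine: Callable[[str, str, str], Optional[Tuple[str, str]]]

    def correct(self) -> str:
        """Role of error correction module 40: produce the error-corrected text."""
        text = self.obtain()
        first, second, relation = self.extract(text)
        decision = self.determine(first, second, relation)
        if decision is None:
            return text
        wrong, right = decision
        return text.replace(wrong, right, 1)


# Stub callables standing in for real extraction and knowledge-graph logic.
corrector = SemanticErrorCorrector(
    obtain=lambda: "celebrity B hosts program A",
    extract=lambda text: ("program A", "celebrity B", "host"),
    determine=lambda first, second, relation: ("celebrity B", "celebrity D"),
)
corrected_text = corrector.correct()
```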
In one possible implementation, the determining module 30 includes:
a first determining unit, configured to determine, according to the relation word, a third entity associated with the first entity in the knowledge graph and a fourth entity associated with the second entity in the knowledge graph;
and a second determining unit, configured to determine, according to the third entity and the fourth entity, an entity to be corrected and an error-corrected entity corresponding to the entity to be corrected.
In a possible implementation manner, the second determining unit is specifically configured to:
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity;
if the first similarity is larger than the second similarity, the first entity and the fourth entity are respectively used as an entity to be corrected and an entity after error correction;
and if the second similarity is greater than the first similarity, the second entity and the third entity are respectively used as an entity to be corrected and an entity after error correction.
In a possible implementation manner, the second determining unit is further configured to:
and calculating the first similarity according to the edit distance between the first entity and the fourth entity, and calculating the second similarity according to the edit distance between the second entity and the third entity.
In a possible implementation manner, the first determining unit is specifically configured to:
if the relation words do not exist in the knowledge graph, obtaining synonyms corresponding to the relation words from a preset synonym word library;
determining a third entity in the knowledge-graph associated with the first entity and determining a fourth entity in the knowledge-graph associated with the second entity based on the synonyms.
In a possible implementation manner, the first determining unit is further configured to:
if a reverse word corresponding to the relation word exists in a preset reverse relation word library, determining a third entity in the knowledge graph associated with the first entity according to the relation word, and determining a fourth entity in the knowledge graph associated with the second entity according to the reverse word.
In a possible implementation manner, the semantic error correction apparatus further includes a recognition module,
the recognition module being configured to perform semantic recognition on the text data;
correspondingly, the extraction module 20 is specifically configured to:
and if the text data has semantic errors, extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data.
In a possible implementation manner, the error correction module 40 is further configured to:
and determining the user intention according to the corrected text data.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic apparatus of this embodiment includes: a processor 11 (only one processor is shown in fig. 8), a memory 12, and a computer program 13 stored in the memory 12 and operable on the processor 11, wherein the processor 11 executes the computer program 13 to implement the steps in the above-mentioned semantic error correction method embodiment, such as the steps S101 to S105 shown in fig. 2. Alternatively, the processor 11, when executing the computer program 13, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the modules 10 to 40 shown in fig. 7.
Illustratively, the computer program 13 may be partitioned into one or more modules/units, which are stored in the memory 12 and executed by the processor 11 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 13 in the terminal device.
Those skilled in the art will appreciate that fig. 8 is merely an example of an electronic device and does not constitute a limitation on it. The electronic device may include more or fewer components than those shown, may combine some components, or may use different components; for example, the electronic device may also include input/output devices, network access devices, buses, etc.
The Processor 11 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 12 may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 12 may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a flash memory Card (FlashCard), and the like, which are provided on the terminal device. Further, the memory 12 may also include both an internal storage unit and an external storage device of the terminal device. The memory 12 is used for storing the computer program and other programs and data required by the terminal device. The memory 12 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A semantic error correction method, comprising:
acquiring text data;
extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data;
determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected from the first entity and the second entity according to a preset knowledge graph and the relation words;
and determining the text data after error correction according to the entity to be corrected and the entity after error correction.
2. The semantic error correction method according to claim 1, wherein the determining an entity to be error-corrected and an error-corrected entity corresponding to the entity to be error-corrected from the first entity and the second entity according to a preset knowledge graph and the relation term comprises:
determining a third entity in the knowledge-graph associated with the first entity and a fourth entity in the knowledge-graph associated with the second entity according to the relation words;
and determining an entity to be corrected and an entity after error correction corresponding to the entity to be corrected according to the third entity and the fourth entity.
3. The semantic error correction method of claim 2, wherein the determining an entity to be error corrected and an error corrected entity corresponding to the entity to be error corrected according to the third entity and the fourth entity comprises:
calculating a first similarity between the first entity and the fourth entity and a second similarity between the second entity and the third entity;
if the first similarity is larger than the second similarity, the first entity and the fourth entity are respectively used as an entity to be corrected and an entity after error correction;
and if the second similarity is greater than the first similarity, the second entity and the third entity are respectively used as an entity to be corrected and an entity after error correction.
4. The semantic error correction method of claim 3, wherein the calculating a first similarity of the first entity to the fourth entity and a second similarity of the second entity to the third entity comprises:
and calculating the first similarity according to the edit distance between the first entity and the fourth entity, and calculating the second similarity according to the edit distance between the second entity and the third entity.
5. The method of semantic error correction according to claim 2, wherein the determining a third entity in the knowledge-graph associated with the first entity and a fourth entity in the knowledge-graph associated with the second entity according to the relationship term comprises:
if the relation words do not exist in the knowledge graph, obtaining synonyms corresponding to the relation words from a preset synonym word library;
determining a third entity in the knowledge-graph associated with the first entity and determining a fourth entity in the knowledge-graph associated with the second entity based on the synonyms.
6. The method of semantic error correction according to claim 2, wherein the first entity is located before the relation in the text data and the second entity is located after the relation in the text data, and wherein the determining a third entity in the knowledge-graph associated with the first entity and a fourth entity in the knowledge-graph associated with the second entity according to the relation comprises:
if a reverse word corresponding to the relation word exists in a preset reverse relation word library, determining a third entity in the knowledge graph associated with the first entity according to the relation word, and determining a fourth entity in the knowledge graph associated with the second entity according to the reverse word.
7. The semantic error correction method of claim 1, wherein prior to the extracting a first entity, a second entity, and a relationship term for associating the first entity and the second entity in the text data, the method further comprises:
performing semantic recognition on the text data;
correspondingly, the extracting a first entity, a second entity and a relation word for associating the first entity and the second entity in the text data comprises:
and if the text data has semantic errors, extracting a first entity, a second entity and relation words for associating the first entity and the second entity in the text data.
8. The semantic error correction method of claim 1, wherein after the determining the error corrected text data from the entity to be error corrected and the error corrected entity, the method further comprises:
and determining the user intention according to the corrected text data.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202010461387.6A 2020-01-17 2020-05-27 Semantic error correction method, electronic device and storage medium Pending CN113139387A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/126494 WO2021143299A1 (en) 2020-01-17 2020-11-04 Semantic error correction method, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020100539578 2020-01-17
CN202010053957.8A CN111291571A (en) 2020-01-17 2020-01-17 Semantic error correction method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113139387A true CN113139387A (en) 2021-07-20

Family

ID=71029073

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010053957.8A Pending CN111291571A (en) 2020-01-17 2020-01-17 Semantic error correction method, electronic device and storage medium
CN202010461387.6A Pending CN113139387A (en) 2020-01-17 2020-05-27 Semantic error correction method, electronic device and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202010053957.8A Pending CN111291571A (en) 2020-01-17 2020-01-17 Semantic error correction method, electronic device and storage medium

Country Status (2)

Country Link
CN (2) CN111291571A (en)
WO (1) WO2021143299A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938708A (en) * 2021-10-14 2022-01-14 咪咕文化科技有限公司 Live audio error correction method and device, computing device and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291571A (en) * 2020-01-17 2020-06-16 华为技术有限公司 Semantic error correction method, electronic device and storage medium
CN112016305B (en) * 2020-09-09 2023-03-28 平安科技(深圳)有限公司 Text error correction method, device, equipment and storage medium
CN112380848B (en) * 2020-11-19 2022-04-26 平安科技(深圳)有限公司 Text generation method, device, equipment and storage medium
CN112466307B (en) * 2020-11-19 2023-09-26 珠海格力电器股份有限公司 Voice replying method and device, storage medium and electronic device
CN113591457B (en) * 2021-07-30 2023-10-24 平安科技(深圳)有限公司 Text error correction method, device, equipment and storage medium
CN114048321A (en) * 2021-08-12 2022-02-15 湖南达德曼宁信息技术有限公司 Multi-granularity text error correction data set generation method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107741928A (en) * 2017-10-13 2018-02-27 四川长虹电器股份有限公司 A kind of method to text error correction after speech recognition based on field identification
KR20180113849A (en) * 2017-04-07 2018-10-17 주식회사 카카오 Method for semantic rules generation and semantic error correction based on mass data, and error correction system implementing the method
CN109508390A (en) * 2018-12-28 2019-03-22 北京金山安全软件有限公司 Input prediction method and device based on knowledge graph and electronic equipment
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 The semantic searching method and device of knowledge based map
CN109918640A (en) * 2018-12-22 2019-06-21 浙江工商大学 A kind of Chinese text proofreading method of knowledge based map
US20190392036A1 (en) * 2018-06-26 2019-12-26 International Business Machines Corporation Content analyzer and recommendation tool

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030220917A1 (en) * 2002-04-03 2003-11-27 Max Copperman Contextual search
CN110309258B (en) * 2018-03-15 2022-03-29 中国移动通信集团有限公司 Input checking method, server and computer readable storage medium
CN110489496A (en) * 2019-07-22 2019-11-22 腾讯科技(深圳)有限公司 A kind of data processing method, device, electronic equipment and storage medium
CN111291571A (en) * 2020-01-17 2020-06-16 华为技术有限公司 Semantic error correction method, electronic device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180113849A (en) * 2017-04-07 2018-10-17 주식회사 카카오 Method for semantic rules generation and semantic error correction based on mass data, and error correction system implementing the method
CN107741928A (en) * 2017-10-13 2018-02-27 四川长虹电器股份有限公司 A kind of method to text error correction after speech recognition based on field identification
US20190392036A1 (en) * 2018-06-26 2019-12-26 International Business Machines Corporation Content analyzer and recommendation tool
CN109522465A (en) * 2018-10-22 2019-03-26 国家电网公司 Semantic search method and device based on knowledge graph
CN109918640A (en) * 2018-12-22 2019-06-21 浙江工商大学 Chinese text proofreading method based on knowledge graph
CN109508390A (en) * 2018-12-28 2019-03-22 北京金山安全软件有限公司 Input prediction method and device based on knowledge graph and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938708A (en) * 2021-10-14 2022-01-14 咪咕文化科技有限公司 Live audio error correction method and device, computing device and storage medium
CN113938708B (en) * 2021-10-14 2024-04-09 咪咕文化科技有限公司 Live audio error correction method, device, computing equipment and storage medium

Also Published As

Publication number Publication date
CN111291571A (en) 2020-06-16
WO2021143299A1 (en) 2021-07-22

Similar Documents

Publication Publication Date Title
CN113139387A (en) Semantic error correction method, electronic device and storage medium
CN111046221A (en) Song recommendation method and device, terminal equipment and storage medium
WO2021135455A1 (en) Semantic recall method, apparatus, computer device, and storage medium
CN112214576B (en) Public opinion analysis method, public opinion analysis device, terminal equipment and computer readable storage medium
CN112115232A (en) Data error correction method and device and server
CN110162637B (en) Information map construction method, device and equipment
CN113032862A (en) Building information model checking method and device and terminal equipment
CN111738009B (en) Entity word label generation method, entity word label generation device, computer equipment and readable storage medium
CN114970514A (en) Artificial intelligence based Chinese word segmentation method, device, computer equipment and medium
CN109885180B (en) Error correction method and apparatus, computer readable medium
CN112748811A (en) English word input method and device
CN111858966A (en) Knowledge graph updating method and device, terminal equipment and readable storage medium
CN109508390B (en) Input prediction method and device based on knowledge graph and electronic equipment
CN112541357B (en) Entity identification method and device and intelligent equipment
CN110598112A (en) Topic recommendation method and device, terminal equipment and storage medium
CN115544204A (en) Bad corpus filtering method and system
CN115544214A (en) Event processing method and device and computer readable storage medium
CN112528646B (en) Word vector generation method, terminal device and computer-readable storage medium
CN113850643A (en) Product recommendation method and device, electronic equipment and readable storage medium
CN114528824A (en) Text error correction method and device, electronic equipment and storage medium
CN113836378A (en) Data processing method and device
CN113010642A (en) Semantic relation recognition method and device, electronic equipment and readable storage medium
CN111104790B (en) Method, apparatus, device and computer readable medium for extracting key relation
CN111241240B (en) Industry keyword extraction method and device
CN116205236B (en) Rapid data desensitization system and method based on named entity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination