CN114678027A - Error correction method and device for voice recognition result, terminal equipment and storage medium - Google Patents


Info

Publication number
CN114678027A
Authority
CN
China
Prior art keywords
information, keyword, recognition result, initial, target
Prior art date
Legal status
Pending
Application number
CN202011555097.4A
Other languages
Chinese (zh)
Inventor
盛佳琦
任希佳
Current Assignee
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN202011555097.4A priority Critical patent/CN114678027A/en
Priority to PCT/CN2021/140162 priority patent/WO2022135414A1/en
Publication of CN114678027A publication Critical patent/CN114678027A/en
Pending legal-status Critical Current

Classifications

    • G: Physics
    • G10: Musical instruments; Acoustics
    • G10L: Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination


Abstract

The invention discloses an error correction method and apparatus for a voice recognition result, a terminal device, and a storage medium. The method includes: acquiring a voice recognition result, and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information; determining a target keyword matching the initial keyword; and replacing the initial keyword with the target keyword and associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, thereby obtaining an error-corrected voice recognition result. The invention corrects the initial keyword corresponding to the intention information in the voice recognition result, determines the target keyword, and associates it with the intention information, so that error correction is achieved and voice instructions can be executed more accurately.

Description

Error correction method and device for voice recognition result, terminal equipment and storage medium
Technical Field
The present invention relates to the field of speech recognition error correction technologies, and in particular, to a method and an apparatus for correcting an error of a speech recognition result, a terminal device, and a storage medium.
Background
With the development of automatic speech recognition (ASR) technology and natural language processing (NLP) technology in the field of artificial intelligence, human-machine dialogue systems, such as chat robots, are being applied to a variety of terminal devices. In a dialogue, the terminal device converts the user's speech into text through an automatic speech recognition (ASR) algorithm, then recognizes the user's intention and key information through a natural language understanding algorithm, and finally executes the user's voice instruction or returns the required information. However, due to defects of the device itself or the user's dialect and accent, errors occur in the text converted from the speech; this further affects intention understanding and the recognition of key information, so that a correct voice recognition result cannot be returned.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
The present invention aims to provide an error correction method and apparatus for a voice recognition result, a terminal device, and a storage medium, so as to solve the prior-art problem that recognition errors are likely to occur during speech recognition, which affects intention understanding and the recognition of key information.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a method for correcting errors of speech recognition results, wherein the method includes:
acquiring a voice recognition result, and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information;
determining a target keyword matched with the initial keyword according to the initial keyword;
and replacing the initial keyword with the target keyword, associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, and obtaining an error-corrected voice recognition result.
In one implementation, the obtaining a voice recognition result and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information includes:
acquiring voice request information, and determining text information corresponding to the voice request information according to the voice request information;
determining the voice recognition result according to the text information;
performing word segmentation processing on the voice recognition result to obtain a word segmentation result;
and obtaining intention information in the voice recognition result and an initial keyword corresponding to the intention information according to the word segmentation result.
In one implementation, the determining, according to the initial keyword, a target keyword matched with the initial keyword includes:
matching the initial keywords with a preset mapping file, wherein target keywords corresponding to the initial keywords are arranged in the mapping file;
and if the matching is successful, acquiring a target keyword corresponding to the initial keyword.
In one implementation, the determining, according to the initial keyword, a target keyword matched with the initial keyword includes:
if the matching is unsuccessful, acquiring historical behavior data associated with the initial keyword according to the initial keyword;
and acquiring the text content with the use frequency greater than a preset threshold value in the historical behavior data, and taking the text content as the target keyword.
In one implementation, the determining, according to the initial keyword, a target keyword matched with the initial keyword includes:
acquiring entry information associated with the initial keyword according to the initial keyword;
and carrying out weight analysis on the entry information to determine the target keyword.
In one implementation manner, the obtaining entry information associated with the initial keyword according to the initial keyword includes:
respectively acquiring pinyin information, text information and character string information corresponding to the initial keyword;
and searching out first entry information corresponding to the pinyin information, second entry information corresponding to the text information and third entry information corresponding to the character string information according to the pinyin information, the text information and the character string information.
In one implementation, the performing weight analysis on the entry information and determining the target keyword includes:
carrying out duplication removal processing on the first entry information, the second entry information and the third entry information to obtain candidate entry information;
acquiring the weight corresponding to the candidate entry information, and calculating the score information of each candidate entry information;
and determining target entry information from the candidate entry information according to the score information, and taking the target entry information as the target keyword.
In one implementation, the method further comprises:
acquiring historical behavior data, and counting the use record of each target keyword in the historical behavior data;
determining, according to the use records, a cancellation count of the intention information corresponding to each target keyword, wherein the cancellation count reflects the number of times the execution of the operation corresponding to the intention information has been canceled;
and if the cancellation count exceeds a count threshold, disconnecting the association between the target keyword whose cancellation count exceeds the threshold and the intention information.
In a second aspect, an embodiment of the present invention further provides an apparatus for correcting an error of a speech recognition result, where the apparatus includes:
the initial keyword determining module is used for acquiring a voice recognition result and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information;
the target keyword determining module is used for determining a target keyword matched with the initial keyword according to the initial keyword;
and the voice recognition result error correction module is used for replacing the initial keyword with the target keyword, associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, and obtaining an error-corrected voice recognition result.
In a third aspect, an embodiment of the present invention further provides a terminal device, where the terminal device includes a memory, a processor, and an error correction program for a voice recognition result that is stored in the memory and executable on the processor; when the processor executes the error correction program, the steps of the method for correcting an error of a voice recognition result according to any one of the foregoing schemes are implemented.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where an error correction program for a voice recognition result is stored on the computer-readable storage medium, and when the error correction program is executed by a processor, the steps of the method for correcting an error of a voice recognition result according to any one of the above schemes are implemented.
Advantageous effects: compared with the prior art, the invention provides an error correction method for a voice recognition result. The initial keyword is obtained directly from the voice recognition result, and because the recognition may contain errors, the initial keyword may not completely match the intention information. To execute the voice instruction accurately, the initial keyword therefore needs to be corrected. The embodiment determines a corresponding target keyword from the initial keyword; the target keyword is the error-corrected form of the initial keyword, eliminates the deviation between the initial keyword and the intention information, and matches the intention information exactly. After the target keyword is obtained, it replaces the initial keyword and is associated with the intention information to form a corresponding relation between the intention information and the target keyword, yielding an error-corrected voice recognition result. The error correction of the voice recognition result is thereby completed, the voice instruction can be executed accurately, and the accuracy of speech recognition is improved.
Drawings
Fig. 1 is a flowchart of a specific implementation of a method for correcting a speech recognition result according to an embodiment of the present invention.
Fig. 2 is a flowchart of determining an initial keyword in the method for correcting a speech recognition result according to the embodiment of the present invention.
Fig. 3 is a flowchart of a first embodiment of determining a target keyword in a method for correcting a speech recognition result according to an embodiment of the present invention.
Fig. 4 is a flowchart of a second embodiment of determining a target keyword in the method for correcting a speech recognition result according to the embodiment of the present invention.
Fig. 5 is a schematic block diagram of an apparatus for correcting a speech recognition result according to an embodiment of the present invention.
Fig. 6 is a schematic block diagram of an internal structure of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
With the development of automatic speech recognition (ASR) technology and natural language processing (NLP) technology in the field of artificial intelligence, human-computer interaction (such as chat robots) is being applied to a variety of terminal devices. Simply by speaking, a user can control various devices by voice, accomplish set tasks, and chat with the devices freely. In a dialogue, the terminal device converts the user's speech into text through an automatic speech recognition (ASR) algorithm, then recognizes the user's intention and key information through a natural language understanding algorithm, and finally executes the user's voice instruction or returns the required information. However, due to defects of the device itself or the user's dialect and accent, errors occur in the text converted from the speech; this further affects intention understanding and the recognition of key information, so that a correct voice recognition result cannot be returned.
To solve the problems in the prior art, this embodiment provides an error correction method for a voice recognition result. With this method, after a voice instruction issued by a user is recognized, error correction can be performed on the voice recognition result, so that the voice instruction can be executed accurately and the accuracy of speech recognition is improved. In a specific implementation, this embodiment first obtains a voice recognition result and determines the intention information in it and the initial keyword corresponding to the intention information. The initial keyword is obtained directly from the voice recognition result, and because the recognition may contain errors, the initial keyword may not completely match the intention information. To execute the voice instruction accurately, the initial keyword therefore needs to be corrected. This embodiment determines a corresponding target keyword from the initial keyword; the target keyword is the error-corrected form of the initial keyword, eliminates the deviation between the initial keyword and the intention information, and matches the intention information exactly. After the target keyword is obtained, it replaces the initial keyword and is associated with the intention information to form a corresponding relation between the intention information and the target keyword, thereby completing the error correction of the voice recognition result.
For example, suppose the voice command the user sends to the smart speaker is "I want to listen to a song by Zhang Xueyou (张学友)", but the voice recognition result is "I want to listen to a song by Zhang Xueyou (张雪友)", in which the singer's name has been replaced by a homophone written with the wrong characters. The intention information obtained from the voice recognition result is "listen to songs", and the initial keyword is the misrecognized name "张雪友". Because "张雪友" is not a singer, the smart speaker cannot find any of the requested songs, so "张雪友" needs to be corrected. The smart speaker determines that the name really corresponding to "张雪友" is the singer "张学友", i.e., "张学友" is the target keyword; it therefore associates "张学友" with the intention information "listen to songs" to obtain the corresponding relation between them. The new voice recognition result is then "a song by 张学友", error correction of the voice recognition result is achieved, and the smart speaker can play the singer's songs to the user.
Exemplary method
The error correction method for a voice recognition result in this embodiment can be applied to terminal devices, such as smart phones, smart speakers, and smart televisions. As shown in fig. 1, the method specifically includes the following steps:
step S100, obtaining a voice recognition result, and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information.
Since this embodiment corrects the voice recognition result, the voice recognition result needs to be acquired first. In this embodiment, the voice recognition result is obtained by recognizing a voice command issued by a user, and it includes intention information and an initial keyword corresponding to the intention information. The intention information refers to the user intention expressed in the voice recognition result, i.e., it reflects the user's requirement; in the above example, if the voice recognition result is "I want to listen to a song by Zhang Xueyou", the intention information is "listen to songs". The initial keyword is obtained directly from the voice recognition result and corresponds to the intention information: it is the content on which the user's intention needs to be executed, equivalent to the slot value in fig. 6. In the same example, the initial keyword is the recognized singer name, i.e., the user wants to listen to songs by a singer of that name. Therefore, from the intention information and the initial keyword, it can be concluded that the requirement reflected by the user's voice command is to listen to that singer's songs.
In one implementation, as shown in fig. 2, the step S100 includes the following steps:
step S101, acquiring voice request information, and determining text information corresponding to the voice request information according to the voice request information;
step S102, determining the voice recognition result according to the text information;
step S103, performing word segmentation processing on the voice recognition result to obtain a word segmentation result;
and step S104, obtaining intention information in the voice recognition result and an initial keyword corresponding to the intention information according to the word segmentation result.
In a specific implementation, to obtain the voice recognition result, this embodiment first acquires the voice command issued by the user and performs speech recognition on it. Specifically, the embodiment first obtains the voice request information, which is the voice command issued by the user and expresses the user's requirement. The voice request information may be captured by a voice acquisition device on the terminal device, such as a microphone, and may be a sentence or a paragraph. Speech recognition is then performed on the voice request information to obtain the voice recognition result. In one implementation, the speech in the voice request information is converted into words through a speech recognition (ASR) model, so as to obtain the text information corresponding to the voice request information. Once the text information is obtained, the recognition of the voice request information is finished.
Then, this embodiment performs word segmentation on the text information to obtain the intention information and the keyword information. Since the text information is obtained by converting the voice request information into words, it reflects a sentence or a paragraph in the voice request information. Word segmentation cuts the text content into a number of words or phrases, i.e., the word segmentation result; semantic recognition is then performed on these words or phrases to determine the intention information and the initial keyword contained in the text information. For example, when the text information corresponding to the voice request information is "I want to watch the movie Nezha", word segmentation yields the result "I / want to watch / movie / Nezha", from which the intention information "watch a movie" and the initial keyword "Nezha" are obtained. Alternatively, after word segmentation, the word segmentation result may be input into a preset joint extraction model of intention information and keywords, which extracts the intention information and the keyword from the segmented words. For example, inputting the segmentation result "I / want to watch / movie / Nezha" into the joint extraction model likewise yields the intention information "watch a movie" and the initial keyword "Nezha".
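The segmentation-then-extraction flow described above can be sketched as follows. This is a minimal rule-based illustration, not the patent's implementation: a real system would use a trained Chinese word segmenter and a trained joint intention-and-keyword extraction model, and the phrase table below is an assumed placeholder.

```python
# Toy sketch of word segmentation followed by intent/keyword extraction.
# INTENT_PHRASES is an assumed placeholder for a trained joint model.
INTENT_PHRASES = {
    "watch the movie": "watch_movie",
    "listen to a song by": "listen_song",
}

def segment(text: str) -> list[str]:
    # Whitespace split stands in for a real word segmenter.
    return text.split()

def extract_intent_and_keyword(text: str) -> tuple[str, str]:
    for phrase, intent in INTENT_PHRASES.items():
        if phrase in text:
            # Treat everything after the intent phrase as the initial keyword.
            keyword = text.split(phrase, 1)[1].strip().rstrip(".")
            return intent, keyword
    return "unknown", ""

intent, keyword = extract_intent_and_keyword("I want to watch the movie Nezha")
# intent is "watch_movie", keyword is "Nezha"
```

The extracted pair (intention information, initial keyword) is exactly the structure the later error-correction steps operate on.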
In this embodiment, after the text information is obtained, error correction may first be performed on it; that is, wrongly written characters in the text information recognized from the voice request information are corrected, so as to ensure that the text information accurately reflects the user's requirement. In one implementation, a pre-trained error correction model (such as a Soft-Masked BERT model) may be used to correct the text information. If the text information contains no wrongly written characters, word segmentation can be performed on it directly to obtain the intention information and the initial keyword. If it does contain wrongly written characters, these are corrected first, and word segmentation is then performed on the corrected text information.
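As a rough illustration of this pre-correction step, the sketch below replaces near-miss tokens by edit-distance matching against a small lexicon. The patent itself uses a trained model such as Soft-Masked BERT, so `difflib` and the toy lexicon here are stand-in assumptions, not the claimed method.

```python
import difflib

# Assumed toy lexicon; a real system would use a trained correction model.
LEXICON = ["movie", "listen", "song", "Nezha"]

def correct_tokens(tokens: list[str]) -> list[str]:
    corrected = []
    for token in tokens:
        # Replace a token with its closest lexicon entry, if close enough.
        match = difflib.get_close_matches(token, LEXICON, n=1, cutoff=0.8)
        corrected.append(match[0] if match else token)
    return corrected

print(correct_tokens(["moviee", "Nezha"]))  # ['movie', 'Nezha']
```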
And S200, determining a target keyword matched with the initial keyword according to the initial keyword.
Since the speech recognition of the voice request information may contain errors, the initial keyword may not completely match the intention information. To execute the voice instruction accurately, the initial keyword needs to be corrected. This embodiment therefore determines a corresponding target keyword from the initial keyword, where the target keyword is the error-corrected form of the initial keyword and refers to the content on which the user's intention really needs to be executed. For example, in the above example, the voice recognition result is "I want to listen to a song by Zhang Xueyou", but the singer's name has been recognized with the wrong characters; since the misrecognized name is not a singer, the terminal device cannot find the requested songs. The misrecognized name therefore needs to be corrected, and the real singer name corresponding to it is determined as the target keyword. The target keyword in this embodiment eliminates the deviation between the initial keyword and the intention information and matches the intention information exactly.
In one implementation, as shown in fig. 3, the step S200 specifically includes the following steps:
step S201, matching the initial keywords with a preset mapping file, wherein target keywords corresponding to the initial keywords are arranged in the mapping file;
step S202, if the matching is successful, obtaining a target keyword corresponding to the initial keyword;
step S203, if the matching is unsuccessful, acquiring historical behavior data associated with the initial keyword according to the initial keyword;
and S204, acquiring the text content with the use frequency greater than a preset threshold value in the historical behavior data, and taking the text content as the target keyword.
In a specific implementation, after the initial keyword is obtained, this embodiment first determines whether error correction is needed; if not, the response operation can be performed directly according to the determined intention information and the initial keyword. For example, if the intention information is an instruction-type intention (e.g., turning a sound device on or off), the voice recognition result can be executed directly without correction, because instruction-type intentions are clear and unambiguous. However, if the intention information is a search-type intention (such as a movie search, music search, poetry search, takeaway-restaurant search, or scenic-spot search), the voice recognition result cannot be executed directly, because search-type intentions are rich in content and prone to ambiguity; the initial keyword must first be corrected.
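The routing decision just described (execute instruction-type intents directly, correct keywords only for search-type intents) can be sketched as below; the intent labels are illustrative assumptions.

```python
# Assumed intent labels; instruction-type intents are unambiguous and run
# directly, while search-type intents go through keyword error correction.
INSTRUCTION_INTENTS = {"turn_on_speaker", "turn_off_speaker", "volume_up"}
SEARCH_INTENTS = {"watch_movie", "listen_song", "search_poem", "find_restaurant"}

def needs_correction(intent: str) -> bool:
    # Only search-type intents require correcting the initial keyword.
    return intent in SEARCH_INTENTS

assert needs_correction("watch_movie")
assert not needs_correction("turn_off_speaker")
```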
In one implementation, in order to correct the error of the initial keyword, a preset mapping file may be used, where the mapping file is provided with target keywords corresponding to the initial keyword, that is, one initial keyword corresponds to one target keyword. Therefore, after the initial keyword is obtained, the initial keyword can be matched with the mapping file, and if the matching is successful, the matched target keyword is obtained, namely the keyword really corresponding to the intention information is found. For example, the mapping relationship between the initial keyword and the target keyword set in the mapping file includes:
"Dou Sen" (抖森, an actor's nickname) → "Tom Hiddleston"; "Zhi Fou" (知否, a truncated drama title) → "知否知否应是绿肥红瘦" (the full title).
In this way, when the initial keyword is "Zhi Fou", the target keyword corresponding to it in the mapping file is the full title "知否知否应是绿肥红瘦", which is thus obtained.
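Steps S201 and S202 amount to a dictionary lookup against the preset mapping file. In the sketch below the mapping pairs are placeholders standing in for the examples above; a real mapping file would be loaded from storage on the terminal device.

```python
# Assumed in-memory stand-in for the preset mapping file
# (initial keyword -> target keyword).
KEYWORD_MAP = {
    "Dou Sen": "Tom Hiddleston",
    "Zhi Fou": "Zhi Fou Zhi Fou Ying Shi Lü Fei Hong Shou",
}

def match_mapping(initial_keyword: str):
    # Returns the target keyword on a successful match, otherwise None
    # (in which case the fallback of steps S203-S204 applies).
    return KEYWORD_MAP.get(initial_keyword)

assert match_mapping("Dou Sen") == "Tom Hiddleston"
assert match_mapping("no such keyword") is None
```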
When the initial keyword is matched against the mapping file and the match fails, the mapping file contains no target keyword corresponding to the initial keyword. In that case, the target keyword matching the initial keyword can be determined from historical behavior data. In a specific implementation, this embodiment first obtains the historical behavior data in the terminal device, and then obtains from it the historical behavior data associated with the initial keyword. The historical behavior data reflects the operations performed by the terminal device over a past period, i.e., the voice commands received and executed, including intention information and the corresponding text content (the keyword corresponding to each piece of intention information). The data analyzed here is all related to the initial keyword in some way; for example, some text content in the historical behavior data is homophonic with, or similar in syllables to, the initial keyword, and is therefore associated with it. If the use frequency of a certain piece of text content in the historical behavior data is greater than a preset threshold, this indicates that that text content was used when the terminal device executed the intention corresponding to the initial keyword. It can therefore be concluded that the text content matches the initial keyword and really corresponds to the intention information, so it is taken as the target keyword.
For example, suppose the speech recognition result is "I want to watch the movie Milk Dumpling House", where the intention information is "watch a movie", the initial keyword is "Milk Dumpling House", and the mapping file does not contain a target keyword corresponding to "Milk Dumpling House". At this point the historical behavior data of the terminal device is obtained, and it contains "Grandma Dumpling House", which is similar in syllables to "Milk Dumpling House". The user has watched "Grandma Dumpling House" more than 10 times, each time for more than 20% of its length, so the use frequency of "Grandma Dumpling House" exceeds the threshold. Accordingly, "Grandma Dumpling House" matches "Milk Dumpling House"; that is, when the terminal device responds to the speech recognition result "I want to watch the movie Milk Dumpling House", the keyword actually used is "Grandma Dumpling House", which can therefore be taken as the target keyword.
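The history-based fallback described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the shape of the historical behavior data and the `similar` predicate (which stands in for the homophone/syllable-similarity check) are assumptions chosen for clarity.

```python
from collections import Counter

def target_from_history(initial_kw, history, similar, freq_threshold=10):
    """Pick a target keyword for initial_kw from historical behavior data.

    history:   list of (intention, keyword) pairs the device has executed
    similar:   hypothetical predicate deciding whether two keywords are
               homophones or close in syllables (assumed supplied elsewhere)
    Returns the most frequently used associated keyword if its use
    frequency exceeds freq_threshold, otherwise None.
    """
    # Count only the keywords that bear some relation to the initial keyword.
    counts = Counter(kw for _, kw in history if similar(kw, initial_kw))
    if not counts:
        return None
    kw, freq = counts.most_common(1)[0]
    return kw if freq > freq_threshold else None
```

In the movie example, "Grandma Dumpling House" appears more than 10 times in the history, so it would be returned as the target keyword for "Milk Dumpling House".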
In another implementation manner, as shown in fig. 4, step S200 in this embodiment may further include:
step S21, obtaining entry information associated with the initial keyword according to the initial keyword;
and step S22, carrying out weight analysis on the entry information and determining the target keyword.
Since the user sends voice request information at will and the content of the request is also arbitrary, the terminal device may not be able to determine the target keyword corresponding to the initial keyword directly from the mapping file; in this embodiment the target keyword therefore needs to be determined on the fly. In specific implementation, this embodiment first obtains entry information associated with the initial keyword according to the initial keyword. Pinyin information, text information, and character string information corresponding to the initial keyword may be obtained respectively; then first entry information corresponding to the pinyin information, second entry information corresponding to the text information, and third entry information corresponding to the character string information are searched out accordingly. That is, this embodiment provides three ways of searching for entry information, namely by the pinyin information, the text information, and the character string information of the initial keyword, yielding the first, second, and third entry information respectively.
For example, when the initial keyword is "the forest miraculous mirror", the entry information found according to the pinyin information is "is personally on the scene", since the two phrases share the same pinyin.
When the initial keyword is "the girl at the bridge edge", the entry information found according to the text information is "girl bridge", since the two share common characters.
When the initial keyword is "Tianqi commentary on the world", the entry information found according to the character string information is "my world angel commentary", since the initial keyword and "my world angel commentary" contain the same character string.
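The three search modes can be sketched as one lookup over a catalog of titles. This is a toy sketch under stated assumptions: `catalog` and `to_pinyin` are hypothetical stand-ins (a real system would use an actual Chinese-to-pinyin converter and an indexed entry store), and the character and substring checks are deliberately simplistic.

```python
def search_entries(initial_kw, catalog, to_pinyin):
    """Three-way entry lookup over a hypothetical title catalog.

    to_pinyin is a placeholder for a real pinyin converter; here it is
    any function mapping a title to a comparable key. Returns the
    (first, second, third) entry lists found by pinyin, by shared
    characters, and by shared character string, respectively.
    """
    key = to_pinyin(initial_kw)
    first = [t for t in catalog if to_pinyin(t) == key]                 # pinyin match
    second = [t for t in catalog if set(t) & set(initial_kw)]           # shared characters
    third = [t for t in catalog if initial_kw in t or t in initial_kw]  # shared string
    return first, second, third
```

With `str.lower` standing in for the pinyin converter, "mouse and mouse" would retrieve "cat and mouse" through the shared-character mode even though neither the pinyin nor the substring mode matches.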
After the entry information is found, this embodiment performs weight analysis on the entry information to determine the target keyword. In one implementation, deduplication processing may be performed on the first, second, and third entry information to remove repeated entries and obtain candidate entry information. The weight corresponding to each candidate entry is then obtained, and the score information of each candidate entry is calculated. Finally, target entry information is determined from the candidates according to the score information and taken as the target keyword. Specifically, this embodiment may preset the weights Wx, Wy, and Wz for searching entry information by the pinyin information, the text information, and the character string information of the initial keyword, respectively; since each candidate entry is found by one or more of these three search modes, the corresponding weights are available for it. Each search mode also has its own ranking: this embodiment ranks the results of each mode by accuracy, that is, by the similarity between the entry found and the initial keyword, giving each candidate a rank value per mode. The score information of each candidate can then be computed from these rank values and the mode weights. The formula is as follows:
score(A) = (Wx*X_A + Wy*Y_A + Wz*Z_A) / (Wx + Wy + Wz),
where X_A is the rank value of candidate entry information A in the accuracy ranking when searched according to the pinyin information, Y_A is its rank value in the accuracy ranking when searched according to the text information, and Z_A is its rank value in the accuracy ranking when searched according to the character string information.
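The scoring formula can be computed with a small helper; the function and argument names below are chosen for illustration only.

```python
def score(ranks, weights):
    """Weighted average of a candidate's per-mode rank values.

    ranks   = (X_A, Y_A, Z_A): accuracy-rank of candidate A in the
              pinyin, text, and character-string result lists
    weights = (Wx, Wy, Wz): preset weights of the three search modes
    """
    return sum(w * r for w, r in zip(weights, ranks)) / sum(weights)
```

With equal weights the score is just the mean rank; raising one weight pulls the score toward that mode's ranking.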
When the score information of each candidate entry has been calculated, this embodiment determines the target entry information with the best score; that entry is the one closest to the initial keyword and corresponding to the intention information, so it is taken as the target keyword. In one implementation, the weights in this embodiment may be customized per dimension of information to rank the candidate entries more accurately. For example, in a movie search, weights may be set according to a movie's release date, click-through rate, rating, and similar information, so that each candidate entry is analyzed from multiple dimensions and the target keyword best matching the intention information is screened out, correcting the initial keyword more accurately. In addition, after the target keyword is determined in this way, since the mapping file initially lacked a target keyword corresponding to the initial keyword, this embodiment can add the correspondence between the determined target keyword and its matching initial keyword to the mapping file, so that it can be used directly later. For example, if the initial keyword is "mouse and mouse" and, after the three search modes and the weight analysis, the determined target keyword is "cat and mouse", the correspondence between "mouse and mouse" and "cat and mouse" can be added to the mapping file; if "mouse and mouse" needs to be corrected again in a subsequent request, the matching target keyword "cat and mouse" can be obtained directly.
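The deduplication, scoring, selection, and mapping-file update steps can be combined in one sketch. All names are illustrative, not the patent's actual identifiers; a candidate absent from a result list is assumed here to get the worst rank for that mode, and a lower score means a more accurate match.

```python
def pick_target(first, second, third, weights, mapping, initial_kw):
    """Deduplicate the three result lists, score each candidate, pick
    the best one, and cache the correspondence in the mapping file."""
    candidates = list(dict.fromkeys(first + second + third))  # de-duplicate, keep order

    def rank(lst, c):
        # 1-based rank in a mode's list; worst possible rank if absent.
        return lst.index(c) + 1 if c in lst else len(lst) + 1

    def score(c):
        ranks = (rank(first, c), rank(second, c), rank(third, c))
        return sum(w * r for w, r in zip(weights, ranks)) / sum(weights)

    best = min(candidates, key=score)
    mapping[initial_kw] = best  # reused directly on the next correction
    return best
```

After one call, the mapping file holds the new correspondence, so a later "mouse and mouse" request can be corrected to "cat and mouse" without repeating the search.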
And step S300, replacing the initial keyword with the target keyword, associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, and obtaining an error-corrected voice recognition result.
After the target keyword is determined, it represents the result of correcting the initial keyword and reflects the content the user actually intends to have executed. Because of speech recognition errors, the initial keyword may not fully match the intention information, whereas the target keyword obtained by the above analysis does. Therefore, to correct the speech recognition result, the target keyword replaces the initial keyword and is associated with the intention information to form the corresponding relationship between the two, so that the speech recognition result carries the correspondence between the intention information and the target keyword and the terminal device can perform the corresponding operation accordingly.
In addition, after the correspondence between the target keyword and the intention information is constructed, this embodiment may obtain historical behavior data recording all intention information executed over a period of time together with the corresponding keywords. This embodiment can then count the usage record of each target keyword in the historical behavior data, and determine from the usage records the number of cancellations of the intention information corresponding to each target keyword, where the number of cancellations reflects how many times execution of the operation corresponding to that intention information was cancelled. For example, if the terminal device searches for the content corresponding to the intention information according to the target keyword and the user then issues an exit instruction, the operation was cancelled; that is, the content found with the target keyword may not be what the user wanted. If the number of cancellations exceeds a threshold (for example, three), the target keyword cannot retrieve the content required by the intention information, that is, the two do not truly match. This embodiment therefore disconnects the association between that target keyword and the intention information and deletes the correspondence between that target keyword and its initial keyword from the mapping file, ensuring that operations are performed more accurately according to the speech recognition result.
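The cancellation-based pruning step can be sketched as follows. The shape of the usage records and all names are assumptions for illustration; a real system would derive the cancellation flag from exit instructions in the behavior log.

```python
from collections import Counter

def prune_associations(mapping, usage_records, cancel_threshold=3):
    """Break keyword/intention links that get cancelled too often.

    usage_records: list of (target_kw, cancelled) tuples, where
    cancelled is True when the user exited right after execution.
    Removes from the mapping file every correspondence whose target
    keyword was cancelled more than cancel_threshold times.
    """
    cancels = Counter(kw for kw, cancelled in usage_records if cancelled)
    for init_kw, target_kw in list(mapping.items()):
        if cancels[target_kw] > cancel_threshold:
            del mapping[init_kw]
    return mapping
```

A target keyword cancelled four times is dropped, while one cancelled only once stays associated.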
In summary, this embodiment provides an error correction method for a speech recognition result. The method first obtains the speech recognition result and determines the intention information in it and the initial keyword corresponding to the intention information. The initial keyword is taken directly from the speech recognition result, and because the recognition may contain errors, it may not fully match the intention information. To execute the voice instruction accurately, the initial keyword must be corrected. This embodiment therefore determines a corresponding target keyword from the initial keyword; the target keyword is the error-corrected form of the initial keyword and eliminates the deviation between the initial keyword and the intention information, matching the intention information exactly. After the target keyword is obtained, it replaces the initial keyword and is associated with the intention information to form the corresponding relationship between the two, completing the error correction of the speech recognition result, so that the voice instruction can be executed accurately and the accuracy of speech recognition is improved.
Exemplary device
As shown in fig. 5, an embodiment of the present invention provides an apparatus for correcting a speech recognition result, including: an initial keyword determination module 10, a target keyword determination module 20, and a speech recognition result error correction module 30. Specifically, the initial keyword determination module 10 is configured to obtain a speech recognition result and determine intention information in the speech recognition result and an initial keyword corresponding to the intention information. The target keyword determination module 20 is configured to determine, according to the initial keyword, a target keyword that matches the initial keyword. The speech recognition result error correction module 30 is configured to replace the initial keyword with the target keyword and associate the target keyword with the intention information to form a corresponding relationship between the intention information and the target keyword, so as to obtain an error-corrected speech recognition result.
In one implementation, the initial keyword determination module 10 includes:
the text information acquisition unit is used for acquiring voice request information and determining text information corresponding to the voice request information according to the voice request information;
a recognition result determining unit configured to determine the speech recognition result according to the text information;
a word segmentation result acquisition unit, configured to perform word segmentation processing on the voice recognition result to obtain a word segmentation result;
and the initial keyword determining unit is used for obtaining intention information in the voice recognition result and an initial keyword corresponding to the intention information according to the word segmentation result.
In one implementation, the target keyword determination module 20 includes:
the mapping file matching unit is used for matching the initial keywords with a preset mapping file, and target keywords corresponding to the initial keywords are arranged in the mapping file;
and the target keyword determining unit is used for acquiring the target keyword corresponding to the initial keyword if the matching is successful.
In one implementation, the target keyword determination module 20 further includes:
the entry information acquisition unit is used for acquiring entry information related to the initial keyword according to the initial keyword;
and the weight analysis unit is used for carrying out weight analysis on the entry information and determining the target keyword.
Based on the above embodiments, the present invention further provides a terminal device, and a schematic block diagram thereof may be as shown in fig. 6. The terminal equipment comprises a processor, a memory, a network interface, a display screen and a temperature sensor which are connected through a system bus. Wherein the processor of the terminal device is configured to provide computing and control capabilities. The memory of the terminal equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The network interface of the terminal device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of error correction of speech recognition results. The display screen of the terminal equipment can be a liquid crystal display screen or an electronic ink display screen, and the temperature sensor of the terminal equipment is arranged in the terminal equipment in advance and used for detecting the operating temperature of the internal equipment.
It will be understood by those skilled in the art that the block diagram of fig. 6 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the terminal device to which the solution of the present invention is applied, and a specific terminal device may include more or less components than those shown in the figure, or may combine some components, or have different arrangements of components.
In one embodiment, a terminal device is provided, where the terminal device includes a memory, a processor, and an error correction program of a speech recognition result stored in the memory and executable on the processor, and when the processor executes the error correction program of the speech recognition result, the following operation instructions are implemented:
acquiring a voice recognition result, and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information;
determining a target keyword matched with the initial keyword according to the initial keyword;
and replacing the initial keyword with the target keyword, associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, and obtaining an error-corrected voice recognition result.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the present invention discloses a method, an apparatus, a terminal device and a storage medium for correcting a speech recognition result, wherein the method comprises: acquiring a voice recognition result, and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information; determining a target keyword matched with the initial keyword according to the initial keyword; and replacing the initial keyword with the target keyword, associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, and obtaining an error-corrected voice recognition result. The invention can correct the initial keywords corresponding to the intention information in the voice recognition result, determine the target keywords corresponding to the intention information, and associate the target keywords with the intention information, thereby realizing error correction and facilitating more accurate voice execution.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for correcting a speech recognition result, the method comprising:
acquiring a voice recognition result, and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information;
determining a target keyword matched with the initial keyword according to the initial keyword;
and replacing the initial keyword with the target keyword, and associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword so as to obtain an error-corrected voice recognition result.
2. The method for correcting the speech recognition result according to claim 1, wherein the obtaining the speech recognition result and determining intention information in the speech recognition result and an initial keyword corresponding to the intention information comprises:
acquiring voice request information, and determining text information corresponding to the voice request information according to the voice request information;
determining the voice recognition result according to the text information;
performing word segmentation processing on the voice recognition result to obtain a word segmentation result;
and obtaining intention information in the voice recognition result and an initial keyword corresponding to the intention information according to the word segmentation result.
3. The method for correcting the speech recognition result according to claim 1, wherein the determining the target keyword matched with the initial keyword according to the initial keyword comprises:
matching the initial keywords with a preset mapping file, wherein target keywords corresponding to the initial keywords are arranged in the mapping file;
and if the matching is successful, acquiring a target keyword corresponding to the initial keyword.
4. The method for correcting the speech recognition result according to claim 3, wherein the determining the target keyword matched with the initial keyword according to the initial keyword comprises:
if the matching is unsuccessful, acquiring historical behavior data associated with the initial keyword according to the initial keyword;
and acquiring the text content with the use frequency greater than a preset threshold value in the historical behavior data, and taking the text content as the target keyword.
5. The method for correcting the speech recognition result according to claim 1, wherein the determining the target keyword matched with the initial keyword according to the initial keyword comprises:
acquiring entry information associated with the initial keyword according to the initial keyword;
and carrying out weight analysis on the entry information to determine the target keyword.
6. The method for correcting the speech recognition result according to claim 5, wherein the obtaining entry information associated with the initial keyword according to the initial keyword comprises:
respectively acquiring pinyin information, character information and character string information corresponding to the initial keyword;
and searching out first entry information corresponding to the pinyin information, second entry information corresponding to the text information and third entry information corresponding to the character string information according to the pinyin information, the text information and the character string information.
7. The method of claim 6, wherein the performing a weight analysis on the entry information to determine the target keyword comprises:
carrying out duplication removal processing on the first entry information, the second entry information and the third entry information to obtain candidate entry information;
acquiring the weight corresponding to the candidate entry information, and calculating the score information of each candidate entry information;
and determining target entry information from the candidate entry information according to the score information, and taking the target entry information as the target keyword.
8. The method of correcting a speech recognition result according to any one of claims 1 to 7, further comprising:
acquiring historical behavior data, and counting the use record of each target keyword in the historical behavior data;
determining the cancellation times of the intention information corresponding to each target keyword according to the use records, wherein the cancellation times are used for reflecting the times of canceling and executing the operation corresponding to the intention information;
and if the cancellation times exceed a time threshold, disconnecting the association between the target keyword of which the cancellation times exceed the time threshold and the intention information.
9. An apparatus for correcting a speech recognition result, comprising:
the initial keyword determining module is used for acquiring a voice recognition result and determining intention information in the voice recognition result and an initial keyword corresponding to the intention information;
the target keyword determining module is used for determining a target keyword matched with the initial keyword according to the initial keyword;
and the voice recognition result error correction module is used for replacing the initial keyword with the target keyword and associating the target keyword with the intention information to form a corresponding relation between the intention information and the target keyword, so as to obtain an error-corrected voice recognition result.
10. A terminal device, characterized in that the terminal device comprises a memory, a processor and an error correction program of a speech recognition result stored in the memory and operable on the processor, and the processor implements the steps of the method for correcting an error of a speech recognition result according to any one of claims 1-8 when executing the error correction program of a speech recognition result.
11. A computer-readable storage medium, on which an error correction program for a speech recognition result is stored, which when executed by a processor implements the steps of the method for error correction of a speech recognition result according to any one of claims 1 to 8.
CN202011555097.4A 2020-12-24 2020-12-24 Error correction method and device for voice recognition result, terminal equipment and storage medium Pending CN114678027A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011555097.4A CN114678027A (en) 2020-12-24 2020-12-24 Error correction method and device for voice recognition result, terminal equipment and storage medium
PCT/CN2021/140162 WO2022135414A1 (en) 2020-12-24 2021-12-21 Speech recognition result error correction method and apparatus, and terminal device and storage medium

Publications (1)

Publication Number Publication Date
CN114678027A true CN114678027A (en) 2022-06-28



Also Published As

Publication number Publication date
WO2022135414A1 (en) 2022-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination