CN114360549A - Voice recognition error correction method and device, electronic equipment and storage medium - Google Patents

Voice recognition error correction method and device, electronic equipment and storage medium

Info

Publication number
CN114360549A
Authority
CN
China
Prior art keywords: text, information, corrected, error correction, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111642506.9A
Other languages
Chinese (zh)
Inventor
王娜
李霞
马宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Visual Technology Co Ltd
Original Assignee
Hisense Visual Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Visual Technology Co Ltd filed Critical Hisense Visual Technology Co Ltd
Priority to CN202111642506.9A priority Critical patent/CN114360549A/en
Publication of CN114360549A publication Critical patent/CN114360549A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
      • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 - Speech recognition
            • G10L15/26 - Speech to text systems
            • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L2015/223 - Execution procedure of a spoken command
            • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
              • G10L15/063 - Training
                • G10L2015/0635 - Training updating or merging of old and new templates; Mean values; Weighting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a voice recognition error correction method and device, an electronic device, and a storage medium. The method includes: acquiring voice information, converting it into a text to be verified, and displaying the text; when a voice instruction indicating that an erroneous text should be corrected to a target text is received, acquiring the text information of the erroneous text and the text information of the target text from the voice instruction, the text information including at least sound-shape code information; determining the text to be corrected within the text to be verified according to the text information of the erroneous text and of the target text; acquiring the error correction mapping relationship between the text to be corrected and the target text according to that text information; and correcting the text to be corrected into the target text based on the error correction mapping relationship. The method corrects errors that speech recognition is prone to make, thereby improving the accuracy of subsequent semantic error correction and the reach rate of media assets.

Description

Voice recognition error correction method and device, electronic equipment and storage medium
Technical Field
The present application relates to voice recognition technologies, and in particular, to a method and an apparatus for voice recognition error correction, an electronic device, and a storage medium.
Background
With the development of artificial intelligence, many electronic devices (e.g., smart televisions) support a voice search function, for example, a user says "search for a movie", and the electronic device searches for a movie.
However, the speech recognition stage of the voice search function is error-prone: homophones and similar-sounding characters are easily misrecognized, and recognition errors are also common when the user's pronunciation is non-standard. For example, the title of the movie the user wants to search for contains the character 郝 (hǎo), but it is recognized as its homophone 好 ("good"), so the wrong movie is displayed.
To improve speech recognition accuracy, errors that speech recognition is prone to make need to be corrected, but the prior art provides no sound method for correcting speech recognition errors in real time. How to correct such errors, and thereby improve semantic understanding accuracy and the media-asset reach rate of voice search, therefore remains a problem worth studying.
Disclosure of Invention
The application provides a voice recognition error correction method and device, an electronic device and a storage medium, which are used for correcting errors which are easy to occur in voice recognition so as to improve the accuracy of the voice recognition.
In one aspect, the present application provides a speech recognition error correction method, including:
acquiring voice information, converting the voice information into a text to be verified and displaying the text;
when a voice instruction for indicating that an error text is corrected to a target text is received, acquiring text information of the error text and text information of the target text from the voice instruction, wherein the text information at least comprises sound-shape code information;
determining a text to be corrected in the text to be verified according to the text information of the error text and the text information of the target text;
acquiring an error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text;
and correcting the text to be corrected into the target text based on the error correction mapping relation.
Optionally, the obtaining of the error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text includes:
acquiring an error correction mapping relation between the text to be corrected and the target text from a plurality of stored error correction mapping relations according to the text information of the error text and the text information of the target text;
and when the plurality of stored error correction mapping relations do not have the error correction mapping relation between the text to be corrected and the target text, establishing the error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text.
Optionally, the determining, according to the text information of the error text and the text information of the target text, a text to be corrected in the text to be verified includes:
generating the sound-shape code information of each character in the text to be verified;
determining, through a sound-shape code similarity algorithm, the similarity between the sound-shape code information of each character in the text to be verified and the sound-shape code information of the target text;
determining, as the text to be corrected, the text composed of the characters in the text to be verified whose similarity reaches a preset similarity;
the establishing of the error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text comprises:
establishing a mapping relation between the font information of the error text and the font information of the target text;
and establishing an error correction mapping relationship between the text to be corrected and the target text according to the sound-shape code information of the target text (or of the text to be corrected) and the mapping relationship between the font information of the erroneous text and the font information of the target text.
Optionally, the method further includes:
acquiring text information of a plurality of corrected texts and text information of texts to be corrected corresponding to the plurality of corrected texts;
and establishing and storing the plurality of error correction mapping relations according to the text information of the plurality of corrected texts and the text information of the text to be corrected corresponding to the plurality of corrected texts.
Optionally, the obtaining the text information of the error text and the text information of the target text from the voice instruction includes:
converting the voice instruction into an instruction text;
and acquiring text information of the error text and text information of the target text indicated by the instruction text through a sequence labeling algorithm.
Optionally, the method further includes:
and when the received voice instruction is not used for indicating that the error text is corrected to be the target text, storing the text to be verified.
In another aspect, the present application provides a speech recognition error correction apparatus, including:
the voice conversion module is used for acquiring voice information, converting the voice information into a text to be verified and displaying the text;
the acquiring module is used for acquiring text information of the wrong text and text information of the target text from a voice instruction when the voice instruction for indicating that the wrong text is corrected to the target text is received, wherein the text information at least comprises sound-shape code information;
the processing module is used for determining a text to be corrected in the text to be verified according to the text information of the error text and the text information of the target text;
the obtaining module is further configured to obtain an error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text;
and the error correction module is used for correcting the text to be corrected into the target text based on the error correction mapping relation.
Optionally, the obtaining module is specifically configured to:
acquiring an error correction mapping relation between the text to be corrected and the target text from a plurality of stored error correction mapping relations according to the text information of the error text and the text information of the target text;
and when the plurality of stored error correction mapping relations do not have the error correction mapping relation between the text to be corrected and the target text, establishing the error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text.
In another aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the speech recognition error correction method according to the first aspect.
In another aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed, cause a computer to perform the speech recognition error correction method according to the first aspect.
In another aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a speech recognition error correction method as described in the first aspect.
After the voice recognition error correction method provided by the embodiments of the application acquires voice information, converts it into a text to be verified, and displays the text, if an error correction instruction actively initiated by a user or by another device is received, the text information of the erroneous text and of the target text is acquired from the instruction. The text to be corrected within the text to be verified is then determined according to that text information, and the error correction mapping relationship between the text to be corrected and the target text is acquired according to it. When the correction is performed, the text to be corrected is corrected into the target text according to this mapping relationship, which may have been stored previously or may be newly established. Thus, when speech recognition goes wrong, the user actively issues a correction instruction, the electronic device corrects the text to be verified displayed in the first round (the preceding text), and the correct text is displayed in the second round (the following text). This realizes real-time, user-initiated error correction of the places where speech recognition went wrong, and once a correction has been made, the same error occurring in subsequent speech recognition can be handled, which greatly improves the accuracy of subsequent semantic error correction.
In addition, the voice recognition error correction method provided by the embodiments of the application continuously updates the error correction mapping relationships stored in the ES cache, that is, it memorizes many error correction mapping relationships and error correction results; at the next correction, the correct text can be displayed directly from a stored error correction result, or the correction can be performed from a stored mapping relationship. This greatly improves the accuracy of subsequent semantic error correction and the reach rate of media assets.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic view of an application scenario of the speech recognition error correction method provided in the present application.
Fig. 2 is a flowchart illustrating a speech recognition error correction method according to an embodiment of the present application.
Fig. 3 is a partial flowchart of a speech recognition error correction method according to an embodiment of the present application.
Fig. 4 is another schematic diagram of a speech recognition error correction method according to an embodiment of the present application.
Fig. 5 is a partial flowchart of a speech recognition error correction method according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a speech recognition error correction apparatus according to an embodiment of the present application.
Fig. 7 is a schematic diagram of an electronic device provided in an embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
With the development of artificial intelligence, many electronic devices (e.g., smart televisions) support a voice search function: the user says "search for a movie", and the electronic device searches for the movie. However, the speech recognition stage of the voice search function is error-prone: homophones and similar-sounding characters are easily misrecognized, and recognition errors are also common when the user's pronunciation is non-standard. For example, the title of the movie the user wants to search for contains the character 郝 (hǎo), but it is recognized as its homophone 好 ("good"), so the wrong movie is displayed. To improve speech recognition accuracy, such errors need to be corrected; in the prior art, however, after speech recognition goes wrong, the user cannot actively correct the error in real time, and the electronic device can only display the wrong recognition result. The prior art therefore lacks a sound method for real-time correction of speech recognition errors, and how to correct errors that speech recognition is prone to make, so as to improve semantic understanding accuracy and the media-asset reach rate of voice search, remains a problem worth studying.
Based on this, the application provides a voice recognition error correction method and device, an electronic device, and a storage medium. After voice information is acquired, converted into a text to be verified, and displayed, if a voice instruction indicating error correction is received from the user or from another device, the text to be corrected within the text to be verified is corrected into the target text. Specifically, the text to be corrected is corrected into the target text according to the error correction mapping relationship between the two, which may have been stored previously or may be newly established. Thus, when speech recognition goes wrong, the user actively issues a voice instruction indicating error correction, the electronic device corrects the text to be verified displayed in the first round (the preceding text), and the text displayed in the second round (the following text) meets the user's expectation. This realizes real-time, user-initiated error correction of the places where speech recognition went wrong, and once a correction has been made, the same error occurring in subsequent speech recognition can be handled, which greatly improves the accuracy of subsequent semantic error correction.
The voice recognition error correction method provided by the application applies to electronic devices such as smart televisions, tablet computers, and mobile phones. Fig. 1 is a schematic application diagram of the method: the electronic device converts the acquired voice information into a text to be verified (for example, the voice information is converted into the "xx 好 xx" shown in fig. 1 and displayed). The user finds that 好 is wrong and should be 郝, and can send the electronic device a voice instruction indicating error correction (e.g., "not this 好, but the 郝 with 赤 on the left and the ear radical on the right"). When the electronic device receives this voice instruction, it acquires the text information of the erroneous text (好) and of the target text (郝) from the instruction, determines the text to be corrected within the text to be verified according to that text information (in fig. 1 the text to be corrected is 好), and corrects the text to be corrected into the target text (corrects 好 to 郝) according to the error correction mapping relationship between the text to be corrected and the target text.
Referring to fig. 2, an embodiment of the present application provides a method for correcting errors in speech recognition, including:
s210, voice information is obtained, and the voice information is converted into a text to be verified and displayed.
The voice information is search-type voice input from outside the device (e.g., from a user); for example, the voice information is "I want to watch xx 郝 xx", where "xx 郝 xx" is a movie title. When the electronic device converts the voice information into the text to be verified, a recognition error may occur, and the displayed text to be verified may then be "I want to watch xx 好 xx".
S220, when a voice instruction for indicating that the error text is corrected to be the target text is received, acquiring text information of the error text and text information of the target text from the voice instruction, wherein the text information at least comprises the sound-shape code information.
When the text to be verified displayed by the electronic device is wrong, the user can issue another voice instruction to correct it, for example "not this 好, but the 郝 with 赤 on the left and the ear radical on the right". Upon receiving this voice instruction for correcting the erroneous text (好) into the target text (郝), the pinyin information P11 = {['hǎo']} and the font information S11 = {['女', '子']} of the erroneous text are acquired from the voice instruction, together with the pinyin information P12 = {['hǎo']} and the font information S12 = {['赤', '阝']} of the target text.
Optionally, when the text information of the erroneous text and of the target text is obtained, the voice instruction is first converted into an instruction text, and the text information of the erroneous text and of the target text indicated by the instruction text is then obtained through a sequence labeling algorithm. The sequence labeling algorithm first labels the erroneous text and the target text in the instruction text, and, when determining the text information of the target text, also labels the text that describes the target text. After labeling, the instruction text reads: "not the above [好], but on the [left] a (赤) as in (赤裸裸), on the [right] an (ear radical), the [郝]". When determining the text information of the target text, a sound-shape code P21L = {['chì'], ['luǒ'], ['luǒ']}, font information S21L = {['赤'], ['果'], ['果']} and radical information C23R = {['ear radical (阝)']} are determined from "on the [left] a (赤) as in (赤裸裸), on the [right] an (ear radical)". According to the sound-shape code, font information and radical information so determined, the font information of the target text is determined to be S12 = {['赤', '阝']}.
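For illustration only, the following Python sketch mimics this extraction step with a hypothetical rule-based tagger operating on the English rendering of the instruction text used above; the real method uses a trained sequence labeling algorithm, and every function name, pattern and field here is an assumption made for the example.

```python
import re

def label_instruction(instruction: str) -> dict:
    """Rough stand-in for the sequence labeling step of S220: mark the erroneous
    text and the descriptions of the target text's left/right components."""
    error_match = re.search(r"not this (\S+?),", instruction)
    left_match = re.search(r"left[^,]*?\(([^)]+)\)", instruction)
    right_match = re.search(r"right[^,]*?\(([^)]+)\)", instruction)
    return {
        "error_text": error_match.group(1) if error_match else None,
        "left_component": left_match.group(1) if left_match else None,
        "right_component": right_match.group(1) if right_match else None,
    }

# Example with the utterance used in the text:
# label_instruction("not this 好, but left side a (赤), right side an (ear radical) 郝")
# -> {'error_text': '好', 'left_component': '赤', 'right_component': 'ear radical'}
```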
Optionally, if the received voice instruction is not an instruction to correct the erroneous text in the text to be verified into the target text, the text to be verified is stored. For example, if the user finds that the text to be verified is correct, or does not want to correct it, and issues another voice instruction (e.g., "I want to listen to music"), that instruction is executed and the text to be verified is stored directly.
Optionally, as shown in fig. 3, to recognize whether a voice instruction indicates error correction, the voice instruction may first be recognized as an instruction text, and the instruction text may then be parsed with an ABNF semantic rule analysis method. If the instruction text indicates error correction, the text information of the erroneous text and of the target text indicated by the instruction text is obtained through the sequence labeling algorithm, and then the error correction mapping relationship between the text to be corrected and the target text described in step S240 is acquired.
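As a hedged illustration of that check (the patent relies on ABNF semantic rules, which are not reproduced here), a single regular expression can stand in for one such rule; the pattern and function below are assumptions made for the example only.

```python
import re

# Simplified stand-in for one ABNF-style error-correction rule, covering
# utterances shaped like "not this X, ... Y".
CORRECTION_PATTERN = re.compile(r"^not (?:this |the )?(?P<error>\S+?),.*\S")

def is_correction_instruction(instruction_text: str) -> bool:
    """Return True if the recognized instruction text indicates error correction;
    otherwise the instruction is executed normally and the text to be verified is stored."""
    return CORRECTION_PATTERN.match(instruction_text) is not None
```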
S230, determining the text to be corrected in the text to be verified according to the text information of the erroneous text and the text information of the target text.
For example, the text information includes sound-shape code information, and the sound-shape code information includes pinyin information and font information; the text to be corrected is then determined to be 好 within the text to be verified according to the pinyin information P11 = {['hǎo']} of the erroneous text and of the target text, and the font information S11 = {['女', '子']} of the erroneous text.
Optionally, the sound-shape code information (including pinyin information and font information) of each character in the text to be verified may be generated. If the text to be verified is, for example, "I want to watch xx 好 xx", the pinyin information obtained for its characters is PP = {['wǒ'], ['xiǎng'], ['kàn'], …, ['hǎo'], …}. The similarity between the pinyin information of each character in the text to be verified and the pinyin information P11 = {['hǎo']} of the target text (or of the erroneous text) is then determined with a sound-shape code similarity algorithm, and the text composed of the characters whose similarity reaches a preset similarity is determined to be the text to be corrected (好). The preset similarity should be set as high as possible so that the text to be corrected can be located accurately.
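A minimal sketch of how such a similarity search might be realized is given below. It assumes a normalized edit-distance ratio over pinyin strings and uses the third-party pypinyin package to generate one pinyin per character; the threshold value and all names are illustrative choices, not taken from the patent.

```python
from difflib import SequenceMatcher
from pypinyin import lazy_pinyin  # third-party package: pip install pypinyin

def pinyin_similarity(a: str, b: str) -> float:
    """Normalized similarity between two pinyin strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

def find_text_to_correct(text_to_verify: str, target_pinyin: str,
                         threshold: float = 0.9) -> str:
    """Collect the characters of the text to be verified whose pinyin is close
    enough to the target's pinyin; together they form the text to be corrected."""
    per_char_pinyin = [lazy_pinyin(ch)[0] for ch in text_to_verify]
    matched = [ch for ch, py in zip(text_to_verify, per_char_pinyin)
               if pinyin_similarity(py, target_pinyin) >= threshold]
    return "".join(matched)

# e.g. find_text_to_correct("我想看xx好xx", "hao") would return "好",
# which is then taken as the text to be corrected.
```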
S240, acquiring the error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text.
The error correction mapping relationship between the text to be corrected and the target text is, for example: {"Pronunciation": "['hǎo']", "ErrorCharacters": "['女子']", "CorrectCharacters": "['赤阝']"}.
Optionally, a plurality of error correction mapping relationships have already been stored in a recognition-text cache (ES cache) of the electronic device; these mapping relationships are obtained from texts that were corrected previously. Specifically, the text information of a plurality of corrected texts and the text information of the texts to be corrected corresponding to them is acquired, and the plurality of error correction mapping relationships is established and stored according to that text information. For example, the electronic device may ship from the factory with cached error correction mapping relationships established for speech that is frequently misrecognized in everyday use, and it also stores the mapping relationships established each time a text correction is subsequently performed.
Therefore, when the error correction mapping relationship between the text to be corrected and the target text is acquired according to the text information of the erroneous text and of the target text, it can preferentially be looked up in the ES cache; that is, as shown in fig. 4, it is first determined whether there is a matching result in the ES cache. An error correction mapping relationship is matched from the stored mapping relationships according to the text information of the erroneous text and of the target text; if a match is found, the result stored in the ES cache is obtained, namely the error correction mapping relationship between the text to be corrected in the text to be verified and the target text.
If the error correction mapping relationship between the text to be corrected and the target text cannot be obtained from the ES cache, it needs to be established. As shown in fig. 4, the text information of the erroneous text and of the target text may be confirmed by the sequence labeling algorithm described in step S220, and the text to be corrected may be confirmed by the sound-shape code similarity algorithm described in step S230. The error correction mapping relationship between the text to be corrected and the target text is then established according to the pinyin information of the target text (or of the text to be corrected) and the mapping relationship between the font information of the erroneous text and the font information of the target text.
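A minimal sketch of this cache-first lookup follows, assuming a plain in-memory dictionary in place of the real ES cache; the patent does not specify the cache interface, and all keys and field names here are invented for the example.

```python
# Hypothetical in-memory stand-in for the ES cache of error correction mappings,
# keyed by (pinyin of the target text, font components of the erroneous text).
es_cache: dict[tuple[str, str], dict] = {}

def get_or_create_mapping(error_info: dict, target_info: dict) -> dict:
    """Return the stored error correction mapping for this error/target pair,
    or establish a new one and cache it (step S240)."""
    key = (target_info["pinyin"], error_info["components"])
    mapping = es_cache.get(key)
    if mapping is None:
        mapping = {
            "Pronunciation": target_info["pinyin"],          # e.g. "hǎo"
            "ErrorCharacters": error_info["components"],     # e.g. "女子"
            "CorrectCharacters": target_info["components"],  # e.g. "赤阝"
        }
        es_cache[key] = mapping
    return mapping
```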
S250, correcting the text to be corrected in the text to be verified into the target text based on the error correction mapping relationship between the text to be corrected and the target text.
The error correction mapping relationship between the text to be corrected and the target text is, for example, {"Pronunciation": "['hǎo']", "ErrorCharacters": "['女子']", "CorrectCharacters": "['赤阝']"}; after the text to be corrected in the text to be verified is corrected into the target text based on this mapping relationship, the resulting error correction result is CorrectResult = {"Pronunciation": "['hǎo']", "ErrorCharacters": "['女子']", "CorrectCharacters": "['赤阝']", "CorrectText": "['I want to watch xx 郝 xx']"}.
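The correction step itself can then be pictured as a direct replacement driven by the mapping. The sketch below assumes the text to be corrected has already been located in step S230; the function and field names are illustrative, not the patent's.

```python
def apply_correction(text_to_verify: str, text_to_correct: str,
                     target_text: str, mapping: dict) -> dict:
    """Replace the located text to be corrected with the target text (step S250)
    and package the outcome in the same shape as the CorrectResult example above."""
    corrected = text_to_verify.replace(text_to_correct, target_text)
    return {**mapping, "CorrectText": corrected}

# e.g. apply_correction("I want to watch xx 好 xx", "好", "郝", mapping)
# -> {..., "CorrectText": "I want to watch xx 郝 xx"}
```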
As shown in fig. 4, after the text to be verified has been corrected, normal semantic understanding logic is executed, and the video displayed by the electronic device is now "xx 郝 xx".
Optionally, as shown in fig. 4, it may be detected again whether the error correction mapping relationship between the text to be corrected and the target text exists in the ES cache; if not, the newly established mapping relationship is stored in the ES cache. Optionally, after the mapping relationship is established, it may further be detected whether the mapping relationship stored in the ES cache for the target text is consistent with the newly established one; if not, the mapping relationship for the target text established in the previous round is deleted from the ES cache, and the newly established mapping relationship between the text to be corrected and the target text is stored instead.
Alternatively, as shown in fig. 5, when the error correction result of the text to be verified is generated and the newly established error correction mapping relationship between the text to be corrected and the target text is stored, the mapping relationship may be stored in the form of a sound-shape code generation table (optionally together with a radical mapping table), and a mapping code may then be created for it and linked to it. When the text to be corrected later needs to be corrected into the target text again, the corresponding error correction result can be determined from the mapping code, and the correction can be carried out directly according to that unique result.
After the mapping code is created, the sound-shape code information in the newly established error correction mapping relationship and the corresponding target text are word-segmented with an open-source sound-shape code generation tool to produce labeled entities; after the entity sound-shape codes and the target text are recorded, the error correction result is updated and the updated result is stored in the ES cache. If the sound-shape code mapping needs to be updated later, the word segmentation interface can be called to update it. Once the sound-shape code mapping (optionally including the radical mapping) between the text to be corrected and the target text has been created, the sound-shape code information of the target text corresponds only to the sound-shape code information of the text to be corrected; the newly created mapping therefore puts the text to be corrected and the target text in one-to-one correspondence. After the sound-shape code information of the target text is obtained, the text to be corrected can be matched directly, and the text to be corrected in the text to be verified can be corrected directly into the target text.
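How the mapping code might be created and used can be sketched as follows; this is only one reading of the description above, using a hash of the mapping as the mapping code and a dictionary as the sound-shape code generation table, neither of which is prescribed by the patent.

```python
import hashlib
import json

# Hypothetical sound-shape code generation table: mapping code -> error correction mapping.
sound_shape_table: dict[str, dict] = {}

def store_mapping_with_code(mapping: dict) -> str:
    """Create a mapping code for a newly established error correction mapping
    and link the code to the mapping, as described for fig. 5."""
    payload = json.dumps(mapping, ensure_ascii=False, sort_keys=True)
    mapping_code = hashlib.md5(payload.encode("utf-8")).hexdigest()
    sound_shape_table[mapping_code] = mapping
    return mapping_code

def resolve_mapping(mapping_code: str) -> dict | None:
    """In later rounds the unique error correction result can be resolved from
    the mapping code alone, reflecting the one-to-one correspondence above."""
    return sound_shape_table.get(mapping_code)
```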
In summary, after the speech recognition error correction method provided by this embodiment acquires voice information, converts it into a text to be verified, and displays the text, if an error correction instruction actively initiated by a user or by another device is received, the text information of the erroneous text and of the target text is acquired from the instruction. The text to be corrected within the text to be verified is determined according to that text information, and the error correction mapping relationship between the text to be corrected and the target text is acquired according to it. When the correction is performed, the text to be corrected is corrected into the target text according to this mapping relationship; the mapping relationship may have been stored previously or may be newly established. Thus, when speech recognition goes wrong, the user actively issues a voice instruction indicating correction, the electronic device corrects the text to be verified displayed in the first round (the preceding text), and the correct text is displayed in the second round (the following text). This realizes real-time, user-initiated error correction of the places where speech recognition went wrong, and once a correction has been made, the same error occurring in subsequent speech recognition can be handled, which greatly improves the accuracy of subsequent semantic error correction.
In addition, the speech recognition error correction method provided by this embodiment continuously updates the error correction mapping relationships stored in the ES cache, that is, it memorizes many mapping relationships and error correction results; at the next correction, the correct text can be displayed directly from a stored error correction result, or the correction can be performed from a stored mapping relationship. This improves, to a great extent, the accuracy of subsequent semantic error correction and the reach rate of media assets.
Referring to fig. 6, an embodiment of the present application provides a speech recognition error correction apparatus 10, including:
the voice conversion module 11 is configured to acquire voice information, convert the voice information into a text to be verified, and display the text;
an obtaining module 12, configured to, when a voice instruction for instructing to correct an error text into a target text is received, obtain text information of the error text and text information of the target text from the voice instruction, where the text information at least includes phonographic code information;
the processing module 13 is configured to determine a text to be corrected in the text to be verified according to the text information of the error text and the text information of the target text;
the obtaining module 12 is further configured to obtain an error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text;
and the error correction module 14 is configured to correct the text to be corrected into the target text based on the error correction mapping relationship.
The obtaining module 12 is specifically configured to obtain, according to the text information of the error text and the text information of the target text, an error correction mapping relationship between the text to be corrected and the target text from the stored multiple error correction mapping relationships; and when the plurality of stored error correction mapping relations do not have the error correction mapping relation between the text to be corrected and the target text, establishing the error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text.
The sound-shape code information comprises pinyin information and character shape information, and the processing module 13 is specifically used for generating the sound-shape code information of each character in the text to be verified; determining the similarity between the pinyin information of each character in the text to be verified and the pinyin information of the target text by a sound-shape code similarity algorithm; and determining a text composed of words with the similarity reaching the preset similarity in the text to be verified as the text to be corrected.
The obtaining module 12 is specifically configured to establish a mapping relationship between the font information of the error text and the font information of the target text; and establishing an error correction mapping relation between the text to be corrected and the target text according to the pinyin information of the target text or the pinyin information of the text to be corrected and the mapping relation between the font information of the error text and the font information of the target text.
The speech recognition error correction device 10 further includes a storage module 15, configured to obtain text information of a plurality of corrected texts and text information of texts to be corrected corresponding to the plurality of corrected texts; and establishing and storing the plurality of error correction mapping relations according to the text information of the plurality of corrected texts and the text information of the text to be corrected corresponding to the plurality of corrected texts.
The obtaining module 12 is specifically configured to convert the voice instruction into an instruction text, and to obtain, through a sequence labeling algorithm, the text information of the erroneous text and the text information of the target text indicated by the instruction text.
The storage module 15 is further configured to store the text to be verified when the received voice instruction is not for instructing to correct the erroneous text into the target text.
Referring to fig. 7, an embodiment of the present application provides an electronic device 20, where the electronic device 20 includes a processor 21 and a memory 22 communicatively connected to the processor. The memory 22 stores computer executable instructions, and the processor 21 executes the computer executable instructions stored in the memory to implement the speech recognition error correction method as described in any one of the above embodiments.
The present application also provides a computer-readable storage medium having stored therein computer-executable instructions, which when executed, cause a processor to execute the instructions for implementing the speech recognition error correction method provided by any one of the above embodiments.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the speech recognition error correction method as provided in any of the above embodiments.
The computer-readable storage medium may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also reside in various electronic devices, such as mobile phones, computers, tablet devices, and personal digital assistants, that include one or any combination of the above memories.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method described in the embodiments of the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A method for speech recognition error correction, comprising:
acquiring voice information, converting the voice information into a text to be verified and displaying the text;
when a voice instruction for indicating that an error text is corrected to a target text is received, acquiring text information of the error text and text information of the target text from the voice instruction, wherein the text information at least comprises sound-shape code information;
determining a text to be corrected in the text to be verified according to the text information of the error text and the text information of the target text;
acquiring an error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text;
and correcting the text to be corrected into the target text based on the error correction mapping relation.
2. The method according to claim 1, wherein the obtaining the error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text comprises:
acquiring an error correction mapping relation between the text to be corrected and the target text from a plurality of stored error correction mapping relations according to the text information of the error text and the text information of the target text;
and when the plurality of stored error correction mapping relations do not have the error correction mapping relation between the text to be corrected and the target text, establishing the error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text.
3. The method of claim 2, wherein the sound-shape code information includes pinyin information and font information, and the determining the text to be corrected in the text to be verified according to the text information of the erroneous text and the text information of the target text includes:
generating the sound and shape code information of each character in the text to be verified;
determining the similarity between the pinyin information of each character in the text to be verified and the pinyin information of the target text by a sound-shape code similarity algorithm;
determining a text composed of words with similarity reaching a preset similarity in the text to be verified as the text to be corrected;
the establishing of the error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text comprises:
establishing a mapping relation between the font information of the error text and the font information of the target text;
and establishing an error correction mapping relation between the text to be corrected and the target text according to the pinyin information of the target text or the pinyin information of the text to be corrected and the mapping relation between the font information of the error text and the font information of the target text.
4. The method of claim 2, further comprising:
acquiring text information of a plurality of corrected texts and text information of texts to be corrected corresponding to the plurality of corrected texts;
and establishing and storing the plurality of error correction mapping relations according to the text information of the plurality of corrected texts and the text information of the text to be corrected corresponding to the plurality of corrected texts.
5. The method of claim 1, wherein the obtaining the text information of the error text and the text information of the target text from the voice instruction comprises:
converting the voice instruction into an instruction text;
and acquiring text information of the error text and text information of the target text indicated by the instruction text through a sequence labeling algorithm.
6. The method of claim 1, further comprising:
and when the received voice instruction is not used for indicating that the error text is corrected to be the target text, storing the text to be verified.
7. A speech recognition error correction apparatus, comprising:
the voice conversion module is used for acquiring voice information, converting the voice information into a text to be verified and displaying the text;
the acquiring module is used for acquiring text information of the wrong text and text information of the target text from a voice instruction when the voice instruction for indicating that the wrong text is corrected to the target text is received, wherein the text information at least comprises sound-shape code information;
the processing module is used for determining a text to be corrected in the text to be verified according to the text information of the error text and the text information of the target text;
the obtaining module is further configured to obtain an error correction mapping relationship between the text to be corrected and the target text according to the text information of the error text and the text information of the target text;
and the error correction module is used for correcting the text to be corrected into the target text based on the error correction mapping relation.
8. The apparatus of claim 7, wherein the obtaining module is specifically configured to:
acquiring an error correction mapping relation between the text to be corrected and the target text from a plurality of stored error correction mapping relations according to the text information of the error text and the text information of the target text;
and when the plurality of stored error correction mapping relations do not have the error correction mapping relation between the text to be corrected and the target text, establishing the error correction mapping relation between the text to be corrected and the target text according to the text information of the error text and the text information of the target text.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the speech recognition error correction method of any of claims 1 to 6.
10. A computer-readable storage medium having computer-executable instructions stored therein, which when executed, cause a computer to perform the speech recognition error correction method of any one of claims 1-6.
CN202111642506.9A 2021-12-29 2021-12-29 Voice recognition error correction method and device, electronic equipment and storage medium Pending CN114360549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111642506.9A CN114360549A (en) 2021-12-29 2021-12-29 Voice recognition error correction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111642506.9A CN114360549A (en) 2021-12-29 2021-12-29 Voice recognition error correction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114360549A true CN114360549A (en) 2022-04-15

Family

ID=81102579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111642506.9A Pending CN114360549A (en) 2021-12-29 2021-12-29 Voice recognition error correction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114360549A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024076404A1 (en) * 2022-10-04 2024-04-11 Google Llc Generation and utilization of pseudo-correction(s) to prevent forgetting of personalized on-device automatic speech recognition (asr) model(s)

Similar Documents

Publication Publication Date Title
CN109800407B (en) Intention recognition method and device, computer equipment and storage medium
US10152965B2 (en) Learning personalized entity pronunciations
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
US20180218735A1 (en) Speech recognition involving a mobile device
EP3451328A1 (en) Method and apparatus for verifying information
CN103956168A (en) Voice recognition method and device, and terminal
US20170206897A1 (en) Analyzing textual data
CN106997764B (en) Instant messaging method and instant messaging system based on voice recognition
KR102046486B1 (en) Information inputting method
CN109840318B (en) Filling method and system for form item
CN109979450B (en) Information processing method and device and electronic equipment
CN110910903B (en) Speech emotion recognition method, device, equipment and computer readable storage medium
JP2014063088A (en) Voice recognition device, voice recognition system, voice recognition method and voice recognition program
CN109582825B (en) Method and apparatus for generating information
CN109326284B (en) Voice search method, apparatus and storage medium
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN109410923B (en) Speech recognition method, apparatus, system and storage medium
CN106686226B (en) Terminal audio playing method and system
CN111063355A (en) Conference record generation method and recording terminal
JP2018063271A (en) Voice dialogue apparatus, voice dialogue system, and control method of voice dialogue apparatus
US11488603B2 (en) Method and apparatus for processing speech
CN114360549A (en) Voice recognition error correction method and device, electronic equipment and storage medium
CN110970026A (en) Voice interaction matching method, computer device and computer-readable storage medium
CN111508497A (en) Voice recognition method and device, electronic equipment and storage medium
CN112885335A (en) Speech recognition method and related device

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination