CN113178194A - Voice recognition method and system for interactive hot word updating - Google Patents


Publication number
CN113178194A
Authority
CN
China
Prior art keywords
recognition
result
text result
text
hotword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010016662.3A
Other languages
Chinese (zh)
Other versions
CN113178194B (en)
Inventor
闫博群
马家旭
汪俊
李索恒
张志齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yitu Information Technology Co ltd
Original Assignee
Shanghai Yitu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yitu Information Technology Co ltd
Priority to CN202010016662.3A
Publication of CN113178194A
Application granted
Publication of CN113178194B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a speech recognition method and system with interactive hot word updating. The method comprises the following steps: recognizing and transcribing the received audio and outputting a recognition text result; judging whether the recognition text result is accurate; if it is accurate, outputting the recognition text result, and if it is not accurate, adding a hot word; combining the recognition text result with the hot word for recognition and transcription, outputting a combined text result, and judging it; if the combined text result is accurate, outputting it; if it is not accurate, adding a further hot word and repeating the previous step until the result is judged accurate, then outputting the combined text result. The system comprises: a speech recognition unit that receives the audio and recognizes and transcribes it to generate the recognition text result; a text judging unit that judges whether the recognition text result is accurate; a judgment processing unit that outputs the recognition text result if it is judged accurate and adds a hot word if it is not; and a hot word adding unit for adding hot words. The speech recognition unit also combines the recognition text result with the hot words for recognition and transcription and then outputs the combined text result.

Description

Voice recognition method and system for interactive hot word updating
Technical Field
The invention relates to the field of voice recognition methods and systems, in particular to a voice recognition method and system for interactive hot word updating.
Background
With the development of information technology and the spread of the internet, achieving effective, intelligent human-computer interaction and building an efficient, natural environment for human-computer communication has become an urgent need in the application and development of information technology;
in recent years, with the rapid development of speech recognition technology, online speech recognition applications such as voice input, speech recognition and speech judgment have received increasing attention. A system trained in advance on massive data can meet the need to input commonly used words, and recognition accuracy is usually high when the spoken content matches the probability distribution of the original language model. In practical applications, however, the rapid growth of the mobile internet and social networks continuously produces new hot topics and corresponding hot words. The prior art provides the same recognition result for all user scenarios and therefore struggles to meet users' personalized needs, since different users need different personalized words recognized. Because hot words and personalized words are timely and specific, they occur with low frequency, so the system cannot accurately recognize, transcribe and judge them;
in existing speech recognition methods, when a hot word is updated against an already recognized text result, the audio must pass through the speech recognition module and the decoder again to be re-recognized before the text result with the added hot word is obtained. This process takes a long time and cannot refresh the text in real time;
in view of this situation, a speech recognition method and system with interactive hot word updating that meets users' personalized needs and improves recognition efficiency is desirable.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a speech recognition method with interactive hot word updating. A speech recognition unit recognizes, transcribes and decodes the audio and generates a recognition text result for judgment. If the result is judged inaccurate, a hot word can be added, and recognition, transcription and decoding are performed again in combination with the hot word to generate a combined text result, so that accuracy is maximized, the personalized needs of different users are met, and the interactive function is provided;
the newly added encoder recognizes and transcribes the audio to generate the recognition text result;
the newly added decoder caches the recognition text result and, at the same time, decodes it, combines it with the hot word, and performs recognition and transcription to generate a combined recognition result for judgment. This removes the step of re-recognizing and re-transcribing the audio, improves accuracy, raises the efficiency of the whole process, and saves the time otherwise needed to reprocess the audio when hot words are updated;
the newly added decoder can also combine a hot word score table with the recognition text result and the hot words before performing recognition and transcription to generate the combined recognition result for judgment. The hot word score table adds to the score of a hot word: each time the same hot word is added, its score is increased once. The higher a hot word's score, the more readily it is recognized and the more likely it is to be transcribed, which effectively improves the accuracy of recognition and transcription;
both the added hot words and the hot words set by the user can be recorded in a hot word list, which creates a personalized hot word library for each user, meets each user's personalized needs, achieves interactive hot word updating, and overcomes the defects of the prior art.
The invention also provides a voice recognition system for updating the interactive hotword.
In order to solve the technical problems, the invention provides the following technical scheme:
a speech recognition method for interactive hotword updating comprises the following steps:
recognizing and transcribing the received audio, and then outputting a recognition text result;
judging whether the recognition text result is accurate; if it is accurate, outputting the recognition text result; if not, adding a hot word;
combining the recognition text result with the hot word in any existing manner, for example by adding the hot word at the beginning, in the middle, or at the end of the recognition text result, then performing recognition and transcription, outputting a combined text result, and judging it; if it is accurate, outputting the combined text result;
if not, adding a further hot word and repeating the previous step until the result is judged accurate, then outputting the combined text result.
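The steps above form a loop that can be sketched as follows. This is only an illustrative sketch: the function names `transcribe`, `combine`, `is_accurate` and `ask_hotword`, the `max_rounds` safety cap, and the toy stand-ins used to run it are all assumptions made for the example and are not defined by the patent.

```python
from typing import Callable, List, Tuple


def recognize_interactively(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    combine: Callable[[str, List[str]], str],
    is_accurate: Callable[[str], bool],
    ask_hotword: Callable[[str], str],
    max_rounds: int = 10,  # assumption: a safety cap, not part of the described method
) -> Tuple[str, List[str]]:
    """Return the final text and the hot words that were added along the way."""
    hotwords: List[str] = []
    text = transcribe(audio)                  # recognition text result
    for _ in range(max_rounds):
        if is_accurate(text):                 # judged accurate: stop and output
            break
        hotwords.append(ask_hotword(text))    # judged inaccurate: add a hot word
        text = combine(text, hotwords)        # combined text result
    return text, hotwords


# Toy stand-ins (hypothetical) so the loop can be run end to end.
def toy_transcribe(audio: bytes) -> str:
    return "order from kfz"


def toy_combine(text: str, hotwords: List[str]) -> str:
    # Swap in a hot word for any token of equal length and same first letter.
    words = text.split()
    for i, word in enumerate(words):
        for hot in hotwords:
            if len(hot) == len(word) and hot[0].lower() == word[0].lower():
                words[i] = hot
    return " ".join(words)


result = recognize_interactively(
    b"", toy_transcribe, toy_combine,
    is_accurate=lambda t: "KFC" in t,
    ask_hotword=lambda t: "KFC",
)
print(result)
```

Note that after the first pass the loop works only on the text result and the hot words; the audio is not passed to `combine`.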
The above speech recognition method for interactive hotword updating is provided, wherein the hotword is a hotword defined by an administrator or a hotword defined by a user.
In the above speech recognition method for interactive hotword updating, the audio may be received in any manner conventional in the art, for example from an audio library stored in the system or from an external source. Preferably, the received audio is a segment of audio input by a user, and the segment is stored in an audio set.
In the above speech recognition method for interactive hotword updating, the process of recognizing and transcribing the received audio further includes decoding the audio to generate a decoding result; the decoding result is included in the recognition text result, and the recognition text result is cached.
In the foregoing speech recognition method for interactive hotword updating, before the output of the combined text result, the text result is scored to generate a word-level decoding score, and the output of the combined text result includes the text and the word-level decoding score generated by scoring.
The above speech recognition method for interactive hotword updating further includes managing the added hotwords to generate a hotword list containing the hotwords.
In the above speech recognition method for interactive hotword updating, each time a different hotword is added, the newly added hotword is updated in the hotword list.
In the above speech recognition method for interactive hotword updating, each time a hot word is added, the score of that hot word is increased once and a hot word score table is generated, wherein the higher the score of a hot word, the higher the probability that it appears in the recognition text result or the combined text result.
The above speech recognition method for interactive hotword updating, wherein the process of combining the recognition text result with the hotword for recognition and transcription further comprises combining the hotword score table with the recognition text result and the hotword for recognition and transcription and re-outputting a combined text result.
The above speech recognition method for interactive hotword updating further includes re-decoding the cached recognition text result and combining the re-decoded result with the hot word score table and the hot word for recognition and transcription, so as to re-output a combined text result.
An interactive hotword updating speech recognition system, comprising:
the voice recognition unit is used for receiving the audio, recognizing and transcribing the audio to generate a recognition text result;
the text judging unit is used for judging whether the text recognition result is accurate or not;
the judgment processing unit outputs the recognition text result if the judgment is accurate, and adds hot words if the judgment is not accurate;
the hot word adding unit is used for adding hot words;
and the voice recognition unit is also used for combining the recognition text result with the hot words to perform recognition and transcription and then outputting a combined text result.
The above speech recognition system for updating interactive hotwords further includes an audio obtaining unit, configured to obtain the audio, store the audio, and generate an audio set.
In the above speech recognition system for interactive hotword updating, the audio obtaining unit is wirelessly connected to a user audio input module to obtain the audio.
The above speech recognition system for interactive hotword updating, wherein the audio is a segment of audio input by a user.
The above speech recognition system for interactive hotword update, wherein the speech recognition unit further comprises an encoding unit and a decoding unit;
the coding unit is used for receiving the audio, identifying and transcribing the audio and generating an identification text result;
the decoding unit is used for decoding the audio and generating a decoding result, and importing the decoding result into the identification text result;
the decoding unit is internally provided with a cache unit for caching the decoded recognition text result;
a scoring module is arranged in the decoding unit and used for generating a decoding score of a word level and leading the decoding score into the recognition text result;
the decoding unit is also used for re-decoding the cached recognition text result, combining the recognition text result with the hot word for recognition and transcription, and then outputting a combined text result.
In the above speech recognition system for interactive hotword updating, a hotword list generation module is built in the hotword adding unit;
the hot word list generation module is used for recording the added hot words and generating a hot word list.
In the above speech recognition system for interactive hotword updating, a hot word scoring module is built into the hot word adding unit;
the hot word scoring module is used for adding to the score of each added hot word and generating a hot word score table.
In the above speech recognition system for interactive hotword updating, the decoding unit is further configured to combine the hot word score table with the recognition text result and the hot word for recognition and transcription, and to output a combined text result again.
In the foregoing speech recognition system for updating interactive hot words, the decoding unit is further configured to decode the cached recognition text result again, combine the recognition text result, the hot words, and the hot word score table to perform recognition, transcription, and output a combined text result again.
In the above speech recognition system for interactive hotword updating, an acoustic model, a language model, and a hotword model are built into the decoding unit. The hotword model is a neural network model.
A memory of an interactive hot word update speech recognition system, the memory having stored thereon a computer program and executing instructions, wherein the computer program, when executed by a processor, implements the method of any of the above.
A chip, wherein the memory is installed on the chip, and is used for calling the computer program stored in the memory from the chip and executing the computer program, so that a device installed with the chip executes the method of any one of the above.
A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method of any of the above.
A computer program product comprising computer program instructions for causing a computer to perform the method of any of the above.
A speech recognition device for interactive hotword updating comprises a processor, a speech recognizer, an encoder, a decoder and a memory, wherein the processor is internally provided with a judgment processor and a hotword processor, and the decoder is internally provided with a buffer;
the speech recognizer is connected with the encoder, the encoder is connected with the decoder, and the processor is respectively connected with the decoder, the memory, the judgment processor and the hotword processor in a control mode;
the processor controls the memory to execute a computer program to execute instructions to implement the method of any one of the above.
The above speech recognition device for updating interactive hotwords further comprises a user speech input unit, wherein the speech recognizer is connected with the user speech input unit to realize data interaction, and is used for acquiring audio stored in the user speech input unit.
The technical scheme provided by the voice recognition method and the system for updating the interactive hotword has the following technical effects:
the speech recognition unit recognizes, transcribes and decodes the audio and generates a recognition text result for judgment; if the result is judged inaccurate, a hot word is added, and recognition, transcription and decoding are performed again in combination with the hot word to generate a combined text result, so that accuracy is maximized, the personalized needs of different users are met, and the interactive function is provided;
the newly added encoder recognizes and transcribes the audio to generate the recognition text result;
the newly added decoder caches the recognition text result and, at the same time, decodes it, combines it with the hot word, and performs recognition and transcription to generate a combined recognition result for judgment. This removes the step of re-recognizing and re-transcribing the audio, improves accuracy, raises the efficiency of the whole process, and saves the time otherwise needed to reprocess the audio when hot words are updated;
the newly added decoder can also combine a hot word score table with the recognition text result and the hot words before performing recognition and transcription to generate the combined recognition result for judgment. The hot word score table adds to the score of a hot word: each time the same hot word is added, its score is increased once. The higher a hot word's score, the more readily it is recognized and the more likely it is to be transcribed, which effectively improves the accuracy of recognition and transcription;
both the added hot words and the hot words set by the user can be recorded in a hot word list, which creates a personalized hot word library for each user, meets each user's personalized needs, and achieves interactive hot word updating.
Drawings
FIG. 1 is a flow chart illustrating a method for speech recognition with interactive hotword update according to the present invention;
FIG. 2 is a schematic diagram of a voice recognition system for interactive hotword update according to the present invention;
FIG. 3 is a schematic structural diagram of a speech recognition apparatus for interactive hotword update according to the present invention;
FIG. 4 is a schematic diagram of a decoding unit structure of an interactive hotword updating speech recognition system according to the present invention;
FIG. 5 is a schematic structural diagram of a conventional speech recognition system.
Wherein the reference numbers are as follows:
the system comprises a voice recognition unit 101, a text judgment unit 102, a judgment processing unit 103, a hotword adding unit 104, an encoding unit 105, a decoding unit 106, a buffer unit 107, a scoring module 108, a processor 201, a voice recognizer 202, an encoder 203, a decoder 204, a memory 205 and a buffer 206.
Detailed Description
In order to make the technical means, the inventive features, the objectives and the effects of the invention easily understood and appreciated, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the specific drawings, and it is obvious that the described embodiments are a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be understood that the structures, proportions, sizes and the like shown in the drawings are provided only to accompany the disclosure of the specification so that those skilled in the art can read and understand it; they do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion, or adjustment of size that does not affect the efficacy or achievable purpose of the invention still falls within the scope of the invention.
In addition, terms such as "upper", "lower", "left", "right", "middle" and "one" are used in this specification only for clarity of description and are not intended to limit the scope of the invention; changes in the relative relationships they describe are likewise not to be construed as limiting that scope.
Meanwhile, in the present specification, relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or operation from another entity or operation without necessarily requiring or implying any actual such relationship or order between such entities or operations.
The invention provides a speech recognition method and system with interactive hot word updating. Its aim is to have a speech recognition unit recognize, transcribe and decode the audio and generate a recognition text result for judgment; if the result is judged inaccurate, a hot word can be added, and recognition, transcription and decoding are performed again in combination with the hot word to generate a combined text result, so that accuracy is maximized, the personalized needs of different users are met, and the interactive function is provided;
the newly added encoder recognizes and transcribes the audio to generate the recognition text result;
the newly added decoder caches the recognition text result and, at the same time, decodes it, combines it with the hot word, and performs recognition and transcription to generate a combined recognition result for judgment. This removes the step of re-recognizing and re-transcribing the audio, improves accuracy, raises the efficiency of the whole process, and saves the time otherwise needed to reprocess the audio when hot words are updated;
the newly added decoder can also combine a hot word score table with the recognition text result and the hot words before performing recognition and transcription to generate the combined recognition result for judgment. The hot word score table adds to the score of a hot word: each time the same hot word is added, its score is increased once. The higher a hot word's score, the more readily it is recognized and the more likely it is to be transcribed, which effectively improves the accuracy of recognition and transcription;
both the added hot words and the hot words set by the user can be recorded in a hot word list, which creates a personalized hot word library for each user, meets each user's personalized needs, and achieves interactive hot word updating.
In a first aspect, as shown in fig. 1, a speech recognition method for interactive hotword update includes the following steps:
recognizing and transcribing the received audio, and then outputting a recognition text result;
judging whether the recognition text result is accurate; if it is accurate, outputting the recognition text result; if not, adding a hot word;
combining the recognition text result with the hot word for recognition and transcription, outputting a combined text result, and judging it; if it is accurate, outputting the combined text result;
if not, adding a further hot word and repeating the previous step until the result is judged accurate, then outputting the combined text result;
in this process, the content recognized and transcribed from the audio is recorded in the recognition text result. After a hot word is added, only the added hot word and the recognition text result need to be combined and reprocessed to obtain the combined text result; the audio itself does not need to be processed again, which reduces the number of processing steps.
The embodiment provides a voice recognition method for interactive hotword updating, wherein the hotword is a hotword defined by an administrator or a hotword defined by a user.
In the speech recognition method for interactive hotword updating provided by this embodiment, the received audio is a piece of audio input by the user, and the piece of audio is stored in the audio set.
In the speech recognition method for updating interactive hotwords provided in this embodiment, the process of recognizing and transcribing the audio further includes decoding the audio, generating a decoding result, including the decoding result in the recognition text result, and caching the recognition text result.
In the speech recognition method for interactive hotword updating provided by this embodiment, before the combined text result is output, the text result is scored to generate word-level decoding scores, and the output combined text result includes both the text and these scores. The decoding score is obtained by comparing the hot word with the words in the recognition text result character by character: the more characters are the same, the higher the decoding score, and the higher the similarity between the word in the recognition text result and the hot word.
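The text does not give a formula for the word-level comparison, so the following is only a minimal sketch of one possible reading: compare a recognized word and a hot word position by position and report the fraction of matching characters. The function names and the normalization by the longer length are assumptions made for the example.

```python
from typing import List, Tuple


def char_match_score(word: str, hotword: str) -> float:
    """Fraction of aligned positions at which both strings hold the same character."""
    if not word or not hotword:
        return 0.0
    same = sum(a == b for a, b in zip(word, hotword))
    # Assumption: normalize by the longer string so extra characters lower the score.
    return same / max(len(word), len(hotword))


def word_level_scores(words: List[str], hotword: str) -> List[Tuple[str, float]]:
    """Pair each recognized word with its similarity to the hot word."""
    return [(w, char_match_score(w, hotword)) for w in words]


# "kendyy" and "kendirk" share their first four characters, so the score is 4/7.
print(word_level_scores(["gnawing", "kendyy"], "kendirk"))
```

A real decoder would more likely compare phonetic or model-derived representations than raw characters; the sketch only mirrors the "more identical characters, higher score" rule stated above.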
The speech recognition method for interactive hotword updating provided by the embodiment further includes managing the added hotwords to generate a hotword list containing the hotwords.
In the speech recognition method for interactive hotword updating provided by this embodiment, each time a different hotword is added, the newly added hotword is updated in the hotword list;
the hot words are either user-defined (users can add hot words themselves) or administrator-defined (that is, added by a system administrator). The hot word list records every hot word that has appeared; when a hot word is added, it is appended to the list only if it has not appeared before. A personalized hot word list can therefore be built for each user, which improves both accuracy and efficiency when audio is processed and searched.
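The hot word list behavior described here (record each hot word once, whether it comes from a user or an administrator) might be sketched as below. The class name `HotwordList` and the `source` field are hypothetical and introduced only for illustration.

```python
from typing import Dict, List


class HotwordList:
    """Keeps each hot word once, in the order it was first added."""

    def __init__(self) -> None:
        # Maps hot word -> who defined it; dicts preserve insertion order.
        self._words: Dict[str, str] = {}

    def add(self, word: str, source: str = "user") -> bool:
        """Record the hot word; return True only if it was not already listed."""
        if word in self._words:
            return False
        self._words[word] = source
        return True

    def words(self) -> List[str]:
        return list(self._words)


hotword_list = HotwordList()
hotword_list.add("kendirk")                      # user-defined hot word
hotword_list.add("kendirk")                      # repeated: the list is unchanged
hotword_list.add("Yitu", source="administrator")
print(hotword_list.words())
```

Keeping one such list per user is one simple way to obtain the personalized hot word library the text mentions.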
In the speech recognition method for interactive hotword updating provided in this embodiment, each time a hot word is added, the score of that hot word is increased once and a hot word score table is generated; the higher the score of a hot word, the higher the probability that it appears in the recognition text result or the combined text result.
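A hot word score table of this kind can be sketched with a counter that is incremented once per addition. The class name, the increment size `step`, and the `ranked` helper are assumptions for the example; the text does not say how large each increment is or how the score is turned into a probability.

```python
from collections import Counter
from typing import List, Tuple


class HotwordScoreTable:
    """Tracks a score per hot word, raised once for every time the word is added."""

    def __init__(self, step: int = 1) -> None:
        self.step = step                 # assumption: fixed increment per addition
        self.scores: Counter = Counter()

    def add(self, word: str) -> int:
        """Increase the word's score once and return the new score."""
        self.scores[word] += self.step
        return self.scores[word]

    def ranked(self) -> List[Tuple[str, int]]:
        """Hot words from highest to lowest score."""
        return self.scores.most_common()


table = HotwordScoreTable()
table.add("kendirk")
table.add("Yitu")
table.add("kendirk")                     # added a second time: score becomes 2
print(table.ranked())
```

A decoder could then favor higher-ranked hot words when choosing among candidate transcriptions.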
In the speech recognition method for interactive hotword update provided by this embodiment, the process of combining the recognized text result with the hotword for recognition and transcription further includes combining the hotword score table with the recognized text result and the hotword for recognition and transcription and re-outputting a combined text result.
The speech recognition method for interactive hotword updating provided by the embodiment further comprises re-decoding the cached recognition text result and combining the re-decoded result with the hot word score table and the hot word for recognition and transcription, so as to re-output a combined text result.
In a second aspect, as shown in fig. 2, an interactive hotword updating speech recognition system includes:
the voice recognition unit 101 is used for receiving audio, recognizing and transcribing the audio to generate a recognition text result;
a text judgment unit 102, configured to judge whether a text recognition result is accurate;
the judgment processing unit 103 outputs the recognition text result if the judgment is accurate, and adds a hotword if the judgment is not accurate;
a hotword adding unit 104 for adding a hotword;
the speech recognition unit 101 is further configured to combine the recognized text result with the hotword for recognition and transcription, and then output a combined text result.
The speech recognition system for interactive hotword updating provided by the embodiment further comprises an audio acquisition unit, configured to acquire audio and store the audio to generate an audio set.
In the speech recognition system for interactive hotword update provided by this embodiment, the audio acquiring unit is wirelessly connected to a user audio input module to acquire audio.
The embodiment provides a speech recognition system for interactive hotword updating, wherein the audio is a segment of audio.
In the speech recognition system for interactive hotword updating provided in this embodiment, the speech recognition unit 101 further includes an encoding unit 105 and a decoding unit 106;
the encoding unit 105 is configured to receive audio, perform recognition and transcription, and generate a recognition text result;
the decoding unit 106 is configured to decode the audio, generate a decoding result, and incorporate the decoding result into the recognition text result;
a scoring module is built into the decoding unit 106 and is configured to generate a word-level decoding score and incorporate the decoding score into the recognition text result;
a cache unit 107 is built into the decoding unit 106 and is configured to cache the decoded recognition text result;
the decoding unit 106 is further configured to re-decode the cached recognition text result, combine the recognition text result with the hotword, perform recognition and transcription, and then output a combined text result.
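The role of the cache unit 107 can be sketched as follows. This is a hedged illustration only: the class `CachingDecoder`, the `segment_id` key, and the candidate strings and scores returned by `_encode_and_score` are hypothetical placeholders for the expensive encoding and scoring pass.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (candidate text, decoding score)


@dataclass
class CachingDecoder:
    """Decoder that caches scored candidates so re-decoding skips the encoder."""

    encode_calls: int = 0
    _cache: Dict[str, List[Candidate]] = field(default_factory=dict)

    def _encode_and_score(self, segment_id: str) -> List[Candidate]:
        # Placeholder for the costly acoustic and language model pass.
        self.encode_calls += 1
        return [("gnawing chicken", 76.0), ("kendyy", 44.0)]

    def decode(self, segment_id: str) -> List[Candidate]:
        # Run the costly pass only on the first request for this segment.
        if segment_id not in self._cache:
            self._cache[segment_id] = self._encode_and_score(segment_id)
        return list(self._cache[segment_id])


decoder = CachingDecoder()
first = decoder.decode("seg-1")
second = decoder.decode("seg-1")  # served from the cache
print(decoder.encode_calls)
```

The second call returns the same candidates without another encoding pass, which is the saving the cache unit is meant to provide when a hotword is added after a judgment.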
In the speech recognition system for interactive hotword updating provided in this embodiment, a hotword list generation module is built into the hotword adding unit 104;
the hotword list generation module is configured to record the added hotwords and generate a hotword list.
In the speech recognition system for interactive hotword updating provided in this embodiment, a hotword scoring module is built into the hotword adding unit 104;
the hotword scoring module is configured to score each added hotword and generate a hotword score table.
In the speech recognition system for interactive hotword updating provided in this embodiment, the decoding unit 106 is further configured to combine the hotword score table with the recognition text result and the hotword, perform recognition and transcription, and re-output a combined text result.
In the speech recognition system for interactive hotword updating provided by this embodiment, the decoding unit 106 is further configured to re-decode the cached recognition text result, combine the recognition text result, the hotword, and the hotword score table, perform recognition and transcription, and re-output a combined text result.
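A hotword list and a hotword score table can be sketched together as below. The class name `HotwordStore`, the `step` increment, and the rule that each repeated addition of the same hotword raises its score are assumptions made for illustration; the text does not specify how scores are computed.

```python
from collections import Counter
from typing import List


class HotwordStore:
    """Keeps a hotword list (insertion order) and a per-hotword score table."""

    def __init__(self, step: float = 1.0) -> None:
        self.step = step                      # assumed score added per addition
        self.hotword_list: List[str] = []
        self.score_table: Counter = Counter()

    def add(self, word: str) -> float:
        # Record a new hotword once; every addition raises its score.
        if word not in self.score_table:
            self.hotword_list.append(word)
        self.score_table[word] += self.step
        return self.score_table[word]


store = HotwordStore()
store.add("kendyy")
store.add("kendyy")      # added again: score rises, list is unchanged
store.add("hotword")
print(store.hotword_list, dict(store.score_table))
```

A decoder could then read `score_table` when combining hotwords with the recognition text result, giving more weight to hotwords the user has added repeatedly.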
As shown in Fig. 4, in the speech recognition system for interactive hotword updating provided by the present embodiment, the decoding unit 106 (decoder) contains an acoustic model (e.g., GMM, HMM, DNN, or RNN), a language model (e.g., a generative model, an analytic model, or a discriminative model), a hotword model, and a cache unit 107. The hotword model is a neural network model, which differs from the existing hotword function (usually a large dictionary of words or scores).
The decoding unit 106 of the present invention is provided with the cache unit 107, whereas the decoding unit in the prior art has no cache unit (see Fig. 5), so the prior art must encode and decode again each time a text has been judged. The cache unit 107 of the present invention can cache the scores of the acoustic model and the language model as well as the recognition text result. After a judgment, a hotword can be added and the judgment can be performed again directly, which greatly increases the speed of hotword refreshing.
For example, suppose the word "kendyy" is spoken and the first recognition text result is judged to be inaccurate. The recognition text result is stored in the cache unit 107 of the decoding unit 106; at this point the cache unit 107 holds the candidate scores given by the acoustic model and the language model: 76 points for "gnawing chicken", and 52 points and 44 points for two "kendyy chicken" candidates. The hotword "kendyy" is then added to the hotword list. The scoring module 108 directly reads the cached acoustic model and language model scores and the recognition text result from the cache unit 107, compares them with the newly added hotword "kendyy", and re-scores the candidates, after which the final "kendyy" result receives 95 points. The higher the similarity to the hotword, the higher the score.
Since the data in the cache unit 107 can be obtained directly, both the speed of judgment and the speed of hotword refreshing are greatly increased. Therefore, the main differences between the present invention and the prior art are: 1. hotwords can be used in speech recognition; 2. the decoding unit 106 of the present invention is provided with the cache unit 107, which reduces the number of operation steps and greatly increases the speed of hotword refreshing.
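The re-scoring in the example above can be sketched with the cached scores 76, 52, and 44. This is an illustration under stated assumptions: the text describes the hotword model as a neural network, whereas this sketch substitutes plain string similarity from Python's `difflib`; the candidate strings are placeholders, and `HOTWORD_BONUS` is a hypothetical value chosen so that an exact match reproduces the 95 points in the example.

```python
from difflib import SequenceMatcher
from typing import Dict

# Cached acoustic + language model scores (numbers from the example above;
# the candidate strings are placeholders).
cached_scores: Dict[str, float] = {
    "gnawing chicken": 76.0,
    "kendy chicken": 52.0,
    "kendyy": 44.0,
}

HOTWORD_BONUS = 51.0  # hypothetical: an exact hotword match gains 51 points


def rescore(cached: Dict[str, float], hotword: str,
            bonus: float = HOTWORD_BONUS) -> Dict[str, float]:
    """Add a bonus proportional to each candidate's similarity to the hotword."""
    rescored = {}
    for text, base in cached.items():
        similarity = SequenceMatcher(None, text, hotword).ratio()  # 0.0 .. 1.0
        rescored[text] = base + bonus * similarity
    return rescored


best_before = max(cached_scores, key=cached_scores.get)
rescored = rescore(cached_scores, "kendyy")
best_after = max(rescored, key=rescored.get)
print(best_before, "->", best_after)
```

Only the cached scores are read here; no audio is encoded or decoded again, which mirrors the claimed benefit of the cache unit 107.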
In a third aspect, a memory 205 of a speech recognition system for interactive hotword updating is provided. The memory 205 stores a computer program and executable instructions, and the computer program, when executed by a processor 201, implements any one of the methods above.
For example, the memory 205 may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, registers, or the like;
the processor 201 may be a central processing unit (CPU), a graphics processing unit (GPU), or the like; the memory 205 may store executable instructions;
the processor 201 may execute the executable instructions stored in the memory 205 to implement the various processes described herein.
It will be appreciated that the memory 205 in the present embodiment may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory;
the non-volatile memory may be a ROM (read-only memory), a PROM (programmable read-only memory), an EPROM (erasable programmable read-only memory), an EEPROM (electrically erasable programmable read-only memory), or a flash memory.
The volatile memory may be a RAM (random access memory), which functions as an external cache;
by way of illustration and not limitation, many forms of RAM are available, such as SRAM (static RAM), DRAM (dynamic RAM), SDRAM (synchronous DRAM), DDR SDRAM (double data rate SDRAM), ESDRAM (enhanced SDRAM), SLDRAM (SyncLink DRAM), and DRRAM (Direct Rambus RAM). The memory 205 described herein is intended to include, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory 205 stores the following elements, upgrade packages, executable units, or data structures, or a subset or an extended set thereof: an operating system and application programs;
the operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, and is used for implementing various basic services and processing hardware-based tasks;
the application programs include various applications and are used for implementing various application services. A program implementing the method of the embodiment of the present invention may be included among the application programs.
In a fourth aspect, a chip is provided, on which the memory 205 is installed; the chip calls and executes the computer program stored in the memory 205, so that a device in which the chip is installed performs any one of the methods above.
In a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored; the computer program, when executed by the processor 201, performs the steps of any one of the methods above.
In a sixth aspect, a computer program product is provided, comprising computer program instructions that cause a computer to perform any one of the methods above.
In a seventh aspect, as shown in Fig. 3, a speech recognition apparatus for interactive hotword updating includes a processor 201, a speech recognizer 202, an encoder 203, a decoder 204, and a memory 205, wherein a judgment processor and a hotword processor are built into the processor 201, and a buffer 206 is built into the decoder 204;
the speech recognizer 202 is connected to the encoder 203, the encoder 203 is connected to the decoder 204, and the processor 201 is connected to and controls the decoder 204, the memory 205, the judgment processor, and the hotword processor respectively;
the processor 201 controls the memory 205 to execute the executable instructions of the computer program to implement any one of the methods above;
the speech recognizer 202 acquires audio; the encoder 203 recognizes and transcribes the audio to generate a recognition text result, which is transmitted to the decoder 204; the decoder 204 passes the recognition text result to the judgment processor for judgment. If the result is judged accurate, it is output directly; if it is judged inaccurate, a hotword is added, and the decoder 204 performs recognition and transcription again to generate a combined text result, which is passed to the judgment processor for judgment. If the combined text result is judged accurate, it is output directly; if it is judged inaccurate, the previous step is repeated.
The speech recognition apparatus for interactive hotword updating provided by this embodiment further includes a user speech input unit; the speech recognizer 202 establishes a connection with the user speech input unit for data interaction, so as to obtain the audio stored in the user speech input unit.
Those of skill in the art would understand that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of software and electronic hardware;
whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution;
skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments of the present application, the disclosed system, apparatus and method may be implemented in other ways;
for example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation;
for example, a plurality of units or components may be combined or may be integrated into another system;
in addition, functional units in the embodiments of the present application may be integrated into one processing unit, or may exist separately and physically.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a machine-readable storage medium;
therefore, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a machine-readable storage medium and may include several instructions to cause an electronic device to execute all or part of the processes of the technical solution described in the embodiments of the present application;
the storage medium may include various media that can store program codes, such as ROM, RAM, a removable disk, a hard disk, a magnetic disk, or an optical disk.
In conclusion, the speech recognition method and system for interactive hotword updating use the speech recognition unit to recognize, transcribe, and decode the audio and to generate a recognition text result for judgment. If the result is judged inaccurate, a hotword is added, and recognition, transcription, and decoding are performed again in combination with the hotword to generate a combined text result, so as to achieve higher accuracy, meet the personalized requirements of different users, and provide the interactive function;
the newly added encoder recognizes and transcribes the audio to generate a recognition text result;
the newly added decoder caches the recognition text result, decodes it, combines it with the hotword, and after recognition and transcription generates a combined recognition result for judgment. This removes the step of re-recognizing and re-transcribing the audio, improves accuracy, makes the whole process more efficient, and saves the time otherwise needed to update the audio and the hotwords;
the newly added decoder can also combine the hotword score table with the recognition text result and the hotword before recognition and transcription, generating a combined recognition result for judgment. The hotword score table assigns scores to hotwords: each time the same hotword is added, its score increases, and the higher a hotword's score, the more readily it is recognized and the more likely it is to be transcribed, which effectively improves the accuracy of recognition and transcription;
the hotword list records the added hotwords and the hotwords set by the user, so that a personalized hotword library is created for each user, the personalized requirements of each user are met, and interactive hotword updating is achieved.
Specific embodiments of the invention have been described above. It is to be understood that the invention is not limited to the particular embodiments described above, and that devices and structures not described in detail are to be understood as implemented in a manner common in the art; various changes or modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (10)

1. A speech recognition method for interactive hotword updating, characterized by comprising the following steps:
recognizing and transcribing received audio, and then outputting a recognition text result;
judging whether the recognition text result is accurate; if it is accurate, outputting the recognition text result; if not, adding a hotword;
combining the recognition text result with the hotword for recognition and transcription, outputting a combined text result, and judging the combined text result; if it is accurate, outputting the combined text result; if not, adding a hotword and repeating the previous step until the judgment is accurate, and then outputting the combined text result.
2. The method as claimed in claim 1, wherein the step of recognizing and transcribing the received audio further comprises decoding the audio to generate a decoding result, including the decoding result in the recognition text result, and caching the recognition text result.
3. The method of claim 1, wherein the text result is scored before the combined text result is output, generating a word-level decoding score, and wherein the output combined text result comprises the text and the word-level decoding score generated by the scoring.
4. A speech recognition system for interactive hotword updating, comprising:
a speech recognition unit, configured to receive audio and to recognize and transcribe the audio to generate a recognition text result;
a text judgment unit, configured to judge whether the recognition text result is accurate;
a judgment processing unit, configured to output the recognition text result if it is judged accurate, and to have a hotword added if it is judged inaccurate;
a hotword adding unit, configured to add a hotword;
wherein the speech recognition unit is further configured to combine the recognition text result with the hotword, perform recognition and transcription, and then output a combined text result.
5. The speech recognition system for interactive hotword updating of claim 4, wherein the speech recognition unit further comprises an encoding unit and a decoding unit;
the encoding unit is configured to receive the audio, recognize and transcribe the audio, and generate a recognition text result;
the decoding unit is configured to decode the audio, generate a decoding result, and incorporate the decoding result into the recognition text result;
a cache unit is built into the decoding unit and is configured to cache the decoded recognition text result; a scoring module is built into the decoding unit and is configured to generate a word-level decoding score and incorporate the decoding score into the recognition text result;
the decoding unit is further configured to re-decode the cached recognition text result, combine the recognition text result with the hotword for recognition and transcription, and then output a combined text result.
6. A memory of a speech recognition system for interactive hotword updating, the memory having stored thereon a computer program and executable instructions, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-3.
7. A chip, wherein the memory is installed on the chip, and the computer program stored in the memory is called from the chip and executed, so that a device in which the chip is installed performs the method according to any one of claims 1 to 3.
8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1-3.
9. A computer program product comprising computer program instructions for causing a computer to perform the method of any one of claims 1 to 3.
10. A speech recognition device for interactive hotword updating, characterized by comprising a processor, a speech recognizer, an encoder, a decoder, and a memory, wherein a judgment processor and a hotword processor are built into the processor, and a buffer is built into the decoder;
the speech recognizer is connected to the encoder, the encoder is connected to the decoder, and the processor is connected to and controls the decoder, the memory, the judgment processor, and the hotword processor respectively;
the processor controls the memory to execute the executable instructions of a computer program to implement the method of any one of claims 1-3.
CN202010016662.3A 2020-01-08 2020-01-08 Voice recognition method and system for interactive hotword updating Active CN113178194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016662.3A CN113178194B (en) 2020-01-08 2020-01-08 Voice recognition method and system for interactive hotword updating

Publications (2)

Publication Number Publication Date
CN113178194A true CN113178194A (en) 2021-07-27
CN113178194B CN113178194B (en) 2024-03-22

Family

ID=76921383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016662.3A Active CN113178194B (en) 2020-01-08 2020-01-08 Voice recognition method and system for interactive hotword updating

Country Status (1)

Country Link
CN (1) CN113178194B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6064959A (en) * 1997-03-28 2000-05-16 Dragon Systems, Inc. Error correction in speech recognition
US20090006095A1 (en) * 2007-06-29 2009-01-01 Brian Leung Learning to reorder alternates based on a user's personalized vocabulary
CN107133222A (en) * 2017-04-17 2017-09-05 中译语通科技(北京)有限公司 A kind of real-time language conversion equipment and conversion method based on heterogeneous framework
CN108733650A (en) * 2018-05-14 2018-11-02 科大讯飞股份有限公司 Personalized word acquisition methods and device
CN108984529A (en) * 2018-07-16 2018-12-11 北京华宇信息技术有限公司 Real-time court's trial speech recognition automatic error correction method, storage medium and computing device
CN109523991A (en) * 2017-09-15 2019-03-26 阿里巴巴集团控股有限公司 Method and device, the equipment of speech recognition
CN110415705A (en) * 2019-08-01 2019-11-05 苏州奇梦者网络科技有限公司 A kind of hot word recognition methods, system, device and storage medium
CN110473531A (en) * 2019-09-05 2019-11-19 腾讯科技(深圳)有限公司 Audio recognition method, device, electronic equipment, system and storage medium
CN110517692A (en) * 2019-08-30 2019-11-29 苏州思必驰信息科技有限公司 Hot word audio recognition method and device

Also Published As

Publication number Publication date
CN113178194B (en) 2024-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant