CN113205809A - Voice wake-up method and device - Google Patents

Info

Publication number
CN113205809A
Authority
CN
China
Prior art keywords
awakening
score
model
preset
local offline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110481376.9A
Other languages
Chinese (zh)
Inventor
陈明佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd filed Critical Sipic Technology Co Ltd
Priority to CN202110481376.9A priority Critical patent/CN113205809A/en
Publication of CN113205809A publication Critical patent/CN113205809A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/10Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/20Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0631Creating reference templates; Clustering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • G10L2015/0635Training updating or merging of old and new templates; Mean values; Weighting
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Abstract

The invention discloses a voice wake-up method and device. The method comprises: in response to acquiring voice information, sending the voice information to a wake-up model for a first score calculation, and judging whether the first score is greater than or equal to a wake-up threshold preset for the wake-up model; if so, sending the voice information to a local offline recognition model for a second score calculation, and judging whether the second score is greater than or equal to a wake-up threshold preset for the local offline recognition model; and if so, waking up the device to interact with the user. Because only voice information whose first score reaches the wake-up model's preset threshold is sent to the local offline recognition model for the second score calculation, wake-up is more accurate and false wake-ups are reduced. Furthermore, because the recognition model is local and offline, recognition is faster and recognition delay is reduced.

Description

Voice wake-up method and device
Technical Field
The invention belongs to the technical field of voice data processing, and particularly relates to a voice awakening method and device.
Background
In existing voice interaction products, a wake-up word is usually preset in the product, but users want to define the wake-up word themselves. In existing custom wake-up schemes, the wake-up word is configured into a relatively large general wake-up model whose classification nodes cover all phoneme modeling units. In addition, to reduce false wake-ups, some existing techniques compare acoustic features against template features, and send the voice data to the general wake-up engine only when the confidence of the comparison exceeds a threshold.
In similar techniques, a custom wake-up word is implemented as follows. First, the text of the wake-up word is converted into the modeling units of the wake-up model; these may be pinyin or phonemes, depending on the modeling unit used by the actual system. Next, a threshold is calculated for the wake-up word. Finally, the converted information and the corresponding threshold are configured into the wake-up engine. In use, if the score the model computes for the voice information exceeds the threshold, the device is considered woken up. In some embodiments, to reduce false wake-ups, an acoustic feature matching step is added before the wake-up model, and only voice that passes the acoustic feature comparison enters the wake-up model for calculation.
Because a custom wake-up word must take effect in real time, the model cannot be trained or optimized for that word, and the capacity of the original wake-up model is limited. As a result, if the threshold is set too low the wake-up rate is higher but so is the false wake-up rate, and if it is set too high the reverse holds.
Schemes that reduce false wake-ups by acoustic feature comparison suffer from unstable acoustic features. The acoustic features of the wake-up word must be recorded before the wake-up word can be used. Since the acoustic features of every user's wake-up word cannot be recorded in advance, the registered features carry the individual characteristics of the registrant, so after one person registers, the wake-up rate for other users may be low.
Disclosure of Invention
An embodiment of the present invention provides a voice wake-up method and apparatus, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a voice wake-up method, including: in response to the acquisition of voice information, sending the voice information to an awakening model for first score calculation, and judging whether the first score is greater than or equal to a preset awakening threshold value; if the first score is larger than or equal to a preset awakening threshold value, sending the voice information to a local offline recognition model for second score calculation, and judging whether the second score is larger than or equal to the preset awakening threshold value of the local offline recognition model; and if the second score is larger than or equal to a preset awakening threshold value of the local offline recognition model, awakening the equipment to interact with the user.
In a second aspect, an embodiment of the present invention provides a voice wake-up apparatus, including: the acquisition calculation judgment program module is configured to respond to acquisition of voice information, send the voice information to an awakening model to perform first score calculation, and judge whether the first score is greater than or equal to a preset awakening threshold value; the calculation judging program module is configured to send the voice information to a local offline recognition model for second score calculation if the first score is greater than or equal to a preset awakening threshold value, and judge whether the second score is greater than or equal to the preset awakening threshold value of the local offline recognition model; and the awakening program module is configured to awaken the equipment to interact with the user if the second score is greater than or equal to an awakening threshold preset by the local offline recognition model.
In a third aspect, an electronic device is provided, comprising: the apparatus includes at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the voice wake-up method of any of the embodiments of the present invention.
In a fourth aspect, the present invention also provides a computer program product, where the computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions, which, when executed by a computer, cause the computer to execute the steps of the voice wake-up method according to any embodiment of the present invention.
According to the method and the device, the voice information is first sent to the wake-up model for a first score calculation, and only voice information whose first score is greater than or equal to the wake-up threshold preset for the wake-up model is then sent to the local offline recognition model for a second score calculation. Wake-up is therefore more accurate and false wake-ups are reduced. Furthermore, because the recognition model is local and offline, recognition is faster and recognition delay is reduced.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a voice wake-up method according to an embodiment of the present invention;
fig. 2 is a flowchart of another voice wake-up method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a registration wakeup process of a specific example of a voice wakeup method according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a specific example of a voice wake-up method according to an embodiment of the present invention for preventing noise misrecognition;
fig. 5 is a block diagram of a voice wake-up apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Please refer to fig. 1, which shows a flowchart of an embodiment of a voice wake-up method of the present application.
As shown in fig. 1, in step 101, in response to acquiring voice information, sending the voice information to an awakening model for first score calculation, and determining whether the first score is greater than or equal to an awakening threshold preset by the awakening model;
in step 102, if the first score is greater than or equal to a preset awakening threshold of the awakening model, sending the voice information to a local offline recognition model for second score calculation, and judging whether the second score is greater than or equal to the preset awakening threshold of the local offline recognition model;
in step 103, if the second score is greater than or equal to a wakeup threshold preset by the local offline recognition model, waking up the device to interact with the user.
In this embodiment, for step 101, in response to acquiring the voice information, the voice wake-up apparatus sends the voice information to the wake-up model for a first score calculation, for example by comparing the acoustic features of the voice information with the acoustic features preset during registration, and determines whether the first score is greater than or equal to the wake-up threshold preset for the wake-up model. This threshold may be set relatively low.
Then, for step 102, if the first score is greater than or equal to the wake-up threshold preset for the wake-up model, the voice information is sent to the local offline recognition model for a second score calculation, and it is determined whether the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model. For example, the threshold of the local offline recognition model is higher than that of the wake-up model, and voice information whose first score reaches the wake-up model's threshold is sent to the local offline recognition model for a second verification.
Finally, for step 103, if the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model, the device is woken up to interact with the user. That is, if the second verification passes, the user's voice information is considered able to wake up the device.
In this method, the voice information is first sent to the wake-up model for a first score calculation, and only voice information whose first score is greater than or equal to the wake-up threshold preset for the wake-up model is then sent to the local offline recognition model for a second score calculation. Wake-up is therefore more accurate and false wake-ups are reduced. Furthermore, because the recognition model is local and offline, recognition is faster and recognition delay is reduced.
In the method according to the foregoing embodiment, after determining whether the first score is greater than or equal to a wakeup threshold preset by the wakeup model or determining whether the second score is greater than or equal to a wakeup threshold preset by the local offline recognition model, the method further includes:
and if the first score is smaller than a preset awakening threshold of the awakening model or the second score is smaller than a preset awakening threshold of the local offline recognition model, judging that the equipment is not awakened.
The method of this embodiment determines whether the first score and the second score are greater than or equal to a preset wake-up threshold, so as to determine whether to wake up the device.
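Steps 101 to 103, together with the not-woken branch, can be sketched as a two-stage gate. This is a minimal illustration rather than the patent's implementation: the class name `TwoStageWakeup`, the `Scorer` type, and the idea of passing each model in as a plain scoring function are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# A scorer maps raw audio to a confidence score. Both scorers are
# hypothetical stand-ins for the wake-up model and the local offline
# recognition model described in the text.
Scorer = Callable[[bytes], float]


@dataclass
class TwoStageWakeup:
    wake_scorer: Scorer        # first stage: wake-up model
    offline_scorer: Scorer     # second stage: local offline recognition model
    wake_threshold: float      # threshold preset for the wake-up model
    offline_threshold: float   # threshold preset for the offline model

    def should_wake(self, audio: bytes) -> bool:
        # Step 101: compute the first score and compare it with the
        # wake-up model's threshold.
        first_score = self.wake_scorer(audio)
        if first_score < self.wake_threshold:
            return False  # the device is not woken up
        # Step 102: only audio that passed stage one reaches the
        # offline recognition model.
        second_score = self.offline_scorer(audio)
        # Step 103: wake only if the second score also reaches its threshold.
        return second_score >= self.offline_threshold
```

With a relatively low `wake_threshold` and a higher `offline_threshold`, the first stage lets more candidates through and the second stage rejects the false triggers, which is the balance the embodiment describes.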
Further referring to fig. 2, a flowchart of another voice wake-up method provided in an embodiment of the present application is shown. This flowchart mainly further defines the steps performed before the step of fig. 1, 'in response to the acquisition of the voice information, sending the voice information into the awakening model for first score calculation, and judging whether the first score is larger than or equal to the awakening threshold preset by the awakening model'.
As shown in fig. 2, in step 201, in response to obtaining a registration instruction of a user, obtaining a custom wake-up word input by the user;
in step 202, based on the user-defined wake-up word input by the user, the user-defined wake-up word is converted into information capable of being configured into the wake-up model and the local offline recognition model, and is configured into the wake-up model and the local offline recognition model.
In this embodiment, for step 201, in response to acquiring a registration instruction of a user, the voice wake-up device acquires a custom wake-up word input by the user, where the custom wake-up word may be input by voice or as text.
Then, for step 202, based on the custom wake-up word input by the user, the voice wake-up apparatus converts the custom wake-up word into information that can be configured into the wake-up model and the local offline recognition model, and then configures the converted information into the corresponding models.
According to the method, the user-defined awakening words are converted into the information which can be configured to enter the awakening model and the local offline recognition model, and the information is configured to enter the awakening model and the local offline recognition model, so that secondary verification of the voice information of the user can be achieved.
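The registration flow of steps 201 and 202 can be sketched as follows. This is an illustrative sketch under stated assumptions: `TOY_LEXICON`, `to_modeling_units`, `ConfigurableModel`, and `register_wake_word` are hypothetical names, the wake word is assumed to arrive as space-separated pinyin text, and a real system would use its own text-to-unit conversion.

```python
from typing import Dict, List

# Hypothetical pronunciation lexicon: pinyin syllable -> phoneme sequence.
# A real system would use its own conversion model instead of a lookup table.
TOY_LEXICON: Dict[str, List[str]] = {
    "ni": ["n", "i"],
    "hao": ["h", "a", "o"],
}


def to_modeling_units(wake_word: str, unit: str = "phoneme") -> List[str]:
    """Convert a space-separated pinyin wake word into modeling units."""
    syllables = wake_word.lower().split()
    unknown = [s for s in syllables if s not in TOY_LEXICON]
    if unknown:
        raise KeyError(f"no pronunciation for: {unknown}")
    if unit == "syllable":
        return syllables
    if unit == "phoneme":
        return [p for s in syllables for p in TOY_LEXICON[s]]
    raise ValueError(f"unsupported unit: {unit}")


class ConfigurableModel:
    """Stand-in for a model that accepts a wake word at run time."""

    def __init__(self, unit: str) -> None:
        self.unit = unit              # modeling unit this model expects
        self.units: List[str] = []    # currently configured wake word

    def configure(self, units: List[str]) -> None:
        self.units = list(units)


def register_wake_word(wake_word: str,
                       wake_model: ConfigurableModel,
                       offline_model: ConfigurableModel) -> None:
    # Step 201 supplies wake_word; step 202 converts it for each model
    # and configures both models with the converted information.
    for model in (wake_model, offline_model):
        model.configure(to_modeling_units(wake_word, model.unit))
```

Each model receives the wake word in its own modeling unit, so the wake-up model and the offline recognition model need not share the same unit inventory.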
In the method according to the foregoing embodiment, the local offline recognition model is trained using a large amount of general texts and corpora in non-fixed domains.
The method of this embodiment uses a large amount of varied text and vocabulary, which ensures that the acoustic characteristics corresponding to a dynamically configured custom wake-up word have already been learned by the model from the general data.
In the method according to the foregoing embodiment, the secondary verification framework of the local offline recognition model includes a grammar-based recognition system and/or a deep-learning-based end-to-end recognition system, and may be replaced or modified as the local offline recognition system is updated and improved.
In the method of this embodiment, the grammar-based recognition system allows dynamically configured words to take effect in real time.
In the method of the above embodiment, the local offline recognition model further includes a normal path and a noise path. The output content of the noise path carries a special mark, so that an output result carrying the mark can be identified as noise-path output and discarded; or the noise path has a higher penalty score than the normal path, and output content whose penalty score is above a penalty threshold is discarded directly.
In this method, the output of the noise path carries a special mark, so an output result carrying the mark can be identified as noise-path output and discarded; alternatively, output whose penalty score is above the penalty threshold is discarded directly. Either way, misrecognition and false wake-ups can be effectively reduced.
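The two discard rules can be sketched as a post-filter on the recognizer's output. This is a minimal sketch, not the patent's implementation: the mark string `NOISE_MARK`, the `RecognitionResult` type, and `filter_result` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

NOISE_MARK = "<noise>"  # hypothetical special mark carried by noise-path output


@dataclass
class RecognitionResult:
    text: str       # decoded output, possibly containing the special mark
    penalty: float  # accumulated penalty score of the decoded path


def filter_result(result: RecognitionResult,
                  penalty_threshold: Optional[float] = None) -> Optional[str]:
    """Return the recognized text, or None if the result must be discarded."""
    # Rule 1: output carrying the special mark came from the noise path.
    if NOISE_MARK in result.text:
        return None
    # Rule 2: output whose penalty score is above the threshold is discarded.
    if penalty_threshold is not None and result.penalty > penalty_threshold:
        return None
    return result.text
```

The source presents the two rules as alternatives; the sketch applies the second only when a `penalty_threshold` is supplied, so either rule can be used on its own.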
In the method according to any of the above embodiments, the local offline recognition model may be replaced with a cloud-based recognition system for secondary verification.
It should be noted that the above method steps are not intended to limit the execution order of the steps, and in fact, some steps may be executed simultaneously or in the reverse order of the steps, which is not limited herein.
The following description is provided to enable those skilled in the art to better understand the present disclosure by describing some of the problems encountered by the inventors in implementing the present disclosure and by describing one particular embodiment of the finally identified solution.
The inventor finds that the defects in the prior art are mainly caused by the following reasons in the process of implementing the application:
They are mainly caused by using a wake-up model that has not been specifically trained for the wake-up word; because a custom wake-up word must take effect in real time, the model usually cannot be trained for it. This has been a long-standing problem in the field.
The inventors have also found that the solutions commonly used in the industry to solve the above problems are the following:
First, collecting data to train the model and optimize its performance. However, custom wake-up words vary, so data cannot be collected in advance; even collecting data afterwards is impractical because of the diversity of wake-up words.
Second, secondary verification, for example comparing the acoustic features of the wake-up word with the registered features; the wake-up operation proceeds only if the verification score exceeds the threshold.
The scheme provided by the invention balances the wake-up rate and the false wake-up rate by using a local offline recognition algorithm to perform a secondary check on the wake-up result. The difficulties are, first, that wake-up words must be dynamically configured into the offline recognition engine and, second, that local offline recognition also misrecognizes, so a new technical means is needed to reduce false wake-ups caused by misrecognition in the secondary verification.
The scheme of the application is mainly designed and optimized from the following aspects:
The biggest defect of existing schemes is the difficulty of balancing the wake-up rate and the false wake-up rate while ensuring real-time performance. Existing secondary verification can reduce the false wake-up rate, but at a considerable cost to the wake-up rate.
Because offline recognition can recognize different words without word-specific training, it can ensure a good recognition rate and a low misrecognition rate for those words. The basic idea here is to use local offline recognition for secondary verification.
The basic flow of the scheme is as follows. The custom wake-up word is converted into the information required by the wake-up model and the local offline recognition model, and this information is configured into both models at the same time. In use, audio data first enters the wake-up model, and once the wake-up model is triggered, the audio data is sent to the local offline recognition model for verification. If the verification passes, the voice is considered able to wake up the device normally.
The difference between the scheme and the existing scheme using the secondary verification technology is that the secondary verification is performed by using an offline recognition model, and a filter technology for preventing noise from being recognized by mistake is added in the offline recognition model.
Referring to fig. 3, a flowchart of a specific example of a voice wake-up method according to an embodiment of the present invention is shown.
As shown in fig. 3, the whole system mainly comprises two major parts. One part is the basic flow of registering the self-defined awakening words, and the other part is the secondary verification in the using process.
In the registration process, the user-defined awakening words are firstly converted into information which can be configured to enter an awakening model and a local offline recognition model, and then the information is configured to enter the corresponding model.
In the using process, voice information firstly enters the awakening model, if the calculated score exceeds the set awakening threshold, voice data is sent to the offline recognition model for secondary verification, and if the score of the secondary verification also exceeds the threshold of the local offline recognition model, the voice can be considered to be awakened finally.
The model information conversion module in this scheme converts the text of a custom wake-up word into the information required by the wake-up model and the recognition model. This information includes the modeling units of the two models, which may be phonemes or syllables and are chosen flexibly according to the wake-up model and the offline recognition model. Because setting a custom wake-up word has weaker real-time requirements than the wake-up scenario itself, this module runs on a cloud service. This design avoids the computation, memory, and storage limits of a local offline system and allows a more demanding model to be used, which increases the accuracy of the information conversion.
When the wake-up module is configured, a relatively low wake-up threshold is usually set, so that more voice information passes through the wake-up module into the secondary verification module. This avoids the low wake-up rate caused by an overly high wake-up model threshold.
The key technical point of the scheme is using offline recognition for secondary verification. Offline recognition can serve this purpose because the offline recognition system is trained, at the initial stage of model construction, on a large amount of general text and corpora from non-fixed domains. Such general text usually contains a large variety of text and vocabulary, which ensures that the acoustic characteristics corresponding to a dynamically configured custom wake-up word have already been learned by the model from the general data. The offline recognition system also adopts grammar-based recognition and grammar compilation, so that dynamically configured words take effect in real time and the real-time performance of the whole system is ensured. Offline recognition systems also misrecognize, of course; the absorption module added in this scheme effectively reduces the misrecognition that occurs when noise falsely triggers the wake-up module and is then sent to the secondary verification module.
Referring to fig. 4, a flowchart illustrating a noise misrecognition prevention method according to a specific example of the voice wakeup method according to an embodiment of the present invention is shown;
As shown in fig. 4, the absorption module is the portion enclosed by the dotted frame. In this example, the fixed content that can be recognized normally is "hao". The number on each edge is its penalty score; a higher score means a greater penalty and a path that is less likely to be taken. The path from the start point to the end point is the recognition result. When the input speech is "hao", it passes through both the normal hao path and the hao path in the absorption module, but because the penalty score of the absorption module is high, the result of the absorption module is not actually output. When the input speech is non-hao noise, for example "niu", the normal penalty-free path in the graph is h-a-o, which n-i-u cannot follow, so n-i-u can only take the penalized path through the absorption module. Because the output of the absorption module carries a special mark, the output result is discarded directly when the mark is found during final output. In actual use, the noise is therefore not misrecognized as the custom wake-up word.
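The hao/niu example can be traced with a toy version of the graph. This is a simplified sketch under stated assumptions: the per-unit penalty `ABSORB_PENALTY` is an invented value (the text does not give the numbers shown on the edges of fig. 4), the normal path is modeled as an exact match with zero penalty, and `NOISE_MARK`, `decode`, and `verify` are hypothetical names.

```python
from typing import List, Optional, Tuple

NOISE_MARK = "<noise>"   # hypothetical mark on absorption-module output
ABSORB_PENALTY = 5.0     # assumed per-unit penalty inside the absorption module


def decode(units: List[str], wake_units: List[str]) -> Tuple[str, float]:
    """Return (output, penalty) of the lowest-penalty path through the toy graph."""
    candidates: List[Tuple[str, float]] = []
    if units == wake_units:
        # Normal path: every edge matches, so no penalty accumulates.
        candidates.append(("".join(wake_units), 0.0))
    # Absorption path: accepts any unit sequence, at a penalty per unit.
    candidates.append((NOISE_MARK, ABSORB_PENALTY * len(units)))
    return min(candidates, key=lambda c: c[1])


def verify(units: List[str], wake_units: List[str]) -> Optional[str]:
    """Discard marked output; otherwise return the recognized content."""
    output, _ = decode(units, wake_units)
    return None if output == NOISE_MARK else output
```

For the input h-a-o both paths are available and the penalty-free normal path wins, so "hao" is output. For n-i-u only the absorption path is available, its output carries the mark, and the result is discarded.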
The scheme not only achieves the direct effects described above but can also address other problems and serve other purposes, as follows.
First, the scheme is designed as a secondary verification framework: the secondary verification module can be a grammar-based recognition system or a deep-learning-based end-to-end recognition system, and can be replaced or modified as the offline recognition system is updated and improved.
The scheme can be extended to new application scenarios. For example, in existing voice interaction, speech recognition usually starts only after the device has been woken up. To simplify interaction, users want to control the device with command words without waking it first, where the command words may be preset by the system or defined by the user in real time. In such a scenario, the scheme described here can configure the command words into the wake-up model, and after speech triggers the wake-up model, the local offline model performs secondary verification of the command word.
Alternative (beta) version formed by the inventor in the process of implementing the invention:
Verification using a cloud recognition model. In this scheme, the voice that passes the wake-up model is sent to a cloud recognition engine, and secondary verification is performed by the recognition system in the cloud. The advantage is that the cloud system is more accurate, so a better wake-up rate and false wake-up rate can be ensured; the drawback is additional network delay, because the data must be transmitted to the cloud.
Referring to fig. 5, a block diagram of a voice wake-up apparatus according to an embodiment of the invention is shown.
As shown in fig. 5, the voice wake-up apparatus 500 includes an acquisition calculation determining program module 510, a calculation determining program module 520, and a wake-up program module 530.
The acquisition calculation determining program module 510 is configured to, in response to acquiring voice information, send the voice information to the wake-up model for a first score calculation and determine whether the first score is greater than or equal to a preset wake-up threshold. The calculation determining program module 520 is configured to, if the first score is greater than or equal to the preset wake-up threshold, send the voice information to a local offline recognition model for a second score calculation and determine whether the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model. The wake-up program module 530 is configured to wake up the device to interact with the user if the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model.
It should be understood that the modules depicted in fig. 5 correspond to various steps in the methods described with reference to fig. 1 and 2. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 5, and are not described again here.
It should be noted that the modules in the embodiments of the present disclosure are not intended to limit the scheme of the present disclosure. For example, the acquisition calculation determining program module may be described as a module that, in response to acquiring voice information, sends the voice information to the wake-up model for a first score calculation and determines whether the first score is greater than or equal to a preset wake-up threshold. In addition, the related functional modules may also be implemented by a hardware processor; for example, the acquisition calculation determining program module may be implemented by a processor, which is not described again here.
In other embodiments, the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can execute the voice wake-up method in any of the above method embodiments.
As one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
in response to acquiring voice information, send the voice information to a wake-up model for a first score calculation, and determine whether the first score is greater than or equal to a wake-up threshold preset for the wake-up model;
if the first score is greater than or equal to the wake-up threshold preset for the wake-up model, send the voice information to a local offline recognition model for a second score calculation, and determine whether the second score is greater than or equal to a wake-up threshold preset for the local offline recognition model;
and if the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model, wake up the device to interact with the user.
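The three instructions above can be sketched as a single decision function. This is a minimal illustration, assuming that each model is available as a callable returning a score; the names `WakeDecision` and `try_wake` and the representation of voice information as `bytes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Model = Callable[[bytes], float]  # maps voice information to a score


@dataclass
class WakeDecision:
    woke: bool
    first_score: float
    second_score: Optional[float]  # None when the second stage was skipped


def try_wake(
    voice: bytes,
    wake_model: Model,
    offline_model: Model,
    wake_threshold: float,
    offline_threshold: float,
) -> WakeDecision:
    """Two-stage decision: wake-up model first, local offline model second."""
    first = wake_model(voice)
    if first < wake_threshold:
        # First stage failed: the local offline model is never invoked.
        return WakeDecision(False, first, None)
    second = offline_model(voice)
    # Wake the device only if the second score also reaches its threshold.
    return WakeDecision(second >= offline_threshold, first, second)
```

The two thresholds are kept as separate parameters because each model has its own preset wake-up threshold, and the second model runs only on voice information that has already passed the first.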
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the voice wake-up apparatus, and the like. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, which may be connected to the voice wake-up apparatus over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Embodiments of the present invention also provide a computer program product, which includes a computer program stored on a non-volatile computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer executes any of the above voice wake-up methods.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device includes one or more processors 610 and a memory 620; one processor 610 is taken as an example in fig. 6. The device for the voice wake-up method may further include an input device 630 and an output device 640. The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 6. The memory 620 is the non-volatile computer-readable storage medium described above. The processor 610 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions and modules stored in the memory 620, that is, it implements the voice wake-up method of the above method embodiments. The input device 630 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the voice wake-up apparatus. The output device 640 may include a display device such as a display screen.
The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
As an embodiment, the electronic device is applied to a voice wake-up apparatus on the client side and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:
in response to acquiring voice information, send the voice information to a wake-up model for a first score calculation, and determine whether the first score is greater than or equal to a wake-up threshold preset for the wake-up model;
if the first score is greater than or equal to the wake-up threshold preset for the wake-up model, send the voice information to a local offline recognition model for a second score calculation, and determine whether the second score is greater than or equal to a wake-up threshold preset for the local offline recognition model;
and if the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model, wake up the device to interact with the user.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones, multimedia phones, feature phones, and low-end phones, among others.
(2) An ultra-mobile personal computer device: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, among others.
(3) A portable entertainment device: such devices can display and play multimedia content. They include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) A server: its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A voice wake-up method, comprising:
in response to acquiring voice information, sending the voice information to a wake-up model for a first score calculation, and determining whether the first score is greater than or equal to a wake-up threshold preset for the wake-up model;
if the first score is greater than or equal to the wake-up threshold preset for the wake-up model, sending the voice information to a local offline recognition model for a second score calculation, and determining whether the second score is greater than or equal to a wake-up threshold preset for the local offline recognition model;
and if the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model, waking up the device to interact with the user.
2. The method of claim 1, wherein after the determining whether the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model, the method further comprises:
if the first score is less than the wake-up threshold preset for the wake-up model, or the second score is less than the wake-up threshold preset for the local offline recognition model, determining not to wake up the device.
3. The method according to claim 1, wherein before the sending, in response to acquiring voice information, of the voice information to a wake-up model for a first score calculation and the determining of whether the first score is greater than or equal to the wake-up threshold preset for the wake-up model, the method further comprises:
in response to an acquired registration instruction of the user, acquiring a user-defined wake-up word input by the user;
and, based on the user-defined wake-up word input by the user, converting the user-defined wake-up word into information that can be configured into the wake-up model and the local offline recognition model, and configuring that information into the wake-up model and the local offline recognition model.
4. The method of claim 1, wherein the local offline recognition model is trained using a large amount of general text and corpora that are not restricted to a fixed domain.
5. The method of claim 4, wherein the secondary verification framework of the local offline recognition model comprises a grammar-based recognition system and/or an end-to-end deep-learning-based recognition system, wherein the grammar-based recognition system and/or the deep-learning-based recognition system can be replaced and modified as the local offline recognition model system is updated and improved.
6. The method of claim 5, wherein the local offline recognition model further comprises a normal path and a noise path, wherein the output content of the noise path carries a special mark, so that the output content of the noise path can be recognized and discarded when the output result carries the special mark; or
the noise path has a higher penalty score than the normal path, and the output content is discarded directly if its penalty score is above a penalty threshold.
7. The method of any one of claims 1-6, wherein the local offline recognition model can be replaced with a cloud-based recognition system for secondary verification.
8. A voice wake-up apparatus, comprising:
an acquisition calculation determining program module, configured to, in response to acquiring voice information, send the voice information to a wake-up model for a first score calculation, and determine whether the first score is greater than or equal to a preset wake-up threshold;
a calculation determining program module, configured to, if the first score is greater than or equal to the preset wake-up threshold, send the voice information to a local offline recognition model for a second score calculation, and determine whether the second score is greater than or equal to a wake-up threshold preset for the local offline recognition model;
and a wake-up program module, configured to wake up the device to interact with the user if the second score is greater than or equal to the wake-up threshold preset for the local offline recognition model.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 7.
10. A storage medium having stored thereon a computer program, characterized in that the program, when being executed by a processor, is adapted to carry out the steps of the method of any one of claims 1 to 7.
CN202110481376.9A 2021-04-30 2021-04-30 Voice wake-up method and device Withdrawn CN113205809A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110481376.9A CN113205809A (en) 2021-04-30 2021-04-30 Voice wake-up method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110481376.9A CN113205809A (en) 2021-04-30 2021-04-30 Voice wake-up method and device

Publications (1)

Publication Number Publication Date
CN113205809A true CN113205809A (en) 2021-08-03

Family

ID=77028206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110481376.9A Withdrawn CN113205809A (en) 2021-04-30 2021-04-30 Voice wake-up method and device

Country Status (1)

Country Link
CN (1) CN113205809A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838462A (en) * 2021-09-09 2021-12-24 北京捷通华声科技股份有限公司 Voice wake-up method and device, electronic equipment and computer readable storage medium
CN114168208A (en) * 2021-12-07 2022-03-11 思必驰科技股份有限公司 Wake-up decision method, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622770A (en) * 2017-09-30 2018-01-23 百度在线网络技术(北京)有限公司 voice awakening method and device
CN109448725A (en) * 2019-01-11 2019-03-08 百度在线网络技术(北京)有限公司 A kind of interactive voice equipment awakening method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108694940B (en) Voice recognition method and device and electronic equipment
WO2014208231A1 (en) Voice recognition client device for local voice recognition
CN107909998B (en) Voice instruction processing method and device, computer equipment and storage medium
CN109767763B (en) Method and device for determining user-defined awakening words
CN110930980B (en) Acoustic recognition method and system for Chinese and English mixed voice
CN110534099A (en) Voice wakes up processing method, device, storage medium and electronic equipment
CN110503944B (en) Method and device for training and using voice awakening model
CN109473104B (en) Voice recognition network delay optimization method and device
CN110910885B (en) Voice wake-up method and device based on decoding network
CN111145732B (en) Processing method and system after multi-task voice recognition
CN110570840B (en) Intelligent device awakening method and device based on artificial intelligence
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN110600008A (en) Voice wake-up optimization method and system
CN113205809A (en) Voice wake-up method and device
CN111179915A (en) Age identification method and device based on voice
CN111816190A (en) Voice interaction method and device for upper computer and lower computer
CN111243604B (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
JP2020109475A (en) Voice interactive method, device, facility, and storage medium
CN110600029A (en) User-defined awakening method and device for intelligent voice equipment
CN110706691B (en) Voice verification method and device, electronic equipment and computer readable storage medium
CN109273004B (en) Predictive speech recognition method and device based on big data
CN110473524B (en) Method and device for constructing voice recognition system
CN112951219A (en) Noise rejection method and device
CN112447177B (en) Full duplex voice conversation method and system
CN111081256A (en) Digital string voiceprint password verification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210803