CN112581960A - Voice wake-up method and device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN112581960A
Authority
CN
China
Prior art keywords
awakening
precision
current
determining
wake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011502794.3A
Other languages
Chinese (zh)
Inventor
周毅
左声勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011502794.3A
Publication of CN112581960A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Telephone Function (AREA)

Abstract

The application discloses a voice awakening method and device, electronic equipment and a readable storage medium, and relates to the technical fields of artificial intelligence, voice technology, automatic driving, Internet of vehicles and the like. The specific implementation scheme is as follows: after the electronic equipment collects the audio signals in the current environment, the current environment volume is determined according to the audio signals, the current awakening precision is determined according to the current environment volume, the current awakening precision is used for indicating the difficulty degree of the voice assistant awakening currently, and the awakening precision corresponding to different environment volumes is different. And then, the electronic equipment determines whether to wake up the voice assistant according to the current wake-up precision. By adopting the scheme, the electronic equipment dynamically adjusts the awakening precision of the voice assistant by comparing the current environment volume with the ideal environment volume, thereby avoiding mistaken awakening with low cost and high efficiency.

Description

Voice wake-up method and device, electronic equipment and readable storage medium
Technical Field
The application relates to the technical fields of artificial intelligence, voice technology, automatic driving, Internet of vehicles and the like, in particular to a voice awakening method and device, electronic equipment and a readable storage medium.
Background
At present, many electronic devices, such as smart speakers, mobile phones, and in-vehicle head units, are equipped with voice assistants. Through intelligent conversation and real-time question-and-answer interaction, the voice assistant helps the user with tasks such as checking the weather.
Typically, the user wakes up the voice assistant with a wake-up word that is either built into the electronic device or set by the user. This way of waking up easily leads to false wake-ups. For example, if the waveform of a sound in the current environment of the electronic device is similar to the waveform of the wake-up word spoken by the user, a false wake-up is triggered. A common way to avoid false wake-ups is to collect audio that can cause false wake-ups and use it to train the voice recognition model in the voice assistant, so that the voice assistant does not enter the awake state after recognizing such sounds.
This approach requires collecting a large amount of audio that can cause false wake-ups and performing model training on it, so it is costly and inefficient.
Disclosure of Invention
The application provides a voice awakening method, a voice awakening device, electronic equipment and a readable storage medium, wherein the current scene environment volume and the ideal environment volume are compared, the awakening precision is dynamically adjusted according to the comparison result, and the lower the awakening precision is, the more difficult the voice assistant is to awaken, so that mistaken awakening is avoided with low cost and high efficiency.
In a first aspect, an embodiment of the present application provides a voice assistant wake-up method, including:
collecting audio signals in a current environment;
determining the current environment volume according to the audio signal;
determining current awakening precision according to the current environment volume, wherein different environment volumes correspond to different awakening precisions;
and determining whether to wake up the voice assistant according to the current wake-up precision.
In a second aspect, an embodiment of the present application provides a voice wake-up apparatus, including:
the acquisition module is used for acquiring audio signals in the current environment;
the processing module is used for determining the current environment volume according to the audio signal;
the determining module is used for determining the current awakening precision according to the current environment volume, and different environment volumes correspond to different awakening precisions;
and the awakening module is used for determining whether to awaken the voice assistant according to the current awakening precision.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the first aspect or any possible implementation of the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product including a computer program which, when run on an electronic device, causes the electronic device to execute the method in the first aspect or in various possible implementation manners of the first aspect.
In a fifth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing an electronic device to perform the method of the first aspect or the various possible implementations of the first aspect.
In a sixth aspect, an embodiment of the present application provides a terminal device on which a voice assistant is installed, where the voice assistant is woken up using the method of the first aspect or of the various possible implementations of the first aspect.
According to the technology of the application, the electronic equipment dynamically adjusts the awakening precision of the voice assistant by comparing the current environment volume with the ideal environment volume, so that mistaken awakening is avoided with low cost and high efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
Fig. 1 is a flowchart of a voice wake-up method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a first user interface for adjusting a corresponding relationship between a wake-up precision and an environmental volume in a voice wake-up method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a second user interface for setting a wake-up precision mode in the voice wake-up method according to the embodiment of the present application;
Fig. 4 is a schematic structural diagram of a voice assistant wake-up apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of another voice assistant wake-up apparatus according to an embodiment of the present disclosure;
Fig. 6 is a schematic block diagram of an example electronic device that may be used to implement embodiments of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, with the rapid development of voice interaction technology, more and more electronic devices have a voice wake-up function, and a user can wake up the voice assistant on the electronic device by speaking a simple wake-up word. While the voice assistant is in use, it is prone to false wake-ups because of differences in the environment and similar factors. For example, the electronic device is a vehicle-mounted terminal installed in a vehicle, and when the vehicle travels on different roads, the differing road noise on some roads is likely to wake up the voice assistant by mistake. For another example, when people in the car chat, their speech may contain words similar to the wake-up word, or the chatter may be loud enough to trigger a false wake-up. For another example, when the in-vehicle head unit plays music, the music is too loud and contains words similar to the wake-up word, so the voice assistant is woken up by mistake. A false wake-up means that some sound among the many sounds in the environment of the electronic device is not the wake-up word but nevertheless wakes up the voice assistant on the electronic device.
A common way to avoid false wake-ups is to collect audio that can cause false wake-ups and use it to train the voice recognition model in the voice assistant, so that the voice assistant does not enter the awake state after recognizing such sounds.
This approach requires collecting a large amount of audio that can cause false wake-ups and performing model training on it, so the labor cost is high and the efficiency is low.
The embodiment of the application relates to the technical fields of artificial intelligence, voice technology, automatic driving, Internet of vehicles and the like, and aims to dynamically adjust the awakening precision of the voice assistant by comparing the current environment volume with the ideal environment volume, so that mistaken awakening is avoided with low cost and high efficiency.
Fig. 1 is a flowchart of a voice wake-up method according to an embodiment of the present application. The execution subject of the present embodiment is the electronic device, and the present embodiment includes:
101. Collecting audio signals in the current environment.
Illustratively, the electronic device has a microphone (mic) or the like thereon, and the electronic device can collect an audio signal in the current environment by using the mic, wherein the audio signal is a sum of sounds emitted by various sound sources in the current environment, including but not limited to chat sounds, music sounds, awakening words emitted by a user, car sounds, sounds of kidnapping, and the like. The mic is, for example, a microphone array, and the microphone array is a sound pickup device in which a plurality of microphones are arranged on a device with a preset spatial distribution characteristic.
102. Determining the current environment volume according to the audio signal.
Illustratively, the volume may be expressed as a sound pressure level in decibels (dB). The electronic device determines the current sound pressure of the audio signal and determines the current environment volume from the current sound pressure, a reference sound pressure, and the like. The reference sound pressure is, for example, 0.00002 pascal.
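The relation between sound pressure and a decibel value can be sketched as follows. This is a minimal illustration rather than the method of the application: it uses the standard sound pressure level formula 20·log10(p / p_ref) with the reference pressure given above, and it assumes the microphone samples are already calibrated in pascals. The function names `rms_pressure` and `sound_pressure_level_db` are hypothetical.

```python
import math
from typing import Sequence

# Reference sound pressure from the text: 0.00002 pascal (20 micropascals).
REFERENCE_PRESSURE_PA = 0.00002


def rms_pressure(samples_pa: Sequence[float]) -> float:
    """Root-mean-square of pressure samples.

    Assumption: the samples are calibrated in pascals.
    """
    if len(samples_pa) == 0:
        raise ValueError("at least one sample is required")
    return math.sqrt(sum(s * s for s in samples_pa) / len(samples_pa))


def sound_pressure_level_db(pressure_pa: float,
                            reference_pa: float = REFERENCE_PRESSURE_PA) -> float:
    """Sound pressure level in dB: 20 * log10(p / p_ref)."""
    if pressure_pa <= 0:
        return float("-inf")  # silence has no finite level
    return 20.0 * math.log10(pressure_pa / reference_pa)
```

For instance, an RMS pressure of 0.02 pascal is 1000 times the reference pressure, which corresponds to about 60 dB.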
103. Determining the current wake-up precision according to the current environment volume, wherein different environment volumes correspond to different wake-up precisions.
Illustratively, the wake-up precision ranges, for example, from 0 to 1; when the wake-up precision is 1, the probability of false wake-up is lower, and when the wake-up precision is 0, the probability of false wake-up is higher. The higher the wake-up precision, the easier the voice assistant is to wake up; the lower the wake-up precision, the harder it is to wake up. In addition, to prevent the voice assistant from becoming impossible to wake up because the wake-up precision is too low, a minimum wake-up precision can be set: if the current wake-up precision is lower than the minimum wake-up precision, the minimum wake-up precision is used as the current wake-up precision. The minimum wake-up precision is, for example, 0.5. Experience shows that when the current wake-up precision is 0.5, many false wake-ups are intercepted while a wake-up rate of 80% can still be guaranteed. Intercepting a false wake-up means that even if the waveform of the audio signal in the current environment is similar to the waveform of the audio corresponding to the wake-up word, the voice assistant is not woken up, that is, no false wake-up occurs.
The electronic device dynamically adjusts the wake-up precision according to the current environment volume. The lower the current environment volume, the quieter the current environment, the higher the wake-up precision, and the easier the voice assistant is to wake up; for example, the voice assistant is woken up if the user softly says "little black, little black", where "little black" is the wake-up word of the voice assistant. The higher the current environment volume, the noisier the current environment, the lower the wake-up precision, and the harder the voice assistant is to wake up; for example, the user has to say "little black, little black" clearly and loudly to wake up the voice assistant.
Assuming that the user's normal speaking volume is 65 dB, the ideal environment volume is set to 65 dB and the wake-up precision corresponding to 65 dB is 1; the higher the environment volume, the lower the wake-up precision. For example, the wake-up precision is 1 for 0-67 dB and 0.9 for 68-72 dB. The electronic device stores the correspondence between environment volume and wake-up precision.
104. Determining whether to awaken the voice assistant according to the current awakening precision, and executing step 105 if the electronic equipment determines to awaken the voice assistant; if the electronic device does not wake up the voice assistant, step 106 is performed.
Illustratively, the voice assistant corresponds to a voice recognition model, which is used to judge whether the audio signal of the current environment matches the standard audio corresponding to the wake-up word. When the wake-up precision is higher, the criterion the voice recognition model applies when judging whether the audio signal matches the standard audio is looser, so the voice assistant is easily woken up. When the wake-up precision is lower, the criterion is stricter, so the voice assistant is not easily woken up.
105. The voice assistant on the electronic device wakes up and interacts with the user by voice.
For example, after the voice assistant is awakened, the electronic device responds to a voice command input by the user, such as playing music, inquiring and broadcasting weather forecast, and the like.
106. The voice assistant is not woken up and remains on standby.
Illustratively, whether the electronic device is playing an animation, in a standby state, etc., the voice assistant is not awakened, waits to acquire the audio signal again and decides whether to awaken.
According to the voice awakening method provided by the embodiment of the application, after the electronic equipment collects the audio signal in the current environment, the current environment volume is determined according to the audio signal, the current awakening precision is determined according to the current environment volume, the current awakening precision is used for indicating the difficulty degree of the voice assistant awakening currently, and the awakening precision corresponding to different environment volumes is different. And then, the electronic equipment determines whether to wake up the voice assistant according to the current wake-up precision. By adopting the scheme, the electronic equipment dynamically adjusts the awakening precision of the voice assistant by comparing the current environment volume with the ideal environment volume, thereby avoiding mistaken awakening with low cost and high efficiency.
In the above embodiment, when the electronic device determines whether to wake up the voice assistant according to the current wake-up accuracy, first, a similarity between a feature of the audio signal and a feature of a standard audio is determined. And then, the electronic equipment determines the awakening index of the audio signal according to the similarity and the awakening precision, if the awakening index is greater than or equal to a preset awakening index, the voice assistant is awakened, and if the awakening index is smaller than the preset awakening index, the voice assistant is not awakened.
Illustratively, the electronic device stores standard audio corresponding to the wake-up word of the voice assistant. The standard audio is a segment of audio shipped in the program installation package of the voice assistant, and the higher the similarity between the audio signal in the current environment and the standard audio, the closer the current audio signal is to the standard audio. The electronic device can determine the similarity between the audio signal in the current environment and the standard audio by comparing features. The features are, for example, waveform, frequency, amplitude, phase, and the like.
After the electronic device obtains the similarity between the audio signal and the standard audio, it determines the wake-up index of the audio signal according to the similarity and the wake-up precision. For example, the electronic device uses the wake-up precision as a coefficient, multiplies the similarity by this coefficient, and maps the product to the final wake-up index. For example, the similarity is 0.8, which without the wake-up precision corresponds to a wake-up index of 80 points. Assuming that the wake-up precision is 0.5, 0.5 × 0.8 = 0.4, which corresponds to a wake-up index of 40 points. The preset wake-up index is 60 points, so the wake-up index determined from the similarity and the wake-up precision is smaller than the preset wake-up index, and the voice assistant is not woken up.
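The worked example above can be reproduced with a short sketch. It assumes, as in the example, that the wake-up index in points is the product of similarity and wake-up precision scaled by 100, and that both inputs lie in the range 0 to 1. The names `wake_index` and `should_wake` are hypothetical and only illustrate the arithmetic.

```python
# Preset wake-up index from the example in the text, in points.
PRESET_WAKE_INDEX = 60.0


def wake_index(similarity: float, wake_precision: float) -> float:
    """Wake-up index in points: similarity * precision * 100 (assumed scaling)."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    if not 0.0 <= wake_precision <= 1.0:
        raise ValueError("wake_precision must be in [0, 1]")
    return similarity * wake_precision * 100.0


def should_wake(similarity: float, wake_precision: float,
                preset_index: float = PRESET_WAKE_INDEX) -> bool:
    """Wake the assistant only if the index reaches the preset index."""
    return wake_index(similarity, wake_precision) >= preset_index
```

With a similarity of 0.8, a wake-up precision of 0.5 gives 40 points and the assistant stays asleep, while a wake-up precision of 1 gives 80 points and the assistant is woken up.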
By adopting the scheme, the electronic equipment determines the awakening index of the audio signal according to the similarity and the awakening precision of the current audio signal and the standard audio, and further determines whether to awaken the voice assistant, the method is simple, and the probability of mistaken awakening is greatly reduced.
In the above embodiment, when the electronic device determines the current waking precision according to the current environment volume, first, a difference between the current environment volume and a preset ideal volume is determined. And then, the electronic equipment determines the current awakening precision according to the difference value.
Illustratively, the audio signal in the current environment contains noise, including road noise, tire noise, wind noise, music, and the like. In the process of waking up the voice assistant, the electronic device needs to apply a noise reduction algorithm to the audio signal, filter out the noise so that only the human voice is retained, and finally pass the voice to the voice recognition model. However, in some scenes the noise reduction algorithm can filter out the noise in the audio signal and retain only the pure human voice, while in other scenes it cannot filter out all the noise, and some noise remains in the noise-reduced audio signal. That is, the noise reduction algorithm is not robust. Therefore, in the embodiment of the application, the current environment volume is determined according to the audio signal, the current environment volume is compared with the ideal volume to obtain a difference, and the wake-up precision is dynamically adjusted according to the difference. For example, for differences of 0-5, 6-10, 11-15, 16-20, 21-25, and 26-30 dB, the wake-up precisions are 0.95, 0.9, 0.85, 0.8, 0.7, and 0.6, respectively. When the ideal volume is 65 dB and the current environment volume is 90 dB, the difference is 25 dB and the wake-up precision is 0.7.
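The example bands above can be written as a lookup table. The sketch below uses the band values, the 65 dB ideal volume, and the 0.5 minimum wake-up precision from the text; everything the text leaves open is an assumption and is marked as such in the comments: volumes at or below the ideal volume are treated as a difference of 0, non-integer differences fall into the next band up, and differences above 30 dB fall back to the minimum wake-up precision. The function name is hypothetical.

```python
from bisect import bisect_left

IDEAL_VOLUME_DB = 65.0      # ideal environment volume from the text
MIN_WAKE_PRECISION = 0.5    # minimum wake-up precision from the text

# Example bands from the text: 0-5, 6-10, 11-15, 16-20, 21-25, 26-30 dB.
BAND_UPPER_BOUNDS_DB = [5, 10, 15, 20, 25, 30]
BAND_PRECISIONS = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6]


def wake_precision_for_volume(current_volume_db: float,
                              ideal_volume_db: float = IDEAL_VOLUME_DB) -> float:
    """Map the difference between current and ideal volume to a wake-up precision."""
    # Assumption: an environment quieter than the ideal counts as difference 0.
    difference = max(0.0, current_volume_db - ideal_volume_db)
    # Index of the first band whose upper bound is >= the difference.
    band = bisect_left(BAND_UPPER_BOUNDS_DB, difference)
    if band >= len(BAND_PRECISIONS):
        # Assumption: beyond the last band, fall back to the minimum precision.
        return MIN_WAKE_PRECISION
    # Never go below the minimum wake-up precision.
    return max(BAND_PRECISIONS[band], MIN_WAKE_PRECISION)
```

Calling `wake_precision_for_volume(90.0)` reproduces the example in the text: the difference is 25 dB, which falls in the 21-25 band and yields 0.7.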
By adopting the scheme, the current environment volume is determined according to the audio signal, the current environment volume is compared with the ideal volume to obtain a difference value, and the awakening precision is dynamically adjusted according to the difference value, so that the aim of improving the robustness of the noise reduction algorithm is fulfilled.
In the above embodiment, after determining the current waking precision according to the current environment volume, a prompt message is further output, where the prompt message is used to prompt the user that the current waking precision is low and the voice assistant is difficult to wake up.
When the current awakening precision is lower than the preset awakening precision, the electronic equipment outputs prompt information in modes of voice broadcasting, animation display and the like so as to prompt a user that the current awakening precision is low and a voice assistant is difficult to awaken. For example, a piece of text is stored on the electronic device, and the content of the text is "the current environment is noisy, and the wake-up rate has been reduced to ensure the wake-up accuracy". Assuming that the preset awakening precision is 0.7, when the awakening precision is lower than 0.7, the electronic device prompts the user in a Text-To-Speech (TTS) mode, namely, the Text is displayed while the voice is broadcasted.
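The prompt rule can be sketched in a few lines. The threshold of 0.7 and the prompt text come from the example above; the function name is hypothetical, and how the returned text is spoken or displayed (the TTS and display calls) is deliberately left out because the text does not specify it.

```python
from typing import Optional

PRESET_WAKE_PRECISION = 0.7  # preset wake-up precision from the example
LOW_PRECISION_PROMPT = (
    "The current environment is noisy, and the wake-up rate has been "
    "reduced to ensure the wake-up accuracy."
)


def low_precision_prompt(current_precision: float,
                         preset: float = PRESET_WAKE_PRECISION) -> Optional[str]:
    """Return the prompt text when the precision is below the preset, else None."""
    if current_precision < preset:
        return LOW_PRECISION_PROMPT
    return None
```

A caller would pass any non-None result to whatever voice broadcast and display facilities the device provides.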
With this scheme, when the wake-up precision is low, the electronic device actively sends prompt information to the user, so that the user is aware of the current working state of the electronic device and can adjust it flexibly as needed.
Sometimes, some users care more about the wake-up rate than about avoiding false wake-ups. That is, such a user can accept false wake-ups and would prefer that the voice assistant be woken up by softly speaking the wake-up word. In the embodiment of the application, an entry point is provided for the user to adjust the correspondence between wake-up precision and environment volume. For example, the text stored on the electronic device is "The current environment is noisy, and the wake-up rate has been reduced to ensure the wake-up precision. Do you want to adjust the wake-up precision so that it is not affected by the noisy environment?" When the user says "yes", "good", "OK", or the like, the electronic device pops up a first user interface for the user to adjust the correspondence between wake-up precision and environment volume.
In addition, the user can modify the correspondence between wake-up precision and environment volume at any time. For example, at any time, the user can issue the voice command "little black, little black, adjust the correspondence between wake-up precision and environment volume". The electronic device then adjusts the correspondence according to the user's voice input or touch operation.
Examples of speech patterns are as follows:
User: "Little black, little black, I want to wake you up more easily."
Electronic device: "OK, the environment volume corresponding to each wake-up precision has been turned up."
Assuming that the difference values are 0-5, 6-10, 11-15, 16-20, 21-25 and 26-30 before adjustment, the awakening precision is 0.95, 0.9, 0.85, 0.8, 0.7 and 0.6 in sequence. After adjustment, when the difference values are 0-5, 6-10, 11-15, 16-20, 21-25 and 26-30, the awakening precision is 0.98, 0.95, 0.9, 0.85, 0.8 and 0.8 in sequence.
The above description takes the user saying "little black, little black, I want to wake you up more easily" as an example; however, the embodiment of the present application is not limited thereto. For example, the user may also say "little black, little black, respond faster", "little black, little black, start fast mode", and the like.
The touch operation mode comprises the following steps:
Fig. 2 is a schematic diagram of a first user interface for adjusting a corresponding relationship between a wake-up precision and an ambient volume in a voice wake-up method according to an embodiment of the present application. Referring to Fig. 2, the electronic device is a smart speaker with a display screen. When the electronic device receives a first setting request, it pops up the first user interface, on which the wake-up precision is displayed: the lowest wake-up precision is 0.5, the highest is 1, and the black fill shows the current wake-up precision. The user increases the wake-up precision by sliding right and decreases it by sliding left.
With this scheme, the electronic device can adjust the wake-up precision according to the user's needs, either to guarantee the wake-up rate or to avoid false wake-ups.
In the above embodiment, when the electronic device determines the current environment volume according to the audio signal, the signal-to-noise ratio of the audio signal is determined first. Then, the electronic device determines the current ambient volume according to the signal-to-noise ratio.
For example, the electronic device can tell from the signal-to-noise ratio whether the current environment is noisy or quiet; for instance, a high signal-to-noise ratio indicates that the surrounding environment is quiet. In addition, after the electronic device determines the signal-to-noise ratio, it can refine it according to factors such as the distance between the sound source and the electronic device and the sound intensity of the user's speech in the audio signal, so as to obtain a more accurate signal-to-noise ratio. For example, coefficients for signal-to-noise ratio, distance, and sound intensity are preset, and a more accurate signal-to-noise ratio is determined by combining these sound factors through weighted summation.
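A weighted summation of this kind might look like the sketch below. The text does not give the coefficients, the units, or how the signal-to-noise ratio is measured, so all of these are assumptions: the signal-to-noise ratio is taken as the standard power ratio in decibels, distance is in meters, sound intensity is a decibel value, and the weights must be supplied by the caller. The function names are hypothetical.

```python
import math
from typing import Tuple


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def refined_snr(raw_snr_db: float, distance_m: float, intensity_db: float,
                weights: Tuple[float, float, float]) -> float:
    """Weighted sum of SNR, distance, and sound intensity.

    Assumption: the weights (w_snr, w_distance, w_intensity) are preset by
    the device maker; the text does not specify their values.
    """
    w_snr, w_distance, w_intensity = weights
    return w_snr * raw_snr_db + w_distance * distance_m + w_intensity * intensity_db
```

With weights of (1, 0, 0) the refined value equals the raw signal-to-noise ratio, which gives a convenient baseline when tuning the other two coefficients.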
By adopting the method, the electronic equipment determines the current environment volume according to the signal-to-noise ratio of the audio signal, and the aim of accurately and quickly determining the current environment volume is fulfilled.
In the above embodiment, the user can flexibly set whether to turn on the wake-up precision mode. When the wake-up precision mode is on, the electronic device dynamically adjusts the wake-up precision according to the current ambient volume so as to avoid false wake-ups. When the wake-up precision mode is off, the electronic device does not take false wake-ups into account. The user can set the wake-up precision mode by voice operation, touch operation, or a similar operation.
When the user turns on the wake-up precision mode by voice, for example, the user says: "Xiaohei, Xiaohei, turn on the wake-up precision mode." This utterance is the second setting request input by the user. After receiving the voice instruction, the electronic device sets the voice assistant to the wake-up precision mode.
Fig. 3 is a schematic diagram of a second user interface for setting a wake-up precision mode in the voice wake-up method according to the embodiment of the present application. Referring to fig. 3, the electronic device is an intelligent speaker with a display screen. When the electronic device receives a second setting request, it pops up a second user interface that displays a switch for the wake-up precision mode, and the user clicks the switch to turn on the wake-up precision mode. Later, if the user wants to turn off the wake-up precision mode, the user clicks the switch again.
With this scheme, the electronic device can determine whether to enter the wake-up precision mode according to the user's requirements, which is simple and highly flexible.
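A minimal sketch of such a mode switch is given below. The class name, the function name, and the fallback value used when the mode is off are hypothetical; the application only states that false wake-ups are not considered when the mode is off.

```python
from dataclasses import dataclass


@dataclass
class WakeSettings:
    """Holds the user's choice for the wake-up precision mode."""
    precision_mode_on: bool = False

    def toggle(self) -> bool:
        """Flip the switch, as a click on the second user interface would."""
        self.precision_mode_on = not self.precision_mode_on
        return self.precision_mode_on


def effective_precision(settings: WakeSettings,
                        volume_based_precision: float,
                        default_precision: float = 1.0) -> float:
    """Use the volume-dependent precision only when the mode is on.

    default_precision is an assumed fixed value for the 'mode off' case.
    """
    if settings.precision_mode_on:
        return volume_based_precision
    return default_precision


settings = WakeSettings()
print(effective_precision(settings, 0.6))  # mode off: fixed default
settings.toggle()
print(effective_precision(settings, 0.6))  # mode on: follows ambient volume
```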
The above describes specific implementations of the voice wake-up method of the embodiments of the present application. The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Fig. 4 is a schematic structural diagram of a voice assistant wake-up apparatus according to an embodiment of the present disclosure. The apparatus may be integrated in or implemented by an electronic device. As shown in FIG. 4, in this embodiment, the voice assistant wake-up apparatus 400 may comprise: an acquisition module 41, a processing module 42, a determination module 43 and a wake-up module 44.
An acquisition module 41, configured to acquire an audio signal in a current environment;
a processing module 42, configured to determine a current ambient volume according to the audio signal;
a determining module 43, configured to determine a current awakening precision according to the current environment volume, where different environment volumes correspond to different awakening precisions;
and the awakening module 44 is configured to determine whether to awaken the voice assistant according to the current awakening precision.
In a possible implementation manner, the wake-up module 44 is configured to determine a similarity between the audio signal and a standard audio, determine a wake-up index of the audio signal according to the similarity and the wake-up precision, and wake up the voice assistant if the wake-up index is greater than or equal to a preset wake-up index; and if the awakening index is smaller than the preset awakening index, not awakening the voice assistant.
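The decision made by the wake-up module can be sketched as below. The application does not state in this passage how the wake-up index is derived from the similarity and the wake-up precision, so the product used here is purely an assumption, as are the function names and the preset index of 0.5.

```python
def wake_index(similarity: float, precision: float) -> float:
    """Combine similarity and wake-up precision into a wake-up index.

    Assumption: both inputs lie in [0, 1] and the index is their product,
    so a lower precision makes the assistant harder to wake.
    """
    return similarity * precision


def should_wake(similarity: float, precision: float,
                preset_index: float = 0.5) -> bool:
    """Wake the assistant only if the index reaches the preset index."""
    return wake_index(similarity, precision) >= preset_index


# Same utterance, two environments with different wake-up precision.
print(should_wake(similarity=0.9, precision=0.8))
print(should_wake(similarity=0.9, precision=0.4))
```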
In a possible implementation manner, the determining module 43 is configured to determine a difference between the current ambient volume and a preset ideal volume; and determining the current awakening precision according to the difference value.
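One way to turn the difference into a wake-up precision is a clamped linear mapping, sketched below. The ideal volume of 40 dB, the 40 dB falloff range, the one-sided treatment of the difference, and the function name are all assumptions for illustration; the application does not specify the mapping.

```python
def wake_precision(current_volume_db: float,
                   ideal_volume_db: float = 40.0,
                   falloff_db: float = 40.0) -> float:
    """Map the gap between ambient and ideal volume to a precision in [0, 1].

    Assumption: only volume above the ideal level lowers the precision,
    and the precision falls linearly to 0 over falloff_db decibels.
    """
    excess = max(0.0, current_volume_db - ideal_volume_db)
    return max(0.0, 1.0 - excess / falloff_db)


for volume in (30.0, 40.0, 60.0, 100.0):
    print(volume, wake_precision(volume))
```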
Fig. 5 is a schematic structural diagram of another voice assistant wake-up apparatus according to an embodiment of the present disclosure. The acquisition module 51, the processing module 52, the determining module 53 and the wake-up module 54 in the voice assistant wake-up apparatus 500 of this embodiment correspond respectively to the acquisition module 41, the processing module 42, the determining module 43 and the wake-up module 44 in fig. 4. The voice assistant wake-up apparatus 500 provided in this embodiment further includes, on the basis of fig. 4:
and an output module 55, configured to output a prompt message when the current wake-up precision is lower than a preset wake-up precision after the determining module 53 determines the current wake-up precision according to the current environment volume, where the prompt message is used to prompt a user that the current wake-up precision is low and the voice assistant is difficult to wake up.
Referring to fig. 5 again, in a possible implementation manner, the voice assistant wake-up apparatus 500 further includes:
a receiving module 56, configured to receive a first setting request of a user, where the first setting request is used to set environment volumes corresponding to different wake-up accuracies;
a display module 57, configured to display a first user interface, where the first user interface is used for the user to modify a correspondence between the awakening precision and the environmental volume;
the processing module 52 is further configured to modify a corresponding relationship between the waking precision and the environmental volume according to an operation of the user on the first user interface.
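The user-editable correspondence between ambient volume and wake-up precision can be pictured as a small lookup table, sketched below. The class name, the band boundaries, and the precision values are hypothetical and only illustrate the idea of a correspondence that the first user interface modifies.

```python
from bisect import bisect_right


class PrecisionTable:
    """Lookup table from ambient-volume bands to wake-up precision."""

    def __init__(self) -> None:
        # Ascending band boundaries in dB (assumed values).
        self.bounds = [40.0, 60.0, 80.0]
        # One precision per band; there is one more band than boundaries.
        self.precisions = [1.0, 0.8, 0.6, 0.4]

    def precision_for(self, volume_db: float) -> float:
        """Return the precision of the band that contains volume_db."""
        return self.precisions[bisect_right(self.bounds, volume_db)]

    def set_precision(self, band: int, precision: float) -> None:
        """Apply a user edit made on the first user interface."""
        if not 0.0 <= precision <= 1.0:
            raise ValueError("precision must lie in [0, 1]")
        self.precisions[band] = precision


table = PrecisionTable()
print(table.precision_for(65.0))
table.set_precision(2, 0.7)  # user raises the precision for 60-80 dB
print(table.precision_for(65.0))
```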
In a possible implementation, the processing module 52 is configured to determine a signal-to-noise ratio of the audio signal; and determining the current environment volume according to the signal-to-noise ratio.
In a possible implementation manner, the receiving module 56 is configured to receive a second setting request input by a user before the determining module 53 determines the current wake-up precision according to the current ambient volume, where the second setting request is used to set the working mode of the electronic device to the mode in which the wake-up precision mode is turned on; the display module 57 is configured to display a second user interface, where the second user interface is used for the user to modify the working mode of the electronic device; and the processing module 52 is further configured to set the working mode of the electronic device to the mode in which the wake-up precision mode is turned on according to the operation of the user on the second user interface.
The embodiment of the present application further provides a terminal device, where a voice assistant is installed on the terminal device, and the voice assistant wakes up in the voice assistant wake-up manner described in any of the above embodiments. The terminal device is, for example, a mobile phone, a sound box, a vehicle-mounted terminal, and the like, and the embodiment of the present application is not limited.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 6 is a schematic block diagram of an example electronic device that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the voice assistant wake-up method. For example, in some embodiments, the voice assistant wake method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM602 and/or the communication unit 609. When the computer program is loaded into RAM603 and executed by the computing unit 601, one or more steps of the voice assistant wake-up method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the voice assistant wake-up method.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability found in conventional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (18)

1. A voice wake-up method, comprising:
collecting audio signals in a current environment;
determining the current environment volume according to the audio signal;
determining current awakening precision according to the current environment volume, wherein different environment volumes correspond to different awakening precisions;
and determining whether to wake up the voice assistant according to the current wake-up precision.
2. The method of claim 1, wherein the determining whether to wake the voice assistant based on the current wake accuracy comprises:
determining similarity of the audio signal and standard audio;
determining the awakening index of the audio signal according to the similarity and the awakening precision;
if the awakening index is larger than or equal to a preset awakening index, awakening the voice assistant;
and if the awakening index is smaller than the preset awakening index, not awakening the voice assistant.
3. The method of claim 1, wherein the determining a current wake-up accuracy from the current ambient volume comprises:
determining the difference value between the current environment volume and a preset ideal volume;
and determining the current awakening precision according to the difference value.
4. The method according to any one of claims 1-3, wherein after determining a current wake-up accuracy from the current ambient volume, further comprising:
and when the current awakening precision is lower than the preset awakening precision, outputting prompt information, wherein the prompt information is used for prompting the user that the current awakening precision is low and the voice assistant is difficult to awaken.
5. The method according to any one of claims 1-3, further comprising:
receiving a first setting request of a user, wherein the first setting request is used for setting environment volumes corresponding to different awakening precisions;
displaying a first user interface, wherein the first user interface is used for the user to modify the corresponding relation between the awakening precision and the environmental volume;
and modifying the corresponding relation between the awakening precision and the environmental volume according to the operation of the user on the first user interface.
6. The method of any of claims 1-3, wherein the determining a current ambient volume from the audio signal comprises:
determining a signal-to-noise ratio of the audio signal;
and determining the current environment volume according to the signal-to-noise ratio.
7. The method according to any one of claims 1-3, wherein prior to determining the current wake-up accuracy from the current ambient volume, further comprising:
receiving a second setting request input by a user, wherein the second setting request is used for setting the working mode of the electronic equipment to be an opening awakening precision mode;
displaying a second user interface, wherein the second user interface is used for the user to modify the working mode of the electronic equipment;
and setting the working mode of the electronic equipment to be an opening awakening precision mode according to the operation of the user on the second user interface.
8. A voice wake-up apparatus comprising:
the acquisition module is used for acquiring audio signals in the current environment;
the processing module is used for determining the current environment volume according to the audio signal;
the determining module is used for determining the current awakening precision according to the current environment volume, and different environment volumes correspond to different awakening precisions;
and the awakening module is used for determining whether to awaken the voice assistant according to the current awakening precision.
9. The apparatus of claim 8, wherein,
the awakening module is used for determining the similarity between the audio signal and a standard audio, determining an awakening index of the audio signal according to the similarity and the awakening precision, and awakening the voice assistant if the awakening index is greater than or equal to a preset awakening index; and if the awakening index is smaller than the preset awakening index, not awakening the voice assistant.
10. The apparatus of claim 8, wherein,
the determining module is used for determining the difference value between the current environment volume and a preset ideal volume; and determining the current awakening precision according to the difference value.
11. The apparatus of any of claims 8-10, further comprising:
and the output module is used for outputting prompt information when the current awakening precision is lower than the preset awakening precision after the determining module determines the current awakening precision according to the current environment volume, wherein the prompt information is used for prompting a user that the current awakening precision is low and the voice assistant is difficult to awaken.
12. The apparatus of any of claims 8-10, further comprising:
the receiving module is used for receiving a first setting request of a user, wherein the first setting request is used for setting environment volumes corresponding to different awakening precisions;
the display module is used for displaying a first user interface, and the first user interface is used for the user to modify the corresponding relation between the awakening precision and the environmental volume;
the processing module is further configured to modify a correspondence between the wake-up accuracy and the ambient volume according to an operation of the user on the first user interface.
13. The apparatus of any one of claims 8-10,
the processing module is used for determining the signal-to-noise ratio of the audio signal; and determining the current environment volume according to the signal-to-noise ratio.
14. The apparatus of any of claims 8-10, further comprising:
the receiving module is used for receiving a second setting request input by a user before the determining module determines the current awakening precision according to the current environment volume, wherein the second setting request is used for setting the working mode of the electronic equipment to be the mode for starting the awakening precision;
the display module is used for displaying a second user interface, and the second user interface is used for the user to modify the working mode of the electronic equipment;
the processing module is further configured to set a working mode of the electronic device to an open wake-up precision mode according to the operation of the user on the second user interface.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-7 above.
18. A terminal device, comprising: a voice assistant that wakes up using the method of any of claims 1-7 above.
CN202011502794.3A 2020-12-18 2020-12-18 Voice wake-up method and device, electronic equipment and readable storage medium Withdrawn CN112581960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011502794.3A CN112581960A (en) 2020-12-18 2020-12-18 Voice wake-up method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011502794.3A CN112581960A (en) 2020-12-18 2020-12-18 Voice wake-up method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112581960A true CN112581960A (en) 2021-03-30

Family

ID=75135931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011502794.3A Withdrawn CN112581960A (en) 2020-12-18 2020-12-18 Voice wake-up method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112581960A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113593561A (en) * 2021-07-29 2021-11-02 普强时代(珠海横琴)信息技术有限公司 Ultra-low power consumption awakening method and device based on multi-stage trigger mechanism
CN113808585A (en) * 2021-08-16 2021-12-17 百度在线网络技术(北京)有限公司 Earphone awakening method, device, equipment and storage medium
CN115881126A (en) * 2023-02-22 2023-03-31 广东浩博特科技股份有限公司 Switch control method and device based on voice recognition and switch equipment
CN115995231A (en) * 2023-03-21 2023-04-21 北京探境科技有限公司 Voice wakeup method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447472A (en) * 2017-02-16 2018-08-24 腾讯科技(深圳)有限公司 Voice awakening method and device
CN109243431A (en) * 2017-07-04 2019-01-18 阿里巴巴集团控股有限公司 A kind of processing method, control method, recognition methods and its device and electronic equipment
CN110223684A (en) * 2019-05-16 2019-09-10 华为技术有限公司 A kind of voice awakening method and equipment
CN110782891A (en) * 2019-10-10 2020-02-11 珠海格力电器股份有限公司 Audio processing method and device, computing equipment and storage medium
CN111429902A (en) * 2020-03-17 2020-07-17 北京百度网讯科技有限公司 Method and apparatus for waking up a device

Similar Documents

Publication Publication Date Title
CN112581960A (en) Voice wake-up method and device, electronic equipment and readable storage medium
EP4060658A1 (en) Voice wake-up method, apparatus, and system
CN107509153B (en) Detection method and device of sound playing device, storage medium and terminal
CN108681440A (en) A kind of smart machine method for controlling volume and system
CN110875045A (en) Voice recognition method, intelligent device and intelligent television
CN110870201A (en) Audio signal adjusting method and device, storage medium and terminal
CN110248021A (en) A kind of smart machine method for controlling volume and system
CN113630708B (en) Method and device for detecting abnormal earphone microphone, earphone kit and storage medium
CN111968644B (en) Intelligent device awakening method and device and electronic device
CN108922517A (en) The method, apparatus and storage medium of training blind source separating model
JP2024507916A (en) Audio signal processing method, device, electronic device, and computer program
CN112233676B (en) Intelligent device awakening method and device, electronic device and storage medium
CN115810356A (en) Voice control method, device, storage medium and electronic equipment
JP2022095689A (en) Voice data noise reduction method, device, equipment, storage medium, and program
CN113176870B (en) Volume adjustment method and device, electronic equipment and storage medium
EP3929723A1 (en) Method and apparatus for waking up smart device, smart device and medium
CN112420043A (en) Intelligent awakening method and device based on voice, electronic equipment and storage medium
CN113808566B (en) Vibration noise processing method and device, electronic equipment and storage medium
WO2022052691A1 (en) Multi-device voice processing method, medium, electronic device, and system
CN113824843B (en) Voice call quality detection method, device, equipment and storage medium
CN114255779A (en) Audio noise reduction method for VR device, electronic device and storage medium
CN114333017A (en) Dynamic pickup method and device, electronic equipment and storage medium
CN113889084A (en) Audio recognition method and device, electronic equipment and storage medium
CN111833883A (en) Voice control method and device, electronic equipment and storage medium
CN112885341A (en) Voice wake-up method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211028

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

WW01 Invention patent application withdrawn after publication

Application publication date: 20210330
