CN111028838A - Voice wake-up method, device and computer readable storage medium - Google Patents
- Publication number
- CN111028838A (application number CN201911300422.XA)
- Authority
- CN
- China
- Prior art keywords
- signal
- bone vibration
- detection
- voice
- obtaining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Telephone Function (AREA)
Abstract
The invention discloses a voice wake-up method, a device and a computer-readable storage medium. The method comprises: obtaining a bone vibration signal; performing voice activity detection on the bone vibration signal to obtain a detection result; determining whether the detection result is a voice signal; collecting a sound signal when the detection result is determined to be a voice signal; and executing a specified task corresponding to the sound signal to generate an interactive behavior for a user. With the wake-up method, device and computer-readable storage medium provided by the embodiments of the invention, the device can remain in a low-power state while responding in real time, with latency on the order of 10 ms, which improves wake-up accuracy and avoids false wake-ups.
Description
Technical Field
The present invention relates to the field of voice processing technologies, and in particular, to a voice wake-up method, device, and computer-readable storage medium.
Background
Voice wake-up means that a user wakes up an electronic device by speaking a wake-up word, so that the device enters a state of waiting for a voice instruction or directly executes a predetermined voice instruction. Voice wake-up is commonly used in smart devices to improve the user experience. Existing voice wake-up technology is not sensitive enough in detection and is liable to lose the leading frames of the wake-up word. To analyze the wake-up signal completely, the voice activity detection algorithm and the wake-up algorithm must therefore stay always on while the device is in standby, which increases power consumption.
Disclosure of Invention
Embodiments of the present invention provide a voice wake-up method, device, and computer-readable storage medium that allow a device to wake up quickly even in a low-power state.
One aspect of the present invention provides a voice wake-up method, including: obtaining a bone vibration signal; carrying out voice activity detection on the bone vibration signal to obtain a detection result; determining whether a voice signal is present in the bone vibration signal based on the detection result; collecting a sound signal when the bone vibration signal is determined to have a voice signal; and executing a specified task corresponding to the sound signal to generate an interactive behavior for the user.
In one embodiment, obtaining a bone vibration signal comprises: obtaining a detection signal; determining whether the detection signal satisfies a bone vibration condition; and when the detection signal is determined to meet the bone vibration condition, determining the detection signal as a bone vibration signal.
In an embodiment, the method further comprises: judging whether a designated object is executing a playing task, and obtaining a judgment result; and when the judgment result is that the designated object is executing the playing task, performing echo cancellation processing on the detection signal to obtain a processed signal, wherein the processed signal is used to determine whether the bone vibration condition is satisfied.
In one embodiment, after collecting the sound signal when the bone vibration signal is determined to have a voice signal, the method further comprises: judging whether the sound signal meets a wake-up condition; and when the sound signal is judged to meet the wake-up condition, executing the specified task corresponding to the sound signal.
In one embodiment, the voice activity detection of the bone vibration signal comprises: performing voice activity detection on the bone vibration signal through a model; wherein the model is obtained by training, and the data used for training the model comprises noise and human voice.
Another aspect of the present invention provides a voice wake-up apparatus, including: an obtaining module for obtaining a bone vibration signal; the detection module is used for carrying out voice activity detection on the bone vibration signal to obtain a detection result; a determination module for determining whether a voice signal exists in the bone vibration signal based on the detection result; the acquisition module is used for acquiring a sound signal when the bone vibration signal is determined to have a voice signal; and the execution module is used for executing a specified task corresponding to the sound signal so as to generate an interactive behavior for a user.
In an embodiment, the obtaining module includes: an obtaining submodule for obtaining a detection signal; a first determination submodule for determining whether the detection signal satisfies a bone vibration condition; a second determining submodule, configured to determine that the detection signal is a bone vibration signal when it is determined that the detection signal satisfies a bone vibration condition.
In an embodiment, the apparatus further comprises: a judging module for judging whether a designated object is executing a playing task and obtaining a judgment result; and an echo cancellation module for performing echo cancellation processing on the detection signal to obtain a processed signal when the judgment result indicates that the designated object is executing the playing task, wherein the processed signal is used to determine whether the bone vibration condition is satisfied.
In an embodiment, the judging module is further used for judging whether the sound signal meets a wake-up condition; and the execution module is used for executing the specified task corresponding to the sound signal when the sound signal is judged to meet the wake-up condition.
In one embodiment, the detection module is used for performing voice activity detection on the bone vibration signal through a model, wherein the model is obtained by training, and the data used for training the model comprises noise and human voice.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions which, when executed, perform a wake-up method as defined in any one of the above.
The voice wake-up method, device and computer-readable storage medium provided by the embodiments of the present invention wake up the target device on demand so as to generate interactive behavior for a user. Apart from the module corresponding to the bone vibration signal, the other algorithm modules can remain switched off, so the device stays in a low-power state while still responding in real time, with latency on the order of 10 ms; this improves wake-up accuracy and avoids false wake-ups.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic diagram of an implementation flow of a voice wake-up method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart illustrating an implementation process of obtaining a bone vibration signal in a voice wake-up method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating an implementation flow of echo cancellation processing in a voice wake-up method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram illustrating an implementation flow of determining a wake-up condition in a voice wake-up method according to an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a voice wake-up apparatus according to an embodiment of the present invention;
Fig. 6 is a block diagram of another voice wake-up apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart illustrating an implementation of a voice wake-up method according to an embodiment of the present invention.
Referring to Fig. 1, in an aspect, an embodiment of the present invention provides a voice wake-up method, where the method includes: step 101, obtaining a bone vibration signal; step 102, performing voice activity detection on the bone vibration signal to obtain a detection result; step 103, determining whether a voice signal exists in the bone vibration signal based on the detection result; step 104, collecting a sound signal when the bone vibration signal is determined to have a voice signal; and step 105, executing a specified task corresponding to the sound signal to generate interactive behavior for the user.
The wake-up method provided by this embodiment is used to wake up the target device quickly on demand, so that it executes the specified task corresponding to the user's request and generates interactive behavior for the user. The method is applied to smart devices with data processing capability, such as wearable smart devices, portable smart terminals and fixed terminals; examples include smart glasses, smart earphones, smart gloves, smart watches, smart clothing, smart accessories, notebook computers, mobile phones, smart speakers and desktop computers. Specifically, the method includes obtaining a bone vibration signal, which is used to determine whether the bones involved in the user's voice production are moving; these include, but are not limited to, one or more of the mandible, the hyoid and the larynx. When a bone vibration signal is obtained, the user's bones can be considered to be performing a vocal action, and the user may therefore be speaking. For example, when the user speaks, the mandible opens and closes, and the device can then acquire a bone vibration signal.
The method further includes performing voice activity detection on the bone vibration signal to obtain a detection result. Voice activity detection detects whether a voice signal is present, so the detection result indicates whether a voice signal is present in the bone vibration signal. It can be used to determine whether the bone motion corresponding to the bone vibration signal was caused by the user speaking. That is, when the detection result indicates that a voice signal is present in the bone vibration signal, the bone motion can be attributed to the user speaking, and the user's voice is then collected to obtain the corresponding sound signal. When the detection result indicates that no voice signal is present, the bone motion can be attributed to something other than speech; the bone vibration signal may then be discarded and the next bone vibration signal obtained. To collect the user's voice, a microphone may be built into the device, or a microphone may be communicatively connected to the device so that the device obtains the sound signal through signal transmission with the microphone. The microphone may be an ordinary air-conduction microphone.
The sound signal is analyzed by speech recognition to obtain the target intention it expresses. Based on that intention and on the functions of the device, the device determines the specified task corresponding to the target intention. Specified tasks differ from device to device; common ones include, but are not limited to, human-machine dialogue, dialing a phone call, sending a short message, reading a short message aloud, playing and switching entertainment content, playing and switching Internet messages, map navigation, task switching and smart control. The device executes the specified task as the user requires, thereby generating interactive behavior for the user.
In one implementation scenario, the method is applied to a smart earphone. When the user speaks, the user's oral cavity moves and the smart earphone obtains a bone vibration signal. The smart earphone performs voice activity detection on the bone vibration signal to obtain a detection result and determines from it that a voice signal is present in the bone vibration signal. A microphone of the smart earphone, which may be built into the device, then collects the sound in the environment to obtain a sound signal. Speech recognition analysis of the sound signal shows that the user intention it contains is "play songs", and the smart earphone starts playing songs accordingly. It should be added that when the user intention obtained by speech recognition analysis of the sound signal does not correspond to any specified task, the sound signal is discarded.
When the wake-up method is applied to a device in standby, the device only needs to keep the module corresponding to the bone vibration signal running; the microphone does not have to stay on for sound collection, and all algorithm modules other than the one corresponding to the bone vibration signal can be switched off, so the device remains in a low-power state. The method responds in real time, with latency on the order of 10 ms, whereas existing wake-up approaches have latencies of 24 ms to 300 ms, and it avoids the high power consumption caused by keeping the wake-up algorithm always on. Because latency is low, the sound signal can be captured quickly, which avoids losing its leading frames to excessive delay.
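By way of illustration only, the control flow of steps 101 to 105 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name WakePipeline and its four callables are hypothetical placeholders for the bone vibration module, the voice activity detection model, the microphone and the task executor, none of which is specified at this level in the description. The point of the sketch is that the microphone callable is only invoked once voice has been detected in the bone vibration signal.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Signal = Sequence[float]


@dataclass
class WakePipeline:
    # All four callables are hypothetical placeholders (assumptions),
    # not interfaces defined by the patent.
    get_bone_signal: Callable[[], Optional[Signal]]  # step 101
    detect_voice: Callable[[Signal], bool]           # steps 102 and 103
    record_sound: Callable[[], Signal]               # step 104
    run_task: Callable[[Signal], str]                # step 105

    def step(self) -> Optional[str]:
        """Run one pass of the flow; return the task result or None."""
        bone = self.get_bone_signal()
        if bone is None:
            # No bone vibration signal: stay in the low-power state.
            return None
        if not self.detect_voice(bone):
            # Bone motion without speech: discard and wait for the next signal.
            return None
        # Only now is the microphone used to collect the sound signal.
        sound = self.record_sound()
        return self.run_task(sound)
```

In this arrangement everything after the voice activity check can stay switched off during standby, which mirrors the low-power behavior described above.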
Fig. 2 is a schematic flow chart illustrating an implementation process of obtaining a bone vibration signal in a voice wake-up method according to an embodiment of the present invention.
Referring to fig. 2, in an embodiment of the present invention, step 101, obtaining a bone vibration signal includes: step 1011, obtaining a detection signal; step 1012, determining whether the detection signal meets the bone vibration condition; and step 1013, when the detection signal is determined to meet the bone vibration condition, determining the detection signal as a bone vibration signal.
Specifically, the method includes obtaining a detection signal. The detection signal may be collected by a detection device, which may be a bone vibration sensor, a multi-axis sensor, or another device with a detection function. The detection device may be built into the device, or it may be communicatively connected to the device so that the device obtains the detection signal through signal transmission. Because the bone vibration signal is used to judge whether the user's bones are moving, the detection device can be placed wherever bone vibration during speech can be detected, including but not limited to inside the user's ear canal, behind the user's ear, at the user's chin, or at the user's throat. For example, when the device is a smart in-ear earphone, the bone vibration sensor can be arranged at the front end of the earphone housing so that it extends into the user's ear canal and collects the detection signal produced by jaw vibration near the inner ear. When the device is a smart ear-hook earphone, the bone vibration sensor can be arranged at the ear hook so that it rests behind the user's ear and collects the detection signal produced by jaw vibration behind the ear. The detection device may acquire detection signals continuously or intermittently; continuous acquisition is preferred.
After obtaining the detection signal, the method further includes determining whether the detection signal satisfies a bone vibration condition. The bone vibration condition characterizes the bone vibration produced when a user speaks; whether it is met can be determined by comparing the amplitude of the detection signal with a preset value.
It should be understood that slight bone vibration may occur even when the user is not speaking, for example from mouth breathing, teeth grinding or swallowing. When setting the bone vibration condition, the amplitude of the bone vibration signal during typical speech can be used as the preset value, so that bone vibration that cannot possibly accompany speech is excluded. The specific preset value needs to be set according to statistics and requirements and is not specifically limited here. It should also be understood that some events, such as sneezing and hiccups, produce bone vibration close to that of speech, so the preset value cannot exclude every non-speech case. Whether the detection signal meets the bone vibration condition is judged by checking whether its amplitude is greater than the preset value: when the amplitude is greater than the preset value, the detection signal can be considered to meet the bone vibration condition; when the amplitude is smaller than the preset value, the detection signal is considered not to meet the condition and can be discarded. It should be added that, because bone vibration signals differ from person to person, the preset value is set relatively low in practice so as to relax the bone vibration condition. When the detection signal is determined to meet the bone vibration condition, it can be determined to be a bone vibration signal; that is, the user's bones are moving in a way similar to speech, and subsequent operations such as step 102 can be performed.
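The amplitude comparison of steps 1012 and 1013 can be sketched as follows. This is an illustrative sketch only: the description does not say how the amplitude of a frame is measured, so the use of the peak absolute sample value is an assumption, and the function names and the example preset value are hypothetical.

```python
from typing import Sequence


def peak_amplitude(frame: Sequence[float]) -> float:
    """Largest absolute sample in the frame (assumed amplitude measure)."""
    return max((abs(x) for x in frame), default=0.0)


def meets_bone_vibration_condition(frame: Sequence[float],
                                   preset_value: float) -> bool:
    """Step 1012: the condition holds when the amplitude exceeds the preset value."""
    return peak_amplitude(frame) > preset_value


# Usage: a frame that passes is treated as a bone vibration signal (step 1013);
# a frame that fails is discarded. The preset value 0.25 is a made-up example.
speech_like = [0.01, -0.30, 0.20]
breathing_like = [0.01, -0.02, 0.015]
print(meets_bone_vibration_condition(speech_like, 0.25))
print(meets_bone_vibration_condition(breathing_like, 0.25))
```

Lowering preset_value relaxes the condition, at the cost of passing more non-speech events on to voice activity detection.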
Fig. 3 is a schematic diagram illustrating an implementation flow of echo cancellation processing in a voice wake-up method according to an embodiment of the present invention.
Referring to Fig. 3, in an embodiment of the present invention, the method further comprises: step 301, judging whether the designated object is executing a playing task, and obtaining a judgment result; and step 302, when the judgment result is that the designated object is executing the playing task, performing echo cancellation processing on the detection signal to obtain a processed signal, wherein the processed signal is used to determine whether the bone vibration condition is satisfied.
When the device is in use, and especially when it is executing a playing task, the sound it plays can easily trigger a false wake-up. To avoid this, echo cancellation is also performed on the detection signal to remove the influence of ambient sound on it.
Specifically, the method includes judging whether the designated object is executing a playing task and obtaining a judgment result. The designated object is an object that is liable to cause misjudgment of the detection signal; it may be a smart device or another article, for example an earphone, a mobile phone or another smart terminal. The designated object may be the device to which the method is applied or a different device. When the designated object is not the device applying the method, a query can be sent to the designated object through signal transmission to ask whether it is executing a playing task; this exchange of information makes it possible to judge whether the designated object is executing a playing task. A playing task here means a task that may cause misjudgment of the detection signal, such as a task involving sound playback.
The judgment result indicates whether the designated object is executing a playing task. When it indicates that the designated object is executing a playing task, echo cancellation (AEC) is applied to the detection signal to obtain a processed signal, for example by passing the detection signal through an echo canceller. Echo cancellation removes from the detection signal the components that would otherwise cause bone vibration misjudgment due to the playing task. The processed signal is then used to determine whether the bone vibration condition is satisfied. Note that step 302, performing echo cancellation on the detection signal to obtain the processed signal, must take place after the detection signal is obtained, whereas step 301, judging whether the designated object is executing a playing task, may take place either before or after the detection signal is obtained. When the judgment result indicates that the designated object is not executing a playing task, the detection signal can be used directly to determine whether the bone vibration condition is satisfied.
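The branching of steps 301 and 302 can be sketched as follows. This is a sketch under assumptions: the description does not specify the echo canceller, so it is passed in as a callable, and subtract_reference is a deliberately simplistic stand-in (a fixed-gain subtraction of the playback reference) rather than a real echo canceller, which would normally adapt a filter to the echo path. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Canceller = Callable[[Sequence[float], Sequence[float]], List[float]]


def subtract_reference(detection: Sequence[float],
                       reference: Sequence[float],
                       gain: float = 1.0) -> List[float]:
    """Toy stand-in for an echo canceller (assumption, not a real AEC)."""
    return [d - gain * r for d, r in zip(detection, reference)]


def prepare_detection_signal(detection: Sequence[float],
                             is_playing: bool,
                             playback_reference: Sequence[float],
                             cancel_echo: Canceller) -> List[float]:
    """Return the signal to be checked against the bone vibration condition."""
    if is_playing:
        # Step 302: a playing task is running, so cancel its echo first.
        return cancel_echo(detection, playback_reference)
    # No playing task: the detection signal is used directly.
    return list(detection)
```

Passing the canceller as a parameter keeps the branching logic independent of whichever echo cancellation algorithm a given device actually uses.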
Fig. 4 is a schematic diagram illustrating an implementation flow of determining a wake-up condition in a voice wake-up method according to an embodiment of the present invention.
In an embodiment of the present invention, after step 104, in which a sound signal is collected when a voice signal is determined to be present in the bone vibration signal, the method further includes: step 401, judging whether the sound signal meets a wake-up condition; and step 402, when the sound signal is judged to meet the wake-up condition, executing the specified task corresponding to the sound signal.
Because a user who is speaking does not necessarily intend to communicate with the device, a sound signal collected after a voice signal has been detected does not necessarily express an instruction to the device. The sound signal therefore also undergoes a wake-up judgment to determine whether it meets the wake-up condition. The wake-up judgment includes, but is not limited to, judgment by wake-up word, judgment by voiceprint, judgment by a combination of wake-up word and voiceprint, or other means. With wake-up word judgment, the sound signal is judged to meet the wake-up condition if it contains the wake-up word. With voiceprint judgment, the sound signal is analyzed by a voiceprint algorithm to obtain a voiceprint result, which is compared with a preset voiceprint; when the two are judged to be the same, the sound signal is determined to meet the wake-up condition. The preset voiceprint can be enrolled in advance: the user records speech, the device processes it to obtain the user's voiceprint, and that voiceprint is stored as the preset voiceprint. Because the voiceprint algorithm needs substantial computing resources, the device applying the method may instead be communicatively connected to a third-party device: the user's sound signal is sent to the third-party device for voiceprint analysis, and the third-party device returns the analysis result to the device applying the method, which uses it to judge whether the sound signal meets the wake-up condition.
The communication connection may be a wired connection, a wireless connection such as Bluetooth, or another type of communication connection.
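The three judgment modes of step 401 can be sketched as follows. Several things here are assumptions rather than part of the description: the sound signal is taken to have already been converted to a text transcript and to a voiceprint vector by components not shown, "the same as the preset voiceprint" is approximated by a cosine similarity at or above a hypothetical threshold, and all function and parameter names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def meets_wake_condition(transcript: str,
                         voiceprint: Sequence[float],
                         wake_word: str,
                         preset_voiceprint: Sequence[float],
                         mode: str = "wake_word",
                         min_similarity: float = 0.8) -> bool:
    """Step 401: judge by wake-up word, by voiceprint, or by both."""
    has_word = wake_word.lower() in transcript.lower()
    # Threshold comparison is an assumed way of deciding "same voiceprint".
    same_speaker = cosine_similarity(voiceprint, preset_voiceprint) >= min_similarity
    if mode == "wake_word":
        return has_word
    if mode == "voiceprint":
        return same_speaker
    if mode == "both":
        return has_word and same_speaker
    raise ValueError(f"unknown mode: {mode}")
```

When voiceprint analysis is delegated to a third-party device, only the computation of the voiceprint vector would move off-device; the comparison and the final judgment could stay as sketched.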
In another specific implementation scenario, the method is applied to an intelligent headset, the intelligent headset is in communication connection with a mobile phone, and the audio of the mobile phone is played through the intelligent headset. Firstly, the intelligent earphone judges whether the earphone is executing a playing task, and a judgment result is obtained. The judgment result can also be obtained by sending an inquiry signal to the mobile phone through the intelligent earphone to inquire whether the intelligent earphone carries out a playing task or not, receiving a reply signal from the mobile phone and analyzing the reply signal. The judgment result here is that the smart headset is playing music. The detection signal from the detection device is processed by echo cancellation through an echo canceller to obtain a processed signal, then the amplitude of the processed signal is compared with a preset value, when a user speaks, the oral cavity of the user acts, the amplitude of the detection signal meets the preset value, namely the detection signal meets the bone vibration condition, and the detection signal is determined to be the bone vibration signal. After the bone vibration signal is obtained, the intelligent earphone carries out voice activity detection on the bone vibration signal to obtain a detection result, the bone vibration signal is determined to be a voice signal according to the detection result, and a microphone of the intelligent earphone collects sound in the environment to obtain a sound signal. The voice recognition analysis is carried out on the sound signals, the awakening word 'XX' which is the same as the preset awakening word is obtained, the earphone is awakened, the sound signals are analyzed, the user intention contained in the sound signals is obtained and is 'playing songs', and the intelligent earphone starts playing the songs according to the user intention. 
It should be added that when the user intention obtained by performing voice recognition analysis on the sound signal does not correspond to any specified task, the sound signal is discarded.
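The wake-up and dispatch step of this scenario can be sketched as follows. This is only an illustrative sketch, not the patented implementation: it assumes speech recognition has already produced a text transcript, that the wake-up word appears at the start of the transcript, and that the user intention is simply the remaining text. The names `handle_transcript`, `WAKE_WORD`, and the `tasks` table are hypothetical.

```python
from typing import Callable, Dict, Optional

WAKE_WORD = "XX"  # placeholder wake-up word, as in the scenario above


def handle_transcript(transcript: str,
                      tasks: Dict[str, Callable[[], str]]) -> Optional[str]:
    """Return the task result, or None when the sound signal is discarded."""
    # Assumed wake-up condition: the transcript starts with the preset wake-up word.
    if not transcript.startswith(WAKE_WORD):
        return None
    # Hypothetical intent extraction: the text after the wake-up word, normalised.
    intent = transcript[len(WAKE_WORD):].strip(" ,.").lower()
    task = tasks.get(intent)
    if task is None:
        return None  # the intention matches no specified task: discard the signal
    return task()


tasks = {"play songs": lambda: "playing"}
print(handle_transcript("XX, play songs", tasks))   # runs the matching task
print(handle_transcript("XX, order pizza", tasks))  # no matching task: discarded
```

Returning `None` for both the missing wake-up word and the unmatched intention mirrors the text, where in either case no specified task is executed.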
In the embodiment of the present invention, step 102, performing voice activity detection on the bone vibration signal, includes performing voice activity detection on the bone vibration signal through a model, where the model is obtained by training and the data used for training the model includes noise and human voice.
In the method, to improve the robustness of the system, voice activity detection is performed on the bone vibration signal through the model. Besides human voice data, the training data of the model can be recorded and collected according to the usage scenarios of the product. For example, in scenarios such as stairs, riding, restaurants, crossroads, markets, vehicles, subways, bars and offices, the noise of each scenario and the human voice in it are collected, and the noise data collected in these common scenarios are added to training to improve robustness in each scenario. The model here is a voice activity detection model, such as a DNN algorithm model.
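One common way to add recorded scene noise to training data is to mix it into a voice recording at a chosen signal-to-noise ratio. The source only states that noise and human voice are used for training; mixing at a target SNR is an assumption made here for illustration, and `mix_at_snr` is a hypothetical helper.

```python
import math
from typing import List


def mix_at_snr(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise so that it covers the whole speech clip.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in tiled) / len(tiled)
    if p_noise == 0.0:
        return list(speech)  # silent noise: nothing to mix in
    # Choose gain g so that p_speech / (g^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]
```

Generating several mixtures per recording, at different SNR values and with noise from different scenes, is one way to expose the model to the conditions listed above.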
Fig. 5 is a block diagram of a voice wake-up device according to an embodiment of the present invention.
Referring to fig. 5, another aspect of the present invention provides a wake-up apparatus, where the apparatus includes: an obtaining module 501, configured to obtain a bone vibration signal; a detection module 502, configured to perform voice activity detection on the bone vibration signal to obtain a detection result; a determining module 503, configured to determine whether a voice signal exists in the bone vibration signal based on the detection result; an acquisition module 504, configured to acquire a sound signal when it is determined that a voice signal exists in the bone vibration signal; and the execution module 505 is used for executing a specified task corresponding to the sound signal so as to generate an interactive behavior for the user.
In this embodiment of the present invention, the obtaining module 501 includes: an obtaining submodule 5011, configured to obtain a detection signal; a first determination submodule 5012, configured to determine whether the detection signal satisfies a bone vibration condition; and a second determination submodule 5013, configured to determine the detection signal as a bone vibration signal when it is determined that the detection signal satisfies the bone vibration condition.
In an embodiment of the present invention, the apparatus further includes: a judging module 506, configured to judge whether the designated object is executing the play task, and obtain a judgment result; the echo cancellation module 507 is configured to perform echo cancellation processing on the detection signal to obtain a processed signal when the determination result is that the designated object is executing the play task; the processed signal is used to determine whether a bone vibration condition is satisfied.
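The source does not say which echo cancellation algorithm is used. As an illustration only, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a widely used technique, and assumes that the audio being played is available as a reference signal. The function name and all parameters (`taps`, `mu`, `eps`) are hypothetical.

```python
from typing import List


def nlms_echo_cancel(detected: List[float], playback: List[float],
                     taps: int = 4, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of `playback` from `detected`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent playback samples, newest first
    out = []
    for d, x in zip(detected, playback):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # residual after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the reference, normalised by its energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out
```

The returned residual plays the role of the processed signal, which is then checked against the bone vibration condition.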
In an embodiment of the present invention, the judging module 506 is further configured to judge whether the sound signal satisfies a wake-up condition; and the execution module 505 is configured to execute a specified task corresponding to the sound signal when it is judged that the sound signal satisfies the wake-up condition.
In this embodiment of the present invention, the detection module 502 is configured to perform voice activity detection on the bone vibration signal through a model, where the model is obtained by training and the data used for training the model includes noise and human voice.
Fig. 6 is a block diagram of another voice wake-up apparatus according to an embodiment of the present invention.
Referring to fig. 6, an embodiment of the present invention provides another wake-up device. When it is determined that the player of the device is not playing audio, the detection signal collected by the bone vibration sensor undergoes bone vibration condition judgment and voice activity detection, while the other modules remain off, that is, in a low power consumption state. When the detection signal meets the bone vibration condition and voice activity detection judges that a voice signal exists, the air conduction microphone collects a sound signal, which is passed through a beamforming algorithm to output 16 kHz, 16-bit mono audio. The audio is input into a wake-up module, which detects whether a wake-up signal is present; if so, the wake-up signal is output.
When it is determined that the player of the device is playing audio, the detection signal collected by the bone vibration sensor first undergoes echo cancellation (AEC) and then bone vibration condition judgment and voice activity detection, while the other modules remain off, that is, in a low power consumption state. When the processed signal obtained by echo cancellation meets the bone vibration condition and voice activity detection judges that a voice signal exists, the air conduction microphone collects a sound signal, which is passed through a beamforming algorithm to output 16 kHz, 16-bit mono audio. The audio is input into a wake-up module, which detects whether a wake-up signal is present; if so, the wake-up signal is output.
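The gating logic of the two cases above can be sketched as one function that decides whether the air conduction microphone stage should be switched on. This is a minimal sketch under stated assumptions: the bone vibration condition is taken to mean that the peak amplitude reaches a preset value, and the echo canceller and voice activity detector are passed in as callables. The name `should_enable_microphone` and its parameters are hypothetical.

```python
from typing import Callable, Sequence


def should_enable_microphone(
    detection: Sequence[float],
    is_playing: bool,
    preset_amplitude: float,
    vad: Callable[[Sequence[float]], bool],
    echo_cancel: Callable[[Sequence[float]], Sequence[float]] = lambda s: s,
) -> bool:
    """Decide whether the air conduction microphone stage should be powered on."""
    # Echo cancellation only runs while the player is playing audio.
    signal = echo_cancel(detection) if is_playing else detection
    # Assumed bone vibration condition: peak amplitude reaches the preset value.
    if len(signal) == 0 or max(abs(s) for s in signal) < preset_amplitude:
        return False
    # Voice activity detection on the bone vibration signal.
    return vad(signal)
```

Because the function returns early when the bone vibration condition fails, the voice activity detector and every later stage stay idle in that case, in line with the low power consumption state described above.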
In another aspect, embodiments of the present invention provide a computer-readable storage medium, which includes a set of computer-executable instructions, and when executed, is configured to perform any one of the above-mentioned wake-up methods.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A voice wake-up method, the method comprising:
obtaining a bone vibration signal;
carrying out voice activity detection on the bone vibration signal to obtain a detection result;
determining whether a voice signal is present in the bone vibration signal based on the detection result;
collecting a sound signal when the bone vibration signal is determined to have a voice signal;
and executing a specified task corresponding to the sound signal to generate an interactive behavior for the user.
2. The method of claim 1, wherein obtaining a bone vibration signal comprises:
obtaining a detection signal;
determining whether the detection signal satisfies a bone vibration condition;
and when the detection signal is determined to meet the bone vibration condition, determining the detection signal as a bone vibration signal.
3. The method of claim 2, further comprising:
judging whether the designated object is executing the playing task or not, and obtaining a judgment result;
when the judgment result is that the designated object is executing the playing task, performing echo cancellation processing on the detection signal to obtain a processed signal;
the processed signal is used to determine whether a bone vibration condition is satisfied.
4. The method of claim 1, wherein after acquiring the sound signal when it is determined that the speech signal is present in the bone vibration signal, the method further comprises:
judging whether the sound signal meets a wake-up condition or not;
and when the sound signal is judged to meet the wake-up condition, executing a specified task corresponding to the sound signal.
5. The method of claim 1, wherein performing voice activity detection on the bone vibration signal comprises:
performing voice activity detection on the bone vibration signal through a model; wherein the model is obtained by training, and the data used for training the model comprises noise and human voice.
6. A voice wake-up device, characterized in that the device comprises:
an obtaining module for obtaining a bone vibration signal;
the detection module is used for carrying out voice activity detection on the bone vibration signal to obtain a detection result;
a determination module for determining whether a voice signal exists in the bone vibration signal based on the detection result;
the acquisition module is used for acquiring a sound signal when the bone vibration signal is determined to have a voice signal;
and the execution module is used for executing a specified task corresponding to the sound signal so as to generate an interactive behavior for a user.
7. The apparatus of claim 6, wherein the obtaining module comprises:
an obtaining submodule for obtaining a detection signal;
a first determination submodule for determining whether the detection signal satisfies a bone vibration condition;
a second determining submodule, configured to determine that the detection signal is a bone vibration signal when it is determined that the detection signal satisfies a bone vibration condition.
8. The apparatus of claim 7, further comprising:
the judging module is used for judging whether the designated object is executing the playing task and obtaining a judgment result;
the echo cancellation module is used for performing echo cancellation processing on the detection signal to obtain a processed signal when the judgment result indicates that the designated object is executing the playing task; the processed signal is used to determine whether a bone vibration condition is satisfied.
9. The apparatus of claim 6, further comprising:
the judging module is also used for judging whether the sound signal meets the wake-up condition;
and the execution module is used for executing a specified task corresponding to the sound signal when the sound signal is judged to meet the wake-up condition.
10. A computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform the voice wake-up method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911300422.XA CN111028838A (en) | 2019-12-17 | 2019-12-17 | Voice wake-up method, device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911300422.XA CN111028838A (en) | 2019-12-17 | 2019-12-17 | Voice wake-up method, device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111028838A (en) | 2020-04-17 |
Family
ID=70209932
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911300422.XA Pending CN111028838A (en) | 2019-12-17 | 2019-12-17 | Voice wake-up method, device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111028838A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112951243A (en) * | 2021-02-07 | 2021-06-11 | 深圳市汇顶科技股份有限公司 | Voice awakening method, device, chip, electronic equipment and storage medium |
CN113593561A (en) * | 2021-07-29 | 2021-11-02 | 普强时代(珠海横琴)信息技术有限公司 | Ultra-low power consumption awakening method and device based on multi-stage trigger mechanism |
CN114143651A (en) * | 2021-11-26 | 2022-03-04 | 思必驰科技股份有限公司 | Voice wake-up method and device for bone conduction headset |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108322859A (en) * | 2018-02-05 | 2018-07-24 | 北京百度网讯科技有限公司 | Equipment, method and computer readable storage medium for echo cancellor |
CN108538305A (en) * | 2018-04-20 | 2018-09-14 | 百度在线网络技术(北京)有限公司 | Audio recognition method, device, equipment and computer readable storage medium |
CN108665895A (en) * | 2018-05-03 | 2018-10-16 | 百度在线网络技术(北京)有限公司 | Methods, devices and systems for handling information |
CN109195042A (en) * | 2018-07-16 | 2019-01-11 | 恒玄科技(上海)有限公司 | The high-efficient noise-reducing earphone and noise reduction system of low-power consumption |
CN109346075A (en) * | 2018-10-15 | 2019-02-15 | 华为技术有限公司 | Identify user speech with the method and system of controlling electronic devices by human body vibration |
CN109412544A (en) * | 2018-12-20 | 2019-03-01 | 歌尔科技有限公司 | A kind of voice acquisition method of intelligent wearable device, device and associated component |
- 2019-12-17 CN CN201911300422.XA patent/CN111028838A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3547712B1 (en) | Method for processing signals, terminal device, and non-transitory readable storage medium | |
WO2020228095A1 (en) | Real-time voice wake-up audio device, operation method and apparatus, and storage medium | |
CN106782591B (en) | Device and method for improving speech recognition rate under background noise | |
CN107799126A (en) | Sound end detecting method and device based on Supervised machine learning | |
CN109256146B (en) | Audio detection method, device and storage medium | |
CN111028838A (en) | Voice wake-up method, device and computer readable storage medium | |
WO2018095035A1 (en) | Earphone and speech recognition method therefor | |
WO2011100890A1 (en) | Reminding method of environmental sound and mobile terminal thereof | |
WO2020207376A1 (en) | Denoising method and electronic device | |
US11348584B2 (en) | Method for voice recognition via earphone and earphone | |
US20220303688A1 (en) | Activity Detection On Devices With Multi-Modal Sensing | |
US11533574B2 (en) | Wear detection | |
CN107863110A (en) | Safety prompt function method, intelligent earphone and storage medium based on intelligent earphone | |
CN114360527A (en) | Vehicle-mounted voice interaction method, device, equipment and storage medium | |
CN104092809A (en) | Communication sound recording method and recorded communication sound playing method and device | |
CN111491236A (en) | Active noise reduction earphone, awakening method and device thereof and readable storage medium | |
CN113194383A (en) | Sound playing method and device, electronic equipment and readable storage medium | |
CN112261229A (en) | Bone conduction call equipment testing method, device and system | |
CN110992953A (en) | Voice data processing method, device, system and storage medium | |
WO2015131634A1 (en) | Audio noise reduction method and terminal | |
KR20220015427A (en) | detection of voice | |
WO2020118560A1 (en) | Recording method and apparatus, electronic device and computer readable storage medium | |
CN113259826B (en) | Method and device for realizing hearing aid in electronic terminal | |
CN113808566B (en) | Vibration noise processing method and device, electronic equipment and storage medium | |
CN113517000A (en) | Echo cancellation test method, terminal and storage device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | | Address after: Building 14, Tengfei science and Technology Park, 388 Xinping street, Suzhou Industrial Park, Suzhou area, China (Jiangsu) pilot Free Trade Zone, Suzhou, Jiangsu 215000; Applicant after: Sipic Technology Co.,Ltd.; Address before: 215024 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Jiangsu Province; Applicant before: AI SPEECH Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200417 |