WO2020244257A1 - Method and system for voice wake-up, electronic device, and computer-readable storage medium - Google Patents
- Publication number: WO2020244257A1
- Application number: PCT/CN2020/076473
- Authority: WIPO (PCT)
Classifications
- G: PHYSICS
- G10: MUSICAL INSTRUMENTS; ACOUSTICS
- G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00: Speaker identification or verification techniques
- G10L17/06: Decision making techniques; Pattern matching strategies
- G10L17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
- G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78: Detection of presence or absence of voice signals
- G10L2025/783: Detection of presence or absence of voice signals based on threshold decision
Definitions
- This application belongs to the technical field of voice processing, and in particular relates to a voice wake-up method, a voice wake-up system, an electronic device, and a computer-readable storage medium.
- The traditional voice wake-up solution uses voice activity detection (VAD) to obtain the audio signal collected by the microphone and calculates the voice energy from that audio signal. When the voice energy is high enough, the processor is triggered to start keyword recognition to determine whether the audio signal is a voice instruction issued by the user.
- This solution does not consider whether the audio signal collected by the microphone is caused by the wearer's speech, which leads to false wake-ups of the device: when people nearby unintentionally say a keyword, the device is woken up as well. In addition, in a noisy environment the VAD triggers continually and causes the digital signal processor to perform keyword recognition, which results in a large power loss.
- Existing voice wake-up solutions therefore generally first determine whether the audio signal collected by the microphone is caused by the wearer's speech before triggering the processor to perform keyword recognition.
- A special bone conduction microphone or other contact microphone is generally used to extract audio signals and determine whether they are caused by the wearer's speech. Because bone conduction microphones and other contact microphones are expensive, the overall equipment cost is high.
- The prior art also uses software algorithms to determine whether the microphone audio signal is caused by the wearer's speech, but the determination algorithm is generally complicated, so the determination itself consumes considerable resources.
- In view of this, the embodiments of the present invention provide a voice wake-up method, system, electronic device, and computer-readable storage medium to solve the problems of the above voice wake-up solutions: false wake-up of the device, large power loss, high equipment cost, and a complicated judgment algorithm that makes the judgment itself consume considerable resources.
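- The first aspect of the embodiments of the present invention provides a voice wake-up method, which is applied to an electronic device.
- The electronic device includes a processor, a gravity sensor, and a microphone.
- The gravity sensor and the microphone are each electrically connected to the processor.
- The voice wake-up method includes using the processor to execute the following steps:
- if the voice activity detection process detects that the voice signal collected by the microphone meets a first preset condition, triggering the processor to read the data signal collected by the gravity sensor;
- determining, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device;
- if the voice signal is not generated by the user's speech, ignoring the voice signal; or, if the voice signal is generated by the user's speech, starting a keyword recognition process and, if it is recognized that the voice signal contains a preset voice command keyword, controlling the electronic device to perform a corresponding function.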
- The second aspect of the embodiments of the present invention provides a voice wake-up system, which is applied to an electronic device. The electronic device includes a processor, a gravity sensor, and a microphone, and the gravity sensor and the microphone are each electrically connected to the processor.
- The voice wake-up system includes:
- a voice activity detection unit, configured to trigger the processor to read the data signal collected by the gravity sensor if the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition;
- a first determining unit, configured to determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device;
- an execution unit, configured to ignore the voice signal if the voice signal is not generated by the user's speech; or, if the voice signal is generated by the user's speech, to start a keyword recognition process and, if it is recognized that the voice signal contains a preset voice command keyword, to control the electronic device to perform a corresponding function.
- The third aspect of the embodiments of the present invention provides an electronic device, including a gravity sensor, a microphone, a memory, a processor, and a computer program that is stored in the memory and can run on the processor.
- The gravity sensor, the microphone, and the memory are all electrically connected to the processor, and the processor implements the steps of the voice wake-up method according to any one of the embodiments of the first aspect when it executes the computer program.
- The fourth aspect of the embodiments of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it implements the steps of the voice wake-up method according to any one of the embodiments of the first aspect.
- In the embodiments of the present invention, when the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition, it is further determined whether the voice signal is generated by the speech of the user wearing the electronic device. The keyword recognition process is started only when the voice signal is recognized as generated by the user's speech, which saves the energy consumption of the electronic device and avoids false wake-up of the electronic device. In addition, because the data signal collected by the gravity sensor of the electronic device is used to determine whether the voice signal is generated by the user wearing the electronic device, no special bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, with high accuracy and low resource consumption.
- FIG. 1 is a structural block diagram of an electronic device provided by an embodiment of the present invention.
- FIG. 2 is a schematic diagram of a specific implementation flow of the voice wake-up method provided in Embodiment 1 of the present invention.
- FIG. 3 is a schematic diagram of a specific implementation flow of the voice wake-up method provided in Embodiment 2 of the present invention.
- FIG. 4 is a schematic structural diagram of a voice wake-up system provided by Embodiment 3 of the present invention.
- FIG. 5 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present invention.
- FIG. 1 is a structural block diagram of an electronic device provided by an embodiment of the present invention. For convenience of description, only the parts related to this embodiment are shown.
- As shown in FIG. 1, an electronic device 100 provided by an embodiment of the present invention includes a processor 103, a gravity sensor 101, and a microphone 102, and the gravity sensor 101 and the microphone 102 are each electrically connected to the processor 103.
- The electronic device 100 includes, but is not limited to, smart wearable devices such as earphones.
- The microphone 102 is an ordinary, low-cost microphone built into the electronic device 100.
- The gravity sensor 101 is the sensor that the electronic device 100 uses to determine the wearing state and to implement the double-click function.
- FIG. 2 is a schematic diagram of a specific implementation flow of the voice wake-up method provided in the first embodiment of the present invention. The method is applied to the electronic device 100 shown in FIG. 1, and its execution body is the processor 103 in the electronic device 100 shown in FIG. 1. As shown in FIG. 2, the voice wake-up method provided in this embodiment may include the following steps:
- Step S201: If the voice activity detection process detects that the voice signal collected by the microphone 102 meets the first preset condition, the processor 103 is triggered to read the data signal collected by the gravity sensor 101.
- The first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold.
- The voice activity detection process and the microphone 102 remain on while the electronic device 100 is in the standby state.
- After the microphone 102 collects a voice signal, the voice signal is passed to the voice activity detection process, which detects the voice energy of the voice signal and, when the voice energy is greater than the second preset energy threshold, triggers the processor 103 to read the data signal collected by the gravity sensor 101.
- Because the processor 103 is triggered to read the data signal collected by the gravity sensor 101 only when the voice energy of the detected voice signal is greater than the second preset energy threshold, the processor 103 is not repeatedly triggered in a noisy environment to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, which further saves the power consumption of the terminal.
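- A minimal sketch of the energy check in step S201 follows. The source only states that the voice energy is compared with a second preset energy threshold; the mean-square energy measure, the function names, and the threshold value used here are illustrative assumptions, not details from the application.

```python
from typing import Sequence

# Hypothetical value: the source only says this threshold is preset.
SECOND_PRESET_ENERGY_THRESHOLD = 0.01


def frame_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def meets_first_preset_condition(
    samples: Sequence[float],
    threshold: float = SECOND_PRESET_ENERGY_THRESHOLD,
) -> bool:
    """First preset condition: voice energy strictly above the threshold."""
    return frame_energy(samples) > threshold
```

Only frames for which `meets_first_preset_condition` returns `True` would cause the processor to read the gravity sensor; quieter frames are dropped at this stage, which is where the power saving described above comes from.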
- Step S202: Determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device 100; if the voice signal is not generated by the user's speech, go to step S203; if the voice signal is generated by the user's speech, go to step S204.
- The electronic device 100 is worn on the head of the user.
- The vibration frequency and amplitude of the maxilla and mandible of the head differ depending on whether the user is speaking, so the data signals collected by the gravity sensor 101 differ as well. Therefore, by analyzing the data signal collected by the gravity sensor 101, it can be determined whether the voice signal is generated by the speech of the user wearing the electronic device 100.
- In one specific implementation, determining according to the data signal whether the voice signal is generated by the speech of the user wearing the electronic device 100 includes:
- calculating the signal energy of the data signal;
- if the signal energy is greater than the first preset energy threshold, the voice signal is generated by the speech of the user wearing the electronic device 100;
- if the signal energy is less than or equal to the first preset energy threshold, the voice signal is not generated by the speech of the user wearing the electronic device 100.
- In another specific implementation, determining according to the data signal whether the voice signal is generated by the speech of the user wearing the electronic device 100 includes:
- calculating the signal energy of the data signal within a preset frequency range;
- if the signal energy is greater than the first preset energy threshold, the voice signal is generated by the speech of the user wearing the electronic device 100;
- if the signal energy is less than or equal to the first preset energy threshold, the voice signal is not generated by the speech of the user wearing the electronic device 100.
- The first preset energy threshold in the above two specific implementations is obtained in advance through a large amount of training and learning, and is the energy threshold used to distinguish whether the voice signal is generated by the speech of the user wearing the electronic device.
- When the user wearing the electronic device speaks, the vibration frequency and amplitude of the upper and lower jaw bones of the user's head are relatively large, so the signal energy of the data signal collected by the gravity sensor 101 is relatively large. A signal energy greater than the first preset energy threshold therefore indicates that the voice signal is generated by the speech of the user wearing the electronic device. Conversely, a signal energy less than or equal to the first preset energy threshold means that the vibration frequency and amplitude of the upper and lower jaw bones are relatively small, which indicates that the voice signal is not generated by the speech of the user wearing the electronic device.
- The preset frequency range is the frequency range of the voice signal when a person speaks; specifically, it is 300 to 3000 Hz. Because the frequency range of human speech differs from that of environmental noise, this embodiment counts only the signal energy within the preset frequency range, that is, within the frequency band of human speech. This filters out the influence of other noise energy on the judgment result and makes the judgment result more accurate.
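- The band-limited energy comparison can be sketched as follows. This is an illustration under stated assumptions, not the application's algorithm: the source does not say how the energy within 300 to 3000 Hz is computed, so a plain discrete Fourier transform is used here, and the function names, the normalization, and the threshold value are hypothetical. The sketch also assumes the gravity sensor is sampled at 6 kHz or more, since a 3000 Hz component cannot be represented below that rate.

```python
import cmath
import math
from typing import Sequence


def band_energy(
    samples: Sequence[float],
    sample_rate_hz: float,
    low_hz: float = 300.0,
    high_hz: float = 3000.0,
) -> float:
    """Sum of |X_k|^2 / n^2 over the DFT bins whose frequency lies in the band."""
    n = len(samples)
    if n == 0:
        return 0.0
    energy = 0.0
    for k in range(n // 2 + 1):  # one-sided spectrum of a real signal
        freq_hz = k * sample_rate_hz / n
        if low_hz <= freq_hz <= high_hz:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)
            )
            energy += abs(coeff) ** 2 / (n * n)
    return energy


def wearer_is_speaking(
    samples: Sequence[float],
    sample_rate_hz: float,
    first_preset_energy_threshold: float,  # hypothetical; obtained by training per the source
) -> bool:
    """True when the in-band energy is strictly above the first threshold."""
    return band_energy(samples, sample_rate_hz) > first_preset_energy_threshold
```

The naive transform costs O(n²) per frame; on a real device an FFT or a band-pass filter followed by a time-domain energy sum would be the more likely choice, and the threshold would have to be calibrated for whichever normalization is used.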
- Step S203: Ignore the voice signal.
- If the voice signal is not generated by the speech of the user wearing the electronic device 100, the voice signal was produced by ambient noise or by other people's speech rather than by the user wearing the electronic device 100. The voice signal is therefore ignored and no keyword recognition is performed on it, which saves the power consumption of the electronic device 100.
- Step S204: Start a keyword recognition process, and if it is recognized that the voice signal contains a preset voice command keyword, control the electronic device 100 to perform a corresponding function.
- If the voice signal is generated by the speech of the user wearing the electronic device 100, the voice signal may be a voice control instruction input by the user. The keyword recognition process is therefore started to identify whether the voice signal contains a preset voice command keyword. If it does, the electronic device 100 is controlled to perform the corresponding voice control function. If it does not, the voice signal was generated by the user's speech but is not a voice control command, so the voice signal is ignored and the electronic device 100 is not woken up.
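- Steps S201 to S204 can be summarized in a short control-flow sketch. The callables passed in (`vad_passes`, `read_gravity_sensor`, `spoken_by_wearer`, `recognize_keyword`, `execute`) and the `Outcome` enum are hypothetical stand-ins introduced for illustration; the source describes the order of the checks but no programming interface.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Outcome(Enum):
    NOT_TRIGGERED = "not_triggered"  # first preset condition not met
    IGNORED = "ignored"              # step S203, or no keyword found
    EXECUTED = "executed"            # step S204 ran a command


def wake_up_flow(
    voice: str,
    vad_passes: Callable[[str], bool],
    read_gravity_sensor: Callable[[], Sequence[float]],
    spoken_by_wearer: Callable[[Sequence[float]], bool],
    recognize_keyword: Callable[[str], Optional[str]],
    execute: Callable[[str], None],
) -> Outcome:
    # S201: only a frame that passes the VAD triggers a sensor read.
    if not vad_passes(voice):
        return Outcome.NOT_TRIGGERED
    data = read_gravity_sensor()
    # S202/S203: drop the signal before the costly keyword recognition.
    if not spoken_by_wearer(data):
        return Outcome.IGNORED
    # S204: keyword recognition runs only for the wearer's own speech.
    keyword = recognize_keyword(voice)
    if keyword is None:
        return Outcome.IGNORED
    execute(keyword)
    return Outcome.EXECUTED
```

The ordering is the point of the design: the cheap sensor check sits in front of keyword recognition, so recognition is never invoked for speech that did not come from the wearer.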
- In the voice wake-up method provided in this embodiment, when the voice activity detection process detects that the voice signal collected by the microphone 102 meets the first preset condition, it is further determined whether the voice signal is generated by the speech of the user wearing the electronic device 100. The keyword recognition process is started only when it is, which saves the energy consumption of the electronic device 100 and prevents the electronic device 100 from being woken up by mistake. In addition, because the data signal collected by the gravity sensor 101 of the electronic device 100 is used to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, no special bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, with high accuracy and low resource consumption.
- FIG. 3 is a schematic diagram of a specific implementation flow of the voice wake-up method provided in the second embodiment of the present invention. The method is applied to the electronic device 100 shown in FIG. 1, and its execution body is the processor 103 in the electronic device 100 shown in FIG. 1.
- the voice wake-up method provided in this embodiment may include the following steps:
- Step S301: Determine whether the voice activity detection process detects that the voice signal collected by the microphone 102 meets the first preset condition; if it does, enter step S302-1 and step S302-2 at the same time.
- The specific implementation of this step is the same as in Embodiment 1 and is not repeated here.
- Step S302-1: Determine whether to start the keyword recognition process in advance according to the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102; if the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102 meet the second preset condition, enter step S303-1.
- Step S303-1: Start the keyword recognition process and recognize whether the voice signal contains a preset voice command keyword.
- The first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold.
- The voice activity detection process and the microphone 102 remain on while the electronic device 100 is in the standby state.
- After the microphone 102 collects a voice signal, the voice signal is passed to the voice activity detection process, which detects the voice energy of the voice signal and, when the voice energy is greater than the second preset energy threshold, triggers the processor 103 to read the data signal collected by the gravity sensor 101.
- The second preset energy threshold is an energy threshold set in advance to prevent the processor from being repeatedly triggered to determine whether the voice signal is generated by the speech of the user wearing the electronic device. Because the processor 103 is triggered to read the data signal collected by the gravity sensor 101 only when the voice energy of the voice signal is greater than the second preset energy threshold, the processor 103 is not repeatedly triggered in a noisy environment to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, which further saves the power consumption of the terminal.
- the second preset condition is that the degree of word loss allowed by the keyword recognition process is less than a preset threshold of word loss and the activation speed of the microphone 102 is less than the preset activation speed threshold.
- In this embodiment, when the degree of word loss allowed by the keyword recognition process is less than the preset word-loss threshold and the activation speed of the microphone 102 is less than the preset activation speed threshold, the process proceeds to step S303-1 and the keyword recognition process is started in advance. This avoids the case in which, because the microphone 102 starts slowly and the keyword recognition process is not started in advance, too many words are lost, that is, the voice signal issued by the user cannot be collected in time and the complete voice control command cannot be recognized.
- If the second preset condition is not met, the keyword recognition process is not started in advance.
- The wake-up process in this case is the same as the voice wake-up process provided in the first embodiment, so it is not repeated here.
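- The second preset condition reduces to a two-part comparison, sketched below. The function name, parameter names, and units are assumptions made for illustration; the source specifies only that both quantities must be below their respective preset thresholds.

```python
def should_start_keyword_recognition_early(
    allowed_word_loss: float,
    mic_activation_speed: float,
    word_loss_threshold: float,         # hypothetical preset value
    activation_speed_threshold: float,  # hypothetical preset value
) -> bool:
    """Second preset condition: both quantities strictly below their thresholds."""
    return (
        allowed_word_loss < word_loss_threshold
        and mic_activation_speed < activation_speed_threshold
    )
```

When this returns `False`, the flow falls back to the sequential order of the first embodiment, where keyword recognition waits for the wearer check.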
- Step S302-2: Trigger the processor 103 to read the data signal collected by the gravity sensor 101.
- Step S303-2: Determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device 100. It should be noted that step S302-2 and step S303-2 are implemented in the same way as the corresponding steps in Embodiment 1, so they are not repeated here.
- Step S304: If the voice signal contains a preset voice command keyword and the voice signal is generated by the user's speech, the electronic device 100 is controlled to perform a corresponding function.
- Step S305: If the voice signal does not contain a preset voice command keyword, or the voice signal is not generated by the user's speech, the voice signal is ignored.
- In this embodiment, the electronic device 100 is controlled to perform the corresponding voice control function only when the voice signal contains a preset voice command keyword and is generated by the speech of the user wearing the electronic device 100.
- If the voice signal fails either of these two conditions, it is ignored, so that false wake-up of the electronic device 100 can be avoided.
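- When keyword recognition is started in advance, the two checks run side by side and their results are combined in steps S304 and S305. The sketch below illustrates that combination with a thread pool; the use of threads, the function name `parallel_wake_up`, and the two callables are assumptions, since the source does not say how the parallel execution is realized on the processor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence


def parallel_wake_up(
    voice: str,
    gravity_data: Sequence[float],
    recognize_keyword: Callable[[str], Optional[str]],
    spoken_by_wearer: Callable[[Sequence[float]], bool],
) -> Optional[str]:
    """Return the keyword to act on (S304), or None to ignore the signal (S305)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # S303-1 and S303-2 are submitted together and run concurrently.
        keyword_future = pool.submit(recognize_keyword, voice)
        wearer_future = pool.submit(spoken_by_wearer, gravity_data)
        keyword = keyword_future.result()
        is_wearer = wearer_future.result()
    # Both conditions must hold; failing either one means the signal is ignored.
    if keyword is not None and is_wearer:
        return keyword
    return None
```

Compared with the sequential flow, this trades some energy (recognition may run for a bystander's speech) for lower latency, which is why the source starts recognition early only when the second preset condition holds.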
- The voice wake-up method provided in this embodiment can also avoid false wake-up of the electronic device 100. Because the data signal collected by the gravity sensor 101 of the electronic device 100 is used to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, no special bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, with high accuracy and low resource consumption. In addition, compared with the previous embodiment, the voice wake-up method provided in this embodiment starts the keyword recognition process in advance when the degree of word loss allowed by the keyword recognition process is less than the preset word-loss threshold and the activation speed of the microphone 102 is less than the preset activation speed threshold. This avoids the case in which, because the microphone 102 starts too slowly and the keyword recognition process is not started in advance, too many words are lost and the complete voice control command cannot be recognized.
- FIG. 4 is a schematic structural diagram of a voice wake-up system provided by Embodiment 3 of the present invention.
- the system is applied to the electronic device 100 described in FIG. 1 and runs in the processor 103 of the electronic device 100 described in FIG. 1. For convenience of description, only the parts related to this embodiment are shown.
- the voice wake-up system 4 provided in this embodiment includes:
- The voice activity detection unit 41 is configured to trigger the processor 103 to read the data signal collected by the gravity sensor 101 if the voice activity detection process detects that the voice signal collected by the microphone 102 meets the first preset condition, where the first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold;
- The first determining unit 42 is configured to determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device 100;
- The execution unit 43 is configured to ignore the voice signal if the voice signal is not generated by the user's speech; or, if the voice signal is generated by the user's speech, to start the keyword recognition process and, if it is recognized that the voice signal contains a preset voice command keyword, to control the electronic device 100 to perform a corresponding function.
- In one specific implementation, the first determining unit 42 is specifically configured to:
- calculate the signal energy of the data signal;
- if the signal energy is greater than the first preset energy threshold, determine that the voice signal is generated by the speech of the user wearing the electronic device 100;
- if the signal energy is less than or equal to the first preset energy threshold, determine that the voice signal is not generated by the speech of the user wearing the electronic device 100.
- The first preset energy threshold is obtained in advance through a large amount of training and learning, and is the energy threshold used to distinguish whether the voice signal is generated by the speech of the user wearing the electronic device.
- When the user wearing the electronic device speaks, the vibration frequency and amplitude of the upper and lower jaw bones of the user's head are relatively large, so the signal energy of the data signal collected by the gravity sensor 101 is relatively large. A signal energy greater than the first preset energy threshold therefore indicates that the voice signal is generated by the speech of the user wearing the electronic device. Conversely, a signal energy less than or equal to the first preset energy threshold means that the vibration frequency and amplitude of the upper and lower jaw bones are relatively small, which indicates that the voice signal is not generated by the speech of the user wearing the electronic device.
- In another specific implementation, the first determining unit 42 is specifically configured to:
- calculate the signal energy of the data signal within a preset frequency range;
- if the signal energy is greater than the first preset energy threshold, determine that the voice signal is generated by the speech of the user wearing the electronic device 100;
- if the signal energy is less than or equal to the first preset energy threshold, determine that the voice signal is not generated by the speech of the user wearing the electronic device 100.
- the voice wake-up system further includes:
- The second determining unit 44 is configured to determine whether to start the keyword recognition process in advance according to the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102;
- The execution unit 43 is further configured to:
- if the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102 meet the second preset condition, start the keyword recognition process in advance, in which case the keyword recognition process and the process of detecting whether the voice signal is generated by the speech of the user wearing the electronic device 100 are performed synchronously;
- if the voice signal contains a preset voice command keyword and the voice signal is generated by the user's speech, control the electronic device 100 to perform the corresponding function; or,
- if the voice signal does not contain a preset voice command keyword or the voice signal is not generated by the user's speech, ignore the voice signal.
- the second preset condition is that the degree of word loss allowed by the keyword recognition process is less than a preset degree of word loss threshold and the activation speed of the microphone 102 is less than a preset activation speed threshold.
- the first preset condition is that the speech energy of the speech signal is greater than a second preset energy threshold.
- The second preset energy threshold is an energy threshold set in advance to prevent the processor from being repeatedly triggered to determine whether the voice signal is generated by the speech of the user wearing the electronic device. Because the processor 103 is triggered to read the data signal collected by the gravity sensor 101 only when the voice energy of the voice signal is greater than the second preset energy threshold, the processor 103 is not repeatedly triggered in a noisy environment to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, which further saves the power consumption of the terminal.
- FIG. 5 is a schematic structural diagram of an electronic device 100 according to Embodiment 4 of the present invention. For convenience of description, only the parts related to this embodiment are shown.
- the electronic device 100 includes a gravity sensor 101, a microphone 102, a memory 104, a processor 103, and a computer program 105 that is stored in the memory 104 and can run on the processor 103.
- The gravity sensor 101, the microphone 102, and the memory 104 are all electrically connected to the processor 103, and the processor 103 implements the steps of the voice wake-up method described in the first or second embodiment above when it executes the computer program 105.
- the electronic device 100 includes, but is not limited to, smart wearable devices such as earphones.
- The electronic device 100 of this embodiment belongs to the same concept as the voice wake-up method of the first or second embodiment.
- The specific implementation process is detailed in the method embodiments, and the technical features of the method embodiments apply correspondingly to this device embodiment, so they are not repeated here.
- The fifth embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it implements the steps of the voice wake-up method described in the first or second embodiment above.
- The computer-readable storage medium of this embodiment belongs to the same concept as the voice wake-up method of Embodiment 1 or Embodiment 2 above. The specific implementation process is detailed in the method embodiments, and the technical features of the method embodiments apply correspondingly to this embodiment, so they are not repeated here.
- The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components.
- Some or all of the physical components can be implemented as software executed by the processor 103, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
- Such software may be distributed on a computer-readable medium, and the computer-readable medium may include a computer storage medium (or a non-transitory medium) and a communication medium (or a transitory medium).
- Computer storage media include volatile and nonvolatile media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data).
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
- Communication media usually contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media.
- When the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition, it is further determined whether the voice signal is generated by the speech of a user wearing the electronic device. The keyword recognition process is started only when the voice signal is recognized as being generated by that user's speech, which reduces the energy consumption of the electronic device and avoids false wake-up of the electronic device. In addition, because the data signal collected by the gravity sensor built into the electronic device is used to determine whether the voice signal is generated by the user wearing the electronic device, no dedicated bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, with high accuracy and low resource consumption. Therefore, it has industrial applicability.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Provided are a method and system for voice wake-up, an electronic device, and a computer-readable storage medium, related to the technical field of smart devices. The method comprises: if a voice activity detection process detects that a voice signal collected by a microphone satisfies a first preset criterion, triggering a processor to read a data signal collected by a gravity sensor (S201); determining, on the basis of the data signal, whether the voice signal is generated by the speech of a user wearing an electronic device (S202); if not, ignoring the voice signal (S203); or, if so, activating a keyword recognition process and, if the voice signal is recognized as comprising a preset voice command keyword, controlling the electronic device to execute a corresponding function (S204). The method reduces the power consumption of the electronic device and prevents the electronic device from being woken up by mistake. In addition, it obviates the need for a dedicated bone conduction microphone or other contact microphone, is inexpensive, uses a simple and practical algorithm, is highly accurate, and consumes few resources.
Description
The present application belongs to the technical field of voice processing, and in particular relates to a voice wake-up method, system, electronic device, and computer-readable storage medium.
With the development of science and technology, electronic devices now commonly provide a voice wake-up function: a wake-up word is preset in the device or software, and when the user utters that voice command, the device is woken from its sleep state.
The traditional voice wake-up solution uses voice activity detection (VAD) to obtain the audio signal collected by the microphone and computes the voice energy from that signal. When the voice energy is greater than a preset threshold, the processor is triggered to start keyword recognition to determine whether the audio signal is a voice command issued by the user. This solution does not consider whether the audio signal collected by the microphone is caused by the wearer's speech, so the device can be woken by mistake: when a nearby person unintentionally says a keyword, the device is also triggered to wake up. Moreover, in a noisy environment the VAD triggers continually, causing the digital signal processor to perform keyword recognition repeatedly, which results in a large power loss.
To address these shortcomings of the traditional solution, existing voice wake-up solutions generally first determine whether the audio signal collected by the microphone is caused by the wearer's speech before triggering the processor to perform keyword recognition. However, the prior art generally relies on a dedicated bone conduction microphone or another contact microphone to extract the audio signal for this determination, and because bone conduction microphones and other contact microphones are expensive, the overall device cost is high. The prior art also includes software algorithms that determine whether the microphone audio signal is caused by the wearer's speech, but these algorithms are generally complex, so the determination itself consumes considerable resources.
In summary, traditional and existing voice wake-up solutions may wake the device by mistake, lose considerable power, raise the device cost, and rely on complex determination algorithms that themselves consume considerable resources.
In view of this, the embodiments of the present invention provide a voice wake-up method, system, electronic device, and computer-readable storage medium to solve the problems that the above voice wake-up solutions may wake the device by mistake, lose considerable power, raise the device cost, and rely on complex determination algorithms that themselves consume considerable resources.
A first aspect of the embodiments of the present invention provides a voice wake-up method applied to an electronic device. The electronic device includes a processor, a gravity sensor, and a microphone, and the gravity sensor and the microphone are each electrically connected to the processor. The voice wake-up method includes using the processor to execute the following steps:
if the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition, triggering the processor to read the data signal collected by the gravity sensor;
determining, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device;
if the voice signal is not generated by the user's speech, ignoring the voice signal; or,
if the voice signal is generated by the user's speech, starting a keyword recognition process and, if the voice signal is recognized as containing a preset voice command keyword, controlling the electronic device to perform the corresponding function.
A second aspect of the embodiments of the present invention provides a voice wake-up system applied to an electronic device. The electronic device includes a processor, a gravity sensor, and a microphone, and the gravity sensor and the microphone are each electrically connected to the processor. The voice wake-up system includes:
a voice activity detection unit, configured to trigger the processor to read the data signal collected by the gravity sensor if the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition;
a first determining unit, configured to determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device;
an execution unit, configured to ignore the voice signal if the voice signal is not generated by the user's speech; or, if the voice signal is generated by the user's speech, to start a keyword recognition process and, if the voice signal is recognized as containing a preset voice command keyword, to control the electronic device to perform the corresponding function.
A third aspect of the embodiments of the present invention provides an electronic device, including a gravity sensor, a microphone, a memory, a processor, and a computer program stored in the memory and runnable on the processor. The gravity sensor, the microphone, and the memory are all electrically connected to the processor, and the processor, when executing the computer program, implements the steps of the voice wake-up method according to any one of the embodiments of the first aspect.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of the voice wake-up method according to any one of the embodiments of the first aspect.
With the voice wake-up method, system, electronic device, and computer-readable storage medium provided by the embodiments of the present invention, when the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition, it is further determined whether the voice signal is generated by the speech of the user wearing the electronic device. The keyword recognition process is started only when the voice signal is recognized as being generated by that user's speech, which reduces the energy consumption of the electronic device and avoids false wake-up of the electronic device. In addition, because the data signal collected by the gravity sensor built into the electronic device is used to determine whether the voice signal is generated by the user wearing the electronic device, no dedicated bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, with high accuracy and low resource consumption.
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a structural block diagram of an electronic device provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a specific implementation of the voice wake-up method provided in Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of a specific implementation of the voice wake-up method provided in Embodiment 2 of the present invention;
FIG. 4 is a schematic structural diagram of the voice wake-up system provided in Embodiment 3 of the present invention;
FIG. 5 is a schematic structural diagram of the electronic device provided in Embodiment 4 of the present invention.
In the following description, for purposes of illustration rather than limitation, specific details such as particular system structures and technologies are set forth to provide a thorough understanding of the embodiments of the present invention. However, it should be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, interface switching devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The technical solutions of the present invention are illustrated below through specific embodiments.
FIG. 1 is a structural block diagram of an electronic device provided by an embodiment of the present invention. For convenience of description, only the parts related to this embodiment are shown.
As shown in FIG. 1, the electronic device 100 provided by the embodiment of the present invention includes a processor 103, a gravity sensor 101, and a microphone 102, and the gravity sensor 101 and the microphone 102 are each electrically connected to the processor 103.
The electronic device 100 includes, but is not limited to, smart wearable devices such as earphones. The microphone 102 is an ordinary, low-cost microphone built into the electronic device 100. The gravity sensor 101 is the sensor built into the electronic device 100 for determining the wearing state and implementing the single-tap and double-tap functions.
Based on the structure of the electronic device 100 described above, the following embodiments of the present invention are proposed.
Embodiment 1
FIG. 2 is a schematic flowchart of a specific implementation of the voice wake-up method provided in Embodiment 1 of the present invention. The method is applied to the electronic device 100 shown in FIG. 1 and is executed by the processor 103 of the electronic device 100. As shown in FIG. 2, the voice wake-up method provided in this embodiment may include the following steps:
Step S201: if the voice activity detection process detects that the voice signal collected by the microphone 102 meets the first preset condition, the processor 103 is triggered to read the data signal collected by the gravity sensor 101.
In a specific implementation, the first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold. The voice activity detection process and the microphone 102 remain on while the electronic device 100 is in the standby state. When the microphone 102 collects a voice signal, the voice signal is passed to the voice activity detection process, which measures the voice energy of the voice signal and, when the voice energy is greater than the second preset energy threshold, triggers the processor 103 to read the data signal collected by the gravity sensor 101.
In this embodiment, the processor 103 is triggered to read the data signal collected by the gravity sensor 101 only when the voice energy of the detected voice signal is greater than the second preset energy threshold. This prevents the processor 103 from being repeatedly triggered in a noisy environment to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, further reducing the power consumption of the electronic device 100.
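The energy gate of step S201 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame-energy definition (mean squared amplitude), the function names, and the threshold value are all assumptions chosen for the example.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of microphone samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def vad_triggers(samples, second_energy_threshold):
    """Return True when the frame meets the first preset condition,
    i.e. its voice energy is greater than the second preset energy threshold."""
    return frame_energy(samples) > second_energy_threshold


# Hypothetical frames and threshold, for illustration only.
quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.5, -0.6, 0.55, -0.45]
THRESHOLD = 0.05

if vad_triggers(loud, THRESHOLD):
    print("read gravity sensor data")  # the processor would read the gravity sensor here
```

Only frames that pass this gate lead to a gravity sensor read, so in the sketch a quiet frame never reaches the later, more expensive stages.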
Step S202: determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device 100; if the voice signal is not generated by the user's speech, proceed to step S203; if the voice signal is generated by the user's speech, jump to step S204.
In this embodiment, the electronic device 100 is worn on the user's head. The vibration frequency and amplitude of the maxilla and mandible differ depending on whether the user is speaking, so the data signal collected by the gravity sensor 101 differs as well. Therefore, by analyzing the data signal collected by the gravity sensor 101, it can be determined whether the voice signal is generated by the speech of the user wearing the electronic device 100.
In a specific implementation, determining according to the data signal whether the voice signal is generated by the speech of the user wearing the electronic device 100 includes:
performing time-frequency conversion on the data signal and selecting the components whose frequency lies within a preset frequency range;
computing, in the frequency domain, the signal energy of the components whose frequency lies within the preset frequency range;
determining whether the signal energy is greater than a first preset energy threshold;
if the signal energy is greater than the first preset energy threshold, the voice signal is generated by the speech of the user wearing the electronic device 100;
if the signal energy is less than or equal to the first preset energy threshold, the voice signal is not generated by the speech of the user wearing the electronic device 100.
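The frequency-domain variant above can be sketched as below. The source only specifies "time-frequency conversion"; the use of a plain discrete Fourier transform, the one-sided energy sum, the 8 kHz sample rate, and all function names are assumptions made for illustration.

```python
import cmath
import math


def band_energy_dft(samples, sample_rate, f_low, f_high):
    """Sum |X[k]|^2 / N over the one-sided DFT bins whose frequency lies in [f_low, f_high]."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_low <= freq <= f_high:
            x_k = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            energy += abs(x_k) ** 2 / n
    return energy


def is_wearer_speaking(samples, sample_rate, first_energy_threshold,
                       f_low=300.0, f_high=3000.0):
    """Compare the in-band energy of the gravity sensor data with the first preset energy threshold."""
    return band_energy_dft(samples, sample_rate, f_low, f_high) > first_energy_threshold


def tone(freq, sample_rate, n):
    """Synthetic unit-amplitude sine wave standing in for sensor data."""
    return [math.sin(2.0 * math.pi * freq * t / sample_rate) for t in range(n)]


# Hypothetical test signals: one inside the preset band, one below it.
in_band = band_energy_dft(tone(1000.0, 8000.0, 64), 8000.0, 300.0, 3000.0)
out_of_band = band_energy_dft(tone(125.0, 8000.0, 64), 8000.0, 300.0, 3000.0)
```

A production implementation would more likely use an FFT; the direct DFT is kept here only so the sketch has no dependencies. Note that the sensor must be sampled at more than twice the upper band edge for the band to be observable at all.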
In another specific implementation, determining according to the data signal whether the voice signal is generated by the speech of the user wearing the electronic device 100 includes:
performing band-pass filtering on the data signal and selecting the components whose frequency lies within a preset frequency range;
computing, in the time domain, the signal energy of the components whose frequency lies within the preset frequency range;
determining whether the signal energy is greater than a first preset energy threshold;
if the signal energy is greater than the first preset energy threshold, the voice signal is generated by the speech of the user wearing the electronic device 100;
if the signal energy is less than or equal to the first preset energy threshold, the voice signal is not generated by the speech of the user wearing the electronic device 100.
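The time-domain variant can be sketched in the same spirit. The source does not name a filter design; the second-order (biquad) band-pass below, with coefficients taken from the widely used Audio EQ Cookbook formulas, is a swapped-in choice for illustration, and the 8 kHz sample rate and function names are likewise assumptions.

```python
import math


def bandpass_biquad(samples, sample_rate, f_low, f_high):
    """Filter samples with a single biquad band-pass centred on the geometric mean of the band edges."""
    f0 = math.sqrt(f_low * f_high)          # centre frequency
    q = f0 / (f_high - f_low)               # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0        # b1 is zero for this band-pass form
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def band_energy_time(samples, sample_rate, f_low=300.0, f_high=3000.0):
    """Time-domain energy (sum of squares) of the band-pass filtered signal."""
    return sum(y * y for y in bandpass_biquad(samples, sample_rate, f_low, f_high))


def tone(freq, sample_rate, n):
    """Synthetic unit-amplitude sine wave standing in for sensor data."""
    return [math.sin(2.0 * math.pi * freq * t / sample_rate) for t in range(n)]


e_in = band_energy_time(tone(1000.0, 8000.0, 800), 8000.0)   # inside the band
e_out = band_energy_time(tone(50.0, 8000.0, 800), 8000.0)    # well below the band
```

A single biquad has gentle roll-off, so components just outside the band are only partly attenuated; a steeper filter could be substituted without changing the energy comparison that follows it.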
It should be noted that the first preset energy threshold in the above two specific implementations is obtained in advance through extensive training and is the energy threshold used to distinguish whether the voice signal is generated by the speech of the user wearing the electronic device. When the user wearing the electronic device speaks, the vibration frequency and amplitude of the user's maxilla and mandible are relatively large, so the signal energy of the data signal collected by the gravity sensor 101 is relatively large. Therefore, a signal energy greater than the first preset energy threshold indicates that the voice signal is generated by the speech of the user wearing the electronic device. Conversely, a signal energy less than or equal to the first preset energy threshold indicates that the vibration frequency and amplitude of the user's maxilla and mandible are both small, and therefore that the voice signal is not generated by the speech of the user wearing the electronic device. The preset frequency range is the frequency range of the voice signal when a person speaks; specifically, it is 300 Hz to 3000 Hz. Because the frequency band of human speech differs from that of environmental noise, this embodiment computes the signal energy only within the preset frequency range, that is, within the band occupied by human speech. This filters out the influence of other noise energy on the determination and makes the result more accurate.
Step S203: ignore the voice signal.
In this embodiment, if the voice signal is not caused by the speech of the user wearing the electronic device 100, the voice signal comes from ambient noise or from other people speaking and is not a voice control command issued by the user wearing the electronic device 100. The voice signal is therefore ignored and no keyword recognition is performed on it, which reduces the power consumption of the electronic device 100.
Step S204: start the keyword recognition process; if the voice signal is recognized as containing a preset voice command keyword, control the electronic device 100 to perform the corresponding function.
In this embodiment, if the voice signal is caused by the speech of the user wearing the electronic device 100, the voice signal may be a voice control command issued by the user. The keyword recognition process is therefore started to recognize whether the voice signal contains a preset voice command keyword. If it does, the electronic device 100 is controlled to perform the corresponding voice control function. If it does not, the voice signal was produced by the user's speech but is not a voice control command, so the voice signal is ignored and the electronic device 100 is not woken up.
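Steps S201 to S204 can be summarized as a short control-flow sketch. The callables passed in (`vad_passes`, `read_gravity_sensor`, `wearer_is_speaking`, `find_keyword`, `execute`) are hypothetical stand-ins for the device's actual processes and are not named in the source.

```python
def wake_up_embodiment_one(voice_frame, read_gravity_sensor, vad_passes,
                           wearer_is_speaking, find_keyword, execute):
    """Sequential wake-up flow: each stage runs only if the previous one passed."""
    if not vad_passes(voice_frame):              # first preset condition not met
        return "idle"
    data_signal = read_gravity_sensor()          # S201: read gravity sensor data
    if not wearer_is_speaking(data_signal):      # S202: wearer speech check
        return "ignored"                         # S203: ignore the voice signal
    keyword = find_keyword(voice_frame)          # S204: keyword recognition starts here
    if keyword is None:
        return "ignored"                         # wearer spoke, but not a command
    execute(keyword)                             # perform the corresponding function
    return "executed"


# Hypothetical usage: ambient speech passes the energy gate but not the wearer check.
result = wake_up_embodiment_one(
    voice_frame="frame",
    read_gravity_sensor=lambda: "sensor data",
    vad_passes=lambda frame: True,
    wearer_is_speaking=lambda data: False,
    find_keyword=lambda frame: "play",
    execute=print,
)
```

The ordering is the point of the design: keyword recognition, the most expensive stage, is reached only after both cheaper checks succeed.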
As can be seen from the above, in the voice wake-up method provided in this embodiment, when the voice activity detection process detects that the voice signal collected by the microphone 102 meets the first preset condition, it is further determined whether the voice signal is generated by the speech of the user wearing the electronic device 100. The keyword recognition process is started only when the voice signal is recognized as being generated by that user's speech, which reduces the energy consumption of the electronic device 100 and avoids false wake-up of the electronic device 100. In addition, because the data signal collected by the gravity sensor 101 built into the electronic device 100 is used to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, no dedicated bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, with high accuracy and low resource consumption.
Embodiment 2
FIG. 3 is a schematic flowchart of a specific implementation of the voice wake-up method provided in Embodiment 2 of the present invention. The method is applied to the electronic device 100 shown in FIG. 1 and is executed by the processor 103 of the electronic device 100. As shown in FIG. 3, the voice wake-up method provided in this embodiment may include the following steps:
Step S301: determine whether the voice signal collected by the microphone 102, as monitored by the voice activity detection process, meets the first preset condition; if it does, proceed to step S302-1 and step S302-2 at the same time. This step is implemented in the same way as in Embodiment 1 and is not described again here.
Step S302-1: determine, according to the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102, whether to start the keyword recognition process in advance; if the allowed degree of word loss and the activation speed of the microphone 102 meet the second preset condition, proceed to step S303-1.
Step S303-1: start the keyword recognition process and recognize whether the voice signal contains a preset voice command keyword.
The first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold. The voice activity detection process and the microphone 102 remain on while the electronic device 100 is in the standby state. When the microphone 102 collects a voice signal, the voice signal is passed to the voice activity detection process, which measures the voice energy of the voice signal and, when the voice energy is greater than the second preset energy threshold, triggers the processor 103 to read the data signal collected by the gravity sensor 101.
The second preset energy threshold is an energy threshold set in advance to prevent the processor, in a noisy environment, from being repeatedly triggered to determine whether the voice signal is generated by the speech of the user wearing the electronic device. Because the processor 103 is triggered to read the data signal collected by the gravity sensor 101 only when the voice energy of the detected voice signal is greater than the second preset energy threshold, the processor 103 is not repeatedly triggered in a noisy environment to make this determination, further reducing the power consumption of the electronic device 100.
The second preset condition is that the degree of word loss allowed by the keyword recognition process is less than a preset word-loss threshold and the activation speed of the microphone 102 is less than a preset activation speed threshold.
In this embodiment, when the degree of word loss allowed by the keyword recognition process is less than the preset word-loss threshold and the activation speed of the microphone 102 is less than the preset activation speed threshold, the method proceeds to step S303-1 and starts the keyword recognition process in advance. This avoids the case in which, because the microphone 102 starts too slowly and the keyword recognition process is not started in advance, too many words are lost: the voice signal uttered by the user cannot be collected in time, and the complete voice control command cannot be recognized. Conversely, if the degree of word loss allowed by the keyword recognition process is greater than or equal to the preset word-loss threshold, or the activation speed of the microphone 102 is greater than or equal to the preset activation speed threshold, a voice control command is unlikely to be lost, so the keyword recognition process is not started in advance. The wake-up flow in that case is the same as the voice wake-up flow provided in Embodiment 1 and is not described again here.
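The second preset condition reduces to a two-part comparison, sketched below. The source gives no units or values for the word-loss degree or the microphone activation speed, so the parameter names, units, and numbers here are purely illustrative assumptions.

```python
def should_start_keyword_recognition_early(allowed_word_loss, mic_activation_speed,
                                           word_loss_threshold, activation_speed_threshold):
    """Second preset condition: both quantities must be strictly below their thresholds."""
    return (allowed_word_loss < word_loss_threshold
            and mic_activation_speed < activation_speed_threshold)


# Hypothetical values: a recognizer that tolerates little word loss paired with a slow microphone.
start_early = should_start_keyword_recognition_early(
    allowed_word_loss=0.1,
    mic_activation_speed=1.0,
    word_loss_threshold=0.2,
    activation_speed_threshold=2.0,
)
```

When the function returns False, the flow falls back to the sequential order of Embodiment 1, where keyword recognition waits for the wearer-speech check.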
Step S302-2: trigger the processor 103 to read the data signal collected by the gravity sensor 101;
Step S303-2: determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device 100. It should be noted that steps S302-2 and S303-2 are implemented in the same way as the corresponding steps in Embodiment 1, so they are not described again here.
Step S304: if the voice signal contains a preset voice command keyword and the voice signal is generated by the user's speech, control the electronic device 100 to perform the corresponding function.
Step S305: if the voice signal does not contain a preset voice command keyword or the voice signal is not generated by the user's speech, ignore the voice signal.
In this embodiment, the electronic device 100 is controlled to perform the corresponding voice control function only when the voice signal both contains a preset voice command keyword and is generated by the speech of the user wearing the electronic device 100. If the voice signal fails either of these two conditions, it is ignored, which avoids false wake-up of the electronic device 100.
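Steps S304 and S305 reduce to a two-input decision, shown here as a small sketch. The function name and the string return values are illustrative assumptions, not part of the patent.

```python
def decide_action(contains_keyword, spoken_by_wearer):
    """Step S304 when both conditions hold, otherwise step S305."""
    if contains_keyword and spoken_by_wearer:
        return "execute"  # S304: perform the function bound to the keyword
    return "ignore"       # S305: drop the voice signal
```

Only one of the four input combinations leads to "execute", which is what suppresses false wake-ups.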
As can be seen from the above, the voice wake-up method provided in this embodiment likewise avoids false wake-up of the electronic device 100. Because the data signal collected by the gravity sensor 101 built into the electronic device 100 is used to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, no dedicated bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, has high accuracy, and consumes few resources. In addition, compared with the previous embodiment, the method provided in this embodiment starts the keyword recognition process in advance when the degree of word loss allowed by the keyword recognition process is less than the preset word-loss threshold and the activation speed of the microphone 102 is less than the preset activation speed threshold. This avoids the situation in which, because the microphone 102 starts too slowly and the keyword recognition process is not started in advance, too many words are lost and the complete voice control instruction cannot be recognized.
Embodiment 3
FIG. 4 is a schematic structural diagram of a voice wake-up system provided by Embodiment 3 of the present invention. The system is applied to the electronic device 100 shown in FIG. 1 and runs on the processor 103 of that electronic device 100. For ease of description, only the parts related to this embodiment are shown.
Referring to FIG. 4, the voice wake-up system 4 provided in this embodiment includes:
a voice activity detection unit 41, configured to trigger the processor 103 to read the data signal collected by the gravity sensor 101 if the voice activity detection process detects that the voice signal collected by the microphone 102 meets a first preset condition, where the first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold;
a first determining unit 42, configured to determine, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device 100;
an execution unit 43, configured to ignore the voice signal if the voice signal is not generated by the user's speech, or, if the voice signal is generated by the user's speech, to start the keyword recognition process and, if the voice signal is recognized as containing a preset voice command keyword, to control the electronic device 100 to perform the corresponding function.
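The cooperation of units 41, 42, and 43 can be sketched as a sequential pipeline. The three callables passed in are hypothetical stand-ins for the voice activity detection process, the gravity-sensor check, and the keyword recognizer; the patent does not define their interfaces.

```python
def wake_up_pipeline(voice_frame, sensor_frame, vad_passes, is_wearer_speech, find_keyword):
    """Return the recognized keyword, or None if the voice signal is ignored."""
    if not vad_passes(voice_frame):          # unit 41: first preset condition
        return None
    if not is_wearer_speech(sensor_frame):   # unit 42: wearer check on gravity-sensor data
        return None
    # Unit 43: keyword recognition starts only after both earlier checks pass.
    return find_keyword(voice_frame)
```

Ordering the checks this way means the comparatively expensive keyword recognizer is never invoked for speech that does not come from the wearer.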
Optionally, the first determining unit 42 is specifically configured to:
perform time-frequency conversion on the data signal and select the data signal whose frequency is within a preset frequency range;
compute, in the frequency domain, the signal energy of the data signal whose frequency is within the preset frequency range;
determine whether the signal energy is greater than a first preset energy threshold;
if the signal energy is greater than the first preset energy threshold, determine that the voice signal is generated by the speech of the user wearing the electronic device 100;
if the signal energy is less than or equal to the first preset energy threshold, determine that the voice signal is not generated by the speech of the user wearing the electronic device 100;
The first preset energy threshold is an energy threshold obtained in advance through extensive training and is used to distinguish whether the voice signal is generated by the speech of the user wearing the electronic device. When the user wearing the electronic device speaks, the upper and lower jawbones of the user's head vibrate with relatively high frequency and amplitude, so the signal energy of the data signal collected by the gravity sensor 101 is relatively large. Therefore, when the signal energy of the data signal is greater than the first preset energy threshold, the voice signal is generated by the speech of the user wearing the electronic device. Conversely, if the signal energy is less than or equal to the first preset energy threshold, the vibration frequency and amplitude of the user's upper and lower jawbones are both small, which indicates that the voice signal is not generated by the speech of the user wearing the electronic device.
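The frequency-domain variant above can be sketched with a plain discrete Fourier transform. This is an assumption-laden illustration: the patent names neither the transform nor the frequency range, sampling rate, or threshold, so the DFT, the 1/N scaling, and all parameter names are hypothetical.

```python
import cmath
import math


def band_energy_frequency_domain(samples, sample_rate, f_low, f_high):
    """Sum |X[k]|^2 / N over DFT bins whose frequency lies in [f_low, f_high]."""
    n = len(samples)
    if n == 0:
        return 0.0
    energy = 0.0
    for k in range(n // 2 + 1):  # non-negative frequencies only
        freq = k * sample_rate / n
        if f_low <= freq <= f_high:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(samples))
            energy += abs(coeff) ** 2 / n
    return energy


def is_wearer_speech_frequency_domain(samples, sample_rate, f_low, f_high, first_threshold):
    """Compare in-band energy with the (assumed, pre-trained) first preset energy threshold."""
    return band_energy_frequency_domain(samples, sample_rate, f_low, f_high) > first_threshold
```

A production implementation would more likely use an FFT; the direct DFT is used here only to keep the sketch dependency-free.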
Alternatively, the first determining unit 42 is specifically configured to:
perform band-pass filtering on the data signal and select the data signal whose frequency is within a preset frequency range;
compute, in the time domain, the signal energy of the data signal whose frequency is within the preset frequency range;
determine whether the signal energy is greater than the first preset energy threshold;
if the signal energy is greater than the first preset energy threshold, determine that the voice signal is generated by the speech of the user wearing the electronic device 100;
if the signal energy is less than or equal to the first preset energy threshold, determine that the voice signal is not generated by the speech of the user wearing the electronic device 100.
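The time-domain variant can be sketched with a second-order band-pass filter followed by a mean-square energy measure. The patent does not specify the filter design; the biquad below uses the widely published Audio EQ Cookbook band-pass coefficients (0 dB peak gain) as a stand-in, and every name and parameter is an assumption.

```python
import math


def bandpass_biquad(samples, sample_rate, f_low, f_high):
    """Second-order band-pass filter centred on the geometric mean of the band edges."""
    f0 = math.sqrt(f_low * f_high)
    q = f0 / (f_high - f_low)
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter type
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def band_energy_time_domain(samples, sample_rate, f_low, f_high):
    """Mean squared amplitude of the band-pass filtered signal."""
    filtered = bandpass_biquad(samples, sample_rate, f_low, f_high)
    if not filtered:
        return 0.0
    return sum(y * y for y in filtered) / len(filtered)


def is_wearer_speech_time_domain(samples, sample_rate, f_low, f_high, first_threshold):
    """Compare filtered energy with the (assumed, pre-trained) first preset energy threshold."""
    return band_energy_time_domain(samples, sample_rate, f_low, f_high) > first_threshold
```

Compared with the frequency-domain variant, this form processes samples one at a time and needs only a few state variables, which may suit a low-power wearable.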
Optionally, the voice wake-up system further includes:
a second determining unit 44, configured to determine, according to the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102, whether to start the keyword recognition process in advance;
the execution unit 43 is further configured to:
start the keyword recognition process in advance if the degree of word loss allowed by the keyword recognition process and the activation speed of the microphone 102 meet a second preset condition, in which case the keyword recognition process runs concurrently with the process of detecting whether the voice signal is generated by the speech of the user wearing the electronic device 100;
control the electronic device 100 to perform the corresponding function if the voice signal contains a preset voice command keyword and the voice signal is generated by the user's speech; or
ignore the voice signal if the voice signal does not contain a preset voice command keyword or the voice signal is not generated by the user's speech.
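When the keyword recognition process is started in advance, it runs alongside the wearer check and the two results are combined at the end. The sketch below uses a thread pool to illustrate this; the patent does not say how the concurrency is realized, and both callables are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def wake_up_parallel(voice_frame, sensor_frame, find_keyword, is_wearer_speech):
    """Run keyword recognition and the wearer check concurrently, then combine the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(find_keyword, voice_frame)
        wearer_future = pool.submit(is_wearer_speech, sensor_frame)
        keyword = keyword_future.result()
        from_wearer = wearer_future.result()
    # Act only if a keyword was found and the wearer produced the speech.
    if keyword is not None and from_wearer:
        return keyword
    return None
```

The trade-off is that the recognizer now runs even for speech that is later rejected, spending some power to avoid losing the first words of a command.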
Optionally, the second preset condition is that the degree of word loss allowed by the keyword recognition process is less than a preset word-loss threshold and the activation speed of the microphone 102 is less than a preset activation speed threshold.
Optionally, the first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold. The second preset energy threshold is an energy threshold set in advance to prevent the processor, in a noisy environment, from being repeatedly triggered to run the process of determining whether the voice signal is generated by the speech of the user wearing the electronic device. Because the processor 103 is triggered to read the data signal collected by the gravity sensor 101 only when the voice energy of the detected voice signal is greater than the second preset energy threshold, the processor 103 is not repeatedly triggered in a noisy environment to determine whether the voice signal is generated by the speech of the user wearing the electronic device 100, which further reduces the power consumption of the terminal.
It should be noted that the units of the above system provided by the embodiments of the present invention are based on the same concept as the method embodiments of the present invention and bring the same technical effects. For details, refer to the description of the method embodiments, which is not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the method disclosed in this embodiment may be implemented as software, firmware, hardware, or any suitable combination thereof.
Embodiment 4
FIG. 5 is a schematic structural diagram of an electronic device 100 provided by Embodiment 4 of the present invention. For ease of description, only the parts related to this embodiment are shown.
Referring to FIG. 5, the electronic device 100 provided in this embodiment includes a gravity sensor 101, a microphone 102, a memory 104, a processor 103, and a computer program 105 that is stored in the memory 104 and can run on the processor 103. The gravity sensor 101, the microphone 102, and the memory 104 are each electrically connected to the processor 103, and the processor 103, when executing the computer program 105, implements the steps of the voice wake-up method of Embodiment 1 or Embodiment 2 above. The electronic device 100 includes, but is not limited to, smart wearable devices such as earphones.
The electronic device 100 of this embodiment is based on the same concept as the voice wake-up method of Embodiment 1 or Embodiment 2 above. For the specific implementation, see the method embodiments; the technical features of the method embodiments apply correspondingly to this device embodiment and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the method disclosed in this embodiment may be implemented as software, firmware, hardware, or any suitable combination thereof.
Embodiment 5
Embodiment 5 of the present invention provides a computer-readable storage medium that stores a computer program. When executed by a processor, the computer program implements the steps of the voice wake-up method of Embodiment 1 or Embodiment 2 above.
The computer-readable storage medium of this embodiment is based on the same concept as the voice wake-up method of Embodiment 1 or Embodiment 2 above. For the specific implementation, see the method embodiments; the technical features of the method embodiments apply correspondingly to this embodiment and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the method disclosed in this embodiment may be implemented as software, firmware, hardware, or any suitable combination thereof.
In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by the processor 103, such as a central processing unit, a digital signal processor, or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and this description does not limit the scope of the rights of the present invention. Any modification, equivalent replacement, or improvement made by those skilled in the art without departing from the scope and essence of the present invention shall fall within the scope of the rights of the present invention.
In the voice wake-up method, system, electronic device, and computer-readable storage medium provided by the embodiments of the present invention, when the voice activity detection process detects that the voice signal collected by the microphone meets the first preset condition, it is further determined whether the voice signal is generated by the speech of the user wearing the electronic device, and the keyword recognition process is started only when the voice signal is recognized as generated by that user's speech. This saves energy in the electronic device and avoids false wake-up of the electronic device. In addition, because the data signal collected by the gravity sensor built into the electronic device is used to determine whether the voice signal is generated by the speech of the user wearing the electronic device, no dedicated bone conduction microphone or other contact microphone is needed, so the cost is low, and the algorithm is simple and practical, has high accuracy, and consumes few resources. The invention therefore has industrial applicability.
Claims (10)
- 1. A voice wake-up method applied to an electronic device, the electronic device comprising a processor, a gravity sensor, and a microphone, the gravity sensor and the microphone each being electrically connected to the processor, the voice wake-up method comprising using the processor to perform the following steps: if a voice activity detection process detects that a voice signal collected by the microphone meets a first preset condition, triggering the processor to read a data signal collected by the gravity sensor; determining, according to the data signal, whether the voice signal is generated by the speech of a user wearing the electronic device; if the voice signal is not generated by the user's speech, ignoring the voice signal; or, if the voice signal is generated by the user's speech, starting a keyword recognition process and, if the voice signal is recognized as containing a preset voice command keyword, controlling the electronic device to perform a corresponding function.
- 2. The voice wake-up method according to claim 1, wherein determining, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device comprises: performing time-frequency conversion on the data signal and selecting the data signal whose frequency is within a preset frequency range; computing, in the frequency domain, the signal energy of the data signal whose frequency is within the preset frequency range; determining whether the signal energy is greater than a first preset energy threshold; if the signal energy is greater than the first preset energy threshold, determining that the voice signal is generated by the speech of the user wearing the electronic device; if the signal energy is less than or equal to the first preset energy threshold, determining that the voice signal is not generated by the speech of the user wearing the electronic device.
- 3. The voice wake-up method according to claim 1, wherein determining, according to the data signal, whether the voice signal is generated by the speech of the user wearing the electronic device comprises: performing band-pass filtering on the data signal and selecting the data signal whose frequency is within a preset frequency range; computing, in the time domain, the signal energy of the data signal whose frequency is within the preset frequency range; determining whether the signal energy is greater than a first preset energy threshold; if the signal energy is greater than the first preset energy threshold, determining that the voice signal is generated by the speech of the user wearing the electronic device; if the signal energy is less than or equal to the first preset energy threshold, determining that the voice signal is not generated by the speech of the user wearing the electronic device.
- 4. The voice wake-up method according to claim 1, wherein, after the voice activity detection process running on the processor detects that the voice signal collected by the microphone meets the first preset condition, the method further comprises: determining, according to the degree of word loss allowed by the keyword recognition process and the microphone activation speed, whether to start the keyword recognition process in advance; if the degree of word loss allowed by the keyword recognition process and the microphone activation speed meet a second preset condition, starting the keyword recognition process in advance, in which case the keyword recognition process runs concurrently with the process of detecting whether the voice signal is generated by the speech of the user wearing the electronic device; if the voice signal contains a preset voice command keyword and the voice signal is generated by the user's speech, controlling the electronic device to perform the corresponding function; or, if the voice signal does not contain a preset voice command keyword or the voice signal is not generated by the user's speech, ignoring the voice signal.
- 5. The voice wake-up method according to claim 4, wherein the second preset condition is that the degree of word loss allowed by the keyword recognition process is less than a preset word-loss threshold and the microphone activation speed is less than a preset activation speed threshold.
- 6. The voice wake-up method according to claim 1, wherein the first preset condition is that the voice energy of the voice signal is greater than a second preset energy threshold.
- 7. A voice wake-up system applied to an electronic device, the electronic device comprising a processor, a gravity sensor, and a microphone, the gravity sensor and the microphone each being electrically connected to the processor, the voice wake-up system comprising: a voice activity detection unit, configured to trigger the processor to read a data signal collected by the gravity sensor if a voice activity detection process detects that a voice signal collected by the microphone meets a first preset condition; a first determining unit, configured to determine, according to the data signal, whether the voice signal is generated by the speech of a user wearing the electronic device; an execution unit, configured to ignore the voice signal if the voice signal is not generated by the user's speech, or, if the voice signal is generated by the user's speech, to start a keyword recognition process and, if the voice signal is recognized as containing a preset voice command keyword, to control the electronic device to perform a corresponding function.
- 8. The voice wake-up system according to claim 7, wherein the first determining unit is specifically configured to: perform time-frequency conversion on the data signal and select the data signal whose frequency is within a preset frequency range; compute, in the frequency domain, the signal energy of the data signal whose frequency is within the preset frequency range; determine whether the signal energy is greater than a first preset energy threshold; if the signal energy is greater than the first preset energy threshold, determine that the voice signal is generated by the speech of the user wearing the electronic device; if the signal energy is less than or equal to the first preset energy threshold, determine that the voice signal is not generated by the speech of the user wearing the electronic device; or, the first determining unit is specifically configured to: perform band-pass filtering on the data signal and select the data signal whose frequency is within a preset frequency range; compute, in the time domain, the signal energy of the data signal whose frequency is within the preset frequency range; determine whether the signal energy is greater than the first preset energy threshold; if the signal energy is greater than the first preset energy threshold, determine that the voice signal is generated by the speech of the user wearing the electronic device; if the signal energy is less than or equal to the first preset energy threshold, determine that the voice signal is not generated by the speech of the user wearing the electronic device.
- 9. An electronic device, comprising a gravity sensor, a microphone, a memory, a processor, and a computer program stored in the memory and executable on the processor, the gravity sensor, the microphone, and the memory each being electrically connected to the processor, wherein the processor, when executing the computer program, implements the steps of the voice wake-up method according to any one of claims 1 to 6.
- 10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the voice wake-up method according to any one of claims 1 to 6.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910492994.6A CN110265036A (en) | 2019-06-06 | 2019-06-06 | Voice awakening method, system, electronic equipment and computer readable storage medium |
CN201910492994.6 | 2019-06-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020244257A1 (en) | 2020-12-10 |
Family
ID=67917165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/076473 WO2020244257A1 (en) | 2019-06-06 | 2020-02-24 | Method and system for voice wake-up, electronic device, and computer-readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110265036A (en) |
WO (1) | WO2020244257A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110265036A (en) * | 2019-06-06 | 2019-09-20 | 湖南国声声学科技股份有限公司 | Voice awakening method, system, electronic equipment and computer readable storage medium |
CN113377225B (en) * | 2020-03-10 | 2024-04-26 | 北京钛方科技有限责任公司 | Trigger action recognition method, trigger action recognition system and storage medium |
CN111432303B (en) * | 2020-03-19 | 2023-01-10 | 交互未来(北京)科技有限公司 | Monaural headset, intelligent electronic device, method, and computer-readable medium |
CN111524513A (en) * | 2020-04-16 | 2020-08-11 | 歌尔科技有限公司 | Wearable device and voice transmission control method, device and medium thereof |
CN111510662B (en) * | 2020-04-27 | 2021-06-22 | 深圳米唐科技有限公司 | Network call microphone state prompting method and system based on audio and video analysis |
CN113823288A (en) * | 2020-06-16 | 2021-12-21 | 华为技术有限公司 | Voice wake-up method, electronic equipment, wearable equipment and system |
TWI790647B (en) * | 2021-01-13 | 2023-01-21 | 神盾股份有限公司 | Voice assistant system |
CN112967723B (en) * | 2021-02-01 | 2024-05-31 | 珠海格力电器股份有限公司 | Identity confirmation method and control device, sleep parameter detection method and control device |
CN113808585A (en) * | 2021-08-16 | 2021-12-17 | 百度在线网络技术(北京)有限公司 | Earphone awakening method, device, equipment and storage medium |
CN113782038A (en) * | 2021-09-13 | 2021-12-10 | 北京声智科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN115547312B (en) * | 2022-11-30 | 2023-03-21 | 深圳时识科技有限公司 | Preprocessor with activity detection, chip and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN203882609U (en) * | 2014-05-08 | 2014-10-15 | 钰太芯微电子科技(上海)有限公司 | Awakening apparatus based on voice activation detection |
CN104144377A (en) * | 2013-05-09 | 2014-11-12 | Dsp集团有限公司 | Low power activation of voice activated device |
CN106714023A (en) * | 2016-12-27 | 2017-05-24 | 广东小天才科技有限公司 | Bone conduction earphone-based voice awakening method and system and bone conduction earphone |
CN107231584A (en) * | 2016-03-25 | 2017-10-03 | 美特科技(苏州)有限公司 | A kind of microphone apparatus |
CN108766468A (en) * | 2018-06-12 | 2018-11-06 | 歌尔科技有限公司 | A kind of intelligent sound detection method, wireless headset, TWS earphones and terminal |
CN108882087A (en) * | 2018-06-12 | 2018-11-23 | 歌尔科技有限公司 | A kind of intelligent sound detection method, wireless headset, TWS earphone and terminal |
CN110265036A (en) * | 2019-06-06 | 2019-09-20 | 湖南国声声学科技股份有限公司 | Voice awakening method, system, electronic equipment and computer readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7580540B2 (en) * | 2004-12-29 | 2009-08-25 | Motorola, Inc. | Apparatus and method for receiving inputs from a user |
CN105206271A (en) * | 2015-08-25 | 2015-12-30 | 北京宇音天下科技有限公司 | Intelligent equipment voice wake-up method and system for realizing method |
CN109729448A (en) * | 2017-10-27 | 2019-05-07 | 北京金锐德路科技有限公司 | Neck wears the voice control optimization method and device of formula interactive voice earphone |
CN107995547A (en) * | 2017-11-29 | 2018-05-04 | 联想(北京)有限公司 | Headphone device and control method |
CN108735219B (en) * | 2018-05-09 | 2021-08-31 | 深圳市宇恒互动科技开发有限公司 | Voice recognition control method and device |
- 2019-06-06: CN application CN201910492994.6A, published as CN110265036A (status: active, pending)
- 2020-02-24: WO application PCT/CN2020/076473, published as WO2020244257A1 (status: active, application filing)
Also Published As
Publication number | Publication date |
---|---|
CN110265036A (en) | 2019-09-20 |
Similar Documents
Publication | Title |
---|---|
WO2020244257A1 (en) | Method and system for voice wake-up, electronic device, and computer-readable storage medium |
US11502859B2 (en) | Method and apparatus for waking up via speech |
CN106714023B (en) | Bone conduction earphone-based voice awakening method and system and bone conduction earphone |
US10332524B2 (en) | Speech recognition wake-up of a handheld portable electronic device |
US10601599B2 (en) | Voice command processing in low power devices |
US9620116B2 (en) | Performing automated voice operations based on sensor data reflecting sound vibration conditions and motion conditions |
US20180293974A1 (en) | Spoken language understanding based on buffered keyword spotting and speech recognition |
US20160019886A1 (en) | Method and apparatus for recognizing whisper |
US20190147890A1 (en) | Audio peripheral device |
WO2021184549A1 (en) | Monaural earphone, intelligent electronic device, method and computer readable medium |
CN110968353A (en) | Central processing unit awakening method and device, voice processor and user equipment |
CN110853644B (en) | Voice wake-up method, device, equipment and storage medium |
CN109412544B (en) | Voice acquisition method and device of intelligent wearable device and related components |
WO2022233308A9 (en) | Wearing detection method, wearable device and storage medium |
WO2021169711A1 (en) | Instruction execution method and apparatus, storage medium, and electronic device |
CN112233676B (en) | Intelligent device awakening method and device, electronic device and storage medium |
US20220230657A1 (en) | Voice control method and apparatus, chip, earphones, and system |
WO2022199405A1 (en) | Voice control method and apparatus |
CN111028838A (en) | Voice wake-up method, device and computer readable storage medium |
CN105430543A (en) | Digital microphone and electronic device |
CN205408096U (en) | Digital microphone wind and electronic equipment |
CN116705033A (en) | System on chip for wireless intelligent audio equipment and wireless processing method |
TWI748587B (en) | Acoustic event detection system and method |
JP2019139146A (en) | Voice recognition system and voice recognition method |
US20220270593A1 (en) | Voice activity detection with low-power accelerometer |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20817693; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 20817693; Country of ref document: EP; Kind code of ref document: A1 |