CN111435593A - Voice wake-up device and method - Google Patents

Voice wake-up device and method

Info

Publication number
CN111435593A
CN111435593A (application CN201910031378.0A; granted as CN111435593B)
Authority
CN
China
Prior art keywords
voice
detection
syllable
wake
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910031378.0A
Other languages
Chinese (zh)
Other versions
CN111435593B (en)
Inventor
王及德
黄文昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Realtek Semiconductor Corp filed Critical Realtek Semiconductor Corp
Priority to CN201910031378.0A
Publication of CN111435593A
Application granted
Publication of CN111435593B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/027 Syllables being the recognition units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electric Clocks (AREA)

Abstract

A voice wake-up device and a voice wake-up method are provided. The voice wake-up device is applied in an electronic device and comprises a voice activity detection circuit, a storage circuit, and a smart detection circuit. The voice activity detection circuit receives a sound input signal and detects voice activity in the sound input signal. The storage circuit is configured to store a preset voice sample. The smart detection circuit receives the sound input signal, performs time domain detection and frequency domain detection on the voice activity to generate a syllable and audio feature detection result, and compares the syllable and audio feature detection result with the preset voice sample; when the detection result matches the preset voice sample, it generates a wake-up signal to a processing circuit of the electronic device so as to wake up the processing circuit. The smart detection circuit thereby reduces the probability of falsely waking up the processing circuit and lowers the average power consumption of the whole voice wake-up device, allowing a true standby state to be reached.

Description

Voice wake-up device and method
Technical Field
The present invention relates to voice wake-up technology, and more particularly, to a voice wake-up apparatus and method.
Background
In recent years, due to the development of technology, a user can control an electronic device by voice; for example, the user can wake up the electronic device by voice. Usually, the voice wake-up mechanism is triggered by a specific voice command. In the prior art, a voice receiving module typically only determines whether voice has been received; whether that voice is actually the command must still be judged by the processor of the electronic device. In such a situation, the processor always has to stay active to make that determination and cannot enter a real standby state, which has a considerable impact on the overall power consumption of the electronic device.
Therefore, how to design a new voice wake-up apparatus and method to solve the above-mentioned drawbacks is an urgent problem to be solved in the art.
Disclosure of Invention
This summary is intended to provide a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and is intended to neither identify key/critical elements of the embodiments nor delineate the scope of the embodiments.
It is an object of the present invention to provide a voice wake-up apparatus and method, which can improve the problems of the prior art.
To achieve the above object, one technical embodiment of the present invention relates to a voice wake-up apparatus applied in an electronic device, comprising a voice activity detection circuit, a storage circuit, and a smart detection circuit. The voice activity detection circuit is configured to receive a sound input signal and detect voice activity in the sound input signal. The storage circuit is configured to store a preset voice sample. The smart detection circuit is configured to receive the sound input signal, perform time domain detection and frequency domain detection on the voice activity to generate a syllable and audio feature detection result, and further compare the syllable and audio feature detection result with the preset voice sample, so as to generate a wake-up signal to a processing circuit of the electronic device and thereby wake up the processing circuit when the syllable and audio feature detection result matches the preset voice sample.
To achieve the above object, another technical embodiment of the present invention relates to a voice wake-up method applied in a voice wake-up device of an electronic device, including: receiving, by a voice activity detection circuit, a sound input signal and detecting voice activity in the sound input signal; receiving the sound input signal by a smart detection circuit and performing time domain detection and frequency domain detection on the voice activity to generate a syllable and audio feature detection result; comparing, by the smart detection circuit, the syllable and audio feature detection result with a preset voice sample stored in a storage circuit; and generating, by the smart detection circuit, a wake-up signal to a processing circuit of the electronic device when the syllable and audio feature detection result matches the preset voice sample, so as to wake up the processing circuit.
The voice wake-up device and voice wake-up method can quickly identify the number of syllables and the vowels and consonants in the voice activity through time domain and frequency domain detection, compare them with the preset voice sample to determine whether they match the wake-up command, and wake up the processing circuit of the electronic device only when they do. The processing circuit therefore does not need to be woken up for recognition every time voice activity occurs, which greatly reduces the power consumption of the electronic device. The smart detection circuit also reduces the probability of falsely waking up the processing circuit and lowers the average power of the whole voice wake-up device, so that a true standby state (for example, less than 0.5 watt) can be reached.
Drawings
In order to make the aforementioned and other objects, features, and advantages of the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings:
FIG. 1A is a block diagram of an electronic device according to an embodiment of the invention;
FIG. 1B is a diagram illustrating an audio input signal according to an embodiment of the present invention;
FIG. 2 is a more detailed block diagram of the smart detection circuit in accordance with an embodiment of the present invention;
FIG. 3A is a block diagram of a time domain detection circuit according to an embodiment of the present invention;
FIG. 3B is a diagram illustrating a waveform processed by the time domain detection circuit according to an embodiment of the present invention;
FIG. 3C is a block diagram of a time domain detection circuit according to an embodiment of the present invention;
FIG. 3D is a diagram illustrating a waveform processed by the time domain detection circuit according to an embodiment of the present invention;
FIG. 4A is a block diagram of a frequency domain detection circuit according to an embodiment of the present invention;
FIG. 4B is a diagram illustrating a frequency band energy distribution processed by the frequency domain detection circuit according to an embodiment of the present invention;
FIG. 4C is a block diagram of a frequency domain detection circuit according to an embodiment of the present invention;
FIG. 5 is a more detailed block diagram of the decision circuit according to one embodiment of the present invention; and
FIG. 6 is a flowchart of a voice wake-up method according to an embodiment of the invention.
Description of the symbols
1: electronic device  100: processing circuit
101: sound input signal  103: voice activity
110: voice wake-up device  111: preset voice sample
112: voice activity detection circuit  113: wake-up signal
114: storage circuit  116: smart detection circuit
200: time domain detection circuit  201: time domain syllable detection result
202: frequency domain detection circuit  203: frequency domain syllable and audio feature detection result
204: decision circuit  300: down-sampling unit
301: waveform  302: subframe division unit
304: moving average filter  306: high-pass filter
308: moving average filter  310: detection unit
320: down-sampling unit  321: waveform
322: autocorrelation calculation unit  324: accumulator
326: detection unit  400: down-sampling unit
401: frequency band energy distribution  402: filter
404: subframe division unit  406: first maximum value acquisition unit
408: second maximum value acquisition unit  420: down-sampling unit
422: fast Fourier transform operation unit  500: comparator
501: time domain comparison result  502: comparator
503: frequency domain comparison result  504: weighting unit
505: weighted sum  506: weighting unit
508: summation operation unit  510: determination unit
600: voice wake-up method  601-606: steps
W1, W2: weights
Detailed Description
Please refer to fig. 1A. Fig. 1A is a block diagram of an electronic device 1 according to an embodiment of the invention. The electronic device 1 may be, for example, but not limited to, a television, a display, a desktop computer, a notebook computer, or a mobile device such as a smartphone or a tablet computer. The electronic device 1 comprises a processing circuit 100 and a voice wake-up apparatus 110.
The processing circuit 100 is electrically coupled to the voice wake-up device 110 and other circuit modules that can be disposed in the electronic device 1, such as but not limited to a communication circuit, a display circuit, a power circuit, etc. (not shown), and is configured to process and control information related to these circuits in an operating state. In one embodiment, the processing circuit 100 will be substantially non-operational when entering, for example, a sleep or standby state, and has a relatively low power consumption (e.g., less than 0.5 watts).
The voice wake-up device 110 is configured to receive the sound input signal 101, to detect whether the sound input signal 101 has a predetermined wake-up command, and to wake up the processing circuit 100 when the sound input signal 101 has the predetermined wake-up command, so as to restore the processing circuit 100 from the sleep state or the standby state to the operating state.
The voice wake-up apparatus 110 includes: voice activity detection circuitry 112, storage circuitry 114, and smart detection circuitry 116.
The voice activity detection circuit 112 is configured to receive the sound input signal 101 and detect voice activity in it.
Please refer to fig. 1B. Fig. 1B is a schematic diagram of an audio input signal 101 according to an embodiment of the invention. In fig. 1B, the horizontal axis represents time, and the vertical axis represents the amplitude of the audio signal.
In one embodiment, the sound input signal 101 may include both environmental sound and speech. Based on the sound input signal 101, the voice activity detection circuit 112 detects the voice activity 103 occurring within a period of time via a specific algorithm. For example, the voice activity detection circuit 112 may determine whether a region of voice activity 103 exists through steps such as, but not limited to, noise reduction by spectral subtraction, feature extraction over a region of the voice signal, and comparison of a value calculated for that region with a predetermined threshold. However, the above steps are only one embodiment, and the detection manner of the voice activity detection circuit 112 of the present invention is not limited thereto.
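For illustration only (this code is not part of the original disclosure), the following Python sketch shows one way the kind of processing just described could look: spectral subtraction against a noise estimate followed by a per-frame energy threshold. The frame length, the assumption that the first frames contain only noise, and the threshold value are all illustrative choices.

import numpy as np

def detect_voice_activity(signal, sample_rate=16000, frame_ms=20, threshold=3.0):
    """Flag frames whose noise-reduced energy exceeds a threshold.

    A minimal sketch of energy-based voice activity detection with
    spectral subtraction; all parameters are illustrative assumptions.
    `signal` is a 1-D NumPy array of audio samples.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Estimate the noise spectrum from the first few frames (assumed silent).
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    noise_spectrum = spectra[:5].mean(axis=0)

    # Spectral subtraction: remove the noise estimate, clip negatives to zero.
    cleaned = np.maximum(spectra - noise_spectrum, 0.0)

    # Classify each frame by comparing its residual energy with the noise floor.
    frame_energy = cleaned.sum(axis=1)
    noise_energy = noise_spectrum.sum() + 1e-9
    return frame_energy > threshold * noise_energy   # boolean mask per frame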
The storage circuit 114 is configured to store preset speech samples 111. The predetermined voice sample 111 may be a sample defined by a user or a sample generated by offline learning (offline training), and the sample corresponds to the content of the wake-up command. For example, when the wake-up command is "OK Google", this sample will be the speech content of "OK Google", including, for example, but not limited to, the number of syllables and the manner in which the vowels and consonants are uttered.
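As a purely hypothetical illustration (the type and field names below are not from the patent), such a sample might be reduced to a small record holding the expected syllable count and a coarse vowel/consonant pattern:

from dataclasses import dataclass
from typing import List

@dataclass
class PresetVoiceSample:
    """Illustrative stand-in for the stored wake-up command template."""
    syllable_count: int          # e.g. 4 for "OK Google"
    phone_pattern: List[str]     # coarse vowel ("V") / consonant ("C") sequence

# Hypothetical template for the wake-up command "OK Google".
OK_GOOGLE = PresetVoiceSample(syllable_count=4,
                              phone_pattern=["V", "C", "V", "C", "V", "C", "V"])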
The smart detection circuit 116 is configured to receive the sound input signal 101, perform time domain detection and frequency domain detection on the voice activity 103, and generate a syllable and audio feature detection result. In one embodiment, the smart detection circuit 116 may be driven to start detecting only after the voice activity detection circuit 112 has detected the voice activity 103, that is, in response to receiving the voice activity 103 from the voice activity detection circuit 112.
In another embodiment, the smart detection circuit 116 may instead start detecting at the same time as the voice activity detection circuit 112, that is, as soon as the sound input signal 101 is received.
Further, the smart detection circuit 116 obtains the predetermined speech samples 111 from the storage circuit 114 for comparison after generating the syllable and audio feature detection result. When the syllable and audio feature detection result matches the predetermined speech sample 111, the smart detection circuit 116 generates a wake-up signal 113 to the processing circuit 100, so as to wake up the processing circuit 100.
The structure and operation of the smart detection circuit 116 will be described in more detail below with reference to fig. 2.
Please refer to fig. 2. Fig. 2 is a more detailed block diagram of the smart detection circuit 116 according to an embodiment of the present invention. In one embodiment, the smart detection circuit 116 further includes a time domain detection circuit 200, a frequency domain detection circuit 202, and a decision circuit 204.
The time domain detection circuit 200 is configured to receive the sound input signal 101, detect at least one time domain energy peak in the time domain for the voice activity 103, and generate a time domain syllable detection result 201 according to the time domain energy peak. In various embodiments, the time domain detection performed by the time domain detection circuit 200 may be, for example, but not limited to, energy calculation detection (power calculation), zero-crossing detection, syllable detection, or delayed auto-correlation detection.
Please refer to fig. 3A and fig. 3B simultaneously. Fig. 3A is a block diagram of a time domain detection circuit 200 according to an embodiment of the invention. Fig. 3B is a schematic diagram of a waveform 301 processed by the time domain detection circuit 200 according to an embodiment of the invention.
In one embodiment, as shown in fig. 3A, the time domain detection circuit 200 may be implemented by a syllable detection circuit, and may include a down-sampling unit 300, a subframe division unit 302, a moving average filter 304, a high-pass filter 306, a moving average filter 308, and a detection unit 310, which respectively perform down-sampling, search and division of subframes, waveform reshaping to smooth the waveform, high-pass filtering, and a second waveform reshaping to smooth the waveform again, so as to generate the final waveform 301 shown in fig. 3B. In fig. 3B, the horizontal axis represents time and the vertical axis represents energy intensity. The detection unit 310 then applies a predetermined threshold to the waveform 301, finds the energy peaks exceeding the threshold, and determines the number of syllables accordingly to generate the time-domain syllable detection result 201. In this embodiment, since the wake-up command is "OK Google", four syllables are detected.
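A minimal Python sketch of this kind of pipeline is given below for illustration only; the decimation factor, window widths, and threshold are assumptions, not values from the patent.

import numpy as np

def moving_average(x, width):
    """Smooth a 1-D signal with a simple moving-average filter."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def count_syllables_time_domain(signal, decimation=8, threshold=0.2):
    """Count energy peaks as syllables, loosely following fig. 3A.

    `signal` is a 1-D NumPy array; decimation factor, window widths,
    and threshold are illustrative.
    """
    x = np.abs(signal[::decimation])          # down-sample and rectify
    x = moving_average(x, 64)                 # first waveform smoothing
    x = x - moving_average(x, 512)            # crude high-pass: remove slow trend
    x = moving_average(np.maximum(x, 0), 64)  # smooth again
    x = x / (x.max() + 1e-9)                  # normalise so a fixed threshold applies

    # Count rising crossings of the threshold as syllable peaks.
    above = x > threshold
    return int(np.count_nonzero(above[1:] & ~above[:-1]))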
Please refer to fig. 3C and fig. 3D simultaneously. Fig. 3C is a block diagram of the time domain detection circuit 200 according to an embodiment of the invention. Fig. 3D is a diagram illustrating a waveform 321 processed by the time domain detection circuit 200 according to an embodiment of the invention.
In another embodiment, as shown in fig. 3C, the time domain detection circuit 200 may be implemented by a delayed autocorrelation detection circuit, and may include a down-sampling unit 320, an autocorrelation calculation unit 322, an accumulator 324, and a detection unit 326, which respectively perform down-sampling, autocorrelation, and accumulation operations to generate the final waveform 321 shown in fig. 3D. In fig. 3D, the horizontal axis represents time and the vertical axis represents energy intensity. The detection unit 326 then counts the energy peaks of the waveform 321 to determine the number of syllables and generate the time-domain syllable detection result 201. In this embodiment, since the wake-up command is "OK Google", four syllables are detected.
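Again for illustration only, a rough sketch of the delayed-autocorrelation variant is shown below; the frame length, lag range, and peak threshold are assumed values.

import numpy as np

def count_syllables_autocorr(signal, frame_len=512, lags=range(16, 128)):
    """Delayed-autocorrelation syllable count, loosely following fig. 3C.

    Frame length and lag range are illustrative assumptions.
    """
    n_frames = len(signal) // frame_len
    envelope = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        frame = frame - frame.mean()
        # Accumulate autocorrelation over a band of delays; voiced (syllabic)
        # frames are quasi-periodic and give a large accumulated value.
        envelope[i] = sum(np.dot(frame[:-lag], frame[lag:]) for lag in lags)

    envelope = np.maximum(envelope, 0)
    envelope /= envelope.max() + 1e-9
    above = envelope > 0.2
    return int(np.count_nonzero(above[1:] & ~above[:-1]))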
The frequency domain detection circuit 202 is configured to receive the audio input signal 101, detect at least one frequency domain energy peak in a frequency domain for the voice activity 103, and generate a frequency domain syllable and audio feature detection result 203 according to the frequency domain energy peak. In various embodiments, the frequency domain detection performed by the frequency domain detection circuit 202 can be, for example, but not limited to, a filter bank (filterbank) filter detection or a Fast Fourier Transform (FFT) filter detection.
Please refer to fig. 4A and fig. 4B simultaneously. Fig. 4A is a block diagram of a frequency domain detection circuit 202 according to an embodiment of the invention. Fig. 4B is a schematic diagram of the band energy distribution 401 processed by the frequency domain detection circuit 202 according to an embodiment of the invention.
As shown in fig. 4A, in an embodiment, the frequency domain detection circuit 202 may be implemented by a filter bank circuit, and may include a down-sampling unit 400, a plurality of filters 402 corresponding to different frequency bands and covering a range from about 50 Hz to about 1 kHz, a subframe division unit 404 corresponding to each filter 402, and a first maximum value acquisition unit 406 and a second maximum value acquisition unit 408 corresponding to each subframe division unit 404. These units respectively perform down-sampling, band filtering, search and division of subframes, and acquisition of the maximum energy in each frequency band, so as to generate the frequency band energy distribution 401 shown in fig. 4B. In fig. 4B, the horizontal axis corresponds to the numbers of the filters 402 and the vertical axis corresponds to the maximum energy intensity.
Further, the second maximum value acquisition unit 408 takes the maximum values obtained by the first maximum value acquisition unit 406 to determine the energy peaks in the frequency domain, so as to determine the number of syllables.
In one embodiment, the vowels in speech exhibit specific harmonics while the consonants do not. Therefore, according to the harmonic features of part of the frequency bands, the second maximum value acquisition unit 408 can also detect the presence of vowels and consonants, thereby generating the frequency-domain syllable and audio feature detection result 203.
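For illustration only (assuming NumPy and SciPy are available), the sketch below computes a per-band energy profile in the spirit of fig. 4A; the number of bands, filter order, and frame length are assumptions.

import numpy as np
from scipy.signal import butter, sosfilt

def band_energy_profile(signal, sample_rate=16000, n_bands=16,
                        f_lo=50.0, f_hi=1000.0, frame_len=256):
    """Per-band maximum energy, loosely following the filter bank of fig. 4A.

    Band count, band edges, filter order and frame length are illustrative.
    """
    edges = np.linspace(f_lo, f_hi, n_bands + 1)
    profile = np.empty(n_bands)
    for b in range(n_bands):
        sos = butter(4, [edges[b], edges[b + 1]], btype="bandpass",
                     fs=sample_rate, output="sos")
        y = sosfilt(sos, signal)
        n_frames = len(y) // frame_len
        frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
        # First maximum: energy within each subframe; second maximum:
        # strongest subframe across the whole band.
        profile[b] = (frames ** 2).sum(axis=1).max()
    return profile  # peaks of this profile are compared against the preset sample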
Please refer to fig. 4C. Fig. 4C is a block diagram of the frequency domain detection circuit 202 according to an embodiment of the invention.
As shown in fig. 4C, in an embodiment, the frequency domain detection circuit 202 may be implemented by a fast Fourier transform (FFT) filter circuit, and may include a down-sampling unit 420 and an FFT operation unit 422, which respectively perform down-sampling and the FFT to generate a spectrum analysis graph; energy peaks in different frequency bands are then found to determine the number of syllables.
Further, since the vowels in speech exhibit specific harmonics while the consonants do not, the presence of vowels and consonants can be detected from the operation result of the FFT operation unit 422 according to the harmonic features of part of the frequency bands, and the frequency-domain syllable and audio feature detection result 203 can be generated accordingly.
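The following Python sketch, provided only as an illustration, applies this idea to a single frame: it estimates a fundamental frequency from the FFT spectrum and labels the frame as vowel-like when most of the energy sits near harmonics of that fundamental. The pitch range, harmonic count, and ratio threshold are assumptions.

import numpy as np

def classify_frame_fft(frame, sample_rate=16000, harmonic_ratio=0.6):
    """Label a frame as vowel-like or consonant-like from its FFT spectrum.

    The harmonicity measure and its threshold are illustrative assumptions.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sample_rate)

    # Fundamental candidate: strongest bin in a typical pitch range (80-400 Hz).
    pitch_band = (freqs >= 80) & (freqs <= 400)
    f0 = freqs[pitch_band][np.argmax(spectrum[pitch_band])]

    # Energy near integer multiples of f0 versus total energy: vowels show
    # strong harmonics, noise-like consonants do not.
    harmonic_energy = 0.0
    for k in range(1, 6):
        near = np.abs(freqs - k * f0) < 20.0
        harmonic_energy += (spectrum[near].max() if near.any() else 0.0) ** 2
    total_energy = (spectrum ** 2).sum() + 1e-9
    return "vowel" if harmonic_energy / total_energy > harmonic_ratio else "consonant"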
The decision circuit 204 compares the time-domain syllable detection result 201 and the frequency-domain syllable and audio feature detection result 203 with the predetermined speech sample 111.
Please refer to fig. 5. FIG. 5 is a more detailed block diagram of the decision circuit 204 according to an embodiment of the present invention.
In the present embodiment, the decision circuit 204 includes a comparator 500, a comparator 502, a weighting unit 504, a weighting unit 506, a summation operation unit 508, and a determination unit 510.
The comparator 500 is configured to compare the time-domain syllable detection result 201 with the predetermined speech sample 111 to generate a time-domain comparison result 501. In one embodiment, the time-domain comparison result 501 may be generated, for example but not limited to, as a score, and is weighted by the weighting unit 504 according to the weight W1.
The comparator 502 is configured to compare the frequency-domain syllable and audio feature detection result 203 with the predetermined speech sample 111 to generate a frequency-domain comparison result 503. In one embodiment, the frequency-domain comparison result 503 may be generated, for example but not limited to, as a score, and is weighted by the weighting unit 506 according to the weight W2.
The summation operation unit 508 then sums the weighted results of the weighting units 504 and 506 to generate a weighted sum 505. The determination unit 510 determines whether the weighted sum 505 falls within a predetermined range corresponding to the predetermined voice sample 111 (for example, within plus or minus 20% of the predetermined voice sample); when it does, the determination unit 510 determines that the syllable and audio feature detection results of both the time domain and the frequency domain match the predetermined voice sample 111 and generates the wake-up signal 113.
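For illustration only, the weighted decision described above can be sketched as follows; the weights, the target score, and the plus or minus 20% tolerance mirror the description, but the exact scaling of the scores is an assumption.

def decide_wakeup(time_score, freq_score, target, w1=0.5, w2=0.5, tolerance=0.2):
    """Weighted-sum decision, loosely following fig. 5.

    `time_score` and `freq_score` are the two comparison scores; `target`
    is the score expected for the preset sample. Values are illustrative.
    """
    weighted_sum = w1 * time_score + w2 * freq_score
    lower, upper = target * (1 - tolerance), target * (1 + tolerance)
    return lower <= weighted_sum <= upper   # True -> assert the wake-up signal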
Therefore, the voice wake-up device 110 of the present invention can quickly identify the number of syllables and the vowels and consonants in the voice activity through time domain and frequency domain detection, compare them with the preset voice sample 111 to determine whether the voice activity matches the wake-up command, and wake up the processing circuit 100 of the electronic device 1 only when it does. The processing circuit 100 therefore does not need to be woken up for recognition every time voice activity occurs, which greatly reduces the power consumption of the electronic device 1.
Fig. 6 is a flowchart of a voice wake-up method 600 according to an embodiment of the invention. The voice wake-up method 600 can be applied to the voice wake-up apparatus 110 of fig. 1A.
The voice wake-up method 600 comprises the following steps (it should be understood that, unless their order is specifically stated, the steps mentioned in the present embodiment can be performed simultaneously or partially simultaneously, or reordered, according to actual requirements).
In step 601, the voice input signal 101 is received by the voice activity detection circuit 112 and the voice activity 103 in the voice input signal 101 is detected.
In step 602, the voice input signal 101 is received by the smart detection circuit 116 for performing time domain detection and frequency domain detection on the voice activity 103 to generate syllable and audio feature detection results.
In step 603, the syllable and audio feature detection result is compared with the predetermined speech sample 111 stored in the storage circuit 114 through the smart detection circuit 116.
In step 604, the intelligent detection circuit 116 determines whether the syllable and audio feature detection result matches the predetermined voice sample 111.
When the syllable and audio feature detection result does not match the predetermined speech sample 111, the smart detection circuit 116 does not generate the wake-up signal 113 in step 605.
When the syllable and audio feature detection result matches the predetermined voice sample 111, in step 606, the intelligent detection circuit 116 generates a wake-up signal 113 to the processing circuit 100 of the electronic device 1, so as to wake up the processing circuit 100.
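For illustration only, the overall flow of steps 601 to 606 can be summarized in a few lines of Python; the helper callables are hypothetical stand-ins for the circuits described above.

def voice_wakeup(signal, preset_sample, vad, smart_detect, compare, wake):
    """End-to-end sketch of steps 601-606, with hypothetical helper callables.

    `vad`, `smart_detect`, `compare` and `wake` stand in for the circuits
    described above; only the control flow is illustrated here.
    """
    activity = vad(signal)                      # step 601: find voice activity
    if activity is None:
        return False
    detection = smart_detect(signal, activity)  # step 602: time + frequency detection
    if compare(detection, preset_sample):       # steps 603-604: match preset sample
        wake()                                  # step 606: wake the processing circuit
        return True
    return False                                # step 605: no wake-up signal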
Although the foregoing embodiments have been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims (10)

1. A voice wake-up device applied in an electronic device comprises:
a voice activity detection circuit configured to receive an audio input signal and detect a voice activity in the audio input signal;
a storage circuit configured to store a predetermined speech sample; and
an intelligent detection circuit configured to receive the voice input signal, perform a time domain detection and a frequency domain detection on the voice activity to generate a syllable and audio feature detection result, and further compare the syllable and audio feature detection result with the predetermined voice sample to generate a wake-up signal to a processing circuit of the electronic device when the syllable and audio feature detection result matches the predetermined voice sample, thereby waking up the processing circuit.
2. The voice wake-up device of claim 1, wherein the smart detection circuit further comprises:
a time domain detection circuit configured to receive the voice input signal, detect at least one time domain energy peak in a time domain for the voice activity, and generate a time domain syllable detection result according to the at least one time domain energy peak;
a frequency domain detection circuit configured to receive the voice input signal, detect at least one frequency domain energy peak and a harmonic feature in a frequency domain for the voice activity, and generate a frequency domain syllable and audio feature detection result according to the at least one frequency domain energy peak and the harmonic feature; and
a decision circuit for comparing the time-domain syllable detection result and the frequency-domain syllable and audio feature detection result with the predetermined voice sample, respectively, to generate the wake-up signal when the time-domain syllable detection result and the frequency-domain syllable and audio feature detection result are consistent with the predetermined voice sample.
3. The voice wake-up apparatus of claim 2 wherein the decision circuit weights a time domain comparison of the time domain syllable detection result with the predetermined voice sample and a frequency domain comparison of the frequency domain syllable and audio feature detection result with the predetermined voice sample to generate a weighted sum, and generates the wake-up signal when the weighted sum matches a predetermined range corresponding to the predetermined voice sample.
4. The voice wake-up device according to claim 1, wherein the time domain detection is an energy calculation detection, a zero crossing detection, a syllable detection, or a delayed autocorrelation detection.
5. The voice wake-up device of claim 1 wherein the frequency domain detection is a filter bank filter detection or a fast fourier transform filter detection.
6. The voice wake-up device of claim 1 wherein the predetermined voice sample is a user defined sample or an off-line learning sample.
7. The voice wake-up device of claim 1, wherein the smart detection circuit is driven as a result of receiving the voice activity from the voice activity detection circuit.
8. The voice wake-up device of claim 1, wherein the smart detection circuit is driven simultaneously with the voice activity detection circuit upon receiving the voice input signal.
9. A voice wake-up method applied to a voice wake-up device of an electronic device includes:
receiving a voice input signal through a voice activity detection circuit and detecting a voice activity in the voice input signal;
receiving the voice input signal through an intelligent detection circuit, and performing time domain detection and frequency domain detection on the voice activity to generate a syllable and audio characteristic detection result;
comparing the syllable and audio frequency characteristic detection result with a preset voice sample stored by a storage circuit through the intelligent detection circuit; and
when the syllable and audio frequency feature detection result is in accordance with the preset voice sample, the intelligent detection circuit generates a wake-up signal to a processing circuit of the electronic device, so as to wake up the processing circuit.
10. The voice wake-up method of claim 9 further comprising:
receiving the voice input signal through a time domain detection circuit of the intelligent detection circuit to detect at least one time domain energy peak in a time domain for the voice activity, and generating a time domain syllable detection result according to the at least one time domain energy peak;
receiving the voice input signal through a frequency domain detection circuit of the intelligent detection circuit to detect at least one frequency domain energy wave peak and a harmonic feature on a frequency domain for the voice activity, and generating a frequency domain syllable and audio feature detection result according to the at least one frequency domain energy wave peak and the harmonic feature; and
comparing the time-domain syllable detection result and the frequency-domain syllable and audio frequency characteristic detection result with the preset voice sample through a decision circuit of the intelligent detection circuit, so as to generate the wake-up signal when the time-domain syllable detection result and the frequency-domain syllable and audio frequency characteristic detection result are consistent with the preset voice sample.
CN201910031378.0A 2019-01-14 2019-01-14 Voice wake-up device and method Active CN111435593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910031378.0A CN111435593B (en) 2019-01-14 2019-01-14 Voice wake-up device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910031378.0A CN111435593B (en) 2019-01-14 2019-01-14 Voice wake-up device and method

Publications (2)

Publication Number Publication Date
CN111435593A (en) 2020-07-21
CN111435593B CN111435593B (en) 2023-08-01

Family

ID=71579911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910031378.0A Active CN111435593B (en) 2019-01-14 2019-01-14 Voice wake-up device and method

Country Status (1)

Country Link
CN (1) CN111435593B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013188007A1 (en) * 2012-06-15 2013-12-19 Spansion Llc Power-efficient voice activation
US20130339028A1 (en) * 2012-06-15 2013-12-19 Spansion Llc Power-Efficient Voice Activation
CN105869655A (en) * 2015-02-06 2016-08-17 美商富迪科技股份有限公司 Audio device and method for voice detection
US20170162205A1 (en) * 2015-12-07 2017-06-08 Semiconductor Components Industries, Llc Method and apparatus for a low power voice trigger device
CN105741838A (en) * 2016-01-20 2016-07-06 百度在线网络技术(北京)有限公司 Voice wakeup method and voice wakeup device
CN107919133A (en) * 2016-10-09 2018-04-17 赛谛听股份有限公司 For the speech-enhancement system and sound enhancement method of destination object
CN106782554A (en) * 2016-12-19 2017-05-31 百度在线网络技术(北京)有限公司 Voice awakening method and device based on artificial intelligence
CN109119082A (en) * 2018-10-22 2019-01-01 深圳锐越微技术有限公司 Voice wake-up circuit and electronic equipment

Also Published As

Publication number Publication date
CN111435593B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN107767863B (en) Voice awakening method and system and intelligent terminal
US9775113B2 (en) Voice wakeup detecting device with digital microphone and associated method
US20220215853A1 (en) Audio signal processing method, model training method, and related apparatus
CN110021307B (en) Audio verification method and device, storage medium and electronic equipment
KR102288928B1 (en) Voice activity detection using vocal tract area information
CN110232933B (en) Audio detection method and device, storage medium and electronic equipment
US20160135047A1 (en) User terminal and method for unlocking same
JP2016128935A (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
US8046215B2 (en) Method and apparatus to detect voice activity by adding a random signal
WO2015047517A1 (en) Keyword detection
CN104021789A (en) Self-adaption endpoint detection method using short-time time-frequency value
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
EP3989217A1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN110223687B (en) Instruction execution method and device, storage medium and electronic equipment
CN108682432B (en) Speech emotion recognition device
TWI684912B (en) Voice wake-up apparatus and method thereof
CN108847218B (en) Self-adaptive threshold setting voice endpoint detection method, equipment and readable storage medium
CN110689887A (en) Audio verification method and device, storage medium and electronic equipment
GB2576960A (en) Speaker recognition
WO2003065352A1 (en) Method and apparatus for speech detection using time-frequency variance
US10236000B2 (en) Circuit and method for speech recognition
CN113053377A (en) Voice wake-up method and device, computer readable storage medium and electronic equipment
CN111435593B (en) Voice wake-up device and method
Loweimi et al. On the usefulness of the speech phase spectrum for pitch extraction
JP7152112B2 (en) Signal processing device, signal processing method and signal processing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant