CN110473544A

CN110473544A - Low-power voice wake-up method and device

Info

Publication number: CN110473544A
Application number: CN201910953391.1A
Authority: CN
Inventors: 姚嘉; 高永泽; 任金平; 马琪
Original assignee: HANGZHOU NANOSIC TECHNOLOGY Co Ltd
Current assignee: HANGZHOU NANOSIC TECHNOLOGY Co Ltd
Priority date: 2019-10-09
Filing date: 2019-10-09
Publication date: 2019-11-19

Abstract

The invention discloses a low-power voice wake-up method and device. An acquisition module, a judgment module and an output module are provided. The acquisition module is electrically connected to the judgment module, and the judgment module to the output module. The acquisition module includes two high-sensitivity microphones. The voice wake-up method comprises the following steps: while the smart host is in a non-working state, ambient sound is monitored in real time; when a user speaks, the acquisition module captures the user's speech and inputs it to the judgment module; the judgment module compares the captured speech with a preset wake-up instruction to determine whether it contains the wake-up instruction; if it does, a wake-up signal is sent to the output module, which sends the wake-up signal to the smart host through a hardware interface to wake the host. The hardware configuration of the invention is simple, and the smart host can enter deep sleep when not working, reducing power consumption and saving energy.

Description

Low-power voice wake-up method and device

Technical field

The present invention relates to the field of voice technology, and in particular to a low-power voice wake-up method and device.

Background art

With continuing technological progress and development, energy conservation has gradually attracted attention. To reduce energy consumption, many devices provide a low-power standby or sleep mode, which the device enters when the user is not using it. Unlike the normal working mode, in this mode the device maintains only low power consumption; when the user needs the device, it is woken from this mode and returns to the normal working mode.

Currently, with the rise of voice wake-up technology, more and more devices are equipped with a voice wake-up device or voice wake-up module. To wake the device, the user only needs to speak a corresponding phrase; the voice wake-up device or module receives this speech and performs the wake-up operation accordingly. In practice, however, to receive the user's speech promptly, the voice wake-up device must stay in a working state at all times, which consumes considerable power. As a result, devices that rely on a voice wake-up device have high actual power consumption.

Chinese patent 201510549435.6 provides a voice wake-up method and device. The method comprises: periodically sampling an audio signal, the sampled signal being obtained at time ti; calculating the audio energy of the sampled signal; when the audio energy is greater than or equal to the first threshold at time ti, waking a DSP to perform voice activity detection (VAD); when VAD fails, if detection has failed n consecutive times before time ti and the difference between a first noise energy and the first threshold at time ti is greater than a preset first value, generating a second threshold from the first noise energy and using it as the first threshold at time ti+1, where the first noise energy is obtained by decimating the sampled signal at a first decimation rate 1/x and applying slow tracking filtering to the decimated samples. That embodiment reduces the number of VAD runs and lowers terminal power consumption in noisy environments.

Chinese patent 201910118663.6 discloses a solution for waking a system in deep sleep directly by voice. In battery-powered systems or applications with low-power requirements, when the master control system is not working it must be kept in a low-power standby state, reducing the system's standby current as far as possible. For a battery-powered system this greatly extends standby time; for other low-power systems it minimizes the impact of standby on other system functions or performance. When voice wake-up is required, the system is woken directly by the voice signal and then enters the normal working state.

The prior art described above is either overly complex and costly to manufacture, or too simple and hard to implement in practice. Providing a low-power voice wake-up method and device that is easy to implement and low in cost is therefore the technical problem to be solved.

Summary of the invention

To solve the problem that existing voice wake-up functions consume considerable power, the present invention provides a low-power voice wake-up method and device, whose main purpose is to reduce the actual power consumption of a smart device in its non-working mode.

To solve the above technical problem, the present invention provides a low-power voice wake-up method, specifically:

An acquisition module, a judgment module and an output module are provided; the acquisition module is electrically connected to the judgment module, and the judgment module to the output module; the acquisition module includes two high-sensitivity microphones;

The voice wake-up method comprises the following steps:

1) While the smart host is in a non-working state, ambient sound is monitored in real time; when a user speaks, the acquisition module captures the user's speech and inputs it to the judgment module;

2) The judgment module compares the captured speech with the preset wake-up instruction and determines whether it contains the wake-up instruction;

3) If it does, a wake-up signal is sent to the output module, and the output module sends the wake-up signal to the smart host through a hardware interface to wake the host.
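The three steps can be read as a simple state machine. The Python sketch below is illustrative only: `HostState`, `run_wake_loop` and the callback are hypothetical names, and plain strings stand in for captured speech, whereas the actual judgment module compares audio against the stored wake-up instruction.

```python
from enum import Enum
from typing import Callable, Iterable, Sequence


class HostState(Enum):
    SLEEPING = "sleeping"  # smart host in its non-working (deep-sleep) state
    AWAKE = "awake"        # smart host after receiving the wake-up signal


def run_wake_loop(
    utterances: Iterable[str],
    wake_instructions: Sequence[str],
    send_wake_signal: Callable[[], None],
) -> HostState:
    """Monitor -> compare -> signal, mirroring steps 1) to 3).

    Utterances are plain strings standing in for captured speech; a real
    judgment module would compare acoustic features, not text.
    """
    state = HostState.SLEEPING
    for utterance in utterances:          # step 1: acquisition module output
        if state is HostState.AWAKE:
            break                         # host is awake, stop wake-word detection
        # step 2: judgment module checks for any preset wake-up instruction
        if any(cmd in utterance for cmd in wake_instructions):
            send_wake_signal()            # step 3: output module -> hardware interface
            state = HostState.AWAKE
    return state
```

Because the host stays asleep until the callback fires, only the small wake-up device has to run continuously, which is the power saving the method relies on.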

Further, the acquisition module and the judgment module use a low-power dedicated digital signal processing chip.

Further, the low-power dedicated digital signal processing chip is a dedicated human-computer interaction/audio processing chip integrating a high-performance, low-power Audio Codec IP core and a power management module. The Codec IP is a low-power, flexible and highly integrated stereo audio codec IP. It supports a stereo ADC with microphone or line input, and a stereo DAC with headphone playback.

Further, the power management module adopts low-power design techniques based on multiple power domains and multiple clock domains, ensuring the low power consumption of the SoC chip.

Further, the preset wake-up instruction is either a default wake-up instruction or a user-defined wake-up instruction.

Further, the user-defined wake-up instruction is set as follows:

1) A control APP is installed on the smart host;

2) While the smart host is in the working state, the voice wake-up device is connected to the smart host, and the host automatically launches the control APP;

3) Following the APP interface prompts, the user inputs the wake-up instruction and saves it.

Further, after the smart host is woken, the voice wake-up device receives software commands from the smart host, continuously captures sound, applies speech enhancement such as de-noising, and sends the result to the smart host for further recognition.

The low-power voice wake-up device of the present invention is a device for implementing any of the above low-power voice wake-up methods.

Further, the voice wake-up device communicates with the smart host through the smart host's hardware interface; supported hardware interfaces include USB, Type-C and Lightning.

Further, the voice wake-up device is powered through the smart host's hardware interface or by an internal battery. It does not depend on any function or capability of a networked smart host and works offline; the captured speech does not need to be uploaded to the smart host, which protects user privacy.

De-noising uses a dual-microphone noise reduction algorithm, with the following steps:

1) Two microphones, placed at the front and rear, capture speech. The front microphone is the main microphone, mainly responsible for speech capture and for detecting pop (plosive) noise; the rear microphone is the auxiliary microphone, mainly responsible for pop-noise compensation and for capturing ambient noise;

2) When speech is input, the front and rear microphones pick up sound simultaneously, yielding time-domain speech data T1 and T2 respectively;

3) Windowing and a Fourier transform are applied to the time-domain speech data of the front and rear microphones respectively, yielding frequency-domain speech data F1 and F2;

4) The auto power spectral density (PSD) and cross power spectral density (CPSD) are calculated from the frequency-domain speech data of the front and rear microphones;

5) A correlation function is computed from the PSD and CPSD to judge the correlation between the frequency-domain speech data of the front microphone and that of the rear microphone;

6) The signal-to-noise ratio (SNR) is estimated from the correlation function: when the correlation between the front and rear microphones is high, the estimated SNR is high; when the correlation is low, the estimated SNR is low. A gain function is then computed from the estimated SNR;

7) The gain function is applied to the frequency-domain speech data of the front microphone, yielding its noise-reduced frequency-domain data; an inverse Fourier transform converts these back into time-domain speech data; finally, the noise-reduced time-domain speech data are output;

8) The front-microphone frequency-domain data F1 from step 3) are analyzed: if the energy in the 20-4000 Hz band is large and uniformly distributed without attenuation, F1 is determined to be pop noise, and the pop-affected data of the front microphone are replaced with the rear-microphone frequency-domain data F2, completing the repair of the front microphone's pop noise.
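Steps 4) to 6) and 8) can be sketched as follows, assuming the per-frame spectra F1 and F2 are already available. The text gives no formulas, so several choices here are assumptions: PSD and CPSD are smoothed recursively across frames with a hypothetical factor `alpha` (without smoothing, single-frame coherence is always 1), the correlation function is taken to be the magnitude-squared coherence, the SNR is mapped as coherence / (1 - coherence), and the gain is the Wiener form SNR / (1 + SNR) with a floor. The pop-noise test reads "large and non-decaying" as high band energy plus an upper half-band that keeps at least `flat_ratio` of the lower half's energy; both thresholds are hypothetical.

```python
import math
from typing import List, Sequence


class CoherenceGain:
    """Per-bin gain from front/rear microphone coherence (steps 4 to 6)."""

    def __init__(self, n_bins: int, alpha: float = 0.8, gain_floor: float = 0.1):
        self.alpha = alpha            # recursive smoothing factor across frames (assumed)
        self.gain_floor = gain_floor  # lower bound on the gain (assumed)
        self.p11 = [0.0] * n_bins     # smoothed PSD of the front mic, |F1|^2
        self.p22 = [0.0] * n_bins     # smoothed PSD of the rear mic, |F2|^2
        self.p12 = [0j] * n_bins      # smoothed CPSD, F1 * conj(F2)

    def update(self, f1: Sequence[complex], f2: Sequence[complex]) -> List[float]:
        a, eps = self.alpha, 1e-12
        gains = []
        for k, (x1, x2) in enumerate(zip(f1, f2)):
            self.p11[k] = a * self.p11[k] + (1 - a) * abs(x1) ** 2
            self.p22[k] = a * self.p22[k] + (1 - a) * abs(x2) ** 2
            self.p12[k] = a * self.p12[k] + (1 - a) * x1 * x2.conjugate()
            # magnitude-squared coherence, clipped to at most 1
            coh = min(1.0, abs(self.p12[k]) ** 2 / (self.p11[k] * self.p22[k] + eps))
            snr = coh / (1.0 - coh + eps)   # high coherence -> high SNR estimate
            gain = snr / (1.0 + snr)        # Wiener-style gain from the SNR
            gains.append(max(self.gain_floor, gain))
        return gains


def repair_pop_noise(f1: Sequence[complex], f2: Sequence[complex], fs: int,
                     n_fft: int, energy_thresh: float,
                     flat_ratio: float = 0.5) -> List[complex]:
    """Step 8: replace F1 with F2 when 20-4000 Hz energy is large and not decaying."""
    lo = math.ceil(20 * n_fft / fs)
    hi = min(len(f1) - 1, math.floor(4000 * n_fft / fs))
    band = [abs(x) ** 2 for x in f1[lo:hi + 1]]
    half = len(band) // 2
    low_e, high_e = sum(band[:half]), sum(band[half:])
    # speech falls off with frequency; a pop keeps the upper half comparably strong
    is_pop = sum(band) > energy_thresh and high_e >= flat_ratio * low_e
    return list(f2) if is_pop else list(f1)
```

With this particular SNR mapping the gain reduces to the coherence itself (before the floor is applied); a different SNR estimator would change the shape of the suppression curve.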

The hardware configuration of the invention is simple, and the smart host can enter deep sleep when not working, reducing power consumption and saving energy.

Description of the drawings

Fig. 1 is a schematic structural diagram of an application embodiment of the present invention;

Fig. 2 is a workflow diagram of the low-power voice wake-up method;

Fig. 3 is a structural block diagram of the low-power voice wake-up device of the present invention.

Detailed description of embodiments

The present invention is described more fully below with reference to the accompanying drawings, which show exemplary embodiments of the invention. The invention may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments described here. Rather, these embodiments are provided so that this disclosure is thorough and complete and fully conveys the scope of the invention to those of ordinary skill in the art.

Spatially relative terms such as "upper", "lower", "left" and "right" may be used herein for ease of description, to describe the relationship of one element or feature to another as shown in the figures. It should be understood that, in addition to the orientation shown in the figures, these terms are intended to encompass different orientations of the device in use or operation. For example, if the device in a figure is turned over, an element described as "below" another element or feature would then be "above" it. Thus, the exemplary term "lower" can encompass both upper and lower orientations. The device may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein interpreted accordingly.

As shown in Fig. 1 and Fig. 2, the low-power voice wake-up method of the present invention comprises the following steps:

1. While the smart host is in a non-working state, the acquisition module monitors ambient sound in real time, recording continuously and capturing the ambient audio signal. When a user speaks, the acquisition module captures the user's speech and inputs it to the judgment module.

The ambient audio signal is captured using the dual-microphone noise reduction method, so that speech is obtained more accurately.

In this step, de-noising also uses the dual-microphone noise reduction algorithm, including the following step:

5) A correlation function is computed from the PSD and CPSD to judge the correlation between the frequency-domain speech data of the front microphone and that of the rear microphone;

In a specific implementation, 20000 audio files are prepared to train the neural network of the acquisition module. 10000 human-voice audio files and 10000 non-voice audio files are extracted as the basic training files; all 20000 files are 2-10 seconds long.

The 10000 human-voice files include speech with relatively stable audio as well as speech with large audio variation, such as emotional fluctuation. The 10000 non-voice files are sounds common in daily life, such as outdoor ambient sound, noisy urban ambient sound, supermarket ambient sound, heavy-rain ambient sound, birdsong, congested-traffic ambient sound and construction-site ambient sound. All audio files are sampled at 16000 Hz.

2. The judgment module compares the captured speech with the preset wake-up instruction and determines whether it contains the wake-up instruction.

The wake-up instruction is a human-voice signal; therefore, whether the audio signal is a human-voice signal is judged according to preset rules, which comprises the following steps:

(1) A first audio signal is captured by the two microphones placed at the front and rear.

The front and rear microphones each acquire time-domain speech data at a sample rate of 16000 Hz, i.e. 16000 time-domain samples per second. The data are processed in frames of 128 samples, i.e. 128 time-domain samples are taken at a time for ambient-noise reduction. This stage outputs time-domain speech data T1 and T2, which are combined by delay-and-add and delay-and-subtract to obtain T_ADD and T_SUB, where T_ADD is the enhanced main signal and T_SUB is the reference noise estimate. Spectral analysis or acoustic feature extraction is then performed on the audio signal to obtain a first feature value of the first audio signal. For example, in human speech the components above 800 Hz attenuate by about 6 dB per octave; this feature can serve as one technical parameter for identifying human voice.
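A minimal sketch of the framing and delay-and-add / delay-and-subtract stage, under stated assumptions: the text does not say which channel is delayed or by how much, so this version delays the front signal T1 by a hypothetical integer number of samples `delay` (in practice set by the microphone spacing and the speed of sound). At 16000 Hz a 128-sample frame lasts 8 ms, giving 125 frames per second. The helper `speech_rolloff_db` simply evaluates the roughly 6 dB/octave rule quoted above: 1600 Hz sits about 6 dB and 3200 Hz about 12 dB below the 800 Hz level.

```python
import math
from typing import List, Sequence, Tuple

FS = 16000    # sample rate given in the text
FRAME = 128   # samples per frame given in the text (8 ms at 16 kHz)


def split_frames(x: Sequence[float]) -> List[Sequence[float]]:
    """Cut a signal into consecutive, non-overlapping 128-sample frames."""
    return [x[i:i + FRAME] for i in range(0, len(x) - FRAME + 1, FRAME)]


def delay_add_sub(t1: Sequence[float], t2: Sequence[float],
                  delay: int) -> Tuple[List[float], List[float]]:
    """Delay the front signal T1 by `delay` samples, then add and subtract T2.

    Sound from the front reaches the front mic first, so delaying T1 aligns it
    with T2: the sum (T_ADD) reinforces it, the difference (T_SUB) cancels it
    and leaves a noise reference.
    """
    n = len(t1)
    t1_delayed = [0.0] * delay + list(t1[: n - delay])
    t_add = [a + b for a, b in zip(t1_delayed, t2)]
    t_sub = [a - b for a, b in zip(t1_delayed, t2)]
    return t_add, t_sub


def speech_rolloff_db(freq_hz: float) -> float:
    """Approximate level relative to 800 Hz under the ~6 dB/octave rule."""
    return 0.0 if freq_hz <= 800 else -6.0 * math.log2(freq_hz / 800)
```

For a source directly in front, T2 equals T1 delayed by the inter-microphone travel time, so choosing `delay` to match it drives T_SUB towards zero for speech while leaving off-axis noise in it.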

(2) The first audio signal is divided into frames, yielding at least three single-frame audio signals.

The first audio signal is framed: the speech signal is divided into segments in the time domain, each segment being called a frame, and the signal within each frame can be regarded as approximately stationary. A frame is usually 10 to 30 milliseconds long. Windowing and a Fourier transform are applied to the time-domain speech data in each frame to obtain frequency-domain speech data. The window is a Hanning window: each frame of 128 samples is multiplied by the Hanning window coefficients to reduce spectral leakage in the subsequent time-to-frequency conversion. The Fourier transform converts time-domain data into frequency-domain data; the implementation uses the fast Fourier transform (FFT) to reduce the hardware load.
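The per-frame windowing and FFT described above might look like the following pure-Python sketch. Assumptions: frames are non-overlapping (the text takes 128 samples at a time and says nothing about overlap), a periodic Hann window is used, and the FFT is a textbook recursive radix-2 Cooley-Tukey routine rather than whatever the DSP chip implements. A 128-point FFT at 16 kHz gives bins 125 Hz apart, so a 1000 Hz tone peaks in bin 8.

```python
import cmath
import math
from typing import List, Sequence

FRAME = 128  # samples per frame, as stated in the text


def hann(n: int) -> List[float]:
    """Periodic Hann window, commonly used for frame-by-frame FFT analysis."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def frame_spectra(signal: Sequence[float]) -> List[List[complex]]:
    """Split into non-overlapping 128-sample frames, window each, and FFT it."""
    w = hann(FRAME)
    spectra = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        spectra.append(fft([s * wi for s, wi in zip(frame, w)]))
    return spectra
```

A 128-sample frame is a power of two, which is what makes the radix-2 FFT applicable without zero-padding.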

(3) A second audio signal is acquired through the two microphones placed at the front and rear.

The second audio signal is obtained by correcting the frequency-domain gain of the first audio signal. The gain function is applied to the microphone's frequency-domain speech data, yielding the noise-reduced frequency-domain data of the main-signal microphone. The input of this stage is the main-signal frequency-domain data and the gain function; the output is the noise-reduced frequency-domain data.
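Applying the gain and returning to the time domain (step 7 of the noise reduction algorithm) can be sketched as below. For clarity this uses a naive O(N^2) inverse DFT instead of an inverse FFT, and it assumes a full-length spectrum whose gains are symmetric (g[k] equal to g[N-k]) so that the result stays real. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def apply_gain(spectrum: Sequence[complex], gains: Sequence[float]) -> List[complex]:
    """Scale each frequency bin of the main-microphone spectrum by its gain."""
    return [x * g for x, g in zip(spectrum, gains)]


def idft_real(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT keeping the real part (input must be Hermitian-symmetric)."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For example, the DFT of [1, 2, 3, 4] is [10, -2+2j, -2, -2-2j]; halving every bin and inverting gives back [0.5, 1.0, 1.5, 2.0] up to rounding.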

The preset wake-up instruction may be a default instruction or may be user-defined. If a user-defined wake-up instruction is used, it is set as follows:

(1) A control APP is installed on the smart host;

(2) While the smart host is in the working state, the voice wake-up device is connected to the smart host, and the host automatically launches the control APP;

(3) Following the APP interface prompts, the user inputs the wake-up instruction and saves it.

3. If the speech contains the instruction, a wake-up signal is sent to the output module, and the output module sends the wake-up signal to the smart host through the hardware interface to wake the host.

Whether the spectrum of each single-frame audio signal is consistent with the spectrum of a human-voice signal is detected; if so, the audio signal is a human-voice signal.

Specifically, the spectrum of each single-frame signal is compared with that of a human-voice signal to check whether it falls within the human-voice spectral range; if it does, the audio signal containing that frame is human voice. For example, if the human-voice spectrum spans 6-10 and the detected single-frame spectrum is 9, the corresponding signal is determined to be human voice.
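A rough sketch of the spectral-range check: each frame's dominant frequency (largest DFT magnitude, excluding DC) is compared with a voice band, and the signal counts as voice if a majority of at least three frames fall inside it. The 100-4000 Hz band, the majority rule and the function names are all assumptions for illustration; the text gives only the abstract 6-10 example.

```python
import math
from typing import Sequence, Tuple

VOICE_BAND_HZ = (100.0, 4000.0)  # assumed voice range; the text does not give one


def dominant_freq(frame: Sequence[float], fs: int) -> float:
    """Frequency (Hz) of the largest DFT magnitude, excluding the DC bin."""
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n


def is_voice(frames: Sequence[Sequence[float]], fs: int,
             band: Tuple[float, float] = VOICE_BAND_HZ) -> bool:
    """Voice if most frames (at least three required) peak inside the band."""
    if len(frames) < 3:
        return False
    hits = sum(band[0] <= dominant_freq(f, fs) <= band[1] for f in frames)
    return hits * 2 > len(frames)
```

A single dominant-frequency test is crude; checking the spectral envelope (for instance against the roughly 6 dB/octave roll-off above 800 Hz) would be a natural refinement.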

In the present invention, the acquisition module and the judgment module use a low-power dedicated digital signal processing chip. This chip is a dedicated human-computer interaction/audio processing chip integrating a high-performance, low-power Audio Codec IP core and a power management module. The Codec IP is a low-power, flexible and highly integrated stereo audio codec IP that supports a stereo ADC with microphone or line input and a stereo DAC with headphone playback. The power management module adopts low-power design techniques based on multiple power domains and multiple clock domains, ensuring the low power consumption of the SoC chip.

As shown in Fig. 3, a low-power voice wake-up device for implementing the above method comprises an acquisition module 300, a judgment module 310 and an output module 320; the acquisition module is electrically connected to the judgment module, and the judgment module to the output module; the acquisition module includes two high-sensitivity microphones. The low-power voice wake-up device communicates with the smart host through the smart host's hardware interface, which may be of types such as USB, Type-C or Lightning. Further, after the smart host is woken, the voice wake-up device receives software commands from the smart host, continuously captures sound, applies speech enhancement such as de-noising, and sends the result to the smart host for further recognition.

In the description of this specification, references to "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may change, modify, replace and vary these embodiments within the scope of the invention.

Claims

1. A low-power voice wake-up method, characterized in that the method is specifically as follows:

the voice wake-up method comprises the following steps:

while the smart host is in a non-working state, ambient sound is monitored in real time; when a user speaks, an acquisition module captures the user's speech and inputs it to a judgment module;

the judgment module compares the captured speech with a preset wake-up instruction and determines whether it contains the wake-up instruction;

if it does, a wake-up signal is sent to an output module, and the output module sends the wake-up signal to the smart host through a hardware interface to wake the host.

2. The low-power voice wake-up method according to claim 1, characterized in that the acquisition module and the judgment module use a low-power dedicated digital signal processing chip.

3. The low-power voice wake-up method according to claim 2, characterized in that the low-power dedicated digital signal processing chip is a dedicated human-computer interaction/audio processing chip integrating a high-performance, low-power Audio Codec IP core and a power management module; the Audio Codec IP is a low-power, flexible and highly integrated stereo audio codec IP; the stereo audio codec IP supports a stereo ADC with microphone or line input, and a stereo DAC with headphone playback.

4. The low-power voice wake-up method according to claim 3, characterized in that the power management module adopts low-power design techniques based on multiple power domains and multiple clock domains, ensuring the low power consumption of the SoC chip.

5. The low-power voice wake-up method according to claim 1, characterized in that the preset wake-up instruction is a default wake-up instruction or a user-defined wake-up instruction.

6. The low-power voice wake-up method according to claim 5, characterized in that the user-defined wake-up instruction is set by the following steps:

installing a control APP on the smart host;

while the smart host is in the working state, connecting the voice wake-up device to the smart host, whereupon the host automatically launches the control APP;

following the APP interface prompts, inputting and saving the wake-up instruction.

7. The low-power voice wake-up method according to claim 1, characterized in that, after the smart host is woken, the voice wake-up device receives software commands from the smart host, continuously captures sound, applies speech enhancement such as de-noising, and sends the result to the smart host for further recognition.

8. A low-power voice wake-up device, characterized in that it is a device for implementing the low-power voice wake-up method according to any one of claims 1-7.

9. The low-power voice wake-up device according to claim 8, characterized in that the voice wake-up device communicates with the smart host through the smart host's hardware interface, and the hardware interface supports USB, Type-C and Lightning interfaces.

10. The low-power voice wake-up device according to claim 8, characterized in that the voice wake-up device is powered through the smart host's hardware interface or by an internal battery, does not depend on any function or capability of a networked smart host, and works offline; the captured speech does not need to be uploaded to the smart host, thereby protecting user privacy.