CN109272989A

CN109272989A - Voice wake-up method, device and computer-readable storage medium

Info

Publication number: CN109272989A
Application number: CN201810992991.4A
Authority: CN
Inventors: 徐晴晴; 陈宇; 杨楠; 耿岭
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2019-01-25
Anticipated expiration: 2038-08-29
Also published as: CN109272989B

Abstract

The present disclosure relates to a voice wake-up method, a device and a computer-readable storage medium, and belongs to the field of computer technology. The disclosed method includes: performing beamforming on a voice signal in multiple predetermined directions to obtain multiple beams; inputting the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that it contains a keyword; determining the beam that points toward the sound source, referred to as the sound source beam, according to each beam's keyword probability and signal quality; and determining whether to wake up the system according to the feature matching result of the sound source beams at multiple consecutive moments. The disclosure departs from the existing flow of sound source localization followed by voice wake-up: it decouples the beamforming algorithm from the sound source localization algorithm, so that localization accuracy no longer affects the direction in which beams are formed. This improves the wake-up accuracy of the voice system and the user experience.

Description

Voice wake-up method, device and computer-readable storage medium

Technical field

The present disclosure relates to the field of computer technology, and in particular to a voice wake-up method, a device and a computer-readable storage medium.

Background

With the development of computer technology, the need for humans to exchange information with machines has become increasingly urgent. Voice, as one of the most natural modes of human interaction, has become one of the most important candidates for replacing the mouse and keyboard in communicating with computers. As demand grows for intelligent terminals such as smart homes, smart vehicles and intelligent conference systems, intelligent voice wake-up technology, which serves as the entry point to such terminals, is receiving more and more attention.

Voice communication is subject to interference introduced by the surrounding environment and the transmission medium (for example echo, reverberation and interfering sound sources), which sharply degrades a computer's ability to understand speech. Because noise arrives from all directions, capturing clean speech with a single microphone is extremely difficult. Current voice wake-up systems are therefore mainly based on microphone arrays: the speech captured by multiple microphones is processed jointly in time and space to suppress noise and enhance speech.

The voice wake-up methods known to the inventors generally include the following steps: collecting a voice signal with a microphone array; preprocessing the voice signal; determining the angle and orientation of the sound source by sound source localization and tracking; generating, by beamforming, a beam pointing at that angle and orientation; and passing the resulting beam to a speech recognition system, which decides whether to wake up the system.

Summary of the invention

The inventors found that current sound source localization techniques fall into three classes according to their positioning principle: steered beamforming based on maximum output power, techniques based on time difference of arrival, and localization based on high-resolution spectral estimation. The performance of all three classes drops sharply in environments with strong reverberation and noise, so the angle and orientation of the sound source cannot be located accurately. This directly affects the subsequent speech recognition and therefore the wake-up result.

One technical problem to be solved by the present disclosure is how to improve the accuracy of voice wake-up and thereby improve the user experience.

According to some embodiments of the present disclosure, a voice wake-up method is provided, including: performing beamforming on a voice signal in multiple predetermined directions to obtain multiple beams; inputting the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that it contains a keyword; determining the beam that points toward the sound source, referred to as the sound source beam, according to each beam's keyword probability and signal quality; and determining whether to wake up the system according to the feature matching result of the sound source beams at multiple consecutive moments.

In some embodiments, inputting the beams into the pre-trained keyword recognition model includes: selecting a subset of the beams according to their signal quality and inputting the selected beams into the pre-trained keyword recognition model.

In some embodiments, selecting a subset of the beams according to signal quality includes: determining the signal quality of each beam according to at least one of its energy and its signal-to-noise ratio within a fixed time window; and selecting the beams whose signal quality is higher than a signal quality threshold.

In some embodiments, determining the sound source beam according to each beam's keyword probability and signal quality includes: computing a weighted sum of the beam's keyword probability and signal quality to obtain the importance of the beam; and selecting the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.

In some embodiments, determining whether to wake up the system according to the feature matching result of the sound source beams at multiple consecutive moments includes: matching the sound source directions pointed to by the sound source beams at the consecutive moments, and determining whether the sound source beam at each of these moments contains the keyword; and waking up the system if the sound source directions at the consecutive moments are consistent and the sound source beams at all of these moments contain the keyword.

In some embodiments, performing beamforming on the voice signal in multiple predetermined directions to obtain multiple beams includes: determining, according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of a predetermined direction, the weight of each voice signal received by the microphones relative to that predetermined direction; and computing a weighted sum of the voice signals received by the microphones using these weights to obtain the beam in that predetermined direction.

In some embodiments, the weight of each voice signal received by the microphones relative to the predetermined direction is calculated according to the following formulas:

$$W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)}$$

$$R_m(k) = \alpha_{psn}\, d_{psn,m}(k)\, d_{psn,m}^{H}(k) + (1 - \alpha_{psn})\, I$$

where $W_m(k)$ is the weight vector, relative to the predetermined direction, of the voice signals received by the microphones in the processing of the m-th beam; k is the index of the frequency band of the received signal; $R_m(k)$ is the covariance matrix of the noise in the processing of the m-th beam and $R_m^{-1}(k)$ is its inverse; $d_m(k)$ is the microphone-array steering vector of the predetermined direction in the processing of the m-th beam and $d_m^{H}(k)$ is its conjugate transpose; $\alpha_{psn}$ is the proportion of point-source interference noise from a preset direction in the noise and $1 - \alpha_{psn}$ is the proportion of white noise; $d_{psn,m}(k)$ is the steering vector of the preset-direction point-source interference noise in the processing of the m-th beam and $d_{psn,m}^{H}(k)$ is its conjugate transpose; and $I$ is the identity matrix.

In some embodiments, the method further includes: performing beamforming on voice signals in the multiple predetermined directions to obtain multiple beams; labeling the beams with keyword annotations to form training beams; and training the keyword recognition model on the training beams to obtain the pre-trained keyword recognition model.

In some embodiments, before beamforming is performed on the voice signal in the multiple predetermined directions, the method further includes: performing echo cancellation on the voice signal received by the microphones.

In some embodiments, the keyword recognition model includes a deep learning model or a hidden Markov model.

According to other embodiments of the present disclosure, a voice wake-up device is provided, including: a beamforming module, configured to perform beamforming on a voice signal in multiple predetermined directions to obtain multiple beams; a keyword recognition module, configured to input the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that it contains a keyword; a sound source determination module, configured to determine the beam that points toward the sound source, referred to as the sound source beam, according to each beam's keyword probability and signal quality; and a voice wake-up module, configured to determine whether to wake up the system according to the feature matching result of the sound source beams at multiple consecutive moments.

In some embodiments, the device further includes: a beam selection module, configured to select a subset of the beams according to their signal quality and send the selected beams to the keyword recognition module, so that the keyword recognition module inputs the received beams into the pre-trained keyword recognition model.

In some embodiments, the beam selection module is configured to determine the signal quality of each beam according to at least one of its energy and its signal-to-noise ratio within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.

In some embodiments, the sound source determination module is configured to compute a weighted sum of each beam's keyword probability and signal quality to obtain the importance of the beam, and to select the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.

In some embodiments, the voice wake-up module is configured to match the sound source directions pointed to by the sound source beams at multiple consecutive moments, to determine whether the sound source beam at each of these moments contains the keyword, and to wake up the system if the sound source directions are consistent and the sound source beams at all of these moments contain the keyword.

In some embodiments, the beamforming module is configured to determine, according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of a predetermined direction, the weight of each voice signal received by the microphones relative to that predetermined direction, and to compute a weighted sum of the voice signals received by the microphones using these weights to obtain the beam in that predetermined direction.

In some embodiments, the weights are calculated according to the following formulas:

$$W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)}$$

$$R_m(k) = \alpha_{psn}\, d_{psn,m}(k)\, d_{psn,m}^{H}(k) + (1 - \alpha_{psn})\, I$$

where $W_m(k)$ is the weight vector, relative to the predetermined direction, of the voice signals received by the microphones in the processing of the m-th beam; k is the index of the frequency band of the received signal; $R_m(k)$ is the covariance matrix of the noise in the processing of the m-th beam and $R_m^{-1}(k)$ is its inverse; $d_m(k)$ is the microphone-array steering vector of the predetermined direction in the processing of the m-th beam and $d_m^{H}(k)$ is its conjugate transpose; $\alpha_{psn}$ is the proportion of point-source interference noise from a preset direction in the noise and $1 - \alpha_{psn}$ is the proportion of white noise; $d_{psn,m}(k)$ is the steering vector of the preset-direction point-source interference noise in the processing of the m-th beam and $d_{psn,m}^{H}(k)$ is its conjugate transpose; and $I$ is the identity matrix.

In some embodiments, the device further includes: a model training module, configured to perform beamforming on voice signals in the multiple predetermined directions to obtain multiple beams, label the beams with keyword annotations to form training beams, and train the keyword recognition model on the training beams to obtain the pre-trained keyword recognition model.

In some embodiments, the device further includes: an echo cancellation module, configured to perform echo cancellation on the voice signal received by the microphones.

According to still other embodiments of the present disclosure, a voice wake-up device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the voice wake-up method of any of the foregoing embodiments based on instructions stored in the memory.

According to yet other embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the voice wake-up method of any of the foregoing embodiments.

In the present disclosure, beamforming is performed on the voice signal in multiple directions to obtain multiple beams; the beams are input into a keyword recognition model, which outputs the probability that each beam contains a keyword; the sound source beam is then selected according to each beam's keyword probability and signal quality; and whether to wake up the system is determined from the feature matching result of the sound source beams at multiple moments. The disclosure departs from the existing flow of sound source localization followed by voice wake-up: it decouples the beamforming algorithm from the sound source localization algorithm, so that localization accuracy no longer affects the direction in which beams are formed. This improves the wake-up accuracy of the voice system and the user experience.

Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 shows a flow diagram of the voice wake-up method of some embodiments of the present disclosure.

Fig. 2 shows a flow diagram of the voice wake-up method of other embodiments of the present disclosure.

Fig. 3 shows a structural diagram of the voice wake-up device of some embodiments of the present disclosure.

Fig. 4 shows a structural diagram of the voice wake-up device of other embodiments of the present disclosure.

Fig. 5 shows a structural diagram of the voice wake-up device of still other embodiments of the present disclosure.

Fig. 6 shows a structural diagram of the voice wake-up device of yet other embodiments of the present disclosure.

Detailed description

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application or its use. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

The present disclosure provides a voice wake-up method. Some embodiments of the method are described below with reference to Fig. 1.

Fig. 1 is a flow chart of some embodiments of the voice wake-up method of the present disclosure. As shown in Fig. 1, the method of this embodiment includes steps S102 to S108.

In step S102, beamforming is performed on the voice signal in multiple predetermined directions to obtain multiple beams.

The speech recognition system to be woken up by voice may be provided with multiple microphones, i.e. a microphone array, to receive the user's voice signal. The voice signal may first be preprocessed, for example by applying echo cancellation, such as an AEC (Acoustic Echo Cancellation) algorithm, to the voice signal received by the microphone array.
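The text names AEC only as a preprocessing step and does not prescribe an algorithm. As a minimal sketch, assuming a normalized least-mean-squares (NLMS) adaptive filter (a common choice, not something stated in the source) and a known far-end reference signal, echo cancellation for one microphone channel might look like the following. The function name, the tap count, the step size and the synthetic echo path are all hypothetical.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic` (NLMS)."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # echo-cancelled output sample
        norm = eps + sum(h * h for h in history)   # input power for normalization
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

# Synthetic check: the microphone hears only a filtered copy of the far-end signal.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]   # hypothetical room response
mic = [sum(h * far_end[t - i] for i, h in enumerate(echo_path) if t - i >= 0)
       for t in range(len(far_end))]
residual = nlms_echo_cancel(far_end, mic)
```

In a microphone array, the same filter would be run independently on each channel before beamforming.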

The preprocessed voice signal may be beamformed in the multiple predetermined directions using a phased-array beamforming algorithm. Here, phased array means that M directions are preset (evenly distributed around a circle) and the multi-channel voice signal received by the microphone array is weighted and summed M times, producing M output signals, each enhanced for one specific direction. For example, M directions evenly distributed on a circle may be predetermined as the beamforming directions, so that the formed beams point in these M directions. The beamforming algorithm may be, for example, MVDR (Minimum Variance Distortionless Response), GSC (Generalized Sidelobe Canceller) or TF-GSC (Transfer Function Generalized Sidelobe Canceller). Beamforming in multiple predetermined directions can be implemented with these existing algorithms and is not described further here.

The present disclosure also provides an improved beamforming algorithm, described below.

In some embodiments, the weight of each voice signal received by the microphones relative to a predetermined direction is determined according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of the predetermined direction; the voice signals received by the microphones are then weighted with these weights and summed to obtain the beam in that predetermined direction. Beamforming may be performed according to the following formulas.

$$X^{n}(k, l) = \mathrm{fft}\big(x^{n}(t)\big) \tag{1}$$

In formula (1), $x^{n}(t)$ is the voice signal received by the n-th microphone and fft(·) denotes the fast Fourier transform (FFT) of the voice signal. $X^{n}(k, l)$ is the short-time FFT (SFFT) value of $x^{n}(t)$ in the k-th frequency band of the l-th time segment, where l indexes the segments into which the voice signal is divided by windowing, each of which is processed separately, and k indexes the frequency bands obtained by applying the FFT to each segment.
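Formula (1) can be illustrated with a short sketch of windowing followed by a transform of each segment. The Hann window, the frame length, the hop size and the function name `stft` are assumptions made here for illustration; the source does not specify them. A naive DFT is used instead of an FFT routine so that the example needs only the standard library.

```python
import cmath
import math

def stft(signal, frame_len=64, hop=32):
    """Return frames[l][k]: DFT bin k of the Hann-windowed segment l."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT for clarity; a real system would call an FFT routine.
        spectrum = [sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i in range(frame_len))
                    for k in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    return frames

# A pure tone centred on bin 8 should peak at k = 8 in every segment.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
X = stft(tone)
```

Each microphone channel would be transformed this way, giving one value per microphone n, band k and segment l.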

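$$Y_m(k, l) = \sum_{n=1}^{N} W_m^{n}(k, l)\, X^{n}(k, l) \tag{2}$$

$$y_m(t) = \mathrm{ifft}\big(Y_m(k, l)\big) \tag{3}$$

In formulas (2) and (3), $y_m(t)$ is the output signal of the beam for the m-th preset direction formed by phased-array beamforming, ifft(·) denotes the inverse fast Fourier transform, and $Y_m(k, l)$ is the SFFT value of $y_m(t)$ in the k-th frequency band of the l-th time segment. $W_m^{n}(k, l)$ is the weight applied, in the processing of the m-th beam, to the voice signal received by the n-th microphone in the k-th frequency band of the l-th time segment, and N is the number of microphones.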

It can be seen from the above formulas that, once $W_m^{n}(k, l)$ is determined, the signal of beam m in the predetermined direction is determined.

$$W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)} \tag{4}$$

In formula (4), $W_m(k)$ is the weight vector, relative to the predetermined direction, of the voice signals received by the microphones in the processing of the m-th beam; $W_m(k) = [W_m^{1}(k), W_m^{2}(k), \ldots, W_m^{N}(k)]^{T}$ is an N-dimensional vector. The weight vector may be regarded as the same for every time segment, so once $W_m(k)$ is known, $W_m^{n}(k, l)$ is obtained. $W_m^{n}(k)$ is the weight applied, in the processing of the m-th beam, to the voice signal received by the n-th microphone in the k-th frequency band. $R_m(k)$ is the covariance matrix of the noise in the processing of the m-th beam and $R_m^{-1}(k)$ is its inverse. $d_m(k)$ is the microphone-array steering vector of the direction to be enhanced (i.e. the preset direction) in the processing of the m-th beam; it is an N-dimensional column vector, and $d_m^{H}(k)$ is its conjugate transpose. $d_m(k)$ is determined by setting the predetermined direction.

$$R_m(k) = \alpha_{psn}\, d_{psn,m}(k)\, d_{psn,m}^{H}(k) + (1 - \alpha_{psn})\, I \tag{5}$$

$R_m(k)$ is obtained from formula (5), in which $I$ is the identity matrix. $\alpha_{psn}$ is the proportion of fixed-direction point-source interference noise in the noise and $1 - \alpha_{psn}$ is the proportion of white noise; $\alpha_{psn}$ can be obtained from testing or experience. $d_{psn,m}(k)$ is the steering vector of the fixed-direction point-source interference noise in the processing of the m-th beam and $d_{psn,m}^{H}(k)$ is its conjugate transpose; $d_{psn,m}(k)$ can be obtained from testing or experience.

The beam signal in each predetermined direction can be calculated with the above formulas. The beamforming processes for the multiple directions can be executed in parallel to obtain the multiple beams.
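The weight computation described above can be sketched for a single frequency band as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes the noise covariance is a rank-one point-source term plus a scaled identity (point-source share `alpha_psn`, white-noise share `1 - alpha_psn`), that the weights take the usual MVDR form, and that they are applied as a conjugated inner product, whereas the text only says "weighted sum". The function names and the example steering vectors are hypothetical.

```python
import cmath
import math

def beam_directions(num_beams):
    """M look directions evenly spaced around a circle, in radians."""
    return [2 * math.pi * m / num_beams for m in range(num_beams)]

def mvdr_weights(d_look, d_psn, alpha_psn):
    """Weights R^-1 d / (d^H R^-1 d) with R = alpha * d_psn d_psn^H + (1 - alpha) * I."""
    white = 1.0 - alpha_psn                       # white-noise share, must be > 0
    psn_norm = sum(abs(p) ** 2 for p in d_psn)
    proj = sum(p.conjugate() * q for p, q in zip(d_psn, d_look))   # d_psn^H d_look
    # Sherman-Morrison gives R^-1 d_look without an explicit matrix inverse.
    scale = alpha_psn / (white + alpha_psn * psn_norm)
    r_inv_d = [(q - scale * proj * p) / white for p, q in zip(d_psn, d_look)]
    denom = sum(q.conjugate() * r for q, r in zip(d_look, r_inv_d))  # d^H R^-1 d
    return [r / denom for r in r_inv_d]

def apply_weights(weights, mic_bins):
    """Beam output for one frequency band, using the w^H x convention."""
    return sum(w.conjugate() * x for w, x in zip(weights, mic_bins))

# Four microphones: look direction at broadside, interferer with a phase ramp.
d_look = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
d_psn = [cmath.exp(0.5j * n) for n in range(4)]
w = mvdr_weights(d_look, d_psn, alpha_psn=0.9)
```

With these weights a signal arriving from the look direction passes with unit gain, while a signal arriving from the assumed interferer direction is attenuated. Repeating the computation for each of the M directions returned by `beam_directions` and for each band k yields the full set of beams.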

In step S104, the beams are input into the pre-trained keyword recognition model to obtain, for each beam, the probability that it contains a keyword.

The voice system decides whether to record and recognize the subsequent speech by recognizing the keyword in the speech; that is, whether the voice system is subsequently woken up is determined by detecting the keyword in the speech. The keyword recognition model is, for example, a deep learning model or a hidden Markov model. Deep learning models include, for example, DNN (Deep Neural Network), RNN (Recurrent Neural Network) and CRNN (Convolutional Recurrent Neural Network). These are existing models and are not described further here. To train the keyword recognition model, multiple beams can be generated as in the embodiment of step S102 and each beam labeled according to whether it contains the keyword, forming training beams; the training beams are then input into the keyword recognition model for offline training to obtain the pre-trained keyword recognition model. Inputting a beam into the pre-trained keyword recognition model then yields the probability that the beam contains the keyword.

In step S106, the beam that points toward the sound source is determined according to each beam's keyword probability and signal quality, and is taken as the sound source beam.

In some embodiments, the signal quality of a beam is determined according to at least one of its energy and its signal-to-noise ratio within a fixed time window. The higher the energy and the signal-to-noise ratio of the beam within the fixed time window, the better the signal quality. For example, the energy and the signal-to-noise ratio of the beam within the fixed time window can be calculated, and the signal quality of the beam determined as a weighted sum of the two parameters. The weights of the energy and the signal-to-noise ratio can be set according to actual needs. The energy and the signal-to-noise ratio may first be normalized and then weighted.

In some embodiments, a weighted sum of each beam's keyword probability and signal quality is computed to obtain the importance of the beam; the beam with the highest importance is selected as the sound source beam, and the direction in which the sound source beam points is determined as the sound source direction. A beam in the sound source direction has better signal quality and a higher probability of being recognized as containing the keyword, so the sound source beam can be selected according to each beam's keyword probability and signal quality. For example, the energy power′_k and the signal-to-noise ratio SNR′_k of each of the K beams within the fixed time window are calculated and normalized to obtain power_k and SNR_k. The keyword recognition model outputs a keyword recognition probability NNScore_k for the k-th beam, and the importance of the k-th beam is then calculated as a weighted sum of NNScore_k and the signal quality derived from power_k and SNR_k.
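The selection of the sound source beam can be sketched as below. The normalization by the maximum value and the three weights are assumptions chosen for illustration; the text states only that the quantities are normalized and combined in a weighted sum, with weights set according to actual needs. The function names and the sample values are hypothetical.

```python
def normalize(values):
    """Scale non-negative values to [0, 1] by their maximum (one possible normalization)."""
    peak = max(values)
    return [v / peak if peak > 0 else 0.0 for v in values]

def pick_source_beam(nn_scores, powers, snrs, w_score=0.5, w_power=0.25, w_snr=0.25):
    """Return the index of the most important beam and all importance values."""
    power_n, snr_n = normalize(powers), normalize(snrs)
    importance = [w_score * s + w_power * p + w_snr * r
                  for s, p, r in zip(nn_scores, power_n, snr_n)]
    best = max(range(len(importance)), key=importance.__getitem__)
    return best, importance

# Three beams: the second has the highest keyword probability, energy and SNR.
best, importance = pick_source_beam(
    nn_scores=[0.10, 0.85, 0.40],
    powers=[2.0, 4.0, 3.0],
    snrs=[5.0, 12.0, 9.0],
)
```

The index `best` identifies the sound source beam, and the predetermined direction of that beam is taken as the sound source direction.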

In step S108, whether to wake up the system is determined according to the feature matching result of the sound source beams at multiple consecutive moments.

Whether to wake up the system could be decided directly by whether the keyword probability of the sound source beam exceeds a threshold. However, feature matching of the sound source beams at multiple consecutive moments further improves wake-up accuracy.

In some embodiments, the sound source directions pointed to by the sound source beams at the current moment and at a predetermined number of consecutive preceding moments are matched, and it is determined whether the sound source beam at each of these moments contains the keyword. If the sound source directions are consistent and the sound source beams at all of these moments contain the keyword, the system is woken up; otherwise, the system is not woken up. In other words, whether the system is woken up is confirmed from the consistency of the keyword recognition and localization results at moment t and the preceding moments (t-p, t-p+1, ..., t-1, t). If the keyword recognition and localization results at these moments agree, the system is woken up; otherwise, it is not.
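The consistency check over consecutive moments can be sketched with a fixed-length history. The class name, the number of moments and the keyword threshold are hypothetical, and treating "contains the keyword" as "probability at or above a threshold" is an assumption made for this example.

```python
from collections import deque

class WakeDecider:
    """Wake only if the last `num_moments` source beams agree on direction and keyword."""

    def __init__(self, num_moments=3, keyword_threshold=0.5):
        self.keyword_threshold = keyword_threshold
        self.history = deque(maxlen=num_moments)   # (direction index, keyword probability)

    def update(self, direction_index, keyword_prob):
        self.history.append((direction_index, keyword_prob))
        if len(self.history) < self.history.maxlen:
            return False                           # not enough moments observed yet
        directions = {d for d, _ in self.history}
        all_keyword = all(p >= self.keyword_threshold for _, p in self.history)
        return len(directions) == 1 and all_keyword

decider = WakeDecider(num_moments=3)
observations = [(2, 0.9), (2, 0.8), (4, 0.9), (4, 0.7), (4, 0.95)]
decisions = [decider.update(d, p) for d, p in observations]
```

In this run the direction changes from beam 2 to beam 4 at the third moment, so the system wakes only at the fifth moment, once three consecutive moments agree.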

In the method of the above embodiment, beamforming is performed on the voice signal in multiple directions to obtain multiple beams; the beams are input into the keyword recognition model, which outputs the probability that each beam contains a keyword; the sound source beam is then selected according to each beam's keyword probability and signal quality; and whether to wake up the system is determined from the feature matching result of the sound source beams at multiple moments. The method departs from the existing flow of sound source localization followed by voice wake-up: it decouples the beamforming algorithm from the sound source localization algorithm, so that localization accuracy no longer affects the direction in which beams are formed. This improves the wake-up accuracy of the voice system and the user experience.

Other embodiments of the voice wake-up method of the present disclosure are described below with reference to Fig. 2.

Fig. 2 is a flow chart of other embodiments of the voice wake-up method of the present disclosure. As shown in Fig. 2, the method of this embodiment includes steps S202 to S214.

In step S202, the user's voice signal is received by the microphone array.

In step S204, echo cancellation is performed on the multi-channel voice signal received by the microphone array.

In step S206, beamforming is performed on the received voice signal in multiple predetermined directions to obtain multiple beams.

In step S208, a subset of the beams is selected according to the signal quality of the beams.

In some embodiments, the signal quality of each beam is determined according to at least one of its energy and its signal-to-noise ratio within a fixed time window, and the beams whose signal quality is higher than a signal quality threshold are selected. For example, the signal quality of a beam is determined as a weighted value of its energy and signal-to-noise ratio within the fixed time window; the weights of the energy and the signal-to-noise ratio can be set according to actual needs. For example, the energy power and the signal-to-noise ratio SNR of each beam within the fixed time window are calculated and normalized to obtain power_k and SNR_k, and the signal quality score score_k of each beam is then calculated as a weighted sum of power_k and SNR_k. The beams whose signal quality score score_k (k = 1, 2, ..., M) is higher than the signal quality threshold are selected; alternatively, the beams whose signal quality ranks within a preset number of top positions are selected.

Selecting the better-quality beams in this way reduces the computation required by the subsequent steps and improves system efficiency and wake-up accuracy.
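The two selection rules of step S208, a quality threshold or a preset ranking, can be sketched as follows. The equal weights, the normalization by the maximum and the function names are assumptions for illustration, and the sketch assumes positive energy and signal-to-noise values.

```python
def signal_quality(powers, snrs, w_power=0.5, w_snr=0.5):
    """Weighted sum of max-normalized energy and SNR for each beam."""
    p_max, s_max = max(powers), max(snrs)
    return [w_power * p / p_max + w_snr * s / s_max for p, s in zip(powers, snrs)]

def select_beams(scores, threshold=None, top_n=None):
    """Beam indices, best first, above `threshold` and/or within the `top_n` ranking."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    if threshold is not None:
        order = [k for k in order if scores[k] > threshold]
    if top_n is not None:
        order = order[:top_n]
    return order

scores = signal_quality(powers=[1.0, 4.0, 2.0, 3.0], snrs=[3.0, 12.0, 6.0, 6.0])
kept_by_threshold = select_beams(scores, threshold=0.5)
kept_by_rank = select_beams(scores, top_n=3)
```

Only the selected beams are then passed to the keyword recognition model, which is where the saving in computation comes from.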

In step S210, the selected beams are input into the pre-trained keyword recognition model to obtain, for each beam, the probability that it contains a keyword.

In step S212, the beam that points toward the sound source is determined according to each beam's keyword probability and signal quality, and is taken as the sound source beam.

In step S214, whether to wake up the system is determined according to the feature matching result of the sound source beams at multiple consecutive moments.

The present disclosure also provides a voice wake-up device, described below with reference to Fig. 3.

Fig. 3 is a structural diagram of some embodiments of the voice wake-up device of the present disclosure. As shown in Fig. 3, the device 30 of this embodiment includes: a beamforming module 302, a keyword recognition module 304, a sound source determination module 306 and a voice wake-up module 308.

The beamforming module 302 is configured to perform beamforming on the voice signal in multiple predetermined directions to obtain multiple beams.

In some embodiments, the beamforming module 302 is configured to determine, according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of a predetermined direction, the weight of each voice signal received by the microphones relative to that predetermined direction, and to compute a weighted sum of the voice signals received by the microphones using these weights to obtain the beam in that predetermined direction.

In some embodiments, beamforming may be performed according to the following formulas, which are the same as those in the foregoing embodiment.

Xⁿ(k, l)=fft (xⁿ(t)) (1)

Wherein, xⁿ(t) voice signal received for n-th of microphone, fft () indicate to carry out voice signal quick Fourier transformation (FFT), obtains Xⁿ(k, l) is xⁿ(t) the SFFT amplitude of l period kth frequency band, l expression add voice signal Window is divided into l period and is respectively processed, and k indicates that every section of voice signal carries out the frequency band number after FFT transform.

y_m(t)=ifft (y_m(k, l)) (3)

Wherein, y_m(t) output signal of m-th of the preset bearing wave beam formed for phased array beam, ifft () are indicated Inverse fast fourier transform, Y_m(k, l) is y_m(t) the SFFT amplitude of l period kth frequency band.It is m-th In wave beam treatment process, n-th of microphone receives the weight of voice signal in l period kth frequency band.

Here, W_m(k) is the weight vector, relative to the predetermined direction, of the channels of the voice signal received by the microphones in the m-th beam's processing. It is an n-dimensional vector (one entry per microphone), and the signal weight vector can be regarded as identical for every period, so once W_m(k) is known, the weight of the voice signal received by the n-th microphone in the k-th frequency band in the m-th beam's processing is obtained from it. The calculation uses the covariance matrix of the noise in the m-th beam's processing and the inverse of that matrix, together with the microphone-array steering vector for the direction to be enhanced (i.e., the preset direction) in the m-th beam's processing, which is an n-dimensional column vector, and the conjugate transpose of that steering vector. The steering vector is determined by setting the predetermined direction.

α_psn is the proportion of fixed-direction point-source interference noise in the noise, and 1 − α_psn is the proportion of white noise in the noise. α_psn can be obtained from testing or experience. The noise covariance also depends on the steering vector of the fixed-direction point-source interference noise in the m-th beam's processing and on the conjugate transpose of that vector; this steering vector can likewise be obtained from testing or experience.
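The quantities defined above fit the standard minimum-variance distortionless-response (MVDR) beamformer, which can be written out as follows. This is a reconstruction under assumptions rather than a quotation of the patent's formulas: the symbols R_m (noise covariance), d_m (steering vector of the preset direction), d_psn,m (steering vector of the point-source interference), I (identity matrix), and N (number of microphones), as well as the conjugate-transposed weighted sum, are notation assumed here for illustration.

```latex
\begin{aligned}
Y_m(k,l) &= \mathbf{W}_m^{H}(k)\,\mathbf{X}(k,l),
  \qquad \mathbf{X}(k,l) = \big[X^{1}(k,l),\dots,X^{N}(k,l)\big]^{T} \\
\mathbf{W}_m(k) &= \frac{\mathbf{R}_m^{-1}(k)\,\mathbf{d}_m(k)}
  {\mathbf{d}_m^{H}(k)\,\mathbf{R}_m^{-1}(k)\,\mathbf{d}_m(k)} \\
\mathbf{R}_m(k) &= \alpha_{psn}\,\mathbf{d}_{psn,m}(k)\,\mathbf{d}_{psn,m}^{H}(k)
  + (1-\alpha_{psn})\,\mathbf{I}
\end{aligned}
```

Under this form, the weights satisfy W_m^H(k) d_m(k) = 1, so a signal from the preset direction is passed without distortion while the modeled noise power at the output is minimized.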

The keyword recognition module 304 is configured to input a beam into a pre-trained keyword recognition model to obtain the probability that the beam contains a keyword.

The sound source determination module 306 is configured to determine, according to the probability that a beam contains a keyword and the signal quality of the beam, the beam pointing in the sound source direction, which serves as the sound source beam.

In some embodiments, the sound source determination module 306 is configured to compute a weighted sum of the probability that a beam contains a keyword and the signal quality of the beam to obtain the importance of the beam, to select the beam with the highest importance as the sound source beam, and to take the direction in which the sound source beam points as the sound source direction.
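The weighted-sum selection in this embodiment can be sketched as follows. The weights 0.7 and 0.3, the function name `pick_source_beam`, and the assumption that signal quality has been normalized to the same [0, 1] range as the keyword probability are all hypothetical; the text does not specify them.

```python
import numpy as np

def pick_source_beam(keyword_probs, signal_quality, w_prob=0.7, w_quality=0.3):
    """Return (index of the most important beam, importance of every beam)."""
    probs = np.asarray(keyword_probs, dtype=float)
    quality = np.asarray(signal_quality, dtype=float)
    # Importance is a weighted sum of keyword probability and signal quality.
    importance = w_prob * probs + w_quality * quality
    return int(np.argmax(importance)), importance
```

If the beams were formed for a known list of predetermined directions, indexing that list with the returned beam index gives the estimated sound source direction.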

The voice wake-up module 308 is configured to determine whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments.

In some embodiments, the voice wake-up module 308 is configured to match the sound source directions pointed to by the sound source beams at multiple consecutive moments and to determine whether the sound source beams at those moments contain the keyword. The system is woken up when the sound source directions at the consecutive moments are consistent and the sound source beams at those moments all contain the keyword.
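The wake-up condition in this embodiment can be sketched as a check over a short history of sound source beams. The probability threshold, the number of consecutive moments, and the function name `should_wake` are hypothetical; deciding that a beam "contains the keyword" by thresholding its probability is also an assumption of this sketch.

```python
def should_wake(history, threshold=0.5, min_frames=3):
    """history: list of (beam_index, keyword_prob) for consecutive moments, oldest first."""
    if len(history) < min_frames:
        return False
    recent = history[-min_frames:]
    # The sound source direction must be the same at every recent moment.
    same_direction = len({idx for idx, _ in recent}) == 1
    # Every recent sound source beam must contain the keyword.
    all_keyword = all(prob >= threshold for _, prob in recent)
    return same_direction and all_keyword
```

Requiring agreement across several moments trades a small detection delay for fewer false wake-ups caused by a single noisy frame.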

Other embodiments of the voice awakening device of the present disclosure are described below with reference to Fig. 4.

Fig. 4 is a structural diagram of other embodiments of the voice awakening device of the present disclosure. As shown in Fig. 4, the device 40 of this embodiment includes: an echo cancellation module 402, a beamforming module 404, a beam selection module 406, a keyword recognition module 408, a sound source determination module 410, a voice wake-up module 412, and a model training module 414.

The echo cancellation module 402 is configured to perform echo cancellation on the voice signal received by the microphones.

The beamforming module 404 is configured to perform beamforming on the voice signal in multiple predetermined directions to obtain multiple beams. The beamforming module 404 has the same function as the beamforming module 302.

The beam selection module 406 is configured to select some of the beams according to their signal quality and send them to the keyword recognition module 408, so that the keyword recognition module 408 inputs the received beams into the pre-trained keyword recognition model.

In some embodiments, the beam selection module 406 is configured to determine the signal quality of a beam according to at least one of the beam's energy and signal-to-noise ratio within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.
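The threshold-based selection in this embodiment can be sketched as follows, using signal-to-noise ratio as the quality measure. The window length, the 5 dB threshold, the externally supplied noise power estimate, and all function names are hypothetical; the text only states that energy, signal-to-noise ratio, or both may be used.

```python
import numpy as np

def window_energy(beam, window_len):
    """Mean power of the most recent window_len samples of a beam."""
    recent = np.asarray(beam, dtype=float)[-window_len:]
    return float(np.mean(recent ** 2))

def snr_db(beam, noise_power, window_len):
    """Signal-to-noise ratio in dB, given an assumed noise power estimate."""
    return 10.0 * np.log10(window_energy(beam, window_len) / noise_power)

def select_beams(beams, noise_power, window_len=400, snr_threshold_db=5.0):
    """Return the indices of beams whose quality exceeds the threshold."""
    return [
        i for i, beam in enumerate(beams)
        if snr_db(beam, noise_power, window_len) > snr_threshold_db
    ]
```

Only the selected beams would then be passed to the keyword recognition model, which reduces the amount of computation spent on low-quality beams.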

The keyword recognition module 408 is configured to input a beam into the pre-trained keyword recognition model to obtain the probability that the beam contains a keyword. The keyword recognition module 408 has the same function as the keyword recognition module 304.

The sound source determination module 410 is configured to determine, according to the probability that a beam contains a keyword and the signal quality of the beam, the beam pointing in the sound source direction, which serves as the sound source beam. The sound source determination module 410 has the same function as the sound source determination module 306.

The voice wake-up module 412 is configured to determine whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments. The voice wake-up module 412 has the same function as the voice wake-up module 308.

The model training module 414 is configured to perform beamforming on the voice signal in multiple predetermined directions to obtain multiple beams, to label the multiple beams with keywords so that they serve as training beams, and to input the training beams into the keyword recognition model for training, so as to obtain the pre-trained keyword recognition model.

The model training module 414 may also receive the multiple beams obtained by the beamforming module 404, or the multiple beams obtained by the beam selection module 406, label those beams with keywords so that they serve as training beams, and input the training beams into the keyword recognition model for training, so as to obtain the pre-trained keyword recognition model.

The voice awakening devices in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, as described below with reference to Fig. 5 and Fig. 6.

Fig. 5 is a structural diagram of some embodiments of the voice awakening device of the present disclosure. As shown in Fig. 5, the device 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510. The processor 520 is configured to execute, based on instructions stored in the memory 510, the voice awakening method of any of the embodiments of the present disclosure.

The memory 510 may include, for example, a system memory and a fixed non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.

Fig. 6 is a structural diagram of other embodiments of the voice awakening device of the present disclosure. As shown in Fig. 6, the device 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. The device may also include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, mouse, keyboard, and touch screen. The network interface 640 provides a connection interface for various networked devices and may, for example, connect to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.

The present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice awakening method of any of the foregoing embodiments.

Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims

1. A voice awakening method, comprising:

performing beamforming on a voice signal in multiple predetermined directions to obtain multiple beams;

inputting a beam into a pre-trained keyword recognition model to obtain the probability that the beam contains a keyword;

determining, according to the probability that the beam contains a keyword and the signal quality of the beam, the beam pointing in the sound source direction, as a sound source beam;

determining whether to wake up a system according to feature matching results of the sound source beams at multiple consecutive moments.

2. The voice awakening method according to claim 1, wherein

inputting the beam into the pre-trained keyword recognition model comprises:

selecting some of the beams according to the signal quality of the beams and inputting them into the pre-trained keyword recognition model.

3. The voice awakening method according to claim 2, wherein

selecting some of the beams according to the signal quality of the beams comprises:

determining the signal quality of a beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window;

selecting the beams whose signal quality is higher than a signal quality threshold.

4. The voice awakening method according to claim 1, wherein

determining, according to the probability that the beam contains a keyword and the signal quality of the beam, the beam pointing in the sound source direction, as the sound source beam, comprises:

computing a weighted sum of the probability that the beam contains a keyword and the signal quality of the beam to obtain the importance of the beam;

selecting the beam with the highest importance as the sound source beam, and determining the direction in which the sound source beam points as the sound source direction.

5. The voice awakening method according to claim 1, wherein

determining whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments comprises:

matching the sound source directions pointed to by the sound source beams at multiple consecutive moments, and determining whether the sound source beams at the multiple consecutive moments contain the keyword;

waking up the system in the case where the sound source directions pointed to by the sound source beams at the multiple consecutive moments are consistent and the sound source beams at the multiple consecutive moments all contain the keyword.

6. The voice awakening method according to claim 1, wherein

performing beamforming on the voice signal in multiple predetermined directions to obtain multiple beams comprises:

determining, according to the direction of the point-source noise, the ratio of point-source noise to white noise, and the steering vector of a predetermined direction, the weight of each channel of the voice signal received by the microphones relative to the predetermined direction;

computing, according to the weight of each channel of the voice signal received by the microphones relative to the predetermined direction, a weighted sum of the channels of the voice signal received by the microphones to determine the beam for the predetermined direction.

7. The voice awakening method according to claim 6, wherein

the weight of each channel of the voice signal received by the microphones relative to the predetermined direction is calculated according to the following formula:

wherein W_m(k) is the weight vector, relative to the predetermined direction, of the channels of the voice signal received by the microphones in the m-th beam's processing; k is the index of the frequency band of the signals received by the microphones; the formula uses the covariance matrix of the noise in the m-th beam's processing and the inverse of that matrix, and the microphone-array steering vector of the predetermined direction in the m-th beam's processing and the conjugate transpose of that vector; α_psn is the proportion of preset-direction point-source interference noise in the noise, and 1 − α_psn is the proportion of white noise in the noise; and the covariance matrix depends on the steering vector of the preset-direction point-source interference noise in the m-th beam's processing and the conjugate transpose of that vector.

8. The voice awakening method according to claim 1, further comprising:

performing beamforming on a voice signal in multiple predetermined directions to obtain multiple beams;

labeling the multiple beams with keywords to serve as training beams;

inputting the training beams into the keyword recognition model for training, so as to obtain the pre-trained keyword recognition model.

9. The voice awakening method according to claim 1, wherein

before performing beamforming on the voice signal in multiple predetermined directions, the method further comprises:

performing echo cancellation on the voice signal received by the microphones.

10. The voice awakening method according to any one of claims 1-9, wherein

the keyword recognition model comprises: a deep learning model or a Hidden Markov Model.

11. A voice awakening device, comprising:

a beamforming module, configured to perform beamforming on a voice signal in multiple predetermined directions to obtain multiple beams;

a keyword recognition module, configured to input a beam into a pre-trained keyword recognition model to obtain the probability that the beam contains a keyword;

a sound source determination module, configured to determine, according to the probability that the beam contains a keyword and the signal quality of the beam, the beam pointing in the sound source direction, as a sound source beam;

a voice wake-up module, configured to determine whether to wake up a system according to feature matching results of the sound source beams at multiple consecutive moments.

12. The voice awakening device according to claim 11, further comprising:

a beam selection module, configured to select some of the beams according to the signal quality of the beams and send them to the keyword recognition module, so that the keyword recognition module inputs the received beams into the pre-trained keyword recognition model.

13. The voice awakening device according to claim 12, wherein

the beam selection module is configured to determine the signal quality of a beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.

14. The voice awakening device according to claim 11, wherein

the sound source determination module is configured to compute a weighted sum of the probability that the beam contains a keyword and the signal quality of the beam to obtain the importance of the beam, to select the beam with the highest importance as the sound source beam, and to determine the direction in which the sound source beam points as the sound source direction.

15. The voice awakening device according to claim 11, wherein

the voice wake-up module is configured to match the sound source directions pointed to by the sound source beams at multiple consecutive moments, to determine whether the sound source beams at the multiple consecutive moments contain the keyword, and to wake up the system in the case where the sound source directions pointed to by the sound source beams at the multiple consecutive moments are consistent and the sound source beams at the multiple consecutive moments all contain the keyword.

16. The voice awakening device according to claim 11, wherein

the beamforming module is configured to determine, according to the direction of the point-source noise, the ratio of point-source noise to white noise, and the steering vector of a predetermined direction, the weight of each channel of the voice signal received by the microphones relative to the predetermined direction, and to compute, according to those weights, a weighted sum of the channels of the voice signal received by the microphones to determine the beam for the predetermined direction.

17. The voice awakening device according to claim 16, wherein

W_m(k) is the weight vector, relative to the predetermined direction, of the channels of the voice signal received by the microphones in the m-th beam's processing; k is the index of the frequency band of the signals received by the microphones; the calculation uses the covariance matrix of the noise in the m-th beam's processing and the inverse of that matrix, and the microphone-array steering vector of the predetermined direction in the m-th beam's processing and the conjugate transpose of that vector; α_psn is the proportion of preset-direction point-source interference noise in the noise, and 1 − α_psn is the proportion of white noise in the noise; and the covariance matrix depends on the steering vector of the preset-direction point-source interference noise in the m-th beam's processing and the conjugate transpose of that vector.

18. The voice awakening device according to claim 11, further comprising:

a model training module, configured to perform beamforming on a voice signal in multiple predetermined directions to obtain multiple beams, to label the multiple beams with keywords to serve as training beams, and to input the training beams into the keyword recognition model for training, so as to obtain the pre-trained keyword recognition model.

19. The voice awakening device according to claim 11, further comprising:

an echo cancellation module, configured to perform echo cancellation on the voice signal received by the microphones.

20. The voice awakening device according to any one of claims 11-19,

21. A voice awakening device, comprising:

a memory; and

a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the voice awakening method according to any one of claims 1-10.

22. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.