CN109272989A - Voice awakening method, device and computer readable storage medium - Google Patents
- Publication number
- CN109272989A (application CN201810992991.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The present disclosure relates to a voice wake-up method, a device, and a computer-readable storage medium, and belongs to the field of computer technology. The disclosed method includes: performing beamforming on a voice signal in a plurality of predetermined directions to obtain a plurality of beams; inputting the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that it contains the keyword; determining, according to the probability that each beam contains the keyword and the signal quality of each beam, the beam pointing in the sound source direction as the sound source beam; and determining whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments. The disclosure does not follow the existing pipeline of sound source localization followed by voice wake-up. Instead, it decouples the beamforming algorithm from the sound source localization algorithm, which prevents sound source localization accuracy from affecting the direction used by the beamforming algorithm, and thereby improves the wake-up accuracy of the voice system and the user experience.
Description
Technical field
The present disclosure relates to the field of computer technology, and in particular to a voice wake-up method, a device, and a computer-readable storage medium.
Background
With the development of computer technology, the need for humans to exchange information with machines has become increasingly urgent. Voice, as one of the most natural modes of human interaction, has also become one of the most important modes that people hope will replace the mouse and keyboard for communicating with computers. As demand grows for intelligent terminals such as smart homes, intelligent vehicles, and intelligent conferencing systems, intelligent voice wake-up technology, which serves as the entry point to such terminals, is receiving more and more attention.
Voice communication is subject to interference introduced by the surrounding environment and the transmission medium (such as echo, reverberation, and interfering sound sources), which sharply reduces a computer's ability to understand speech. Because noise arrives from all directions, capturing clean speech with a single microphone is extremely difficult. Current voice wake-up systems are therefore mainly based on microphone arrays: the speech collected by multiple microphones is processed in the space-time domain to suppress noise and enhance speech.
Voice wake-up methods known to the inventors generally include the following steps: collecting a voice signal through a microphone array; preprocessing the voice signal; determining the angle and orientation of the sound source through sound source localization and tracking; using beamforming to generate a beam pointing at that angle and orientation; and passing the formed beam to a speech recognition system for recognition to determine whether to wake up the system.
Summary of the invention
The inventors have found that current sound source localization techniques fall into three classes according to their localization principle: steered beamforming based on maximum output power, techniques based on time difference of arrival, and localization based on high-resolution spectral estimation. The performance of all three classes degrades sharply in environments with severe reverberation and noise interference, so the angle and orientation of the sound source cannot be located accurately. This directly affects the subsequent speech recognition and therefore the voice wake-up result.
One technical problem to be solved by the present disclosure is how to improve the accuracy of voice wake-up and the user experience.
According to some embodiments of the present disclosure, a voice wake-up method is provided, comprising: performing beamforming on a voice signal in a plurality of predetermined directions to obtain a plurality of beams; inputting the beams into a pre-trained keyword recognition model to obtain the probability that each beam contains the keyword; determining, according to the probability that each beam contains the keyword and the signal quality of each beam, the beam pointing in the sound source direction as the sound source beam; and determining whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments.
In some embodiments, inputting the beams into the pre-trained keyword recognition model includes: selecting a subset of the beams according to their signal quality and inputting the selected beams into the pre-trained keyword recognition model.
In some embodiments, selecting a subset of the beams according to their signal quality includes: determining the signal quality of each beam according to at least one of its energy and its signal-to-noise ratio within a fixed time window; and selecting the beams whose signal quality is higher than a signal quality threshold.
In some embodiments, determining the beam pointing in the sound source direction as the sound source beam, according to the probability that each beam contains the keyword and the signal quality of each beam, includes: computing a weighted sum of the probability that a beam contains the keyword and the signal quality of that beam to obtain the importance of the beam; and selecting the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
In some embodiments, determining whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments includes: matching the sound source directions pointed to by the sound source beams at the consecutive moments, and determining whether the sound source beam at each of those moments contains the keyword; and waking up the system when the sound source directions pointed to by the sound source beams at the consecutive moments are consistent and the sound source beams at those moments all contain the keyword.
In some embodiments, performing beamforming on the voice signal in a plurality of predetermined directions to obtain a plurality of beams includes: for each predetermined direction, determining the weight of each channel of the voice signal received by the microphones relative to that direction according to the direction of the point source noise, the proportions of point source noise and white noise, and the steering vector of the predetermined direction; and computing a weighted sum of the channels of the voice signal received by the microphones according to those weights to determine the beam for the predetermined direction.
In some embodiments, the weight of each channel of the voice signal received by the microphones relative to the predetermined direction is calculated according to the following formulas:
$$W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)}$$
$$R_m(k) = \alpha_{psn}\, v_m(k)\, v_m^{H}(k) + (1-\alpha_{psn})\, I$$
where $W_m(k)$ is the weight vector, relative to the predetermined direction, of the channels of the voice signal received by the microphones in the m-th beam process; k indexes the frequency bands of the signal received by the microphones; $R_m(k)$ is the covariance matrix of the noise in the m-th beam process and $R_m^{-1}(k)$ is its inverse; $d_m(k)$ is the microphone array steering vector of the predetermined direction in the m-th beam process and $d_m^{H}(k)$ is its conjugate transpose; $\alpha_{psn}$ is the proportion of preset-direction point source interference noise in the noise and $1-\alpha_{psn}$ is the proportion of white noise in the noise; $v_m(k)$ is the steering vector of the preset-direction point source interference noise in the m-th beam process and $v_m^{H}(k)$ is its conjugate transpose; and $I$ is the identity matrix.
In some embodiments, the method further includes: performing beamforming on a voice signal in the plurality of predetermined directions to obtain a plurality of beams; labeling the beams with keyword annotations to form training beams; and inputting the training beams into the keyword recognition model for training to obtain the pre-trained keyword recognition model.
In some embodiments, before beamforming is performed on the voice signal in the plurality of predetermined directions, the method further includes: performing echo cancellation on the voice signal received by the microphones.
In some embodiments, the keyword recognition model includes a deep learning model or a hidden Markov model.
According to other embodiments of the present disclosure, a voice wake-up device is provided, comprising: a beamforming module configured to perform beamforming on a voice signal in a plurality of predetermined directions to obtain a plurality of beams; a keyword recognition module configured to input the beams into a pre-trained keyword recognition model to obtain the probability that each beam contains the keyword; a sound source determination module configured to determine, according to the probability that each beam contains the keyword and the signal quality of each beam, the beam pointing in the sound source direction as the sound source beam; and a voice wake-up module configured to determine whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments.
In some embodiments, the device further includes: a beam selection module configured to select a subset of the beams according to their signal quality and send the selected beams to the keyword recognition module, so that the keyword recognition module inputs the received beams into the pre-trained keyword recognition model.
In some embodiments, the beam selection module is configured to determine the signal quality of each beam according to at least one of its energy and its signal-to-noise ratio within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.
In some embodiments, the sound source determination module is configured to compute a weighted sum of the probability that a beam contains the keyword and the signal quality of that beam to obtain the importance of the beam, and to select the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
In some embodiments, the voice wake-up module is configured to match the sound source directions pointed to by the sound source beams at a plurality of consecutive moments, determine whether the sound source beam at each of those moments contains the keyword, and wake up the system when the sound source directions are consistent and the sound source beams at those moments all contain the keyword.
In some embodiments, the beamforming module is configured to determine, for each predetermined direction, the weight of each channel of the voice signal received by the microphones relative to that direction according to the direction of the point source noise, the proportions of point source noise and white noise, and the steering vector of the predetermined direction, and to compute a weighted sum of the channels of the voice signal received by the microphones according to those weights to determine the beam for the predetermined direction.
In some embodiments, the weight of each channel of the voice signal received by the microphones relative to the predetermined direction is calculated according to the following formulas:
$$W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)}$$
$$R_m(k) = \alpha_{psn}\, v_m(k)\, v_m^{H}(k) + (1-\alpha_{psn})\, I$$
where $W_m(k)$ is the weight vector, relative to the predetermined direction, of the channels of the voice signal received by the microphones in the m-th beam process; k indexes the frequency bands of the signal received by the microphones; $R_m(k)$ is the covariance matrix of the noise in the m-th beam process and $R_m^{-1}(k)$ is its inverse; $d_m(k)$ is the microphone array steering vector of the predetermined direction in the m-th beam process and $d_m^{H}(k)$ is its conjugate transpose; $\alpha_{psn}$ is the proportion of preset-direction point source interference noise in the noise and $1-\alpha_{psn}$ is the proportion of white noise in the noise; $v_m(k)$ is the steering vector of the preset-direction point source interference noise in the m-th beam process and $v_m^{H}(k)$ is its conjugate transpose; and $I$ is the identity matrix.
In some embodiments, the device further includes: a model training module configured to perform beamforming on a voice signal in the plurality of predetermined directions to obtain a plurality of beams, label the beams with keyword annotations to form training beams, and input the training beams into the keyword recognition model for training to obtain the pre-trained keyword recognition model.
In some embodiments, the device further includes: an echo cancellation module configured to perform echo cancellation on the voice signal received by the microphones.
In some embodiments, the keyword recognition model includes a deep learning model or a hidden Markov model.
According to further embodiments of the present disclosure, a voice wake-up device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute the voice wake-up method of any of the foregoing embodiments based on instructions stored in the memory.
According to still other embodiments of the present disclosure, a computer-readable storage medium is provided on which a computer program is stored, wherein the program, when executed by a processor, implements the voice wake-up method of any of the foregoing embodiments.
In the present disclosure, beamforming is performed on the voice signal in a plurality of directions to obtain a plurality of beams; the beams are input into the keyword recognition model to obtain the probability that each beam contains the keyword; the sound source beam is then selected based on the probability that each beam contains the keyword and the signal quality of that beam; and whether to wake up the system is determined from the feature matching result of the sound source beams at multiple moments. The disclosure does not follow the existing pipeline of sound source localization followed by voice wake-up. It decouples the beamforming algorithm from the sound source localization algorithm, which prevents sound source localization accuracy from affecting the direction used by the beamforming algorithm, and thereby improves the wake-up accuracy of the voice system and the user experience.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
To explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow diagram of the voice wake-up method of some embodiments of the present disclosure.
Fig. 2 shows a flow diagram of the voice wake-up method of other embodiments of the present disclosure.
Fig. 3 shows a structural diagram of the voice wake-up device of some embodiments of the present disclosure.
Fig. 4 shows a structural diagram of the voice wake-up device of other embodiments of the present disclosure.
Fig. 5 shows a structural diagram of the voice wake-up device of further embodiments of the present disclosure.
Fig. 6 shows a structural diagram of the voice wake-up device of still other embodiments of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application, or its uses. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present disclosure without creative effort fall within the scope of protection of the present disclosure.
The present disclosure provides a voice wake-up method. Some embodiments of the method are described below with reference to Fig. 1.
Fig. 1 is a flow chart of some embodiments of the voice wake-up method of the present disclosure. As shown in Fig. 1, the method of this embodiment includes steps S102 to S108.
In step S102, beamforming is performed on the voice signal in a plurality of predetermined directions to obtain a plurality of beams.
The speech recognition system to be woken up by voice may be provided with multiple microphones, that is, a microphone array, to receive the user's voice signal. The voice signal may first be preprocessed, for example by applying AEC (Acoustic Echo Cancellation) to the voice signal received by the microphone array.
The preprocessed voice signal can be beamformed in the plurality of predetermined directions using a phased array beamforming algorithm. Phased array here means that M directions are preset (evenly distributed on a circle) and the multi-channel voice signal received by the microphone array is weighted and summed M times, producing M channels of voice signal, each enhanced for one specific direction. For example, M directions evenly distributed on a circle can be predetermined as the beamforming directions, so that the formed beams point in the M predetermined directions. The beamforming algorithm may be, for example, MVDR (Minimum Variance Distortionless Response), GSC (Generalized Sidelobe Canceller), or TF-GSC (Transfer Function Generalized Sidelobe Canceller). Beamforming in the predetermined directions can be implemented with these existing algorithms and is not described further here.
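As an illustration of the M preset directions, the following minimal Python sketch generates evenly spaced azimuths on a circle and a far-field steering vector for one of them. The circular array geometry, the radius, the speed of sound, the phase sign convention, and all function names are assumptions made for the example; the disclosure does not specify them.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def preset_directions(num_beams):
    """Return num_beams azimuth angles (radians) evenly spaced on a circle."""
    return [2.0 * math.pi * m / num_beams for m in range(num_beams)]


def circular_array_positions(num_mics, radius):
    """(x, y) microphone positions on a circle; this geometry is an assumption."""
    return [(radius * math.cos(2.0 * math.pi * n / num_mics),
             radius * math.sin(2.0 * math.pi * n / num_mics))
            for n in range(num_mics)]


def steering_vector(positions, azimuth, freq_hz):
    """Far-field steering vector for a plane wave arriving from `azimuth`."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    vec = []
    for x, y in positions:
        # A microphone closer to the source hears the wavefront earlier,
        # so its delay relative to the array centre is negative.
        delay = -(x * ux + y * uy) / SPEED_OF_SOUND
        vec.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return vec


directions = preset_directions(6)           # M = 6 preset beams
mics = circular_array_positions(4, 0.04)    # 4 microphones, 4 cm radius
d = steering_vector(mics, directions[1], 1000.0)
```

Each entry of `d` has unit magnitude; only the phase differs between microphones, which is what the per-direction weighted sum exploits.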
The present disclosure also provides an improved beamforming algorithm, described below.
In some embodiments, for each predetermined direction, the weight of each channel of the voice signal received by the microphones relative to that direction is determined according to the direction of the point source noise, the proportions of point source noise and white noise, and the steering vector of the predetermined direction; the channels of the voice signal received by the microphones are then weighted and summed according to those weights to determine the beam for the predetermined direction. Beamforming can be carried out according to the following formulas.
$$X_n(k,l) = \mathrm{fft}(x_n(t)) \tag{1}$$
In formula (1), $x_n(t)$ is the voice signal received by the n-th microphone, and fft(·) denotes applying the fast Fourier transform (FFT) to the voice signal. $X_n(k,l)$ is the short-time FFT coefficient of $x_n(t)$ in the k-th frequency band of the l-th time segment, where l indexes the time segments obtained by windowing the voice signal, each of which is processed separately, and k indexes the frequency bands of each segment after the FFT.
$$Y_m(k,l) = \sum_{n=1}^{N} W_{m,n}^{*}(k,l)\, X_n(k,l) \tag{2}$$
$$y_m(t) = \mathrm{ifft}(Y_m(k,l)) \tag{3}$$
In formulas (2) and (3), $y_m(t)$ is the output signal of the beam for the m-th preset direction formed by phased array beamforming, ifft(·) denotes the inverse fast Fourier transform, and $Y_m(k,l)$ is the short-time FFT coefficient of $y_m(t)$ in the k-th frequency band of the l-th time segment. N is the number of microphones, and $W_{m,n}(k,l)$ is the weight, in the m-th beam process, of the voice signal received by the n-th microphone in the k-th frequency band of the l-th time segment.
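The per-band weighted sum over microphones can be sketched in a few lines of Python. This is a minimal illustration, not the disclosure's implementation: the function names are hypothetical, and applying the complex conjugate of each weight is an assumed convention (it matches the usual MVDR form, but the source text does not confirm it).

```python
def beamform_bin(weights, mic_spectra):
    """One bin of one beam: sum over microphones of conj(W_n) * X_n."""
    if len(weights) != len(mic_spectra):
        raise ValueError("one weight per microphone is required")
    # Conjugating the weights is an assumed convention, see the lead-in.
    return sum(w.conjugate() * x for w, x in zip(weights, mic_spectra))


def beamform_frame(weights_per_bin, spectra_per_bin):
    """Apply beamform_bin to every frequency band k of one time segment l."""
    return [beamform_bin(w, x) for w, x in zip(weights_per_bin, spectra_per_bin)]


# Two microphones, two frequency bands, equal (delay-and-sum style) weights.
W = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
X = [[1 + 1j, 1 - 1j], [2 + 0j, 0 + 2j]]
Y = beamform_frame(W, X)
```

Running the same routine once per preset direction, each with its own weight table, yields the M beams; the inverse FFT of each `Y` then gives the time-domain beam signal.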
It can be seen from the above formulas that once the weights are determined, the signal representation of beam m for the predetermined direction is determined.
$$W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)} \tag{4}$$
In formula (4), $W_m(k)$ is the weight vector, relative to the predetermined direction, of the channels of the voice signal received by the microphones in the m-th beam process; it is an N-dimensional vector, where N is the number of microphones. The weight vector can be regarded as identical for every time segment, that is, $W_{m,n}(k,l) = W_{m,n}(k)$, so that $W_m(k) = [W_{m,1}(k), \ldots, W_{m,N}(k)]^{T}$, where $W_{m,n}(k)$ is the weight, in the m-th beam process, of the voice signal received by the n-th microphone in the k-th frequency band. $R_m(k)$ is the covariance matrix of the noise in the m-th beam process and $R_m^{-1}(k)$ is its inverse. $d_m(k)$ is the microphone array steering vector of the direction to be enhanced (that is, the preset direction) in the m-th beam process; it is an N-dimensional column vector, and $d_m^{H}(k)$ is its conjugate transpose. $d_m(k)$ is determined by setting the predetermined direction.
$R_m(k)$ can further be obtained according to formula (5):
$$R_m(k) = \alpha_{psn}\, v_m(k)\, v_m^{H}(k) + (1-\alpha_{psn})\, I \tag{5}$$
In formula (5), $\alpha_{psn}$ is the proportion of fixed-direction point source interference noise in the noise, and $1-\alpha_{psn}$ is the proportion of white noise in the noise. $\alpha_{psn}$ can be obtained from testing or experience. $v_m(k)$ is the steering vector of the fixed-direction point source interference noise in the m-th beam process, $v_m^{H}(k)$ is its conjugate transpose, and $I$ is the identity matrix. $v_m(k)$ can be obtained from testing or experience.
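The weight computation can be sketched as follows, under two stated assumptions: the weights take the standard MVDR form (inverse noise covariance applied to the steering vector, normalised for unit gain in the look direction), and the noise covariance is a mixture of a rank-one point source term and an identity (white noise) term in the proportions alpha_psn and 1 - alpha_psn. With that structure the inverse has a closed form via the Sherman-Morrison identity, so no general matrix inversion is needed. The function names and the example array geometry are hypothetical.

```python
import cmath
import math


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def mvdr_weights(d, v, alpha_psn):
    """MVDR weights for steering vector d against a point interferer v.

    Assumed noise model: R = alpha_psn * v v^H + (1 - alpha_psn) * I.
    """
    if not 0.0 <= alpha_psn < 1.0:
        raise ValueError("alpha_psn must lie in [0, 1)")
    white = 1.0 - alpha_psn
    # Sherman-Morrison: R^{-1} d = (d - scale * v) / white
    scale = alpha_psn * inner(v, d) / (white + alpha_psn * inner(v, v).real)
    r_inv_d = [(di - scale * vi) / white for di, vi in zip(d, v)]
    denom = inner(d, r_inv_d)  # d^H R^{-1} d
    return [x / denom for x in r_inv_d]


num_mics = 4
# Uniform linear array with half-wavelength spacing (assumed geometry).
d = [cmath.exp(-1j * math.pi * n * 0.0) for n in range(num_mics)]  # look direction
v = [cmath.exp(-1j * math.pi * n * 0.3) for n in range(num_mics)]  # interferer
w = mvdr_weights(d, v, alpha_psn=0.9)
gain_target = abs(inner(w, d))   # response toward the preset direction
gain_interf = abs(inner(w, v))   # response toward the point interferer
```

The look-direction response stays at one (the distortionless constraint) while the response toward the assumed interferer is strongly attenuated; a larger `alpha_psn` deepens that null at the cost of less white-noise suppression.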
The beam signal in each predetermined direction can be calculated with the above formulas, and the beamforming processes for the multiple beams can be executed in parallel to obtain the multiple beams.
In step S104, the beams are input into a pre-trained keyword recognition model to obtain the probability that each beam contains the keyword.
The voice system decides whether to record and recognize subsequent speech by recognizing the keyword in the speech; that is, whether the voice system is subsequently woken up is determined by detecting the keyword in the speech. The keyword recognition model is, for example, a deep learning model or a hidden Markov model. The deep learning model is, for example, a DNN (Deep Neural Network), an RNN (Recurrent Neural Network), or a CRNN (Convolutional Recurrent Neural Network). These are existing models and are not described further here. To train the keyword recognition model, multiple beams can be generated as in the embodiment of step S102 and each beam labeled according to whether it contains the keyword, forming the training beams; the training beams are then input into the keyword recognition model for offline training to obtain the pre-trained keyword recognition model. Inputting a beam into the pre-trained keyword recognition model then yields the probability that the beam contains the keyword.
In step S106, the beam pointing in the sound source direction is determined as the sound source beam according to the probability that each beam contains the keyword and the signal quality of each beam.
In some embodiments, the signal quality of a beam is determined according to at least one of its energy and its signal-to-noise ratio within a fixed time window. The higher a beam's energy and signal-to-noise ratio in the fixed time window, the better its signal quality. For example, the energy and signal-to-noise ratio of the beam in the fixed time window can be calculated, and a weighted sum of the two parameters gives the signal quality of the beam. The weights of the energy and the signal-to-noise ratio can be configured as needed. The energy and the signal-to-noise ratio may first be normalized and then weighted.
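A minimal Python sketch of this signal quality score follows. The min-max normalization, the equal default weights, the threshold value, and the function names are all assumptions chosen for illustration; the text only states that energy and signal-to-noise ratio are normalized and combined in a weighted sum, and that beams above a quality threshold may be selected. The signal-to-noise values are taken as given, since the noise estimator is not specified.

```python
def window_energy(samples):
    """Mean squared amplitude of one beam inside the fixed time window."""
    return sum(s * s for s in samples) / len(samples)


def normalize(values):
    """Min-max normalization to [0, 1]; the normalization method is an assumption."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def signal_quality(energies, snrs, w_energy=0.5, w_snr=0.5):
    """Weighted sum of normalized energy and normalized SNR, one score per beam."""
    e_norm = normalize(energies)
    s_norm = normalize(snrs)
    return [w_energy * e + w_snr * s for e, s in zip(e_norm, s_norm)]


def select_beams(qualities, threshold):
    """Indices of beams whose quality exceeds the signal quality threshold."""
    return [i for i, q in enumerate(qualities) if q > threshold]


beams = ([0.1, -0.1], [0.4, -0.4], [0.2, -0.2])   # toy time-domain windows
energies = [window_energy(b) for b in beams]
snrs_db = [3.0, 12.0, 6.0]                        # assumed to come from a noise estimator
quality = signal_quality(energies, snrs_db)
selected = select_beams(quality, threshold=0.2)
```

Normalizing first keeps energy (a power) and signal-to-noise ratio (here in decibels) on a comparable scale, so the two weights express relative importance rather than compensating for units.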
In some embodiments, a weighted sum of the probability that a beam contains the keyword and the signal quality of that beam gives the importance of the beam; the beam with the highest importance is selected as the sound source beam, and the direction in which the sound source beam points is determined as the sound source direction. The beam in the sound source direction has better signal quality and a higher probability of being recognized as containing the keyword, so the sound source beam can be selected according to the probability that each beam contains the keyword and the signal quality of each beam. For example, the energy power' and the signal-to-noise ratio SNR' of each of the K beams in the fixed time window are calculated and both are normalized. The keyword recognition probability of the k-th beam output by the keyword recognition model is NNScore_k. The importance of the k-th beam is then calculated as a weighted sum of its normalized energy, its normalized signal-to-noise ratio, and NNScore_k.
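The selection of the sound source beam can be sketched as below. The weight values (0.4 for signal quality, 0.6 for keyword probability) and the function names are hypothetical; the text specifies only that the importance is a weighted sum and that the beam with the highest importance is chosen.

```python
def beam_importance(quality, keyword_prob, w_quality=0.4, w_keyword=0.6):
    """Weighted sum of signal quality and keyword probability (weights assumed)."""
    return w_quality * quality + w_keyword * keyword_prob


def select_source_beam(qualities, keyword_probs, directions_deg):
    """Return (beam index, pointing direction, importance) of the best beam."""
    scores = [beam_importance(q, p) for q, p in zip(qualities, keyword_probs)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, directions_deg[best], scores[best]


qualities = [0.10, 0.90, 0.40, 0.20]       # normalized signal quality per beam
keyword_probs = [0.05, 0.80, 0.95, 0.10]   # model output (NNScore) per beam
directions_deg = [0, 90, 180, 270]         # preset beam directions
best, direction, score = select_source_beam(qualities, keyword_probs, directions_deg)
```

In this toy input the third beam has the highest keyword probability, but the second beam wins once signal quality is taken into account, which is the behaviour the combined score is meant to produce.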
In step S108, whether to wake up the system is determined according to the feature matching result of the sound source beams at a plurality of consecutive moments.
Whether to wake up the system could be determined directly from whether the keyword probability of the sound source beam exceeds a threshold. However, feature matching of the sound source beams at multiple consecutive moments can further improve the wake-up accuracy.
In some embodiments, the sound source directions pointed to by the sound source beams at the current moment and at a predetermined number of consecutive preceding moments are matched against one another, and it is determined whether the sound source beam at each of these moments contains the keyword. If the sound source directions at these consecutive moments are consistent and the sound source beam at each of these moments contains the keyword, the system is woken up; otherwise, the system is not woken up. That is, whether the system is woken up is confirmed from the consistency of the keyword identification and localization results at moment t and the preceding moments (t-p, t-p+1, ..., t-1, t). If the keyword identification and localization results at these moments are consistent, the system is woken up; otherwise, the system is not woken up.
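The consistency check over moments t-p, ..., t can be sketched as a small sliding-window decision. The class name `WakeDecider`, the window length, and the per-moment keyword threshold are hypothetical; the text only requires that the direction be consistent and that every moment in the window contain the keyword.

```python
from collections import deque


class WakeDecider:
    """Wake only if the last p+1 moments agree on direction and all contain the keyword."""

    def __init__(self, history=3, kw_threshold=0.5):
        # history = p previous moments; the window also holds the current moment.
        self.window = deque(maxlen=history + 1)
        self.kw_threshold = kw_threshold  # assumed rule for "contains the keyword"

    def update(self, beam_index, kw_prob):
        """Record the sound source beam of the current moment and return the wake decision."""
        self.window.append((beam_index, kw_prob))
        if len(self.window) < self.window.maxlen:
            return False  # not enough consecutive moments observed yet
        same_direction = len({b for b, _ in self.window}) == 1
        all_keyword = all(p >= self.kw_threshold for _, p in self.window)
        return same_direction and all_keyword


decider = WakeDecider(history=2, kw_threshold=0.5)
results = [decider.update(3, 0.9), decider.update(3, 0.8), decider.update(3, 0.7)]
after_direction_change = decider.update(1, 0.9)
```

Requiring agreement across the whole window trades a short delay for fewer false wake-ups caused by a single noisy moment.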
In the method of the above embodiments, beamforming is performed on the voice signal in multiple directions to obtain multiple beams; the beams are input into the keyword identification model to obtain the probability that each beam contains the keyword; the sound source beam is then selected based on the keyword probability and the signal quality of each beam; and whether to wake up the system is determined from the feature matching results of the sound source beams at multiple moments. Unlike the existing flow of sound source localization followed by voice wake-up, this method decouples the beamforming algorithm from the sound source localization algorithm, so that the accuracy of sound source localization no longer affects the steering of the beamforming algorithm. This improves the wake-up accuracy of the voice system and the user experience.
Other embodiments of the voice awakening method of the present disclosure are described below with reference to Fig. 2.
Fig. 2 is a flowchart of other embodiments of the voice awakening method of the present disclosure. As shown in Fig. 2, the method of this embodiment includes steps S202 to S214.
In step S202, the voice signal of a user is received through a microphone array.
In step S204, echo cancellation is performed on the multi-channel voice signals received by the microphone array.
In step S206, beamforming is performed on the received voice signals in multiple predetermined directions to obtain multiple beams.
In step S208, some of the beams are selected according to the signal quality of the beams.
In some embodiments, the signal quality of a beam is determined according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window, and the beams whose signal quality is higher than a signal quality threshold are selected. For example, the signal quality of a beam is determined as a weighted value of its energy and signal-to-noise ratio within the fixed time window. The weights of the energy and the signal-to-noise ratio can be set according to actual needs. For example, the energy (power) and the signal-to-noise ratio (SNR) of each beam within the fixed time window are calculated separately and normalized, and the signal quality score score_k (k = 1, 2, ..., M) of each beam is then calculated. The beams whose score_k is higher than the signal quality threshold are selected; alternatively, the beams whose signal quality ranks within a preset number of top positions are selected.
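The pre-selection of beams by signal quality can be sketched as below. The helper names, the min-max normalization, and the equal default weights are assumptions for illustration; the two selection modes correspond to the threshold rule and the preset-ranking rule described above.

```python
def signal_quality(powers, snrs, w_power=0.5, w_snr=0.5):
    """Per-beam quality score: weighted sum of normalized energy and SNR (assumed weights)."""
    def normalize(values):
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    p, s = normalize(powers), normalize(snrs)
    return [w_power * a + w_snr * b for a, b in zip(p, s)]


def select_beams(scores, threshold=None, top_n=None):
    """Return the indices of the beams passed on to the keyword identification model.

    With `threshold` set, keep beams scoring above it; otherwise keep the
    `top_n` best-ranked beams (all beams if `top_n` is None).
    """
    if threshold is not None:
        return [k for k, sc in enumerate(scores) if sc > threshold]
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:top_n])


scores = signal_quality(powers=[1.0, 3.0, 2.0], snrs=[5.0, 10.0, 7.5])
by_threshold = select_beams(scores, threshold=0.4)
by_rank = select_beams(scores, top_n=1)
```

Only the selected beams are scored by the keyword model, which is where the reduction in computation comes from.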
Selecting the beams of better quality in this way reduces the computation of the subsequent steps and improves system efficiency and wake-up accuracy.
In step S210, the selected beams are input into the pre-trained keyword identification model to obtain the probability that each beam contains the keyword.
In step S212, the beam pointing in the sound source direction is determined as the sound source beam according to the probability that each beam contains the keyword and the signal quality of the beam.
In step S214, whether to wake up the system is determined according to the feature matching results of the sound source beams at multiple consecutive moments.
The present disclosure also provides a voice wake-up device, which is described below with reference to Fig. 3.
Fig. 3 is a structural diagram of some embodiments of the voice wake-up device of the present disclosure. As shown in Fig. 3, the device 30 of this embodiment includes: a beamforming module 302, a keyword identification module 304, a sound source determining module 306, and a voice wake-up module 308.
The beamforming module 302 is configured to perform beamforming on the voice signal in multiple predetermined directions to obtain multiple beams.
In some embodiments, the beamforming module 302 is configured to determine the weight, relative to a predetermined direction, of each voice signal received by the microphones according to the direction of the point-source noise, the ratio of point-source noise to white noise, and the steering vector of the predetermined direction; and to perform a weighted summation of the voice signals received by the microphones according to these weights, thereby determining the beam for the predetermined direction.
In some embodiments, beamforming can be performed according to the following formulas, which are the same as those in the foregoing embodiments.
X_n(k, l) = fft(x_n(t)) (1)
where x_n(t) is the voice signal received by the n-th microphone, fft(·) denotes the fast Fourier transform (FFT) of the voice signal, and X_n(k, l) is the short-time Fourier transform (STFT) amplitude of x_n(t) in the k-th frequency band of the l-th period. Here l indexes the periods into which the voice signal is windowed for separate processing, and k indexes the frequency bands obtained by applying the FFT to each segment.
Y_m(k, l) = Σ_n W_{m,n}(k, l) · X_n(k, l) (2)
y_m(t) = ifft(Y_m(k, l)) (3)
where y_m(t) is the output signal of the beam for the m-th preset direction formed by phased-array beamforming, ifft(·) denotes the inverse fast Fourier transform, Y_m(k, l) is the STFT amplitude of y_m(t) in the k-th frequency band of the l-th period, and W_{m,n}(k, l) is the weight, in the m-th beam process, of the voice signal received by the n-th microphone in the k-th frequency band of the l-th period. The sum in formula (2) runs over all microphones n.
It can be seen from the above formulas that once W_{m,n}(k, l) is determined, the signal of the beam m for the predetermined direction is determined.
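The frequency-domain weighted sum described by these formulas can be sketched for a single frame as follows. This is an illustrative sketch only: it uses a naive DFT from the standard library in place of an optimized FFT, processes one windowed period, and the function names and the layout `weights[n][k]` are assumptions.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of one frame (stand-in for fft)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform (stand-in for ifft)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def form_beam(frames, weights):
    """Form one beam from one frame per microphone.

    frames[n]: time-domain samples of microphone n for the current period.
    weights[n][k]: complex weight of microphone n in frequency band k.
    """
    spectra = [dft(frame) for frame in frames]  # per-microphone spectra
    n_bins = len(spectra[0])
    # Weighted sum over microphones in every frequency band.
    beam_spectrum = [
        sum(weights[n][k] * spectra[n][k] for n in range(len(frames)))
        for k in range(n_bins)
    ]
    # Back to the time domain; real part taken for real-valued inputs.
    return [v.real for v in idft(beam_spectrum)]


frames = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
weights = [[0.5] * 4, [0.5] * 4]  # simple averaging weights for the example
beam = form_beam(frames, weights)
```

With identical microphone signals and averaging weights the beam reproduces the input frame, which is a quick sanity check of the transform pair.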
Here W_m(k) is the weight vector, relative to the predetermined direction, of the voice signals received by the microphones in the m-th beam process. It is a vector with one element per microphone, whose n-th element is the weight of the voice signal received by the n-th microphone in the k-th frequency band. The signal weight vector can be regarded as the same for every period, so once W_m(k) is known, the weight of each microphone in each period is obtained. W_m(k) is calculated from the covariance matrix of the noise in the m-th beam process and its inverse matrix, together with the microphone array steering vector for the direction to be enhanced (i.e., the preset direction) in the m-th beam process, which is a column vector with one element per microphone, and the conjugate transpose of that steering vector. The steering vector can be determined by setting the predetermined direction.
α_psn is the proportion of fixed-direction point-source interference noise in the noise, and 1 - α_psn is the proportion of white noise in the noise. α_psn can be obtained from tests or experience. The direction vector of the fixed-direction point-source interference noise in the m-th beam process and its conjugate transpose also enter the calculation; this direction vector can be obtained from tests or experience.
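The weight formula itself does not survive in this text. The quantities listed above (a noise covariance matrix and its inverse, a steering vector and its conjugate transpose, and a noise model mixing point-source interference with white noise) match the standard minimum variance distortionless response (MVDR) beamformer, which is also the subject of the cited non-patent reference. Assuming that standard form, and using the hypothetical symbols R_m(k) for the noise covariance matrix, d_m(k) for the steering vector of the preset direction, d_{psn,m}(k) for the direction vector of the point-source interference, and I for the identity matrix, the calculation would read:

```latex
W_m(k) = \frac{R_m^{-1}(k)\, d_m(k)}{d_m^{H}(k)\, R_m^{-1}(k)\, d_m(k)},
\qquad
R_m(k) = \alpha_{psn}\, d_{psn,m}(k)\, d_{psn,m}^{H}(k) + (1 - \alpha_{psn})\, I
```

Under this assumption the beam passes the preset direction with unit gain while minimizing the output power contributed by the modeled noise.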
The keyword identification module 304 is configured to input the beams into a pre-trained keyword identification model to obtain the probability that each beam contains a keyword.
In some embodiments, the keyword identification model includes a deep learning model or a hidden Markov model.
The sound source determining module 306 is configured to determine the beam pointing in the sound source direction as the sound source beam according to the probability that each beam contains the keyword and the signal quality of the beam.
In some embodiments, the sound source determining module 306 is configured to weight and sum the probability that a beam contains the keyword and the signal quality of the beam to obtain the importance of the beam, and to select the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
The voice wake-up module 308 is configured to determine whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments.
In some embodiments, the voice wake-up module 308 is configured to match the sound source directions pointed to by the sound source beams at multiple consecutive moments, to determine whether the sound source beam at each of these moments contains the keyword, and to wake up the system if the sound source directions at these moments are consistent and the sound source beam at each of these moments contains the keyword.
Other embodiments of the voice wake-up device of the present disclosure are described below with reference to Fig. 4.
Fig. 4 is a structural diagram of other embodiments of the voice wake-up device of the present disclosure. As shown in Fig. 4, the device 40 of this embodiment includes: an echo cancellation module 402, a beamforming module 404, a beam selection module 406, a keyword identification module 408, a sound source determining module 410, a voice wake-up module 412, and a model training module 414.
The echo cancellation module 402 is configured to perform echo cancellation on the voice signal received through the microphones.
The beamforming module 404 is configured to perform beamforming on the voice signal in multiple predetermined directions to obtain multiple beams. The beamforming module 404 has the same function as the beamforming module 302.
The beam selection module 406 is configured to select some of the beams according to the signal quality of the beams and send them to the keyword identification module 408, so that the keyword identification module 408 inputs the received beams into the pre-trained keyword identification model.
In some embodiments, the beam selection module 406 is configured to determine the signal quality of a beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.
The keyword identification module 408 is configured to input the beams into the pre-trained keyword identification model to obtain the probability that each beam contains a keyword. The keyword identification module 408 has the same function as the keyword identification module 304.
The sound source determining module 410 is configured to determine the beam pointing in the sound source direction as the sound source beam according to the probability that each beam contains the keyword and the signal quality of the beam. The sound source determining module 410 has the same function as the sound source determining module 306.
The voice wake-up module 412 is configured to determine whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments. The voice wake-up module 412 has the same function as the voice wake-up module 308.
The model training module 414 is configured to perform beamforming on a voice signal in multiple predetermined directions to obtain multiple beams, to label the beams with keywords to form training beams, and to input the training beams into the keyword identification model for training, thereby obtaining the pre-trained keyword identification model.
The model training module 414 can also receive the multiple beams obtained by the beamforming module 404, or the beams selected by the beam selection module 406, label these beams with keywords to form training beams, and input the training beams into the keyword identification model for training, thereby obtaining the pre-trained keyword identification model.
The voice wake-up devices in the embodiments of the present disclosure can each be implemented by various computing devices or computer systems, as described below with reference to Fig. 5 and Fig. 6.
Fig. 5 is a structural diagram of some embodiments of the voice wake-up device of the present disclosure. As shown in Fig. 5, the device 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510. The processor 520 is configured to execute, based on instructions stored in the memory 510, the voice awakening method of any of the embodiments of the present disclosure.
The memory 510 may include, for example, a system memory and a fixed non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 6 is a structural diagram of other embodiments of the voice wake-up device of the present disclosure. As shown in Fig. 6, the device 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. It may also include an input/output interface 630, a network interface 640, a storage interface 650, and so on. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, by a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices, for example for connecting to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The present disclosure also provides a computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice awakening method of any of the foregoing embodiments.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.
Claims (22)
1. A voice awakening method, comprising:
performing beamforming on a voice signal in multiple predetermined directions to obtain multiple beams;
inputting the beams into a pre-trained keyword identification model to obtain the probability that each beam contains a keyword;
determining, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing in the sound source direction as a sound source beam;
determining whether to wake up a system according to feature matching results of the sound source beams at multiple consecutive moments.
2. The voice awakening method according to claim 1, wherein
inputting the beams into the pre-trained keyword identification model comprises:
selecting some of the beams according to the signal quality of the beams and inputting the selected beams into the pre-trained keyword identification model.
3. The voice awakening method according to claim 2, wherein
selecting some of the beams according to the signal quality of the beams comprises:
determining the signal quality of each beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window;
selecting the beams whose signal quality is higher than a signal quality threshold.
4. The voice awakening method according to claim 1, wherein
determining, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing in the sound source direction as the sound source beam comprises:
weighting and summing the probability that the beam contains the keyword and the signal quality of the beam to obtain the importance of the beam;
selecting the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
5. The voice awakening method according to claim 1, wherein
determining whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments comprises:
matching the sound source directions pointed to by the sound source beams at the multiple consecutive moments, and determining whether the sound source beam at each of the multiple consecutive moments contains the keyword;
waking up the system in the case where the sound source directions pointed to by the sound source beams at the multiple consecutive moments are consistent and the sound source beam at each of the multiple consecutive moments contains the keyword.
6. The voice awakening method according to claim 1, wherein
performing beamforming on the voice signal in multiple predetermined directions to obtain multiple beams comprises:
determining the weight, relative to a predetermined direction, of each voice signal received by the microphones according to the direction of point-source noise, the ratio of point-source noise to white noise, and the steering vector of the predetermined direction;
performing a weighted summation of the voice signals received by the microphones according to the weight of each voice signal relative to the predetermined direction, to determine the beam for the predetermined direction.
7. The voice awakening method according to claim 6, wherein
the weight of each voice signal received by the microphones relative to the predetermined direction is calculated according to a formula in which:
W_m(k) is the weight vector, relative to the predetermined direction, of the voice signals received by the microphones in the m-th beam process; k is the index of the frequency band of the signals received by the microphones; the formula uses the covariance matrix of the noise in the m-th beam process and its inverse matrix, the microphone array steering vector of the predetermined direction in the m-th beam process and its conjugate transpose, and the direction vector of the preset-direction point-source interference noise in the m-th beam process and its conjugate transpose; α_psn is the proportion of preset-direction point-source interference noise in the noise; and 1 - α_psn is the proportion of white noise in the noise.
8. The voice awakening method according to claim 1, further comprising:
performing beamforming on a voice signal in multiple predetermined directions to obtain multiple beams;
labeling the multiple beams with keywords to form training beams;
inputting the training beams into the keyword identification model for training, to obtain the pre-trained keyword identification model.
9. The voice awakening method according to claim 1, wherein
before performing beamforming on the voice signal in multiple predetermined directions, the method further comprises:
performing echo cancellation on the voice signal received through the microphones.
10. The voice awakening method according to any one of claims 1-9, wherein
the keyword identification model includes a deep learning model or a hidden Markov model.
11. A voice wake-up device, comprising:
a beamforming module, configured to perform beamforming on a voice signal in multiple predetermined directions to obtain multiple beams;
a keyword identification module, configured to input the beams into a pre-trained keyword identification model to obtain the probability that each beam contains a keyword;
a sound source determining module, configured to determine, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing in the sound source direction as a sound source beam;
a voice wake-up module, configured to determine whether to wake up a system according to feature matching results of the sound source beams at multiple consecutive moments.
12. The voice wake-up device according to claim 11, further comprising:
a beam selection module, configured to select some of the beams according to the signal quality of the beams and send them to the keyword identification module, so that the keyword identification module inputs the received beams into the pre-trained keyword identification model.
13. The voice wake-up device according to claim 12, wherein
the beam selection module is configured to determine the signal quality of each beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.
14. The voice wake-up device according to claim 11, wherein
the sound source determining module is configured to weight and sum the probability that the beam contains the keyword and the signal quality of the beam to obtain the importance of the beam, and to select the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
15. The voice wake-up device according to claim 11, wherein
the voice wake-up module is configured to match the sound source directions pointed to by the sound source beams at the multiple consecutive moments, to determine whether the sound source beam at each of the multiple consecutive moments contains the keyword, and to wake up the system in the case where the sound source directions pointed to by the sound source beams at the multiple consecutive moments are consistent and the sound source beam at each of the multiple consecutive moments contains the keyword.
16. The voice wake-up device according to claim 11, wherein
the beamforming module is configured to determine the weight, relative to a predetermined direction, of each voice signal received by the microphones according to the direction of point-source noise, the ratio of point-source noise to white noise, and the steering vector of the predetermined direction, and to perform a weighted summation of the voice signals received by the microphones according to the weight of each voice signal relative to the predetermined direction, to determine the beam for the predetermined direction.
17. The voice wake-up device according to claim 16, wherein
the weight of each voice signal received by the microphones relative to the predetermined direction is calculated according to a formula in which:
W_m(k) is the weight vector, relative to the predetermined direction, of the voice signals received by the microphones in the m-th beam process; k is the index of the frequency band of the signals received by the microphones; the formula uses the covariance matrix of the noise in the m-th beam process and its inverse matrix, the microphone array steering vector of the predetermined direction in the m-th beam process and its conjugate transpose, and the direction vector of the preset-direction point-source interference noise in the m-th beam process and its conjugate transpose; α_psn is the proportion of preset-direction point-source interference noise in the noise; and 1 - α_psn is the proportion of white noise in the noise.
18. The voice wake-up device according to claim 11, further comprising:
a model training module, configured to perform beamforming on a voice signal in multiple predetermined directions to obtain multiple beams, to label the multiple beams with keywords to form training beams, and to input the training beams into the keyword identification model for training, to obtain the pre-trained keyword identification model.
19. The voice wake-up device according to claim 11, further comprising:
an echo cancellation module, configured to perform echo cancellation on the voice signal received through the microphones.
20. The voice wake-up device according to any one of claims 11-19, wherein
the keyword identification model includes a deep learning model or a hidden Markov model.
21. A voice wake-up device, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the voice awakening method according to any one of claims 1-10.
22. A computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810992991.4A CN109272989B (en) | 2018-08-29 | 2018-08-29 | Voice wake-up method, apparatus and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810992991.4A CN109272989B (en) | 2018-08-29 | 2018-08-29 | Voice wake-up method, apparatus and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109272989A true CN109272989A (en) | 2019-01-25 |
CN109272989B CN109272989B (en) | 2021-08-10 |
Family
ID=65154643
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810992991.4A Active CN109272989B (en) | 2018-08-29 | 2018-08-29 | Voice wake-up method, apparatus and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109272989B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109920433A (en) * | 2019-03-19 | 2019-06-21 | 上海华镇电子科技有限公司 | The voice awakening method of electronic equipment under noisy environment |
CN109949810A (en) * | 2019-03-28 | 2019-06-28 | 华为技术有限公司 | A kind of voice awakening method, device, equipment and medium |
CN110265020A (en) * | 2019-07-12 | 2019-09-20 | 大象声科(深圳)科技有限公司 | Voice awakening method, device and electronic equipment, storage medium |
CN110277093A (en) * | 2019-07-30 | 2019-09-24 | 腾讯科技(深圳)有限公司 | The detection method and device of audio signal |
CN110517682A (en) * | 2019-09-02 | 2019-11-29 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, equipment and storage medium |
WO2020001163A1 (en) * | 2018-06-28 | 2020-01-02 | 腾讯科技(深圳)有限公司 | Method and device for speech recognition, computer device, and electronic device |
CN110797051A (en) * | 2019-10-28 | 2020-02-14 | 星络智能科技有限公司 | Awakening threshold setting method and device, intelligent sound box and storage medium |
CN111276143A (en) * | 2020-01-21 | 2020-06-12 | 北京远特科技股份有限公司 | Sound source positioning method and device, voice recognition control method and terminal equipment |
WO2020164397A1 (en) * | 2019-02-12 | 2020-08-20 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN111667843A (en) * | 2019-03-05 | 2020-09-15 | 北京京东尚科信息技术有限公司 | Voice wake-up method and system for terminal equipment, electronic equipment and storage medium |
CN111755021A (en) * | 2019-04-01 | 2020-10-09 | 北京京东尚科信息技术有限公司 | Speech enhancement method and device based on binary microphone array |
CN111833901A (en) * | 2019-04-23 | 2020-10-27 | 北京京东尚科信息技术有限公司 | Audio processing method, audio processing apparatus, audio processing system, and medium |
CN111883162A (en) * | 2020-07-24 | 2020-11-03 | 杨汉丹 | Awakening method and device and computer equipment |
CN112216295A (en) * | 2019-06-25 | 2021-01-12 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
CN113257269A (en) * | 2021-04-21 | 2021-08-13 | 瑞芯微电子股份有限公司 | Beam forming method based on deep learning and storage device |
CN113284505A (en) * | 2021-04-21 | 2021-08-20 | 瑞芯微电子股份有限公司 | Adaptive beam forming method and storage device |
CN113782009A (en) * | 2021-11-10 | 2021-12-10 | 中科南京智能技术研究院 | Voice awakening system based on Savitzky-Golay filter smoothing method |
CN114257684A (en) * | 2021-12-17 | 2022-03-29 | 歌尔科技有限公司 | Voice processing method, system and device and electronic equipment |
CN116504264A (en) * | 2023-06-30 | 2023-07-28 | 小米汽车科技有限公司 | Audio processing method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013137900A1 (en) * | 2012-03-16 | 2013-09-19 | Nuance Communictions, Inc. | User dedicated automatic speech recognition |
CN104936091A (en) * | 2015-05-14 | 2015-09-23 | 科大讯飞股份有限公司 | Intelligent interaction method and system based on circle microphone array |
CN106483502A (en) * | 2016-09-23 | 2017-03-08 | 科大讯飞股份有限公司 | A kind of sound localization method and device |
Non-Patent Citations (1)
Title |
---|
CHAO PAN, JINGDONG CHEN, JACOB BENESTY: "Performance Study of the MVDR Beam-former as a Function of the Source Incidence Angle", 《IEEE TRANSACTIONS ON AUDIO,SPEECH AND LANGUAGE PROCESSING 2014》 * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020001163A1 (en) * | 2018-06-28 | 2020-01-02 | 腾讯科技(深圳)有限公司 | Method and device for speech recognition, computer device, and electronic device |
US11217229B2 (en) * | 2018-06-28 | 2022-01-04 | Tencent Technology (Shenzhen) Company Ltd | Method and apparatus for speech recognition, and electronic device |
WO2020164397A1 (en) * | 2019-02-12 | 2020-08-20 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN111627425A (en) * | 2019-02-12 | 2020-09-04 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN111627425B (en) * | 2019-02-12 | 2023-11-28 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN111667843B (en) * | 2019-03-05 | 2021-12-31 | 北京京东尚科信息技术有限公司 | Voice wake-up method and system for terminal equipment, electronic equipment and storage medium |
CN111667843A (en) * | 2019-03-05 | 2020-09-15 | 北京京东尚科信息技术有限公司 | Voice wake-up method and system for terminal equipment, electronic equipment and storage medium |
CN109920433B (en) * | 2019-03-19 | 2021-08-20 | 上海华镇电子科技有限公司 | Voice awakening method of electronic equipment in noisy environment |
CN109920433A (en) * | 2019-03-19 | 2019-06-21 | 上海华镇电子科技有限公司 | Voice awakening method of electronic equipment in noisy environment |
CN109949810A (en) * | 2019-03-28 | 2019-06-28 | 华为技术有限公司 | Voice wake-up method, device, equipment and medium |
CN109949810B (en) * | 2019-03-28 | 2021-09-07 | 荣耀终端有限公司 | Voice wake-up method, device, equipment and medium |
WO2020192721A1 (en) * | 2019-03-28 | 2020-10-01 | 华为技术有限公司 | Voice awakening method and apparatus, and device and medium |
CN111755021A (en) * | 2019-04-01 | 2020-10-09 | 北京京东尚科信息技术有限公司 | Speech enhancement method and device based on binary microphone array |
CN111755021B (en) * | 2019-04-01 | 2023-09-01 | 北京京东尚科信息技术有限公司 | Voice enhancement method and device based on binary microphone array |
CN111833901A (en) * | 2019-04-23 | 2020-10-27 | 北京京东尚科信息技术有限公司 | Audio processing method, audio processing apparatus, audio processing system, and medium |
CN111833901B (en) * | 2019-04-23 | 2024-04-05 | 北京京东尚科信息技术有限公司 | Audio processing method, audio processing device, system and medium |
CN112216295A (en) * | 2019-06-25 | 2021-01-12 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
CN112216295B (en) * | 2019-06-25 | 2024-04-26 | 大众问问(北京)信息科技有限公司 | Sound source positioning method, device and equipment |
CN110265020A (en) * | 2019-07-12 | 2019-09-20 | 大象声科(深圳)科技有限公司 | Voice awakening method and device, electronic equipment, and storage medium |
WO2021008000A1 (en) * | 2019-07-12 | 2021-01-21 | 大象声科(深圳)科技有限公司 | Voice wakeup method and apparatus, electronic device and storage medium |
CN110277093B (en) * | 2019-07-30 | 2021-10-26 | 腾讯科技(深圳)有限公司 | Audio signal detection method and device |
CN110277093A (en) * | 2019-07-30 | 2019-09-24 | 腾讯科技(深圳)有限公司 | Audio signal detection method and device |
CN110517682A (en) * | 2019-09-02 | 2019-11-29 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, equipment and storage medium |
CN110797051A (en) * | 2019-10-28 | 2020-02-14 | 星络智能科技有限公司 | Awakening threshold setting method and device, intelligent sound box and storage medium |
CN111276143A (en) * | 2020-01-21 | 2020-06-12 | 北京远特科技股份有限公司 | Sound source positioning method and device, voice recognition control method and terminal equipment |
CN111883162A (en) * | 2020-07-24 | 2020-11-03 | 杨汉丹 | Awakening method and device and computer equipment |
CN113257269A (en) * | 2021-04-21 | 2021-08-13 | 瑞芯微电子股份有限公司 | Beam forming method based on deep learning and storage device |
CN113284505A (en) * | 2021-04-21 | 2021-08-20 | 瑞芯微电子股份有限公司 | Adaptive beam forming method and storage device |
CN113782009A (en) * | 2021-11-10 | 2021-12-10 | 中科南京智能技术研究院 | Voice awakening system based on Savitzky-Golay filter smoothing method |
CN114257684A (en) * | 2021-12-17 | 2022-03-29 | 歌尔科技有限公司 | Voice processing method, system and device and electronic equipment |
CN116504264B (en) * | 2023-06-30 | 2023-10-31 | 小米汽车科技有限公司 | Audio processing method, device, equipment and storage medium |
CN116504264A (en) * | 2023-06-30 | 2023-07-28 | 小米汽车科技有限公司 | Audio processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109272989B (en) | 2021-08-10 |
Similar Documents
Publication | Title |
---|---|
CN109272989A (en) | Voice awakening method, device and computer readable storage medium |
CN110491403B (en) | Audio signal processing method, device, medium and audio interaction equipment |
JP7434137B2 (en) | Speech recognition method, device, equipment and computer readable storage medium |
JP7337953B2 (en) | Speech recognition method and device, neural network training method and device, and computer program |
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device |
CN107703486B (en) | Sound source positioning method based on convolutional neural network CNN |
CN110444214B (en) | Speech signal processing model training method and device, electronic equipment and storage medium |
CN110364166B (en) | Electronic equipment for realizing speech signal recognition |
CN110503971A (en) | Neural-network-based time-frequency mask estimation and beamforming for speech processing |
CN105068048B (en) | Distributed microphone array sound localization method based on spatial sparsity |
CN110503970A (en) | Audio data processing method, device and storage medium |
CN110556103A (en) | Audio signal processing method, apparatus, system, device and storage medium |
Dorfan et al. | Tree-based recursive expectation-maximization algorithm for localization of acoustic sources |
CN108417224B (en) | Training and recognition method and system of bidirectional neural network model |
WO2019080551A1 (en) | Target voice detection method and apparatus |
CN108122563A (en) | Method for improving voice wake-up rate and correcting DOA |
US20150117649A1 (en) | Selective Audio Source Enhancement |
CN108735199B (en) | Self-adaptive training method and system of acoustic model |
CN108269567A (en) | Method, apparatus, computing device and computer readable storage medium for generating far-field voice data |
CN110211599A (en) | Application wake-up method, device, storage medium and electronic equipment |
CN110400571A (en) | Audio processing method, device, storage medium and electronic equipment |
CN111667843B (en) | Voice wake-up method and system for terminal equipment, electronic equipment and storage medium |
CN112904279A (en) | Sound source positioning method based on convolutional neural network and sub-band SRP-PHAT space spectrum |
CN109859769A (en) | Mask estimation method and device |
Chang et al. | Audio adversarial examples generation with recurrent neural networks |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |