CN109272989A - Voice awakening method, device and computer readable storage medium - Google Patents

Voice awakening method, device and computer readable storage medium

Publication number
CN109272989A
Authority
CN
China
Prior art keywords
wave beam
voice
keyword
sound source
wave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810992991.4A
Other languages
Chinese (zh)
Other versions
CN109272989B (en)
Inventor
徐晴晴
陈宇
杨楠
耿岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810992991.4A
Publication of CN109272989A
Application granted
Publication of CN109272989B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present disclosure relates to a voice wake-up method, a device, and a computer-readable storage medium, and belongs to the field of computer technology. The disclosed method includes: performing beamforming on a voice signal in a plurality of predetermined directions to obtain a plurality of beams; inputting the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that the beam contains a keyword; determining, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing in the sound source direction as the sound source beam; and determining whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments. The disclosure does not follow the existing flow of sound source localization followed by voice wake-up. It decouples the beamforming algorithm from the sound source localization algorithm, which avoids the influence of localization accuracy on the steering of the beamforming algorithm, and thereby improves the wake-up accuracy of the voice system and the user experience.

Description

Voice awakening method, device and computer readable storage medium
Technical field
The present disclosure relates to the field of computer technology, and in particular to a voice wake-up method, a device, and a computer-readable storage medium.
Background art
With the development of computer technology, the need for humans to exchange information with machines has become increasingly urgent. Speech, one of the most natural modes of human interaction, has become one of the most important candidates for replacing the mouse and keyboard when communicating with computers. As demand grows for intelligent terminals such as smart homes, smart vehicles, and smart conferencing systems, intelligent voice wake-up technology, which serves as the entry point to such terminals, is receiving more and more attention.
Voice communication is subject to interference introduced by the surrounding environment and the transmission medium (for example echo, reverberation, and interfering sound sources), which sharply reduces a computer's ability to understand speech. Because noise arrives from all directions, capturing clean speech with a single microphone is extremely difficult. Current voice wake-up systems are therefore mainly based on microphone arrays: the speech captured by multiple microphones is processed in the spatio-temporal domain to suppress noise and enhance the speech.
Voice wake-up methods known to the inventors generally include the following steps: collecting voice signals with a microphone array; preprocessing the voice signals; determining the angle and bearing of the sound source by sound source localization and tracking; generating, by beamforming, a beam that points at the angle and bearing of the sound source; and passing the formed beam to a speech recognition system for recognition to determine whether to wake up the system.
Summary of the invention
The inventors found that current sound source localization techniques fall into three classes by localization principle: steered beamforming based on maximum output power, techniques based on time difference of arrival, and localization based on high-resolution spectral estimation. The performance of all three classes degrades sharply under strong reverberation and noise interference, so the angle and bearing of the sound source cannot be located accurately. This directly affects the subsequent speech recognition and hence the voice wake-up result.
One technical problem to be solved by the present disclosure is how to improve the accuracy of voice wake-up and thereby the user experience.
According to some embodiments of the present disclosure, a voice wake-up method is provided, comprising: performing beamforming on a voice signal in a plurality of predetermined directions to obtain a plurality of beams; inputting the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that the beam contains a keyword; determining, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing in the sound source direction as the sound source beam; and determining whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments.
In some embodiments, inputting the beams into the pre-trained keyword recognition model includes: selecting a subset of the beams according to their signal quality, and inputting the selected beams into the pre-trained keyword recognition model.
In some embodiments, selecting a subset of the beams according to their signal quality includes: determining the signal quality of each beam according to at least one of its energy and its signal-to-noise ratio within a fixed time window; and selecting the beams whose signal quality is higher than a signal quality threshold.
In some embodiments, determining the beam pointing in the sound source direction as the sound source beam, according to the probability that each beam contains the keyword and the signal quality of the beam, includes: computing a weighted sum of the probability that a beam contains the keyword and the signal quality of the beam to obtain the importance of the beam; and selecting the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
In some embodiments, determining whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments includes: matching the sound source directions pointed to by the sound source beams at the plurality of consecutive moments, and determining whether the sound source beams at those moments contain the keyword; and waking up the system when the sound source directions pointed to by the sound source beams at the consecutive moments are consistent and the sound source beams at all of those moments contain the keyword.
In some embodiments, performing beamforming on the voice signal in a plurality of predetermined directions to obtain a plurality of beams includes: determining the weight of each voice signal channel received by the microphones with respect to a predetermined direction according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of the predetermined direction; and computing, according to those weights, a weighted sum of the voice signal channels received by the microphones to determine the beam in the predetermined direction.
In some embodiments, the weight of each voice signal channel received by the microphones with respect to the predetermined direction is calculated according to the following formulas:
$$W_m(k) = \frac{R_m^{-1}(k)\,D_m(k)}{D_m^{H}(k)\,R_m^{-1}(k)\,D_m(k)}, \qquad R_m(k) = \alpha_{psn}\,V_m(k)\,V_m^{H}(k) + (1-\alpha_{psn})\,I$$
where $W_m(k)$ is the weight vector, with respect to the predetermined direction, of the voice signal channels received by the microphones in the m-th beam processing; k is the index of the frequency band of the signals received by the microphones; $R_m(k)$ is the covariance matrix of the noise in the m-th beam processing and $R_m^{-1}(k)$ is its inverse; $D_m(k)$ is the microphone array steering vector for the predetermined direction in the m-th beam processing and $D_m^{H}(k)$ is its conjugate transpose; $\alpha_{psn}$ is the proportion of fixed-bearing point-source interference noise in the noise and $1-\alpha_{psn}$ is the proportion of white noise in the noise; $V_m(k)$ is the direction vector of the fixed-bearing point-source interference noise in the m-th beam processing and $V_m^{H}(k)$ is its conjugate transpose; and $I$ is the identity matrix.
In some embodiments, the method further includes: performing beamforming on voice signals in the plurality of predetermined directions to obtain a plurality of beams; labeling the beams with keyword annotations to form training beams; and training the keyword recognition model on the training beams to obtain the pre-trained keyword recognition model.
In some embodiments, before beamforming is performed on the voice signal in the plurality of predetermined directions, the method further includes: performing echo cancellation on the voice signals received by the microphones.
In some embodiments, the keyword recognition model includes a deep learning model or a hidden Markov model.
According to other embodiments of the present disclosure, a voice wake-up device is provided, comprising: a beamforming module configured to perform beamforming on a voice signal in a plurality of predetermined directions to obtain a plurality of beams; a keyword recognition module configured to input the beams into a pre-trained keyword recognition model to obtain, for each beam, the probability that the beam contains a keyword; a sound source determining module configured to determine, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing in the sound source direction as the sound source beam; and a voice wake-up module configured to determine whether to wake up the system according to the feature matching result of the sound source beams at a plurality of consecutive moments.
In some embodiments, the device further includes: a beam selection module configured to select a subset of the beams according to their signal quality and send the selected beams to the keyword recognition module, so that the keyword recognition module inputs the received beams into the pre-trained keyword recognition model.
In some embodiments, the beam selection module is configured to determine the signal quality of each beam according to at least one of its energy and its signal-to-noise ratio within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.
In some embodiments, the sound source determining module is configured to compute a weighted sum of the probability that a beam contains the keyword and the signal quality of the beam to obtain the importance of the beam, and to select the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
In some embodiments, the voice wake-up module is configured to match the sound source directions pointed to by the sound source beams at the plurality of consecutive moments, to determine whether the sound source beams at those moments contain the keyword, and to wake up the system when the sound source directions are consistent and the sound source beams at all of those moments contain the keyword.
In some embodiments, the beamforming module is configured to determine the weight of each voice signal channel received by the microphones with respect to a predetermined direction according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of the predetermined direction, and to compute, according to those weights, a weighted sum of the voice signal channels received by the microphones to determine the beam in the predetermined direction.
In some embodiments, the weight of each voice signal channel received by the microphones with respect to the predetermined direction is calculated according to the following formulas:
$$W_m(k) = \frac{R_m^{-1}(k)\,D_m(k)}{D_m^{H}(k)\,R_m^{-1}(k)\,D_m(k)}, \qquad R_m(k) = \alpha_{psn}\,V_m(k)\,V_m^{H}(k) + (1-\alpha_{psn})\,I$$
where $W_m(k)$ is the weight vector, with respect to the predetermined direction, of the voice signal channels received by the microphones in the m-th beam processing; k is the index of the frequency band of the signals received by the microphones; $R_m(k)$ is the covariance matrix of the noise in the m-th beam processing and $R_m^{-1}(k)$ is its inverse; $D_m(k)$ is the microphone array steering vector for the predetermined direction in the m-th beam processing and $D_m^{H}(k)$ is its conjugate transpose; $\alpha_{psn}$ is the proportion of fixed-bearing point-source interference noise in the noise and $1-\alpha_{psn}$ is the proportion of white noise in the noise; $V_m(k)$ is the direction vector of the fixed-bearing point-source interference noise in the m-th beam processing and $V_m^{H}(k)$ is its conjugate transpose; and $I$ is the identity matrix.
In some embodiments, the device further includes: a model training module configured to perform beamforming on voice signals in the plurality of predetermined directions to obtain a plurality of beams, to label the beams with keyword annotations to form training beams, and to train the keyword recognition model on the training beams to obtain the pre-trained keyword recognition model.
In some embodiments, the device further includes: an echo cancellation module configured to perform echo cancellation on the voice signals received by the microphones.
In some embodiments, the keyword recognition model includes a deep learning model or a hidden Markov model.
According to still other embodiments of the present disclosure, a voice wake-up device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute the voice wake-up method of any of the foregoing embodiments based on instructions stored in the memory.
According to yet other embodiments of the present disclosure, a computer-readable storage medium is provided on which a computer program is stored, wherein the program, when executed by a processor, implements the voice wake-up method of any of the foregoing embodiments.
In the present disclosure, a voice signal is beamformed in a plurality of directions to obtain a plurality of beams; the beams are input into a keyword recognition model, which identifies the probability that each beam contains a keyword; the sound source beam is then selected based on the probability that each beam contains the keyword and the signal quality of the beam; and whether to wake up the system is determined from the feature matching result of the sound source beams at a plurality of moments. The disclosure does not follow the existing flow of sound source localization followed by voice wake-up. It decouples the beamforming algorithm from the sound source localization algorithm, which avoids the influence of localization accuracy on the steering of the beamforming algorithm, and thereby improves the wake-up accuracy of the voice system and the user experience.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows a flow diagram of the voice wake-up method of some embodiments of the present disclosure.
Fig. 2 shows a flow diagram of the voice wake-up method of other embodiments of the present disclosure.
Fig. 3 shows a structural diagram of the voice wake-up device of some embodiments of the present disclosure.
Fig. 4 shows a structural diagram of the voice wake-up device of other embodiments of the present disclosure.
Fig. 5 shows a structural diagram of the voice wake-up device of still other embodiments of the present disclosure.
Fig. 6 shows a structural diagram of the voice wake-up device of yet other embodiments of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the disclosure, its application, or its use. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
The present disclosure provides a voice wake-up method. Some embodiments of the method are described below with reference to Fig. 1.
Fig. 1 is a flowchart of some embodiments of the voice wake-up method of the present disclosure. As shown in Fig. 1, the method of this embodiment includes steps S102 to S108.
In step S102, beamforming is performed on the voice signal in a plurality of predetermined directions to obtain a plurality of beams.
The speech recognition system that is woken up by voice may be provided with multiple microphones, i.e. a microphone array, to receive the user's voice signal. The voice signal may first be preprocessed, for example by applying echo cancellation to the signals received by the microphone array using AEC (Acoustic Echo Cancellation).
The preprocessed voice signal may be beamformed in the plurality of predetermined directions using a phased-array beamforming algorithm. Here, phased array means that M bearings are preset (evenly distributed around a circle) and the multi-channel voice signal received by the microphone array undergoes M weighted-sum operations, forming M channels of voice signal, each enhanced for a specific bearing. For example, M directions evenly distributed on a circle may be predetermined as the beamforming directions, so that the formed beams point in the M predetermined directions. The beamforming algorithm may be, for example, MVDR (Minimum Variance Distortionless Response), GSC (Generalized Sidelobe Canceller), or TF-GSC (Transfer Function Generalized Sidelobe Canceller). Beamforming in a plurality of predetermined directions can be implemented with existing algorithms and is not described further here.
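The M predetermined directions and the per-direction steering vectors can be sketched as follows. This is only an illustration: the uniform circular array geometry, the far-field plane-wave model, the speed of sound, the phase sign convention, and the names `beam_azimuths` and `steering_vector` are all assumptions that do not come from the disclosure.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the source


def beam_azimuths(num_beams):
    """Return num_beams look directions (radians), evenly spaced on a circle."""
    return [2.0 * math.pi * m / num_beams for m in range(num_beams)]


def steering_vector(azimuth, mic_angles, radius, freq_hz):
    """Far-field steering vector of a uniform circular array (assumed geometry).

    Each entry is the unit-magnitude phase factor at one microphone for a
    plane wave arriving from `azimuth`, measured relative to the array centre.
    """
    vec = []
    for phi in mic_angles:
        # Arrival-time advance at this microphone relative to the centre.
        advance = radius * math.cos(azimuth - phi) / SPEED_OF_SOUND
        vec.append(cmath.exp(2j * math.pi * freq_hz * advance))
    return vec


azimuths = beam_azimuths(6)        # M = 6 predetermined directions
mic_angles = beam_azimuths(4)      # 4 microphones, also evenly spaced (assumed)
d = steering_vector(azimuths[1], mic_angles, radius=0.04, freq_hz=1000.0)
```

One steering vector is needed per predetermined direction and per frequency band; a real array would use its own measured geometry instead of the values assumed here.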
The present disclosure also provides an improved beamforming algorithm, described below.
In some embodiments, the weight of each voice signal channel received by the microphones with respect to a predetermined direction is determined according to the direction of the point-source noise, the proportions of point-source noise and white noise, and the steering vector of the predetermined direction; a weighted sum of the voice signal channels received by the microphones is then computed according to those weights to determine the beam in the predetermined direction. Beamforming can be carried out according to the following formulas.
$$X_n(k,l) = \mathrm{fft}(x_n(t)) \tag{1}$$
In formula (1), $x_n(t)$ is the voice signal received by the n-th microphone, and fft(·) denotes applying the fast Fourier transform (FFT) to the voice signal. $X_n(k,l)$ is the short-time FFT (SFFT) amplitude of $x_n(t)$ in the k-th frequency band of the l-th time segment, where l indexes the time segments obtained by windowing the voice signal, each of which is processed separately, and k indexes the frequency bands of each segment after the FFT.
$$Y_m(k,l) = \sum_{n=1}^{N} W_{m,n}(k,l)\,X_n(k,l) \tag{2}$$
$$y_m(t) = \mathrm{ifft}(Y_m(k,l)) \tag{3}$$
In formulas (2) and (3), $y_m(t)$ is the output signal of the beam for the m-th preset bearing formed by the phased-array beamformer, ifft(·) denotes the inverse fast Fourier transform, and $Y_m(k,l)$ is the SFFT amplitude of $y_m(t)$ in the k-th frequency band of the l-th time segment. $W_{m,n}(k,l)$ is the weight, in the m-th beam processing, of the voice signal received by the n-th microphone in the k-th frequency band of the l-th time segment.
It can be seen from the formulas above that once $W_{m,n}(k,l)$ is determined, the signal representation of the beam m in the predetermined direction is determined.
$$W_m(k) = \frac{R_m^{-1}(k)\,D_m(k)}{D_m^{H}(k)\,R_m^{-1}(k)\,D_m(k)} \tag{4}$$
In formula (4), $W_m(k)$ is the weight vector, with respect to the predetermined direction, of the voice signal channels received by the microphones in the m-th beam processing. $W_m(k)$ is an N-dimensional vector, $W_m(k) = [W_{m,1}(k), W_{m,2}(k), \ldots, W_{m,N}(k)]^{T}$. The weight vector can be regarded as the same for every time segment, i.e. $W_{m,n}(k,l) = W_{m,n}(k)$, so that $W_{m,n}(k)$, the weight in the m-th beam processing of the voice signal received by the n-th microphone in the k-th frequency band, can be obtained. $R_m(k)$ is the covariance matrix of the noise in the m-th beam processing, and $R_m^{-1}(k)$ is its inverse. $D_m(k)$ is the microphone array steering vector for the bearing to be enhanced (i.e. the preset bearing) in the m-th beam processing; it is an N-dimensional column vector, and $D_m^{H}(k)$ is its conjugate transpose. $D_m(k)$ can be determined by setting the predetermined direction.
$$R_m(k) = \alpha_{psn}\,V_m(k)\,V_m^{H}(k) + (1-\alpha_{psn})\,I \tag{5}$$
$R_m(k)$ can be obtained from formula (5). $\alpha_{psn}$ is the proportion of fixed-bearing point-source interference noise in the noise, and $1-\alpha_{psn}$ is the proportion of white noise in the noise; $\alpha_{psn}$ can be obtained by testing or from experience. $V_m(k)$ is the direction vector of the fixed-bearing point-source interference noise in the m-th beam processing, $V_m^{H}(k)$ is its conjugate transpose, and $I$ is the identity matrix. $V_m(k)$ can be obtained by testing or from experience.
The beam signal in each predetermined direction can be calculated with the formulas above. The beamforming processes for the different directions can be executed in parallel to obtain the plurality of beams.
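The weight computation for one beam and one frequency band can be sketched in plain Python as follows. The sketch assumes the standard MVDR form, in which the weights equal the inverse noise covariance applied to the steering vector, divided by a normalizing scalar, and it assumes a noise covariance made of a rank-one point-source term weighted by `alpha_psn` plus a scaled identity for white noise. Because that covariance is a rank-one update of a scaled identity, its inverse is applied in closed form with the Sherman-Morrison identity instead of a general matrix inversion. The helper names (`vdot`, `mvdr_weights`, `plane_wave`) and the toy vectors are illustrative, and responses are evaluated with conjugated weights (w^H x), which is the usual MVDR convention and is an assumption here.

```python
import cmath


def vdot(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def mvdr_weights(d, v, alpha_psn):
    """Return R^{-1} d / (d^H R^{-1} d) for R = alpha_psn * v v^H + (1 - alpha_psn) * I."""
    if not 0.0 <= alpha_psn < 1.0:
        raise ValueError("alpha_psn must be in [0, 1) so that R is invertible")
    beta = 1.0 - alpha_psn
    # Sherman-Morrison: R^{-1} d = (d - alpha * v * (v^H d) / (beta + alpha * v^H v)) / beta
    scale = alpha_psn * vdot(v, d) / (beta + alpha_psn * vdot(v, v).real)
    r_inv_d = [(di - scale * vi) / beta for di, vi in zip(d, v)]
    denom = vdot(d, r_inv_d)
    return [x / denom for x in r_inv_d]


def plane_wave(n_mics, phase_step):
    """Toy direction vector: a constant phase step between adjacent microphones."""
    return [cmath.exp(1j * phase_step * n) for n in range(n_mics)]


d = plane_wave(4, 0.0)    # look direction (all microphones in phase)
v = plane_wave(4, 0.5)    # point-source interference direction (assumed)
w = mvdr_weights(d, v, alpha_psn=0.9)

gain_look = abs(vdot(w, d))                       # response toward the look direction
gain_interf = abs(vdot(w, v))                     # response toward the interferer
gain_das = abs(vdot([x / len(d) for x in d], v))  # delay-and-sum response, for comparison
```

Under these assumptions the response toward the look direction stays at unity while the response toward the interferer is lower than that of a plain delay-and-sum beamformer. Since each beam and frequency band is independent, the computation can be repeated, or run in parallel, per predetermined direction.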
In step S104, the beams are input into the pre-trained keyword recognition model to obtain, for each beam, the probability that the beam contains a keyword.
The voice system decides, by recognizing a keyword in the speech, whether to record the subsequent speech and perform speech recognition on it; that is, whether to wake up the voice system is determined by detecting the keyword in the speech. The keyword recognition model is, for example, a deep learning model or a hidden Markov model. The deep learning model is, for example, a DNN (Deep Neural Network), an RNN (Recurrent Neural Network), or a CRNN (Convolutional Recurrent Neural Network). These are existing models and are not described further here. When the keyword recognition model is trained, a plurality of beams can be generated according to the embodiment of step S102 and labeled according to whether they contain the keyword to form training beams; the training beams are then input into the keyword recognition model for offline training to obtain the pre-trained keyword recognition model. Inputting a beam into the pre-trained keyword recognition model then yields the probability that the beam contains the keyword.
In step S106, the beam pointing in the sound source direction is determined as the sound source beam according to the probability that each beam contains the keyword and the signal quality of the beam.
In some embodiments, the signal quality of a beam is determined according to at least one of its energy and its signal-to-noise ratio (SNR) within a fixed time window. The higher the energy and the SNR of the beam within the fixed time window, the better its signal quality. For example, the energy and SNR of the beam within the fixed time window can be calculated and combined in a weighted sum to determine the signal quality of the beam. The weights of the energy and the SNR can be set according to actual needs. The energy and the SNR may first be normalized and then weighted.
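A minimal sketch of such a signal quality score is given below. The source does not say which normalization is used, so min-max normalization across beams is an assumption, as are the equal default weights and the function names `frame_energy`, `min_max_normalize`, and `signal_quality`.

```python
def frame_energy(samples):
    """Mean squared amplitude of a beam over a fixed time window."""
    return sum(s * s for s in samples) / len(samples)


def min_max_normalize(values):
    """Scale values to [0, 1] across beams (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def signal_quality(energies, snrs, w_energy=0.5, w_snr=0.5):
    """Weighted sum of normalized energy and normalized SNR, one score per beam."""
    e_norm = min_max_normalize(energies)
    s_norm = min_max_normalize(snrs)
    return [w_energy * e + w_snr * s for e, s in zip(e_norm, s_norm)]


window_energy = frame_energy([1.0, -1.0, 2.0, -2.0])
# Hypothetical per-beam energies and SNRs (dB) for three beams.
scores = signal_quality([0.2, 1.0, 0.6], [3.0, 15.0, 9.0])
```

Normalizing before weighting keeps the two quantities on a comparable scale, so the weights express relative importance rather than compensating for units.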
In some embodiments, a weighted sum of the probability that a beam contains the keyword and the signal quality of the beam is computed to obtain the importance of the beam; the beam with the highest importance is selected as the sound source beam, and the direction in which the sound source beam points is determined as the sound source direction. A beam in the sound source direction has better signal quality and a higher probability of being recognized as containing the keyword, so the sound source beam can be selected according to the probability that each beam contains the keyword and the signal quality of the beam. For example, the energy power′ and the signal-to-noise ratio SNR′ of each of the K beams within the fixed time window are calculated and normalized. The keyword recognition probability output by the keyword recognition model for the k-th beam is $NNScore_k$, and the importance of the k-th beam is then calculated as a weighted sum of $NNScore_k$ and the normalized energy and SNR of that beam.
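The selection of the sound source beam can be sketched as follows. The weights 0.6 and 0.4, the function names, and the sample values are illustrative assumptions; the source only states that the importance is a weighted sum and that the beam with the highest importance is chosen.

```python
def beam_importance(keyword_probs, qualities, w_keyword=0.6, w_quality=0.4):
    """Weighted sum of keyword probability and signal quality, one value per beam."""
    return [w_keyword * p + w_quality * q for p, q in zip(keyword_probs, qualities)]


def select_source_beam(keyword_probs, qualities, directions_deg, **weights):
    """Return (beam index, direction) of the beam with the highest importance."""
    importance = beam_importance(keyword_probs, qualities, **weights)
    best = max(range(len(importance)), key=importance.__getitem__)
    return best, directions_deg[best]


best, direction = select_source_beam(
    keyword_probs=[0.10, 0.85, 0.40, 0.05],   # hypothetical model outputs
    qualities=[0.30, 0.90, 0.95, 0.10],       # hypothetical signal quality scores
    directions_deg=[0, 90, 180, 270],         # the predetermined beam directions
)
```

In this toy input the beam at 180 degrees has the best signal quality but a modest keyword probability, so the beam at 90 degrees is selected; the direction estimate falls out of the beam choice without a separate localization step.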
In step S108, whether to wake up the system is determined according to the feature matching result of the sound source beams at a plurality of consecutive moments.
Whether to wake up the system could be determined directly from whether the keyword probability of the sound source beam exceeds a threshold. However, feature matching of the sound source beams at a plurality of consecutive moments can further improve the wake-up accuracy.
In some embodiments, the sound source directions pointed to by the sound source beams at the current moment and at a predetermined number of preceding consecutive moments are matched, and it is determined whether the sound source beams at those moments contain the keyword. If the sound source directions pointed to by the sound source beams at the consecutive moments are consistent and the sound source beams at all of those moments contain the keyword, the system is woken up; otherwise, it is not. In other words, whether the system is woken up is confirmed from the consistency of the keyword recognition and localization results at moment t and at each preceding moment (t-p, t-p+1, ..., t-1, t). If the keyword recognition and localization results at the successive moments are consistent, the system is woken up; otherwise, it cannot be woken up.
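The consistency check over consecutive moments can be sketched with a fixed-length history. The class name `WakeDecider`, the window length, and the probability threshold used to decide that a beam "contains the keyword" are assumptions made for illustration.

```python
from collections import deque


class WakeDecider:
    """Wake only if the last `history` moments agree on direction and keyword."""

    def __init__(self, history=3, prob_threshold=0.5):
        self.prob_threshold = prob_threshold      # assumed keyword decision threshold
        self.window = deque(maxlen=history)       # moments t-p, ..., t

    def update(self, direction, keyword_prob):
        """Record the sound source beam of the current moment and decide."""
        self.window.append((direction, keyword_prob))
        if len(self.window) < self.window.maxlen:
            return False                          # not enough history yet
        same_direction = len({d for d, _ in self.window}) == 1
        all_keyword = all(p >= self.prob_threshold for _, p in self.window)
        return same_direction and all_keyword


decider = WakeDecider(history=3)
observations = [(90, 0.9), (90, 0.8), (180, 0.9), (180, 0.7), (180, 0.95)]
results = [decider.update(d, p) for d, p in observations]
```

Here the system wakes only at the fifth moment, when three consecutive sound source beams point to the same direction and each exceeds the keyword threshold.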
In the method of the above embodiment, a voice signal is beamformed in a plurality of directions to obtain a plurality of beams; the beams are input into a keyword recognition model, which identifies the probability that each beam contains a keyword; the sound source beam is then selected based on the probability that each beam contains the keyword and the signal quality of the beam; and whether to wake up the system is determined from the feature matching result of the sound source beams at a plurality of moments. The method does not follow the existing flow of sound source localization followed by voice wake-up. It decouples the beamforming algorithm from the sound source localization algorithm, which avoids the influence of localization accuracy on the steering of the beamforming algorithm, and thereby improves the wake-up accuracy of the voice system and the user experience.
Other embodiments of the voice wake-up method of the present disclosure are described below with reference to Fig. 2.
Fig. 2 is a flowchart of other embodiments of the voice wake-up method of the present disclosure. As shown in Fig. 2, the method of this embodiment includes steps S202 to S214.
In step S202, the user's voice signal is received through a microphone array.
In step S204, echo cancellation is performed on the multi-channel voice signals received by the microphone array.
In step S206, the received voice signal is beamformed in multiple predetermined directions to obtain multiple beams.
In step S208, a subset of the beams is selected according to the signal quality of the beams.
In some embodiments, the signal quality of a beam is determined from at least one of the beam's energy and signal-to-noise ratio within a fixed time window, and the beams whose signal quality exceeds a signal quality threshold are selected. For example, the signal quality may be a weighted value of the beam's energy and signal-to-noise ratio within the fixed time window, with the weights of the energy and the signal-to-noise ratio set according to actual needs. Concretely, the energy (power) and the signal-to-noise ratio (SNR) of each beam within the fixed time window are computed and normalized, and the signal quality score of each beam, $score_k$ ($k = 1, 2, \ldots, M$), is then computed from the normalized values. The beams whose $score_k$ exceeds the signal quality threshold are selected; alternatively, the beams whose signal quality ranks within a preset number of top positions are selected.
Selecting the better-quality beams in this way reduces the computation required in subsequent steps and improves system efficiency and wake-up accuracy.
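The beam selection step can be sketched as below. The text does not give the normalization or the weights, so the min-max normalization, the default weight `w_power = 0.5`, and the function names are assumptions chosen only to make the example concrete.

```python
from typing import List, Optional, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; min-max scaling is an assumed normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all beams identical on this measure
    return [(v - lo) / (hi - lo) for v in values]


def quality_scores(power: Sequence[float], snr: Sequence[float],
                   w_power: float = 0.5) -> List[float]:
    """Weighted combination of normalized energy and normalized SNR per beam."""
    p_norm = min_max_normalize(power)
    s_norm = min_max_normalize(snr)
    return [w_power * p + (1.0 - w_power) * s for p, s in zip(p_norm, s_norm)]


def select_beams(scores: Sequence[float], threshold: Optional[float] = None,
                 top_n: Optional[int] = None) -> List[int]:
    """Return beam indices above the threshold, or the top_n best-scoring beams."""
    if threshold is not None:
        return [k for k, s in enumerate(scores) if s > threshold]
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:top_n])


# Four beams with energy and SNR measured over the same fixed time window.
power = [1.0, 4.0, 2.0, 3.0]
snr = [5.0, 20.0, 10.0, 15.0]
scores = quality_scores(power, snr)
by_threshold = select_beams(scores, threshold=0.5)
by_rank = select_beams(scores, top_n=2)
```

Both selection rules mentioned in the text are covered: a fixed threshold on the score, and a fixed number of top-ranked beams. Only the selected beams would then be passed to the keyword recognition model.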
In step S210, the selected beams are input to the pre-trained keyword recognition model to obtain the probability that each beam contains the keyword.
In step S212, the beam pointing to the sound source direction is determined according to the probability that each beam contains the keyword and the signal quality of the beam, and is taken as the sound source beam.
In step S214, whether to wake up the system is determined according to the feature matching results of the sound source beams at multiple consecutive moments.
The present disclosure also provides a voice wake-up apparatus, described below with reference to Fig. 3.
Fig. 3 is a structural diagram of some embodiments of the voice wake-up apparatus of the present disclosure. As shown in Fig. 3, the apparatus 30 of this embodiment includes a beamforming module 302, a keyword recognition module 304, a sound source determination module 306, and a voice wake-up module 308.
The beamforming module 302 is configured to beamform the voice signal in multiple predetermined directions to obtain multiple beams.
In some embodiments, the beamforming module 302 is configured to determine, according to the direction of the point source noise, the ratio of point source noise to white noise, and the steering vector of a predetermined direction, the weight of each channel of voice signal received by the microphones relative to that predetermined direction, and to compute a weighted sum of the channels received by the microphones using these weights to determine the beam for the predetermined direction.
In some embodiments, beamforming may be performed according to the following formulas, which are the same as those in the foregoing embodiments.
$X_n(k, l) = \mathrm{fft}(x_n(t))$ (1)
where $x_n(t)$ is the voice signal received by the n-th microphone, fft(·) denotes the fast Fourier transform (FFT) of the voice signal, and $X_n(k, l)$ is the SFFT amplitude of $x_n(t)$ in the k-th frequency band of the l-th period. Here l indexes the periods obtained by windowing the voice signal, which are processed separately, and k indexes the frequency bands obtained after applying the FFT to each segment of the voice signal.
$y_m(t) = \mathrm{ifft}(Y_m(k, l))$ (3)
where $y_m(t)$ is the output signal of the beam for the m-th preset direction formed by phased-array beamforming, ifft(·) denotes the inverse fast Fourier transform, and $Y_m(k, l)$ is the SFFT amplitude of $y_m(t)$ in the k-th frequency band of the l-th period. $Y_m(k, l)$ is obtained as the weighted sum of the $X_n(k, l)$ over the microphones, where the weight applied to the voice signal received by the n-th microphone in the k-th frequency band of the l-th period, during processing of the m-th beam, is written here as $W_{m,n}(k, l)$.
From these formulas it can be seen that once the weights $W_{m,n}(k, l)$ are determined, the signal of beam m for the predetermined direction is determined.
$W_m(k)$ is the weight vector, relative to the predetermined direction, of the voice signal channels received by the microphones during processing of the m-th beam. It is an n-dimensional vector whose n-th component, $W_{m,n}(k)$, is the weight of the voice signal received by the n-th microphone in the k-th frequency band. The signal weight vector may be taken to be the same for every period, so once $W_m(k)$ is known, $W_{m,n}(k, l)$ is obtained for every period. $W_m(k)$ is computed from the covariance matrix of the noise in the processing of the m-th beam and its inverse, together with the steering vector of the microphone array toward the direction to be enhanced (that is, the preset direction) in the processing of the m-th beam and the conjugate transpose of that steering vector. The steering vector is an n-dimensional column vector and is determined by setting the predetermined direction.
$\alpha_{psn}$ is the proportion of fixed-direction point-source interference noise in the noise, and $1-\alpha_{psn}$ is the proportion of white noise in the noise. $\alpha_{psn}$ may be obtained from testing or experience. The computation also uses the steering vector of the fixed-direction point-source interference noise in the processing of the m-th beam, together with its conjugate transpose; this steering vector may likewise be obtained from testing or experience.
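The quantities defined above (a noise covariance matrix built from a point-source interference term and a white-noise term, its inverse, and a steering vector with its conjugate transpose) match the ingredients of the standard minimum variance distortionless response (MVDR) beamformer. The following is a sketch in that standard form and is not a reproduction of the formulas of this disclosure. The symbols $\mathbf{R}_m(k)$, $\mathbf{d}_m(k)$, $\mathbf{d}_{psn}(k)$, $\mathbf{I}_N$, and $N$ (the number of microphones), as well as the use of the conjugate transpose in the weighted sum, are assumptions.

```latex
\begin{aligned}
\mathbf{R}_m(k) &= \alpha_{psn}\,\mathbf{d}_{psn}(k)\,\mathbf{d}_{psn}^{H}(k)
                   + (1-\alpha_{psn})\,\mathbf{I}_N, \\
\mathbf{W}_m(k) &= \frac{\mathbf{R}_m^{-1}(k)\,\mathbf{d}_m(k)}
                        {\mathbf{d}_m^{H}(k)\,\mathbf{R}_m^{-1}(k)\,\mathbf{d}_m(k)}, \\
Y_m(k,l) &= \mathbf{W}_m^{H}(k)\,\mathbf{X}(k,l),
\qquad \mathbf{X}(k,l) = \bigl[X_1(k,l),\dots,X_N(k,l)\bigr]^{T}.
\end{aligned}
```

Under these assumptions $\mathbf{W}_m^{H}(k)\,\mathbf{d}_m(k) = 1$, so a signal arriving from the predetermined direction passes through undistorted while the modeled noise is minimized. The white-noise term keeps $\mathbf{R}_m(k)$ invertible whenever $0 \le \alpha_{psn} < 1$. Because none of these quantities depends on the received signal, the weights for every predetermined direction could be computed once in advance.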
The keyword recognition module 304 is configured to input the beams to the pre-trained keyword recognition model to obtain the probability that each beam contains the keyword.
In some embodiments, the keyword recognition model includes a deep learning model or a hidden Markov model.
The sound source determination module 306 is configured to determine, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing to the sound source direction as the sound source beam.
In some embodiments, the sound source determination module 306 is configured to compute a weighted sum of the probability that a beam contains the keyword and the signal quality of the beam to obtain the importance of the beam, to select the beam with the highest importance as the sound source beam, and to take the direction in which the sound source beam points as the sound source direction.
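The weighted-sum selection of the sound source beam can be illustrated as follows. The weight `w_keyword = 0.7`, the function name, and the assumption that both inputs are already scaled to the range 0 to 1 are illustrative choices; the text only states that the two quantities are combined by weighted summation.

```python
from typing import List, Sequence, Tuple


def select_source_beam(keyword_probs: Sequence[float],
                       quality: Sequence[float],
                       w_keyword: float = 0.7) -> Tuple[int, List[float]]:
    """Pick the beam with the highest importance.

    importance = w_keyword * P(keyword) + (1 - w_keyword) * signal quality,
    where the weight value is an assumption made for this sketch.
    """
    if len(keyword_probs) != len(quality):
        raise ValueError("one probability and one quality value per beam required")
    importance = [w_keyword * p + (1.0 - w_keyword) * q
                  for p, q in zip(keyword_probs, quality)]
    best = max(range(len(importance)), key=importance.__getitem__)
    return best, importance


# Beams 1 and 2 both likely contain the keyword; beam 2 has better signal quality.
probs = [0.10, 0.92, 0.85, 0.05]
quality = [0.3, 0.6, 0.9, 0.2]
best_beam, importance = select_source_beam(probs, quality)
```

In this example the beam with the highest keyword probability (index 1) is not the one selected: the higher signal quality of beam 2 tips the weighted sum in its favor, which is the effect of combining the two measures rather than using the keyword probability alone.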
The voice wake-up module 308 is configured to determine whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments.
In some embodiments, the voice wake-up module 308 is configured to match the sound source directions pointed to by the sound source beams at multiple consecutive moments, to determine whether the sound source beams at those moments contain the keyword, and to wake up the system if the sound source directions are consistent and the sound source beams at those moments all contain the keyword.
Other embodiments of the voice wake-up apparatus of the present disclosure are described below with reference to Fig. 4.
Fig. 4 is a structural diagram of other embodiments of the voice wake-up apparatus of the present disclosure. As shown in Fig. 4, the apparatus 40 of this embodiment includes an echo cancellation module 402, a beamforming module 404, a beam selection module 406, a keyword recognition module 408, a sound source determination module 410, a voice wake-up module 412, and a model training module 414.
The echo cancellation module 402 is configured to perform echo cancellation on the voice signal received by the microphones.
The beamforming module 404 is configured to beamform the voice signal in multiple predetermined directions to obtain multiple beams. The beamforming module 404 has the same function as the beamforming module 302.
The beam selection module 406 is configured to select a subset of the beams according to their signal quality and send them to the keyword recognition module 408, so that the keyword recognition module 408 inputs the received beams to the pre-trained keyword recognition model.
In some embodiments, the beam selection module 406 is configured to determine the signal quality of a beam from at least one of the beam's energy and signal-to-noise ratio within a fixed time window, and to select the beams whose signal quality exceeds a signal quality threshold.
The keyword recognition module 408 is configured to input the beams to the pre-trained keyword recognition model to obtain the probability that each beam contains the keyword. The keyword recognition module 408 has the same function as the keyword recognition module 304.
The sound source determination module 410 is configured to determine, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing to the sound source direction as the sound source beam. The sound source determination module 410 has the same function as the sound source determination module 306.
The voice wake-up module 412 is configured to determine whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments. The voice wake-up module 412 has the same function as the voice wake-up module 308.
The model training module 414 is configured to beamform voice signals in multiple predetermined directions to obtain multiple beams, to label the beams with keywords to form training beams, and to input the training beams to the keyword recognition model for training, so as to obtain the pre-trained keyword recognition model.
The model training module 414 may also receive the beams obtained by the beamforming module 404 or by the beam selection module 406, label those beams with keywords to form training beams, and input the training beams to the keyword recognition model for training, so as to obtain the pre-trained keyword recognition model.
The voice wake-up apparatus in the embodiments of the present disclosure may be implemented by various computing devices or computer systems, as described below with reference to Fig. 5 and Fig. 6.
Fig. 5 is a structural diagram of some embodiments of the voice wake-up apparatus of the present disclosure. As shown in Fig. 5, the apparatus 50 of this embodiment includes a memory 510 and a processor 520 coupled to the memory 510. The processor 520 is configured to execute, based on instructions stored in the memory 510, the voice wake-up method of any of the embodiments of the present disclosure.
The memory 510 may include, for example, a system memory and a fixed non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 6 is a structural diagram of other embodiments of the voice wake-up apparatus of the present disclosure. As shown in Fig. 6, the apparatus 60 of this embodiment includes a memory 610 and a processor 620, which are similar to the memory 510 and the processor 520, respectively. The apparatus may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, and 650, the memory 610, and the processor 620 may be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networked devices, for example to connect to a database server or a cloud storage server. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The present disclosure also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice wake-up method of any of the foregoing embodiments.
Those skilled in the art will understand that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims (22)

1. A voice wake-up method, comprising:
beamforming a voice signal in multiple predetermined directions to obtain multiple beams;
inputting the beams to a pre-trained keyword recognition model to obtain the probability that each beam contains a keyword;
determining, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing to the sound source direction as a sound source beam; and
determining whether to wake up a system according to feature matching results of the sound source beams at multiple consecutive moments.
2. The voice wake-up method according to claim 1, wherein
inputting the beams to the pre-trained keyword recognition model comprises:
selecting a subset of the beams according to the signal quality of the beams, and inputting the selected beams to the pre-trained keyword recognition model.
3. The voice wake-up method according to claim 2, wherein
selecting a subset of the beams according to the signal quality of the beams comprises:
determining the signal quality of each beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window; and
selecting the beams whose signal quality is higher than a signal quality threshold.
4. The voice wake-up method according to claim 1, wherein
determining, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing to the sound source direction as the sound source beam comprises:
computing a weighted sum of the probability that the beam contains the keyword and the signal quality of the beam to obtain the importance of the beam; and
selecting the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
5. The voice wake-up method according to claim 1, wherein
determining whether to wake up the system according to the feature matching results of the sound source beams at multiple consecutive moments comprises:
matching the sound source directions pointed to by the sound source beams at the multiple consecutive moments, and determining whether the sound source beams at the multiple consecutive moments contain the keyword; and
waking up the system in the case where the sound source directions pointed to by the sound source beams at the multiple consecutive moments are consistent and the sound source beams at the multiple consecutive moments all contain the keyword.
6. The voice wake-up method according to claim 1, wherein
beamforming the voice signal in multiple predetermined directions to obtain multiple beams comprises:
determining, according to the direction of point source noise, the ratio of point source noise to white noise, and the steering vector of a predetermined direction, the weight of each channel of voice signal received by the microphones relative to the predetermined direction; and
computing a weighted sum of the channels of voice signal received by the microphones, according to the weight of each channel relative to the predetermined direction, to determine the beam for the predetermined direction.
7. The voice wake-up method according to claim 6, wherein
the weight of each channel of voice signal received by the microphones relative to the predetermined direction is calculated according to a formula defined in terms of the following quantities:
$W_m(k)$, the weight vector, relative to the predetermined direction, of the channels of voice signal received by the microphones in the processing of the m-th beam; k, the index of the frequency band of the signals received by the microphones; the covariance matrix of the noise in the processing of the m-th beam, and its inverse; the steering vector of the microphone array toward the predetermined direction in the processing of the m-th beam, and its conjugate transpose; $\alpha_{psn}$, the proportion of preset-direction point-source interference noise in the noise; $1-\alpha_{psn}$, the proportion of white noise in the noise; and the steering vector of the preset-direction point-source interference noise in the processing of the m-th beam, and its conjugate transpose.
8. The voice wake-up method according to claim 1, further comprising:
beamforming voice signals in multiple predetermined directions to obtain multiple beams;
labeling the multiple beams with keywords to form training beams; and
inputting the training beams to the keyword recognition model for training, to obtain the pre-trained keyword recognition model.
9. The voice wake-up method according to claim 1, wherein,
before beamforming the voice signal in multiple predetermined directions, the method further comprises:
performing echo cancellation on the voice signal received by the microphones.
10. The voice wake-up method according to any one of claims 1-9, wherein
the keyword recognition model comprises a deep learning model or a hidden Markov model.
11. A voice wake-up apparatus, comprising:
a beamforming module, configured to beamform a voice signal in multiple predetermined directions to obtain multiple beams;
a keyword recognition module, configured to input the beams to a pre-trained keyword recognition model to obtain the probability that each beam contains a keyword;
a sound source determination module, configured to determine, according to the probability that each beam contains the keyword and the signal quality of the beam, the beam pointing to the sound source direction as a sound source beam; and
a voice wake-up module, configured to determine whether to wake up a system according to feature matching results of the sound source beams at multiple consecutive moments.
12. The voice wake-up apparatus according to claim 11, further comprising:
a beam selection module, configured to select a subset of the beams according to the signal quality of the beams and send them to the keyword recognition module, so that the keyword recognition module inputs the received beams to the pre-trained keyword recognition model.
13. The voice wake-up apparatus according to claim 12, wherein
the beam selection module is configured to determine the signal quality of each beam according to at least one of the energy and the signal-to-noise ratio of the beam within a fixed time window, and to select the beams whose signal quality is higher than a signal quality threshold.
14. The voice wake-up apparatus according to claim 11, wherein
the sound source determination module is configured to compute a weighted sum of the probability that the beam contains the keyword and the signal quality of the beam to obtain the importance of the beam, and to select the beam with the highest importance as the sound source beam, the direction in which the sound source beam points being determined as the sound source direction.
15. The voice wake-up apparatus according to claim 11, wherein
the voice wake-up module is configured to match the sound source directions pointed to by the sound source beams at the multiple consecutive moments, to determine whether the sound source beams at the multiple consecutive moments contain the keyword, and to wake up the system in the case where the sound source directions are consistent and the sound source beams at the multiple consecutive moments all contain the keyword.
16. The voice wake-up apparatus according to claim 11, wherein
the beamforming module is configured to determine, according to the direction of point source noise, the ratio of point source noise to white noise, and the steering vector of a predetermined direction, the weight of each channel of voice signal received by the microphones relative to the predetermined direction, and to compute a weighted sum of the channels of voice signal received by the microphones, according to the weight of each channel relative to the predetermined direction, to determine the beam for the predetermined direction.
17. The voice wake-up apparatus according to claim 16, wherein
the weight of each channel of voice signal received by the microphones relative to the predetermined direction is calculated according to a formula defined in terms of the following quantities:
$W_m(k)$, the weight vector, relative to the predetermined direction, of the channels of voice signal received by the microphones in the processing of the m-th beam; k, the index of the frequency band of the signals received by the microphones; the covariance matrix of the noise in the processing of the m-th beam, and its inverse; the steering vector of the microphone array toward the predetermined direction in the processing of the m-th beam, and its conjugate transpose; $\alpha_{psn}$, the proportion of preset-direction point-source interference noise in the noise; $1-\alpha_{psn}$, the proportion of white noise in the noise; and the steering vector of the preset-direction point-source interference noise in the processing of the m-th beam, and its conjugate transpose.
18. The voice wake-up apparatus according to claim 11, further comprising:
a model training module, configured to beamform voice signals in multiple predetermined directions to obtain multiple beams, to label the multiple beams with keywords to form training beams, and to input the training beams to the keyword recognition model for training, to obtain the pre-trained keyword recognition model.
19. The voice wake-up apparatus according to claim 11, further comprising:
an echo cancellation module, configured to perform echo cancellation on the voice signal received by the microphones.
20. The voice wake-up apparatus according to any one of claims 11-19, wherein
the keyword recognition model comprises a deep learning model or a hidden Markov model.
21. A voice wake-up apparatus, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the voice wake-up method according to any one of claims 1-10.
22. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.
CN201810992991.4A 2018-08-29 2018-08-29 Voice wake-up method, apparatus and computer readable storage medium Active CN109272989B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810992991.4A CN109272989B (en) 2018-08-29 2018-08-29 Voice wake-up method, apparatus and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810992991.4A CN109272989B (en) 2018-08-29 2018-08-29 Voice wake-up method, apparatus and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109272989A true CN109272989A (en) 2019-01-25
CN109272989B CN109272989B (en) 2021-08-10

Family

ID=65154643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810992991.4A Active CN109272989B (en) 2018-08-29 2018-08-29 Voice wake-up method, apparatus and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109272989B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013137900A1 (en) * 2012-03-16 2013-09-19 Nuance Communictions, Inc. User dedicated automatic speech recognition
CN104936091A (en) * 2015-05-14 2015-09-23 科大讯飞股份有限公司 Intelligent interaction method and system based on circle microphone array
CN106483502A (en) * 2016-09-23 2017-03-08 科大讯飞股份有限公司 A kind of sound localization method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHAO PAN, JINGDONG CHEN, JACOB BENESTY: "Performance Study of the MVDR Beam-former as a Function of the Source Incidence Angle", 《IEEE TRANSACTIONS ON AUDIO,SPEECH AND LANGUAGE PROCESSING 2014》 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020001163A1 (en) * 2018-06-28 2020-01-02 腾讯科技(深圳)有限公司 Method and device for speech recognition, computer device, and electronic device
US11217229B2 (en) * 2018-06-28 2022-01-04 Tencent Technology (Shenzhen) Company Ltd Method and apparatus for speech recognition, and electronic device
WO2020164397A1 (en) * 2019-02-12 2020-08-20 阿里巴巴集团控股有限公司 Voice recognition method and system
CN111627425A (en) * 2019-02-12 2020-09-04 阿里巴巴集团控股有限公司 Voice recognition method and system
CN111627425B (en) * 2019-02-12 2023-11-28 阿里巴巴集团控股有限公司 Voice recognition method and system
CN111667843B (en) * 2019-03-05 2021-12-31 北京京东尚科信息技术有限公司 Voice wake-up method and system for terminal equipment, electronic equipment and storage medium
CN111667843A (en) * 2019-03-05 2020-09-15 北京京东尚科信息技术有限公司 Voice wake-up method and system for terminal equipment, electronic equipment and storage medium
CN109920433B (en) * 2019-03-19 2021-08-20 上海华镇电子科技有限公司 Voice awakening method of electronic equipment in noisy environment
CN109920433A (en) * 2019-03-19 2019-06-21 上海华镇电子科技有限公司 The voice awakening method of electronic equipment under noisy environment
CN109949810A (en) * 2019-03-28 2019-06-28 华为技术有限公司 A kind of voice awakening method, device, equipment and medium
CN109949810B (en) * 2019-03-28 2021-09-07 荣耀终端有限公司 Voice wake-up method, device, equipment and medium
WO2020192721A1 (en) * 2019-03-28 2020-10-01 华为技术有限公司 Voice awakening method and apparatus, and device and medium
CN111755021A (en) * 2019-04-01 2020-10-09 北京京东尚科信息技术有限公司 Speech enhancement method and device based on binary microphone array
CN111755021B (en) * 2019-04-01 2023-09-01 北京京东尚科信息技术有限公司 Voice enhancement method and device based on binary microphone array
CN111833901A (en) * 2019-04-23 2020-10-27 北京京东尚科信息技术有限公司 Audio processing method, audio processing apparatus, audio processing system, and medium
CN111833901B (en) * 2019-04-23 2024-04-05 北京京东尚科信息技术有限公司 Audio processing method, audio processing device, system and medium
CN112216295A (en) * 2019-06-25 2021-01-12 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN112216295B (en) * 2019-06-25 2024-04-26 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN110265020A (en) * 2019-07-12 2019-09-20 大象声科(深圳)科技有限公司 Voice awakening method, device and electronic equipment, storage medium
WO2021008000A1 (en) * 2019-07-12 2021-01-21 大象声科(深圳)科技有限公司 Voice wakeup method and apparatus, electronic device and storage medium
CN110277093B (en) * 2019-07-30 2021-10-26 腾讯科技(深圳)有限公司 Audio signal detection method and device
CN110277093A (en) * 2019-07-30 2019-09-24 腾讯科技(深圳)有限公司 Audio signal detection method and device
CN110517682A (en) * 2019-09-02 2019-11-29 腾讯科技(深圳)有限公司 Audio recognition method, device, equipment and storage medium
CN110797051A (en) * 2019-10-28 2020-02-14 星络智能科技有限公司 Awakening threshold setting method and device, intelligent sound box and storage medium
CN111276143A (en) * 2020-01-21 2020-06-12 北京远特科技股份有限公司 Sound source positioning method and device, voice recognition control method and terminal equipment
CN111883162A (en) * 2020-07-24 2020-11-03 杨汉丹 Awakening method and device and computer equipment
CN113257269A (en) * 2021-04-21 2021-08-13 瑞芯微电子股份有限公司 Beam forming method based on deep learning and storage device
CN113284505A (en) * 2021-04-21 2021-08-20 瑞芯微电子股份有限公司 Adaptive beam forming method and storage device
CN113782009A (en) * 2021-11-10 2021-12-10 中科南京智能技术研究院 Voice awakening system based on Savitzky-Golay filter smoothing method
CN114257684A (en) * 2021-12-17 2022-03-29 歌尔科技有限公司 Voice processing method, system and device and electronic equipment
CN116504264B (en) * 2023-06-30 2023-10-31 小米汽车科技有限公司 Audio processing method, device, equipment and storage medium
CN116504264A (en) * 2023-06-30 2023-07-28 小米汽车科技有限公司 Audio processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN109272989B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN109272989A (en) Voice awakening method, device and computer readable storage medium
CN110491403B (en) Audio signal processing method, device, medium and audio interaction equipment
JP7434137B2 (en) Speech recognition method, device, equipment and computer readable storage medium
JP7337953B2 (en) Speech recognition method and device, neural network training method and device, and computer program
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
CN107703486B (en) Sound source positioning method based on convolutional neural network CNN
CN110444214B (en) Speech signal processing model training method and device, electronic equipment and storage medium
CN110364166B (en) Electronic equipment for realizing speech signal recognition
CN110503971A (en) Neural network based time-frequency mask estimation and beamforming for speech processing
CN105068048B (en) Distributed microphone array sound localization method based on spatial sparsity
CN110503970A (en) Audio data processing method, device and storage medium
CN110556103A (en) Audio signal processing method, apparatus, system, device and storage medium
Dorfan et al. Tree-based recursive expectation-maximization algorithm for localization of acoustic sources
CN108417224B (en) Training and recognition method and system of bidirectional neural network model
WO2019080551A1 (en) Target voice detection method and apparatus
CN108122563A (en) Method for improving voice wake-up rate and correcting DOA
US20150117649A1 (en) Selective Audio Source Enhancement
CN108735199B (en) Self-adaptive training method and system of acoustic model
CN108269567A (en) Method, apparatus, computing device and computer readable storage medium for generating far-field voice data
CN110211599A (en) Application wake-up method, device, storage medium and electronic equipment
CN110400571A (en) Audio processing method, device, storage medium and electronic equipment
CN111667843B (en) Voice wake-up method and system for terminal equipment, electronic equipment and storage medium
CN112904279A (en) Sound source positioning method based on convolutional neural network and sub-band SRP-PHAT space spectrum
CN109859769A (en) Mask estimation method and device
Chang et al. Audio adversarial examples generation with recurrent neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant