WO2021008000A1 - Voice wakeup method and apparatus, electronic device and storage medium - Google Patents

Info

Publication number
WO2021008000A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
noise ratio
wake
beam signal
interference
Prior art date
Application number
PCT/CN2019/114378
Other languages
French (fr)
Chinese (zh)
Inventor
段相
张珍斌
Original Assignee
大象声科(深圳)科技有限公司
Priority date
Filing date
Publication date
Application filed by 大象声科(深圳)科技有限公司
Publication of WO2021008000A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0617 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • the present disclosure relates to the technical field of intelligent voice interaction, and in particular to a voice wake-up method, device, electronic equipment, and storage medium.
  • Voice as the most natural way of human interaction, has also become one of the most important ways people hope to replace the mouse, keyboard, and touch screen to communicate with computers.
  • Voice wake-up technology has become a very important function in the process of human-computer interaction and has received more and more attention.
  • Wake-up rate, false wake-up, response time and power consumption level are four general evaluation indicators for judging voice wake-up technology.
  • with the development of voice wake-up technology, users expect an increasingly good experience.
  • the combination of traditional front-end voice enhancement technology and wake-up models has become an important way to improve the wake-up rate.
  • multi-microphone enhancement technology is widely used in front-end voice enhancement. With multi-mic technology, the signal-to-noise ratio of the input voice will be significantly enhanced, so that a better recognition effect can be obtained.
  • the voice wake-up rate is low due to interference and reverberation.
  • microphone technology can also be used to preprocess the sound signal.
  • the use of multi-microphone technology can make full use of spatial information to enhance speech.
  • the microphone array can solve the acoustic problems of the room, such as sound source location, tracking, noise cancellation, speech enhancement, signal source separation, and reverberation cancellation.
  • the present invention provides a voice wake-up method, device, electronic equipment, and storage medium.
  • a voice wake-up method including:
  • the step of calculating the signal-to-noise ratio of each beam signal includes:
  • the beam signal includes a target source signal, an interference source signal, and background noise
  • the point source signal energy includes the target source signal energy and the interference source signal energy
  • the method further includes:
  • the point source signal energy and background noise energy of each frequency point in the beam signal are smoothed by a smoothing factor.
  • the step of determining the wake-up direction through the signal-to-noise ratio includes:
  • according to the signal-to-noise ratio of each candidate beam signal, determine the beam signal direction with the largest signal-to-noise ratio
  • the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames includes:
  • when the maximum value of the signal-to-noise ratio of all beam signals is less than the preset signal-to-noise ratio threshold, record, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio; the difference recorded in the other directions is zero. If the maximum value of the signal-to-noise ratio of all beam signals is greater than the preset signal-to-noise ratio threshold, the difference recorded in all beam signal directions is zero.
  • the step of determining the optimal beam signal direction of each frame according to the signal-to-noise ratio of each candidate beam signal includes:
  • the signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal, and the beam signal direction that satisfies the condition is set as the optimal beam signal direction.
  • the steps include:
  • the beam signal direction where the maximum signal-to-noise ratio is located is set as the optimal beam signal direction within a certain preset number of frames.
  • the step of determining the wake-up direction through the signal-to-noise ratio further includes:
  • the interference direction is determined as the wake-up direction.
  • a voice wake-up device including:
  • the sound signal receiving module is used to receive the sound signal collected by the microphone
  • a fixed beamforming module configured to perform fixed beamforming on the sound signal to generate multiple beam signals in different directions
  • the signal-to-noise ratio calculation module is used to calculate the signal-to-noise ratio of each beam signal
  • a wake-up direction determination module configured to determine the wake-up direction based on the signal-to-noise ratio
  • the voice wake-up operation module is used to perform voice wake-up operations according to the sound signal in the wake-up direction.
  • an electronic device including:
  • at least one processor; and
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method according to the first aspect.
  • a computer-readable storage medium for storing a program that, when executed, causes an electronic device to perform the method described in the first aspect.
  • after the sound signal collected by the microphone is received, fixed beamforming is performed on it, the signal-to-noise ratio of each beam signal is calculated, and the wake-up direction is determined from the signal-to-noise ratio to perform the voice wake-up operation, so that the system can accurately determine the wake-up direction even in a low signal-to-noise ratio environment, effectively improving the accuracy of voice wake-up.
  • Fig. 1 is a flowchart showing a voice wake-up method according to an exemplary embodiment.
  • Fig. 2 is a beam signal direction diagram according to the embodiment corresponding to Fig. 1.
  • FIG. 3 is a specific implementation flowchart of step S130 in the voice wake-up method in the embodiment corresponding to FIG. 1.
  • Fig. 4 is a schematic diagram of a microphone array according to an exemplary embodiment.
  • Fig. 5 is a specific implementation flow chart of step S140 in the voice wake-up method in the embodiment corresponding to Fig. 1.
  • FIG. 6 is a specific implementation flowchart of step S141 shown in the embodiment corresponding to FIG. 5.
  • FIG. 7 is a specific implementation flowchart of step S143 shown in the embodiment corresponding to FIG. 5.
  • FIG. 8 is another specific implementation flowchart of step S140 in the voice wake-up method in the embodiment corresponding to FIG. 5.
  • Fig. 9 is a block diagram showing a voice wake-up device according to an exemplary embodiment.
  • FIG. 10 is a block diagram of the signal-to-noise ratio calculation module 130 in the voice wake-up device according to the embodiment corresponding to FIG. 9.
  • FIG. 11 is a block diagram of the wake-up direction determining module 140 in the voice wake-up device according to the embodiment corresponding to FIG. 9.
  • Fig. 1 is a flowchart showing a voice wake-up method according to an exemplary embodiment.
  • the voice wake-up method can be used in electronic devices such as smart phones and computers.
  • the voice wake-up method may include step S110, step S120, step S130, and step S140.
  • Step S110 Receive the sound signal collected by the microphone.
  • when the electronic device is to be woken by voice, it collects the sound signal through the microphone.
  • the sound signal collected by the microphone not only includes the voice signal used for voice wake-up, but also contains interference noise.
  • the voice awakening rate is improved through voice front-end enhancement technology.
  • the voice signal can be collected through a microphone array, and the number of microphones is M, then the collected microphone signal is:
  • where n denotes time and T denotes the transpose.
  • Step S120 Perform fixed beamforming on the voice signal to generate multiple beam signals.
  • a delay-and-sum beamforming method or a filter-and-sum beamforming method can be used.
  • the number of beams is BN (BN ⁇ M), and the direction of the beams is fixed and uniformly distributed (linear array 0°-180°, circular array 0°-360°).
  • the coefficients of the beamformer can be obtained with the delay-and-sum technique or the filter-and-sum technique, and different beamforming methods can also be used for different frequency bands. Delay-and-sum beamforming achieves a higher white noise gain; among filter-and-sum beamforming methods, the differential array is widely used because of its smaller size and better frequency-invariance characteristics.
  • this embodiment designs a beamformer with broadband and gain independent of frequency band.
  • Figure 2 is a beam signal pattern shown in this embodiment. The array is a circular array, the number of microphones is 4, and the radius is 0.035m, Figure 2(a) is a three-dimensional schematic diagram of frequency angle gain, and Figure 2(b) is a beam polar coordinate diagram. It can be seen from Fig. 2 that the beam sidelobe attenuation of this embodiment is about 25dB.
  • the BN beam outputs can then be obtained:
  • Y(w) = [y_1(w), y_2(w), ..., y_BN(w)]^T, where w denotes the frequency point.
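The beam outputs Y(w) of step S120 can be illustrated with a minimal delay-and-sum sketch for a single frequency point. It assumes a far-field plane-wave model, a speed of sound of 343 m/s, and four beams spaced 90 degrees apart; the function names (`mic_positions`, `steering_vector`, `delay_and_sum`) are hypothetical and are not taken from the patent. The array geometry (4 microphones, radius 0.035 m) follows the embodiment described for Figure 2. The patent's own beamformer is a broadband design whose coefficients are not given here, so this is only an illustration of the delay-and-sum principle.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text


def mic_positions(num_mics, radius):
    """(x, y) coordinates of microphones evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def steering_vector(positions, angle_deg, freq_hz):
    """Per-microphone phase of a far-field plane wave arriving from angle_deg."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    vector = []
    for px, py in positions:
        # A microphone nearer the source receives the wavefront earlier (negative delay).
        delay = -(px * ux + py * uy) / SPEED_OF_SOUND
        vector.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return vector


def delay_and_sum(x, positions, beam_angles_deg, freq_hz):
    """Return Y(w) = [y_1(w), ..., y_BN(w)] for one frequency point w."""
    outputs = []
    for angle in beam_angles_deg:
        d = steering_vector(positions, angle, freq_hz)
        # Undo each microphone's phase shift for this look direction, then average.
        outputs.append(sum(dm.conjugate() * xm for dm, xm in zip(d, x)) / len(x))
    return outputs


positions = mic_positions(4, 0.035)           # 4 microphones, radius 0.035 m
beam_angles = [0.0, 90.0, 180.0, 270.0]       # four fixed, uniformly spaced beams (assumed)
x = steering_vector(positions, 90.0, 1000.0)  # ideal 1 kHz source at 90 degrees
Y = delay_and_sum(x, positions, beam_angles, 1000.0)
```

In this idealized case the beam steered at 90 degrees passes the source with unit gain, while the other three beams attenuate it.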
  • Step S130 Calculate the signal-to-noise ratio of each beam signal.
  • the signal-to-noise ratio is the ratio between the energy in the sound signal and the noise spectrum.
  • step S130 may include step S131, step S132, and step S133.
  • Step S131 For each beam signal, calculate the energy and noise spectrum of each frequency point in the beam signal.
  • Step S132 Calculate the signal-to-noise ratio of each frequency point in the beam signal based on the energy and noise spectrum of each frequency point in the beam signal.
  • the corresponding beam signals also include signals of different frequency points. Therefore, in order to improve the voice awakening rate, the energy and noise spectrum of each frequency point in the beam signal must be calculated.
  • a corresponding smoothing factor is selected to smooth the energy of each frequency point in the beam signal through the position relationship between each beam signal and the interference direction.
  • the magnitude of the smoothing factor weight i (w) is related to the interference direction, that is, the greater the smoothing factor set for the beam signal whose beam direction is closer to the interference direction.
  • for example, the smoothing factors weight_2(w) and weight_4(w) corresponding to beam signals 2 and 4 are larger than the smoothing factor weight_1(w) corresponding to beam signal 1:
  • weight_2(w), weight_4(w) ⁇ 0.8
  • weight_1(w) = 0.6.
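A minimal sketch of per-beam energy smoothing, assuming a first-order recursive form (smoothed = weight × previous + (1 − weight) × current). The exact smoothing formula is not recoverable from the text, so this form and the function names `smooth_energy` and `smooth_all_beams` are assumptions; the factors 0.6 and 0.8 are used only as illustrative values.

```python
def smooth_energy(prev_smoothed, current, weight):
    """First-order recursive smoothing of per-frequency energies (assumed form).

    A larger weight keeps more of the previous estimate, so the beam reacts
    more slowly to the current frame.
    """
    return [weight * p + (1.0 - weight) * c for p, c in zip(prev_smoothed, current)]


def smooth_all_beams(prev, current, weights):
    """Smooth every beam's per-frequency energies with that beam's own factor."""
    return [smooth_energy(p, c, w) for p, c, w in zip(prev, current, weights)]


# Two beams, two frequency points each; the second beam is assumed to lie
# closer to the interference direction and therefore gets the larger factor.
previous = [[1.0, 1.0], [1.0, 1.0]]
current = [[2.0, 0.0], [2.0, 0.0]]
smoothed = smooth_all_beams(previous, current, [0.6, 0.8])
```

With these inputs the more heavily smoothed beam moves less far from its previous energy estimate than the other beam does.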
  • the noise spectrum can be estimated with single-channel methods such as MCRA, IMCRA, MARTIN, DOBLINGER, and HIRSCH, and other noise spectrum estimation methods can also be used.
  • the specific noise spectrum estimation methods are not described one by one here.
  • Step S133 Represent the signal-to-noise ratio of the beam signal by the average value of the signal-to-noise ratios of its frequency points within the preset frequency band.
  • the preset frequency band range is 0-2 kHz.
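Steps S131 to S133 can be sketched as follows, assuming that the per-frequency energy and noise spectrum are already available (for example from one of the noise estimators named above) and that the per-frequency ratios are averaged in the linear domain before conversion to dB. The averaging domain, the helper names `bin_frequencies` and `beam_snr_db`, and the 16 kHz sample rate are assumptions for illustration.

```python
import math


def bin_frequencies(num_bins, sample_rate):
    """Centre frequency of each one-sided FFT bin, for an FFT of size 2 * (num_bins - 1)."""
    fft_size = 2 * (num_bins - 1)
    return [k * sample_rate / fft_size for k in range(num_bins)]


def beam_snr_db(energy, noise, freqs, band=(0.0, 2000.0), floor=1e-12):
    """Average the per-frequency SNR over the preset band and return it in dB."""
    ratios = [
        e / max(n, floor)                      # S132: SNR of one frequency point
        for e, n, f in zip(energy, noise, freqs)
        if band[0] <= f <= band[1]             # S133: keep only the preset band
    ]
    if not ratios:
        raise ValueError("no frequency points fall inside the preset band")
    mean_ratio = sum(ratios) / len(ratios)
    return 10.0 * math.log10(max(mean_ratio, floor))


freqs = bin_frequencies(5, 16000)              # bins at 0, 2, 4, 6 and 8 kHz
energy = [10.0, 10.0, 1000.0, 1000.0, 1000.0]  # S131: per-frequency energy
noise = [1.0, 1.0, 1.0, 1.0, 1.0]              # S131: estimated noise spectrum
snr = beam_snr_db(energy, noise, freqs)
```

Only the bins at 0 and 2 kHz fall inside the 0-2 kHz band, so the strong components above 2 kHz do not affect the beam's signal-to-noise ratio.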
  • Step S140 Determine the wake-up direction according to the signal-to-noise ratio.
  • the wake-up direction is the voice wake-up direction confirmed by the present invention.
  • the present invention calculates the signal-to-noise ratio of each beam signal, and then determines the wake-up direction from multiple beam signals according to the signal-to-noise ratio of each beam signal, and then uses the beam signal in the wake-up direction to perform voice wake-up operations on the electronic device.
  • when selecting the optimal beam signal direction from all beam signals, the beam signal direction with the largest average signal-to-noise ratio in a certain period of time can be determined as the optimal beam signal direction, or the beam signal direction that has the largest signal-to-noise ratio in the greatest number of frames in a certain period of time can be determined as the optimal beam signal direction; the optimal beam signal direction may also be determined by other methods, which will not be described here.
  • step S140 may include steps S141, S142, S143, and S144.
  • Step S141 Determine the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames.
  • the interference direction is the direction of the noise source that interferes with the voice signal relative to the electronic device.
  • 3 is the interference source
  • the electronic device is located at the center of the circle
  • the direction of the interference source 3 relative to the center of the circle is the interference direction.
  • since the interference source has a considerable impact on the voice signal during voice wake-up, the sound signal generated by the interference source affects every beam signal. Therefore, the interference direction is determined in advance, and each beam signal is then smoothed according to the positional relationship between its direction and the interference source, thereby effectively reducing the influence of the interference source on voice wake-up and improving the accuracy of voice wake-up.
  • the direction of the beam signal with the largest number of frames whose signal-to-noise ratio reaches the preset signal-to-noise ratio threshold within the preset number of frames can be determined as the interference direction, or the direction of the beam signal with the largest average signal-to-noise ratio within the preset number of frames can be determined as the interference direction, or the interference direction may be determined by other methods, which will not be described here.
  • step S141 may include steps S1411, S1412, and S1413.
  • Step S1411 Calculate the maximum value of the signal-to-noise ratio of all beam signals in the current frame, and compare the maximum value with a preset signal-to-noise ratio threshold.
  • Step S1412 When the maximum value of the signal-to-noise ratio of all beam signals is less than the preset signal-to-noise ratio threshold, record, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio; the difference recorded in the other directions is zero. If the maximum value of the signal-to-noise ratio of all beam signals is greater than the preset signal-to-noise ratio threshold, the difference recorded in all beam signal directions is zero.
  • Step S1413 Count the sum of the difference values recorded in each beam signal direction within the preset number of frames. If the sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction.
  • the preset number of frames is T1, and preferably, T1 ⁇ 2000 frames.
  • calculate the signal-to-noise ratio SNR_i of every beam signal direction and find the maximum signal-to-noise ratio MAXSNR over all beam signal directions.
  • set the threshold ⁇. If MAXSNR is less than ⁇, the frame is considered a silent segment: the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio is recorded in the direction of the maximum signal-to-noise ratio beam signal, and the other directions are recorded as zero.
  • if MAXSNR > ⁇, all beam signal directions are recorded as zero. Finally, the sum of the difference values recorded in each beam signal direction within the preset number of frames T1 is counted.
  • if the sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction.
  • the value of T1 and ⁇ will be selected according to specific scenarios, so as to better improve the accuracy of the interference direction judgment.
  • T1>2000 frames, ⁇ 10dB.
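The interference-direction statistics of steps S1411 to S1413 can be sketched as below. The function name `detect_interference_direction`, the representation of each frame as a list of per-beam SNR values in dB, and the tiny three-frame example are assumptions for illustration; a real system would accumulate over the preset T1 frames.

```python
def detect_interference_direction(snr_frames, threshold_db):
    """Return the index of the interference beam, or None if none is found.

    snr_frames: list of frames, each a list with one SNR value (dB) per beam.
    At least two beams are required.
    """
    if not snr_frames:
        return None
    num_beams = len(snr_frames[0])
    totals = [0.0] * num_beams
    for snrs in snr_frames:
        ranked = sorted(range(num_beams), key=lambda i: snrs[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if snrs[best] < threshold_db:
            # S1412: below the threshold, credit the strongest beam with its
            # margin over the second-strongest beam; other beams record zero.
            totals[best] += snrs[best] - snrs[second]
        # Otherwise every direction records zero for this frame.
    # S1413: the direction with the largest positive sum is the interference.
    best_total = max(totals)
    if best_total <= 0:
        return None
    return totals.index(best_total)


frames = [
    [2.0, 6.0, 1.0, 3.0],   # beam 1 leads by 3 dB
    [1.0, 7.0, 2.0, 2.0],   # beam 1 leads by 5 dB
    [15.0, 4.0, 3.0, 2.0],  # above the threshold: nothing is recorded
]
interference = detect_interference_direction(frames, threshold_db=10.0)
```

Because differences are recorded only in frames whose maximum SNR stays below the threshold, a loud wake word in one direction does not contribute to the interference estimate.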
  • Step S142 Remove the beam signal corresponding to the interference direction from all beam signals to obtain candidate beam signals.
  • the candidate beam signal is a beam signal set after removing the beam signal corresponding to the interference direction from all beam signals.
  • the interference direction determined by the technical solution of the present invention is not the optimal beam signal direction. Therefore, when the optimal beam signal direction is determined, the beam signal corresponding to the interference direction is first removed from all beam signals, and the optimal beam signal direction is then determined according to the signal-to-noise ratios of the candidate beam signals, which improves the accuracy of determining the optimal beam signal direction.
  • Step S143 Determine the candidate beam signal with the largest signal-to-noise ratio in each frame according to the signal-to-noise ratio of each candidate beam signal.
  • the signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal.
  • step S143 may further include step S1431, step S1432, step S1433, and step S1434:
  • Step S1431 Sort the signal-to-noise ratios of the candidate beam signals.
  • Step S1432 If the maximum signal-to-noise ratio of the candidate beams exceeds the threshold within the preset number of consecutive frames, and the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio reaches the preset difference threshold, step S1433 is executed; if not, no processing is done.
  • Step S1433 Judge whether the beam signal direction of the maximum signal-to-noise ratio remains consistent within the preset number of consecutive frames. If yes (Y), execute step S1434; if not, no processing is performed.
  • Step S1434 Set the beam signal direction where the maximum SNR is located as the optimal beam signal direction within a certain preset number of frames.
  • the signal-to-noise ratios of candidate beam signals other than the interference direction are sorted, and the beam signal in the direction of the maximum signal-to-noise ratio is selected.
  • if a beam signal direction has the maximum signal-to-noise ratio in N consecutive frames, its signal-to-noise ratio MAXSNR > ⁇ (preset threshold) within those N consecutive frames, and MAXSNR exceeds the second-largest signal-to-noise ratio SECSNR by a certain threshold ⁇, the direction of the MAXSNR beam signal is taken as the optimal beam signal direction.
  • the optimal beam signal direction is set to the MAXSNR direction within a certain time range T3.
  • the size of T3 depends on different wake words.
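The signal source direction detection of steps S1431 to S1434 can be sketched as a check over the last N frames. The function name `detect_source_direction`, the use of a dictionary mapping candidate beam index to SNR (so that the interference beam can simply be absent), and the threshold values in the example are assumptions for illustration.

```python
def detect_source_direction(recent_frames, snr_threshold, margin_threshold):
    """Return the beam index that wins all N frames, or None.

    recent_frames: the last N frames, each a dict {candidate_beam_index: snr_db}
    with at least two candidate beams.
    """
    winner = None
    for snrs in recent_frames:
        ranked = sorted(snrs, key=snrs.get, reverse=True)    # S1431: sort by SNR
        best, second = ranked[0], ranked[1]
        if snrs[best] <= snr_threshold:                      # S1432: MAXSNR too low
            return None
        if snrs[best] - snrs[second] < margin_threshold:     # S1432: margin too small
            return None
        if winner is None:
            winner = best
        elif best != winner:                                 # S1433: direction changed
            return None
    return winner                                            # S1434: optimal direction


# Beam 1 is assumed to be the interference beam and is therefore not a candidate.
stable = [{0: 12.0, 2: 5.0, 3: 4.0}, {0: 13.0, 2: 6.0, 3: 3.0}, {0: 14.0, 2: 4.0, 3: 5.0}]
unstable = [{0: 12.0, 2: 5.0, 3: 4.0}, {0: 6.0, 2: 13.0, 3: 3.0}, {0: 14.0, 2: 4.0, 3: 5.0}]
direction = detect_source_direction(stable, snr_threshold=10.0, margin_threshold=3.0)
```

A caller would then hold the returned direction as the optimal beam signal direction for the time range T3, which is not modelled in this sketch.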
  • Step S144 Count the candidate beam signals with the largest SNR in the preset number of frames, determine the direction where the candidate beam signals are located as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
  • in each frame, according to the signal-to-noise ratio of each candidate beam signal, count the candidate beam signals with the largest signal-to-noise ratio in the preset number of frames, and determine the direction of that candidate beam signal as the optimal beam signal direction.
  • the preset number of frames is T2.
  • within the T2 preset frames, among the BN beams other than the beam signal in the detected interference direction, if the maximum SNR is less than the threshold th, the optimal beam signal direction of the current frame stays consistent with the previous frame; if the maximum SNR is greater than the threshold th, the beam signal with the maximum SNR is recorded as 1, and the beam signal in the interference direction and the other, smaller-SNR beam signals are recorded as 0. The beam signal direction that has the maximum SNR most often within the T2 frames is determined as the optimal beam signal direction of the current frame.
  • 20 ⁇ T2 ⁇ 100 and th 10.
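The per-frame voting of step S144 can be sketched with a sliding window. The class name `BeamVoter`, the window length of 5 frames, and the threshold of 10 in the example are assumptions chosen to keep the example short; the handling of ties is also not specified in the text.

```python
from collections import Counter, deque


class BeamVoter:
    """Keep the winners of the last T2 frames and report the most frequent one."""

    def __init__(self, window_frames, snr_threshold):
        self.history = deque(maxlen=window_frames)  # one entry per frame
        self.snr_threshold = snr_threshold
        self.current = None                         # optimal direction so far

    def update(self, snrs, interference_beam=None):
        """snrs: per-beam SNR of the current frame; returns the optimal beam index."""
        candidates = {i: s for i, s in enumerate(snrs) if i != interference_beam}
        best = max(candidates, key=candidates.get)
        if candidates[best] > self.snr_threshold:
            # The strongest candidate is recorded as 1 for this frame.
            self.history.append(best)
            votes = Counter(b for b in self.history if b is not None)
            self.current = votes.most_common(1)[0][0]
        else:
            # Below the threshold: all beams record 0 and the direction of the
            # previous frame is kept.
            self.history.append(None)
        return self.current


voter = BeamVoter(window_frames=5, snr_threshold=10.0)
results = [
    voter.update([20.0, 5.0, 12.0, 3.0], interference_beam=0),  # beam 2 votes
    voter.update([20.0, 4.0, 5.0, 3.0], interference_beam=0),   # no vote, hold
    voter.update([20.0, 5.0, 13.0, 3.0], interference_beam=0),  # beam 2 votes
]
for _ in range(3):
    results.append(voter.update([20.0, 15.0, 5.0, 3.0], interference_beam=0))
```

The window smooths out single-frame outliers: the reported direction switches from beam 2 to beam 1 only after beam 1 has won the majority of the recent frames.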
  • the wake-up direction will be identified as the interference direction.
  • the statistically determined interference direction and the optimal beam signal direction determined according to the aforementioned method are both used as wake-up directions, that is, wake-up is performed twice. If either of them exceeds the threshold, the device is determined to be in the awake state.
  • step S140 may further include step S146 and step S147:
  • Step S146 Judge whether the energy of the signal source exceeds the energy of the interference source by a certain threshold. If yes (Y), proceed to step S147; if no, proceed to step S142.
  • Step S147 Determine the interference direction as the wake-up direction.
  • the interference direction can be determined as the wake-up direction while steps S142, S143, and S144 are performed at the same time to determine the optimal beam signal direction, so that wake-up is performed on two beam signal directions: the interference direction and the optimal beam signal direction.
  • alternatively, when it is determined that the signal source energy exceeds the interference source energy by a certain threshold, the interference direction is determined as the wake-up direction without performing steps S142, S143, and S144, and the voice wake-up operation is performed directly according to the voice signal in the interference direction, which improves the efficiency of the voice wake-up operation.
  • Step S150 Perform a voice wake-up operation according to the voice signal in the wake-up direction.
  • the sound signal is subjected to fixed beamforming, the signal-to-noise ratio of each beam signal is then calculated, and the wake-up direction is determined from the signal-to-noise ratio so that the voice wake-up operation is performed according to the sound signal in the wake-up direction. This enables the system to accurately determine the wake-up direction even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.
  • experimental results are used below to illustrate the effect.
  • the experiment is carried out in a room of 6x3x3.5m.
  • the microphone array is a 4-microphone circular array with a radius of 0.035m and is located at 3x1.5x1.5m.
  • the interference direction is at 2x1.5x1.5m, and the wake-up positions are distributed on a circle 1.2m away from the microphone. There are two wake-ups every 30 degrees, a total of 24 wake-ups.
  • test signal-to-noise ratio is -5dB, 0dB, and 5dB, respectively, and the test results are shown in Table 1.
  • the third row of the table shows the probability that the optimal beam signal direction is correct. From the results, it can be seen that the probability is lower under babble interference at -5dB, and is above 80% in the other cases.
  • Rows 4 and 5 of Table 1 are the wake-up results of a single microphone and of the present invention, respectively. From the table, it can be seen that the present invention significantly improves the wake-up rate.
  • Fig. 9 is a block diagram showing a voice wake-up device according to an exemplary embodiment.
  • the device includes but is not limited to: a sound signal receiving module 110, a fixed beam forming module 120, a signal-to-noise ratio calculation module 130, an optimal beam signal direction determination module 140, and a voice wake-up operation module 150.
  • the sound signal receiving module 110 is used to receive the sound signal collected by the microphone
  • the fixed beam forming module 120 is configured to perform fixed beam forming on the sound signal to generate multiple beam signals;
  • the signal-to-noise ratio calculation module 130 is used to calculate the signal-to-noise ratio of each beam signal
  • the wake-up direction determination module 140 is configured to determine the wake-up direction according to the signal-to-noise ratio
  • the voice wake-up operation module 150 is configured to perform voice wake-up operations according to the voice signal in the wake-up direction.
  • the SNR calculation module 130 described in FIG. 9 includes, but is not limited to: a signal energy and noise spectrum calculation unit 131, a frequency point SNR calculation unit 132, and a beam signal SNR calculation unit 133.
  • the signal energy and noise spectrum calculation unit 131 is configured to calculate the energy and noise spectrum of each frequency point in the beam signal for each beam signal;
  • the frequency point signal-to-noise ratio calculation unit 132 is configured to calculate the signal-to-noise ratio of each frequency point in the beam signal from the energy and noise spectrum of each frequency point in the beam signal;
  • the beam signal signal-to-noise ratio calculation unit 133 is configured to indicate the signal-to-noise ratio of the beam signal according to the average value of the signal-to-noise ratio of the beam signal within a preset frequency band.
  • the signal-to-noise ratio calculation module 130 described in FIG. 10 further includes, but is not limited to, a smoothing processing unit.
  • the smoothing processing unit is used for smoothing the energy of each frequency point in the beam signal through a smoothing factor.
  • the wake-up direction determination module 140 described in FIG. 9 includes but is not limited to: an interference direction determining unit 141, a removing unit 142, a candidate beam signal determining unit 143 and a wake-up direction determining unit 144.
  • the interference direction determining unit 141 is configured to determine the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;
  • the removing unit 142 is configured to remove the beam signal in the interference direction from all the beam signals to obtain candidate beam signals;
  • the candidate beam signal determining unit 143 is configured to determine the candidate beam signal with the largest signal-to-noise ratio in each frame according to the signal-to-noise ratio of each candidate beam signal;
  • the wake-up direction determining unit 144 is configured to count the candidate beam signals with the largest SNR in the preset number of frames, determine the direction in which that candidate beam signal is located as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
  • the present invention also provides an electronic device that performs all or part of the steps of the voice wake-up method shown in any of the foregoing exemplary embodiments.
  • the electronic device includes:
  • a processor; and
  • a memory communicatively connected with the processor; wherein,
  • the memory stores readable instructions, and when the readable instructions are executed by the processor, the method according to any of the foregoing exemplary embodiments is implemented.
  • a storage medium is also provided.
  • the storage medium is a computer-readable storage medium; for example, it may be a transitory or non-transitory computer-readable storage medium including instructions.

Abstract

Disclosed are a voice wakeup method and apparatus, an electronic device and a storage medium. The method comprises: receiving sound signals collected by a microphone (S110); carrying out fixed beamforming on the sound signals to generate a plurality of beam signals (S120); calculating the signal-to-noise ratio of each beam signal (S130); determining a wakeup direction by means of the signal-to-noise ratios (S140); and carrying out a voice wakeup operation according to a sound signal in the wakeup direction (S150). By means of the wakeup method, a system can accurately determine a wakeup direction in a low signal-to-noise ratio environment, thereby effectively improving the accuracy of voice wakeup.

Description

Voice wake-up method and apparatus, electronic device, and storage medium
Technical field
The present disclosure relates to the technical field of intelligent voice interaction, and in particular to a voice wake-up method and apparatus, an electronic device, and a storage medium.
Background
With the development of voice technology and advances in intelligent interaction, the demand for information exchange between humans and machines has become increasingly urgent, and human-computer interaction has become a focus of current technological development.
Voice, the most natural way for humans to interact, has become one of the most important ways in which people hope to replace the mouse, keyboard, and touch screen when communicating with computers. Voice wake-up has therefore become an important function in human-computer interaction and has received growing attention. Wake-up rate, false wake-up rate, response time, and power consumption are the four common metrics for evaluating voice wake-up technology. As the technology develops, users expect an ever better experience, and combining traditional front-end speech enhancement with a wake-up model has become an important way to raise the wake-up rate. Multi-microphone enhancement is now widely used for front-end speech enhancement: it markedly increases the signal-to-noise ratio of the input speech, which yields better recognition.
At low signal-to-noise ratios, interference and reverberation lower the wake-up rate. To raise it, in addition to optimizing the back-end wake-up model, microphone technology can be used to preprocess the sound signal. Multi-microphone technology exploits spatial information to enhance speech, and a microphone array can address room-acoustics problems such as sound source localization, tracking, noise cancellation, speech enhancement, source separation, and dereverberation.
However, at low signal-to-noise ratios it is very difficult to estimate the direction of arrival accurately and then enhance that specific direction, and an inaccurate estimate significantly degrades the recognition rate.
Summary of the invention
To solve the technical problem in the related art that voice wake-up accuracy is low, the present invention provides a voice wake-up method and apparatus, an electronic device, and a storage medium.
In a first aspect, a voice wake-up method is provided, including:
receiving a sound signal collected by a microphone;
performing fixed beamforming on the sound signal to generate a plurality of beam signals;
calculating the signal-to-noise ratio of each beam signal;
determining a wake-up direction from the signal-to-noise ratios; and
performing a voice wake-up operation according to the sound signal in the wake-up direction.
Optionally, the step of calculating the signal-to-noise ratio of each beam signal includes:
calculating the point source signal energy and the background noise energy at each frequency point of the beam signal, where the beam signal includes a target source signal, an interference source signal, and background noise, and the point source signal energy comprises the target source signal energy and the interference source signal energy;
calculating the signal-to-noise ratio at each frequency point of the beam signal as the ratio of the point source signal energy to the background noise energy at that frequency point.
Before the step of calculating the signal-to-noise ratio at each frequency point of the beam signal, the method further includes:
smoothing the point source signal energy and the background noise energy at each frequency point of the beam signal with a smoothing factor.
Optionally, the step of determining the wake-up direction from the signal-to-noise ratios includes:
determining an interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;
removing the beam signal in the interference direction from all beam signals to obtain candidate beam signals;
determining, according to the signal-to-noise ratio of each candidate beam signal, the beam signal direction with the largest signal-to-noise ratio;
counting, over the preset number of frames, the beam signal direction in which the maximum signal-to-noise ratio occurs most often, determining that direction as the optimal beam signal direction, and using the optimal beam signal direction as the wake-up direction.
Optionally, the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames includes:
calculating the maximum signal-to-noise ratio over all beam signals in the current frame, and comparing the maximum with a preset signal-to-noise ratio threshold;
when the maximum signal-to-noise ratio over all beam signals is less than the preset threshold, recording, for the beam signal direction with the maximum signal-to-noise ratio, the difference between that signal-to-noise ratio and the second-largest signal-to-noise ratio, and recording a difference of zero for the other directions; when the maximum is greater than the preset threshold, recording a difference of zero for all beam signal directions;
summing, over the preset number of frames, the differences recorded for each beam signal direction, and determining the beam signal direction whose sum is the largest and greater than zero as the interference direction.
Optionally, the step of determining the optimal beam signal direction of each frame according to the signal-to-noise ratio of each candidate beam signal includes:
to keep the output stable when the optimal beam signal direction is the signal source direction, performing signal source direction detection according to the signal-to-noise ratio of each candidate beam signal, and setting the beam signal direction that satisfies the conditions as the optimal beam signal direction for a preset number of frames.
Optionally, performing signal source direction detection according to the signal-to-noise ratio of each candidate beam signal includes:
sorting the signal-to-noise ratios of the candidate beam signals by magnitude;
if, within a preset number of consecutive frames, the maximum signal-to-noise ratio among the candidate beam signals exceeds a threshold, the difference between the maximum and the second-largest signal-to-noise ratio reaches a preset difference threshold, and the beam signal direction of the maximum signal-to-noise ratio stays the same, setting that beam signal direction as the optimal beam signal direction for a preset number of frames.
Optionally, after the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames, the step of determining the wake-up direction from the signal-to-noise ratios further includes:
determining whether the target source signal energy exceeds the interference source signal energy by a certain threshold, and if so, determining the interference direction as a wake-up direction.
In a second aspect, a voice wake-up apparatus is provided, including:
a sound signal receiving module, configured to receive the sound signal collected by a microphone;
a fixed beamforming module, configured to perform fixed beamforming on the sound signal to generate a plurality of beam signals in different directions;
a signal-to-noise ratio calculation module, configured to calculate the signal-to-noise ratio of each beam signal;
a wake-up direction determination module, configured to determine the wake-up direction from the signal-to-noise ratios;
a voice wake-up operation module, configured to perform a voice wake-up operation according to the sound signal in the wake-up direction.
In a third aspect, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided for storing a program that, when executed, causes an electronic device to perform the method according to the first aspect.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
After the sound signal collected by the microphone is received, fixed beamforming is performed on it, the signal-to-noise ratio of each beam signal is calculated, and the wake-up direction is determined from the signal-to-noise ratios before the voice wake-up operation is performed. The system can therefore determine the wake-up direction accurately even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit the scope of the present invention.
Description of the drawings
The drawings herein are incorporated into and constitute a part of this specification, show embodiments consistent with the present invention, and together with the specification serve to explain the principles of the present invention.
FIG. 1 is a flowchart of a voice wake-up method according to an exemplary embodiment.
FIG. 2 is a beam pattern according to the embodiment corresponding to FIG. 1.
FIG. 3 is a flowchart of one specific implementation of step S130 in the voice wake-up method of the embodiment corresponding to FIG. 1.
FIG. 4 is a schematic diagram of a microphone array according to an exemplary embodiment.
FIG. 5 is a flowchart of one specific implementation of step S140 in the voice wake-up method of the embodiment corresponding to FIG. 1.
FIG. 6 is a flowchart of one specific implementation of step S141 in the embodiment corresponding to FIG. 5.
FIG. 7 is a flowchart of one specific implementation of step S143 in the embodiment corresponding to FIG. 5.
FIG. 8 is a flowchart of another specific implementation of step S140 in the voice wake-up method of the embodiment corresponding to FIG. 5.
FIG. 9 is a block diagram of a voice wake-up apparatus according to an exemplary embodiment.
FIG. 10 is a block diagram of the signal-to-noise ratio calculation module 130 in the voice wake-up apparatus of the embodiment corresponding to FIG. 9.
FIG. 11 is a block diagram of the wake-up direction determination module 140 in the voice wake-up apparatus of the embodiment corresponding to FIG. 9.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with the present invention as detailed in the appended claims.
FIG. 1 is a flowchart of a voice wake-up method according to an exemplary embodiment. The method can be used in electronic devices such as smartphones and computers. As shown in FIG. 1, the method may include steps S110, S120, S130, and S140.
Step S110: receive the sound signal collected by the microphone.
When an electronic device is to be woken up by voice, it collects the sound signal through a microphone.
However, the sound signal collected by the microphone contains not only the voice signal used for wake-up but also interfering noise.
Therefore, front-end speech enhancement is used to raise the voice wake-up rate.
Optionally, the voice signal may be collected by a microphone array. With M microphones, the collected microphone signal is:
$X(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T$
where n denotes the time index and T denotes transposition.
The collected voice signal is transformed by the short-time Fourier transform into the frequency-domain signal $X(w) = [x_1(w), x_2(w), \ldots, x_M(w)]^T$, where w denotes the frequency point.
Step S120: perform fixed beamforming on the voice signal to generate a plurality of beam signals.
There are several ways to perform fixed beamforming on the voice signal; for example, delay-and-sum beamforming or filter-and-sum beamforming may be used.
In an exemplary embodiment, the number of beams is BN (BN ≥ M), and the beam directions are fixed and uniformly distributed (0° to 180° for a linear array, 0° to 360° for a circular array). The beamformer coefficients may be obtained with the delay-and-sum technique or the filter-and-sum technique, and different beamforming methods may be used for different frequency bands. Delay-and-sum beamforming achieves a higher white noise gain; among filter-and-sum methods, differential arrays are widely used because of their small size and good frequency invariance.
To suit the wideband nature of speech, this embodiment designs a wideband beamformer whose gain is independent of the frequency band. FIG. 2 shows the beam pattern of this embodiment for a circular array of 4 microphones with a radius of 0.035 m: FIG. 2(a) is a three-dimensional plot of gain over frequency and angle, and FIG. 2(b) is the beam pattern in polar coordinates. FIG. 2 shows that the sidelobe attenuation of the beam in this embodiment is about 25 dB. After beam enhancement, BN beam outputs are obtained:
$Y(w) = [y_1(w), y_2(w), \ldots, y_{BN}(w)]^T$, where w denotes the frequency point.
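The embodiment does not disclose the coefficients of its wideband beamformer. As a stand-in, the following minimal sketch uses plain delay-and-sum beamforming, one of the methods named above, for a single frequency point. It assumes a far-field plane wave, a speed of sound of 343 m/s, and microphones evenly spaced on the circle; the function names are hypothetical and are not taken from the patent.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def steering_vector(freq_hz, look_deg, num_mics=4, radius=0.035):
    """Far-field steering vector of a uniform circular array (assumed geometry)."""
    look = math.radians(look_deg)
    vec = []
    for m in range(num_mics):
        mic_angle = 2.0 * math.pi * m / num_mics
        # Time by which a plane wave from `look` reaches mic m before the array centre.
        advance = radius * math.cos(look - mic_angle) / SPEED_OF_SOUND
        vec.append(cmath.exp(2j * math.pi * freq_hz * advance))
    return vec


def delay_and_sum(x_bin, freq_hz, look_deg):
    """One beam output y_i(w) = (1/M) * sum_m conj(d_m) * x_m(w) at one frequency point."""
    d = steering_vector(freq_hz, look_deg, num_mics=len(x_bin))
    return sum(dm.conjugate() * xm for dm, xm in zip(d, x_bin)) / len(x_bin)


def fixed_beams(x_bin, freq_hz, num_beams=4):
    """Y(w): outputs of num_beams fixed beams spaced uniformly over 0 to 360 degrees."""
    return [delay_and_sum(x_bin, freq_hz, 360.0 * i / num_beams)
            for i in range(num_beams)]
```

Feeding in the spectrum of a plane wave that arrives from one of the look directions should make the matching beam the strongest, which is the property the later SNR comparison relies on.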
Step S130: calculate the signal-to-noise ratio of each beam signal.
The signal-to-noise ratio is the ratio between the energy of the sound signal and the noise spectrum.
Optionally, as shown in FIG. 3, step S130 may include steps S131, S132, and S133.
Step S131: for each beam signal, calculate the energy and the noise spectrum at each frequency point of the beam signal.
Step S132: calculate the signal-to-noise ratio at each frequency point of the beam signal from the energy and the noise spectrum at that frequency point.
Because the sound signal is composed of signals at different frequency points, each beam signal likewise contains signals at different frequency points. Therefore, to raise the voice wake-up rate, the energy and the noise spectrum must be calculated at each frequency point of the beam signal.
Optionally, to further raise the voice wake-up rate, after the interference direction has been determined, a smoothing factor is selected for each beam signal according to its position relative to the interference direction and is used to smooth the energy at each frequency point of that beam signal.
Specifically, when the energy of a beam signal is calculated, the calculated energy needs to be smoothed with a smoothing factor $\mathrm{weight}_i(w)$ (i = 1, 2, …, BN). The size of $\mathrm{weight}_i(w)$ depends on the interference direction: the closer a beam points to the interference direction, the larger the smoothing factor set for that beam signal. For example, if the interference direction in FIG. 4 corresponds to the pointing direction of beam signal 3, then the smoothing factor $\mathrm{weight}_2(w)$ of beam signal 2 and the smoothing factor $\mathrm{weight}_4(w)$ of beam signal 4 are larger than the smoothing factor $\mathrm{weight}_1(w)$ of beam signal 1. For example, in FIG. 3, $\mathrm{weight}_2(w) = \mathrm{weight}_4(w) \approx 0.8$ and $\mathrm{weight}_1(w) = 0.6$.
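The text gives example values for the smoothing factor but not the smoothing formula itself. The sketch below assumes a first-order recursive average, which is a common choice and not necessarily the one used in the embodiment. It also assumes a circular array, so that beam indices wrap around, and it assigns the larger factor to the interference beam itself, which the text does not specify. All names are hypothetical, and beam indices are zero-based.

```python
def circular_distance(i, j, num_beams):
    """Number of beam steps between beams i and j on a circular array."""
    d = abs(i - j) % num_beams
    return min(d, num_beams - d)


def smoothing_factor(beam_idx, interference_idx, num_beams, near=0.8, far=0.6):
    """Larger factor for beams pointing close to the interference direction.

    The values 0.8 and 0.6 follow the example in the text; treating the
    interference beam itself as "near" is an assumption.
    """
    if circular_distance(beam_idx, interference_idx, num_beams) <= 1:
        return near
    return far


def smooth_energy(prev_energy, beam_bin, weight):
    """Assumed form: E(t) = weight * E(t-1) + (1 - weight) * |y_i(w)|^2."""
    return weight * prev_energy + (1.0 - weight) * abs(beam_bin) ** 2
```

With four beams and the interference on the third beam (index 2), the second and fourth beams get 0.8 and the first beam gets 0.6, matching the example values in the text.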
For noise spectrum estimation at each frequency point, single-channel noise spectrum estimation methods such as MCRA, IMCRA, Martin, Doblinger, or Hirsch may be used, as may other noise spectrum estimation methods; the specific methods are not described one by one here.
Step S133: represent the signal-to-noise ratio of the beam signal by the average of its per-frequency signal-to-noise ratio within a preset frequency band.
In this embodiment of the present invention, the preset frequency band is 0-2 kHz. After the signal-to-noise ratio at each frequency point of the beam signal has been calculated, the signal-to-noise ratio of the beam signal, $\mathrm{SNR}_i$ (i = 1, 2, …, BN), is obtained by averaging the per-frequency signal-to-noise ratio over the 0-2 kHz band.
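Steps S132 and S133 can be sketched as follows for one beam and one frame. The text does not say whether the average is taken in decibels or on a linear scale; this sketch assumes decibels because the later thresholds are given in dB. The function name, the argument layout, and the small floor that avoids a division by zero are assumptions.

```python
import math


def beam_snr_db(energy, noise, sample_rate, fft_size,
                band_hz=(0.0, 2000.0), floor=1e-12):
    """Average per-frequency SNR in dB over the points that lie inside band_hz.

    energy[k] and noise[k] are the smoothed energy and the estimated noise
    spectrum at frequency point k (one-sided spectrum, k = 0 .. fft_size / 2).
    """
    bin_hz = sample_rate / fft_size
    snrs = []
    for k, (e, n) in enumerate(zip(energy, noise)):
        if band_hz[0] <= k * bin_hz <= band_hz[1]:
            snrs.append(10.0 * math.log10(max(e, floor) / max(n, floor)))
    return sum(snrs) / len(snrs)
```

Calling this once per beam and per frame yields the values SNR_1 … SNR_BN that the direction logic in step S140 consumes.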
Step S140: determine the wake-up direction from the signal-to-noise ratios.
The wake-up direction is the voice wake-up direction determined by the present invention.
The present invention calculates the signal-to-noise ratio of each beam signal, determines the wake-up direction among the beam signals according to those signal-to-noise ratios, and then uses the beam signal in the wake-up direction to perform the voice wake-up operation on the electronic device.
When the optimal beam signal direction is selected from all beam signals, it may be the direction with the largest average signal-to-noise ratio over a certain period, the direction that has the maximum signal-to-noise ratio in the largest number of frames over a certain period, or a direction determined in some other way; these are not described one by one here.
Optionally, as shown in FIG. 5, step S140 may include steps S141, S142, S143, and S144.
Step S141: determine the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames.
The interference direction is the direction, relative to the electronic device, of the noise source that interferes with the voice signal. As shown in FIG. 2, 3 is the interference source, the electronic device is at the center of the circle, and the direction of interference source 3 relative to the center is the interference direction.
Because the interference source strongly affects the voice signal during voice wake-up, the sound it produces strongly affects every beam signal. Therefore, the interference direction is determined first, and each beam signal is then smoothed according to the position of its direction relative to the interference source, which effectively reduces the influence of the interference source on voice wake-up and improves wake-up accuracy.
There are several ways to determine the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames. It may be the direction of the beam signal whose signal-to-noise ratio reaches a preset threshold in the largest number of frames, the direction of the beam signal with the largest average signal-to-noise ratio within the preset number of frames, or a direction determined in some other way; these are not described one by one here.
Optionally, as shown in FIG. 6, step S141 may include steps S1411, S1412, and S1413.
Step S1411: calculate the maximum signal-to-noise ratio over all beam signals in the current frame, and compare the maximum with a preset signal-to-noise ratio threshold.
Step S1412: when the maximum signal-to-noise ratio over all beam signals is less than the preset threshold, record, for the beam signal direction with the maximum signal-to-noise ratio, the difference between that signal-to-noise ratio and the second-largest signal-to-noise ratio, and record a difference of zero for the other directions. When the maximum is greater than the preset threshold, record a difference of zero for all beam signal directions.
Step S1413: sum the differences recorded for each beam signal direction over the preset number of frames. If the largest sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction.
Specifically, the preset number of frames is T1, preferably T1 ≥ 2000 frames. From the signal-to-noise ratios $\mathrm{SNR}_i$ (i = 1, 2, …, BN) of the beam signals, the maximum signal-to-noise ratio over all beam signal directions, MAXSNR, is found and compared with a threshold ε. If MAXSNR < ε, the frame is regarded as a silent segment, the difference between the maximum and the second-largest signal-to-noise ratio is recorded for the direction with the maximum signal-to-noise ratio, and zero is recorded for the other directions. If MAXSNR > ε, zero is recorded for all beam signal directions. Finally, the differences recorded for each beam signal direction are summed over the T1 frames; if the largest sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction. In practice, T1 and ε are chosen according to the specific scenario to improve the accuracy of the interference direction decision. In this embodiment of the present invention, T1 > 2000 frames and ε = 10 dB.
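Steps S1411 to S1413 can be sketched as below. The input layout (one list of per-beam SNR values in dB for each of the T1 frames), the function name, and the zero-based beam indices are assumptions made for illustration; the threshold default follows the ε = 10 dB of the embodiment.

```python
def detect_interference(snr_frames, eps_db=10.0):
    """Return the index of the interference beam, or None if none is found.

    snr_frames: list of frames, each a list [SNR_1, ..., SNR_BN] in dB,
    covering the preset number of frames T1 (at least two beams assumed).
    """
    num_beams = len(snr_frames[0])
    sums = [0.0] * num_beams
    for snrs in snr_frames:
        order = sorted(range(num_beams), key=lambda i: snrs[i], reverse=True)
        best, second = order[0], order[1]
        # Only frames regarded as silence (MAXSNR < eps) contribute a difference;
        # every other direction, and every non-silent frame, contributes zero.
        if snrs[best] < eps_db:
            sums[best] += snrs[best] - snrs[second]
    winner = max(range(num_beams), key=lambda i: sums[i])
    return winner if sums[winner] > 0 else None
```

The idea is that during silence no target speech is present, so a beam that still stands out consistently is likely to point at a persistent interference source.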
Step S142: remove the beam signal corresponding to the interference direction from all beam signals to obtain the candidate beam signals.
The candidate beam signals are the set of beam signals that remain after the beam signal corresponding to the interference direction has been removed from all beam signals.
In general, the interference direction determined by the technical solution of the present invention is not the optimal beam signal direction. Therefore, when the optimal beam signal direction is determined, the beam signal corresponding to the interference direction is removed from all beam signals, and the optimal beam signal direction is then determined from the signal-to-noise ratios of the candidate beam signals, which improves the accuracy of the result.
Step S143: according to the signal-to-noise ratio of each candidate beam signal, determine the candidate beam signal with the largest signal-to-noise ratio in each frame.
To further keep the output stable when the optimal beam signal direction is the signal source direction, signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal.
Specifically, as shown in FIG. 7, step S143 may further include steps S1431, S1432, S1433, and S1434:
Step S1431: sort the signal-to-noise ratios of the candidate beam signals by magnitude.
Step S1432: if, within a preset number of consecutive frames, the maximum signal-to-noise ratio of the candidate beams exceeds a threshold and the difference between the maximum and the second-largest signal-to-noise ratio reaches a preset difference threshold, execute step S1433; otherwise, do nothing.
Step S1433: determine whether the beam signal direction of the maximum signal-to-noise ratio stays the same within the preset number of consecutive frames. If yes, execute step S1434; otherwise, do nothing.
Step S1434: set the beam signal direction of the maximum signal-to-noise ratio as the optimal beam signal direction for a preset number of frames.
Specifically, the signal-to-noise ratios of the candidate beam signals (all beams except the interference direction) are sorted, and the beam signal with the maximum signal-to-noise ratio is selected. If that beam signal direction has the maximum signal-to-noise ratio in each of N consecutive frames, and throughout those N frames its signal-to-noise ratio satisfies MAXSNR > δ (a preset threshold) and exceeds the second-largest signal-to-noise ratio, SECSNR, by a threshold μ, then the direction of that beam signal is taken as the optimal beam signal direction. In the subsequent counting of the optimal beam signal direction, the optimal beam signal direction is set to this MAXSNR direction for a period T3. The value of T3 depends on the wake word. Preferably, N = 3, δ = 5, μ = 3, and T3 = 65.
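The signal source direction check in steps S1431 to S1434 can be sketched as follows. The function name and data layout are hypothetical; the defaults follow the preferred values N = 3, δ = 5, and μ = 3. Holding the returned direction for T3 frames is left to the caller.

```python
def detect_source_lock(recent_frames, candidates,
                       n_frames=3, delta_db=5.0, mu_db=3.0):
    """Return the beam index to lock as the optimal direction, or None.

    recent_frames: per-frame SNR lists in dB, newest last.
    candidates: beam indices left after the interference beam is removed
    (at least two candidates assumed).
    """
    if len(recent_frames) < n_frames:
        return None
    locked = None
    for snrs in recent_frames[-n_frames:]:
        order = sorted(candidates, key=lambda i: snrs[i], reverse=True)
        best, second = order[0], order[1]
        # MAXSNR must exceed delta and lead SECSNR by at least mu.
        if snrs[best] <= delta_db or snrs[best] - snrs[second] < mu_db:
            return None
        # The strongest beam must be the same in every one of the N frames.
        if locked is not None and best != locked:
            return None
        locked = best
    return locked
```

A caller would typically keep the returned direction fixed for T3 frames (65 in the preferred values) before running the check again, so that the wake word is not cut off by a direction change.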
Step S144: Count which candidate beam signal most often has the maximum signal-to-noise ratio within the preset number of frames, determine the direction of that candidate beam signal as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
In each frame, based on the signal-to-noise ratios of the candidate beam signals, the candidate beam signal that most often has the maximum signal-to-noise ratio within the preset number of frames is counted, and its direction is determined as the optimal beam signal direction.
In an exemplary embodiment, the number of frames is T2. Within the T2 preset frames, among the BN beams and excluding the beam signal in the detected interference direction: if the maximum SNR is less than the threshold th, the optimal beam signal direction of the current frame stays the same as in the previous frame; if the maximum SNR is greater than the threshold th, the beam signal with the maximum SNR is marked 1, and the beam signal in the interference direction and the other beam signals with smaller SNR are marked 0. The direction of the beam signal that has the maximum SNR most often over the T2 frames is determined as the optimal beam signal direction of the current frame. Preferably, 20≤T2≤100 and th=10.
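The counting in step S144 can be illustrated with a sliding-window vote. The sketch below is one possible reading, not the patent's code: the class name `BestBeamVoter` and its arguments are hypothetical, frames whose maximum SNR does not exceed `th` are assumed to cast no vote and leave the previous direction in place, and `t2=50` is simply a value inside the preferred range 20≤T2≤100.

```python
from collections import Counter, deque


class BestBeamVoter:
    """Pick the beam that most often has the largest SNR over the last T2 frames."""

    def __init__(self, t2=50, th=10.0):
        self.th = th
        self.winners = deque(maxlen=t2)  # per-frame winner index, or None
        self.best = None                 # optimal direction from the previous frame

    def update(self, snrs, interference=None):
        """Feed one frame of per-beam SNRs; return the optimal beam index."""
        candidates = [i for i in range(len(snrs)) if i != interference]
        top = max(candidates, key=lambda i: snrs[i])
        if snrs[top] > self.th:
            # The winning beam is marked 1 for this frame, all others 0.
            self.winners.append(top)
            votes = Counter(w for w in self.winners if w is not None)
            self.best = votes.most_common(1)[0][0]
        else:
            # Maximum SNR too low: keep the direction of the previous frame.
            self.winners.append(None)
        return self.best
```

A single strong frame in a new direction does not switch the output immediately; the new direction has to win more frames in the window than the old one.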
Optionally, the accuracy of optimal beam signal direction selection can be further improved for environments in which the signal source energy is much greater than the interference source energy. In such a situation (for example, when the signal source energy exceeds the interference source energy by 20 dB), the process of determining the interference direction will identify the wake-up direction as the interference direction. In this case, both the counted interference direction and the optimal beam signal direction determined by the foregoing method are used as wake-up directions, that is, two-way wake-up is performed, and the wake-up state is determined if either path exceeds the threshold.
Specifically, as shown in FIG. 8, step S140 may further include steps S146 and S147:
Step S146: Determine whether the signal source energy exceeds the interference source energy by a certain threshold. If yes (Y), execute step S147; otherwise, continue with step S142.
Step S147: Determine the interference direction as the wake-up direction.
It should be noted that, after the interference direction is selected, the interference direction may be determined as a wake-up direction while steps S142, S143, and S144 are executed to determine the optimal beam signal direction, so that wake-up is performed on two beam signal directions: the interference direction and the optimal beam signal direction. Alternatively, once it is determined that the signal source energy exceeds the interference source energy by a certain threshold, the interference direction may be determined as the wake-up direction without executing steps S142, S143, and S144, and the voice wake-up operation is performed directly on the voice signal in the interference direction, which improves the efficiency of the voice wake-up operation.
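The two-way variant described above can be sketched as follows. This is only an illustration under assumptions: the function names, the energy inputs in dB, and the per-path wake-word scores are hypothetical, the 20 dB margin is the example value from the text, and the score threshold of 0.5 is an arbitrary placeholder.

```python
def wake_directions(source_energy_db, interference_energy_db,
                    interference_dir, best_dir, margin_db=20.0):
    """Return the beam directions on which wake-word detection should run."""
    if source_energy_db - interference_energy_db > margin_db:
        # The talker was probably mistaken for the interferer: listen on both.
        return [interference_dir, best_dir]
    return [best_dir]


def is_awake(path_scores, score_threshold=0.5):
    """Two-way wake-up: the device wakes if any path exceeds the threshold."""
    return any(score > score_threshold for score in path_scores)
```

For example, `wake_directions(30.0, 5.0, interference_dir=1, best_dir=3)` selects both beams, and `is_awake([0.2, 0.9])` reports a wake-up because one of the two paths is above the threshold.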
Step S150: Perform a voice wake-up operation according to the voice signal in the wake-up direction.
With the method described above, after the sound signal collected by the microphone is received, fixed beamforming is applied to it, the signal-to-noise ratio of each beam signal is calculated, the wake-up direction is determined from the signal-to-noise ratios, and the voice wake-up operation is performed on the sound signal in that wake-up direction. The system can thus determine the wake-up direction accurately even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.
To illustrate the effect of the voice front-end enhancement method of the present invention in improving the wake-up rate, experimental results are presented. The experiment was carried out in a 6 × 3 × 3.5 m room. The microphone array was a 4-microphone circular array with a radius of 0.035 m, located at position 3 × 1.5 × 1.5 m, and the interference source was located at 2 × 1.5 × 1.5 m. The wake-up positions were distributed on a circle 1.2 m from the microphone, with two wake-ups every 30 degrees, for 24 wake-ups in total. Three interference types were tested (music, babble, and television) at test signal-to-noise ratios of -5 dB, 0 dB, and 5 dB. The results are shown in Table 1. Row 3 of the table gives the probability that the optimal beam signal direction is correct; except for babble interference at -5 dB, where the probability is lower, the probability is above 80% in all cases. Rows 4 and 5 of Table 1 give the wake-up results of a single microphone and of the present invention, respectively; the table shows that the present invention clearly improves the wake-up rate.
Table 1. Wake-up experiment results
Figure PCTCN2019114378-appb-000001
The following are apparatus embodiments of the present disclosure, which can be used to carry out the voice wake-up method embodiments described above. For details not disclosed in the apparatus embodiments, refer to the voice wake-up method embodiments of the present disclosure.
FIG. 9 is a block diagram of a voice wake-up apparatus according to an exemplary embodiment. The apparatus includes, but is not limited to: a sound signal receiving module 110, a fixed beamforming module 120, a signal-to-noise ratio calculation module 130, an optimal beam signal direction determination module 140 (referred to below as the wake-up direction determination module 140), and a voice wake-up operation module 150.
The sound signal receiving module 110 is configured to receive the sound signal collected by the microphone;
The fixed beamforming module 120 is configured to perform fixed beamforming on the sound signal to generate multiple beam signals;
The signal-to-noise ratio calculation module 130 is configured to calculate the signal-to-noise ratio of each beam signal;
The wake-up direction determination module 140 is configured to determine the wake-up direction from the signal-to-noise ratios;
The voice wake-up operation module 150 is configured to perform a voice wake-up operation according to the sound signal in the wake-up direction.
For the implementation of the functions and roles of each module in the above apparatus, see the implementation of the corresponding steps in the voice wake-up method described above; the details are not repeated here.
Optionally, as shown in FIG. 10, the signal-to-noise ratio calculation module 130 of FIG. 9 includes, but is not limited to: a signal energy and noise spectrum calculation unit 131, a frequency point signal-to-noise ratio calculation unit 132, and a beam signal signal-to-noise ratio calculation unit 133.
The signal energy and noise spectrum calculation unit 131 is configured to calculate, for each beam signal, the energy and noise spectrum of each frequency point in the beam signal;
The frequency point signal-to-noise ratio calculation unit 132 is configured to calculate the signal-to-noise ratio of each frequency point in the beam signal from the energy and noise spectrum at that frequency point;
The beam signal signal-to-noise ratio calculation unit 133 is configured to use the average of the signal-to-noise ratio of the beam signal over a preset frequency band to represent the signal-to-noise ratio of the beam signal.
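The division of work among units 131 to 133 can be illustrated with a short Python sketch. It assumes the per-bin signal energy and noise spectrum are already available as lists; the function name `beam_snr_db`, the `band` slice, and the small floor `eps` are hypothetical. This passage does not state whether the averaging is done on linear ratios or on dB values, so averaging in dB is an assumption of the sketch.

```python
import math


def beam_snr_db(energy, noise, band=(0, None), eps=1e-12):
    """Average the per-bin SNR (in dB) of one beam over a preset band.

    energy, noise: per-frequency-bin signal energy and noise spectrum.
    band: (low, high) bin indices of the preset frequency band.
    """
    low, high = band
    per_bin = [
        # Per-bin SNR as the ratio of signal energy to noise energy, in dB.
        10.0 * math.log10(max(e, eps) / max(n, eps))
        for e, n in zip(energy[low:high], noise[low:high])
    ]
    return sum(per_bin) / len(per_bin)
```

Restricting the average to a preset band keeps bins outside the band from influencing the SNR used to rank the beams.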
Optionally, the signal-to-noise ratio calculation module 130 of FIG. 10 further includes, but is not limited to, a smoothing processing unit.
The smoothing processing unit is configured to smooth the energy of each frequency point in the beam signal using a smoothing factor.
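A common way to smooth per-bin energy with a smoothing factor is first-order recursive averaging. The sketch below assumes that form; the function name `smooth_energy` and the default factor `alpha=0.9` are hypothetical and are not taken from this passage.

```python
def smooth_energy(previous, current, alpha=0.9):
    """Recursively smooth per-bin energy: alpha * previous + (1 - alpha) * current."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]
```

A larger `alpha` gives a steadier estimate that reacts more slowly to changes, while a smaller one tracks the current frame more closely.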
Optionally, as shown in FIG. 11, the wake-up direction determination module 140 of FIG. 9 includes, but is not limited to: an interference direction determining unit 141, a removing unit 142, a candidate beam signal determining unit 143, and a wake-up direction determining unit 144.
The interference direction determining unit 141 is configured to determine the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;
The removing unit 142 is configured to remove the beam signal in the interference direction from all the beam signals to obtain the candidate beam signals;
The candidate beam signal determining unit 143 is configured to determine, according to the signal-to-noise ratio of each candidate beam signal, the candidate beam signal with the largest signal-to-noise ratio in each frame;
The wake-up direction determining unit 144 is configured to count which candidate beam signal most often has the maximum signal-to-noise ratio within the preset number of frames, determine the direction of that candidate beam signal as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
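The rule applied by the interference direction determining unit 141 is set out in claim 5 and can be sketched as follows. The function name `interference_direction`, its arguments, and the `None` return value for "no interference found" are hypothetical choices for this illustration, and at least two beams are assumed.

```python
def interference_direction(snr_frames, snr_threshold):
    """Return the index of the interference beam, or None if none is found.

    snr_frames: one list of per-beam SNRs for each frame in the preset window.
    """
    n_beams = len(snr_frames[0])
    sums = [0.0] * n_beams
    for snrs in snr_frames:
        order = sorted(range(n_beams), key=lambda i: snrs[i], reverse=True)
        top, second = order[0], order[1]
        if snrs[top] < snr_threshold:
            # Record the gap between the largest and second-largest SNR in the
            # top direction; every other direction records zero.
            sums[top] += snrs[top] - snrs[second]
        # Otherwise every direction records zero for this frame.
    best = max(range(n_beams), key=lambda i: sums[i])
    return best if sums[best] > 0 else None
```

The removing unit 142 would then drop the returned beam, leaving the remaining beams as candidates for units 143 and 144.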
Optionally, the present invention further provides an electronic device that performs all or some of the steps of the voice wake-up method shown in any of the foregoing exemplary embodiments. The electronic device includes:
a processor; and
a memory communicatively connected to the processor; wherein
the memory stores readable instructions that, when executed by the processor, implement the method according to any of the foregoing exemplary embodiments.
The specific manner in which the processor of the terminal in this embodiment performs operations has been described in detail in the embodiments of the voice wake-up method and is not elaborated here.
In an exemplary embodiment, a storage medium is further provided. The storage medium is a computer-readable storage medium, which may be, for example, a transitory or non-transitory computer-readable storage medium including instructions.
It should be understood that the present invention is not limited to the specific structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (11)

  1. A voice wake-up method, characterized in that the method comprises:
    receiving a sound signal collected by a microphone;
    performing fixed beamforming on the sound signal to generate multiple beam signals;
    calculating the signal-to-noise ratio of each beam signal;
    determining a wake-up direction from the signal-to-noise ratios;
    performing a voice wake-up operation according to the sound signal in the wake-up direction.
  2. The method according to claim 1, characterized in that the step of calculating the signal-to-noise ratio of each beam signal comprises:
    calculating the point source signal energy and the background noise energy of each frequency point in each beam signal, wherein the beam signal includes a target source signal, an interference source signal, and background noise, and the point source signal energy includes the target source signal energy and the interference source signal energy;
    calculating the signal-to-noise ratio of each frequency point in the beam signal as the ratio of the point source signal energy to the background noise energy of the beam signal at that frequency point;
    using the average of the signal-to-noise ratio of the beam signal over a preset frequency band to represent the signal-to-noise ratio of the beam signal.
  3. The method according to claim 2, characterized in that, before the step of calculating the signal-to-noise ratio of each frequency point in the beam signal, the method further comprises:
    smoothing the point source signal energy and the background noise energy of each frequency point in the beam signal using a smoothing factor.
  4. The method according to claim 1, characterized in that the step of determining the wake-up direction from the signal-to-noise ratios comprises:
    determining an interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;
    removing the beam signal in the interference direction from all the beam signals to obtain candidate beam signals;
    determining, according to the signal-to-noise ratio of each candidate beam signal, the beam signal direction with the largest signal-to-noise ratio;
    counting, within the preset number of frames, the beam signal direction in which the maximum signal-to-noise ratio occurs most often, determining the direction of that beam signal as the optimal beam signal direction, and using the optimal beam signal direction as the wake-up direction.
  5. The method according to claim 4, characterized in that the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames comprises:
    calculating the maximum of the signal-to-noise ratios of all beam signals in the current frame, and comparing the maximum with a preset signal-to-noise ratio threshold;
    when the maximum signal-to-noise ratio of all beam signals is less than the preset signal-to-noise ratio threshold, recording, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the signal-to-noise ratio in that direction and the second-largest signal-to-noise ratio, and recording a difference of zero in the other directions; when the maximum signal-to-noise ratio of all beam signals is greater than the preset signal-to-noise ratio threshold, recording a difference of zero in all beam signal directions;
    summing the differences recorded in each beam signal direction within the preset number of frames, and determining the beam signal direction whose sum is greater than zero and is the largest as the interference direction.
  6. The method according to claim 4, characterized in that the step of determining the optimal beam signal direction of each frame according to the signal-to-noise ratio of each candidate beam signal comprises:
    in order to ensure output stability when the optimal beam signal direction is the signal source direction, performing signal source direction detection according to the signal-to-noise ratios of the candidate beam signals, and setting the beam signal direction that satisfies the conditions as the optimal beam signal direction for a preset number of frames.
  7. The method according to claim 6, characterized in that the step of performing signal source direction detection according to the signal-to-noise ratios of the candidate beam signals comprises:
    sorting the signal-to-noise ratios of the candidate beam signals by magnitude;
    if, within a preset number of consecutive frames, the maximum signal-to-noise ratio among the candidate beam signals exceeds a certain threshold, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio reaches a preset difference threshold, and the direction of the beam signal with the maximum signal-to-noise ratio remains the same, setting the direction of the beam signal with the maximum signal-to-noise ratio as the optimal beam signal direction for a certain preset number of frames.
  8. The method according to claim 4, characterized in that, after the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames, the step of determining the wake-up direction from the signal-to-noise ratios further comprises:
    determining whether the target source signal energy exceeds the interference source signal energy by a certain threshold, and if so, determining the interference direction as the wake-up direction.
  9. A voice wake-up apparatus, characterized in that the apparatus comprises:
    a sound signal receiving module, configured to receive a sound signal collected by a microphone;
    a fixed beamforming module, configured to perform fixed beamforming on the sound signal to generate multiple beam signals in different directions;
    a signal-to-noise ratio calculation module, configured to calculate the signal-to-noise ratio of each beam signal;
    a wake-up direction determination module, configured to determine a wake-up direction from the signal-to-noise ratios;
    a voice wake-up operation module, configured to perform a voice wake-up operation according to the sound signal in the wake-up direction.
  10. An electronic device, characterized in that the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-8.
  11. A computer-readable storage medium for storing a program, characterized in that the program, when executed, causes an electronic device to perform the method according to any one of claims 1-8.
PCT/CN2019/114378 2019-07-12 2019-10-30 Voice wakeup method and apparatus, electronic device and storage medium WO2021008000A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910627574.4 2019-07-12
CN201910627574.4A CN110265020B (en) 2019-07-12 2019-07-12 Voice wake-up method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021008000A1 true WO2021008000A1 (en) 2021-01-21

Family

ID=67925774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114378 WO2021008000A1 (en) 2019-07-12 2019-10-30 Voice wakeup method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN110265020B (en)
WO (1) WO2021008000A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053406A (en) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 Sound signal identification method and device
CN113724704A (en) * 2021-08-30 2021-11-30 深圳创维-Rgb电子有限公司 Voice acquisition method, device, terminal and storage medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110265020B (en) * 2019-07-12 2021-07-06 大象声科(深圳)科技有限公司 Voice wake-up method and device, electronic equipment and storage medium
CN111223497B (en) * 2020-01-06 2022-04-19 思必驰科技股份有限公司 Nearby wake-up method and device for terminal, computing equipment and storage medium
CN111192589A (en) * 2020-01-16 2020-05-22 云知声智能科技股份有限公司 Voice wake-up method and device
CN111341297B (en) * 2020-03-04 2023-04-07 开放智能机器(上海)有限公司 Voice wake-up rate test system and method
CN111402883B (en) * 2020-03-31 2023-05-26 云知声智能科技股份有限公司 Nearby response system and method in distributed voice interaction system under complex environment
CN111863012A (en) * 2020-07-31 2020-10-30 北京小米松果电子有限公司 Audio signal processing method and device, terminal and storage medium
CN113066488B (en) * 2021-03-26 2023-10-27 深圳市欧瑞博科技股份有限公司 Voice wakeup intelligent control method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734845B1 (en) * 2015-06-26 2017-08-15 Amazon Technologies, Inc. Mitigating effects of electronic audio sources in expression detection
CN108831498A (en) * 2018-05-22 2018-11-16 出门问问信息科技有限公司 The method, apparatus and electronic equipment of multi-beam beam forming
CN108877827A (en) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
CN109272989A (en) * 2018-08-29 2019-01-25 北京京东尚科信息技术有限公司 Voice awakening method, device and computer readable storage medium
CN109473118A (en) * 2018-12-24 2019-03-15 苏州思必驰信息科技有限公司 Double-channel pronunciation Enhancement Method and device
US20190098400A1 (en) * 2017-09-28 2019-03-28 Sonos, Inc. Three-Dimensional Beam Forming with a Microphone Array
CN109920433A (en) * 2019-03-19 2019-06-21 上海华镇电子科技有限公司 The voice awakening method of electronic equipment under noisy environment
CN109949810A (en) * 2019-03-28 2019-06-28 华为技术有限公司 A kind of voice awakening method, device, equipment and medium
CN110265020A (en) * 2019-07-12 2019-09-20 大象声科(深圳)科技有限公司 Voice awakening method, device and electronic equipment, storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1473964A3 (en) * 2003-05-02 2006-08-09 Samsung Electronics Co., Ltd. Microphone array, method to process signals from this microphone array and speech recognition method and system using the same
US9640179B1 (en) * 2013-06-27 2017-05-02 Amazon Technologies, Inc. Tailoring beamforming techniques to environments
CN104810021B (en) * 2015-05-11 2017-08-18 百度在线网络技术(北京)有限公司 The pre-treating method and device recognized applied to far field
CN106683685B (en) * 2016-12-23 2020-05-22 云知声(上海)智能科技有限公司 Target direction voice detection method based on least square method
CN108831495B (en) * 2018-06-04 2022-11-29 桂林电子科技大学 Speech enhancement method applied to speech recognition in noise environment
CN109102822B (en) * 2018-07-25 2020-07-28 出门问问信息科技有限公司 Filtering method and device based on fixed beam forming
CN110491403B (en) * 2018-11-30 2022-03-04 腾讯科技(深圳)有限公司 Audio signal processing method, device, medium and audio interaction equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734845B1 (en) * 2015-06-26 2017-08-15 Amazon Technologies, Inc. Mitigating effects of electronic audio sources in expression detection
CN108877827A (en) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 Voice-enhanced interaction method and system, storage medium and electronic equipment
US20190098400A1 (en) * 2017-09-28 2019-03-28 Sonos, Inc. Three-Dimensional Beam Forming with a Microphone Array
CN108831498A (en) * 2018-05-22 2018-11-16 出门问问信息科技有限公司 The method, apparatus and electronic equipment of multi-beam beam forming
CN109272989A (en) * 2018-08-29 2019-01-25 北京京东尚科信息技术有限公司 Voice awakening method, device and computer readable storage medium
CN109473118A (en) * 2018-12-24 2019-03-15 苏州思必驰信息科技有限公司 Double-channel pronunciation Enhancement Method and device
CN109920433A (en) * 2019-03-19 2019-06-21 上海华镇电子科技有限公司 The voice awakening method of electronic equipment under noisy environment
CN109949810A (en) * 2019-03-28 2019-06-28 华为技术有限公司 A kind of voice awakening method, device, equipment and medium
CN110265020A (en) * 2019-07-12 2019-09-20 大象声科(深圳)科技有限公司 Voice awakening method, device and electronic equipment, storage medium

Also Published As

Publication number Publication date
CN110265020A (en) 2019-09-20
CN110265020B (en) 2021-07-06

Similar Documents

Publication Publication Date Title
WO2021008000A1 (en) Voice wakeup method and apparatus, electronic device and storage medium
US10602267B2 (en) Sound signal processing apparatus and method for enhancing a sound signal
CN110503969B (en) Audio data processing method and device and storage medium
CN109272989B (en) Voice wake-up method, apparatus and computer readable storage medium
CN107577449B (en) Wake-up voice pickup method, device, equipment and storage medium
US9008329B1 (en) Noise reduction using multi-feature cluster tracker
CN107910011B (en) Voice noise reduction method and device, server and storage medium
RU2642353C2 (en) Device and method for providing informed probability estimation and multichannel speech presence
CN107464565B (en) Far-field voice awakening method and device
JP2021500634A (en) Target voice acquisition method and device based on microphone array
US9959886B2 (en) Spectral comb voice activity detection
CN110211599B (en) Application awakening method and device, storage medium and electronic equipment
CN110875060A (en) Voice signal processing method, device, system, equipment and storage medium
US10242677B2 (en) Speaker dependent voiced sound pattern detection thresholds
CN110556103A (en) Audio signal processing method, apparatus, system, device and storage medium
CN108922553B (en) Direction-of-arrival estimation method and system for sound box equipment
CN103180900A (en) Systems, methods, and apparatus for voice activity detection
WO2019080551A1 (en) Target voice detection method and apparatus
WO2015196760A1 (en) Microphone array speech detection method and device
US9378754B1 (en) Adaptive spatial classifier for multi-microphone systems
CN110610718A (en) Method and device for extracting expected sound source voice signal
WO2016119388A1 (en) Method and device for constructing focus covariance matrix on the basis of voice signal
CN110830870B (en) Earphone wearer voice activity detection system based on microphone technology
CN112394324A (en) Microphone array-based remote sound source positioning method and system
CN109997186B (en) Apparatus and method for classifying acoustic environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19937552

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19937552

Country of ref document: EP

Kind code of ref document: A1