WO2021008000A1

WO2021008000A1 - Voice wakeup method and apparatus, electronic device and storage medium

Info

Publication number: WO2021008000A1
Application number: PCT/CN2019/114378
Authority: WO
Inventors: 段相; 张珍斌
Original assignee: 大象声科（深圳）科技有限公司
Priority date: 2019-07-12
Filing date: 2019-10-30
Publication date: 2021-01-21
Also published as: CN110265020A; CN110265020B

Abstract

Disclosed are a voice wakeup method and apparatus, an electronic device and a storage medium. The method comprises: receiving sound signals collected by a microphone (S110); carrying out fixed beamforming on the sound signals to generate a plurality of beam signals (S120); calculating the signal-to-noise ratio of each beam signal (S130); determining a wakeup direction by means of the signal-to-noise ratios (S140); and carrying out a voice wakeup operation according to a sound signal in the wakeup direction (S150). By means of the wakeup method, a system can accurately determine a wakeup direction in a low signal-to-noise ratio environment, thereby effectively improving the accuracy of voice wakeup.

Description

Voice wake-up method, device, electronic equipment and storage medium

Technical field

The present disclosure relates to the technical field of intelligent voice interaction, and in particular to a voice wake-up method, device, electronic equipment, and storage medium.

Background art

With the development of voice technology and advancement in the field of intelligent interaction, the demand for information exchange between humans and machines has become more and more urgent, and human-computer interaction has become a hot spot in current technological development.

Voice, as the most natural way of human interaction, has become one of the most important ways in which people hope to communicate with computers in place of the mouse, keyboard, and touch screen. Voice wake-up technology has become a very important function in human-computer interaction and has received more and more attention. Wake-up rate, false wake-up rate, response time, and power consumption are four general evaluation indicators for voice wake-up technology. As voice wake-up technology develops, users increasingly expect a better experience. Combining traditional front-end voice enhancement technology with wake-up models has become an important way to improve the wake-up rate. At present, multi-microphone enhancement technology is widely used in front-end voice enhancement. With multi-microphone technology, the signal-to-noise ratio of the input voice is significantly improved, so that a better recognition effect can be obtained.

Under low signal-to-noise ratio, the voice wake-up rate is low due to interference and reverberation. In order to improve the wake-up rate, in addition to optimizing the back-end wake-up model, microphone technology can also be used to preprocess the sound signal. The use of multi-microphone technology can make full use of spatial information to enhance speech. The microphone array can solve the acoustic problems of the room, such as sound source location, tracking, noise cancellation, speech enhancement, signal source separation, and reverberation cancellation.

However, in the case of low signal-to-noise ratio, it is very challenging to accurately estimate the direction of arrival to enhance the specific direction. Inaccurate estimation will have a greater impact on the recognition rate.

Summary of the invention

In order to solve the technical problem of low accuracy of voice wake-up in related technologies, the present invention provides a voice wake-up method, device, electronic equipment, and storage medium.

In the first aspect, a voice wake-up method is provided, including:

Receiving the sound signal collected by the microphone;

Performing fixed beamforming on the sound signal to generate multiple beam signals;

Calculating the signal-to-noise ratio of each beam signal;

Determining the wake-up direction according to the signal-to-noise ratio;

Performing a voice wake-up operation according to the sound signal in the wake-up direction.

Optionally, the step of calculating the signal-to-noise ratio of each beam signal includes:

Calculating point source signal energy and background noise energy at each frequency point in the beam signal, the beam signal includes a target source signal, an interference source signal, and background noise, and the point source signal energy includes the target source signal energy and the interference source signal energy;

Calculating the signal-to-noise ratio of each frequency point in the beam signal by using the ratio of the energy of the point source signal at each frequency point of the beam signal to the background noise energy;

Before the step of calculating the signal-to-noise ratio of each frequency point in the beam signal, the method further includes:

The point source signal energy and background noise energy of each frequency point in the beam signal are smoothed by a smoothing factor.

Optionally, the step of determining the wake-up direction through the signal-to-noise ratio includes:

Determine the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames;

Eliminate the beam signal in the interference direction from all beam signals to obtain candidate beam signals;

According to the signal-to-noise ratio of each candidate beam signal, determine the beam signal direction with the largest signal-to-noise ratio;

Count, within the preset number of frames, how often each beam signal direction has the maximum signal-to-noise ratio, determine the direction with the largest count as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.

Optionally, the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames includes:

Calculate the maximum value of the signal-to-noise ratio of all beam signals in the current frame, and compare the maximum value with the preset signal-to-noise ratio threshold;

When the maximum value of the signal-to-noise ratio of all beam signals is less than the preset signal-to-noise ratio threshold, record, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio, and record zero in the other directions. If the maximum value of the signal-to-noise ratio of all beam signals is greater than the preset signal-to-noise ratio threshold, zero is recorded in all beam signal directions.

Count the sum of the difference values recorded in each beam signal direction within the preset number of frames; if the largest sum is greater than zero, determine the beam signal direction with the largest sum as the interference direction.

Optionally, the step of determining the optimal beam signal direction of each frame according to the signal-to-noise ratio of each candidate beam signal includes:

In order to ensure the output stability when the optimal beam signal direction is the signal source direction, signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal, and the beam signal direction that satisfies the condition is set as the optimal beam signal direction.

Optionally, detecting the signal source direction according to the signal-to-noise ratio of each candidate beam signal, the steps include:

Sort the signal-to-noise ratios of the candidate beam signals;

If, within the preset number of consecutive frames, the maximum signal-to-noise ratio among the candidate beam signals exceeds a certain threshold, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio reaches the preset difference threshold, and the beam signal direction of the maximum signal-to-noise ratio remains the same, then the beam signal direction where the maximum signal-to-noise ratio is located is set as the optimal beam signal direction for a certain preset number of frames.

Optionally, after the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames, the step of determining the wake-up direction through the signal-to-noise ratio further includes:

It is determined whether the energy of the target source signal exceeds the energy of the interference source signal by a certain threshold, and if so, the interference direction is determined as the wake-up direction.

In a second aspect, a voice wake-up device is provided, including:

The sound signal receiving module is used to receive the sound signal collected by the microphone;

A fixed beamforming module, configured to perform fixed beamforming on the sound signal to generate multiple beam signals in different directions;

The signal-to-noise ratio calculation module is used to calculate the signal-to-noise ratio of each beam signal;

A wake-up direction determination module, configured to determine the wake-up direction based on the signal-to-noise ratio;

The voice wake-up operation module is used to perform voice wake-up operations according to the sound signal in the wake-up direction.

In a third aspect, an electronic device is provided, including:

At least one processor; and

A memory communicatively connected with the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method according to the first aspect.

In a fourth aspect, a computer-readable storage medium is provided for storing a program that, when executed, causes an electronic device to perform the method described in the first aspect.

The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:

After the sound signal collected by the microphone is received, fixed beamforming is performed on the sound signal, the signal-to-noise ratio of each beam signal is calculated, and the wake-up direction is determined from the signal-to-noise ratio in order to perform the voice wake-up operation. As a result, the system can accurately determine the wake-up direction even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.

It should be understood that the above general description and the following detailed description are only exemplary and cannot limit the scope of the present invention.

Description of the drawings

The drawings herein are incorporated into the specification and constitute a part of the specification, show embodiments in accordance with the present invention, and are used together with the specification to explain the principle of the present invention.

Fig. 1 is a flowchart showing a voice wake-up method according to an exemplary embodiment.

Fig. 2 is a beam signal direction diagram according to the embodiment corresponding to Fig. 1.

FIG. 3 is a specific implementation flowchart of step S130 in the voice wake-up method in the embodiment corresponding to FIG. 1.

Fig. 4 is a schematic diagram of a microphone array according to an exemplary embodiment.

Fig. 5 is a specific implementation flow chart of step S140 in the voice wake-up method in the embodiment corresponding to Fig. 1.

FIG. 6 is a specific implementation flowchart of step S141 shown in the embodiment corresponding to FIG. 5.

FIG. 7 is a specific implementation flowchart of step S143 shown in the embodiment corresponding to FIG. 5.

FIG. 8 is another specific implementation flowchart of step S140 in the voice wake-up method in the embodiment corresponding to FIG. 5.

Fig. 9 is a block diagram showing a voice wake-up device according to an exemplary embodiment.

FIG. 10 is a block diagram of the signal-to-noise ratio calculation module 130 in the voice wake-up device according to the embodiment corresponding to FIG. 9.

FIG. 11 is a block diagram of the wake-up direction determining module 140 in the voice wake-up device according to the embodiment corresponding to FIG. 9.

Detailed description

Here, an exemplary embodiment will be described in detail, and examples thereof are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements. The implementation manners described in the following exemplary embodiments do not represent all implementation manners consistent with the present invention. On the contrary, they are merely examples of devices and methods consistent with the present invention as detailed in the appended claims.

Fig. 1 is a flowchart showing a voice wake-up method according to an exemplary embodiment. The voice wake-up method can be used in electronic devices such as smart phones and computers. As shown in Fig. 1, the voice wake-up method may include step S110, step S120, step S130, step S140, and step S150.

Step S110: Receive the sound signal collected by the microphone.

When the electronic device is awakened by voice, the electronic device will collect the sound signal through the microphone.

However, the sound signal collected by the microphone not only includes the voice signal used for voice wake-up, but also contains interference noise.

Therefore, the voice awakening rate is improved through voice front-end enhancement technology.

Optionally, the voice signal can be collected through a microphone array. If the number of microphones is M, the collected microphone signal is:

X(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T

where n represents time and T represents the transpose.

The collected speech signal is transformed by the short-time Fourier transform into the frequency-domain signal X(w) = [x_1(w), x_2(w), \ldots, x_M(w)]^T, where w represents the frequency point.
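As a minimal sketch of this transform, the following pure-Python code splits each microphone channel into frames, applies a Hann window, and computes a naive DFT per frame. The frame length, hop size, window choice, and the function names `stft_frame` and `multichannel_stft` are illustrative assumptions and are not taken from the disclosure.

```python
import cmath
import math

def stft_frame(frame):
    """Return the one-sided DFT of one Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    # The Hann window limits spectral leakage between frequency points.
    windowed = [frame[k] * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / n)) for k in range(n)]
    return [
        sum(windowed[k] * cmath.exp(-2j * math.pi * w * k / n) for k in range(n))
        for w in range(n // 2 + 1)
    ]

def multichannel_stft(channels, frame_len=64, hop=32):
    """channels: M lists of samples. Returns X[frame][w] = [x_1(w), ..., x_M(w)]."""
    num_frames = 1 + (len(channels[0]) - frame_len) // hop
    spectra = []
    for t in range(num_frames):
        per_channel = [stft_frame(ch[t * hop : t * hop + frame_len]) for ch in channels]
        # Regroup so that each frequency point holds one value per microphone.
        spectra.append([[spec[w] for spec in per_channel] for w in range(frame_len // 2 + 1)])
    return spectra

# Two-microphone toy input: a sinusoid centred on frequency point 8 of a 64-point frame.
signal = [math.sin(2.0 * math.pi * 8 * k / 64) for k in range(128)]
X = multichannel_stft([signal, signal])
```

In practice an FFT routine would replace the naive DFT; the nested-list layout is chosen only to mirror the vector X(w) in the text.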

Step S120: Perform fixed beamforming on the voice signal to generate multiple beam signals.

There are many methods for performing fixed beamforming on voice signals. For example, a delay-and-sum beamforming method or a filter-and-sum beamforming method can be used.

In an exemplary embodiment, the number of beams is BN (BN≥M), and the directions of the beams are fixed and uniformly distributed (0°-180° for a linear array, 0°-360° for a circular array). The coefficients of the beamformer can be obtained by the delay-and-sum technique or the filter-and-sum technique, and different beamforming methods can also be used for different frequency bands. Delay-and-sum beamforming can obtain a higher white noise gain; among filter-and-sum beamforming methods, the differential array is widely used due to its smaller size and better frequency-invariance characteristics.

In view of the broadband characteristics of speech, this embodiment designs a broadband beamformer whose gain is independent of the frequency band. Figure 2 is a beam signal pattern shown in this embodiment. The array is a circular array, the number of microphones is 4, and the radius is 0.035m. Figure 2(a) is a three-dimensional schematic diagram of gain over frequency and angle, and Figure 2(b) is a polar plot of the beam. It can be seen from Fig. 2 that the beam sidelobe attenuation of this embodiment is about 25dB. After beam enhancement, BN beam outputs are obtained:

Y(w) = [y_1(w), y_2(w), \ldots, y_{BN}(w)]^T, where w represents the frequency point.
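The simplest of the fixed beamformers mentioned above, delay-and-sum, can be sketched as follows for one frequency point of a uniform circular array. This is not the frequency-invariant design of the embodiment. The far-field model, the speed of sound, the microphone placement at angles 2πm/M, and the names `steering_vector` and `delay_and_sum` are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(freq_hz, azimuth_deg, num_mics=4, radius=0.035):
    """Far-field steering vector of a uniform circular array (mic m at angle 2*pi*m/M)."""
    theta = math.radians(azimuth_deg)
    vec = []
    for m in range(num_mics):
        phi = 2.0 * math.pi * m / num_mics
        # Arrival-time advance of microphone m relative to the array centre.
        tau = radius * math.cos(theta - phi) / SPEED_OF_SOUND
        vec.append(cmath.exp(2j * math.pi * freq_hz * tau))
    return vec

def delay_and_sum(x_w, freq_hz, beam_azimuths_deg):
    """Return Y(w) = [y_1(w), ..., y_BN(w)] for one frequency point."""
    outputs = []
    for az in beam_azimuths_deg:
        d = steering_vector(freq_hz, az, num_mics=len(x_w))
        # Phase-align every microphone towards the beam direction, then average.
        outputs.append(sum(dm.conjugate() * xm for dm, xm in zip(d, x_w)) / len(x_w))
    return outputs

beams = [0.0, 90.0, 180.0, 270.0]      # BN = 4 fixed, uniformly spaced directions
x = steering_vector(4000.0, 90.0)      # unit plane wave arriving from 90 degrees
Y = delay_and_sum(x, 4000.0, beams)
gains = [abs(y) for y in Y]            # the beam steered to 90 degrees has unit gain
```

With such a small array the other beams are attenuated only modestly at 4 kHz, which is one reason a filter-and-sum or differential design may be preferred in practice.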

Step S130: Calculate the signal-to-noise ratio of each beam signal.

The signal-to-noise ratio is the ratio between the energy in the sound signal and the noise spectrum.

Optionally, as shown in FIG. 3, step S130 may include step S131, step S132, and step S133.

Step S131: For each beam signal, calculate the energy and noise spectrum of each frequency point in the beam signal.

Step S132: Calculate the signal-to-noise ratio of each frequency point in the beam signal based on the energy and noise spectrum of each frequency point in the beam signal.

Since the sound signal is composed of signals of different frequency points, the corresponding beam signals also include signals of different frequency points. Therefore, in order to improve the voice awakening rate, the energy and noise spectrum of each frequency point in the beam signal must be calculated.
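A minimal sketch of steps S131 and S132 is shown below, assuming that the energy of a frequency point is the squared magnitude of the beam output and that the signal-to-noise ratio is expressed in dB. The noise spectrum is taken as given here; the function names and the numerical floor are hypothetical.

```python
import math

def bin_energy(beam_spectrum):
    """Energy of each frequency point: squared magnitude of the complex beam output."""
    return [abs(y) ** 2 for y in beam_spectrum]

def bin_snr_db(energy, noise, floor=1e-12):
    """Per-frequency-point SNR in dB; the floor avoids division by zero and log(0)."""
    return [10.0 * math.log10(max(e, floor) / max(n, floor)) for e, n in zip(energy, noise)]

spectrum = [3 + 4j, 1 + 0j, 0.1j]   # toy beam output y_i(w) at three frequency points
noise = [0.25, 1.0, 0.1]            # toy noise spectrum at the same frequency points
snr = bin_snr_db(bin_energy(spectrum), noise)
```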

Optionally, in order to further improve the voice awakening rate, after the interference direction is determined, a corresponding smoothing factor is selected to smooth the energy of each frequency point in the beam signal through the position relationship between each beam signal and the interference direction.

Specifically, when calculating the energy of the beam signal, the calculated energy needs to be smoothed with a smoothing factor weight_i(w) (i=1, 2, ..., BN). The magnitude of the smoothing factor weight_i(w) is related to the interference direction: the closer the direction of a beam signal is to the interference direction, the larger the smoothing factor set for that beam signal. For example, when the interference lies in the direction of point 3 in the figure, the smoothing factor weight_2(w) of beam signal 2 and the smoothing factor weight_4(w) of beam signal 4 are larger than the smoothing factor weight_1(w) of beam signal 1. For example, in Fig. 3, weight_2(w)=weight_4(w)≈0.8, and weight_1(w)=0.6.
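The disclosure does not state the smoothing formula. One common choice, assumed here, is first-order recursive smoothing in which the smoothing factor weights the previous estimate. The sketch uses the example factors quoted above; the helper names `smooth_energy` and `track` are hypothetical.

```python
def smooth_energy(previous, current, weight):
    """Assumed form: E_s(t) = weight * E_s(t-1) + (1 - weight) * E(t), per frequency point."""
    return [weight * p + (1.0 - weight) * c for p, c in zip(previous, current)]

# Example factors from the text: beams 2 and 4 are adjacent to the interference.
WEIGHTS = {1: 0.6, 2: 0.8, 4: 0.8}

def track(frames, weight):
    """Run the recursion over a sequence of per-frame energy vectors."""
    smoothed = list(frames[0])
    for frame in frames[1:]:
        smoothed = smooth_energy(smoothed, frame, weight)
    return smoothed

frames = [[1.0], [1.0], [9.0]]      # a sudden energy jump in the last frame
near = track(frames, WEIGHTS[2])    # beam next to the interference: about 2.6
far = track(frames, WEIGHTS[1])     # beam away from the interference: about 4.2
```

Under this assumed form, a larger factor makes the energy estimate of beams near the interference react more slowly to sudden changes.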

When estimating the noise spectrum for each frequency point, single-channel noise spectrum estimation methods such as MCRA, IMCRA, MARTIN, DOBLINGER, and HIRSCH can be used, and other noise spectrum estimation methods can also be used. The specific noise spectrum estimation methods are not described one by one here.
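The named estimators are too long to reproduce here. As a deliberately simplified stand-in (not MCRA, IMCRA, or any of the other named methods), the sketch below tracks the minimum of a recursively smoothed power spectrum over a sliding window, which is the basic idea behind minimum-tracking estimators. The class name, window length, and smoothing constant are assumptions.

```python
from collections import deque

class MinimumNoiseTracker:
    """Simplified minimum-tracking noise estimate per frequency point."""

    def __init__(self, window=50, alpha=0.9):
        self.alpha = alpha                    # smoothing constant for the power spectrum
        self.smoothed = None                  # recursively smoothed power, per frequency point
        self.history = deque(maxlen=window)   # recent smoothed spectra

    def update(self, power):
        if self.smoothed is None:
            self.smoothed = list(power)
        else:
            self.smoothed = [
                self.alpha * s + (1.0 - self.alpha) * p
                for s, p in zip(self.smoothed, power)
            ]
        self.history.append(self.smoothed)
        # The noise floor is taken as the minimum over the recent window.
        return [min(frame[w] for frame in self.history) for w in range(len(power))]

tracker = MinimumNoiseTracker()
for _ in range(10):
    estimate = tracker.update([1.0, 2.0])     # stationary background noise
for _ in range(5):
    estimate = tracker.update([100.0, 2.0])   # short speech burst in the first frequency point
```

The estimate stays near the background level during the burst because the window still contains noise-only frames.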

Step S133: Represent the signal-to-noise ratio of the beam signal by the average value of the signal-to-noise ratio of the beam signal in the preset frequency band.

In the embodiment of the present invention, the preset frequency band range is 0-2 kHz. After calculating the signal-to-noise ratio of each frequency point in the beam signal, by calculating the average value of the signal-to-noise ratio in the frequency band of 0-2kHz, the signal-to-noise ratio SNR _i (i=1, 2...BN) of the beam signal is obtained.
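A sketch of step S133 under stated assumptions: the sample rate, the FFT size, and whether the average is taken over dB or linear values are not given in the text, so the defaults below and the name `beam_snr` are illustrative only.

```python
def beam_snr(bin_snrs, sample_rate=16000, fft_size=512, band_hz=(0.0, 2000.0)):
    """Average the per-frequency-point SNR over the preset band (0-2 kHz by default)."""
    lo, hi = band_hz
    # Frequency point w corresponds to w * sample_rate / fft_size Hz.
    selected = [s for w, s in enumerate(bin_snrs) if lo <= w * sample_rate / fft_size <= hi]
    return sum(selected) / len(selected)

# Toy case: 8-point FFT at 8 kHz gives 1 kHz spacing, so points 0, 1, 2 fall in the band.
snr_i = beam_snr([3.0, 6.0, 9.0, 30.0, 30.0], sample_rate=8000, fft_size=8)
```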

Step S140: Determine the wake-up direction according to the signal-to-noise ratio.

The wake-up direction is the voice wake-up direction confirmed by the present invention.

The present invention calculates the signal-to-noise ratio of each beam signal, and then determines the wake-up direction from multiple beam signals according to the signal-to-noise ratio of each beam signal, and then uses the beam signal in the wake-up direction to perform voice wake-up operations on the electronic device.

When selecting the optimal beam signal direction from all beam signals, the beam signal direction with the largest average signal-to-noise ratio over a certain period of time may be determined as the optimal beam signal direction, or the beam signal direction that has the largest signal-to-noise ratio in the greatest number of frames over a certain period of time may be determined as the optimal beam signal direction. The optimal beam signal direction may also be determined by other methods, which are not described here.

Optionally, as shown in FIG. 5, step S140 may include steps S141, S142, S143, and S144.

Step S141: Determine the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames.

The interference direction is the direction of the noise source that interferes with the voice signal relative to the electronic device. As shown in Figure 2, 3 is the interference source, the electronic device is located at the center of the circle, and the direction of the interference source 3 relative to the center of the circle is the interference direction.

Since the interference source has a considerable impact on the voice signal during voice wake-up, the sound signal generated by the interference source affects each beam signal. Therefore, the interference direction is determined in advance, and each beam signal is then smoothed according to the positional relationship between its direction and the interference source, thereby effectively reducing the influence of the interference source on voice wake-up and improving the accuracy of voice wake-up.

There are many ways to determine the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames. The direction of the beam signal whose signal-to-noise ratio reaches the preset signal-to-noise ratio threshold in the greatest number of frames within the preset number of frames may be determined as the interference direction, or the direction of the beam signal with the largest average signal-to-noise ratio within the preset number of frames may be determined as the interference direction. The interference direction may also be determined by other methods, which are not described here.

Optionally, as shown in FIG. 6, step S141 may include steps S1411, S1412, and S1413.

Step S1411: Calculate the maximum value of the signal-to-noise ratio of all beam signals in the current frame, and compare the maximum value with a preset signal-to-noise ratio threshold.

Step S1412: When the maximum value of the signal-to-noise ratio of all beam signals is less than the preset signal-to-noise ratio threshold, record, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio, and record zero in the other directions. If the maximum value of the signal-to-noise ratio of all beam signals is greater than the preset signal-to-noise ratio threshold, zero is recorded in all beam signal directions.

Step S1413: Count the sum of the difference values recorded in each beam signal direction within the preset number of frames. If the sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction.

Specifically, the preset number of frames is T1, and preferably, T1≥2000 frames. For the signal-to-noise ratios SNR_i (i=1, 2, ..., BN) of the beam signals, find the maximum signal-to-noise ratio MAXSNR over all beam signal directions and set a threshold ε. If MAXSNR<ε, the current frame is considered a silent segment; the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio is recorded in the direction of the beam signal with the maximum signal-to-noise ratio, and zero is recorded in the other directions. If MAXSNR>ε, zero is recorded in all beam signal directions. Finally, the sum of the difference values recorded in each beam signal direction within the preset number of frames T1 is counted. If the largest sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction. In practical applications, the values of T1 and ε are selected according to the specific scenario so as to improve the accuracy of the interference direction judgment. In the embodiment of the present invention, T1>2000 frames, ε=10dB.
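Steps S1411 to S1413 can be sketched as follows. The function name and the data layout (one list of per-beam signal-to-noise ratios in dB per frame) are assumptions, and a frame whose maximum equals ε exactly is treated here as non-silent, a case the text leaves open.

```python
def find_interference_direction(snr_frames, epsilon=10.0):
    """snr_frames: one list [SNR_1, ..., SNR_BN] per frame. Returns a beam index or None."""
    num_beams = len(snr_frames[0])
    totals = [0.0] * num_beams
    for snrs in snr_frames:
        order = sorted(range(num_beams), key=lambda i: snrs[i], reverse=True)
        best, second = order[0], order[1]
        if snrs[best] < epsilon:
            # Silent segment: credit the strongest beam with its margin over the runner-up.
            totals[best] += snrs[best] - snrs[second]
        # Otherwise every direction records zero for this frame.
    top = max(range(num_beams), key=lambda i: totals[i])
    return top if totals[top] > 0 else None

# Three silent frames dominated by beam 1, then one speech frame dominated by beam 0.
frames = [[2.0, 6.0, 1.0, 3.0]] * 3 + [[15.0, 4.0, 2.0, 1.0]]
interference = find_interference_direction(frames)
no_interference = find_interference_direction([[15.0, 4.0, 2.0, 1.0]])
```

Accumulating only during silent segments means that a direction which stays strongest while nobody is speaking is attributed to a persistent interferer rather than to the target speaker.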

Step S142: Remove the beam signal corresponding to the interference direction from all beam signals to obtain candidate beam signals.

The candidate beam signal is a beam signal set after removing the beam signal corresponding to the interference direction from all beam signals.

Generally, the interference direction determined by the technical solution of the present invention is not the optimal beam signal direction. Therefore, when the optimal beam signal direction is determined, the beam signal corresponding to the interference direction is first removed from all beam signals, and the optimal beam signal direction is then determined according to the signal-to-noise ratios of the candidate beam signals, which improves the accuracy of determining the optimal beam signal direction.

Step S143: Determine the candidate beam signal with the largest signal-to-noise ratio in each frame according to the signal-to-noise ratio of each candidate beam signal.

In order to further ensure the output stability when the optimal beam signal direction is the signal source direction, the signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal.

Specifically, as shown in FIG. 7, step S143 may further include step S1431, step S1432, step S1433, and step S1434:

Step S1431: Sort the signal-to-noise ratios of the candidate beam signals.

Step S1432: Judge whether, within the preset number of consecutive frames, the maximum signal-to-noise ratio of the candidate beam signals exceeds the threshold and the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio reaches the preset difference threshold. If yes (Y), step S1433 is executed; if no, no processing is performed.

Step S1433: Judge whether the beam signal direction of the maximum signal-to-noise ratio remains the same within the preset number of consecutive frames. If yes (Y), step S1434 is executed; if no, no processing is performed.

Step S1434: Set the beam signal direction where the maximum SNR is located as the optimal beam signal direction within a certain preset number of frames.

Specifically, the signal-to-noise ratios of the candidate beam signals (all beam signals except the one in the interference direction) are sorted, and the beam signal with the maximum signal-to-noise ratio is selected. If the same beam signal direction has the maximum signal-to-noise ratio in N consecutive frames, its signal-to-noise ratio satisfies MAXSNR>δ (a preset threshold) in those N consecutive frames, and MAXSNR exceeds the second-largest signal-to-noise ratio SECSNR by more than a threshold μ, the direction of the MAXSNR beam signal is taken as the optimal beam signal direction. In the subsequent process of calculating the optimal beam signal direction, the optimal beam signal direction is held at the MAXSNR direction for a certain time range T3. The size of T3 depends on the wake word. Preferably, N=3, δ=5, μ=3, and T3=65.
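The detection in steps S1431 to S1434 can be sketched as a small state machine. The class name, the dictionary input, the assumption that δ and μ are in dB and T3 is counted in frames, and the choice to count the detection frame as part of the hold period are all illustrative and not stated in the text.

```python
class SourceDirectionLock:
    """Lock onto a direction that dominates for N consecutive frames, then hold it."""

    def __init__(self, n_frames=3, delta=5.0, mu=3.0, hold_frames=65):
        self.n_frames, self.delta, self.mu = n_frames, delta, mu
        self.hold_frames = hold_frames      # T3 in the text
        self.run_direction = None           # direction of the current dominant run
        self.run_length = 0
        self.locked_direction = None
        self.hold_remaining = 0

    def update(self, candidate_snrs):
        """candidate_snrs: {beam_index: SNR} without the interference direction."""
        ranked = sorted(candidate_snrs, key=candidate_snrs.get, reverse=True)
        best, second = ranked[0], ranked[1]
        strong = (candidate_snrs[best] > self.delta
                  and candidate_snrs[best] - candidate_snrs[second] > self.mu)
        if strong and best == self.run_direction:
            self.run_length += 1
        elif strong:
            self.run_direction, self.run_length = best, 1
        else:
            self.run_direction, self.run_length = None, 0
        if self.run_length >= self.n_frames:
            # Conditions met: (re)start the hold period on this direction.
            self.locked_direction, self.hold_remaining = best, self.hold_frames
        elif self.hold_remaining > 0:
            self.hold_remaining -= 1
            if self.hold_remaining == 0:
                self.locked_direction = None
        return self.locked_direction

lock = SourceDirectionLock(hold_frames=2)
dominant = {0: 8.0, 1: 2.0, 3: 1.0}
weak = {0: 1.0, 1: 2.0, 3: 1.5}
results = [lock.update(f) for f in (dominant, dominant, dominant, weak, weak)]
```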

Step S144: Count the candidate beam signals with the largest SNR in the preset number of frames, determine the direction where the candidate beam signals are located as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.

In each frame, the candidate beam signal with the largest signal-to-noise ratio is identified according to the signal-to-noise ratio of each candidate beam signal. The candidate beam signal that has the largest signal-to-noise ratio most often within the preset number of frames is then found, and its direction is determined as the optimal beam signal direction.

In an exemplary embodiment, the preset number of frames is T2. In each frame, among the BN beams excluding the beam signal in the detected interference direction, if the maximum SNR is less than the threshold th, the optimal beam signal direction of the current frame remains the same as that of the previous frame. If the maximum SNR is greater than the threshold th, the beam signal with the maximum SNR is recorded as 1, and the beam signal in the interference direction and the other beam signals with smaller SNR are recorded as 0. The beam signal direction that is recorded as having the maximum SNR most often within the T2 frames is determined as the optimal beam signal direction of the current frame. Preferably, 20≤T2≤100 and th=10.
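The vote described above can be sketched with a sliding window of per-frame winners. The class name is hypothetical, and the sketch assumes that frames below the threshold th add no vote and do not advance the window, which the text does not specify.

```python
from collections import Counter, deque

class OptimalBeamVoter:
    """Majority vote over the last T2 frames whose maximum SNR exceeded th."""

    def __init__(self, window=50, threshold=10.0):
        self.votes = deque(maxlen=window)   # per-frame winners, at most T2 entries
        self.threshold = threshold          # th in the text
        self.current = None

    def update(self, snrs, interference=None):
        candidates = [i for i in range(len(snrs)) if i != interference]
        best = max(candidates, key=lambda i: snrs[i])
        if snrs[best] > self.threshold:
            # Record 1 for the winning beam (0 for all others) and recount.
            self.votes.append(best)
            self.current = Counter(self.votes).most_common(1)[0][0]
        # Below the threshold the previous frame's direction is kept.
        return self.current

voter = OptimalBeamVoter(window=5)
history = [
    voter.update([12.0, 3.0, 20.0, 1.0], interference=2),   # beam 0 wins
    voter.update([1.0, 13.0, 25.0, 2.0], interference=2),   # beam 1 wins (tie 1:1)
    voter.update([1.0, 14.0, 25.0, 2.0], interference=2),   # beam 1 now leads 2:1
    voter.update([1.0, 2.0, 25.0, 3.0], interference=2),    # below th: keep previous
]
```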

Optionally, the accuracy of the optimal beam signal direction selection can be further improved in an environment where the energy of the signal source is much greater than the energy of the interference source. If such a situation exists while the interference direction is being determined (for example, the energy of the signal source exceeds the energy of the interference source by 20dB), the wake-up direction will be identified as the interference direction. In this case, the statistically determined interference direction and the optimal beam signal direction determined according to the aforementioned method are both used as wake-up directions, that is, two wake-ups are performed. If either of them exceeds the threshold, the device is determined to be in the awake state.

Specifically, as shown in FIG. 8, step S140 may further include step S146 and step S147:

Step S146: Judge whether the energy of the signal source exceeds the energy of the interference source by a certain threshold. If yes (Y), proceed to step S147; if no, proceed to step S142.

Step S147: Determine the interference direction as the wake-up direction.

It should be noted that, after the interference direction is selected, the interference direction can be determined as a wake-up direction while steps S142, S143, and S144 are performed at the same time to determine the optimal beam signal direction, so that wake-up is carried out in two beam signal directions: the interference direction and the optimal beam signal direction. Alternatively, when the signal source energy is determined to exceed the interference source energy by a certain threshold, the interference direction can be determined as the wake-up direction without performing steps S142, S143, and S144, and the voice wake-up operation is performed directly according to the voice signal in the interference direction, which improves the efficiency of the voice wake-up operation.
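The two variants described above can be sketched as follows. The function names, the use of dB energies, the callable `find_optimal_dir` standing in for steps S142 to S144, and the wake-word scores passed to `is_awake` are hypothetical; the disclosure does not define the wake-up model's interface.

```python
def select_wake_directions(source_db, interference_db, interference_dir,
                           find_optimal_dir, margin_db=20.0, dual=True):
    """Return the list of beam directions on which wake-up is attempted."""
    if source_db - interference_db > margin_db:
        if dual:
            # Variant 1: wake on both the interference and the optimal direction.
            return [interference_dir, find_optimal_dir()]
        # Variant 2: skip steps S142-S144 and wake on the interference direction only.
        return [interference_dir]
    # Normal path: steps S142-S144 decide the wake-up direction.
    return [find_optimal_dir()]

def is_awake(wake_scores, wake_threshold):
    """The device is awake if the score of any attempted direction exceeds the threshold."""
    return any(score > wake_threshold for score in wake_scores)

both = select_wake_directions(30.0, 5.0, 2, lambda: 0)
fast = select_wake_directions(30.0, 5.0, 2, lambda: 0, dual=False)
normal = select_wake_directions(10.0, 5.0, 2, lambda: 0)
awake = is_awake([0.3, 0.9], wake_threshold=0.5)
```

Passing the optimal-direction search as a callable keeps the second variant from running steps S142 to S144 at all, which is where the efficiency gain mentioned in the text comes from.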

Step S150: Perform a voice wake-up operation according to the voice signal in the wake-up direction.

Using the above method, after the sound signal collected by the microphone is received, fixed beamforming is performed on the sound signal, the signal-to-noise ratio of each beam signal is calculated, and the wake-up direction is determined from the signal-to-noise ratio so that the voice wake-up operation is performed according to the sound signal in the wake-up direction. This enables the system to accurately determine the wake-up direction even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.

To illustrate the effect of the voice front-end enhancement method of the present invention on the wake-up rate, experimental results are given below. The experiment is carried out in a room of 6×3×3.5m. The microphone array is a 4-microphone circular array with a radius of 0.035m, located at 3×1.5×1.5m. The interference source is at 2×1.5×1.5m, and the wake-up positions are distributed on a circle 1.2m away from the microphone array. There are two wake-ups every 30 degrees, for a total of 24 wake-ups. Three types of interference are used for testing, namely music, babble, and TV interference, and the test signal-to-noise ratios are -5dB, 0dB, and 5dB, respectively. The test results are shown in Table 1. The third row of the table shows the probability that the optimal beam signal direction is correct. From the results, it can be seen that the probability for babble interference at -5dB is lower, and the probability in the other cases is above 80%. Rows 4 and 5 of Table 1 are the wake-up results of a single microphone and of the present invention, respectively. From the table, it can be seen that the present invention can significantly improve the wake-up rate.

Table 1: Wake-up experiment results

The following are embodiments of the disclosed device, which can be used to implement the above-mentioned voice wake-up method embodiments. For details not disclosed in the embodiments of the device of the present disclosure, please refer to the embodiments of the voice wake-up method of the present disclosure.

Fig. 9 is a block diagram showing a voice wake-up device according to an exemplary embodiment. The device includes but is not limited to: a sound signal receiving module 110, a fixed beam forming module 120, a signal-to-noise ratio calculation module 130, a wake-up direction determination module 140, and a voice wake-up operation module 150.

The sound signal receiving module 110 is used to receive the sound signal collected by the microphone;

The fixed beam forming module 120 is configured to perform fixed beam forming on the sound signal to generate multiple beam signals;

The signal-to-noise ratio calculation module 130 is used to calculate the signal-to-noise ratio of each beam signal;

The wake-up direction determination module 140 is configured to determine the wake-up direction according to the signal-to-noise ratio;

The voice wake-up operation module 150 is configured to perform the voice wake-up operation according to the sound signal in the wake-up direction.

For the implementation process of the functions and roles of each module in the above-mentioned device, see the implementation process of the corresponding steps in the above-mentioned voice wake-up method for details, which will not be repeated here.
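As an illustration of what the fixed beam forming module 120 might do, the following is a minimal sketch that assumes a far-field delay-and-sum beamformer applied to a single frequency bin. This passage does not prescribe delay-and-sum; it is used here only as a common example of fixed beamforming. The function names, the speed-of-sound constant, and the choice of look directions are assumptions made for the example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def circular_array_positions(num_mics, radius):
    """Return (x, y) coordinates of microphones evenly spaced on a circle."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def simulate_plane_wave(positions, source_angle_deg, freq_hz):
    """Unit-amplitude far-field plane wave observed at each microphone (one bin)."""
    theta = math.radians(source_angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    # A microphone closer to the source sees the wavefront earlier (phase advance).
    return [
        cmath.exp(2j * math.pi * freq_hz * (x * ux + y * uy) / SPEED_OF_SOUND)
        for x, y in positions
    ]


def steering_weights(positions, look_angle_deg, freq_hz):
    """Delay-and-sum weights that phase-align a plane wave from the look direction."""
    theta = math.radians(look_angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    count = len(positions)
    return [
        cmath.exp(-2j * math.pi * freq_hz * (x * ux + y * uy) / SPEED_OF_SOUND) / count
        for x, y in positions
    ]


def fixed_beamform(mic_bins, positions, look_angles_deg, freq_hz):
    """Return one complex beam output per fixed look direction for a single bin."""
    beams = []
    for angle in look_angles_deg:
        weights = steering_weights(positions, angle, freq_hz)
        beams.append(sum(w * x for w, x in zip(weights, mic_bins)))
    return beams
```

With a 4-microphone array of radius 0.035 m and a simulated source at 90 degrees, the beam steered to 90 degrees has the largest output magnitude among look directions of 0, 90, 180, and 270 degrees. A real implementation would apply such weights to every frequency bin of every frame.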

Optionally, as shown in FIG. 10, the signal-to-noise ratio calculation module 130 described in FIG. 9 includes, but is not limited to: a signal energy and noise spectrum calculation unit 131, a frequency point signal-to-noise ratio calculation unit 132, and a beam signal signal-to-noise ratio calculation unit 133.

The signal energy and noise spectrum calculation unit 131 is configured to calculate the energy and noise spectrum of each frequency point in the beam signal for each beam signal;

The frequency point signal-to-noise ratio calculation unit 132 is configured to calculate the signal-to-noise ratio of each frequency point in the beam signal from the energy and noise spectrum of that frequency point;

The beam signal signal-to-noise ratio calculation unit 133 is configured to use the average value of the frequency point signal-to-noise ratios of the beam signal within a preset frequency band as the signal-to-noise ratio of the beam signal.

Optionally, the signal-to-noise ratio calculation module 130 described in FIG. 10 further includes, but is not limited to, a smoothing processing unit.

The smoothing processing unit is used for smoothing the energy of each frequency point in the beam signal through a smoothing factor.
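The units above can be sketched as three small functions. This is only an illustrative sketch under stated assumptions: the first-order recursive form of the smoothing, the linear (not dB) ratio, the division floor, and all function and parameter names are assumptions, and the noise spectrum is taken as a given input because its estimation is not described in this passage.

```python
def smooth(previous, current, alpha):
    """Recursive smoothing per frequency point (assumed form).

    Returns alpha * previous + (1 - alpha) * current for every frequency point,
    where alpha is the smoothing factor in [0, 1).
    """
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]


def frequency_point_snr(signal_energy, noise_energy, floor=1e-12):
    """Per-frequency-point SNR as the ratio of signal energy to noise energy.

    The floor (an assumption) avoids division by zero for silent frequency points.
    """
    return [s / max(n, floor) for s, n in zip(signal_energy, noise_energy)]


def beam_snr(per_point_snr, band):
    """Average the per-frequency-point SNR over a preset band.

    band is a (start, stop) pair of frequency point indices, stop exclusive.
    """
    start, stop = band
    selected = per_point_snr[start:stop]
    return sum(selected) / len(selected)
```

For each new frame, the energies would first be passed through `smooth`, then through `frequency_point_snr`, and finally through `beam_snr` to obtain a single value per beam signal.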

Optionally, as shown in FIG. 11, the wake-up direction determination module 140 described in FIG. 9 includes but is not limited to: an interference direction determining unit 141, a removing unit 142, a candidate beam signal determining unit 143, and a wake-up direction determining unit 144.

The interference direction determining unit 141 is configured to determine the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;

The removing unit 142 is configured to remove the beam signal in the interference direction from all the beam signals to obtain candidate beam signals;

The candidate beam signal determining unit 143 is configured to determine the candidate beam signal with the largest signal-to-noise ratio in each frame according to the signal-to-noise ratio of each candidate beam signal;

The wake-up direction determining unit 144 is configured to count how many times each candidate beam signal has the largest signal-to-noise ratio within the preset number of frames, determine the direction of the candidate beam signal with the most occurrences as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
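The removal and counting steps performed by units 142 to 144 can be sketched as follows. This is a minimal illustration that assumes the interference direction has already been determined and is passed in as a beam index; the function name, the data layout (one list of per-beam SNR values per frame), and the tie-breaking behavior are assumptions.

```python
from collections import Counter


def select_wake_direction(snr_frames, interference_beam=None):
    """Pick the beam that most often has the largest SNR among candidate beams.

    snr_frames[t][b] is the SNR of beam b in frame t. The beam with index
    interference_beam, if given, is excluded from the candidates.
    """
    votes = Counter()
    for frame in snr_frames:
        # Remove the beam in the interference direction to obtain the candidates.
        candidates = [b for b in range(len(frame)) if b != interference_beam]
        # The candidate with the largest SNR in this frame receives one vote.
        best = max(candidates, key=lambda b: frame[b])
        votes[best] += 1
    # The beam with the most votes over the preset number of frames wins.
    # On a tie, the beam that received its first vote earliest is returned.
    return votes.most_common(1)[0][0]
```

For example, if beam 1 is the strongest in every frame but is marked as the interference direction, the function returns the beam that is most often strongest among the remaining beams.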

Optionally, the present invention also provides an electronic device that performs all or part of the steps of the voice wake-up method shown in any of the foregoing exemplary embodiments. The electronic device includes:

a processor; and

a memory communicatively connected to the processor; wherein

the memory stores readable instructions which, when executed by the processor, implement the method according to any of the foregoing exemplary embodiments.

The specific manner in which the processor in the terminal performs operations in this embodiment has been described in detail in the embodiment of the voice wake-up method, and will not be elaborated here.

In an exemplary embodiment, a storage medium is also provided. The storage medium is a computer-readable storage medium; for example, it may be a transitory or non-transitory computer-readable storage medium including instructions.

It should be understood that the present invention is not limited to the specific structure described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present invention is only limited by the appended claims.

Claims

A voice wake-up method, characterized in that the method comprises:

receiving a sound signal collected by a microphone;

performing fixed beamforming on the sound signal to generate multiple beam signals;

calculating the signal-to-noise ratio of each beam signal;

determining the wake-up direction according to the signal-to-noise ratio;

performing a voice wake-up operation according to the sound signal in the wake-up direction.
The method according to claim 1, wherein the step of calculating the signal-to-noise ratio of each beam signal comprises:

calculating the point source signal energy and the background noise energy of each frequency point in each of the beam signals, wherein the beam signal includes a target source signal, an interference source signal, and background noise, and the point source signal energy includes the target source signal energy and the interference source signal energy;

calculating the signal-to-noise ratio of each frequency point in the beam signal as the ratio of the point source signal energy to the background noise energy at that frequency point;

using the average value of the frequency point signal-to-noise ratios of the beam signal within a preset frequency band as the signal-to-noise ratio of the beam signal.
The method according to claim 2, wherein before the step of calculating the signal-to-noise ratio of each frequency point in the beam signal, the method further comprises:

smoothing the point source signal energy and the background noise energy of each frequency point in the beam signal by a smoothing factor.
The method according to claim 1, wherein the step of determining the wake-up direction according to the signal-to-noise ratio comprises:

determining the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;

removing the beam signal in the interference direction from all beam signals to obtain candidate beam signals;

determining, in each frame, the beam signal direction with the largest signal-to-noise ratio according to the signal-to-noise ratio of each candidate beam signal;

counting, within the preset number of frames, how many times each beam signal direction has the maximum signal-to-noise ratio, determining the beam signal direction with the largest number of occurrences as the optimal beam signal direction, and using the optimal beam signal direction as the wake-up direction.
The method according to claim 4, wherein the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames comprises:

calculating the maximum value of the signal-to-noise ratios of all beam signals in the current frame, and comparing the maximum value with a preset signal-to-noise ratio threshold;

when the maximum signal-to-noise ratio of all beam signals is less than the preset signal-to-noise ratio threshold, recording, for the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the signal-to-noise ratio in that direction and the second-largest signal-to-noise ratio, and recording a difference of zero for the other directions; when the maximum signal-to-noise ratio of all beam signals is greater than the preset signal-to-noise ratio threshold, recording a difference of zero for all beam signal directions;

summing the differences recorded for each beam signal direction within the preset number of frames, and determining the beam signal direction whose sum is the largest and greater than zero as the interference direction.
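The frame-by-frame bookkeeping in the steps above can be sketched as follows. This is an illustrative sketch only: the function name and data layout are assumptions, and a frame whose maximum signal-to-noise ratio exactly equals the threshold is treated here as recording zero, a case the text leaves open.

```python
def find_interference_direction(snr_frames, snr_threshold):
    """Return the index of the interference beam, or None if none is found.

    snr_frames[t][b] is the SNR of beam b in frame t; at least two beams
    are required so that a second-largest SNR exists.
    """
    num_beams = len(snr_frames[0])
    sums = [0.0] * num_beams
    for frame in snr_frames:
        # Rank beams by SNR, largest first.
        order = sorted(range(num_beams), key=lambda b: frame[b], reverse=True)
        top, second = order[0], order[1]
        if frame[top] < snr_threshold:
            # Record the gap to the second-largest SNR for the top beam only;
            # every other beam implicitly records zero for this frame.
            sums[top] += frame[top] - frame[second]
        # Otherwise every beam records zero for this frame.
    best = max(range(num_beams), key=lambda b: sums[b])
    return best if sums[best] > 0 else None
```

The returned index can then be used to remove the corresponding beam signal before the candidate beam signals are compared.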
The method according to claim 4, wherein the step of determining the optimal beam signal direction of each frame according to the signal-to-noise ratio of each candidate beam signal comprises:

in order to ensure output stability when the optimal beam signal direction is the signal source direction, detecting the signal source direction according to the signal-to-noise ratio of each candidate beam signal, and setting the beam signal direction that satisfies the conditions as the optimal beam signal direction within the preset number of frames.
The method according to claim 6, wherein the step of detecting the signal source direction according to the signal-to-noise ratio of each candidate beam signal comprises:

sorting the signal-to-noise ratios of the candidate beam signals;

if, over a preset number of consecutive frames, the maximum signal-to-noise ratio among the candidate beam signals exceeds a certain threshold, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio reaches a preset difference threshold, and the beam signal direction with the maximum signal-to-noise ratio remains the same, setting the beam signal direction with the maximum signal-to-noise ratio as the optimal beam signal direction for a certain preset number of frames.
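One way to realize the three conditions above is a small stateful detector that is updated once per frame. This is a sketch under assumptions: the class name, the parameter names, and the exact hold behavior (the direction stays fixed for `hold_frames` frames after the conditions were last met) are not taken from the text.

```python
class SourceDirectionDetector:
    """Fix the output direction once a stable, dominant beam is observed."""

    def __init__(self, snr_threshold, diff_threshold, required_frames, hold_frames):
        self.snr_threshold = snr_threshold
        self.diff_threshold = diff_threshold
        self.required_frames = required_frames
        self.hold_frames = hold_frames
        self._streak_beam = None  # beam that currently satisfies the conditions
        self._streak = 0          # consecutive frames it has satisfied them
        self._locked_beam = None
        self._hold_left = 0

    def update(self, candidate_snrs):
        """candidate_snrs maps beam index to SNR (at least two candidates).

        Returns the fixed beam index, or None while no direction is fixed.
        """
        ranked = sorted(candidate_snrs.items(), key=lambda kv: kv[1], reverse=True)
        (top_beam, top_snr), (_, second_snr) = ranked[0], ranked[1]
        qualifies = (top_snr > self.snr_threshold
                     and top_snr - second_snr >= self.diff_threshold)
        if qualifies and top_beam == self._streak_beam:
            self._streak += 1
        elif qualifies:
            self._streak_beam, self._streak = top_beam, 1
        else:
            self._streak_beam, self._streak = None, 0

        if self._streak >= self.required_frames:
            # Conditions met: fix this direction and restart the hold period.
            self._locked_beam, self._hold_left = top_beam, self.hold_frames
        elif self._hold_left > 0:
            self._hold_left -= 1
        else:
            self._locked_beam = None
        return self._locked_beam
```

While `update` returns None, the optimal beam signal direction would be chosen by the ordinary per-frame comparison; once it returns a beam index, that direction is used instead.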
The method according to claim 4, wherein after the step of determining the interference direction according to the signal-to-noise ratio of each beam signal in the preset number of frames, the step of determining the wake-up direction by the signal-to-noise ratio further comprises:

determining whether the energy of the target source signal exceeds a certain threshold of the energy of the interference source signal, and if so, determining the interference direction as the wake-up direction.
A voice wake-up device, characterized in that the device includes:

a sound signal receiving module, configured to receive a sound signal collected by a microphone;

a fixed beamforming module, configured to perform fixed beamforming on the sound signal to generate multiple beam signals in different directions;

a signal-to-noise ratio calculation module, configured to calculate the signal-to-noise ratio of each beam signal;

a wake-up direction determination module, configured to determine the wake-up direction according to the signal-to-noise ratio;

a voice wake-up operation module, configured to perform a voice wake-up operation according to the sound signal in the wake-up direction.
An electronic device, characterized in that, the electronic device includes:

At least one processor; and

A memory communicatively connected with the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method according to any one of claims 1-8.
A computer-readable storage medium for storing a program, characterized in that, when the program is executed, an electronic device executes the method according to any one of claims 1-8.