CN113223554A

CN113223554A - Wind noise detection method, device, equipment and storage medium

Info

Publication number: CN113223554A
Application number: CN202110276288.5A
Authority: CN
Inventors: 李伟南; 陈超
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-03-15
Filing date: 2021-03-15
Publication date: 2021-08-06

Abstract

The application discloses a wind noise detection method, apparatus, device and storage medium, relating to the field of computer technology and in particular to speech technology. The specific implementation scheme is as follows: acquire audio signal sequences respectively collected by at least two microphones; calculate cross-correlation data of the at least two audio signal sequences, and calculate a wind noise detection value according to the cross-correlation data; adjust the detection area range of a wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value; and determine a wind noise detection result according to the relation between the cross-correlation data and the detection area range. The method and apparatus reduce false detections and missed detections during wind noise detection and improve its accuracy and robustness.

Description

Wind noise detection method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of data processing technologies, in particular to audio processing and speech technology, and more specifically to a wind noise detection method, apparatus, device, and storage medium.

Background

With the continuous development of science and technology across many areas of life and work, personal portable devices have gradually become widespread, and many voice devices intended for outdoor use have emerged. When real-time voice communication, recording, or interaction with an intelligent voice assistant takes place outdoors, wind noise interference is unavoidable. Wind noise severely degrades real-time speech quality and speech intelligibility, and its impact on speech recognition can be catastrophic.

To reduce the effect of wind noise on speech, wind noise removal algorithms are introduced, which alleviate this effect to a certain extent. Wind noise detection is an important component of wind noise removal and plays a crucial role in the final performance of a wind noise removal algorithm.

In the prior art, signals collected by two microphones are generally used to detect wind noise. However, because the conditions under which wind noise arises are complex, false detections and missed detections occur easily, resulting in low detection accuracy and poor robustness.

Disclosure of Invention

The application provides a wind noise detection method, a wind noise detection device, wind noise detection equipment and a storage medium, so as to optimize a wind noise detection scheme.

According to an aspect of the present application, there is provided a wind noise detection method, the method including:

acquiring audio signal sequences respectively acquired by at least two microphones;

calculating cross-correlation data of at least two audio signal sequences, and calculating a wind noise detection value according to the cross-correlation data;

adjusting the detection area range of a wind noise detection threshold value according to the wind noise existence possibility represented by the wind noise detection value;

and determining a wind noise detection result according to the relation between the cross-correlation data and the detection area range.

According to another aspect of the present application, there is provided a wind noise detection apparatus, the apparatus including:

an audio signal sequence acquisition module, configured to acquire audio signal sequences respectively collected by at least two microphones;

a wind noise detection value calculation module, configured to calculate cross-correlation data of the at least two audio signal sequences and to calculate a wind noise detection value according to the cross-correlation data;

a detection area range adjustment module, configured to adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value;

and a wind noise detection result determination module, configured to determine a wind noise detection result according to the relation between the cross-correlation data and the detection area range.

According to another aspect of the present application, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a wind noise detection method according to any of the embodiments of the present application.

According to another aspect of the application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the wind noise detection method according to any one of the embodiments of the application.

According to another aspect of the application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the wind noise detection method according to any one of the embodiments of the application.

The technical solution of the embodiments of the application improves the wind noise detection scheme and increases the accuracy and robustness of wind noise detection.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

The drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:

FIG. 1 is a schematic diagram of a wind noise detection method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of another wind noise detection method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application;

FIG. 4 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application;

FIG. 6A is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application;

FIG. 6B is a schematic diagram of a preferred wind noise detection method according to an embodiment of the present application;

FIG. 7 is a schematic view of a wind noise detection apparatus according to an embodiment of the present application;

FIG. 8 is a block diagram of an electronic device for implementing a wind noise detection method according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of a wind noise detection method according to an embodiment of the present application. This embodiment is applicable to detecting whether wind noise is present at a voice device when a wind noise removal algorithm is used to perform wind noise reduction on that device, thereby reducing false and missed detections and improving the accuracy and robustness of wind noise detection. The wind noise detection method disclosed in this embodiment may be performed by a wind noise detection apparatus, which may be implemented in software and/or hardware and configured in an electronic device with computing and storage capabilities. The electronic device may be a terminal device, an earphone, a server, or a voice device with audio processing capability such as a smart speaker.

Referring to fig. 1, the wind noise detection method provided in this embodiment includes:

S110: Acquire audio signal sequences respectively collected by at least two microphones.

Each audio signal sequence is the same voice signal as collected by a different microphone configured in the same voice device. The audio signal sequences may be collected by the microphones of the voice device while the user is conducting real-time voice communication, recording, or interacting with an intelligent voice assistant through the device outdoors.

The at least two microphones are arranged close to each other on the same voice device, so wind noise can be detected by comparing the cross-correlation between their audio signals. In an optional embodiment, the at least two microphones comprise a main microphone and an auxiliary microphone arranged on a Bluetooth headset and spaced 3-4 cm apart, with the main microphone closer to the user's mouth than the auxiliary microphone.

The audio signal sequence may be a time-domain signal or a frequency-domain signal, preferably a frequency-domain signal. If the audio signal sequence is a time-domain signal, it can be converted into a frequency-domain signal.

In an optional embodiment, the time-domain audio signal sequences respectively collected by the at least two microphones are acquired, each audio signal sequence comprising at least two frames of audio signals, and each time-domain audio signal sequence is converted into an audio signal sequence in frequency-domain format.

An audio signal sequence may include a plurality of audio signal frames; for example, a microphone may capture 60 frames of audio signal per second. To ensure the accuracy of wind noise detection, the acquired audio signal sequence comprises at least two frames of audio signals.

A time-domain signal describes the value of the signal at different times, whereas a frequency-domain signal describes the frequency structure of the signal and the relationship between frequency and amplitude. Because a signal not only varies with time but also involves frequency, phase, and so on, it is necessary to analyze its frequency structure and describe it in the frequency domain. By the Fourier theorem, an audio signal, being a composite signal, can be decomposed into a sum of sinusoidal harmonics, and analyzing its amplitude-frequency and phase-frequency characteristics in the frequency domain is more intuitive.

In an alternative embodiment, converting each time-domain audio signal sequence into a frequency-domain audio signal sequence comprises: for each time-domain audio signal sequence, splicing at least two frames of the time-domain signal with an overlap-save method based on a window function to form a time-domain audio signal; and performing a Fourier transform on this time-domain audio signal to obtain audio signals at a plurality of sampling frequency points, which serve as the frequency-domain audio signal sequence.

The overlap-save method extends the signal sequence by retaining a certain number of samples of the original input sequence at the front end of each segment. For one time-domain audio signal sequence, at least two frames are spliced with the overlap-save method based on a window function: specifically, a historical frame of the audio signal sequence is spliced onto the current frame, and the size of the overlap region between the historical frame and the current frame is related to the window size of the window function.

Besides the clean speech signal, the audio signal sequence may include wind noise. The time-domain audio signal sequence can therefore be expressed as y(m) = x(m) + d(m), where x(m) represents the clean speech signal and d(m) represents the wind noise.

Specifically, the time-domain audio signal is Fourier transformed using the following formula:

$$Y_i(m,k) = \mathrm{DFT}\big(y_i(m)\,w\big)$$

where y_i(m)w denotes the result of splicing the input signal y_i(m) with the overlap-save method and applying the window function w; i = 1, 2 denotes the i-th microphone, and y₁(m) and y₂(m) are the time-domain audio signal sequences collected by the 1st and 2nd microphones, respectively; m denotes the current frame; w is the window function; k denotes the k-th frequency point; and Y_i(m,k) is the frequency-domain audio signal sequence obtained by the discrete Fourier transform.
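As an illustration of this conversion, the following Python sketch splices each new frame onto the previous (historical) frame, applies a window, and takes a real FFT to obtain Y_i(m,k) for one microphone. It is a minimal sketch under stated assumptions, not the embodiment's implementation: the 50% overlap, the periodic Hann window, the zero-valued history before the first frame, and the function name frames_to_spectra are hypothetical choices, and NumPy is assumed to be available.

```python
import numpy as np


def frames_to_spectra(frames, window=None):
    """Sketch of Y_i(m, k) = DFT(y_i(m) w) for one microphone.

    ASSUMPTIONS: each frame of length L is spliced after the previous
    (historical) frame to form a 2L-sample block (50% overlap); the block
    is windowed and transformed with a real FFT.
    """
    frames = np.asarray(frames, dtype=float)
    num_frames, frame_len = frames.shape
    block_len = 2 * frame_len
    if window is None:
        # Hypothetical default: periodic Hann window of the block length.
        window = np.hanning(block_len + 1)[:-1]
    history = np.zeros(frame_len)  # no history before the first frame
    spectra = []
    for m in range(num_frames):
        block = np.concatenate([history, frames[m]])  # historical + current frame
        spectra.append(np.fft.rfft(block * window))   # frequency points k = 0 .. L
        history = frames[m]
    return np.array(spectra)  # shape: (frames m, frequency points k)
```

Calling the function once per microphone yields Y₁(m,k) and Y₂(m,k) with matching frame and frequency point indices.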

The frequency-domain audio signal sequence comprises a plurality of frequency points, a frequency point being a number assigned to a fixed frequency. Specifically, the whole frequency range covered by the frequency-domain audio signal obtained from the time-domain audio signal sequence is divided into equal intervals, yielding a certain number of frequency bands that are numbered in sequence; each such number is a frequency point. For example, with a uniform interval of 200 kHz, the range starting at 890 MHz (890 MHz, 890.2 MHz, 890.4 MHz, 890.6 MHz, 890.8 MHz, 891 MHz … 915 MHz) is divided into 125 frequency bands numbered 1, 2, 3, 4 … 125; these numbers assigned to fixed frequencies are the frequency points.
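For the DFT-based conversion described above, the frequency points correspond to DFT bins: for an N-point DFT at sampling rate f_s, bin k is centred at k·f_s/N. The helper below (a hypothetical name; the 16 kHz rate and 512-point DFT in the usage line are illustrative assumptions) converts a bin index to its frequency.

```python
def bin_center_frequency(k: int, sample_rate_hz: float, dft_len: int) -> float:
    """Return the centre frequency in Hz of bin k of a dft_len-point DFT.

    Only the non-negative half of the spectrum (k = 0 .. dft_len // 2),
    which is what a real FFT returns, is accepted.
    """
    if not 0 <= k <= dft_len // 2:
        raise ValueError("k must be in 0 .. dft_len // 2")
    return k * sample_rate_hz / dft_len


# Usage: with a 512-point DFT at 16 kHz, adjacent bins are 31.25 Hz apart.
spacing = bin_center_frequency(1, 16000, 512)
```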

S120: Calculate cross-correlation data of the at least two audio signal sequences, and calculate a wind noise detection value according to the cross-correlation data.

The cross-correlation data describes both the degree of correlation between the audio signal sequences and the degree of correlation of the audio signals within each sequence; that is, it measures the similarity of the audio signals between sequences and within each sequence.

The wind noise detection value is calculated from the cross-correlation data and gives a rough indication of whether wind noise is present in the current audio signal sequence: the smaller the detection value, the more likely the current audio signal sequence contains wind noise.
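The text does not specify at this point how the per-frequency-point cross-correlation data is condensed into a single detection value. One plausible choice, shown purely as a hypothetical sketch, is to average normalized cross-correlation coefficients over a band of frequency points: wind noise arises from local turbulence at each microphone and is weakly correlated between closely spaced microphones, so a lower average points toward wind noise, matching the direction described above. The function name and band limits are assumptions.

```python
from typing import Optional, Sequence


def wind_detection_value(coeffs: Sequence[float], k_min: int = 1,
                         k_max: Optional[int] = None) -> float:
    """HYPOTHETICAL detection value: mean cross-correlation coefficient over
    frequency points k_min .. k_max - 1 (Python slice semantics).

    Coefficients are assumed to lie in [0, 1]; a smaller result suggests
    weaker inter-microphone correlation, i.e. more likely wind noise.
    """
    band = list(coeffs[k_min:k_max])
    if not band:
        raise ValueError("empty frequency band")
    return sum(band) / len(band)
```

Skipping k = 0 by default reflects that the DC bin carries no useful speech information; the band limits would in practice be a tuning choice.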

S130: Adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value.

Optionally, the wind noise detection threshold includes an upper wind noise detection threshold and a lower wind noise detection threshold.

The detection area range of the wind noise detection threshold is the interval jointly determined by the upper and lower wind noise detection thresholds. This range directly determines the accuracy of wind noise detection, and the wind noise detection value represents the possibility that wind noise exists: a small detection value indicates a high probability of wind noise. In that case, the detection area range is adjusted so that the wind noise detection value or the cross-correlation data falls as close to the middle of the range as possible rather than near its edges, which effectively reduces false detections. A false detection is a case in which an audio signal sequence without wind noise is erroneously determined to contain wind noise. In this scheme, a value greater than the upper wind noise detection threshold generally indicates that no wind noise is present, and a value smaller than the lower wind noise detection threshold generally indicates that wind noise is present; both cases belong to the determination regions of wind noise detection, where the probability of false detection is low. A detection value falling within the detection area range lies in an uncertain region, where wind noise may or may not be present. By dynamically adjusting the detection area range according to the possibility of wind noise, this technical solution makes detection in the uncertain region more accurate.

S140: Determine a wind noise detection result according to the relation between the cross-correlation data and the detection area range.

The relation between the cross-correlation data and the detection area range refers to their relative position: the cross-correlation data may lie below (to the left of), above (to the right of), or within the detection area range.

The wind noise detection result indicates whether wind noise is present in the audio signal sequence, which can be determined from the relative position of the cross-correlation data and the detection area range.
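The three possible relative positions can be expressed as a small classifier. Following the description above, a value below the lower limit is treated as wind noise, a value above the upper limit as no wind noise, and a value inside the range as uncertain; how the uncertain case is finally resolved is not specified in this passage, so the sketch simply reports it. The names Region and classify are hypothetical.

```python
from enum import Enum


class Region(Enum):
    WIND_NOISE = "wind noise"        # below the lower detection limit
    NO_WIND_NOISE = "no wind noise"  # above the upper detection limit
    UNCERTAIN = "uncertain"          # inside the detection area range


def classify(value: float, c_low: float, c_up: float) -> Region:
    """Locate a cross-correlation value relative to [c_low, c_up]."""
    if c_low > c_up:
        raise ValueError("lower limit must not exceed upper limit")
    if value < c_low:
        return Region.WIND_NOISE
    if value > c_up:
        return Region.NO_WIND_NOISE
    return Region.UNCERTAIN
```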

In the technical solution of this embodiment, cross-correlation data of at least two audio signal sequences is calculated, a wind noise detection value is calculated from the cross-correlation data, the detection area range of the wind noise detection threshold is adjusted according to the wind noise existence possibility represented by the detection value, and finally the wind noise detection result is determined according to the relation between the cross-correlation data and the detection area range. By truncating the detection area range of the wind noise detection threshold according to the possibility of wind noise, the embodiment improves the reliability of wind noise detection while reducing the false detection rate.

FIG. 2 is a schematic diagram of another wind noise detection method according to an embodiment of the present application. This embodiment is an alternative based on the above embodiments; specifically, it refines the operation of adjusting the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value.

Referring to fig. 2, the wind noise detection method provided in this embodiment includes:

S210: Acquire audio signal sequences respectively collected by at least two microphones.

S220: Calculate cross-correlation data of the at least two audio signal sequences, and calculate a wind noise detection value according to the cross-correlation data.

S230: If the wind noise existence possibility represented by the wind noise detection value is smaller than a set likelihood threshold, adjust the detection area range of the wind noise detection threshold so as to increase the probability that the wind noise detection result is "no wind noise".

The set likelihood threshold serves as the basis for deciding whether to adjust the detection area range of the wind noise detection threshold. It is an empirical value chosen by the relevant technician according to the actual situation and is not limited herein. In an alternative embodiment, when the wind noise possibility takes values between 0 and 1, the set likelihood threshold takes a value between 0.4 and 0.5.

If the wind noise existence possibility represented by the wind noise detection value is smaller than the set likelihood threshold, wind noise is more likely to be present in the current audio signal sequence. In this case, the detection area range of the wind noise detection threshold is adjusted by lowering both its lower limit and its upper limit, which reduces the false detection rate.

If the wind noise existence possibility represented by the wind noise detection value is greater than or equal to the set likelihood threshold, wind noise is less likely to be present, and the current audio signal sequence tends toward containing no wind noise. In this case, the detection area range does not need to be adjusted, and the original detection area range of the wind noise detection threshold is used in subsequent calculations.

In an alternative embodiment, the detection area range of the wind noise detection threshold value is adjusted according to the following formula:

where C_low(m) and C_up(m) are the lower and upper limits of the detection area; η_low and η_up are the set initial lower and upper limits of the detection area; β is the adjustment coefficient of the lower limit and γ is the adjustment coefficient of the upper limit, with 0 < β < γ < 1; and Γ_thres is the set likelihood threshold.

Because β and γ are both fractions between 0 and 1 with β < γ, adjusting the lower and upper limits of the detection area with β and γ makes them differ from η_low and η_up, so the resulting detection area range of the wind noise detection threshold truncates the original detection area range.
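The adjustment formula referenced above is not reproduced in this text. The sketch below assumes the simplest form consistent with the surrounding definitions: when the likelihood is below Γ_thres, the lower and upper limits are multiplied by β and γ respectively, and otherwise they stay at η_low and η_up. This multiplicative form, the function name, and the example numbers are assumptions rather than the embodiment's exact formula.

```python
from typing import Tuple


def adjust_detection_range(likelihood: float, eta_low: float, eta_up: float,
                           beta: float, gamma: float,
                           thres: float) -> Tuple[float, float]:
    """Return (C_low(m), C_up(m)) under an ASSUMED multiplicative adjustment.

    If likelihood < thres, both limits are lowered by scaling with beta and
    gamma (0 < beta < gamma < 1); otherwise the initial limits are kept.
    """
    if not 0.0 < beta < gamma < 1.0:
        raise ValueError("requires 0 < beta < gamma < 1")
    if eta_low > eta_up:
        raise ValueError("eta_low must not exceed eta_up")
    if likelihood < thres:
        return beta * eta_low, gamma * eta_up
    return eta_low, eta_up


# Illustrative values, with a threshold in the 0.4-0.5 range given above:
print(adjust_detection_range(0.3, 0.5, 0.75, 0.5, 0.75, 0.45))  # (0.25, 0.5625)
print(adjust_detection_range(0.6, 0.5, 0.75, 0.5, 0.75, 0.45))  # (0.5, 0.75)
```

Under this assumed form, β < γ means the lower limit is pulled down proportionally more than the upper limit.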

S240: Determine a wind noise detection result according to the relation between the cross-correlation data and the detection area range.

In this technical solution, when the wind noise existence possibility represented by the wind noise detection value is smaller than the set likelihood threshold, the detection area range of the wind noise detection threshold is adjusted, which improves the reliability of wind noise detection while reducing the false detection rate.

FIG. 3 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application. This embodiment is an alternative based on the above embodiments. Specifically, the audio signal sequences are in frequency-domain format, and this embodiment refines the operation of "calculating cross-correlation data of at least two audio signal sequences, and calculating a wind noise detection value according to the cross-correlation data".

Referring to fig. 3, the wind noise detection method provided in this embodiment includes:

S310: Acquire audio signal sequences respectively collected by at least two microphones.

S320: For the audio signal of each frequency point in each audio signal sequence, calculate an autocorrelation value.

For the audio signal of each frequency point in the audio signal sequence, the autocorrelation value refers to the degree of correlation between values at different frequencies within the frequency band identified by that frequency point.

The number of calculated autocorrelation values equals the number of frequency points into which the audio signal sequence is divided; each frequency point corresponds to one autocorrelation value.

Take the case of two microphones as an example. Specifically, for the audio signal of each frequency point in each audio signal sequence, the autocorrelation values may be calculated by the following formulas:

$$\Phi_{y_1y_1}(m,k) = Y_1(m,k)\,Y_1^{*}(m,k), \qquad \Phi_{y_2y_2}(m,k) = Y_2(m,k)\,Y_2^{*}(m,k)$$

where Y₁*(m,k) is the complex conjugate of Y₁(m,k) and Y₂*(m,k) is the complex conjugate of Y₂(m,k); Y₁(m,k) and Y₂(m,k) are the frequency-domain audio signal sequences obtained from y₁(m) and y₂(m) by the discrete Fourier transform; Φ_y1y1(m,k) is the autocorrelation value of y₁(m) and Φ_y2y2(m,k) is the autocorrelation value of y₂(m); m denotes the m-th frame of audio signal collected by the microphone; and k denotes the frequency point index of the frequency-domain audio signal sequence.
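Assuming the per-frame autocorrelation is the conjugate product Y(m,k)·Y*(m,k), as the conjugate terms mentioned above suggest, it reduces to the per-bin power |Y(m,k)|², which is real and non-negative. The following NumPy sketch (hypothetical function name) computes it for a whole array of frames and frequency points at once.

```python
import numpy as np


def autocorrelation(Y: np.ndarray) -> np.ndarray:
    """Per-frame, per-bin autocorrelation Phi_yy(m, k) = Y(m, k) * conj(Y(m, k)).

    Y is a complex array indexed [frame m, frequency point k]. The product of
    a number with its own conjugate is real, so only the real part is kept.
    """
    Y = np.asarray(Y)
    return (Y * np.conj(Y)).real


Y1 = np.array([[3 + 4j, 1j], [0.5, -2 + 0j]])
phi11 = autocorrelation(Y1)  # equals np.abs(Y1) ** 2
```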

S330: For each frequency point, calculate the cross-correlation value between every two audio signals.

The cross-correlation value measures the degree of correlation between two audio signals at each frequency point.

The number of calculated cross-correlation values equals the number of frequency points into which the audio signal sequence is divided; each frequency point corresponds to one cross-correlation value.

Continuing the above example, for each frequency point the cross-correlation value between the two audio signals may be calculated by the following formula:

$$\Phi_{y_1y_2}(m,k) = Y_1(m,k)\,Y_2^{*}(m,k)$$

where Φ_y1y2(m,k) is the cross-correlation value between y₁(m) and y₂(m).
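Under the same conjugate-product assumption, the cross-correlation between the two microphones at each frequency point can be sketched as Y₁(m,k)·Y₂*(m,k); which spectrum is conjugated is a convention assumed here and does not affect the magnitude. The sketch also checks a property worth knowing: for a single unsmoothed frame, |Φ_y1y2|² equals Φ_y1y1·Φ_y2y2 exactly, so a normalization of the form |Φ_y1y2|/√(Φ_y1y1·Φ_y2y2) only becomes informative once the values are averaged over frames, for example by smoothing. The function name is hypothetical.

```python
import numpy as np


def cross_correlation(Y1: np.ndarray, Y2: np.ndarray) -> np.ndarray:
    """Per-frame, per-bin cross-correlation Phi_y1y2(m, k) = Y1(m, k) * conj(Y2(m, k)).

    The result is complex: its phase is the inter-microphone phase
    difference at each frequency point.
    """
    return np.asarray(Y1) * np.conj(np.asarray(Y2))


rng = np.random.default_rng(1)
Y1 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
Y2 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
phi12 = cross_correlation(Y1, Y2)
# For one frame, |Y1 Y2*|^2 == |Y1|^2 |Y2|^2 even for unrelated signals.
assert np.allclose(np.abs(phi12) ** 2, np.abs(Y1) ** 2 * np.abs(Y2) ** 2)
```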

Note that steps S320 and S330 have no required order: the autocorrelation values may be calculated first, or the cross-correlation value may be calculated first.

S340: For each frequency point, calculate the cross-correlation data of the at least two audio signals according to the autocorrelation values and the cross-correlation value.

Note that the audio signals used to calculate the autocorrelation and cross-correlation values are frequency-domain audio signals.

The autocorrelation value reflects how each audio signal correlates with itself at each frequency point, and the cross-correlation value reflects how the two audio signals correlate with each other at each frequency point. The autocorrelation values capture situations such as sudden noise or a low-frequency noise floor at the two microphones, while the cross-correlation value reflects whether both microphones are subjected to wind noise at the same time. Considering the autocorrelation and cross-correlation values together reduces false detections that may occur when there is sudden noise or when the low-frequency background noise of the two microphones is inconsistent, as well as missed detections when both microphones are affected by similar wind noise at the same moment. The accuracy and robustness of wind noise detection are thereby improved.

S350: Calculate a wind noise detection value according to the cross-correlation data of each frequency point.

Because each frequency point in the audio signal sequence has its own autocorrelation value and a cross-correlation value with the other audio signal sequence, the wind noise detection value corresponding to each frequency point is calculated from the calculated cross-correlation and autocorrelation values.

S360: Adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value.

S370: Determine a wind noise detection result according to the relation between the cross-correlation data and the detection area range.

This embodiment considers both the autocorrelation value of each frequency point in each audio signal sequence and the cross-correlation value between every two audio signals at each frequency point, and derives the cross-correlation data from both. Doing so reduces false detections that may occur when there is sudden noise or when the low-frequency background noise of the two microphones is inconsistent, as well as missed detections when both microphones are affected by similar wind noise at the same moment, thereby improving the accuracy and robustness of wind noise detection.

FIG. 4 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application; the present embodiment is an alternative proposed on the basis of the above-described embodiments. Specifically, before "cross-correlation data of at least two audio signals is calculated from the autocorrelation value and the cross-correlation value", an operation of "performing smoothing processing on the autocorrelation value and the cross-correlation value, respectively" is added.

Referring to fig. 4, the wind noise detection method provided in this embodiment includes:

S410: Acquire audio signal sequences respectively collected by at least two microphones.

S420: For the audio signal of each frequency point in each audio signal sequence, calculate an autocorrelation value.

S430: For each frequency point, calculate the cross-correlation value between every two audio signals.

S440: Smooth the autocorrelation values and the cross-correlation value respectively.

Smoothing the autocorrelation and cross-correlation values effectively prevents the calculated results from fluctuating too quickly over time because of fluctuations in frequency-domain coherence.

In an alternative embodiment, the autocorrelation values or cross-correlation values are smoothed using the following formula:

$$\Phi_{smooth}(m,k) = \alpha\,\Phi_{smooth}(m-1,k) + (1-\alpha)\,\Phi(m,k)$$

where α is a smoothing constant; Φ(m,k) is the autocorrelation or cross-correlation value before smoothing; Φ_smooth(m,k) is the smoothed autocorrelation or cross-correlation value, and Φ_smooth(m−1,k) is the smoothed value of the previous frame; m denotes the m-th frame of audio signal collected by the microphone; and k denotes the frequency point index of the frequency-domain audio signal sequence.

The smoothing constant α is an empirical value set by a person skilled in the art according to the actual situation, and its specific value is not limited here. For example, α may be selected between 0.8 and 0.95.
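The recursive smoothing above can be sketched in Python with NumPy as follows. This is a minimal sketch rather than the patent's implementation; the function name `smooth_spectrum` and the choice to initialise the recursion with the first raw frame are assumptions.

```python
import numpy as np


def smooth_spectrum(prev_smooth, current, alpha=0.9):
    """One step of Phi_smooth(m, k) = alpha * Phi_smooth(m-1, k) + (1 - alpha) * Phi(m, k).

    prev_smooth: smoothed values of frame m-1 (None before the first frame).
    current: raw autocorrelation or cross-correlation values of frame m,
             one per frequency point k (real or complex).
    alpha: smoothing constant; the text suggests a value between 0.8 and 0.95.
    """
    current = np.asarray(current)
    if prev_smooth is None:
        # Assumption: start the recursion from the first raw frame.
        return current.copy()
    return alpha * np.asarray(prev_smooth) + (1.0 - alpha) * current


# Usage: feed frames one at a time and keep the returned state.
state = None
for frame in ([1.0, 2.0], [3.0, 2.0]):
    state = smooth_spectrum(state, frame, alpha=0.9)
```

A larger α weights past frames more heavily, giving a steadier but slower-reacting estimate; the same function applies to the real autocorrelation values and the complex cross-correlation value.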

S450: for each frequency point, calculate the cross-correlation data of at least two audio signals from the autocorrelation value and the cross-correlation value.

To reduce the complexity of the subsequent wind noise detection steps and simplify further processing, in an optional embodiment the cross-correlation value of each frequency point is normalized using the autocorrelation values to obtain a cross-correlation coefficient, which is used as the cross-correlation data.

The normalization constrains the magnitude of the cross-correlation value to the range 0 to 1.

In an alternative embodiment, the normalization is performed according to the following formula to obtain the cross-correlation coefficient:

$$C_{y_1y_2}(m,k) = \frac{\Phi_{y_1y_2}(m,k)}{\sqrt{\Phi_{y_1y_1}(m,k)\,\Phi_{y_2y_2}(m,k)}}$$

where C_y1y2(m,k) is the cross-correlation coefficient; m denotes the m-th frame of the audio signal collected by the microphone; k denotes the frequency point index of the audio signal sequence in the frequency domain format; there are two audio signal sequences, y₁ denoting the first and y₂ the second; Φ_y1y2(m,k) is the cross-correlation value, and Φ_y1y1(m,k) and Φ_y2y2(m,k) are the respective autocorrelation values. Note that Φ_y1y2(m,k), Φ_y1y1(m,k) and Φ_y2y2(m,k) are preferably the smoothed values Φ_{smooth_y1y2}(m,k), Φ_{smooth_y1y1}(m,k) and Φ_{smooth_y2y2}(m,k).
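A sketch of this normalization, assuming the (preferably smoothed) values of one frame are NumPy arrays indexed by frequency point k; the function name and the `eps` guard against silent frequency points are additions not found in the text. The usage line also shows why smoothing matters: with unsmoothed single-frame values Φ_y1y2 = Y₁·conj(Y₂), the magnitude of the coefficient is always 1, because |Y₁·conj(Y₂)| = |Y₁|·|Y₂|.

```python
import numpy as np


def coherence_coefficient(phi_12, phi_11, phi_22, eps=1e-12):
    """C_y1y2(m, k) = Phi_y1y2(m, k) / sqrt(Phi_y1y1(m, k) * Phi_y2y2(m, k)).

    phi_12: complex cross-correlation values per frequency point (preferably smoothed).
    phi_11, phi_22: real, non-negative autocorrelation values per frequency point.
    eps: floor on the denominator so silent points do not divide by zero (assumption).
    """
    phi_12 = np.asarray(phi_12, dtype=complex)
    denom = np.sqrt(np.asarray(phi_11, dtype=float) * np.asarray(phi_22, dtype=float))
    return phi_12 / np.maximum(denom, eps)


# Single-frame (unsmoothed) values: |C| is 1 at every frequency point,
# whatever the signals are, so smoothing is applied before normalizing.
Y1 = np.array([1 + 2j, -0.5 + 0.1j, 3.0])
Y2 = np.array([0.2 - 1j, 2.0 + 0j, -1 + 1j])
C_single = coherence_coefficient(Y1 * np.conj(Y2), np.abs(Y1) ** 2, np.abs(Y2) ** 2)
```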

S460: calculate a wind noise detection value from the cross-correlation data of each frequency point.

S470: adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value.

S480: determine a wind noise detection result according to the relationship between the cross-correlation data and the detection area range.

In the technical solution of this embodiment, the autocorrelation value of each frequency point in each audio signal sequence and the cross-correlation value between every two audio signals are calculated and then smoothed respectively. This reduces the influence of fluctuations in frequency-domain coherence on the detection algorithm and improves the accuracy and reliability of wind noise detection.

FIG. 5 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application. This embodiment is an alternative built on the embodiments above, in which the step "calculating the wind noise detection value according to the cross-correlation data of each frequency point" is refined.

Referring to fig. 5, the wind noise detection method provided in this embodiment includes:

S510: acquire audio signal sequences respectively collected by at least two microphones.

S520: for the audio signal of each frequency point in each audio signal sequence, calculate an autocorrelation value.

S530: for each frequency point, calculate the cross-correlation value between every two audio signals.

S540: for each frequency point, calculate the cross-correlation data of at least two audio signals from the autocorrelation value and the cross-correlation value.

S550: calculate the mean of the cross-correlation coefficients within a set frequency band range from the cross-correlation data of each frequency point to obtain the wind noise detection value.

Because the cross-correlation coefficients of different frequency points may differ greatly, the coefficients of individual frequency points may be much larger or much smaller than those of most frequency points, and such extreme values do not reflect the central tendency of the audio signal. Calculating the mean of the cross-correlation coefficients within the set frequency band range better reflects their central tendency within that range.

The set frequency band range is determined by a person skilled in the art according to the actual situation and is not limited here. Since wind noise is concentrated mainly in the low-frequency part of the audio signal, a low-frequency range is mainly used as the set frequency band range. Because frequency points index the frequency bands, the set frequency band range can be specified by selecting frequency points.

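In an alternative embodiment, the mean of the cross-correlation coefficients within the set frequency band range is calculated according to the following formula to obtain the wind noise detection value:

$$\Gamma(m) = \frac{1}{\text{ind2}-\text{ind1}+1}\sum_{k=\text{ind1}}^{\text{ind2}}\left\|C_{y_1y_2}(m,k)\right\|_2$$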

where ind1 is the lower limit of the set frequency band range, ind2 is the upper limit of the set frequency band range, and Γ(m) is the wind noise detection value; ‖·‖₂ denotes the two-norm; m denotes the m-th frame of the audio signal collected by the microphone; k denotes the frequency point index of the audio signal sequence in the frequency domain format; there are two audio signal sequences, y₁ denoting the first and y₂ the second.

Here ‖·‖₂ is the two-norm, specifically the square root of the sum of the squares of the real and imaginary parts of C_y1y2(m,k), i.e. its magnitude.

After the set frequency band range is determined, ind1 and ind2 can be derived from the band range and the number of frequency points. For example, with a 128-point Fourier transform and an effective spectrum of 8 kHz (a 16 kHz sampling rate, by the sampling theorem), the transform divides the 0 to 8 kHz spectrum evenly into 128/2 = 64 parts. If the set frequency band is 0 to 2000 Hz, it maps to frequency points 0 to 2000/8000 × 64, i.e. 0 to 16, so ind1 = 0 and ind2 = 16.
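The band-to-frequency-point mapping and the mean Γ(m) can be sketched as below, assuming the coefficients of one frame are stored in a NumPy array indexed by k. Treating ind2 as inclusive and rounding non-integer positions to the nearest index are assumptions; the default arguments reproduce the 128-point, 8 kHz example, and the function names are hypothetical.

```python
import numpy as np


def band_to_bins(f_low_hz, f_high_hz, fft_order=128, nyquist_hz=8000.0):
    """Map a band in Hz to frequency-point indices (ind1, ind2).

    fft_order / 2 frequency points cover 0..nyquist_hz, so a frequency f maps to
    f / nyquist_hz * (fft_order / 2). Rounding to the nearest index is an assumption.
    """
    n_points = fft_order // 2
    ind1 = int(round(f_low_hz / nyquist_hz * n_points))
    ind2 = int(round(f_high_hz / nyquist_hz * n_points))
    return ind1, ind2


def wind_noise_detection_value(coh, ind1, ind2):
    """Gamma(m): mean two-norm |C_y1y2(m, k)| over k = ind1..ind2 (inclusive)."""
    band = np.asarray(coh)[ind1:ind2 + 1]
    # np.abs of a complex value is sqrt(real**2 + imag**2), i.e. the two-norm above.
    return float(np.mean(np.abs(band)))


ind1, ind2 = band_to_bins(0, 2000)  # the 0-2000 Hz example: (0, 16)
coh = np.full(65, 0.6 + 0.0j)       # 65 points from a 128-point real FFT
gamma_m = wind_noise_detection_value(coh, ind1, ind2)
```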

S560: adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value.

S570: determine a wind noise detection result according to the relationship between the cross-correlation data and the detection area range.

In the technical solution of this embodiment, after the autocorrelation value of each frequency point in each audio signal sequence and the cross-correlation value between every two audio signals are calculated, the cross-correlation data of at least two audio signals is calculated from them, and the mean of the cross-correlation coefficients within the set frequency band range is then calculated from the cross-correlation data of each frequency point. Calculating the mean within the band where wind noise is more likely to appear better reflects the central tendency of the cross-correlation coefficients in that band and further improves the accuracy and reliability of wind noise detection.

FIG. 6 is a schematic diagram of yet another wind noise detection method according to an embodiment of the present application. This embodiment is an alternative built on the embodiments above, in which the step "determining the wind noise detection result according to the relationship between the cross-correlation data and the detection area range" is refined.

S610: acquire audio signal sequences respectively collected by at least two microphones.

S620: for the audio signal of each frequency point in each audio signal sequence, calculate an autocorrelation value.

S630: for each frequency point, calculate the cross-correlation value between every two audio signals.

S640: for each frequency point, calculate the cross-correlation data of at least two audio signals from the autocorrelation value and the cross-correlation value.

S650: calculate the mean of the cross-correlation coefficients within a set frequency band range from the cross-correlation data of each frequency point to obtain the wind noise detection value.

S660: adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value.

S670: clip the cross-correlation data of each frequency point to the detection area range so as to update the cross-correlation data of each frequency point.

After the detection area range of the wind noise detection threshold is adjusted, update data for the cross-correlation data of each frequency point is determined from the position of that data relative to the detection area range, and the cross-correlation data of each frequency point is updated with the update data.

The detection area range is used to judge whether wind noise exists in the audio signal sequence. Generally, if the cross-correlation data falls outside the detection area range, a relatively definite judgment can be made from it: either wind noise exists in the audio signal sequence or it does not. This embodiment focuses on the case where the judgment is uncertain, i.e. where the cross-correlation data falls within the detection area range. The values falling within the range are extracted from the cross-correlation data of each frequency point and updated with the update data.

In an alternative embodiment, the cross-correlation data is updated according to the following formula:

where C_low(m) and C_up(m) are the detection area lower limit value and upper limit value, respectively, and the quantity obtained is the adjusted cross-correlation coefficient.

If the cross-correlation data falls to the left of the detection area range, i.e. is smaller than the detection area lower limit value, the update data is 0. If it falls to the right of the range, i.e. is larger than the detection area upper limit value, the update data is 1. If it falls within the range, i.e. between the detection area lower limit value and upper limit value, the update data is C_y1y2(m,k) − C_low(m).

S680: calculate the wind noise existence probability of each frequency point from the proportional position of its updated cross-correlation data within the detection area range.

The proportional value of the updated cross-correlation data of each frequency point within the detection area range can be determined as the ratio of the difference between the cross-correlation data and the detection area lower limit value to the difference between the detection area upper limit value and lower limit value.

Illustratively, when the update data of a cross-correlation value falling within the detection area range is C_y1y2(m,k) − C_low(m), the proportional value can be calculated by the following formula:

$$\frac{C_{y_1y_2}(m,k) - C_{low}(m)}{C_{up}(m) - C_{low}(m)}$$

The wind noise existence probability P(H₁|Y(m,k)) of each frequency point can be determined by the following formula:

where the adjusted cross-correlation coefficient is the updated cross-correlation data from S670. When the cross-correlation data falls within the detection area range, the wind noise existence probability equals one minus the proportional value above, i.e. 1 − (C_y1y2(m,k) − C_low(m))/(C_up(m) − C_low(m)).
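Steps S670 and S680 can be sketched per frequency point as follows. The text does not fully reproduce the update formula, so this sketch assumes one reading that agrees with the worked example given for FIG. 6B: the magnitude |C_y1y2(m,k)| is mapped to its proportional position in [C_low(m), C_up(m)], clipped to 0 to the left of the area and to 1 to the right of it, and the probability is one minus that value. The function name is hypothetical.

```python
import numpy as np


def wind_noise_presence_probability(coh, c_low, c_up):
    """Per-point P(H1 | Y(m, k)) under the assumptions stated above.

    coh: cross-correlation coefficients C_y1y2(m, k) of one frame (complex or real).
    c_low, c_up: detection area limits C_low(m) < C_up(m) for this frame.
    """
    mag = np.abs(np.asarray(coh))
    # Proportional position inside the detection area; values left of the
    # area clip to 0 and values right of it clip to 1.
    ratio = np.clip((mag - c_low) / (c_up - c_low), 0.0, 1.0)
    # High coherence between the microphones means wind noise is unlikely.
    return 1.0 - ratio


p = wind_noise_presence_probability(np.array([0.6, 0.3, 0.95]), 0.5, 0.9)
```

Points below the detection area therefore get probability 1 (wind noise), points above it get 0, and points inside it get a graded value.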

S690: count the wind noise existence probabilities of the frequency points within the wind noise detection frequency range.

The wind noise detection frequency range contains multiple frequency points, and the wind noise existence probability of each of them is informative for judging whether wind noise exists in the audio signal sequence. The wind noise existence probabilities of the frequency points within this range are therefore counted.

In an alternative embodiment, the count is computed according to the following formula:

$$N(m) = \sum_{k=\text{ind1}}^{\text{ind2}} I\left(\varepsilon(m,k) = 1\right)$$

where N(m) is the statistical result and I(·) is the indicator function. N(m) is the number of frequency points within the specific frequency range (determined by ind1 and ind2) whose binarized wind noise existence probability is 1.

In an optional embodiment, before the wind noise existence probabilities within the wind noise detection frequency range are counted, the method further includes binarizing the wind noise existence probability according to a wind noise existence probability threshold.

Binarization compares the wind noise existence probability with the wind noise existence probability threshold and sets it to 0 or 1 according to the comparison result.

In an optional embodiment, the wind noise existence probability is binarized according to the following formula:

$$\varepsilon(m,k) = \begin{cases} 1, & P(H_1 \mid Y(m,k)) > P_{thres} \\ 0, & \text{otherwise} \end{cases}$$

where P_thres is the wind noise existence probability threshold, ε(m,k) is the binarized wind noise existence probability, and P(H₁|Y(m,k)) is the wind noise existence probability of each frequency point.

The wind noise existence probability threshold P_thres is a constant, an empirical value determined by a person skilled in the art according to the actual situation and not limited here. If the wind noise existence probability P(H₁|Y(m,k)) is greater than P_thres, it is set to 1; otherwise it is set to 0.

S695: determine the wind noise detection result according to the relationship between the statistical result of the wind noise existence probabilities of the frequency points and the wind noise statistical threshold.

The statistical result is the number of frequency points whose wind noise existence probability is 1. This number is compared with the wind noise statistical threshold, and the wind noise detection result is determined from the comparison.

In an alternative embodiment, the wind noise detection result is determined according to the following formula:

$$S_{windnoise}(m) = \begin{cases} 1, & N(m) > N_{thres} \\ 0, & N(m) \le N_{thres} \end{cases}$$

where S_windnoise(m) is the wind noise detection result and N_thres is the wind noise statistical threshold corresponding to the wind noise detection frequency range.

If the number of frequency points whose wind noise existence probability is 1 is greater than the wind noise statistical threshold, the wind noise detection result is 1, indicating that wind noise exists in the audio signal sequence. If it is less than or equal to the threshold, the result is 0, indicating that no wind noise exists in the audio signal sequence.
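Binarization, counting and the final decision (S690 to S695) can be combined into one per-frame function, sketched below. Treating ind2 as inclusive is an assumption, and the function name and the example thresholds are hypothetical.

```python
import numpy as np


def detect_wind_noise(prob, p_thres, ind1, ind2, n_thres):
    """Return (S_windnoise(m), N(m)) for one frame.

    prob: wind noise existence probabilities P(H1 | Y(m, k)) for all points k.
    p_thres: wind noise existence probability threshold P_thres.
    ind1, ind2: frequency-point range of the wind noise detection band (inclusive).
    n_thres: wind noise statistical threshold N_thres.
    """
    # epsilon(m, k) = 1 if P(H1 | Y(m, k)) > P_thres else 0
    eps = (np.asarray(prob) > p_thres).astype(int)
    # N(m) = number of points in [ind1, ind2] with epsilon(m, k) = 1
    n_m = int(np.sum(eps[ind1:ind2 + 1] == 1))
    # S_windnoise(m) = 1 if N(m) > N_thres else 0
    return (1 if n_m > n_thres else 0), n_m


result, count = detect_wind_noise([0.9, 0.2, 0.7, 0.5, 0.95], p_thres=0.5,
                                  ind1=0, ind2=3, n_thres=1)
```

The last point (0.95) lies outside [ind1, ind2] and is not counted, and 0.5 is not strictly greater than P_thres, so only two points contribute to N(m).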

FIG. 6B is a schematic diagram of a preferred wind noise detection method according to an embodiment of the present application, described for the case of two microphones. In a preferred embodiment, the wind noise detection comprises the following procedure:

Step 1: perform a Discrete Fourier Transform (DFT) on the time-domain audio signal sequences y₁ and y₂ collected by microphone 1 and microphone 2 to obtain the frequency-domain audio signal sequences Y₁(m,k) and Y₂(m,k).

Step 2: calculate the autocorrelation values Φ_y1y1(m,k) and Φ_y2y2(m,k) of Y₁(m,k) and Y₂(m,k), and the cross-correlation value Φ_y1y2(m,k) between Y₁(m,k) and Y₂(m,k).
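Steps 1 and 2 leave open how the per-frame values are formed from the DFT frames. A common choice, assumed here rather than taken from the text, is to use the instantaneous products of the DFT frames, which are then smoothed in step 3. The function name is hypothetical.

```python
import numpy as np


def correlation_values(Y1, Y2):
    """Raw per-point values for frame m from the DFT frames Y1(m, k) and Y2(m, k).

    Returns (Phi_y1y1, Phi_y2y2, Phi_y1y2): two real autocorrelation arrays and
    one complex cross-correlation array, before the smoothing of step 3.
    """
    Y1 = np.asarray(Y1, dtype=complex)
    Y2 = np.asarray(Y2, dtype=complex)
    phi_11 = (Y1 * np.conj(Y1)).real  # |Y1(m, k)|^2
    phi_22 = (Y2 * np.conj(Y2)).real  # |Y2(m, k)|^2
    phi_12 = Y1 * np.conj(Y2)         # Y1(m, k) * conj(Y2(m, k))
    return phi_11, phi_22, phi_12


phi_11, phi_22, phi_12 = correlation_values([1 + 2j, 3j], [1 + 2j, 1.0])
```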

Step 3: smooth the autocorrelation values Φ_y1y1(m,k) and Φ_y2y2(m,k) and the cross-correlation value Φ_y1y2(m,k) respectively to obtain the smoothed results Φ_{smooth_y1y1}(m,k), Φ_{smooth_y2y2}(m,k) and Φ_{smooth_y1y2}(m,k).

Step 4: from the smoothing results, calculate the frequency-domain coherence coefficient according to the following formula to obtain the cross-correlation coefficient C_y1y2(m,k):

Step 5: normalize the smoothed cross-correlation value according to the following formula:

Step 6: from the frequency-domain coherence coefficients, calculate the mean of the cross-correlation coefficients within the set frequency band range according to the following formula to obtain the wind noise detection value:

Step 7: adjust the detection area range of the wind noise detection threshold according to the wind noise existence possibility represented by the wind noise detection value:

where β and γ are interval adjustment coefficients with 0 < β < γ < 1; η_low and η_up are the initial lower limit value and initial upper limit value of the detection area; C_low(m) and C_up(m) are the detection area lower limit value and upper limit value; and Γ_thres is the set likelihood threshold.

Step 8: clip the cross-correlation data of each frequency point to the detection area range so as to update the cross-correlation data of each frequency point:

where the quantity obtained is the adjusted cross-correlation coefficient.

Step 9: calculate the wind noise existence probability of each frequency point from the proportional position of its updated cross-correlation data within the detection area range:

where P(H₁|Y(m,k)) is the wind noise existence probability.

Step 10: binarize the wind noise existence probability according to the wind noise existence probability threshold:

where ε(m,k) is the binarized wind noise existence probability and P_thres is the wind noise existence probability threshold.

Step 11: within the wind noise detection frequency range, count the number of frequency points whose wind noise state at the current moment is 1:

where I(·) is the indicator function and N(m) is the number of frequency points within the specific frequency range whose wind noise existence probability is 1.

Step 12: if the number of frequency points whose wind noise existence probability is 1 at the current moment exceeds the wind noise statistical threshold N_thres, wind noise exists in the current audio signal sequence; otherwise, no wind noise exists in the current audio signal sequence.

In the technical solution of this embodiment, the detection area range of the wind noise detection threshold is adjusted according to the wind noise existence possibility represented by the wind noise detection value. When wind noise is unlikely, this increases the probability that the detection result is "no wind noise" and thus reduces the false detection rate. For example, suppose the wind noise detection value is 0.6 and the set likelihood threshold is 0.5. Since 0.6 is greater than 0.5, the current audio signal sequence is more likely to be free of wind noise. If the initial detection area of the wind noise detection threshold is [0.5, 0.9] and β and γ are both 0.8, the adjusted detection area is [0.5 × 0.8, 0.9 × 0.8], i.e. [0.4, 0.72]. Without the adjustment, the wind noise existence probability formula of step 9 gives P = 1 − (0.6 − 0.5)/(0.9 − 0.5) = 0.75, a wind noise probability of 0.75. With the adjustment coefficients β and γ, it gives P = 1 − (0.6 − 0.4)/(0.72 − 0.4) = 0.375. The adjustment therefore changes the result substantially, halving the computed wind noise probability.
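The numbers in this example can be reproduced with the short sketch below. The adjustment formula of step 7 is not reproduced in the text, so the rule used here (scale the initial limits by β and γ only when Γ(m) exceeds Γ_thres) is inferred from this example alone; the function names are hypothetical and the default arguments are the example's values, including β = γ = 0.8.

```python
def adjust_detection_area(detection_value, likelihood_thres,
                          eta_low=0.5, eta_up=0.9, beta=0.8, gamma_coef=0.8):
    """Return the detection area limits (C_low(m), C_up(m)) for one frame.

    Assumption inferred from the worked example: when Gamma(m) exceeds
    Gamma_thres (wind noise unlikely), the initial limits eta_low and eta_up
    are scaled by beta and gamma; otherwise they are used unchanged.
    """
    if detection_value > likelihood_thres:
        return beta * eta_low, gamma_coef * eta_up
    return eta_low, eta_up


def presence_probability(c, c_low, c_up):
    """Step 9 for a coefficient c inside [c_low, c_up]: P = 1 - (c - c_low) / (c_up - c_low)."""
    return 1.0 - (c - c_low) / (c_up - c_low)


p_fixed = presence_probability(0.6, 0.5, 0.9)   # unadjusted area [0.5, 0.9]
c_low, c_up = adjust_detection_area(0.6, 0.5)   # adjusted area
p_adjusted = presence_probability(0.6, c_low, c_up)
```

Shifting both limits downward makes the same coefficient sit higher in the detection area, which is what lowers the wind noise probability.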

FIG. 7 is a schematic diagram of a wind noise detection apparatus according to an embodiment of the present application. Referring to FIG. 7, an embodiment of the present application discloses a wind noise detection apparatus 700, which may include an audio signal sequence acquisition module 710, a wind noise detection value calculation module 720, a detection area range adjustment module 730 and a wind noise detection result determination module 740.

An audio signal sequence obtaining module 710, configured to obtain audio signal sequences respectively collected by at least two microphones;

a wind noise detection value calculation module 720, configured to calculate cross-correlation data of at least two audio signal sequences, and calculate a wind noise detection value according to the cross-correlation data;

a detection region range adjusting module 730, configured to adjust a detection region range of a wind noise detection threshold according to a wind noise existence possibility represented by the wind noise detection value;

and a wind noise detection result determining module 740, configured to determine a wind noise detection result according to a relationship between the cross-correlation data and the detection area range.

Optionally, the detection area range adjustment module is specifically configured to: if the wind noise existence possibility represented by the wind noise detection value is smaller than a set possibility threshold, adjust the detection area range of the wind noise detection threshold so as to increase the probability that the wind noise detection result is "no wind noise".

Optionally, the detection area range adjusting module is specifically configured to: adjusting the detection area range of the wind noise detection threshold value according to the following formula:

where C_low(m) and C_up(m) are the detection area lower limit value and upper limit value; η_low and η_up are the set initial lower limit value and initial upper limit value of the detection area; β is the adjustment coefficient of the detection area lower limit value and γ is the adjustment coefficient of the detection area upper limit value, with 0 < β < γ < 1; and Γ_thres is the set likelihood threshold.

Optionally, the value range of the set possibility threshold is 0.4-0.5.

Optionally, the at least two microphones include a main microphone and an auxiliary microphone arranged on one Bluetooth headset and spaced 3 to 4 cm apart, with the main microphone closer to the user's mouth than the auxiliary microphone.

Optionally, the audio signal sequence acquisition module includes: a time-domain audio signal sequence acquisition sub-module, configured to acquire time-domain audio signal sequences respectively collected by at least two microphones, each audio signal sequence comprising at least two frames of audio signals; and an audio signal sequence format conversion sub-module, configured to convert the time-domain audio signal sequences into frequency-domain audio signal sequences.

Optionally, the audio signal sequence format conversion sub-module includes: an audio signal sequence splicing unit, configured to splice, for each time-domain audio signal sequence, at least two frames of time-domain audio signals into one time-domain audio signal using a window-function-based overlap-save method; and an audio signal Fourier transform unit, configured to perform a Fourier transform on the time-domain audio signal to obtain audio signals at a plurality of sampling frequency points as the frequency-domain audio signal sequence.
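The splicing and transform units can be sketched as a small stateful class for one microphone. The text names a window-function-based overlap-save method without giving its details, so the Hann window, the splicing of exactly two consecutive frames (50% overlap) and NumPy's real FFT are assumptions, and the class name is hypothetical.

```python
import numpy as np


class OverlapFrameFFT:
    """Hypothetical splicing + transform unit for one microphone.

    Each call splices the previous frame and the current frame into one
    2N-sample block, applies a Hann window and returns its real FFT.
    """

    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.prev = np.zeros(frame_len)          # previous frame (zeros at start)
        self.window = np.hanning(2 * frame_len)  # window over the spliced block

    def process(self, frame):
        frame = np.asarray(frame, dtype=float)
        if frame.shape != (self.frame_len,):
            raise ValueError("frame must contain frame_len samples")
        block = np.concatenate([self.prev, frame])  # splice frame m-1 and frame m
        self.prev = frame.copy()                     # keep frame m for the next call
        return np.fft.rfft(self.window * block)      # 2N-point FFT -> N + 1 points


stft = OverlapFrameFFT(frame_len=64)
Y_frame = stft.process(np.zeros(64))  # 65 frequency points from a 128-point FFT
```

With 64-sample frames the spliced block has 128 samples, which matches the 128-point Fourier transform used in the frequency band example.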

Optionally, if the audio signal sequence is in the frequency-domain format, the wind noise detection value calculation module includes: an autocorrelation value calculation sub-module, configured to calculate an autocorrelation value for the audio signal of each frequency point in each audio signal sequence; a cross-correlation value calculation sub-module, configured to calculate, for each frequency point, the cross-correlation value between every two audio signals; a cross-correlation data calculation sub-module, configured to calculate, for each frequency point, cross-correlation data of at least two audio signals from the autocorrelation value and the cross-correlation value; and a wind noise detection value calculation sub-module, configured to calculate the wind noise detection value from the cross-correlation data of each frequency point.

Optionally, the apparatus further includes a smoothing processing module, configured to smooth the autocorrelation value and the cross-correlation value respectively before the cross-correlation data of at least two audio signals is calculated from them.

Optionally, the smoothing processing module is specifically configured to smooth the autocorrelation value or the cross-correlation value using the following formula:

$$\Phi_{smooth}(m,k) = \alpha\,\Phi_{smooth}(m-1,k) + (1-\alpha)\,\Phi(m,k)$$

Optionally, the cross-correlation data calculation sub-module includes a cross-correlation coefficient determining unit, configured to normalize, for each frequency point, the cross-correlation value using the autocorrelation values and the cross-correlation value to obtain the cross-correlation coefficient, which is used as the cross-correlation data.

Optionally, the cross-correlation coefficient determining unit is specifically configured to: normalization is performed according to the following formula to obtain the cross-correlation coefficient:

where C_y1y2(m,k) is the cross-correlation coefficient; m denotes the m-th frame of the audio signal collected by the microphone; k denotes the frequency point index of the audio signal sequence in the frequency domain format; there are two audio signal sequences, y1 denoting the first and y2 the second; Φ_y1y2(m,k) is the cross-correlation value, and Φ_y1y1(m,k) and Φ_y2y2(m,k) are the respective autocorrelation values.

Optionally, the wind noise detection value calculation sub-module includes a wind noise detection value determining unit, configured to calculate the mean of the cross-correlation coefficients within a set frequency band range from the cross-correlation data of each frequency point to obtain the wind noise detection value.

Optionally, the wind noise detection value determining unit is specifically configured to: calculating the average value of the cross correlation coefficients in the set frequency band range according to the following formula to obtain the wind noise detection value:

where ind1 is the lower limit of the set frequency band range, ind2 is the upper limit of the set frequency band range, and Γ(m) is the wind noise detection value; ‖·‖₂ denotes the two-norm; m denotes the m-th frame of the audio signal collected by the microphone; k denotes the frequency point index of the audio signal sequence in the frequency domain format; there are two audio signal sequences, y1 denoting the first and y2 the second.

Optionally, the wind noise detection result determining module includes: a cross-correlation data updating sub-module, configured to clip the cross-correlation data of each frequency point to the detection area range so as to update the cross-correlation data of each frequency point; a wind noise existence probability calculation sub-module, configured to calculate the wind noise existence probability of each frequency point from the proportional position of its updated cross-correlation data within the detection area range; and a wind noise existence probability statistics sub-module, configured to count the wind noise existence probabilities of the frequency points within the wind noise detection frequency range;

and the wind noise detection result determining submodule is used for determining the wind noise detection result according to the relation between the statistical result of the wind noise existence probability of each frequency point and the wind noise statistical threshold value.

Optionally, the cross-correlation data updating sub-module is specifically configured to update the cross-correlation data according to the following formula:

where the quantity obtained is the adjusted cross-correlation coefficient.

Optionally, the wind noise existence probability calculation sub-module is specifically configured to calculate the wind noise existence probability of each frequency point according to the following formula:

where P(H₁|Y(m,k)) is the wind noise existence probability.

Optionally, the apparatus further includes a wind noise existence probability binarization module, specifically configured to binarize the wind noise existence probability according to a wind noise existence probability threshold before the wind noise existence probabilities of the frequency points within the wind noise detection frequency range are counted.

Optionally, the wind noise existence probability binarization module is specifically configured to binarize the wind noise existence probability according to the following formula:

where P_thres is the wind noise existence probability threshold and ε(m,k) is the binarized wind noise existence probability.

Optionally, the wind noise existence probability statistics sub-module is specifically configured to count according to the following formula:

where N(m) is the statistical result and I(·) is the indicator function.

Optionally, the wind noise detection result determining sub-module is specifically configured to determine the wind noise detection result according to the following formula:

FIG. 8 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in FIG. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 802 or a computer program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the methods and processes described above, such as the wind noise detection method. For example, in some embodiments, the wind noise detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the wind noise detection method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the wind noise detection method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine as a stand-alone software package and partly on a remote machine, or entirely on a remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A wind noise detection method, comprising:

2. The method of claim 1, wherein adjusting the detection region range of the wind noise detection threshold value according to the wind noise existence possibility represented by the wind noise detection value comprises:

if the wind noise existence possibility represented by the wind noise detection value is smaller than a set likelihood threshold, adjusting the detection area range of the wind noise detection threshold value so as to increase the probability that the wind noise detection result is no wind noise.

3. The method of claim 2, wherein adjusting the detection region range of the wind noise detection threshold value according to the wind noise existence possibility represented by the wind noise detection value comprises:

adjusting the detection area range of the wind noise detection threshold value according to the following formula:

4. The method of claim 3, wherein the set likelihood threshold ranges from 0.4 to 0.5.

5. The method of claim 1, wherein the at least two microphones comprise a primary microphone and a secondary microphone, both disposed on a Bluetooth headset, the primary microphone being 3 to 4 centimeters from the secondary microphone and closer to the user's mouth than the secondary microphone.

6. The method of claim 1, wherein obtaining the sequence of audio signals respectively acquired by at least two microphones comprises:

acquiring audio signal sequences in a time domain format, which are respectively acquired by at least two microphones, wherein each audio signal sequence comprises at least two frames of audio signals;

and converting the audio signal sequence in the time domain format into the audio signal sequence in the frequency domain format.

7. The method of claim 6, wherein converting each time-domain formatted audio signal sequence to a frequency-domain formatted audio signal sequence comprises:

for each time-domain audio signal sequence, splicing the at least two frames of audio signals by a window-function-based overlap-save method to form a time-domain audio signal;

and performing a Fourier transform on the time-domain audio signal to obtain audio signals at a plurality of sampled frequency points, which serve as the frequency-domain audio signal sequence.
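Claims 6 and 7 can be illustrated with a small pure-Python sketch. It assumes 50% overlap (each analysis block is the previous frame followed by the current frame), a periodic Hann window, and zeros before the first frame; the text specifies only "an overlap-save method based on a window function", so these choices and the helper names are illustrative. A naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    # Periodic Hann window (assumed; the text only says "a window function")
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_half(x: Sequence[float]) -> List[complex]:
    # Naive O(N^2) DFT returning bins 0..N/2 of a real block (use an FFT in practice)
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def frames_to_spectra(frames: Sequence[Sequence[float]]) -> List[List[complex]]:
    # Returns Y(m, k): one list of frequency-point values per input frame m
    frame_len = len(frames[0])
    window = hann(2 * frame_len)
    prev: Sequence[float] = [0.0] * frame_len   # silence before the first frame
    spectra = []
    for cur in frames:
        block = list(prev) + list(cur)          # splice previous + current frame
        spectra.append(dft_half([w * s for w, s in zip(window, block)]))
        prev = cur
    return spectra


if __name__ == "__main__":
    y = frames_to_spectra([[1.0] * 4, [1.0] * 4])
    print(len(y), len(y[0]))       # 2 frames, 5 frequency points each
    print(round(y[1][0].real, 6))  # 4.0: DC bin of a windowed all-ones block
```

Each frame m yields N/2 + 1 frequency points for a block of N real samples, which is the index range of k used in the later claims under this sketch's assumptions.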

8. The method of any of claims 1-7, wherein the audio signal sequence is a frequency domain formatted audio signal sequence, and calculating cross-correlation data for at least two audio signal sequences and calculating a wind noise detection value based on the cross-correlation data comprises:

calculating an autocorrelation value for the audio signal of each frequency point in each audio signal sequence;

for each frequency point, calculating a cross-correlation value between every two audio signals;

for each frequency point, calculating cross-correlation data of at least two audio signals according to the autocorrelation value and the cross-correlation value;

and calculating a wind noise detection value according to the cross-correlation data of each frequency point.

9. The method of claim 8, wherein calculating cross-correlation data for at least two audio signals based on the autocorrelation values and cross-correlation values further comprises:

respectively smoothing the autocorrelation value and the cross-correlation value.

10. The method of claim 9, wherein smoothing the autocorrelation values and cross-correlation values, respectively, comprises:

smoothing the autocorrelation value or the cross-correlation value, respectively, using the following formula:

Φ_smooth(m, k) = αΦ_smooth(m-1, k) + (1-α)Φ(m, k)
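The recursion in claim 10 is a first-order (exponential) average over frames, applied independently at each frequency point. In the sketch below the instantaneous values are assumed to be Φ_y1y1(m, k) = |Y1(m, k)|², Φ_y2y2(m, k) = |Y2(m, k)|², and Φ_y1y2(m, k) = Y1(m, k)·conj(Y2(m, k)), and the recursion is assumed to start from zero; the function name and dictionary keys are hypothetical.

```python
from typing import Dict, List, Sequence


def smoothed_spectra(y1: Sequence[Sequence[complex]],
                     y2: Sequence[Sequence[complex]],
                     alpha: float) -> List[Dict[str, List[complex]]]:
    # For each frame m, returns smoothed Φ_y1y1 ("11"), Φ_y2y2 ("22"), Φ_y1y2 ("12")
    # at every bin k: Φ_smooth(m, k) = α·Φ_smooth(m-1, k) + (1-α)·Φ(m, k),
    # with Φ_smooth(-1, k) = 0 (assumed initial state)
    n_bins = len(y1[0])
    state = {"11": [0j] * n_bins, "22": [0j] * n_bins, "12": [0j] * n_bins}
    out = []
    for a_frame, b_frame in zip(y1, y2):
        inst = {
            "11": [a * a.conjugate() for a in a_frame],                   # auto, y1
            "22": [b * b.conjugate() for b in b_frame],                   # auto, y2
            "12": [a * b.conjugate() for a, b in zip(a_frame, b_frame)],  # cross
        }
        # Build a new dict each frame so earlier entries in `out` are not mutated
        state = {key: [alpha * s + (1 - alpha) * p
                       for s, p in zip(state[key], inst[key])]
                 for key in state}
        out.append(state)
    return out
```

A larger α gives a longer memory (roughly 1/(1-α) frames) and steadier estimates, at the cost of reacting more slowly to sudden gusts.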

11. The method of claim 8, wherein calculating cross-correlation data for at least two audio signals from the autocorrelation values and cross-correlation values for each frequency bin comprises:

for each frequency point, normalizing the cross-correlation value according to the autocorrelation values and the cross-correlation value to obtain a cross-correlation coefficient as the cross-correlation data.

12. The method of claim 11, wherein performing a normalization process on the cross-correlation value according to the auto-correlation value and the cross-correlation value to obtain a cross-correlation coefficient comprises:

normalization is performed according to the following formula to obtain the cross-correlation coefficient:

wherein C_y1y2(m, k) is the cross-correlation coefficient; m denotes the m-th frame of the audio signal collected by the microphone; k denotes the frequency point index of the frequency-domain audio signal sequence; there are two audio signal sequences, y1 denoting the first audio signal sequence and y2 denoting the second audio signal sequence; Φ_y1y2(m, k) is the cross-correlation value, and Φ_y1y1(m, k) and Φ_y2y2(m, k) are the respective autocorrelation values.
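The normalization formula of claim 12 is not reproduced in this text. A common form consistent with the symbols defined above, assumed here, is C_y1y2(m, k) = Φ_y1y2(m, k) / sqrt(Φ_y1y1(m, k)·Φ_y2y2(m, k)), computed from the smoothed values; the small `eps` guarding against division by zero is an addition of this sketch.

```python
import math
from typing import List, Sequence


def coherence(phi11: Sequence[float], phi22: Sequence[float],
              phi12: Sequence[complex], eps: float = 1e-12) -> List[complex]:
    # C_y1y2(m, k) = Φ_y1y2 / sqrt(Φ_y1y1 · Φ_y2y2) per frequency point (assumed form)
    return [p12 / math.sqrt(p11 * p22 + eps)
            for p11, p22, p12 in zip(phi11, phi22, phi12)]


if __name__ == "__main__":
    # Unsmoothed single-frame values for Y1 = 2, Y2 = 1
    print(round(abs(coherence([4.0], [1.0], [2.0])[0]), 6))  # 1.0
```

The example also shows why the smoothing of claim 10 matters: with unsmoothed single-frame values, |Φ_y1y2| = |Y1||Y2| = sqrt(Φ_y1y1·Φ_y2y2), so |C| is always 1 regardless of the signals. Only averaging across frames lets the random-phase contribution of wind turbulence pull |C| toward zero.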

13. The method of claim 11, wherein calculating a wind noise detection value from the cross-correlation data for each frequency bin comprises:

calculating, from the cross-correlation data of each frequency point, the mean value of the cross-correlation coefficients within a set frequency band range to obtain the wind noise detection value.

14. The method of claim 13, wherein calculating a mean value of cross-correlation coefficients within a set frequency band according to the cross-correlation data of each frequency point to obtain the wind noise detection value comprises:

calculating the average value of the cross correlation coefficients in the set frequency band range according to the following formula to obtain the wind noise detection value:

wherein ind1 is the lower limit of the set frequency band range, ind2 is the upper limit of the set frequency band range, and Γ(m) is the wind noise detection value; ‖·‖₂ denotes the two-norm; m denotes the m-th frame of the audio signal collected by the microphone; k denotes the frequency point index of the frequency-domain audio signal sequence; there are two audio signal sequences, y1 denoting the first audio signal sequence and y2 denoting the second audio signal sequence.
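The averaging of claim 14 can be sketched as below, assuming ind1 and ind2 are inclusive frequency point indices and that the two-norm of each complex coefficient is simply its magnitude. Because wind turbulence at two closely spaced microphones is largely uncorrelated, while acoustic sources such as speech stay strongly coherent at low frequencies, a low Γ(m) generally points toward wind noise. The function name is hypothetical.

```python
from typing import Sequence


def wind_detection_value(c: Sequence[complex], ind1: int, ind2: int) -> float:
    # Γ(m): mean of |C_y1y2(m, k)| over frequency points ind1..ind2 (inclusive)
    band = c[ind1:ind2 + 1]
    return sum(abs(x) for x in band) / len(band)


if __name__ == "__main__":
    c = [0.0, 1.0, 0.5j, 0.3, 9.0]                  # C_y1y2(m, k) for k = 0..4
    print(round(wind_detection_value(c, 1, 3), 6))  # 0.6 = (1.0 + 0.5 + 0.3) / 3
```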

15. The method of claim 13, wherein determining a wind noise detection result according to the relationship between the cross-correlation data and the detection region range comprises:

extracting, from the cross-correlation data of each frequency point, the values falling within the detection area range to update the cross-correlation data of each frequency point;

calculating the wind noise existence probability of each frequency point according to the proportion value of the updated cross-correlation data of each frequency point within the detection area range;

counting the wind noise existence probability of each frequency point in the wind noise detection frequency range;

and determining the wind noise detection result according to the relation between the statistical result of the wind noise existence probability of each frequency point and the wind noise statistical threshold value.

16. The method of claim 15, wherein extracting, from the cross-correlation data of each frequency point, the values falling within the detection region range to update the cross-correlation data of each frequency point comprises:

updating the cross-correlation data of each frequency point according to the following formula:

is the adjusted mutual coherence coefficient.

17. The method of claim 15, wherein calculating the wind noise existence probability of each frequency point according to the proportion value of the updated cross-correlation data of each frequency point within the detection area range comprises:

calculating the wind noise existence probability that wind noise exists at each frequency point according to the following formula:

wherein P(H₁ | Y(m, k)) is the wind noise existence probability.
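The formulas of claims 16 and 17 are not reproduced in this text. One reading, assumed here, is that "intercepting" clips each coefficient magnitude into the detection region [c_low, c_high], and that the probability is the clipped value's proportional position within that region, oriented so that low coherence (typical of wind) gives a high probability. The linear mapping, the use of |C|, and the names `c_low`, `c_high`, and `wind_probability` are illustrative assumptions.

```python
from typing import List, Sequence


def wind_probability(c: Sequence[complex], c_low: float, c_high: float) -> List[float]:
    # Hypothetical P(H1 | Y(m, k)) per frequency point:
    # 1) clip |C(m, k)| into the detection region [c_low, c_high];
    # 2) map linearly so c_low -> 1.0 (wind likely) and c_high -> 0.0 (wind unlikely)
    probs = []
    for x in c:
        clipped = min(max(abs(x), c_low), c_high)
        probs.append((c_high - clipped) / (c_high - c_low))
    return probs


if __name__ == "__main__":
    print([round(p, 6) for p in wind_probability([0.1, 0.5, 0.95], 0.2, 0.8)])
    # [1.0, 0.5, 0.0]
```

Under this reading, the range adjustment of claims 2 and 3 would act through c_low and c_high: lowering the region reduces the probability assigned to a given coherence (|C| = 0.5 maps to 0.5 in [0.2, 0.8] but to 0.2 in [0.1, 0.6]), which biases the frame toward a no-wind-noise result.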

18. The method of claim 15, wherein before counting the wind noise existence probability of each frequency point in the wind noise detection frequency range, the method further comprises:

binarizing the wind noise existence probability according to a wind noise existence probability threshold.

19. The method of claim 18, wherein the binarizing the wind noise existence probability according to a wind noise existence probability threshold value comprises:

binarizing the wind noise existence probability according to the following formula:

20. The method of claim 19, wherein the counting the wind noise existence probability of each frequency point in the wind noise detection frequency range comprises:

performing the statistics according to the following formula:

where n(m) is the statistical result and I(·) is the indicator function.

21. The method according to claim 15, wherein determining the wind noise detection result according to the relationship between the statistical result of the wind noise existence probability of each frequency point and the wind noise statistical threshold value comprises:

determining the wind noise detection result according to the following formula:

22. A wind noise detection apparatus comprising:

23. The apparatus of claim 22, wherein the detection region range adjustment module is specifically configured to:

24. The apparatus of claim 22, wherein the audio signal sequence acquisition module comprises:

the time domain format audio signal sequence acquisition sub-module is used for acquiring audio signal sequences in a time domain format, which are respectively acquired by at least two microphones, and each audio signal sequence comprises at least two frames of audio signals;

and the audio signal sequence format conversion sub-module is used for converting the audio signal sequences in the time domain formats into the audio signal sequences in the frequency domain formats.

25. The apparatus according to any one of claims 22-24, wherein the audio signal sequence is a frequency domain audio signal sequence, and the wind noise detection value calculating module comprises:

the autocorrelation value calculation submodule is configured to calculate an autocorrelation value for the audio signal of each frequency point in each audio signal sequence;

the cross-correlation value calculation submodule is configured to calculate, for each frequency point, the cross-correlation value between every two audio signals;

the cross-correlation data calculation submodule is configured to calculate, for each frequency point, the cross-correlation data of the at least two audio signals according to the autocorrelation values and the cross-correlation value;

and the wind noise detection value calculation submodule is used for calculating the wind noise detection value according to the cross-correlation data of each frequency point.

26. An electronic device, comprising:

at least one processor; and

a memory storing instructions executable by the at least one processor, the instructions enabling the at least one processor to perform the wind noise detection method of any of claims 1-21.

27. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the wind noise detection method according to any one of claims 1-21.

28. A computer program product comprising a computer program which, when executed by a processor, implements a wind noise detection method according to any of claims 1-21.