CN114697786A - Wind noise suppression mode determination method, device, terminal and storage medium - Google Patents

Wind noise suppression mode determination method, device, terminal and storage medium

Info

Publication number
CN114697786A
Authority
CN
China
Prior art keywords
ith
wind noise
noise
processing
noise suppression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011582063.4A
Other languages
Chinese (zh)
Inventor
练添富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202011582063.4A priority Critical patent/CN114697786A/en
Publication of CN114697786A publication Critical patent/CN114697786A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1083 Reduction of ambient noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/08 Mouthpieces; Microphones; Attachments therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2410/00 Microphones
    • H04R2410/07 Mechanical or electrical reduction of wind noise generated by wind passing a microphone

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiment of the application provides a method, a device, a terminal and a storage medium for determining a wind noise suppression mode, and relates to the technical field of wind noise processing. The method comprises the following steps: collecting environmental noise through a feedforward microphone of the earphone; performing feature extraction processing on the environmental noise to obtain an audio feature vector of the environmental noise; processing the audio feature vector of the environmental noise through a noise identification model to obtain an identification result, wherein the identification result is used for indicating whether the environmental noise contains wind noise; and determining a wind noise suppression mode of the earphone based on the identification result. Since no dedicated wind-speed and wind-direction sensor is required, the volume (size) of the earphone is reduced.

Description

Wind noise suppression mode determination method, device, terminal and storage medium
Technical Field
The embodiment of the application relates to the technical field of wind noise processing, in particular to a method, a device, a terminal and a storage medium for determining a wind noise suppression mode.
Background
When wind blows across the pickup surface of the earphone microphone, obvious wind noise is generated in a headwind or under a strong airflow, resulting in a poor user experience.
In the related art, the wind speed and the wind direction of a wind noise signal acting on a microphone are monitored by a wind direction and wind speed sensor, and a target wind noise prevention parameter matching the monitored wind speed and wind direction is acquired from at least one wind noise prevention parameter, each wind noise prevention parameter including at least one of: high-pass filtering cut-off frequency and signal gain; and then, wind noise prevention processing is carried out on the signals collected by the microphone according to the target wind noise prevention parameters.
Disclosure of Invention
The embodiment of the application provides a method, a device, a terminal and a storage medium for determining a wind noise suppression mode. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for determining a wind noise suppression manner, where the method includes:
collecting ambient noise through a feedforward microphone of the headset;
carrying out feature extraction processing on the environmental noise to obtain an audio feature vector of the environmental noise;
processing the audio characteristic vector of the environmental noise through a noise identification model to obtain an identification result, wherein the identification result is used for indicating whether the environmental noise contains wind noise;
and determining a wind noise suppression mode of the earphone based on the identification result.
On the other hand, an embodiment of the present application provides a wind noise suppression mode determination device, where the device includes:
the noise acquisition module is used for acquiring environmental noise through a feedforward microphone of the earphone;
the characteristic extraction module is used for carrying out characteristic extraction processing on the environmental noise to obtain an audio characteristic vector of the environmental noise;
the characteristic identification module is used for processing the audio characteristic vector of the environmental noise through a noise identification model to obtain an identification result, and the identification result is used for indicating whether the environmental noise contains wind noise;
and the mode determining module is used for determining the wind noise suppression mode of the earphone based on the identification result.
In another aspect, an embodiment of the present application provides a terminal, where the terminal includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the wind noise suppression mode determination method according to the above aspect.
In still another aspect, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, the computer program being loaded and executed by a processor to implement the wind noise suppression manner determination method according to the above aspect.
In yet another aspect, embodiments of the present application provide a computer program product including computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the wind noise suppression manner determination method according to the above aspect.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
whether wind noise exists is judged based on the audio feature vector extracted from the environmental noise, and the wind noise suppression mode of the earphone is determined based on the recognition result.
Drawings
Fig. 1 is a schematic diagram of a headset provided by an embodiment of the present application;
fig. 2 is a block diagram of a hardware system of a headset according to an embodiment of the present application;
fig. 3 is a flowchart of a wind noise suppression mode determination method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a DNN model provided by one embodiment of the present application;
fig. 5 is a flowchart of a wind noise suppression mode determination method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of the frequency response of a low pass filter provided by one embodiment of the present application;
fig. 7 is a flowchart of a wind noise suppression mode determination method according to another embodiment of the present application;
fig. 8 is a block diagram of a wind noise suppression mode determination apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiment of the application can be applied to earphones. Illustratively, referring to fig. 1 and 2 in combination, the headset includes a speaker (also referred to as a speaker device) 101, a feedforward microphone 102 (also referred to as an external microphone), a feedback microphone 103 (also referred to as an internal microphone), an audio signal processing chip circuit (not shown in the figure), a memory circuit 104, a power supply circuit 105, and the like.
The speaker 101 plays the inverted noise and music signal. The feedforward microphone 102 is used to collect ambient noise, which is used for wind noise suppression processing. The feedback microphone 103 is used to collect internal sounds of the headset. The audio signal processing chip circuit is used for executing the wind noise suppression mode determination process. The memory circuit 104 is used to store algorithm codes, audio signal data, and the like. The power supply circuit 105 may supply power to other hardware components, and the power supply source may be a battery built in the headset.
It should be noted that the structural illustration of the above earphone is only an example, and in a possible implementation manner, the embodiment of the present application may also be applied to other earphones with structures different from that of fig. 1, and the embodiment of the present application does not limit this.
Exemplarily, taking the earphone as the execution subject of the embodiment of the present application as an example, as shown in fig. 2, the feedforward microphone 102 of the earphone collects environmental noise, the collected environmental noise is converted into a digital signal by the analog-to-digital converter 106, an audio feature vector is extracted and input into the trained noise recognition model to obtain a recognition result; if wind noise exists, the microphone gain is adjusted based on the intensity of the wind noise; if no wind noise exists, the microphone gain is not adjusted. The microphone gain indicates the volume of the microphone: the larger the microphone gain, the higher the volume of the microphone; the smaller the microphone gain, the lower the volume of the microphone.
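Purely as an illustration of the flow above (not the claimed implementation), the following Python sketch shows the per-frame decision loop; extract_features, noise_model, gain_for_energy and set_microphone_gain are hypothetical placeholders standing in for the feature extraction, the trained noise recognition model, the energy-to-gain mapping and the gain control described in this application.

```python
import numpy as np

def handle_frame(pcm_frame, extract_features, noise_model, gain_for_energy, set_microphone_gain):
    """One pass of the flow of fig. 2 for a single frame of feedforward-microphone audio.

    extract_features    -- feature extraction, returns (audio feature vector, energy spectrum)
    noise_model         -- trained noise recognition model exposing a predict() method
    gain_for_energy     -- mapping from low-frequency energy to a microphone gain
    set_microphone_gain -- applies the chosen gain to the feedforward microphone
    """
    feature_vector, energy_spectrum = extract_features(pcm_frame)
    wind_noise = noise_model.predict(feature_vector[np.newaxis, :])[0] == 1
    if wind_noise:
        # Wind noise detected: adjust the microphone gain according to its intensity.
        low_freq_energy = float(np.sum(energy_spectrum))
        set_microphone_gain(gain_for_energy(low_freq_energy))
    # No wind noise detected: the microphone gain is left unchanged.
```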
For example, when the embodiment of the present application is executed by a headset, it may execute the wind noise suppression mode determination procedure by an audio signal processing chip circuit inside the headset.
For example, for convenience of description, the execution subject of the embodiment of the present application will be described as an example of a terminal, which may be an earphone, a mobile phone, a smart wearable device, or the like.
Please refer to fig. 3, which illustrates a flowchart of a wind noise suppression mode determining method according to an embodiment of the present application. The execution subject of the method may be a terminal. The method may include several steps as follows.
In step 301, ambient noise is collected via a feedforward microphone of the headset.
Illustratively, the feedforward microphone of the headset captures ambient noise in real time.
In a possible implementation manner, when the terminal is an earphone, after a feedforward microphone of the earphone collects environmental noise, the environmental noise is sent to an audio signal processing chip circuit in the earphone, and the audio signal processing chip circuit executes the following wind noise suppression mode determination process.
In a possible implementation manner, when the terminal is other electronic equipment than the headset, the terminal establishes a communication connection with the headset, and the terminal and the headset may communicate, for example, the terminal and the headset may communicate through a wired network, or the terminal and the headset may communicate through a wireless network. After the feedforward microphone of the earphone collects the environmental noise, the environmental noise is sent to the terminal, and the following wind noise suppression mode determining process is executed by the terminal. Illustratively, the ambient noise picked up by the feedforward microphone is in the form of audio data carrying the ambient noise picked up by the feedforward microphone. The headset transmits the audio data carrying the ambient noise to the terminal so that the terminal can perform the following wind noise suppression manner determination procedure.
For example, the headset in the embodiment of the present application may be a wired headset or a wireless headset; the earphone in the embodiment of the present application may be an ANC earphone or an ordinary earphone, which is not limited in the embodiment of the present application.
Step 302, performing feature extraction processing on the environmental noise to obtain an audio feature vector of the environmental noise.
After the terminal acquires the environmental noise, the noise identification processing needs to be performed through the noise identification model, and the noise identification model cannot directly identify raw environmental noise; therefore, the terminal processes the environmental noise in advance to obtain digital features that the noise identification model can identify. Exemplarily, the terminal performs feature extraction processing on the environmental noise to obtain an audio feature vector of the environmental noise. Illustratively, the audio feature vector of the environmental noise may be an MFCC (Mel-Frequency Cepstral Coefficients) vector of the environmental noise, i.e., cepstral parameters extracted in the Mel-scale frequency domain, which describes the non-linear characteristic of how the human ear perceives frequency. For the processing flow of feature extraction for environmental noise, reference may be made to the following embodiments, which will not be described herein.
And 303, processing the audio characteristic vector of the environmental noise through the noise identification model to obtain an identification result.
In the embodiment of the present application, the identification result is used to indicate whether the ambient noise contains wind noise.
In a possible implementation, the noise identification model may employ a DNN (Deep Neural Network) model. As shown in fig. 4, the DNN model consists of an input layer 410, three hidden layers 420 and an output layer 430. The input layer 410 is the first layer and has 8 neurons; the three hidden layers 420 include a first hidden layer, a second hidden layer and a third hidden layer, each having 9 neurons; the output layer 430 has 4 neurons.
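As a sketch only (the application does not specify weights or activation functions), the 8-9-9-9-4 structure of fig. 4 could be expressed as a plain NumPy forward pass; the ReLU and softmax choices below are assumptions made for illustration.

```python
import numpy as np

LAYER_SIZES = [8, 9, 9, 9, 4]  # input layer, three hidden layers, output layer (per fig. 4)

def init_weights(rng=np.random.default_rng(0)):
    # One (weight, bias) pair per connection between consecutive layers.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def forward(x, params):
    """x: audio feature vector of length 8 -> probabilities over the 4 output neurons."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)   # assumed ReLU activation in the hidden layers
    w, b = params[-1]
    logits = h @ w + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()               # assumed softmax over the output layer
```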
In a possible implementation, the training process of the noise recognition model may include the following steps: 1. acquiring training data for training the noise identification model, wherein the training data comprises at least one training audio feature vector and a standard identification result of the training audio feature vector, and the standard identification result is the standard result of whether the environmental noise corresponding to the training audio feature vector contains wind noise; 2. processing the training audio feature vector through the noise identification model to obtain a predicted identification result, which is the predicted result of whether the environmental noise corresponding to the training audio feature vector contains wind noise; 3. training the noise recognition model based on the predicted identification result and the standard identification result to obtain the trained noise recognition model. Exemplarily, a loss function of the noise recognition model is determined based on the predicted identification result and the standard identification result, and the noise recognition model is trained based on the loss function. Illustratively, the at least one training audio feature vector corresponds to different intensity levels of the wind noise contained in the environmental noise. By performing forward computation and back propagation on the training audio feature vectors, the weights of the neurons in each layer of the noise recognition model are continuously trained and updated, so that a noise recognition model capable of recognizing wind noise is obtained.
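Purely as an illustration of this supervised training step (not the claimed implementation), a compact equivalent is scikit-learn's MLPClassifier with the same 9-9-9 hidden layout; X_train and y_train below are placeholders for the training audio feature vectors and their standard identification results, and the binary labels are a simplification of the four-neuron output of fig. 4, which could for example encode several wind-noise intensity levels.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder training data: each row is a training audio feature vector,
# each label is the standard identification result (1 = wind noise, 0 = no wind noise).
X_train = np.random.rand(1000, 8)
y_train = np.random.randint(0, 2, size=1000)

noise_model = MLPClassifier(hidden_layer_sizes=(9, 9, 9),  # three hidden layers of 9 neurons
                            activation="relu",
                            max_iter=500)
noise_model.fit(X_train, y_train)               # forward computation + back propagation updates

recognition = noise_model.predict(X_train[:1])  # 1 indicates wind noise in that frame
```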
And step 304, determining a wind noise suppression mode of the earphone based on the identification result.
In a possible implementation manner, when the identification result indicates that the environmental noise does not contain wind noise, not performing wind noise suppression processing on the earphone is determined as the wind noise suppression manner of the earphone.
In a possible implementation manner, when the identification result indicates that the environmental noise contains wind noise, the wind noise suppression manner of the earphone is determined based on the intensity of the wind noise. For an explanation of how to determine the wind noise suppression mode of the earphone, reference may be made to the following embodiments, which are not described herein.
In a possible implementation manner, when the terminal is an earphone, the earphone may perform wind noise suppression processing on the environmental noise collected by the feedforward microphone based on the determined wind noise suppression manner.
In a possible implementation manner, when the terminal is other electronic equipment that is not an earphone, the terminal determines a wind noise suppression manner of the earphone and then sends the wind noise suppression manner to the earphone, so that the earphone can perform wind noise suppression processing on the environmental noise collected by the feedforward microphone based on the wind noise suppression manner.
In summary, in the technical scheme provided by the embodiment of the present application, it is determined whether wind noise exists or not by using the audio feature vector based on the environmental noise, and the wind noise suppression mode of the earphone is determined based on the recognition result.
In addition, the embodiment of the application detects whether wind noise exists and then determines the wind noise suppression mode, so that the feedforward wind noise suppression effect is not influenced under the condition of no wind noise.
In addition, the wind noise detection is realized through a DNN model algorithm, and the hardware cost is reduced. And the algorithm operand of the noise identification model is small, the power consumption is low, and the endurance time of the terminal can be prolonged.
In an exemplary embodiment, as shown in fig. 5, the terminal obtains the audio feature vector of the ambient noise by:
step 501, performing framing processing on audio data corresponding to the environmental noise to obtain an ith audio frame, where i is a positive integer.
The feedforward microphone collects the environmental noise in real time. The audio data as a whole is not stationary, but a short segment of it can be regarded as stationary, and the noise identification model can only identify stationary data. Therefore, the terminal first performs framing processing on the environmental noise to obtain the ith audio frame.
In a possible implementation manner, before the terminal performs framing processing on the audio data corresponding to the environmental noise, the audio data is first subjected to analog-to-digital conversion processing to obtain a digital signal; then the audio data after the analog-to-digital conversion is subjected to pre-emphasis processing. The pre-emphasis processing uses a high-pass filter, which only allows signal components above a certain frequency to pass and suppresses components below that frequency, so that unnecessary low-frequency interference in the audio data, such as human speech, footsteps and mechanical noise, is removed and the spectrum of the audio signal is flattened. The mathematical expression of the high-pass filter is:
H(z) = 1 - a·z^(-1);
where a is the pre-emphasis (correction) coefficient, generally ranging from 0.95 to 0.97, and z^(-1) denotes a delay of one sample of the audio signal, i.e., in the time domain y(n) = x(n) - a·x(n-1).
And the terminal performs frame division processing on the pre-emphasized audio data to obtain the ith audio frame.
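As a minimal sketch of the pre-emphasis and framing described above (the coefficient a = 0.97, the frame length and the hop size are example values, not values fixed by this application):

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    # y(n) = x(n) - a * x(n-1), i.e. the high-pass filter H(z) = 1 - a*z^(-1)
    return np.append(x[0], x[1:] - a * x[:-1])

def framing(x, frame_len=256, hop=128):
    # Split the pre-emphasized signal into overlapping frames; frames[i] is the i-th audio frame.
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
```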
Step 502, performing windowing processing on the ith audio frame to obtain an ith audio frame after windowing processing.
The framed audio data will be Fourier transformed during subsequent feature extraction. A single frame of audio data has no obvious periodicity, i.e., its left and right ends are discontinuous, so the Fourier transform introduces an error relative to the original data, and the more frames there are, the larger the accumulated error. To make the framed audio data continuous and let each frame exhibit the characteristics of a periodic function, windowing processing is required.
In a possible implementation, the audio frame is windowed using a rectangular window. Each frame of data is multiplied by the window function to increase the continuity of the left and right ends of the frame, resulting in audio data with more obvious periodicity. The functional form of the rectangular window w(n) is:
w(n) = 1, for 0 ≤ n ≤ N-1; w(n) = 0, otherwise;
where n is the sample index within the frame, N is the window length (i.e., the frame length), and N is equal to the number of points of the Fourier transform.
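Applying the window of the formula above is a single element-wise multiplication per frame, sketched below; the rectangular window is the default, and a tapered window such as a Hamming window would be substituted in the same place.

```python
import numpy as np

def window_frames(frames, window=None):
    # frames: (n_frames, N) array; window: length-N window function, rectangular by default.
    N = frames.shape[1]
    w = np.ones(N) if window is None else window   # rectangular window: w(n) = 1 for 0 <= n <= N-1
    return frames * w
```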
Step 503, performing feature extraction on the ith windowed audio frame to obtain an ith audio feature vector of the environmental noise.
After the audio data is subjected to frame windowing, feature extraction is required to be performed, and audio feature vectors which can be identified by a noise identification model are obtained.
In a possible implementation, this step comprises several sub-steps as follows:
1. and carrying out short-time Fourier transform processing on the ith windowed audio frame to obtain an ith first signal.
Short-Time Fourier Transform (STFT) is a Fourier-related transform used to determine the frequency and phase content of local sections of a time-varying signal. In a possible implementation, the expression for performing a short-time Fourier transform on the ith windowed audio frame is as follows:
X(i, k) = Σ_{n=0}^{N-1} x_i(n)·e^(-jωn), ω = 2πk/N, k = 0, 1, …, N-1;
where n is the discrete time index, ω is the angular frequency, N is the number of points of the Fourier transform, and x_i(n) is the ith windowed audio frame.
2. And carrying out low-pass filtering processing on the ith first signal based on a preset low-pass cut-off frequency to obtain an ith second signal.
The preset low-pass cut-off frequency can be adjusted according to actual conditions. Illustratively, the ith first signal is subjected to low-pass filtering processing through a low-pass filter based on the preset low-pass cut-off frequency to obtain the ith second signal. The frequency response of the low-pass filter is shown in fig. 6. Since the main energy of wind noise is concentrated in the low frequency band, the wind noise energy is calculated mainly from the low-frequency part of the signal.
3. And performing fast Fourier transform processing on the ith second signal to obtain an ith energy spectrum.
Since signal characteristics are difficult to observe from the audio signal in the time domain, the time-domain signal is usually converted into an energy distribution in the frequency domain, where different energy distributions represent the characteristics of different sounds. Therefore, the terminal performs a fast Fourier transform on the second signal to obtain its frequency spectrum, and then takes the squared magnitude of the spectrum to obtain the power spectrum, i.e., the energy spectrum, of the signal.
4. And processing the ith energy spectrum through a Mel filter bank to obtain the ith logarithmic energy.
In order to convert the energy spectrum into a Mel spectrum that conforms to the auditory perception of the human ear, the energy spectrum needs to be filtered. Illustratively, the energy spectrum is processed by a Mel filter bank, and the frequency response H_m(k) of the mth triangular filter in the bank can be expressed as follows:
H_m(k) = 0, for k < f(m-1);
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) ≤ k ≤ f(m);
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) < k ≤ f(m+1);
H_m(k) = 0, for k > f(m+1);
where f(m) is the center frequency (expressed as a Fourier-transform bin index) of the mth filter, obtained by spacing the filters uniformly on the Mel scale Mel(f) = 2595·lg(1 + f/700), 1 ≤ m ≤ M, M represents the total number of filters in the Mel filter bank, k represents the Fourier-transform bin index, and M is typically 22 to 26.
Illustratively, each logarithmic energy E(m) may be determined by the following equation:
E(m) = ln( Σ_{k=0}^{N-1} |X(k)|²·H_m(k) ), 1 ≤ m ≤ M;
where |X(k)|² represents the energy spectrum.
5. And carrying out discrete cosine transform processing on the ith logarithmic energy to obtain an ith audio feature vector.
The terminal performs Discrete Cosine Transform (DCT) processing on the logarithmic energies, and the obtained DCT coefficients form the audio feature vector. Illustratively, the logarithmic energies may be subjected to discrete cosine transform processing by the following formula to obtain the audio feature vector F(n):
F(n) = Σ_{m=1}^{M} E(m)·cos( πn(m - 0.5) / M ), n = 1, 2, …, L;
where L is the MFCC order, usually 12 to 16; M represents the total number of filters in the Mel filter bank; and n is the index of the cepstral coefficient.
The audio feature vector in the embodiment of the present application is an MFCC vector, and in other possible implementations, the audio feature vector may also be determined in other ways, and the embodiment of the present application does not limit the determination way of the audio feature vector.
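Purely as an illustration of sub-steps 1 to 5 above (a sketch under assumed parameter values, not the claimed implementation), the following NumPy code strings together the short-time Fourier transform, the low-pass filtering, the energy spectrum, the Mel filter bank with logarithmic energies and the discrete cosine transform; the sample rate, the 300 Hz cut-off frequency, M = 24 filters and L = 13 coefficients are example choices, and zeroing the FFT bins above the cut-off is only one simple way to realize the low-pass filter.

```python
import numpy as np

def mel_filterbank(n_filters=24, n_fft=256, sample_rate=16000):
    """Triangular Mel filters H_m(k), m = 1..M, defined on the rfft bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = np.floor((n_fft + 1) * inv_mel(np.linspace(mel(0.0), mel(sample_rate / 2.0),
                                                     n_filters + 2)) / sample_rate).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = pts[m - 1], pts[m], pts[m + 1]
        H[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        H[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    return H

def mfcc_features(windowed_frame, sample_rate=16000, cutoff_hz=300, n_fft=256, L=13):
    """Sub-steps 1-5 for one windowed frame; returns (MFCC vector F(1..L), energy spectrum)."""
    # 1. Short-time Fourier transform of the i-th windowed frame (first signal).
    spectrum = np.fft.rfft(windowed_frame, n=n_fft)
    # 2. Low-pass filtering at the preset cut-off frequency (second signal).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    spectrum = np.where(freqs <= cutoff_hz, spectrum, 0.0)
    # 3. Squared magnitude of the spectrum gives the energy (power) spectrum |X(k)|^2.
    energy_spectrum = np.abs(spectrum) ** 2
    # 4. Mel filter bank and logarithmic energies E(m) = ln(sum_k |X(k)|^2 * H_m(k)).
    E = np.log(mel_filterbank(n_fft=n_fft, sample_rate=sample_rate) @ energy_spectrum + 1e-10)
    # 5. Discrete cosine transform F(n) = sum_m E(m) * cos(pi*n*(m-0.5)/M), n = 1..L.
    M = len(E)
    m = np.arange(1, M + 1)
    mfcc = np.array([np.sum(E * np.cos(np.pi * n * (m - 0.5) / M)) for n in range(1, L + 1)])
    return mfcc, energy_spectrum
```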
In an exemplary embodiment, after the terminal obtains the ith audio feature vector of the ambient noise, the terminal may determine a wind noise suppression manner of the headset based on the ith audio feature vector. Illustratively, as shown in fig. 7, the terminal may determine the wind noise suppression mode of the headset by:
step 701, processing the ith audio feature vector through a noise identification model to obtain an identification result of the ith audio feature vector, wherein the identification result of the ith audio feature vector is used for indicating whether wind noise exists in the ith audio frame.
In a possible implementation manner, each time the terminal acquires an audio feature vector, the audio feature vector is input into a noise identification model, and the noise identification model processes the audio feature vector to obtain an identification result of the audio feature vector. The recognition result of the audio feature vector may be used to indicate that wind noise exists in the audio frame corresponding to the audio feature vector, or the recognition result of the audio feature vector may be used to indicate that wind noise does not exist in the audio frame corresponding to the audio feature vector.
In step 702, it is determined whether the identification result indicates that wind noise exists in the ith audio frame. If the identification result indicates that wind noise exists in the ith audio frame, step 703 is performed; if the identification result indicates that no wind noise exists in the ith audio frame, step 704 is performed.
And 703, determining that wind noise suppression processing is performed on the earphone based on the ith energy spectrum as the wind noise suppression mode.
When the recognition result of the audio feature vector is used for indicating that wind noise exists in the audio frame corresponding to the audio feature vector, it indicates that the earphone needs to perform wind noise suppression processing. In this case, the terminal may determine that the wind noise suppression processing is performed on the earphone based on the energy spectrum corresponding to the audio feature vector as a wind noise suppression method.
In a possible implementation, this step comprises several sub-steps as follows:
firstly, based on the ith energy spectrum, determining low-frequency energy corresponding to the ith audio frame.
In a possible implementation manner, the low-frequency energy Eng(i) corresponding to the ith audio frame may be determined by the following formula:
Eng(i) = Σ_{k=0}^{N-1} |X(i, k)|²;
where |X(i, k)|² represents the ith energy spectrum and N is the number of points of the Fourier transform.
Secondly, in response to the low-frequency energy corresponding to the ith audio frame being greater than or equal to the wind noise energy threshold, it is determined that performing wind noise suppression processing on the earphone based on the low-frequency energy corresponding to the ith audio frame is the wind noise suppression mode.
In a possible implementation manner, after obtaining the low-frequency energy corresponding to the ith audio frame, smoothing processing is performed on the low-frequency energy to obtain the smoothed low-frequency energy. Illustratively, the low-frequency energy may be smoothed by the following formula to obtain the smoothed low-frequency energy Eng_smooth(i):
Eng_smooth(i) = α·Eng_smooth(i-1) + (1-α)·Eng(i);
where 0 < α < 1, Eng(i) is the low-frequency energy corresponding to the ith audio frame, and Eng_smooth(i-1) is the smoothed low-frequency energy corresponding to the (i-1)th audio frame.
It should be noted that when i is 1, smoothing is not needed, that is, the low-frequency energy corresponding to the 1 st audio frame does not need smoothing, or it can be said that the low-frequency energy after smoothing corresponding to the 1 st audio frame is itself.
Illustratively, the smoothed low frequency energy corresponding to the ith audio frame is compared to a wind noise energy threshold. And in response to the fact that the smoothed low-frequency energy corresponding to the ith audio frame is larger than or equal to the wind noise energy threshold value, determining that the earphone is subjected to wind noise suppression processing based on the smoothed low-frequency energy corresponding to the ith audio frame as a wind noise suppression mode.
Exemplarily, a target low-frequency energy range to which the low-frequency energy corresponding to the ith audio frame belongs is determined from the stored correspondence between at least one low-frequency energy range and microphone gain; the microphone gain corresponding to the target low-frequency energy range in the correspondence is determined as the target microphone gain; and it is determined that performing wind noise suppression processing on the earphone based on the target microphone gain is the wind noise suppression mode.
For example, the terminal stores a correspondence between at least one low-frequency energy range and microphone gain. The terminal may determine the target low-frequency energy range to which the current low-frequency energy belongs, then determine the microphone gain corresponding to that low-frequency energy range, and use this microphone gain as the microphone gain for the final wind noise suppression processing.
And thirdly, in response to the fact that the low-frequency energy corresponding to the ith audio frame is smaller than a wind noise energy threshold value, determining that the earphone is not subjected to wind noise suppression processing as a wind noise suppression mode.
When the low-frequency energy corresponding to the ith audio frame is smaller than the wind noise energy threshold, the wind noise is not very large, and the earphone does not need to perform wind noise suppression processing on the wind noise.
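Combining the first to third points above into a single sketch (the wind noise energy threshold, the smoothing factor α and the correspondence between low-frequency energy ranges and microphone gains are assumed example values, since the application leaves their concrete values open):

```python
import numpy as np

# Assumed example correspondence between low-frequency energy ranges and microphone gains (dB).
GAIN_TABLE = [((0.0, 1.0), 0), ((1.0, 5.0), -3), ((5.0, float("inf")), -6)]
WIND_ENERGY_THRESHOLD = 1.0   # assumed wind noise energy threshold
ALPHA = 0.9                   # smoothing factor, 0 < alpha < 1

def decide_gain(energy_spectrum, prev_smoothed=None):
    """Returns (target microphone gain or None, smoothed low-frequency energy Eng_smooth(i))."""
    eng = float(np.sum(energy_spectrum))            # Eng(i): low-frequency energy of frame i
    smoothed = eng if prev_smoothed is None else ALPHA * prev_smoothed + (1 - ALPHA) * eng
    if smoothed < WIND_ENERGY_THRESHOLD:
        return None, smoothed                       # third case: no wind noise suppression needed
    for (low, high), gain in GAIN_TABLE:            # second case: look up the target gain
        if low <= smoothed < high:
            return gain, smoothed
    return GAIN_TABLE[-1][1], smoothed
```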
Step 704, determining that the earphone is not subjected to the wind noise suppression processing as a wind noise suppression mode.
And when the identification result is used for indicating that the ith audio frame does not comprise the wind noise, the earphone does not need the wind noise suppression processing, so that the terminal determines that the earphone does not carry out the wind noise suppression processing as a wind noise suppression mode. When the earphone does not need wind noise suppression processing, the microphone gain of the earphone does not need to be changed, and the earphone can carry out normal active noise reduction.
In summary, in the technical solution provided in the embodiment of the present application, the microphone gain is adjusted based on the intensity of the wind noise, so as to perform wind noise suppression processing; and when the wind noise disappears or the wind noise energy is small, the optimal feedforward noise reduction effect is recovered.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 8, a block diagram of a wind noise suppression manner determining apparatus provided in an embodiment of the present application is shown, where the apparatus has a function of implementing the above method example, and the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus 800 may include: a noise acquisition module 810, a feature extraction module 820, a feature identification module 830, and a manner determination module 840.
A noise collection module 810 for collecting ambient noise through a feedforward microphone of the headset;
a feature extraction module 820, configured to perform feature extraction processing on the environmental noise to obtain an audio feature vector of the environmental noise;
the feature identification module 830 is configured to process the audio feature vector of the environmental noise through a noise identification model to obtain an identification result, where the identification result is used to indicate whether the environmental noise includes wind noise;
a manner determining module 840, configured to determine a wind noise suppression manner of the earphone based on the identification result.
In summary, in the technical scheme provided by the embodiment of the present application, it is determined whether wind noise exists or not by using the audio feature vector based on the environmental noise, and the wind noise suppression mode of the earphone is determined based on the recognition result.
In an exemplary embodiment, the feature extraction module 820 includes: an audio framing unit, an audio windowing unit and a feature extraction unit (not shown in the figure).
The audio framing unit is used for framing the audio data corresponding to the environmental noise to obtain an ith audio frame, wherein i is a positive integer;
the audio windowing unit is used for windowing the ith audio frame to obtain an ith windowed audio frame;
and the characteristic extraction unit is used for extracting the characteristics of the audio frame subjected to the ith windowing processing to obtain the ith audio characteristic vector of the environmental noise.
In an exemplary embodiment, the feature extraction unit is configured to:
carrying out short-time Fourier transform processing on the ith windowed audio frame to obtain an ith first signal;
performing low-pass filtering processing on the ith first signal based on a preset low-pass cut-off frequency to obtain an ith second signal;
performing fast Fourier transform processing on the ith second signal to obtain an ith energy spectrum;
processing the ith energy spectrum through a Mel filter bank to obtain ith logarithmic energy;
and performing discrete cosine transform processing on the ith logarithmic energy to obtain the ith audio feature vector.
In an exemplary embodiment, the feature identification module 830 is configured to:
and processing the ith audio feature vector through the noise identification model to obtain an identification result of the ith audio feature vector, wherein the identification result of the ith audio feature vector is used for indicating whether wind noise exists in the ith audio frame.
In an exemplary embodiment, the manner determining module 840 is configured to:
in response to the identification result indicating that wind noise exists in the ith audio frame, determining that the earphone is subjected to wind noise suppression processing based on the ith energy spectrum as the wind noise suppression mode;
and in response to the identification result indicating that no wind noise exists in the ith audio frame, determining that the earphone is not subjected to wind noise suppression processing as the wind noise suppression mode.
In an exemplary embodiment, the manner determining module 840 includes: an energy determination unit and a mode determination unit (not shown in the figure).
An energy determining unit, configured to determine, based on the ith energy spectrum, low-frequency energy corresponding to the ith audio frame;
and the mode determining unit is used for determining that the earphone is subjected to wind noise suppression processing based on the low-frequency energy corresponding to the ith audio frame as the wind noise suppression mode in response to the fact that the low-frequency energy corresponding to the ith audio frame is greater than or equal to a wind noise energy threshold value.
In an exemplary embodiment, the manner determining unit is configured to:
determining a target low-frequency energy range to which the low-frequency energy corresponding to the ith audio frame belongs from the stored corresponding relation between at least one low-frequency energy range and the microphone gain;
determining the microphone gain corresponding to the target low-frequency energy range in the corresponding relation as a target microphone gain;
and determining that the earphone is subjected to wind noise suppression processing based on the target microphone gain as the wind noise suppression mode.
In an exemplary embodiment, the manner determining module 840 is further configured to:
and determining that the earphone is not subjected to wind noise suppression processing as the wind noise suppression mode in response to that the low-frequency energy corresponding to the ith audio frame is smaller than the wind noise energy threshold.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 9, a block diagram of a terminal according to an embodiment of the present application is shown.
The terminal in the embodiment of the present application may include one or more of the following components: a processor 910 and a memory 920.
Processor 910 may include one or more processing cores. The processor 910 connects various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 920 and calling data stored in the memory 920. Alternatively, the processor 910 may be implemented in hardware using at least one of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA) and a Programmable Logic Array (PLA). The processor 910 may integrate one or more of a Central Processing Unit (CPU), a modem and the like. The CPU mainly handles the operating system, application programs and the like, while the modem is used to handle wireless communication. It is understood that the modem may not be integrated into the processor 910 and may instead be implemented by a separate chip.
Optionally, the processor 910, when executing the program instructions in the memory 920, implements the methods provided by the various method embodiments described above.
The Memory 920 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 920 includes a non-transitory computer-readable medium. The memory 920 may be used to store instructions, programs, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described above, and the like; the storage data area may store data created according to the use of the terminal, and the like.
The structure of the terminal described above is only illustrative; in actual implementation, the terminal may include more or fewer components, such as a display screen, which is not limited in this embodiment.
Those skilled in the art will appreciate that the configuration shown in fig. 9 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which a computer program is stored, which is loaded and executed by a processor of a computer device to implement the individual steps in the above-described method embodiments.
In an exemplary embodiment, a computer program product is also provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the method provided by the above embodiment.
The above description is only exemplary of the application and should not be taken as limiting the application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the application should be included in the protection scope of the application.

Claims (11)

1. A method for determining a wind noise suppression mode is characterized by comprising the following steps:
collecting ambient noise through a feedforward microphone of the headset;
carrying out feature extraction processing on the environmental noise to obtain an audio feature vector of the environmental noise;
processing the audio characteristic vector of the environmental noise through a noise identification model to obtain an identification result, wherein the identification result is used for indicating whether the environmental noise contains wind noise;
and determining a wind noise suppression mode of the earphone based on the identification result.
2. The method according to claim 1, wherein the performing the feature extraction process on the environmental noise to obtain an audio feature vector of the environmental noise comprises:
performing framing processing on the audio data corresponding to the environmental noise to obtain an ith audio frame, wherein i is a positive integer;
windowing the ith audio frame to obtain an ith windowed audio frame;
and performing feature extraction on the ith windowed audio frame to obtain an ith audio feature vector of the environmental noise.
3. The method according to claim 2, wherein said performing feature extraction on the ith windowed audio frame to obtain an ith audio feature vector of the ambient noise comprises:
performing short-time Fourier transform processing on the ith windowed audio frame to obtain an ith first signal;
performing low-pass filtering processing on the ith first signal based on a preset low-pass cut-off frequency to obtain an ith second signal;
performing fast Fourier transform processing on the ith second signal to obtain an ith energy spectrum;
processing the ith energy spectrum through a Mel filter bank to obtain ith logarithmic energy;
and performing discrete cosine transform processing on the ith logarithmic energy to obtain the ith audio feature vector.
4. The method according to claim 3, wherein the processing the audio feature vector of the environmental noise through the noise identification model to obtain an identification result comprises:
and processing the ith audio feature vector through the noise identification model to obtain an identification result of the ith audio feature vector, wherein the identification result of the ith audio feature vector is used for indicating whether wind noise exists in the ith audio frame.
5. The method of claim 4, wherein determining the manner in which the headset is wind noise suppressed based on the identification comprises:
in response to the identification result indicating that wind noise exists in the ith audio frame, determining that the earphone is subjected to wind noise suppression processing based on the ith energy spectrum as the wind noise suppression mode;
and in response to the identification result indicating that no wind noise exists in the ith audio frame, determining that the earphone is not subjected to wind noise suppression processing as the wind noise suppression mode.
6. The method of claim 5, wherein the determining that the earphone is subjected to wind noise suppression processing based on the ith energy spectrum as the wind noise suppression mode comprises:
determining low-frequency energy corresponding to the ith audio frame based on the ith energy spectrum;
and in response to the fact that the low-frequency energy corresponding to the ith audio frame is larger than or equal to a wind noise energy threshold value, determining that the wind noise suppression mode is the wind noise suppression mode by performing wind noise suppression processing on the earphone based on the low-frequency energy corresponding to the ith audio frame.
7. The method according to claim 6, wherein the determining that the earphone is subjected to the wind noise suppression processing based on the low-frequency energy corresponding to the ith audio frame as the wind noise suppression mode comprises:
determining a target low-frequency energy range to which the low-frequency energy corresponding to the ith audio frame belongs from the stored corresponding relation between at least one low-frequency energy range and the microphone gain;
determining the microphone gain corresponding to the target low-frequency energy range in the corresponding relation as a target microphone gain;
and determining that the earphone is subjected to wind noise suppression processing based on the target microphone gain as the wind noise suppression mode.
8. The method according to claim 6, wherein after determining the low-frequency energy corresponding to the ith audio frame based on the ith energy spectrum, further comprising:
and determining that the earphone is not subjected to wind noise suppression processing as the wind noise suppression mode in response to that the low-frequency energy corresponding to the ith audio frame is smaller than the wind noise energy threshold.
9. A wind noise suppression mode determination apparatus, characterized in that the apparatus comprises:
the noise acquisition module is used for acquiring environmental noise through a feedforward microphone of the earphone;
the characteristic extraction module is used for carrying out characteristic extraction processing on the environmental noise to obtain an audio characteristic vector of the environmental noise;
the characteristic identification module is used for processing the audio characteristic vector of the environmental noise through a noise identification model to obtain an identification result, and the identification result is used for indicating whether the environmental noise contains wind noise;
and the mode determining module is used for determining the wind noise suppression mode of the earphone based on the identification result.
10. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing a computer program which is loaded and executed by the processor to implement the wind noise suppression manner determination method according to any one of claims 1 to 8.
11. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the wind noise suppression manner determination method according to any one of claims 1 to 8.
CN202011582063.4A 2020-12-28 2020-12-28 Wind noise suppression mode determination method, device, terminal and storage medium Pending CN114697786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011582063.4A CN114697786A (en) 2020-12-28 2020-12-28 Wind noise suppression mode determination method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011582063.4A CN114697786A (en) 2020-12-28 2020-12-28 Wind noise suppression mode determination method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN114697786A (en) 2022-07-01

Family

ID=82129875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011582063.4A Pending CN114697786A (en) 2020-12-28 2020-12-28 Wind noise suppression mode determination method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114697786A (en)

Similar Documents

Publication Publication Date Title
CN110491407B (en) Voice noise reduction method and device, electronic equipment and storage medium
WO2021139327A1 (en) Audio signal processing method, model training method, and related apparatus
CN111833896B (en) Voice enhancement method, system, device and storage medium for fusing feedback signals
CN109036460B (en) Voice processing method and device based on multi-model neural network
CN111128167B (en) Far-field voice awakening method and device, electronic product and storage medium
CN109493877A (en) A kind of sound enhancement method and device of auditory prosthesis
CN113949956B (en) Noise reduction processing method and device, electronic equipment, earphone and storage medium
CN113823304A (en) Voice signal processing method and device, electronic equipment and readable storage medium
CN113949955B (en) Noise reduction processing method and device, electronic equipment, earphone and storage medium
CN112017658A (en) Operation control system based on intelligent human-computer interaction
CN115884032A (en) Smart call noise reduction method and system of feedback earphone
CN113782044A (en) Voice enhancement method and device
CN113823301A (en) Training method and device of voice enhancement model and voice enhancement method and device
CN116343756A (en) Human voice transmission method, device, earphone, storage medium and program product
CN116312545B (en) Speech recognition system and method in a multi-noise environment
CN116132875B (en) Multi-mode intelligent control method, system and storage medium for hearing-aid earphone
CN113763978B (en) Voice signal processing method, device, electronic equipment and storage medium
CN114664322B (en) Single-microphone hearing-aid noise reduction method based on Bluetooth headset chip and Bluetooth headset
CN114697786A (en) Wind noise suppression mode determination method, device, terminal and storage medium
CN113808566B (en) Vibration noise processing method and device, electronic equipment and storage medium
CN113314134B (en) Bone conduction signal compensation method and device
EP4207812A1 (en) Method for audio signal processing on a hearing system, hearing system and neural network for audio signal processing
CN111863006A (en) Audio signal processing method, audio signal processing device and earphone
CN112133320B (en) Speech processing apparatus and speech processing method
CN114302286A (en) Method, device and equipment for reducing noise of call voice and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination