CN109427345B - Wind noise detection method, device and system - Google Patents
- Publication number: CN109427345B (application number CN201710754716.4A)
- Authority: CN (China)
- Prior art keywords: frequency, wind noise, frequency domain, domain data, audio data
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Parent classes of all entries: G (PHYSICS) > G10 (MUSICAL INSTRUMENTS; ACOUSTICS) > G10L (SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING)
- G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use for comparison or discrimination (via G10L25/00, G10L25/48)
- G10L21/0232: Speech enhancement, e.g. noise reduction or echo cancellation; noise filtering characterised by the method used for estimating noise; processing in the frequency domain (via G10L21/00, G10L21/02, G10L21/0208, G10L21/0216)
- G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude (via G10L21/00, G10L21/02)
- G10L25/18: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band (via G10L25/00, G10L25/03)
- G10L25/21: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information (via G10L25/00, G10L25/03)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
The embodiments of the invention provide a wind noise detection method, device and system. The method comprises: converting each frame of audio data into frequency domain data, and determining whether wind noise exists in the audio data according to the spectral centroid of the frequency domain data. First, wind noise detection can therefore be performed with a single channel of audio data, without acquiring two channels simultaneously for comparison, which makes operation convenient. Second, compared with schemes that use two acquisition devices to acquire two channels of audio data, this scheme needs only one acquisition device acquiring one channel, which reduces equipment cost.
Description
Technical Field
The present invention relates to the field of audio processing technologies, and in particular, to a method, an apparatus, and a system for detecting wind noise.
Background
During audio acquisition, wind noise is often an important factor affecting audio quality. In order to improve the audio quality, wind noise detection is usually required for the acquired audio data.
Wind noise detection generally requires collecting two channels of audio data at the same time and comparing them, for example comparing attributes such as their peak relationship and signal-to-noise ratio, and then determining from the comparison result whether wind noise exists in the audio data.
In this scheme, two channels of audio data must be acquired simultaneously. In scenarios that lack this capability, for example a single-microphone scenario, additional acquisition devices must be added to obtain a second channel, which is inconvenient.
Disclosure of Invention
An object of the embodiments of the invention is to provide a wind noise detection method, device and system that perform wind noise detection using a single channel of audio data.
In order to achieve the above object, an embodiment of the present invention provides a wind noise detection method, including:
for each frame of audio data, converting the frame of audio data into frequency domain data;
calculating the power spectral density of the frequency points in the frequency domain data;
calculating a spectral centroid of the frequency domain data from the power spectral density;
determining whether the spectral centroid is smaller than a first preset threshold;
if so, determining that wind noise exists in the frame of audio data.
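The five steps above can be sketched in Python as follows. This is a minimal, hedged illustration rather than the patented implementation: it uses a naive O(L²) DFT from the standard library in place of an FFT, takes the power spectral density of bin k as |X(k)|², evaluates the centroid over bins 1 to L/2, and treats the first preset threshold as a caller-supplied value in Hz, because the source gives no concrete number. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(L^2)); stands in for the FFT."""
    L = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / L) for t, x in enumerate(frame))
        for k in range(L)
    ]


def spectral_centroid(frame, fs):
    """Spectral centroid in Hz from P(k) = |X(k)|^2 over bins k = 1..L/2."""
    X = dft(frame)
    L = len(X)
    bins = range(1, L // 2 + 1)
    psd = {k: abs(X[k]) ** 2 for k in bins}
    total = sum(psd.values())
    if total == 0.0:
        return 0.0  # silent frame: centroid undefined, reported as 0 (assumption)
    return sum(fs * k / L * psd[k] for k in bins) / total


def has_wind_noise(frame, fs, first_threshold_hz):
    """S104/S105: wind noise is present if the centroid is below the threshold."""
    return spectral_centroid(frame, fs) < first_threshold_hz


fs = 8000
L = 160  # one 20 ms frame at 8 kHz
low = [math.sin(2 * math.pi * 100 * t / fs) for t in range(L)]    # 100 Hz tone
high = [math.sin(2 * math.pi * 2000 * t / fs) for t in range(L)]  # 2 kHz tone
print(has_wind_noise(low, fs, first_threshold_hz=500.0))   # True
print(has_wind_noise(high, fs, first_threshold_hz=500.0))  # False
```

With a hypothetical threshold of 500 Hz, the low tone is flagged and the 2 kHz tone is not; a real threshold would have to be tuned on recorded wind noise and speech.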
Optionally, after determining that wind noise exists in the frame of audio data, the method may further include:
determining whether the spectral centroid is smaller than a second preset threshold, wherein the second preset threshold is smaller than the first preset threshold;
if so, setting the frame of audio data to zero.
Optionally, after determining that wind noise exists in the frame of audio data, the method may further include:
determining whether the spectral centroid is greater than or equal to a third preset threshold, wherein the third preset threshold is smaller than the first preset threshold;
if so, filtering the frequency domain data with a preset filter to obtain filtered frequency domain data;
and converting the filtered frequency domain data into time domain data.
Optionally, when the spectral centroid is determined to be not smaller than the first preset threshold, the method may further include:
outputting the frame of audio data.
Optionally, calculating the power spectral density of the frequency points in the frequency domain data may include:
sampling the frequency domain data to obtain a plurality of frequency points, and calculating the power spectral densities of those frequency points;
calculating the spectral centroid of the frequency domain data from the power spectral density may include:
calculating the spectral centroid of the frequency domain data according to the sampling rate and the power spectral density.
Optionally, calculating the spectral centroid of the frequency domain data from the power spectral density includes:
calculating the spectral centroid of the frequency domain data using the following equation:
$$SC = \frac{\sum_{k=1}^{L} \frac{f_s k}{L}\,P_X(k)}{\sum_{k=1}^{L} P_X(k)}, \qquad P_X(k) = |X(k)|^2,$$
where SC denotes the spectral centroid, fs denotes the sampling rate, L denotes the total number of frequency points, P_X(k) denotes the power spectral density of the k-th frequency point, X(k) denotes the frequency value of the k-th frequency point, and k is a positive integer not greater than L.
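As a worked check of the equation above, the following sketch evaluates SC directly from per-bin power spectral densities. The function name and the convention that `psd[k - 1]` holds P_X(k) are assumptions made for illustration.

```python
def centroid_from_psd(psd, fs, L):
    """Evaluate SC = sum(fs*k/L * P(k)) / sum(P(k)) for k = 1..len(psd).

    psd[k - 1] holds the power spectral density P(k) of the k-th frequency point;
    L is the total number of frequency points used to map bin k to fs*k/L Hz.
    """
    total = sum(psd)
    if total == 0:
        raise ValueError("centroid is undefined for an all-zero spectrum")
    weighted = sum(fs * k / L * p for k, p in enumerate(psd, start=1))
    return weighted / total


# Equal power at bins 1 and 4 with fs = 8000 Hz and L = 8:
# the bins map to 1000 Hz and 4000 Hz, so the centroid is their mean.
print(centroid_from_psd([1.0, 0.0, 0.0, 1.0], fs=8000, L=8))  # 2500.0
```

Because SC is a power-weighted mean of bin frequencies, moving power toward low bins (as wind noise does) pulls SC down.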
Optionally, when the spectral centroid is determined to be smaller than a third preset threshold, the method may further include:
recording the power spectral density from which the spectral centroid was calculated as the wind noise power spectral density;
filtering the frequency domain data with the preset filter to obtain the filtered frequency domain data includes:
determining the filtered frequency domain data using the following equations:
$$Y(k) = H(k)\,X(k), \qquad H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)},$$
where Y(k) denotes the filtered frequency domain data, H(k) denotes the preset filter, X(k) denotes the frequency value of the k-th frequency point, P_X(k) denotes the power spectral density of the k-th frequency point, and P_n(k) denotes the most recently recorded wind noise power spectral density.
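A minimal sketch of this filtering step, assuming per-bin lists of complex frequency values X(k), frame power spectral densities P_X(k) and a recorded wind noise PSD P_n(k) of the same length. Clipping the gain to [0, 1] (for bins where the recorded noise exceeds the frame's power) and setting the gain to 0 for empty bins are assumptions added so that H(k) stays in its stated range; the function name is hypothetical.

```python
def wind_noise_filter(X, psd_frame, psd_wind):
    """Apply Y(k) = H(k) * X(k) with H(k) = (P_X(k) - P_n(k)) / P_X(k).

    X: complex frequency values of the frame.
    psd_frame: P_X(k) for the same bins; psd_wind: recorded wind noise P_n(k).
    """
    Y = []
    for x, p_x, p_n in zip(X, psd_frame, psd_wind):
        if p_x <= 0.0:
            h = 0.0  # empty bin: nothing to keep (assumption)
        else:
            h = min(1.0, max(0.0, (p_x - p_n) / p_x))  # clip gain to [0, 1]
        Y.append(h * x)
    return Y


X = [2 + 0j, 1j]
psd_frame = [abs(x) ** 2 for x in X]  # [4.0, 1.0]
psd_wind = [1.0, 4.0]                 # noise dominates the second bin
print(wind_noise_filter(X, psd_frame, psd_wind))  # [(1.5+0j), 0j]
```

The first bin keeps three quarters of its amplitude (H = 0.75); in the second bin the recorded wind noise exceeds the frame's power, so the clipped gain removes it entirely.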
In order to achieve the above object, an embodiment of the present invention further provides a wind noise detection apparatus, including:
a first conversion module, configured to convert each frame of audio data into frequency domain data;
a first calculating module, configured to calculate the power spectral density of the frequency points in the frequency domain data;
a second calculating module, configured to calculate the spectral centroid of the frequency domain data from the power spectral density;
a first judging module, configured to determine whether the spectral centroid is smaller than a first preset threshold;
and a determining module, configured to determine that wind noise exists in the frame of audio data when the result of the first judging module is yes.
Optionally, the apparatus may further include:
a second judging module, configured to determine, when the result of the first judging module is yes, whether the spectral centroid is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold;
and a zeroing module, configured to set the frame of audio data to zero when the result of the second judging module is yes.
Optionally, the apparatus may further include:
a third judging module, configured to determine, when the result of the first judging module is yes, whether the spectral centroid is greater than or equal to a third preset threshold, the third preset threshold being smaller than the first preset threshold;
a filtering module, configured to filter the frequency domain data with a preset filter to obtain filtered frequency domain data when the result of the third judging module is yes;
and a second conversion module, configured to convert the filtered frequency domain data into time domain data.
Optionally, the apparatus may further include:
and an output module, configured to output the frame of audio data when the result of the first judging module is no.
Optionally, the first calculating module may be specifically configured to:
sample the frequency domain data to obtain a plurality of frequency points, and calculate the power spectral densities of those frequency points;
the second calculating module may be specifically configured to:
calculate the spectral centroid of the frequency domain data according to the sampling rate and the power spectral density.
Optionally, the second calculating module may be specifically configured to:
calculate the spectral centroid of the frequency domain data using the following equation:
$$SC = \frac{\sum_{k=1}^{L} \frac{f_s k}{L}\,P_X(k)}{\sum_{k=1}^{L} P_X(k)}, \qquad P_X(k) = |X(k)|^2,$$
where SC denotes the spectral centroid, fs denotes the sampling rate, L denotes the total number of frequency points, P_X(k) denotes the power spectral density of the k-th frequency point, X(k) denotes the frequency value of the k-th frequency point, and k is a positive integer not greater than L.
Optionally, the apparatus may further include:
a recording module, configured to record the power spectral density from which the spectral centroid was calculated as the wind noise power spectral density when the result of the third judging module is no;
the filtering module may be specifically configured to:
determine the filtered frequency domain data using the following equations:
$$Y(k) = H(k)\,X(k), \qquad H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)},$$
where Y(k) denotes the filtered frequency domain data, H(k) denotes the preset filter, X(k) denotes the frequency value of the k-th frequency point, P_X(k) denotes the power spectral density of the k-th frequency point, and P_n(k) denotes the wind noise power spectral density most recently recorded by the recording module.
To achieve the above object, an embodiment of the present invention further provides an electronic device comprising a processor and a memory, wherein
the memory is configured to store a computer program;
and the processor is configured to implement any of the above wind noise detection methods when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention further provides a wind noise detection system, including: an audio acquisition device and a wind noise detection device, wherein,
the audio acquisition device is configured to acquire audio data and send the acquired audio data to the wind noise detection device;
the wind noise detection device is configured to receive the audio data sent by the audio acquisition device and, for each frame of audio data: convert the frame into frequency domain data; calculate the power spectral density of the frequency points in the frequency domain data; calculate the spectral centroid of the frequency domain data from the power spectral density; determine whether the spectral centroid is smaller than a first preset threshold; and, if so, determine that wind noise exists in the frame of audio data.
By applying the embodiments of the invention, each frame of audio data is converted into frequency domain data, and whether wind noise exists in the audio data is determined according to the spectral centroid of the frequency domain data. Wind noise detection can therefore be performed with a single channel of audio data, without collecting two channels simultaneously for comparison, which makes operation convenient.
Of course, a product or method implementing the invention need not achieve all of the above advantages at the same time.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a first schematic flow chart of a wind noise detection method according to an embodiment of the present invention;
Fig. 2 is a second schematic flow chart of a wind noise detection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a wind noise detection apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a wind noise detection system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
To solve the above technical problem, the embodiments of the present invention provide a wind noise detection method, apparatus and device. The method and apparatus can be applied to any device with audio processing functions; this is not specifically limited.
First, a wind noise detection method provided by an embodiment of the present invention is described in detail below.
Fig. 1 is a first flowchart of a wind noise detection method according to an embodiment of the present invention, including:
S101: for each frame of audio data, the frame of audio data is converted into frequency domain data.
S102: the power spectral density of the frequency points in the frequency domain data is calculated.
S103: from the power spectral density, a spectral centroid of the frequency domain data is calculated.
S104: it is determined whether the spectral centroid is smaller than a first preset threshold; if so, S105 is executed.
S105: it is determined that wind noise is present in the frame of audio data.
By applying the embodiment of the invention shown in Fig. 1, each frame of audio data is converted into frequency domain data, and whether wind noise exists in the audio data is determined according to the spectral centroid of the frequency domain data. First, wind noise detection can therefore be performed with a single channel of audio data, without acquiring two channels simultaneously for comparison, which makes operation convenient. Second, compared with schemes that use two acquisition devices to acquire two channels of audio data, this scheme needs only one acquisition device acquiring one channel, which reduces equipment cost.
The following is a detailed description of the embodiment shown in FIG. 1:
S101: for each frame of audio data, the frame of audio data is converted into frequency domain data.
The device executing this scheme (the execution subject, hereinafter "the present device") may be an audio acquisition device, or another electronic device communicatively connected to the audio acquisition device. If it is the audio acquisition device, it applies the scheme to each frame of audio data it acquires to detect wind noise. If it is another electronic device communicatively connected to the audio acquisition device, it obtains the audio data acquired by the audio acquisition device and applies the scheme to each frame of that audio data to detect wind noise.
In an optional embodiment of the invention, the captured audio data may be divided into frames, for example one frame per 20 ms of audio data. The specific frame length can be set according to actual conditions.
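The framing described above might look like the following sketch, assuming non-overlapping frames of `frame_ms` milliseconds at an integer sampling rate `fs`. The source does not mention overlap or windowing, and dropping a trailing partial frame is an assumption; the function name is hypothetical.

```python
def split_into_frames(samples, fs, frame_ms=20):
    """Split samples into consecutive, non-overlapping frames of frame_ms ms.

    A trailing partial frame is dropped (an assumption; it could be zero-padded).
    """
    frame_len = fs * frame_ms // 1000  # e.g. 16000 Hz * 20 ms -> 320 samples
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


frames = split_into_frames(list(range(16000)), fs=16000)  # one second of samples
print(len(frames), len(frames[0]))  # 50 320
```

Integer arithmetic is used for the frame length so that common rates such as 8000, 16000 and 44100 Hz give an exact sample count.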
A frame of audio data may be converted into frequency domain data using a time-domain to frequency-domain conversion algorithm, for example the Fourier transform or the Fast Fourier Transform (FFT); this is not specifically limited in the present invention.
Assume that one frame of audio data is x(t); it is converted by the FFT into frequency domain data X, which contains the frequency value of each frequency point; for example, the frequency value of the k-th frequency point is X(k).
S102: the power spectral density of the frequency points in the frequency domain data is calculated.
As an embodiment, the frequency domain data may be sampled to obtain a plurality of frequency points, and the power spectral density of each of them calculated. Assuming that sampling the frequency domain data X yields L frequency points, the power spectral density of each of the L frequency points is calculated; the power spectral density of the k-th frequency point is $P_X(k) = |X(k)|^2$, where k is a positive integer not greater than L.
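Steps S101 and S102 can be illustrated with the standard library alone. In this hedged sketch a naive DFT stands in for the FFT (it yields the same X(k) values, only more slowly), and the power spectral density of bin k is taken as |X(k)|² without windowing or scaling. It checks that a 1 kHz tone sampled at 8 kHz concentrates its power in the bin whose frequency fs·k/L is 1000 Hz.

```python
import cmath
import math


def dft(frame):
    """Time domain -> frequency domain: X(k) = sum_t x(t) * exp(-2j*pi*k*t/L).

    A naive O(L^2) DFT; an FFT computes the same values faster.
    """
    L = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / L) for t, x in enumerate(frame))
        for k in range(L)
    ]


def power_spectral_density(X):
    """Per-bin power P(k) = |X(k)|^2 (no window or scaling applied)."""
    return [abs(v) ** 2 for v in X]


fs, L = 8000, 16
frame = [math.cos(2 * math.pi * 1000 * t / fs) for t in range(L)]  # 1 kHz tone
psd = power_spectral_density(dft(frame))
peak = max(range(1, L // 2 + 1), key=lambda k: psd[k])
print(peak, fs * peak / L)  # 2 1000.0
```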
S103: from the power spectral density, a spectral centroid of the frequency domain data is calculated.
The spectral centroid (SC) is the mean point of the spectral energy distribution of a frame of audio data and reflects the center of its frequency distribution.
In general, the spectral centroid can be calculated with Equation A:
$$SC = \frac{\int_{\Omega} \omega\,|F(\omega)|^2\,d\omega}{E}, \qquad E = \int_{\Omega} |F(\omega)|^2\,d\omega,$$
where Ω denotes the frequency range of the frequency domain data, F(ω) denotes the frequency value at frequency ω, and E denotes the energy of the frequency domain data.
Equation A applies to continuous frequency domain data. For sampled frequency domain data, the spectral centroid can instead be calculated with Equation B:
$$SC = \frac{\sum_{k=1}^{L} \frac{f_s k}{L}\,P_X(k)}{\sum_{k=1}^{L} P_X(k)}, \qquad P_X(k) = |X(k)|^2,$$
where fs denotes the sampling rate, L denotes the total number of frequency points, P_X(k) denotes the power spectral density of the k-th frequency point, X(k) denotes the frequency value of the k-th frequency point, and k is a positive integer not greater than L.
Sampling the continuous frequency domain data at the sampling rate fs and substituting ω = fs·k/L into Equation A yields Equation B. Because the frequency domain data of a real-valued audio frame is conjugate symmetric, only half of the frequency points need to be considered, that is, k may range from 1 to L/2.
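The conjugate-symmetry argument can be checked numerically. This sketch again uses a naive DFT as a stand-in for the FFT and an arbitrary random real-valued frame; it confirms that X(L − k) equals the complex conjugate of X(k), so |X(L − k)|² = |X(k)|² and the bins above L/2 only mirror the bins below it.

```python
import cmath
import math
import random


def dft(frame):
    """Naive discrete Fourier transform (O(L^2)); stands in for the FFT."""
    L = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / L) for t, x in enumerate(frame))
        for k in range(L)
    ]


random.seed(0)
L = 32
frame = [random.uniform(-1.0, 1.0) for _ in range(L)]  # arbitrary real-valued frame
X = dft(frame)

# Conjugate symmetry of a real frame: X(L - k) == conj(X(k)) for k = 1..L-1,
# hence |X(L - k)|^2 == |X(k)|^2 and bins above L/2 mirror bins below it.
max_err = max(abs(X[L - k] - X[k].conjugate()) for k in range(1, L))
print(max_err < 1e-9)  # True
```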
S104: it is determined whether the spectral centroid is smaller than a first preset threshold; if so, S105 is executed.
S105: it is determined that wind noise is present in the frame of audio data.
Because the spectral centroid of wind noise is low and the spectral centroid of speech is high:
in the first case, if the spectral centroid calculated in S103 is very small, close to the spectral centroid of pure wind noise, the audio data can be considered to contain strong wind noise;
in the second case, if the spectral centroid calculated in S103 is very large, close to the spectral centroid of pure speech, the wind noise in the audio data can be considered negligible or absent;
in the third case, if the spectral centroid calculated in S103 lies between those of the first two cases, the audio data can be considered a mixture of speech and wind noise.
On this basis, a threshold (the first preset threshold) may be set. If the spectral centroid calculated in S103 is smaller than the first preset threshold, the frame falls into the first or the third case, and it is determined that wind noise exists in the frame of audio data.
As an embodiment, if the spectral centroid calculated in S103 is greater than or equal to the first preset threshold, the frame of audio data from S101 may be output directly. In that situation the frame belongs to the second case; its quality is good, so it can be output without further processing.
This completes wind noise detection. In some embodiments, wind noise suppression is further performed after detection.
As an embodiment, after S105 it may further be determined whether the spectral centroid is smaller than a second preset threshold; if so, the frame of audio data is set to zero.
As described in the three cases above, a small spectral centroid calculated in S103 may correspond to the first or the third case. A further threshold (the second preset threshold), smaller than the first preset threshold, may therefore be set: if the spectral centroid calculated in S103 is smaller than the second preset threshold, the frame is considered to belong to the first case, i.e., the audio data contains strong wind noise. In this case the frame of audio data may be set to zero, so that the output is silence.
In this embodiment, if the wind noise in a frame of audio data is too strong, silence is output instead, so that the user does not hear meaningless wind noise, which gives a better experience.
As an embodiment, after S105 it may further be determined whether the spectral centroid is greater than or equal to a third preset threshold, where the third preset threshold is smaller than the first preset threshold;
if so, the frequency domain data is filtered with a preset filter to obtain filtered frequency domain data;
and the filtered frequency domain data is converted into time domain data.
This embodiment is executed when the spectral centroid calculated in S103 is smaller than the first preset threshold and greater than or equal to the third preset threshold; the third preset threshold is therefore smaller than the first preset threshold.
The third preset threshold may be equal to or greater than the second preset threshold. For example, in the embodiment shown in Fig. 2 the third preset threshold equals the second preset threshold, so the condition "greater than or equal to the second preset threshold" is treated as the condition "greater than or equal to the third preset threshold".
This embodiment corresponds to the third case, in which the audio data is a mixture of wind noise and speech; wind noise suppression is achieved by filtering the audio data.
In an optional filtering approach, if a frame of audio data belongs to the first case, that is, if its spectral centroid is smaller than the third preset threshold, the power spectral density of that frame is recorded as the wind noise power spectral density. The recorded wind noise power spectral density can then be used to filter the audio data.
Several frequency points are selected from the continuous frequency domain data, with k denoting the index of a selected frequency point, and the filtered frequency domain data is then determined with the following equations:
$$Y(k) = H(k)\,X(k), \qquad H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)},$$
where Y(k) denotes the filtered frequency domain data, H(k) denotes the preset filter, X(k) denotes the frequency value of the k-th frequency point, P_X(k) denotes the power spectral density of the k-th frequency point, P_n(k) denotes the most recently recorded wind noise power spectral density, and k is a positive integer.
In this filtering approach, only the latest wind noise power spectral density may be kept: whenever the spectral centroid of a frame of audio data is smaller than the third preset threshold, that frame's power spectral density is recorded as the wind noise power spectral density and the previously recorded one is deleted, so when a frame is filtered with the above formula, P_n(k) is simply the recorded wind noise power spectral density. Alternatively, previously recorded wind noise power spectral densities may be retained; in that case, when a frame is filtered with the above formula, the most recently recorded of them is used as P_n(k).
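The two record-keeping options above can be captured by a small helper. The class below is a hypothetical sketch: `keep_history=False` corresponds to keeping only the latest wind noise PSD and deleting the previous one, `keep_history=True` to retaining earlier records and using the most recent one, and returning `None` before any frame has been recorded is an assumption, since the source does not cover that case.

```python
class WindNoisePsdRecorder:
    """Stores wind noise PSDs recorded from frames whose centroid < third threshold.

    keep_history=False: keep only the latest record (previous one is deleted).
    keep_history=True: keep all records; the most recent one is used for filtering.
    """

    def __init__(self, keep_history=False):
        self.keep_history = keep_history
        self._records = []

    def record(self, psd):
        if not self.keep_history:
            self._records.clear()  # delete the previously recorded PSD
        self._records.append(list(psd))

    def latest(self):
        """Return P_n(k) for the filter formula, or None if nothing is recorded yet."""
        return self._records[-1] if self._records else None

    def __len__(self):
        return len(self._records)


r = WindNoisePsdRecorder(keep_history=True)
r.record([1.0, 2.0])
r.record([3.0, 4.0])
print(len(r), r.latest())  # 2 [3.0, 4.0]
```

Both options feed the same value into the filter; they differ only in memory use and in whether older estimates remain available.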
The preset filter is set up as follows.
Assume that the frequency domain data of a frame of audio data is X = s + n, where s denotes the speech component and n denotes the wind noise component of the frequency domain data.
Accordingly, treating speech and wind noise as uncorrelated, the power spectral density of the frequency domain data X is $P_X(k) = P_s(k) + P_n(k)$, where P_s(k) denotes the power spectral density of the speech component and P_n(k) denotes the most recently recorded wind noise power spectral density.
The filter is set to $H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)}$, whose values lie in [0, 1]. The stronger the wind noise, the smaller H(k) and therefore Y(k), i.e., the stronger the suppression of the audio data; the weaker the wind noise, the larger H(k) and Y(k), i.e., the weaker the suppression.
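Rewriting the gain in terms of the speech power makes its range and its behavior at the extremes explicit. The derivation below only rearranges the definitions above and presumes P_n(k) ≤ P_X(k), i.e. P_s(k) ≥ 0; the resulting form P_s/(P_s + P_n) is the same as that of the classical Wiener filter gain.

```latex
\begin{aligned}
H(k) &= \frac{P_X(k) - P_n(k)}{P_X(k)}
      = \frac{\bigl(P_s(k) + P_n(k)\bigr) - P_n(k)}{P_s(k) + P_n(k)}
      = \frac{P_s(k)}{P_s(k) + P_n(k)} \in [0, 1],\\[4pt]
P_n(k) \ll P_s(k) &\;\Rightarrow\; H(k) \approx 1 \;\Rightarrow\; Y(k) \approx X(k)
  \quad\text{(weak wind noise, little suppression)},\\
P_n(k) \gg P_s(k) &\;\Rightarrow\; H(k) \approx 0 \;\Rightarrow\; Y(k) \approx 0
  \quad\text{(strong wind noise, strong suppression)}.
\end{aligned}
```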
Then, an IFFT (Inverse Fast Fourier Transform) is performed on Y(k) to convert the filtered frequency domain data into time domain data, which can then be output; that is, the time domain data after wind noise suppression is output.
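The inverse transform in this step can be sketched with a naive inverse DFT standing in for the IFFT. Taking only the real part of the result assumes that the gain was applied symmetrically to bins k and L − k, which holds when H(k) is computed from power spectral densities over all L bins, because those densities are themselves symmetric for a real frame. The example applies a uniform hypothetical gain of 0.5 and checks that the time-domain frame is scaled by 0.5.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(L^2)); stands in for the FFT."""
    L = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / L) for t, x in enumerate(frame))
        for k in range(L)
    ]


def idft(Y):
    """Naive inverse DFT: y(t) = (1/L) * sum_k Y(k) * exp(2j*pi*k*t/L)."""
    L = len(Y)
    return [
        sum(v * cmath.exp(2j * math.pi * k * t / L) for k, v in enumerate(Y)) / L
        for t in range(L)
    ]


frame = [0.5, -1.0, 0.25, 0.0, 1.0, -0.5, 0.75, 0.1]
H = [0.5] * len(frame)                      # same gain for bins k and L - k
Y = [h * x for h, x in zip(H, dft(frame))]  # Y(k) = H(k) X(k)
y = [v.real for v in idft(Y)]               # discard ~1e-16 imaginary residue
print(all(abs(a - 0.5 * b) < 1e-12 for a, b in zip(y, frame)))  # True
```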
In this embodiment, audio data containing partial wind noise (spectral centroid greater than or equal to the third preset threshold and smaller than the first preset threshold) is filtered rather than discarded; compared with a scheme that discards all audio data containing wind noise, this retains more useful speech data.
When wind noise detection relies on two channels of audio data, scenarios that cannot acquire two channels simultaneously, such as single-microphone scenarios, require additional acquisition equipment, which is inconvenient and increases equipment cost. With the embodiments of the invention, wind noise detection can be performed with a single channel of audio data, which is convenient and reduces equipment cost.
Fig. 2 is a schematic flow chart of a wind noise detection method according to an embodiment of the present invention, including:
S201: For each frame of audio data, convert the frame of audio data into frequency domain data.
S202: Calculate the power spectral density of the frequency points in the frequency domain data.
S203: Calculate the spectral centroid of the frequency domain data from the power spectral density.
S204: Judge whether the spectral centroid is smaller than a first preset threshold; if so, execute S205; if not, execute S206.
S205: Determine that wind noise is present in the frame of audio data, then execute S207.
S206: Output the frame of audio data.
S207: Judge whether the spectral centroid is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold; if so, execute S208; if not, execute S209.
S208: Set the frame of audio data to zero and output the zeroed audio data.
S209: Filter the frequency domain data with a preset filter to obtain filtered frequency domain data.
S210: Convert the filtered frequency domain data into time domain data and output the converted time domain data.
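The branching in S204 to S210 can be sketched as a small decision function. This is an illustration only: the threshold values are hypothetical (the source gives none), the centroid is assumed to be measured in Hz, and the names `Action` and `classify_frame` are invented for the sketch. It returns which action a frame receives rather than performing the signal processing.

```python
from enum import Enum


class Action(Enum):
    OUTPUT = "output frame unchanged"        # S206: no wind noise detected
    ZERO = "output a zeroed (muted) frame"   # S208: strong or pure wind noise
    FILTER = "filter, then IFFT and output"  # S209-S210: wind noise mixed with speech


def classify_frame(centroid_hz, first_threshold_hz, second_threshold_hz):
    """Return the Fig. 2 action for one frame given its spectral centroid."""
    if not second_threshold_hz < first_threshold_hz:
        raise ValueError("the second preset threshold must be below the first")
    if centroid_hz >= first_threshold_hz:  # S204 "no" branch
        return Action.OUTPUT
    # S205: wind noise is present; S207 chooses how to handle it.
    if centroid_hz < second_threshold_hz:
        return Action.ZERO
    return Action.FILTER
```

With illustrative thresholds of 1000 Hz and 400 Hz, centroids of 1500 Hz, 700 Hz, and 250 Hz map to `OUTPUT`, `FILTER`, and `ZERO` respectively; a centroid exactly at a threshold falls into the upper case, matching the "greater than or equal to" wording above.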
The embodiment of Fig. 2 provides a complete wind noise detection and wind noise suppression method. It does not set a separate third preset threshold; instead, the third preset threshold is effectively equal to the second, so a spectral centroid greater than or equal to the second preset threshold is handled as the case of being greater than or equal to the third preset threshold.
As described for the embodiment of Fig. 1, wind noise has a low spectral centroid and speech a high one, so audio data falls into three cases according to the spectral centroid: (1) strong wind noise or pure wind noise; (2) negligible or no wind noise (pure speech); (3) a mixture of wind noise and speech.
The embodiment of Fig. 2 handles the three cases differently. In the first case (spectral centroid smaller than the second preset threshold), the audio quality is poor, so the audio data is set to zero and output, i.e. silence is output. In the second case (spectral centroid greater than or equal to the first preset threshold), the audio quality is good, so the audio data is output directly. In the third case (spectral centroid greater than or equal to the second preset threshold and smaller than the first preset threshold), the audio quality lies between the first two cases, so the frequency domain data of the audio data is filtered and then converted back into audio data for output.
In the embodiment of Fig. 2, a frame in which no wind noise is detected (spectral centroid greater than or equal to the first preset threshold) is output directly. Some schemes apply wind noise suppression to all audio data regardless of whether a given frame contains wind noise; compared with those schemes, this embodiment avoids unnecessary suppression operations, an advantage that is most pronounced in scenes with light or intermittent wind noise.
In the embodiment of Fig. 2, frames containing partial wind noise (spectral centroid greater than or equal to the second preset threshold and smaller than the first preset threshold) are filtered. Some schemes discard all audio data in which wind noise is detected; compared with those schemes, this embodiment retains more useful speech data.
With the embodiment shown in Fig. 2, different wind noise suppression schemes are applied to different wind noise conditions on the basis of wind noise detection, so the output audio is of higher quality.
Corresponding to the method embodiment, the embodiment of the invention also provides a wind noise detection device.
Fig. 3 is a schematic structural diagram of a wind noise detection apparatus according to an embodiment of the present invention, including:
a first conversion module 301, configured to, for each frame of audio data, convert the frame of audio data into frequency domain data;
a first calculating module 302, configured to calculate a power spectral density of a frequency point in the frequency domain data;
a second calculating module 303, configured to calculate a spectral centroid of the frequency domain data according to the power spectral density;
a first judging module 304, configured to judge whether the spectral centroid is smaller than a first preset threshold;
a determining module 305, configured to determine that wind noise exists in the frame of audio data when the judgment result of the first judging module 304 is yes.
As an embodiment, the apparatus may further include a second judging module and a zero setting module (not shown in the figure), wherein
the second judging module is configured to judge, when the judgment result of the first judging module is yes, whether the spectral centroid is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold;
the zero setting module is configured to set the frame of audio data to zero when the judgment result of the second judging module is yes.
As an embodiment, the apparatus may further include: a third judging module, a filtering module and a second converting module (not shown in the figure), wherein,
a third judging module, configured to, when the first judging module judges that the result is yes, judge whether the spectrum centroid is greater than or equal to a third preset threshold, where the third preset threshold is smaller than the first preset threshold;
the filtering module is configured to filter the frequency domain data with a preset filter to obtain filtered frequency domain data when the judgment result of the third judging module is yes;
the second conversion module is configured to convert the filtered frequency domain data into time domain data.
As an embodiment, the apparatus may further include:
an output module (not shown in the figure), configured to output the frame of audio data when the judgment result of the first judging module 304 is no.
As an embodiment, the first calculating module 302 may specifically be configured to:
sample the frequency domain data to obtain a plurality of frequency points, and calculate the power spectral density of the frequency points;
the second calculating module 303 may specifically be configured to:
calculate the spectral centroid of the frequency domain data according to the sampling rate and the power spectral density.
As an embodiment, the second calculating module 303 may be specifically configured to:
calculate the spectral centroid of the frequency domain data using the following equation:
$$SC = \frac{\sum_{k=1}^{L} \frac{k f_s}{2L} P_X(k)}{\sum_{k=1}^{L} P_X(k)}, \qquad P_X(k) = |X(k)|^2,$$
where SC denotes the spectral centroid, $f_s$ denotes the sampling rate, L denotes the total number of frequency points, $P_X(k)$ denotes the power spectral density of the k-th frequency point, X(k) denotes the frequency value of the k-th frequency point, and k is a positive integer not greater than L.
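The centroid computation and the resulting wind noise decision can be sketched in Python as follows. The sketch assumes $P_X(k) = |X(k)|^2$, takes the L frequency points to be bins 1 to N/2 of an N-point DFT (so L = N/2 and bin k lies at $k f_s / N = k f_s / (2L)$), and uses a naive DFT instead of an FFT. The function names and the handling of an all-silent frame are hypothetical choices, not taken from the source.

```python
import cmath
import math


def half_spectrum_psd(frame):
    """P_X(k) = |X(k)|^2 for bins k = 1..L, with L = len(frame) // 2.

    Bin k corresponds to frequency k * fs / len(frame), so for an
    even-length frame k = L sits at fs / 2 (assumed bin layout).
    """
    n = len(frame)
    psd = []
    for k in range(1, n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(x_k) ** 2)
    return psd


def spectral_centroid(frame, fs):
    """Power-weighted mean frequency SC = sum(f_k * P_X(k)) / sum(P_X(k))."""
    n = len(frame)
    psd = half_spectrum_psd(frame)
    total = sum(psd)
    if total == 0.0:
        return 0.0  # silent frame: centroid undefined, report 0 Hz (assumption)
    weighted = sum((k * fs / n) * p for k, p in enumerate(psd, start=1))
    return weighted / total


def has_wind_noise(frame, fs, first_threshold_hz):
    """Wind noise is flagged when the centroid is below the first threshold."""
    return spectral_centroid(frame, fs) < first_threshold_hz
```

For example, a 16-sample cosine completing two cycles per frame at $f_s$ = 16 kHz concentrates its power in bin 2, so its centroid is 2 × 16000 / 16 = 2000 Hz. Because low-frequency energy pulls the centroid down, a wind-dominated frame falls below the first threshold while speech-dominated frames tend to stay above it.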
As an embodiment, the apparatus may further include:
a recording module (not shown in the figure), configured to record, as the wind noise power spectral density, the power spectral density corresponding to the spectral centroid when the judgment result of the third judging module is no;
the filtering module selects a plurality of frequency points from the continuous frequency domain data, k denoting the index of a selected frequency point, and may specifically be configured to:
determining the filtered frequency domain data using the following equation:
Y(k)=H(k)*X(k),
where $H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)}$, Y(k) denotes the filtered frequency domain data, H(k) denotes the preset filter, X(k) denotes the frequency value of the k-th frequency point, $P_X(k)$ denotes the power spectral density of the k-th frequency point, $P_n(k)$ denotes the wind noise power spectral density most recently recorded by the recording module, and k is a positive integer.
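The interplay between the recording module and the filtering module can be sketched as a small stateful class. This is a hypothetical sketch: the class name, the choice to mute frames whose centroid is below the third threshold (as in Fig. 2, where the third threshold equals the second), and the pass-through behaviour before any wind noise has been recorded are all assumptions. The gain is assumed to be $(P_X(k) - P_n(k))/P_X(k)$ clipped to [0, 1].

```python
class WindNoiseFilter:
    """Hypothetical recording module + filtering module sharing the latest P_n(k)."""

    def __init__(self, third_threshold_hz):
        self.third_threshold_hz = third_threshold_hz
        self.wind_psd = None  # most recently recorded wind noise PSD, P_n(k)

    def process(self, spectrum, psd, centroid_hz):
        """Handle a frame already judged to contain wind noise.

        spectrum: X(k) per selected frequency point.
        psd: P_X(k) for the same points (normally |X(k)|^2).
        Returns the filtered spectrum Y(k).
        """
        if centroid_hz < self.third_threshold_hz:
            # Recording module: treat this frame as (nearly) pure wind noise.
            self.wind_psd = list(psd)
            # Assumption (Fig. 2, third threshold = second): mute the frame.
            return [0j] * len(spectrum)
        if self.wind_psd is None:
            # Assumption: with no wind noise estimate yet, pass the frame through.
            return list(spectrum)
        filtered = []
        for x_k, p_x, p_n in zip(spectrum, psd, self.wind_psd):
            gain = 0.0 if p_x <= 0.0 else min(1.0, max(0.0, (p_x - p_n) / p_x))
            filtered.append(gain * x_k)  # Y(k) = H(k) * X(k)
        return filtered
```

Keeping only the most recent estimate mirrors the "newly recorded" wording; a smoothed (for example, recursively averaged) estimate would be a common alternative, but the source does not describe one.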
With the embodiment shown in Fig. 3, each frame of audio data is converted into frequency domain data, and whether the audio data contains wind noise is determined from the spectral centroid of that frequency domain data. First, wind noise can be detected from a single channel of audio data, without simultaneously acquiring two channels for comparison, which is convenient. Second, compared with schemes that use two acquisition devices to capture two channels of audio data, only one acquisition device capturing one channel is needed, which reduces equipment cost.
In correspondence with the above method embodiments, the embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 401 and a memory 402, wherein,
a memory 402 for storing a computer program;
the processor 401 is configured to implement any of the above-described wind noise detection methods when executing the program stored in the memory 402.
The memory of the above electronic device may include random access memory (RAM) or non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any of the wind noise detection methods described above.
An embodiment of the present invention further provides a wind noise detection system, as shown in Fig. 5, including an audio acquisition device and a wind noise detection device, wherein
the audio acquisition device is configured to acquire audio data and send the acquired audio data to the wind noise detection device;
the wind noise detection device is configured to receive the audio data sent by the audio acquisition device and, for each frame of audio data, convert the frame into frequency domain data; calculate the power spectral density of the frequency points in the frequency domain data; calculate the spectral centroid of the frequency domain data from the power spectral density; judge whether the spectral centroid is smaller than a first preset threshold; and if so, determine that wind noise exists in the frame of audio data.
As an embodiment, the number of the audio capturing devices may be one, that is, the system may include only one audio capturing device.
In this embodiment, the audio acquisition device is configured to acquire one path of audio data and send the acquired audio data to the wind noise detection device;
the wind noise detection device is configured to receive the channel of audio data, convert each frame of audio data into frequency domain data, calculate the power spectral density of the frequency points in the frequency domain data, calculate the spectral centroid of the frequency domain data from the power spectral density, judge whether the spectral centroid is smaller than a first preset threshold, and if so, determine that wind noise exists in the frame of audio data.
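How a single received channel might be cut into frames before per-frame detection is sketched below. The source only speaks of "each frame of audio data", so the frame length, hop size, and the choice to drop a trailing partial frame are assumptions; `split_into_frames`, `detect_channel`, and the `frame_has_wind_noise` callable (a placeholder for the spectral-centroid test) are hypothetical names.

```python
def split_into_frames(samples, frame_length, hop_length):
    """Split one channel of samples into frames; a trailing partial frame is dropped."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    last_start = len(samples) - frame_length
    return [samples[start:start + frame_length]
            for start in range(0, last_start + 1, hop_length)]


def detect_channel(samples, frame_length, hop_length, frame_has_wind_noise):
    """Run a per-frame detector (a callable returning bool) over a single channel."""
    return [frame_has_wind_noise(frame)
            for frame in split_into_frames(samples, frame_length, hop_length)]
```

Setting `hop_length` equal to `frame_length` gives non-overlapping frames; a smaller hop gives overlapping frames, which is common when the filtered frames are later recombined.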
As an embodiment, the audio acquisition device and the wind noise detection device may be integrated, i.e. both may be the same device.
With the embodiment of the present invention, the wind noise detection device converts each received frame of audio data into frequency domain data and determines whether the audio data contains wind noise from the spectral centroid of that frequency domain data. First, wind noise can be detected from a single channel of audio data, without simultaneously acquiring two channels for comparison, which is convenient. Second, compared with schemes that use two acquisition devices to capture two channels of audio data, the system needs only one acquisition device capturing one channel, which reduces equipment cost.
It should be noted that relational terms herein such as first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the wind noise detection apparatus embodiment of Fig. 3, the electronic device embodiment of Fig. 4, and the wind noise detection system embodiment of Fig. 5 are substantially similar to the wind noise detection method embodiments of Figs. 1-2, so they are described relatively briefly; for relevant details, refer to the description of the method embodiments of Figs. 1-2.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (16)
1. A method of wind noise detection, comprising:
for each frame of audio data, converting the frame of audio data into frequency domain data;
calculating the power spectral density of the frequency points in the frequency domain data;
calculating a spectral centroid of the frequency domain data from the power spectral density;
judging whether the frequency spectrum centroid is smaller than a first preset threshold value;
if yes, determining that wind noise exists in the frame of audio data;
after the determining that the wind noise exists in the frame of audio data, the method further includes:
judging whether the spectral centroid is greater than or equal to a third preset threshold, wherein the third preset threshold is smaller than the first preset threshold;
under the condition that the frequency spectrum centroid is judged to be smaller than the third preset threshold value, recording the power spectrum density corresponding to the frequency spectrum centroid as the wind noise power spectrum density; the wind noise power spectral density is used to filter the frame of audio data.
2. The method of claim 1, further comprising, after the determining that wind noise is present in the frame of audio data:
judging whether the frequency spectrum centroid is smaller than a second preset threshold value, wherein the second preset threshold value is smaller than the first preset threshold value;
if yes, setting the audio data of the frame to zero;
and the third preset threshold is greater than or equal to the second preset threshold.
3. The method of claim 1, further comprising:
under the condition that the frequency spectrum centroid is larger than or equal to the third preset threshold, filtering the frequency domain data by using a preset filter to obtain filtered frequency domain data;
and converting the filtered frequency domain data into time domain data.
4. The method of claim 1, wherein in case it is determined that the spectral centroid is not less than a first preset threshold, the method further comprises:
the frame of audio data is output.
5. The method of claim 1, wherein the calculating the power spectral density of the frequency points in the frequency domain data comprises:
sampling the frequency domain data to obtain a plurality of frequency points, and calculating the power spectral densities of the frequency points;
calculating a spectral centroid of the frequency domain data by the power spectral density, comprising:
and calculating the frequency spectrum centroid of the frequency domain data according to the sampling rate and the power spectrum density.
6. The method of claim 1, wherein said calculating a spectral centroid of said frequency domain data from said power spectral density comprises:
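calculating a spectral centroid of the frequency domain data using the following equation:
$$SC = \frac{\sum_{k=1}^{L} \frac{k f_s}{2L} P_X(k)}{\sum_{k=1}^{L} P_X(k)}, \qquad P_X(k) = |X(k)|^2,$$
where SC denotes the spectral centroid, $f_s$ denotes the sampling rate, L denotes the total number of frequency points, $P_X(k)$ denotes the power spectral density of the k-th frequency point, X(k) denotes the frequency value of the k-th frequency point, and k is a positive integer not greater than L.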
7. The method of claim 3, wherein the filtering the frequency domain data with a predetermined filter to obtain filtered frequency domain data comprises:
the filtered frequency domain data is determined using the following equation:
Y(k)=H(k)*X(k),
wherein $H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)}$, Y(k) represents the filtered frequency domain data, H(k) represents the preset filter, X(k) represents the frequency value of the k-th frequency point, $P_X(k)$ represents the power spectral density of the k-th frequency point, $P_n(k)$ represents the most recently recorded wind noise power spectral density, and k is a positive integer.
8. A wind noise detection device, comprising:
the first conversion module is used for converting each frame of audio data into frequency domain data;
a first calculating module, configured to calculate a power spectral density of a frequency point in the frequency domain data;
a second calculating module, configured to calculate a spectral centroid of the frequency domain data according to the power spectral density;
the first judgment module is used for judging whether the spectral centroid is smaller than a first preset threshold;
the determining module is used for determining that wind noise exists in the frame of audio data when the judgment result of the first judgment module is yes;
the device further comprises:
a third judging module, configured to, when the first judging module judges that the result is yes, judge whether the spectrum centroid is greater than or equal to a third preset threshold, where the third preset threshold is smaller than the first preset threshold;
the recording module is used for recording the power spectral density corresponding to the frequency spectrum centroid as the wind noise power spectral density when the judgment result of the third judgment module is negative; the wind noise power spectral density is used for filtering the frame of audio data.
9. The apparatus of claim 8, further comprising:
the second judging module is used for judging whether the frequency spectrum centroid is smaller than a second preset threshold value when the first judging module judges that the frequency spectrum centroid is smaller than the first preset threshold value;
the zero setting module is used for setting the audio data of the frame to zero when the judgment result of the second judgment module is yes;
and the third preset threshold is greater than or equal to the second preset threshold.
10. The apparatus of claim 8, further comprising:
the filtering module is used for filtering the frequency domain data by using a preset filter to obtain filtered frequency domain data when the judgment result of the third judgment module is yes;
and the second conversion module is used for converting the filtered frequency domain data into time domain data.
11. The apparatus of claim 8, further comprising:
and the output module is used for outputting the frame of audio data when the judgment result of the first judgment module is negative.
12. The apparatus of claim 8, wherein the first computing module is specifically configured to:
sampling the frequency domain data to obtain a plurality of frequency points, and calculating the power spectrum density of the frequency points;
the second calculation module is specifically configured to:
and calculating the frequency spectrum centroid of the frequency domain data according to the sampling rate and the power spectrum density.
13. The apparatus of claim 8, wherein the second computing module is specifically configured to:
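calculating a spectral centroid of the frequency domain data using the following equation:
$$SC = \frac{\sum_{k=1}^{L} \frac{k f_s}{2L} P_X(k)}{\sum_{k=1}^{L} P_X(k)}, \qquad P_X(k) = |X(k)|^2,$$
where SC denotes the spectral centroid, $f_s$ denotes the sampling rate, L denotes the total number of frequency points, $P_X(k)$ denotes the power spectral density of the k-th frequency point, X(k) denotes the frequency value of the k-th frequency point, and k is a positive integer not greater than L.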
14. The apparatus of claim 10, wherein the filtering module is specifically configured to:
the filtered frequency domain data is determined using the following equation:
Y(k)=H(k)*X(k),
wherein $H(k) = \frac{P_X(k) - P_n(k)}{P_X(k)}$, Y(k) represents the filtered frequency domain data, H(k) represents the preset filter, X(k) represents the frequency value of the k-th frequency point, $P_X(k)$ represents the power spectral density of the k-th frequency point, $P_n(k)$ represents the wind noise power spectral density most recently recorded by the recording module, and k is a positive integer.
15. An electronic device comprising a processor and a memory, wherein,
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
16. A wind noise detection system, comprising: an audio acquisition device and a wind noise detection device, wherein,
the audio acquisition equipment is used for acquiring audio data and sending the acquired audio data to the wind noise detection equipment;
the wind noise detection device is used for receiving the audio data sent by the audio acquisition device and converting each frame of audio data into frequency domain data; calculating the power spectral density of the frequency points in the frequency domain data; calculating a spectral centroid of the frequency domain data from the power spectral density; judging whether the spectral centroid is smaller than a first preset threshold; if yes, determining that wind noise exists in the frame of audio data;
the wind noise detection device is further configured to, after determining that wind noise exists in the frame of audio data, determine whether the spectrum centroid is greater than or equal to a third preset threshold, where the third preset threshold is smaller than the first preset threshold; under the condition that the frequency spectrum centroid is judged to be smaller than the third preset threshold value, recording the power spectrum density corresponding to the frequency spectrum centroid as the wind noise power spectrum density; the wind noise power spectral density is used to filter the frame of audio data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710754716.4A CN109427345B (en) | 2017-08-29 | 2017-08-29 | Wind noise detection method, device and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710754716.4A CN109427345B (en) | 2017-08-29 | 2017-08-29 | Wind noise detection method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109427345A CN109427345A (en) | 2019-03-05 |
CN109427345B true CN109427345B (en) | 2022-12-02 |
Family
ID=65501950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710754716.4A Active CN109427345B (en) | Wind noise detection method, device and system | 2017-08-29 | 2017-08-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109427345B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112309420B (en) * | 2020-10-30 | 2023-06-27 | Mobvoi (Suzhou) Information Technology Co., Ltd. | Method and device for detecting wind noise |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101031963A (en) * | 2004-09-16 | 2007-09-05 | France Telecom | Method of processing a noisy sound signal and device for implementing said method |
US20110188685A1 (en) * | 2009-12-29 | 2011-08-04 | Sheikh Naim | Method for the detection of whistling in an audio system |
US20120008799A1 (en) * | 2009-04-03 | 2012-01-12 | Sascha Disch | Apparatus and method for determining a plurality of local center of gravity frequencies of a spectrum of an audio signal |
CN103345921A (en) * | 2013-07-15 | 2013-10-09 | Nanjing University of Science and Technology | Nighttime sleeping sound signal analyzing method based on multiple characteristics |
US20160203833A1 (en) * | 2013-08-30 | 2016-07-14 | Zte Corporation | Voice Activity Detection Method and Device |
US20160225388A1 (en) * | 2013-10-25 | 2016-08-04 | Intel IP Corporation | Audio processing devices and audio processing methods |
CN106463106A (en) * | 2014-07-14 | 2017-02-22 | Intel IP Corporation | Wind noise reduction for audio reception |
2017-08-29: CN201710754716.4A (CN), CN109427345B, Active
Also Published As
Publication number | Publication date |
---|---|
CN109427345A (en) | 2019-03-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108831499B (en) | Speech enhancement method using speech existence probability | |
CN106664486B (en) | Method and apparatus for wind noise detection | |
CN102160296B (en) | Method and apparatus for detecting double talk | |
US7508948B2 (en) | Reverberation removal | |
US7968786B2 (en) | Volume adjusting apparatus and volume adjusting method | |
WO2021114733A1 (en) | Noise suppression method for processing at different frequency bands, and system thereof | |
CN103871418B (en) | A kind of sound reinforcement system is uttered long and high-pitched sounds the detection method of frequency and device | |
JP2010112996A (en) | Voice processing device, voice processing method and program | |
CN106328151B (en) | ring noise eliminating system and application method thereof | |
CN109285554B (en) | Echo cancellation method, server, terminal and system | |
CN108305637B (en) | Earphone voice processing method, terminal equipment and storage medium | |
JP4816711B2 (en) | Call voice processing apparatus and call voice processing method | |
CN107274913A (en) | A kind of sound identification method and device | |
CN106448696A (en) | Adaptive high-pass filtering speech noise reduction method based on background noise estimation | |
Towsey | Noise removal from wave-forms and spectrograms derived from natural recordings of the environment | |
CN101808260A (en) | Audio dynamic feedback suppression method | |
CN112309417A (en) | Wind noise suppression audio signal processing method, device, system and readable medium | |
CN109427345B (en) | Wind noise detection method, device and system | |
US11170760B2 (en) | Detecting speech activity in real-time in audio signal | |
CN106571138B (en) | Signal endpoint detection method, detection device and detection equipment | |
CN105869652B (en) | Psychoacoustic model calculation method and device | |
CN104424954B (en) | noise estimation method and device | |
CN107548007B (en) | Detection method and device of audio signal acquisition equipment | |
CN114724575A (en) | Howling detection method, device and system | |
WO2015000401A1 (en) | Audio signal classification processing method, apparatus, and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |