CN112201267A - Audio processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112201267A
Authority
CN
China
Prior art keywords
audio signal
signal
current frame
processed
reverberation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010930871.9A
Other languages
Chinese (zh)
Inventor
李楠
张晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010930871.9A priority Critical patent/CN112201267A/en
Publication of CN112201267A publication Critical patent/CN112201267A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Abstract

The present disclosure relates to an audio processing method, an apparatus, an electronic device, and a storage medium. The method comprises: acquiring an audio signal to be processed; acquiring a noise signal included in the audio signal to be processed and a reverberation duration of the audio signal to be processed; determining a signal-to-noise ratio and a noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal, and determining a reverberation signal included in the audio signal to be processed according to the audio signal to be processed and the reverberation duration; and dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain a dereverberated audio signal. Because the technical scheme provided by the embodiments of the present disclosure takes the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal into account during dereverberation, the audio signal to be processed can be dereverberated well even when it contains a noise signal.

Description

Audio processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of audio technologies, and in particular, to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
Acoustic reverberation is a common physical phenomenon produced by the reflection of sound waves. When a microphone captures an audio signal, reverberation interferes with that signal, and severe reverberation reduces its intelligibility. Dereverberation of audio signals has therefore attracted attention in audio communication, high-quality voice capture and playback, and similar scenarios.
In the related art, a dereverberation method based on WPE (Weighted Prediction Error) is usually adopted to dereverberate an audio signal. However, this method depends heavily on the signal-to-noise ratio of the audio signal: when noise is present in the audio signal, the algorithm converges poorly, and the dereverberation effect is ultimately poor.
Disclosure of Invention
In order to solve the technical problem in the related art that dereverberation performs poorly when noise is present in the audio signal, the present disclosure provides an audio processing method, an apparatus, an electronic device and a storage medium. The technical scheme of the present disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided an audio processing method, including:
acquiring an audio signal to be processed;
acquiring a noise signal included in the audio signal to be processed and reverberation time of the audio signal to be processed;
determining a signal-to-noise ratio and a noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal, and determining a reverberation signal included in the audio signal to be processed according to the audio signal to be processed and the reverberation time length;
and dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain an audio signal after dereverberation.
Optionally, the dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal to obtain a dereverberated audio signal includes:
for any current frame audio signal of the audio signal to be processed, calculating a first gain factor corresponding to the current frame audio signal through the audio signal of the current frame audio signal, the reverberation signal of the current frame audio signal and a preset minimum dereverberation gain factor;
calculating a second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal;
smoothing the first gain factor and the second gain factor to obtain a target gain factor;
and performing dereverberation on the current frame audio signal through the target gain factor to obtain a dereverberated audio signal corresponding to the current frame audio signal.
Optionally, when the signal-to-noise ratio of the current frame audio signal is smaller than a preset signal-to-noise ratio and the reverberation duration of the current frame audio signal is greater than a preset reverberation duration, calculating a second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal includes:
calculating the dereverberation and denoising scale factor of the current frame audio signal according to the following formula:
Figure BDA0002670178180000021
wherein γ is the dereverberation and denoising scale factor of the current frame audio signal, and SNR(n) is the signal-to-noise ratio corresponding to the current frame audio signal;
the second gain factor is calculated according to the following formula:
Figure BDA0002670178180000022
wherein G_tmp is the second gain factor, G_dereverb(n) is the first gain factor corresponding to the current frame audio signal, and G_denoise(n) is the noise reduction gain factor corresponding to the current frame audio signal.
Optionally, when the signal-to-noise ratio of the current frame audio signal is greater than a preset signal-to-noise ratio, or the reverberation duration of the current frame audio signal is less than a preset reverberation duration, calculating a second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal includes:
and determining the noise reduction gain factor corresponding to the current frame audio signal as a second gain factor.
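The two optional branches above can be sketched as a single selection function. This is a minimal sketch, not the disclosed implementation: the formulas for γ and G_tmp are not reproduced in this text, so the `combine` argument is a hypothetical caller-supplied stand-in for them, and the function and parameter names are illustrative. The text does not say which branch applies when a value equals its threshold; the sketch assumes that case falls through to the noise reduction gain.

```python
def second_gain(g_dereverb, g_denoise, snr, rt60,
                snr_threshold, rt60_threshold, combine):
    """Select the second gain factor for one frame.

    `combine` is a hypothetical callable standing in for the gamma and
    G_tmp formulas, which are not reproduced in this text.
    """
    # Low SNR together with long reverberation: blend both gains.
    if snr < snr_threshold and rt60 > rt60_threshold:
        return combine(g_dereverb, g_denoise, snr)
    # Otherwise (including equality, an assumption) use the noise reduction gain.
    return g_denoise


def placeholder_combine(g_dereverb, g_denoise, snr):
    """Placeholder only: NOT the formula from the disclosure."""
    return min(g_dereverb, g_denoise)


noisy_reverberant = second_gain(0.5, 0.8, snr=5.0, rt60=0.8,
                                snr_threshold=10.0, rt60_threshold=0.3,
                                combine=placeholder_combine)
clean_frame = second_gain(0.5, 0.8, snr=20.0, rt60=0.8,
                          snr_threshold=10.0, rt60_threshold=0.3,
                          combine=placeholder_combine)
```

Passing the combination rule in as an argument keeps the branching logic, which the text does specify, separate from the part that it does not.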
Optionally, the determining the reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation time length includes:
for any current frame audio signal of the audio signals to be processed, calculating the energy of the previous frame audio signal of the current frame audio signal after the attenuation of the excitation energy vector;
determining the excitation energy vector of the current frame audio signal according to the maximum value between the energy of the previous frame audio signal after the attenuation of the excitation energy vector and the energy of the current frame audio signal;
and determining the reverberation signal of the current frame audio signal according to the excitation energy vector of the current frame audio signal, the time interval between two adjacent frames of the audio signal to be processed and the reverberation duration corresponding to the current frame audio signal.
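The first two steps above amount to a peak-hold envelope: the previous frame's excitation energy is attenuated, then compared with the current frame's energy, and the larger value is kept. The sketch below illustrates that recursion under a stated assumption: the attenuation formula is not reproduced in this text, so the code assumes an exponential decay in which energy falls by 60 dB over the reverberation duration. The function names are illustrative, and the third step (deriving the reverberation signal) is not sketched because its formula is likewise unavailable here.

```python
def decay_per_frame(rt60, t1):
    """Energy decay factor over one frame interval t1 (seconds).

    Assumption: energy drops by 60 dB (a factor of 1e-6) over rt60 seconds.
    """
    return 10.0 ** (-6.0 * t1 / rt60)


def excitation_energy(frame_energies, rt60_per_frame, t1):
    """Track the excitation energy Ra(n) for each frame."""
    ra = []
    previous = 0.0
    for n, energy in enumerate(frame_energies):
        # For the first frame the attenuated energy is defined as 0.
        if n == 0:
            attenuated = 0.0
        else:
            attenuated = previous * decay_per_frame(rt60_per_frame[n], t1)
        # Keep the larger of the decayed previous value and the new energy.
        previous = max(attenuated, energy)
        ra.append(previous)
    return ra


# One loud frame followed by silence: the envelope decays frame by frame.
envelope = excitation_energy([1.0, 0.0, 0.0], [0.6, 0.6, 0.6], t1=0.1)
```

With a reverberation duration of 0.6 s and a frame interval of 0.1 s, the assumed decay is a factor of about 0.1 per frame, so the envelope in the example falls from 1.0 to roughly 0.1 and then 0.01.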
Optionally, the calculating, for any current frame of audio signals of the audio signals to be processed, energy after attenuation of an excitation energy vector of an audio signal of a previous frame of the current frame of audio signals includes:
when a current frame audio signal of the audio signal to be processed is a first frame audio signal of the audio signal to be processed, determining the energy of the current frame audio signal after attenuation of an excitation energy vector as 0;
when the current frame audio signal of the audio signal to be processed is not the first frame audio signal of the audio signal to be processed, calculating the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000031
wherein R(n) is the energy of the excitation energy vector of the previous frame audio signal of the current frame audio signal after attenuation, Ra(n-1) is the excitation energy vector of the previous frame audio signal of the current frame audio signal, and T1 is the time interval between two adjacent frames of the audio signal to be processed.
Optionally, the determining the reverberation signal of the current frame audio signal according to the excitation energy vector of the current frame audio signal, the time interval between two adjacent frames of the audio signal to be processed, and the reverberation duration corresponding to the current frame audio signal includes:
calculating a reverberation signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000032
M=RT60(n)/T1
wherein sr(n) is the reverberation signal of the current frame audio signal, RT60(n) is the reverberation duration corresponding to the current frame audio signal, and M is the number of frames within the reverberation duration corresponding to the current frame audio signal; Ra(n-M) is the energy after attenuation of the excitation energy vector of the previous M frames of the current frame audio signal.
Optionally, the acquiring the audio signal to be processed includes:
acquiring an original audio signal;
carrying out short-time Fourier transform on the original audio signal to obtain a time-frequency domain signal of the original audio signal;
and determining the time-frequency domain signal as the audio signal to be processed.
Optionally, the method further includes:
and removing the noise signal included in the audio signal to be processed by the noise reduction gain factor.
According to a second aspect of the embodiments of the present disclosure, there is provided an audio processing apparatus including:
an audio signal acquisition module configured to perform acquisition of an audio signal to be processed;
a noise signal and reverberation duration acquisition module configured to perform acquisition of a noise signal included in the audio signal to be processed and a reverberation duration of the audio signal to be processed;
a signal-to-noise ratio and reverberation signal determination module configured to determine a signal-to-noise ratio and a noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal, and determine a reverberation signal included in the audio signal to be processed according to the audio signal to be processed and a reverberation time length;
and the dereverberation module is configured to dereverberate the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain a dereverberated audio signal.
Optionally, the dereverberation module includes:
a first gain factor calculation unit configured to calculate, for any current frame audio signal of the audio signals to be processed, a first gain factor corresponding to the current frame audio signal through an audio signal of the current frame audio signal, a reverberation signal of the current frame audio signal, and a preset minimum dereverberation gain factor;
a second gain factor calculation unit configured to calculate a second gain factor according to the first gain factor and a noise reduction gain factor corresponding to the current frame audio signal;
a target gain factor calculation unit configured to perform smoothing processing on the first gain factor and the second gain factor to obtain a target gain factor;
and the dereverberation unit is configured to perform dereverberation on the current frame audio signal through the target gain factor to obtain a dereverberated audio signal corresponding to the current frame audio signal.
Optionally, when the signal-to-noise ratio of the current frame audio signal is smaller than a preset signal-to-noise ratio and the reverberation duration of the current frame audio signal is greater than a preset reverberation duration, the second gain factor calculation unit is specifically configured to perform:
calculating the dereverberation and denoising scale factor of the current frame audio signal according to the following formula:
Figure BDA0002670178180000051
wherein γ is the dereverberation and denoising scale factor of the current frame audio signal, and SNR(n) is the signal-to-noise ratio corresponding to the current frame audio signal;
the second gain factor is calculated according to the following formula:
Figure BDA0002670178180000052
wherein G_tmp is the second gain factor, G_dereverb(n) is the first gain factor corresponding to the current frame audio signal, and G_denoise(n) is the noise reduction gain factor corresponding to the current frame audio signal.
Optionally, when the signal-to-noise ratio of the current frame audio signal is greater than a preset signal-to-noise ratio, or the reverberation duration of the current frame audio signal is less than a preset reverberation duration, the second gain factor calculation unit is specifically configured to perform:
and determining the noise reduction gain factor corresponding to the current frame audio signal as a second gain factor.
Optionally, the signal-to-noise ratio and reverberation signal determining module includes:
the energy calculation unit is configured to calculate the energy of an excitation energy vector of an audio signal in the previous frame of the audio signal of the current frame after attenuation for any current frame of the audio signal to be processed;
an excitation energy vector determination unit configured to perform determining an excitation energy vector of the current frame audio signal according to a maximum value between an energy of the previous frame audio signal after attenuation of the excitation energy vector and an energy of the current frame audio signal;
and the reverberation signal determination unit is configured to determine the reverberation signal of the current frame audio signal according to the excitation energy vector of the current frame audio signal, the time interval between two adjacent frames of the audio signal to be processed and the reverberation time length corresponding to the current frame audio signal.
Optionally, the energy calculating unit is specifically configured to perform:
when a current frame audio signal of the audio signal to be processed is a first frame audio signal of the audio signal to be processed, determining the energy of the current frame audio signal after attenuation of an excitation energy vector as 0;
when the current frame audio signal of the audio signal to be processed is not the first frame audio signal of the audio signal to be processed, calculating the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000061
wherein R(n) is the energy of the excitation energy vector of the previous frame audio signal of the current frame audio signal after attenuation, Ra(n-1) is the excitation energy vector of the previous frame audio signal of the current frame audio signal, and T1 is the time interval between two adjacent frames of the audio signal to be processed.
Optionally, the reverberation signal determination unit is specifically configured to perform:
calculating a reverberation signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000062
M=RT60(n)/T1
wherein sr(n) is the reverberation signal of the current frame audio signal, RT60(n) is the reverberation duration corresponding to the current frame audio signal, and M is the number of frames within the reverberation duration corresponding to the current frame audio signal; Ra(n-M) is the energy after attenuation of the excitation energy vector of the previous M frames of the current frame audio signal.
Optionally, the audio signal obtaining module is specifically configured to perform:
acquiring an original audio signal;
carrying out short-time Fourier transform on the original audio signal to obtain a time-frequency domain signal of the original audio signal;
and determining the time-frequency domain signal as the audio signal to be processed.
Optionally, the apparatus further comprises:
a denoising module configured to perform denoising of a noise signal included in the audio signal to be processed by the denoising gain factor.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio processing method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the audio processing method of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to carry out the audio processing method of the first aspect.
According to the technical scheme provided by the embodiment of the disclosure, an audio signal to be processed is obtained; acquiring a noise signal included in an audio signal to be processed and reverberation time of the audio signal to be processed; determining the signal-to-noise ratio and the noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal; determining a reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation time; and removing reverberation of the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain the audio signal after removing reverberation.
Therefore, by the technical scheme provided by the embodiment of the disclosure, when the audio signal to be processed is dereverberated, the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal are considered, so that the audio signal to be processed can be dereverberated well when the noise signal exists in the audio signal to be processed. Moreover, the disturbance of the reverberation on the audio signal can be stably reduced, meanwhile, the audio signal after the reverberation is removed cannot be distorted, and the quality and the intelligibility of the audio signal of a real-time communication scene are improved.
Drawings
FIG. 1 is a flow diagram illustrating a method of audio processing according to an exemplary embodiment;
FIG. 2 is a flowchart of one implementation of step S14 in the embodiment of FIG. 1;
FIG. 3 is a flowchart illustrating one embodiment of determining a reverberation signal included in an audio signal to be processed according to the audio signal to be processed and a reverberation time duration according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating an audio processing procedure in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating an audio processing device according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating an audio processing device according to an example embodiment;
FIG. 8 is a block diagram illustrating another audio processing device according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to solve the technical problem in the related art that dereverberation performs poorly when noise is present in the audio signal, embodiments of the present disclosure provide an audio processing method, an apparatus, an electronic device and a storage medium.
In a first aspect, an audio processing method provided by an embodiment of the present disclosure is first explained in detail.
As shown in fig. 1, an audio processing method provided in an embodiment of the present disclosure may include the following steps:
In step S11, an audio signal to be processed is acquired.
Specifically, the audio signal to be processed is an audio signal to be dereverberated. In practical applications, the audio signal to be processed usually includes signal components such as a speech signal, a noise signal and a reverberation signal.
In one embodiment, acquiring the audio signal to be processed may include the following steps a1 to a3:
Step a1, acquiring an original audio signal.
Step a2, performing short-time Fourier transform on the original audio signal to obtain a time-frequency domain signal of the original audio signal.
Step a3, determining the time-frequency domain signal as the audio signal to be processed.
In this embodiment, the original audio signal may be converted into a time-frequency domain signal by a short-time Fourier transform, as follows:
X(n)=STFT(x(t))
wherein x(t) is the time-domain audio signal, X(n) is the time-frequency domain audio signal, n is the frame index, 0 < n ≤ N, and N is the total number of frames. It should be noted that, in the embodiments of the present disclosure, the processing is the same for every frequency band, so the symbol indicating the frequency band is omitted from the frequency-domain signal. After the original audio signal is converted into a time-frequency domain signal, the time-frequency domain signal may be determined as the audio signal to be processed.
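As an illustration of steps a1 to a3, the sketch below computes a short-time Fourier transform with the Python standard library only. The frame length, hop size and Hann window are assumptions chosen for the example, since the text does not specify them, and the function name `stft` is illustrative.

```python
import cmath
import math


def stft(x, frame_len=8, hop=4):
    """Return a list of frames; each frame is a list of complex DFT bins."""
    # Hann window (an assumed choice; the text does not name a window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = [x[start + i] * window[i] for i in range(frame_len)]
        # Direct DFT, keeping only the non-negative frequency bins.
        bins = [sum(segment[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                    for i in range(frame_len))
                for k in range(frame_len // 2 + 1)]
        frames.append(bins)
    return frames


# x(t): a sinusoid that completes two cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(32)]
X = stft(signal)

# Per-band energy |X(n)|^2 of the first frame; it peaks in band 2.
energy_frame0 = [abs(b) ** 2 for b in X[0]]
peak_band = energy_frame0.index(max(energy_frame0))
```

Each entry of `X` corresponds to one frame index n, and each bin within a frame is one frequency band. The later per-frame formulas would then be applied to every band independently, which is why the text drops the band index.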
In step S12, a noise signal included in the audio signal to be processed and a reverberation time period of the audio signal to be processed are acquired.
Specifically, after the audio signal to be processed is acquired, the noise signal included in the audio signal to be processed may be extracted through methods such as stationary noise estimation based on a time window, noise estimation based on statistics, and the like. Those skilled in the art should understand that, for a specific implementation process of extracting a noise signal based on stationary noise estimation of a time window or based on statistical noise estimation, details of the embodiment of the disclosure are not repeated here. In addition, the embodiment of the present disclosure does not specifically limit the way of extracting the noise signal included in the audio signal to be processed. In addition, in practical applications, only stationary noise signals included in the audio signal to be processed may be extracted, and non-stationary noise signals included in the signal to be processed may not be extracted.
The reverberation duration of the audio signal to be processed may be obtained from the attenuation characteristics of the audio signal to be processed. Specifically, the audio signal to be processed may be input into a pre-trained reverberation duration estimation model. The model extracts the attenuation characteristics of each frame of audio signal included in the audio signal to be processed and, based on those characteristics, obtains the reverberation duration corresponding to each frame of audio signal. That is, the model outputs RT60(n), where RT60(n) is the reverberation duration corresponding to the nth frame of audio signal.
Of course, the reverberation time of the audio signal to be processed may also be obtained through other implementation manners, and the specific implementation manner of obtaining the reverberation time of the audio signal to be processed through the attenuation feature of the audio signal to be processed is not particularly limited in the embodiment of the present disclosure.
In step S13, the signal-to-noise ratio and the noise reduction gain factor of the audio signal to be processed are determined according to the audio signal to be processed and the noise signal, and the reverberation signal included in the audio signal to be processed is determined according to the audio signal to be processed and the reverberation time length.
Specifically, after the audio signal to be processed and the noise signal are obtained, the signal-to-noise ratio and the noise reduction gain factor of each frame of audio signal included in the audio signal to be processed may be estimated from the audio signal to be processed X(n) and the noise signal Noise(n), so as to obtain the signal-to-noise ratio and the noise reduction gain factor of the audio signal to be processed.
The following formula can be used to calculate the signal-to-noise ratio of each frame of audio signal included in the audio signal to be processed:
Figure BDA0002670178180000101
wherein SNR(n) represents the signal-to-noise ratio corresponding to the nth frame of audio signal, X(n) is the signal voltage corresponding to the nth frame of audio signal, and Noise(n) is the signal voltage corresponding to the noise signal of the nth frame.
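The formula image for the signal-to-noise ratio is not reproduced in this text. As a rough illustration only, the sketch below assumes the conventional decibel definition for voltage-like quantities, 20·log10 of the ratio of signal magnitude to noise magnitude. The function name and the small `eps` guard against division by zero are assumptions of the example, not part of the disclosure.

```python
import math


def snr_db(signal_magnitude, noise_magnitude, eps=1e-12):
    """SNR in dB from signal and noise magnitudes (assumed definition)."""
    # eps keeps the ratio finite when either magnitude is zero.
    return 20.0 * math.log10((signal_magnitude + eps) / (noise_magnitude + eps))


# A signal ten times larger than the noise gives about 20 dB.
snr_example = snr_db(1.0, 0.1)
```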
The noise reduction gain factor of each frame of audio signal included in the audio signal to be processed can be calculated by using the following formula:
Figure BDA0002670178180000102
The reverberation signal of the audio signal to be processed can also be determined according to the audio signal to be processed and the reverberation duration. Specifically, after the reverberation duration of each frame of audio signal included in the audio signal to be processed is obtained, the reverberation signal of the audio signal to be processed may be obtained according to the audio signal to be processed and the reverberation duration of each frame of audio signal.
For clarity of the description of the scheme, a specific implementation manner of determining the reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation time length will be explained in detail in the following embodiments.
In step S14, the audio signal to be processed is dereverberated according to the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal, so as to obtain a dereverberated audio signal.
Specifically, after the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal of the audio signal to be processed are obtained, the reverberation of the audio signal to be processed may be removed according to the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal, so as to obtain the audio signal after being removed from the reverberation. Therefore, according to the technical scheme provided by the embodiment of the disclosure, when the audio signal to be processed is dereverberated, the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal are considered, so that the audio signal to be processed can be dereverberated well when noise exists in the audio signal to be processed.
For clarity of the description of the scheme, a specific implementation of step S14, dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain a dereverberated audio signal will be described in detail in the following embodiments.
According to the technical scheme provided by the embodiment of the disclosure, an audio signal to be processed is obtained; acquiring a noise signal included in an audio signal to be processed and reverberation time of the audio signal to be processed; determining the signal-to-noise ratio and the noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal; determining a reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation time length; and removing reverberation of the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain the audio signal after removing reverberation.
Therefore, by the technical scheme provided by the embodiment of the disclosure, when the audio signal to be processed is dereverberated, the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal are considered, so that the audio signal to be processed can be dereverberated well when the noise signal exists in the audio signal to be processed. Moreover, the disturbance of the reverberation on the audio signal can be stably reduced, meanwhile, the audio signal after the reverberation is removed cannot be distorted, and the quality and the intelligibility of the audio signal of a real-time communication scene are improved.
In one embodiment, as shown in fig. 2, dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal to obtain a dereverberated audio signal may include the following steps:
in step S141, for any current frame audio signal of the audio signal to be processed, a first gain factor corresponding to the current frame audio signal is calculated from the current frame audio signal, the reverberation signal of the current frame audio signal, and a preset minimum dereverberation gain factor.
Specifically, the current frame audio signal may be any frame of the audio signal to be processed. The first gain factor corresponding to the current frame audio signal may be calculated according to the following formula:
Figure BDA0002670178180000111
wherein Gdereverb(n) is the first gain factor corresponding to the current frame audio signal, |X(n)|^2 is the energy of the current frame audio signal, Sr(n) is the reverberation signal corresponding to the current frame audio signal, and lambda is a preset minimum dereverberation gain factor. The value of lambda can be set according to the maximum amount of dereverberation required; for example, lambda can be 0.1.
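The formula image itself is not reproduced in this text. As a minimal sketch only, the code below assumes a common spectral-subtraction form, Gdereverb(n) = max((|X(n)|^2 - Sr(n)) / |X(n)|^2, lambda). That form, the function name and the `eps` guard are assumptions for illustration and are not confirmed by the disclosure.

```python
def first_gain_factor(frame_energy, reverb_energy, min_gain=0.1, eps=1e-12):
    """Assumed spectral-subtraction gain for one frequency bin, floored at min_gain.

    frame_energy  -- |X(n)|^2, energy of the current frame in this bin
    reverb_energy -- Sr(n), estimated reverberation energy in this bin
    min_gain      -- lambda, the preset minimum dereverberation gain factor
    eps           -- hypothetical guard against division by zero
    """
    # Assumption: the gain is the fraction of energy left after removing Sr(n).
    raw = (frame_energy - reverb_energy) / max(frame_energy, eps)
    # The floor limits how strongly a frame can be attenuated.
    return max(raw, min_gain)
```

With lambda = 0.1, a bin whose estimated reverberation exceeds its energy is attenuated to one tenth of its amplitude rather than being zeroed.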
In step S142, a second gain factor is calculated according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal.
Specifically, after the first gain factor is obtained through calculation, the second gain factor may be calculated according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal.
Calculating the second gain factor accurately allows the audio signal to be processed to be dereverberated more effectively in the subsequent steps. In practical applications, two cases arise when calculating the second gain factor.
In the first case, as an implementation manner of the embodiment of the present disclosure, the signal-to-noise ratio of the current frame audio signal is smaller than a preset signal-to-noise ratio, and the reverberation duration of the current frame audio signal is greater than a preset reverberation duration. For example, the preset signal-to-noise ratio may be 20 dB, and the preset reverberation duration may be 300 ms.
At this time, calculating the second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal may include the following steps:
calculating the dereverberation and de-noise scale factor of the current frame audio signal according to the following formula:
Figure BDA0002670178180000121
wherein gamma is the dereverberation and de-noise scale factor of the current frame audio signal, and SNR (n) is the signal-to-noise ratio corresponding to the current frame audio signal;
the second gain factor is calculated according to the following formula:
Figure BDA0002670178180000122
wherein Gtmp is the second gain factor, Gdereverb(n) is the first gain factor corresponding to the current frame audio signal, and Gdenoise(n) is the noise reduction gain factor corresponding to the current frame audio signal.
In the second case, as another implementation manner of the embodiment of the present disclosure, the signal-to-noise ratio of the current frame audio signal is greater than the preset signal-to-noise ratio, or the reverberation duration of the current frame audio signal is less than the preset reverberation duration. For example, the preset signal-to-noise ratio may be 20 dB, and the preset reverberation duration may be 300 ms.
At this time, calculating a second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal includes:
and determining the noise reduction gain factor corresponding to the current frame audio signal as a second gain factor.
That is, Gtmp = Gdenoise(n).
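The choice between the two cases can be sketched as follows. The thresholds are the example values from the text. Because the two formula images are not reproduced here, the combination of the gains in the first case is delegated to a caller-supplied `combine` function, which is a hypothetical stand-in rather than the formula of the disclosure. Frames that sit exactly on a threshold fall into the second branch in this sketch, a point the text leaves open.

```python
SNR_THRESHOLD_DB = 20.0    # preset signal-to-noise ratio (example value from the text)
RT60_THRESHOLD_MS = 300.0  # preset reverberation duration (example value from the text)


def second_gain_factor(g_dereverb, g_denoise, snr_db, rt60_ms, combine):
    """Select the second gain factor Gtmp for the current frame.

    combine -- hypothetical callable (g_dereverb, g_denoise, snr_db) -> Gtmp
               standing in for the scale-factor formulas not reproduced here.
    """
    if snr_db < SNR_THRESHOLD_DB and rt60_ms > RT60_THRESHOLD_MS:
        # Noisy and reverberant frame: weigh dereverberation against noise reduction.
        return combine(g_dereverb, g_denoise, snr_db)
    # Otherwise the noise reduction gain factor is used directly.
    return g_denoise
```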
In step S143, the first gain factor and the second gain factor are smoothed to obtain a target gain factor.
After the first gain factor and the second gain factor are calculated, the first gain factor and the second gain factor may be smoothed to obtain a target gain factor for final dereverberation.
Specifically, the first gain factor and the second gain factor may be smoothed according to the following formula:
G(n)=smooth*Gdereverb(n-1)+(1-smooth)*Gtmp
When n is 1, Gdereverb(n-1) is the initialized gain factor, i.e. Gdereverb(0) = 1. Here smooth is a smoothing coefficient, which may take a value close to 1, such as 0.9.
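A minimal sketch of this smoothing step, following the formula exactly as written above, is given below. The function name and the list-based interface are assumptions made for illustration.

```python
def smooth_gains(first_gains, second_gains, smooth=0.9, initial=1.0):
    """Return G(n) for n = 1..N using G(n) = smooth*Gdereverb(n-1) + (1-smooth)*Gtmp.

    first_gains[i]  -- Gdereverb(i+1), the first gain factor of frame i+1
    second_gains[i] -- Gtmp of frame i+1
    initial         -- Gdereverb(0), the initialized gain factor
    """
    target = []
    previous = initial  # Gdereverb(0) = 1 for the first frame
    for g_dereverb, g_tmp in zip(first_gains, second_gains):
        target.append(smooth * previous + (1.0 - smooth) * g_tmp)
        previous = g_dereverb  # becomes Gdereverb(n-1) for the next frame
    return target
```

With smooth = 0.9, each target gain is dominated by the previous frame's first gain factor, so abrupt frame-to-frame changes in Gtmp are damped.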
In step S144, dereverberation is performed on the current frame audio signal by using the target gain factor, so as to obtain a dereverberated audio signal corresponding to the current frame audio signal.
Specifically, after the target gain factor G(n) corresponding to the current frame audio signal is obtained, dereverberation may be performed on the current frame audio signal through the target gain factor, so as to obtain a dereverberated audio signal corresponding to the current frame audio signal.
The dereverberated audio signal corresponding to the current frame audio signal can be obtained according to the following formula:
Y(n)=G(n)*X(n)
wherein Y(n) is the dereverberated audio signal corresponding to the current frame audio signal, G(n) is the target gain factor corresponding to the current frame audio signal, and X(n) is the current frame audio signal.
Since the current frame audio signal may be any frame of the audio signal to be processed, once the dereverberated audio signals corresponding to all frames are obtained, the dereverberated audio signal of the audio signal to be processed is obtained.
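The per-frame multiplication Y(n) = G(n)*X(n) can be sketched as below. The sketch assumes that each frame is a list of complex time-frequency bins and that one scalar gain is applied per frame; a per-bin gain would work the same way, and both the data layout and the function name are illustrative assumptions.

```python
def apply_gain(frames, gains):
    """Scale each frame X(n), a list of complex bins, by its target gain G(n)."""
    if len(frames) != len(gains):
        raise ValueError("one gain per frame is required")
    # Y(n) = G(n) * X(n), applied to every bin of frame n.
    return [[g * x for x in frame] for frame, g in zip(frames, gains)]
```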
Therefore, by the technical scheme provided by the embodiment of the disclosure, when the audio signal to be processed is dereverberated, the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal are considered, so that the audio signal to be processed can be dereverberated well when the noise signal exists in the audio signal to be processed. Moreover, the disturbance of the reverberation on the audio signal can be stably reduced, meanwhile, the audio signal after the reverberation is removed cannot be distorted, and the quality and the intelligibility of the audio signal of a real-time communication scene are improved.
For clarity of the description of the scheme, a specific implementation manner of determining the reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation time length will be explained in detail in the following embodiments.
In one embodiment, determining the reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation time length, as shown in fig. 3, may include the following steps:
in step S131, for any current frame of audio signals of the audio signals to be processed, the energy after attenuation of the excitation energy vector of the previous frame of audio signals of the current frame of audio signals is calculated.
Specifically, if the current frame audio signal of the audio signal to be processed is the first frame audio signal of the audio signal to be processed, the current frame audio signal has no previous frame audio signal, and therefore no excitation energy vector of a previous frame audio signal exists. In this case, the excitation energy vector of the previous frame audio signal may be initialized to 0, that is, Ra(n-1) = 0, so the energy after attenuation of the excitation energy vector of the previous frame audio signal is also 0.
If the current frame audio signal of the audio signal to be processed is not the first frame audio signal of the audio signal to be processed, the current frame audio signal has the previous frame audio signal, and therefore, the excitation energy vector of the previous frame audio signal also exists, and at this time, the energy after the attenuation of the excitation energy vector of the previous frame audio signal is not 0.
As an implementation manner of the embodiment of the present disclosure, calculating, for any current frame audio signal of the audio signal to be processed, the energy after attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal may include the following step b1 and step b2:
Step b1, when the current frame audio signal of the audio signal to be processed is the first frame audio signal of the audio signal to be processed, determining the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal as 0.
As can be seen from the above description, when the current frame audio signal is the first frame audio signal, it has no previous frame audio signal, and therefore the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal can be determined as 0.
Step b2, when the current frame audio signal of the audio signal to be processed is not the first frame audio signal of the audio signal to be processed, calculating the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000151
wherein R(n) is the energy after attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal, Ra(n-1) is the excitation energy vector of the previous frame audio signal of the current frame audio signal, and T1 is the time interval between two adjacent frames of the audio signal to be processed.
In step S132, the excitation energy vector of the audio signal of the current frame is determined according to the maximum value between the attenuated energy of the excitation energy vector of the audio signal of the previous frame and the energy of the audio signal of the current frame.
Specifically, after the energy after attenuation of the excitation energy vector of the previous frame audio signal and the energy of the current frame audio signal are obtained, the two are compared and the larger value is taken as the excitation energy vector of the current frame audio signal. The specific formula is as follows:
Ra(n)=max(R(n),|X(n)|^2)
wherein Ra(n) is the excitation energy vector of the current frame audio signal, R(n) is the energy after attenuation of the excitation energy vector of the previous frame audio signal, and |X(n)|^2 is the energy of the current frame audio signal.
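Steps S131 and S132 can be sketched for a single frequency bin as follows. The attenuation formula is not reproduced in this text, so the sketch models it as multiplication by a per-frame factor `decay`; that factor, its default value and the function names are hypothetical stand-ins rather than the formula of the disclosure.

```python
def update_excitation(prev_excitation, frame_energy, decay):
    """One step of Ra(n) = max(R(n), |X(n)|^2) for a single frequency bin.

    prev_excitation -- Ra(n-1), or None when the current frame is the first frame
    frame_energy    -- |X(n)|^2
    decay           -- hypothetical attenuation factor in (0, 1]; R(n) = decay * Ra(n-1)
                       is an assumed stand-in for the formula not reproduced here
    """
    # Step b1: the first frame has no previous frame, so R(n) = 0.
    attenuated = 0.0 if prev_excitation is None else decay * prev_excitation
    return max(attenuated, frame_energy)


def track_excitation(energies, decay=0.8):
    """Return Ra(n) for every frame of a sequence of frame energies."""
    history, prev = [], None
    for energy in energies:
        prev = update_excitation(prev, energy, decay)
        history.append(prev)
    return history
```

The tracker follows a loud frame immediately and then decays, so quieter frames that follow inherit a decaying envelope instead of their own lower energy.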
In step S133, a reverberation signal of the current frame audio signal is determined according to the excitation energy vector of the current frame audio signal, a time interval between two adjacent frames of the audio signal to be processed, and a reverberation duration corresponding to the current frame audio signal.
Specifically, after obtaining the excitation energy vector of the current frame audio signal, the reverberation signal of the current frame audio signal can be obtained through the excitation energy vector of the current frame audio signal, a time interval between two adjacent frames of the audio signal to be processed, and a reverberation duration corresponding to the current frame audio signal.
As an implementation manner of the embodiment of the present disclosure, determining a reverberation signal of a current frame audio signal according to an excitation energy vector of the current frame audio signal, a time interval between two adjacent frames of an audio signal to be processed, and a reverberation duration corresponding to the current frame audio signal may include the following steps:
calculating the reverberation signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000161
M=RT60(n)/T1
wherein Sr(n) is the reverberation signal of the current frame audio signal, RT60(n) is the reverberation duration corresponding to the current frame audio signal, and M is the number of frames within the reverberation duration corresponding to the current frame audio signal; Ra(n-M) is the energy after attenuation of the excitation energy vector of the previous M frames of the current frame audio signal.
Since the current frame audio signal may be any frame of the audio signal to be processed, once the reverberation signals of all frames are obtained, the reverberation signal included in the audio signal to be processed is obtained.
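The frame lag M = RT60(n)/T1 and the lookup of Ra(n-M) can be sketched as below. The formula that maps Ra(n-M) to Sr(n) is not reproduced in this text and is therefore left out. Rounding M to a whole number of at least one frame, and returning 0 when frame n-M does not exist, are assumptions made for this sketch.

```python
def reverb_lag_frames(rt60_ms, frame_interval_ms):
    """M = RT60(n) / T1, rounded to a whole number of frames (rounding is an assumption)."""
    if frame_interval_ms <= 0:
        raise ValueError("frame interval must be positive")
    return max(1, round(rt60_ms / frame_interval_ms))


def lagged_excitation(history, n, m):
    """Return Ra(n-M) from a 0-based history list, or 0.0 if frame n-M does not exist."""
    index = n - m
    return history[index] if index >= 0 else 0.0
```

For example, with RT60 = 300 ms and a 10 ms frame interval, M is 30, so the reverberation estimate for frame n draws on the excitation energy from 30 frames earlier.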
Therefore, according to the technical scheme provided by the embodiment, the reverberation time of the audio signal to be processed is estimated, and the excitation energy vector is utilized to accurately and efficiently determine the reverberation signal included in the audio signal to be processed, so that the subsequent dereverberation of the audio signal to be processed is facilitated.
On the basis of the foregoing embodiment, in order to further improve the signal quality of the dereverberated audio signal, in an implementation manner, the audio processing method may further include the following steps:
and removing the noise signal included in the audio signal to be processed by a noise reduction gain factor.
In this embodiment, not only the audio signal to be processed may be dereverberated, but also the audio signal to be processed may be denoised, so that the signal quality of the processed audio signal is higher.
For clarity of description, the audio processing method provided by the embodiment of the present disclosure will be described in detail with reference to a specific example, as shown in fig. 4.
In practical application, the system may include the following modules: a stationary noise estimation module, a signal-to-noise ratio estimation module, a reverberation time estimation module, a reverberation spectrum estimation module and a reverberation elimination module.
The audio input is an audio signal to be processed collected by the system microphone module, and generally includes signal components such as a speech signal, a noise signal, and a reverberation signal.
First, the audio signal to be processed is input into the stationary noise estimation module, which estimates the stationary noise signal included in the audio signal to be processed.
Second, the signal-to-noise ratio estimation module estimates the signal-to-noise ratio using the noise signal estimation result output by the stationary noise estimation module, and at the same time calculates the gain factor for removing noise, namely the noise reduction gain factor.
Third, the audio signal to be processed is input to the reverberation time estimation module, which estimates the ambient reverberation level of the audio signal to be processed, that is, the reverberation duration RT60 of the audio signal to be processed.
Then, reverberation spectrum estimation is performed with the reverberation duration RT60 obtained by the reverberation time estimation module as a reference, to obtain the reverberation spectrum of the audio signal to be processed, that is, the reverberation signal included in the audio signal to be processed.
Finally, the reverberation signal and the stationary noise signal in the audio signal to be processed are eliminated simultaneously using the estimated reverberation spectrum, the estimated signal-to-noise ratio, the noise reduction gain factor and other such information, to obtain the output audio without reverberation, namely the processed audio signal.
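The order in which the five modules exchange data can be sketched as a single function. Every callable passed in is a hypothetical stand-in for the corresponding module, and the argument lists are assumptions chosen only to show the data flow, not interfaces defined by the disclosure.

```python
def process_frame(frame, estimate_noise, estimate_snr_and_gain,
                  estimate_rt60, estimate_reverb_spectrum, remove_reverb):
    """Run one frame through the five modules in the order described above."""
    # Stationary noise estimation module.
    noise = estimate_noise(frame)
    # Signal-to-noise ratio estimation module: SNR plus noise reduction gain factor.
    snr, g_denoise = estimate_snr_and_gain(frame, noise)
    # Reverberation time estimation module: reverberation duration RT60.
    rt60 = estimate_rt60(frame)
    # Reverberation spectrum estimation module, using RT60 as a reference.
    reverb = estimate_reverb_spectrum(frame, rt60)
    # Reverberation elimination module: removes reverberation and stationary noise.
    return remove_reverb(frame, snr, g_denoise, reverb)
```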
Therefore, according to the technical scheme provided by the embodiment of the disclosure, when the audio signal to be processed is dereverberated, the signal-to-noise ratio, the noise reduction gain factor and the reverberation spectrum are considered, so that when the audio signal to be processed has a noise signal, the audio signal to be processed can be dereverberated well, and the noise reduction gain factor is utilized to perform denoising on the audio signal to be processed. Moreover, the disturbance of the reverberation on the audio signal can be stably reduced, meanwhile, the audio signal after the reverberation is removed cannot be distorted, and the quality and the intelligibility of the audio signal of a real-time communication scene are improved.
According to a second aspect of the embodiments of the present disclosure, there is provided an audio processing apparatus, as shown in fig. 5, including:
an audio signal acquisition module 510 configured to perform acquiring an audio signal to be processed;
a noise signal and reverberation duration obtaining module 520 configured to perform obtaining of a noise signal included in the audio signal to be processed and a reverberation duration of the audio signal to be processed;
a signal-to-noise ratio and reverberation signal determination module 530 configured to determine a signal-to-noise ratio and a noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal, and determine a reverberation signal included in the audio signal to be processed according to the audio signal to be processed and the reverberation time length;
a dereverberation module 540 configured to perform dereverberation on the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal, so as to obtain a dereverberated audio signal.
Optionally, the dereverberation module includes:
a first gain factor calculation unit configured to calculate, for any current frame audio signal of the audio signals to be processed, a first gain factor corresponding to the current frame audio signal through an audio signal of the current frame audio signal, a reverberation signal of the current frame audio signal, and a preset minimum dereverberation gain factor;
a second gain factor calculation unit configured to calculate a second gain factor according to the first gain factor and a noise reduction gain factor corresponding to the current frame audio signal;
a target gain factor calculation unit configured to perform smoothing processing on the first gain factor and the second gain factor to obtain a target gain factor;
a dereverberation unit configured to perform dereverberation on the current frame audio signal through the target gain factor to obtain a dereverberated audio signal corresponding to the current frame audio signal.
Optionally, when the signal-to-noise ratio of the current frame audio signal is smaller than a preset signal-to-noise ratio, and the reverberation duration of the current frame audio signal is greater than a preset reverberation duration;
the second gain factor calculation unit is specifically configured to perform:
calculating the dereverberation and denoising scale factor of the current frame audio signal according to the following formula:
Figure BDA0002670178180000181
wherein gamma is the dereverberation and denoising scale factor of the current frame audio signal, and SNR(n) is the signal-to-noise ratio corresponding to the current frame audio signal;
the second gain factor is calculated according to the following formula:
Figure BDA0002670178180000182
wherein Gtmp is the second gain factor, Gdereverb(n) is the first gain factor corresponding to the current frame audio signal, and Gdenoise(n) is the noise reduction gain factor corresponding to the current frame audio signal.
Optionally, when the signal-to-noise ratio of the current frame audio signal is greater than a preset signal-to-noise ratio, or the reverberation duration of the current frame audio signal is less than a preset reverberation duration;
the second gain factor calculation unit is specifically configured to perform:
and determining the noise reduction gain factor corresponding to the current frame audio signal as a second gain factor.
Optionally, the signal-to-noise ratio and reverberation signal determining module includes:
an energy calculation unit configured to calculate, for any current frame audio signal of the audio signal to be processed, the energy after attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal;
an excitation energy vector determination unit configured to perform determining an excitation energy vector of the current frame audio signal according to a maximum value between an energy of the previous frame audio signal after attenuation of the excitation energy vector and an energy of the current frame audio signal;
a reverberation signal determination unit configured to determine the reverberation signal of the current frame audio signal according to the excitation energy vector of the current frame audio signal, the time interval between two adjacent frames of the audio signal to be processed and the reverberation duration corresponding to the current frame audio signal.
Optionally, the energy calculating unit is specifically configured to perform:
when the current frame audio signal of the audio signal to be processed is the first frame audio signal of the audio signal to be processed, determining the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal as 0;
when the current frame audio signal of the audio signal to be processed is not the first frame audio signal of the audio signal to be processed, calculating the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000191
wherein R(n) is the energy after attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal, Ra(n-1) is the excitation energy vector of the previous frame audio signal of the current frame audio signal, and T1 is the time interval between two adjacent frames of the audio signal to be processed.
Optionally, the reverberation signal determination unit is specifically configured to perform:
calculating a reverberation signal of the current frame audio signal according to the following formula:
Figure BDA0002670178180000192
M=RT60(n)/T1
wherein Sr(n) is the reverberation signal of the current frame audio signal, RT60(n) is the reverberation duration corresponding to the current frame audio signal, and M is the number of frames within the reverberation duration corresponding to the current frame audio signal; Ra(n-M) is the energy after attenuation of the excitation energy vector of the previous M frames of the current frame audio signal.
Optionally, the audio signal obtaining module is specifically configured to perform:
acquiring an original audio signal;
carrying out short-time Fourier transform on the original audio signal to obtain a time-frequency domain signal of the original audio signal;
and determining the time-frequency domain signal as the audio signal to be processed.
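The short-time Fourier transform step can be sketched with a deliberately naive implementation that uses only the standard library. The Hann window, the frame length of 8 samples and the hop of 4 samples are illustrative assumptions, not values from the disclosure; a production system would normally use an FFT routine.

```python
import cmath
import math


def stft(samples, frame_len=8, hop=4):
    """Naive short-time Fourier transform: Hann window plus a direct DFT per frame.

    Returns a list of frames, each a list of frame_len // 2 + 1 complex bins.
    frame_len and hop are illustrative values, not taken from the disclosure.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        bins = [
            sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames
```

Each returned frame plays the role of one X(n) in the per-frame processing, and |X(n)|^2 for a bin is the squared magnitude of the corresponding complex value.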
Optionally, the apparatus further comprises:
a denoising module configured to perform denoising of a noise signal included in the audio signal to be processed by the denoising gain factor.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, as shown in fig. 6, including:
a processor 610;
a memory 620 for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio processing method of the first aspect.
Fig. 7 is a block diagram illustrating an audio processing device 700 according to an example embodiment. For example, the apparatus 700 may be provided as a server. Referring to fig. 7, apparatus 700 includes a processing component 722 that further includes one or more processors and memory resources, represented by memory 732, for storing instructions, such as applications, that are executable by processing component 722. The application programs stored in memory 732 may include one or more modules that each correspond to a set of instructions. Furthermore, the processing component 722 is configured to execute instructions to perform the audio processing method according to the first aspect.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in memory 732, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Fig. 8 is a block diagram illustrating an audio processing device 800 according to an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast electronic device, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in memory 804 or transmitted via communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the audio processing method described in the first aspect.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 804 comprising instructions that are executable by the processor 820 of the device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Therefore, with the technical solution provided by the embodiments of the present disclosure, the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal are all taken into account when the audio signal to be processed is dereverberated, so that dereverberation remains effective even when the audio signal to be processed contains a noise signal. Moreover, the interference of reverberation with the audio signal is reduced stably while the dereverberated audio signal is not distorted, which improves the quality and intelligibility of the audio signal in real-time communication scenarios.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the audio processing method of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to carry out the audio processing method of the first aspect.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
The present disclosure is not limited to the precise arrangements described above and shown in the drawings, and various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An audio processing method, comprising:
acquiring an audio signal to be processed;
acquiring a noise signal included in the audio signal to be processed and a reverberation duration of the audio signal to be processed;
determining a signal-to-noise ratio and a noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal, and determining a reverberation signal included in the audio signal to be processed according to the audio signal to be processed and the reverberation duration;
and dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain an audio signal after dereverberation.
2. The method of claim 1, wherein dereverberating the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor, and the reverberation signal to obtain a dereverberated audio signal comprises:
for any current frame audio signal of the audio signal to be processed, calculating a first gain factor corresponding to the current frame audio signal through the audio signal of the current frame audio signal, the reverberation signal of the current frame audio signal and a preset minimum dereverberation gain factor;
calculating a second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal;
smoothing the first gain factor and the second gain factor to obtain a target gain factor;
and performing dereverberation on the current frame audio signal through the target gain factor to obtain a dereverberated audio signal corresponding to the current frame audio signal.
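As a rough illustration of the per-frame flow recited in claim 2, the following Python sketch computes a floored first gain factor, blends it with a second gain factor, and applies the result to one frame. The claim does not state the formulas, so the spectral-subtraction form of `first_gain`, the weighted-average form of `target_gain`, and the weight `alpha` are assumptions made purely for illustration, and all function names are hypothetical.

```python
def first_gain(frame_energy, reverb_energy, g_min):
    """Assumed form: 1 - reverb/frame energy, floored at the preset minimum g_min."""
    if frame_energy <= 0.0:
        return g_min  # avoid division by zero on silent frames
    return max(1.0 - reverb_energy / frame_energy, g_min)


def target_gain(g_first, g_second, alpha=0.5):
    """Assumed smoothing: a weighted average of the first and second gain factors."""
    return alpha * g_first + (1.0 - alpha) * g_second


def dereverb_frame(spectrum, g_target):
    """Scale every bin of the current frame by the target gain factor."""
    return [g_target * x for x in spectrum]


g1 = first_gain(frame_energy=4.0, reverb_energy=1.0, g_min=0.1)
g = target_gain(g1, g_second=0.25)
print(dereverb_frame([2.0, 4.0], g))
```

The floor `g_min` plays the role of the preset minimum dereverberation gain factor: it keeps the gain from collapsing to zero when the estimated reverberation energy approaches or exceeds the frame energy.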
3. The method of claim 2, wherein, when the signal-to-noise ratio of the current frame audio signal is less than a preset signal-to-noise ratio and the reverberation duration of the current frame audio signal is greater than a preset reverberation duration, calculating the second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal comprises:
calculating the dereverberation and denoising scale factor of the current frame audio signal according to the following formula:
Figure FDA0002670178170000011
wherein gamma is the dereverberation and denoising scale factor of the current frame audio signal, and snr(n) is the signal-to-noise ratio corresponding to the current frame audio signal;
the second gain factor is calculated according to the following formula:
Figure FDA0002670178170000021
wherein G_tmp is the second gain factor, G_dereverb(n) is the first gain factor corresponding to the current frame audio signal, and G_denoise(n) is the noise reduction gain factor corresponding to the current frame audio signal.
4. The method of claim 2, wherein, when the signal-to-noise ratio of the current frame audio signal is greater than a preset signal-to-noise ratio, or the reverberation duration of the current frame audio signal is less than a preset reverberation duration, calculating the second gain factor according to the first gain factor and the noise reduction gain factor corresponding to the current frame audio signal comprises:
determining the noise reduction gain factor corresponding to the current frame audio signal as the second gain factor.
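The branch between claims 3 and 4 can be sketched as below. This is only an illustration: the combining formula of claim 3 is given in figures that are not reproduced in this text, so it is passed in as a caller-supplied function `combine`, which is a hypothetical parameter. The claims also do not say what happens when a value exactly equals its threshold; the sketch assumes such cases fall into the claim 4 branch.

```python
def second_gain_factor(g_dereverb, g_denoise, snr, rt60,
                       snr_threshold, rt60_threshold, combine):
    """Select the second gain factor for the current frame.

    combine(g_dereverb, g_denoise, snr) stands in for the claim 3 formula,
    which is not reproduced here (hypothetical placeholder).
    """
    if snr < snr_threshold and rt60 > rt60_threshold:
        # Claim 3 case: low SNR together with long reverberation.
        return combine(g_dereverb, g_denoise, snr)
    # Claim 4 case: the noise reduction gain factor is used directly.
    return g_denoise


# Example with a placeholder combiner (a plain product, chosen arbitrarily).
placeholder = lambda gd, gn, snr: gd * gn
print(second_gain_factor(0.5, 0.8, snr=3.0, rt60=0.9,
                         snr_threshold=10.0, rt60_threshold=0.5,
                         combine=placeholder))
```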
5. The method of claim 1, wherein the determining the reverberation signal of the audio signal to be processed according to the audio signal to be processed and the reverberation duration comprises:
for any current frame audio signal of the audio signal to be processed, calculating the attenuated energy of the excitation energy vector of the previous frame audio signal of the current frame audio signal;
determining the excitation energy vector of the current frame audio signal as the maximum of the attenuated energy of the excitation energy vector of the previous frame audio signal and the energy of the current frame audio signal;
and determining the reverberation signal of the current frame audio signal according to the excitation energy vector of the current frame audio signal, the time interval between two adjacent frames of the audio signal to be processed and the reverberation duration corresponding to the current frame audio signal.
6. The method according to claim 5, wherein said calculating, for any current frame audio signal of the audio signals to be processed, the attenuated energy of the excitation energy vector of the previous frame audio signal of the current frame audio signal comprises:
when a current frame audio signal of the audio signal to be processed is a first frame audio signal of the audio signal to be processed, determining the energy of the current frame audio signal after attenuation of an excitation energy vector as 0;
when the current frame audio signal of the audio signal to be processed is not the first frame audio signal of the audio signal to be processed, calculating the energy after the attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal according to the following formula:
Figure FDA0002670178170000031
wherein r(n) is the energy after attenuation of the excitation energy vector of the previous frame audio signal of the current frame audio signal, Ra(n-1) is the excitation energy vector of the previous frame audio signal of the current frame audio signal, and T1 is the time interval between two adjacent frames of the audio signal to be processed.
7. The method of claim 6, wherein the determining the reverberation signal of the current frame audio signal according to the excitation energy vector of the current frame audio signal, the time interval between two adjacent frames of the audio signal to be processed, and the reverberation duration corresponding to the current frame audio signal comprises:
calculating a reverberation signal of the current frame audio signal according to the following formula:
Figure FDA0002670178170000032
M=RT60(n)/T1
wherein sr(n) is the reverberation signal of the current frame audio signal, RT60(n) is the reverberation duration corresponding to the current frame audio signal, and M is the number of frames within the reverberation duration corresponding to the current frame audio signal; Ra(n-M) is the energy after attenuation of the excitation energy vector of the M-th frame preceding the current frame audio signal.
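The tracking steps of claims 5 to 7 can be sketched in Python as follows. Only three things are taken from the claims: the attenuated energy is 0 for the first frame, the excitation energy is the maximum of the attenuated previous value and the current frame energy, and M = RT60(n)/T1. The attenuation and reverberation formulas themselves are in figures not reproduced here, so the sketch assumes an exponential decay of 60 dB in energy over the reverberation duration, rounds M to a whole number of frames, and returns only the delayed term Ra(n-M) without any further scaling. All function names are hypothetical.

```python
import math


def frames_in_reverb(rt60_s, frame_interval_s):
    """M = RT60 / T1, rounded to a whole number of frames (rounding is an assumption)."""
    return max(1, round(rt60_s / frame_interval_s))


def track_excitation(frame_energies, rt60_s, frame_interval_s):
    """Return the excitation energy Ra(n) for each frame.

    Assumed decay: energy falls by 60 dB (a factor of 1e-6) over rt60_s seconds.
    """
    decay = 10.0 ** (-6.0 * frame_interval_s / rt60_s)
    ra = []
    for n, energy in enumerate(frame_energies):
        attenuated = 0.0 if n == 0 else ra[-1] * decay  # first frame: 0 (claim 6)
        ra.append(max(attenuated, energy))              # claim 5: take the maximum
    return ra


def delayed_excitation(ra, n, m):
    """Ra(n - M), or 0.0 when fewer than M frames of history exist (assumption)."""
    return ra[n - m] if n >= m else 0.0


ra = track_excitation([1.0, 0.0, 0.0], rt60_s=0.6, frame_interval_s=0.1)
m = frames_in_reverb(0.6, 0.1)
print(ra, m, delayed_excitation(ra, 2, 2))
```

Taking the maximum means a loud frame resets the tracked excitation immediately, while quieter frames inherit the decaying tail of earlier ones.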
8. An audio processing apparatus, comprising:
an audio signal acquisition module configured to acquire an audio signal to be processed;
a noise signal and reverberation duration acquisition module configured to acquire a noise signal included in the audio signal to be processed and a reverberation duration of the audio signal to be processed;
a signal-to-noise ratio and reverberation signal determination module configured to determine a signal-to-noise ratio and a noise reduction gain factor of the audio signal to be processed according to the audio signal to be processed and the noise signal, and determine a reverberation signal included in the audio signal to be processed according to the audio signal to be processed and the reverberation duration;
and the dereverberation module is configured to dereverberate the audio signal to be processed according to the signal-to-noise ratio, the noise reduction gain factor and the reverberation signal to obtain a dereverberated audio signal.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio processing method of any of claims 1 to 7.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the audio processing method of any of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010930871.9A CN112201267A (en) 2020-09-07 2020-09-07 Audio processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010930871.9A CN112201267A (en) 2020-09-07 2020-09-07 Audio processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112201267A true CN112201267A (en) 2021-01-08

Family

ID=74006455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010930871.9A Pending CN112201267A (en) 2020-09-07 2020-09-07 Audio processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112201267A (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0559530A1 (en) * 1992-03-03 1993-09-08 France Telecom Method and system for artificial spatial processing of digital audio signals
AUPO714197A0 (en) * 1997-06-02 1997-06-26 University Of Melbourne, The Multi-strategy array processor
US20040190730A1 (en) * 2003-03-31 2004-09-30 Yong Rui System and process for time delay estimation in the presence of correlated noise and reverberation
CN1684143A (en) * 2004-04-14 2005-10-19 华为技术有限公司 Method for strengthening sound
US20090112584A1 (en) * 2007-10-24 2009-04-30 Xueman Li Dynamic noise reduction
US20090299742A1 (en) * 2008-05-29 2009-12-03 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for spectral contrast enhancement
CN101763858A (en) * 2009-10-19 2010-06-30 瑞声声学科技(深圳)有限公司 Method for processing double-microphone signal
US20130188798A1 (en) * 2012-01-24 2013-07-25 Fujitsu Limited Reverberation reduction device and reverberation reduction method
CN103440869A (en) * 2013-09-03 2013-12-11 大连理工大学 Audio-reverberation inhibiting device and inhibiting method thereof
CN105474312A (en) * 2013-09-17 2016-04-06 英特尔公司 Adaptive phase difference based noise reduction for automatic speech recognition (ASR)
CN105590630A (en) * 2016-02-18 2016-05-18 南京奇音石信息技术有限公司 Directional noise suppression method based on assigned bandwidth
CN106898359A (en) * 2017-03-24 2017-06-27 上海智臻智能网络科技股份有限公司 Acoustic signal processing method, system, audio interactive device and computer equipment
CN108235181A (en) * 2016-12-13 2018-06-29 奥迪康有限公司 The method of noise reduction in apparatus for processing audio
CN109686347A (en) * 2018-11-30 2019-04-26 北京达佳互联信息技术有限公司 Sound effect treatment method, sound-effect processing equipment, electronic equipment and readable medium
CN110087168A (en) * 2019-05-06 2019-08-02 浙江齐聚科技有限公司 Audio reverberation processing method, device, equipment and storage medium
CN110289009A (en) * 2019-07-09 2019-09-27 广州视源电子科技股份有限公司 Processing method, device and the interactive intelligence equipment of voice signal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
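Sun Bo; Mei Tiemin: "Research on complex cepstrum dereverberation" (复倒谱去混响研究), Electronic World (电子世界), no. 17 *
Jiang Jianzhong; Zhang Dongfang; Zhang Lianhai: "A new speech enhancement algorithm for strong-noise environments" (一种新的强噪声环境下的语音增强算法), Computer Engineering and Applications (计算机工程与应用), no. 20 *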

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496698A (en) * 2021-08-12 2021-10-12 云知声智能科技股份有限公司 Method, device and equipment for screening training data and storage medium
CN113496698B (en) * 2021-08-12 2024-01-23 云知声智能科技股份有限公司 Training data screening method, device, equipment and storage medium
WO2023105778A1 (en) * 2021-12-10 2023-06-15 日本電信電話株式会社 Speech signal processing method, speech signal processing device, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination