WO2024088142A1 - Audio signal processing method, apparatus, electronic device and readable storage medium - Google Patents

Audio signal processing method, apparatus, electronic device and readable storage medium

Publication number
WO2024088142A1
Authority
WO
WIPO (PCT)
Prior art keywords
reverberation
audio signal
energy
time frame
target time
Prior art date
Application number
PCT/CN2023/125312
Inventor
陈新磊
刘良兵
Original Assignee
维沃移动通信有限公司
Priority date
Filing date
Publication date
Application filed by 维沃移动通信有限公司
Publication of WO2024088142A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R2430/03: Synergistic effects of band splitting and sub-band processing

Definitions

  • the present application belongs to the field of audio technology, and specifically relates to an audio signal processing method, device, electronic device and readable storage medium.
  • Speech dereverberation has become an important step in the audio signal processing process.
  • Electronic devices can suppress the reverberant audio signal by removing the late reverberation in the reverberant audio signal, thereby making the speech fuller.
  • electronic equipment can perform linear fitting on the entire time axis for each frequency band energy attenuation curve in the room impulse response (RIR) energy attenuation curve, and obtain the slope of each sub-band energy attenuation curve by the least squares method. Then, the energy attenuation process of the RIR can be modeled and described based on the obtained slope, so as to infer the late reverberation.
  • RIR: room impulse response.
  • the purpose of the embodiments of the present application is to provide an audio signal processing method, device, electronic device and readable storage medium, which can solve the problem of poor effect of suppressing reverberation audio signals.
  • an embodiment of the present application provides an audio signal processing method, the method comprising: dividing a frequency band energy attenuation curve into N segments of energy attenuation curves, and performing linear fitting on each segment of the energy attenuation curve to obtain N linear fitting curves, where N is an integer greater than or equal to 2; based on the N linear fitting curves, determining a reverberation suppression function corresponding to a target time frame in the N segments of the energy attenuation curves; based on the reverberation suppression function, suppressing a reverberation part in a reverberation audio signal of a target time frame in a first audio signal to obtain a second audio signal; wherein the first audio signal is: an audio signal in the audio signal that is in a frequency band corresponding to the frequency band energy attenuation curve; and the frequency band energy attenuation curve is: a frequency band energy attenuation curve in an RIR energy attenuation curve.
  • an embodiment of the present application provides an audio signal processing device, the device comprising a processing module, a determination module and a suppression module; the processing module is used to divide the frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment of the energy attenuation curve to obtain N linear fitting curves, where N is an integer greater than or equal to 2; the determination module is used to determine the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curves based on the N linear fitting curves obtained by the processing module; the suppression module is used to suppress the reverberation part in the reverberation audio signal of the target time frame in the first audio signal based on the reverberation suppression function determined by the determination module to obtain a second audio signal; wherein the first audio signal is: an audio signal in the audio signal that is in the frequency band corresponding to the frequency band energy attenuation curve; and the frequency band energy attenuation curve is: a frequency band energy attenuation curve in an RIR energy attenuation curve.
  • an embodiment of the present application provides an electronic device, which includes a processor and a memory, wherein the memory stores programs or instructions that can be run on the processor, and when the program or instructions are executed by the processor, the steps of the method described in the first aspect are implemented.
  • an embodiment of the present application provides a readable storage medium, on which a program or instruction is stored, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
  • an embodiment of the present application provides a chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instruction to implement the method described in the first aspect.
  • an embodiment of the present application provides a computer program product, which is stored in a storage medium and is executed by at least one processor to implement the method described in the first aspect.
  • a frequency band energy attenuation curve may be divided into N segments of energy attenuation curves, and each segment of the energy attenuation curve may be linearly fitted to obtain N linear fitting curves, where N is an integer greater than or equal to 2; and based on the N linear fitting curves, a reverberation suppression function corresponding to a target time frame in the energy attenuation curve may be determined; and based on the reverberation suppression function, a reverberation part in a reverberation audio signal of a target time frame in a first audio signal may be suppressed to obtain a second audio signal; wherein the first audio signal is: an audio signal in an audio signal that is in a frequency band corresponding to the frequency band energy attenuation curve; and the frequency band energy attenuation curve is: a frequency band energy attenuation curve in an RIR energy attenuation curve.
  • the electronic device can divide a frequency band energy attenuation curve in the RIR energy attenuation curve into N segments of energy attenuation curves and perform linear fitting on each of them, and can determine, based on the obtained N linear fitting curves, the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curve, so as to suppress the reverberation part in the reverberation audio signal of the target time frame in the audio signal within the frequency band corresponding to the frequency band energy attenuation curve. As a result, the reverberation audio signal of each time frame can be accurately suppressed through piecewise linear fitting, which has a small fitting error, together with the reverberation suppression function corresponding to each time frame, thereby improving the effect of suppressing the reverberation audio signal.
  • FIG1 is a schematic diagram of a reverberation audio signal generation process
  • FIG2 is a schematic diagram of an RIR energy decay curve
  • FIG3 is a schematic diagram of linear fitting in traditional speech dereverberation
  • FIG4 is a flow chart of an audio signal processing method provided in an embodiment of the present application.
  • FIG5 is a first schematic diagram of the audio signal processing method provided in an embodiment of the present application.
  • FIG6 is a second schematic diagram of the audio signal processing method provided in an embodiment of the present application.
  • FIG7 is a schematic diagram of an audio signal processing device provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of an electronic device provided in an embodiment of the present application.
  • FIG9 is a hardware schematic diagram of an electronic device provided in an embodiment of the present application.
  • first, second, etc. in the specification and claims of this application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the data used in this way can be interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here, and the objects distinguished by "first”, “second”, etc. are generally of one type, and the number of objects is not limited.
  • the first object can be one or more.
  • “and/or” in the specification and claims represents at least one of the connected objects, and the character “/" generally indicates that the objects associated with each other are in an "or” relationship.
  • RT60 (Reverberation Time, 60 dB): the time required for the sound field to attenuate by 60 dB.
  • Speech dereverberation is a technology widely used in audio equipment, commonly found in mobile phones, speakers, conference call devices and other devices.
  • a sound source continuously emits an audio signal.
  • the emitted audio signal will continue to reflect due to the existence of obstacles during the propagation process.
  • the energy of the audio signal will gradually attenuate in this process.
  • the audio signal with attenuated energy reaches the pickup device after a certain delay. It is collected by the pickup device together with the direct audio signal at the current moment, so that the direct audio signal at the current moment is interfered by the reflected audio signal to form a reverberation audio signal, and the energy of the reverberation audio signal will become stronger as the distance between the sound source and the pickup device increases.
  • FIG1 is a schematic diagram showing the process of generating a reverberation audio signal.
  • a microphone 11 and a loudspeaker 12 are placed in a box space 10, and the propagation medium is air.
  • the attenuation coefficient of sound propagation in air is ⁇
  • the reflection coefficient of the wall of the box space 10 is ⁇
  • the audio signal emitted by the loudspeaker 12 at time
  • the audio signal emitted at time is , the audio signal at time reaches the microphone 11 at time after reflection, and the propagation time of the direct audio signal is ignored.
  • the signal received by the microphone 11 at time is , and the signal received by the microphone 11 at time is , which is the reverberation audio signal.
  • the reverberant audio signal is generated by convolving the clean speech with the RIR, as shown in the following formula (1):
  • z(n) is the reverberation audio signal
  • h(n) is the RIR
  • s(n) is the clean speech.
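Formula (1) itself is not reproduced in this text, but the sentence above describes an ordinary discrete convolution of the clean speech s(n) with the RIR h(n). The following is a minimal sketch of that operation; the toy signal values and the helper name `convolve` are illustrative assumptions, not taken from the application.

```python
def convolve(s, h):
    """Full discrete convolution z(n) = sum_j h(j) * s(n - j)."""
    z = [0.0] * (len(s) + len(h) - 1)
    for n, s_n in enumerate(s):
        for j, h_j in enumerate(h):
            z[n + j] += s_n * h_j
    return z


# Hypothetical toy data: a unit impulse as "clean speech" and a short RIR
# made of a direct path followed by two decaying reflections.
s = [1.0, 0.0, 0.0, 0.0]
h = [1.0, 0.5, 0.25]
z = convolve(s, h)  # the impulse simply reproduces the RIR, zero-padded
```

With an impulse as input, the reverberation audio signal equals the RIR, which is why the RIR fully characterizes how a room smears any emitted signal over time.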
  • the above formula (1) is converted to the frequency domain by Fourier transform, as shown in the following formula (2):
  • $\lambda_z(m,k) = \lambda_{ze}(m,k) + \lambda_{zl}(m,k)$;
  • $\lambda_z(m,k)$ represents the energy of the reverberation audio signal at the kth frequency point of the mth frame, that is, the spectral variance of the reverberation audio signal
  • $\lambda_{ze}(m,k)$ represents the early reverberation energy (spectral variance) at the kth frequency point of the mth frame
  • $\lambda_{zl}(m,k)$ represents the late reverberation energy (spectral variance) at the kth frequency point of the mth frame.
  • the part that usually affects the speech quality is the late reverberation audio signal.
  • the reflected energy within the delay range of 50ms-80ms after a pulse signal is sent belongs to the early reverberation energy, and the energy after this is all late reverberation energy.
  • RIR needs to be accurately described and modeled.
  • FIG2 shows a schematic diagram of an RIR energy attenuation curve.
  • the RIR energy attenuation curve is an RIR energy attenuation curve with an RT60 of approximately 900 ms, wherein the horizontal axis is the time frame, the vertical axis is the energy dB, the sampling rate is 16 kHz, the frame length of the short-time Fourier transform is 512, and the frame offset is 160.
  • the RIR energy attenuation curve includes multiple curves, each of which represents a trend of a sub-band energy change over time, and each sub-band takes the average value of 32 frequency points, wherein the DC component of the first sub-band is removed.
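The sub-band energy curves described above can be sketched as follows, using the stated parameters (16 kHz sampling rate, frame length 512, frame offset 160, 32 frequency points per sub-band, DC component removed from the first sub-band). The Hann window, the synthetic RIR, and the function name `subband_energy_decay_db` are assumptions made for illustration; the application does not specify them.

```python
import numpy as np


def subband_energy_decay_db(rir, frame_len=512, hop=160, bins_per_band=32):
    """Return an array of shape (n_bands, n_frames) with sub-band energy in dB."""
    window = np.hanning(frame_len)  # window choice is an assumption
    n_frames = 1 + (len(rir) - frame_len) // hop
    frames = np.stack([rir[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, 257)
    n_bands = (power.shape[1] - 1) // bins_per_band   # 8 sub-bands of 32 bins
    curves = []
    for b in range(n_bands):
        lo, hi = b * bins_per_band, (b + 1) * bins_per_band
        if b == 0:
            lo = 1  # drop the DC bin from the first sub-band
        curves.append(power[:, lo:hi].mean(axis=1))
    return 10.0 * np.log10(np.array(curves) + 1e-12)


# Hypothetical RIR: white noise with an exponential envelope whose energy
# falls by 60 dB after 0.9 s (RT60 of roughly 900 ms).
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
rir = rng.standard_normal(fs) * 10.0 ** (-3.0 * t / 0.9)
curves = subband_energy_decay_db(rir)
```

Each row of `curves` is one sub-band energy attenuation curve over time frames; with zero-based bin indices, row 2 covers bins 64 to 95, i.e. the 65th to 96th frequency points.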
  • a linear fit is performed on the RIR energy decay curve on the entire time axis.
  • curve 31 is a sub-band energy decay curve from the 65th frequency point to the 96th frequency point
  • curve 32 is a curve obtained by linear fitting curve 31 on the entire time axis.
  • the slope of the curve can be obtained by the least square method, and thus T60 can be obtained by the following formula (4):
  • the frequency-related parameter ⁇ (k) is defined as the following formula (5):
  • R represents the frame offset.
  • the energy decay process of RIR can be modeled and described, and the late reverberation energy $\lambda_{zl}(m,k)$ can be inferred.
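The traditional approach above fits one straight line to a sub-band decay curve over the whole time axis. Formulas (4) and (5) are not reproduced in this text, so the sketch below relies only on the definition of RT60 given earlier (the time needed to decay by 60 dB) and assumes the slope is expressed in dB per frame; the function names and the ideal test curve are hypothetical.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def t60_seconds(slope_db_per_frame, hop=160, fs=16000):
    """Time for the fitted line to fall by 60 dB (assumes a negative slope)."""
    frames_to_60_db = -60.0 / slope_db_per_frame
    return frames_to_60_db * hop / fs


# Hypothetical ideal decay: 60 dB lost over 90 frames of 10 ms each.
frames = list(range(90))
curve_db = [-60.0 * m / 90 for m in frames]
slope, intercept = least_squares_line(frames, curve_db)
t60 = t60_seconds(slope)
```

On a real decay curve the single line averages over fast early decay and slower late decay, which is the fitting error the piecewise approach is meant to reduce.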
  • the audio signal processing method can divide the frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment of the energy attenuation curve to obtain N linear fitting curves, where N is an integer greater than or equal to 2; and based on the N linear fitting curves, determine the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curves; and based on the reverberation suppression function, suppress the reverberation part in the reverberation audio signal of the target time frame in the first audio signal to obtain the second audio signal; wherein the first audio signal is: an audio signal in the audio signal that is in the frequency band corresponding to the frequency band energy attenuation curve; and the frequency band energy attenuation curve is: a frequency band energy attenuation curve in the RIR energy attenuation curve.
  • the electronic device can divide a frequency band energy attenuation curve in the RIR energy attenuation curve into N segments of energy attenuation curves and perform linear fitting on each of them, and can determine, based on the obtained N linear fitting curves, the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curve, so as to suppress the reverberation part in the reverberation audio signal of the target time frame in the audio signal within the frequency band corresponding to the frequency band energy attenuation curve. As a result, piecewise linear fitting with a small fitting error, together with the reverberation suppression function corresponding to each time frame, can accurately suppress the reverberation audio signal of each time frame, thereby improving the effect of suppressing the reverberation audio signal.
  • FIG4 shows a flow chart of the audio signal processing method provided by the embodiment of the present application.
  • the audio signal processing method provided by the embodiment of the present application may include the following steps 401 to 403. The method is exemplarily described below by taking an electronic device executing the method as an example.
  • Step 401 The electronic device divides the frequency band energy attenuation curve into N segments of energy attenuation curves, and performs linear fitting on each segment of the energy attenuation curve to obtain N linear fitting curves.
  • N is an integer greater than or equal to 2.
  • the above-mentioned frequency band energy attenuation curve is: a frequency band energy attenuation curve in the RIR energy attenuation curve.
  • when N is equal to 3, that is, when the above-mentioned frequency band energy attenuation curve is divided into 3 segments of energy attenuation curves, the optimal piecewise linear fitting effect can be achieved; of course, in actual implementation, N can be any integer greater than or equal to 2, which is not limited in the embodiments of the present application.
  • the electronic device divides the energy attenuation curve 50 (i.e., the above-mentioned frequency band energy attenuation curve) into three segments of energy attenuation curves at time frame $m_1$ and time frame $m_2$ on the time axis, and performs linear fitting on each segment to obtain a linear fitting curve 51, a linear fitting curve 52, and a linear fitting curve 53 (i.e., the above-mentioned N linear fitting curves).
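Step 401 can be sketched as a least-squares fit applied separately to each segment of the decay curve. The breakpoint values (frames 10 and 40), the synthetic three-slope curve, and the function names are hypothetical; the application does not state how the breakpoints are chosen.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def piecewise_fit(curve_db, breakpoints):
    """Fit one line per segment; each breakpoint is the first frame of a new segment."""
    edges = [0] + list(breakpoints) + [len(curve_db)]
    fits = []
    for start, stop in zip(edges[:-1], edges[1:]):
        xs = list(range(start, stop))
        fits.append(fit_line(xs, curve_db[start:stop]))
    return fits


# Hypothetical decay curve in dB: fast, then medium, then slow decay.
curve_db = ([-3.0 * m for m in range(10)]
            + [-30.0 - 1.0 * (m - 10) for m in range(10, 40)]
            + [-60.0 - 0.25 * (m - 40) for m in range(40, 90)])
fits = piecewise_fit(curve_db, breakpoints=(10, 40))  # N = 3 segments
slopes = [slope for slope, _ in fits]
```

Each returned pair is the slope and intercept of one linear fitting curve; the three slopes are what the later steps convert into per-segment reverberation weights.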
  • Step 402 The electronic device determines a reverberation suppression function corresponding to a target time frame in N energy decay curves based on N linear fitting curves.
  • the reverberation suppression function is used to suppress the reverberation part in the reverberation audio signal.
  • the above-mentioned reverberation part is not a separate audio signal, but the reverberation energy in the reverberation audio signal, that is, the energy generated during the propagation of the audio signal in the box; if there is no RIR or a clean audio signal in the audio signal, the reverberation audio signal in the audio signal does not exist.
  • the target time frame may be any time frame.
  • the target time frame may be any time frame after the 5th frame.
  • step 402 can be specifically implemented by the following steps 402a to 402c.
  • Step 402a The electronic device calculates the reverberation weight corresponding to each linear fitting curve based on the slope of each linear fitting curve in the N linear fitting curves, so as to obtain N reverberation weights.
  • the electronic device may calculate the reverberation weight corresponding to each of the above linear fitting curves respectively by using the above formula (4) and formula (5).
  • the electronic device can calculate the reverberation weight corresponding to each of the three linear fitting curves according to the slope of each linear fitting curve by using the above formula (4) and formula (5), as shown in the following formula (9):
  • Step 402b The electronic device calculates the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal based on the N reverberation weights.
  • the reverberation audio signal is: a reverberation audio signal of a target time frame in the first audio signal.
  • the first audio signal is: an audio signal in the audio signal that is in the frequency band corresponding to the above-mentioned frequency band energy attenuation curve.
  • the early reverberation energy and the late reverberation energy are determined by the direct audio signal of each time frame before the target time frame (i.e., the clean audio signal in the first audio signal).
  • step 402b can be specifically implemented by the following steps 402b1 and 402b2.
  • Step 402b1 For each time frame before the target time frame, the electronic device calculates, according to that time frame and the reverberation weight corresponding to it, the residual energy that the direct audio signal of that time frame in the first audio signal still has in the target time frame, and thus obtains the residual energy corresponding to each time frame.
  • the electronic device may calculate the remaining energy corresponding to each of the above time frames by using the above formula (6).
  • Step 402b2 The electronic device calculates the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal according to the residual energy corresponding to each time frame before the target time frame.
  • the frequency band energy attenuation curve is divided into three energy attenuation curves, and the target time frame is the mth frame.
  • the specific number of segments and the target time frame are not limited in actual implementation.
  • the electronic device may derive expressions of the early reverberation energy $\lambda_{ze}(m,k)$ and the late reverberation energy $\lambda_{zl}(m,k)$ according to the above formula (8), as shown in the following formulas (10) and (11):
  • m 3 20. It can be understood that when the energy of the direct audio signal is attenuated to a certain extent, the overall impact can be ignored. The existence of m 3 also makes formula (11) a finite polynomial, which is more operational for engineering practice.
  • because the electronic device can calculate the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal according to the residual energy corresponding to each time frame, the accuracy with which the electronic device calculates the early reverberation energy and the late reverberation energy can be improved.
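Steps 402b1 and 402b2 can be illustrated with the sketch below. Formulas (6), (8), (10) and (11) are not reproduced in this text, so the decay model here is an assumption: the direct energy of an earlier frame is attenuated along the piecewise-linear fit (slopes in dB per frame), lags shorter than 5 frames count as early reverberation, and lags beyond a truncation limit are ignored. The names `decay_gain`, `early_late_energy`, `early_frames` and `max_lag`, and all numeric values, are hypothetical.

```python
def decay_gain(lag, segments):
    """Linear energy gain remaining after `lag` frames.

    `segments` is a list of (first_lag, slope_db_per_frame) pairs sorted by
    first_lag. The drop in dB is accumulated segment by segment, so the
    piecewise-linear decay stays continuous across breakpoints.
    """
    drop_db = 0.0
    for idx, (first, slope) in enumerate(segments):
        last = segments[idx + 1][0] if idx + 1 < len(segments) else lag
        span = min(lag, last) - first
        if span > 0:
            drop_db += slope * span
    return 10.0 ** (drop_db / 10.0)


def early_late_energy(direct_energy, m, segments, early_frames=5, max_lag=20):
    """Sum the residual energy of earlier frames into early and late parts."""
    early = late = 0.0
    for lag in range(1, min(m, max_lag) + 1):
        residual = direct_energy[m - lag] * decay_gain(lag, segments)
        if lag < early_frames:
            early += residual
        else:
            late += residual
    return early, late


# Hypothetical input: constant direct energy and three decay segments.
segments = [(0, -3.0), (10, -1.0), (40, -0.25)]
direct_energy = [1.0] * 30
early, late = early_late_energy(direct_energy, m=25, segments=segments)
```

Truncating the sum at a maximum lag keeps the late-reverberation estimate a finite sum, in line with the remark above that sufficiently attenuated direct energy can be ignored.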
  • Step 402c The electronic device determines a reverberation suppression function corresponding to a target time frame in the N-segment energy decay curve based on the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal.
  • step 402c can be specifically implemented through the following steps 402c1 to 402c3.
  • Step 402c1 The electronic device calculates a priori signal-to-noise ratio corresponding to the target time frame according to the early reverberation energy of the reverberation audio signal, the late reverberation energy of the reverberation audio signal, and the energy of the ambient noise audio signal of the target time frame in the first audio signal.
  • the first audio signal may include a direct audio signal, a reverberation audio signal and an ambient noise audio signal, then the first audio signal may be expressed as the following formula (12):
  • Step 402c2 The electronic device calculates a posterior signal-to-noise ratio corresponding to the target time frame based on the late reverberation energy of the reverberation audio signal, the energy of the ambient noise audio signal of the target time frame in the first audio signal, and the amplitude spectrum of the first audio signal in the target time frame.
  • the electronic device may calculate the above a posteriori signal-to-noise ratio by the following formula (15):
  • Step 402c3 The electronic device determines the reverberation suppression function corresponding to the target time frame in the N-segment energy decay curve according to the a priori signal-to-noise ratio corresponding to the target time frame and the a posteriori signal-to-noise ratio corresponding to the target time frame.
  • after the electronic device obtains the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio, the above reverberation suppression function can be determined, which can be expressed as the following formula (16):
  • because the electronic device can determine the above-mentioned reverberation suppression function based on the calculated a priori signal-to-noise ratio and a posteriori signal-to-noise ratio corresponding to the target time frame, the accuracy of the electronic device in determining the reverberation suppression function can be improved, so that the reverberation audio signal of the target time frame can be accurately suppressed by the reverberation suppression function.
  • because the electronic device can calculate the reverberation weight corresponding to each of the N linear fitting curves based on the slope of each linear fitting curve, and calculate the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal based on the obtained N reverberation weights to determine the reverberation suppression function, the accuracy of the electronic device in determining the reverberation suppression function can be further improved.
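Steps 402c1 to 402c3 can be sketched as follows. Formulas (13) to (16) are not reproduced in this text, so the definitions below are assumptions based only on the inputs each step names: the a priori signal-to-noise ratio is taken as early energy over late-plus-noise energy, and the a posteriori signal-to-noise ratio as observed power over late-plus-noise energy. The gain shown is a plain Wiener-type gain used as a stand-in; it depends only on the a priori value, whereas the application's suppression function combines both ratios. The function name and the `floor` parameter are hypothetical.

```python
def snr_and_gain(early, late, noise, magnitude, floor=1e-12):
    """Return (a priori SNR, a posteriori SNR, stand-in Wiener gain)."""
    interference = late + noise + floor      # energy to be suppressed
    prior_snr = early / interference
    posterior_snr = magnitude ** 2 / interference
    wiener_gain = prior_snr / (1.0 + prior_snr)  # stand-in, not formula (16)
    return prior_snr, posterior_snr, wiener_gain


# Hypothetical values for one frequency point of the target time frame.
prior_snr, posterior_snr, gain = snr_and_gain(
    early=4.0, late=1.0, noise=1.0, magnitude=3.0)
```

The gain lies between 0 and 1 and approaches 1 when the early energy dominates, so frequency points with little late reverberation pass almost unchanged.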
  • Step 403 The electronic device suppresses the reverberation part of the reverberation audio signal of the target time frame in the first audio signal based on the reverberation suppression function corresponding to the target time frame in the N-segment energy decay curve to obtain a second audio signal.
  • the second audio signal is: a direct audio signal of the estimated target time frame after suppressing the above-mentioned reverberation part.
  • step 403 can be specifically implemented by the following steps 403a and 403b.
  • Step 403a The electronic device performs an element-wise (dot) multiplication of the reverberation suppression function corresponding to the target time frame in the N-segment energy attenuation curve and the amplitude spectrum of the first audio signal in the target time frame to obtain a target amplitude spectrum.
  • the target amplitude spectrum is: the amplitude spectrum of the first audio signal after suppressing the reverberation audio signal.
  • the target amplitude spectrum can be calculated by the following formula (17):
  • Step 403b The electronic device performs inverse Fourier transform on the target amplitude spectrum and the phase of the first audio signal in the target time frame to obtain a second audio signal.
  • the inverse Fourier transform can restore the audio signal from the frequency domain back to the time domain.
  • because the electronic device can perform an inverse Fourier transform on the target amplitude spectrum, which is obtained by performing a dot multiplication operation on the above-mentioned reverberation suppression function and the amplitude spectrum of the first audio signal in the target time frame, together with the phase of the first audio signal in the target time frame to obtain the second audio signal, the reverberation audio signal can be accurately suppressed by the reverberation suppression function, thereby improving the robustness and flexibility of suppressing the reverberation audio signal.
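Steps 403a and 403b can be sketched for a single frame as below: the gain is multiplied element-wise with the amplitude spectrum, the original phase is reattached, and an inverse transform returns the time-domain frame. Windowing and overlap-add between frames are omitted for brevity, and the function name `suppress_frame` and the constant test gains are hypothetical.

```python
import numpy as np


def suppress_frame(frame, gain):
    """Apply a per-frequency gain to one frame and return the time-domain result."""
    spectrum = np.fft.rfft(frame)
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    target_magnitude = gain * magnitude  # element-wise product (step 403a)
    # Recombine with the unmodified phase and invert (step 403b).
    return np.fft.irfft(target_magnitude * np.exp(1j * phase), n=len(frame))


rng = np.random.default_rng(0)
frame = rng.standard_normal(512)       # hypothetical 512-sample frame
unity = suppress_frame(frame, np.ones(257))
halved = suppress_frame(frame, np.full(257, 0.5))
```

A gain of 1 at every frequency point reproduces the input frame, and a constant gain of 0.5 halves it, which is a quick sanity check that only the amplitude is altered while the phase is preserved.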
  • the electronic device can suppress the reverberation audio signal of each time frame in the first audio signal through the above steps, and further suppress the reverberation audio signal in each frequency band in the above collected audio signal, so as to achieve reverberation suppression of the entire collected audio signal.
  • the electronic device can divide a frequency band energy attenuation curve in the RIR energy attenuation curve into N segments of energy attenuation curves and perform linear fitting on each of them, and can determine, based on the obtained N linear fitting curves, the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves, so as to suppress the reverberation part in the reverberation audio signal of the target time frame in the audio signal within the frequency band corresponding to the frequency band energy attenuation curve. As a result, the reverberation audio signal of each time frame can be accurately suppressed through piecewise linear fitting with a small fitting error and the reverberation suppression function corresponding to each time frame, thereby improving the effect of suppressing the reverberation audio signal.
  • the sampling rate is 16kHz
  • the frame length of the short-time Fourier transform is 512
  • the frame offset is 160
  • the time represented by one frame is 10ms
  • the 5th frame is the boundary between the early reverberation audio signal and the late reverberation audio signal
  • the 1st frame to the 5th frame are the early reverberation part
  • the 5th frame and later are the late reverberation part, ignoring the background noise
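The timing figures in the list above follow directly from the sampling rate and frame offset. The short calculation below reproduces them; the variable names are illustrative.

```python
fs = 16000        # sampling rate in Hz
frame_len = 512   # STFT frame length in samples
hop = 160         # frame offset in samples

hop_ms = 1000.0 * hop / fs          # time advanced per frame
frame_ms = 1000.0 * frame_len / fs  # duration covered by one analysis frame
boundary_ms = 5 * hop_ms            # delay at the 5th-frame early/late boundary
```

With these settings each frame step is 10 ms, so placing the boundary at the 5th frame corresponds to a 50 ms delay, the lower end of the 50 ms to 80 ms range given earlier for early reverberation.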
  • Figure 6 is a schematic diagram showing the effect of suppressing the reverberation part in the reverberation audio signal using the audio signal processing method of an embodiment of the present application.
  • area 61 is the spectrogram of the clean speech (i.e., the direct audio signal)
  • area 62 is the reverberation speech obtained by convolving the clean speech with the RIR (i.e., the reverberation audio signal)
  • area 63 is the dereverberated speech (i.e., the second audio signal); it can be seen that the dereverberated speech basically restores the harmonic structure of the clean speech, and the reverberation speech is effectively suppressed, thereby improving the speech quality and speech intelligibility.
  • the audio signal processing method provided in the embodiment of the present application can be executed by an audio signal processing device.
  • an audio signal processing device executing the audio signal processing method is taken as an example to illustrate the audio signal processing device provided in the embodiment of the present application.
  • an embodiment of the present application provides an audio signal processing device 70, which may include a processing module 71, a determination module 72, and a suppression module 73.
  • the processing module 71 may be used to divide a frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment of the energy attenuation curve to obtain N linear fitting curves, where N is an integer greater than or equal to 2.
  • the determination module 72 may be used to determine a reverberation suppression function corresponding to a target time frame in the N segments of the energy attenuation curves based on the N linear fitting curves processed by the processing module 71.
  • the suppression module 73 may be used to suppress the reverberation part of the reverberation audio signal of the target time frame in the first audio signal based on the reverberation suppression function determined by the determination module 72 to obtain a second audio signal.
  • the first audio signal is: an audio signal in the frequency band corresponding to the frequency band energy attenuation curve in the audio signal; the frequency band energy attenuation curve is: a frequency band energy attenuation curve in the RIR energy attenuation curve.
  • the determination module 72 may be specifically configured to calculate the reverberation weight corresponding to each linear fitting curve based on the slope of each of the N linear fitting curves to obtain N reverberation weights; and based on the N reverberation weights, calculate the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal; and based on the early reverberation energy and the late reverberation energy, determine the reverberation suppression function.
  • the determination module 72 may be specifically configured to: for each time frame before the target time frame, calculate, according to that time frame and the reverberation weight corresponding to it, the residual energy that the direct audio signal of that time frame in the first audio signal still has in the target time frame, so as to obtain the residual energy corresponding to each time frame; and calculate the early reverberation energy and the late reverberation energy according to the residual energy corresponding to each time frame.
  • the determination module 72 may be specifically configured to calculate a priori signal-to-noise ratio corresponding to a target time frame according to the early reverberation energy, the late reverberation energy, and the energy of the ambient noise audio signal of the target time frame in the first audio signal; and calculate a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy, the energy of the ambient noise audio signal, and the amplitude spectrum of the first audio signal in the target time frame; and determine the reverberation suppression function according to the priori signal-to-noise ratio and the posteriori signal-to-noise ratio.
  • the suppression module 73 can be specifically used to perform a dot multiplication operation on the above-mentioned reverberation suppression function and the amplitude spectrum of the first audio signal in the target time frame to obtain a target amplitude spectrum; and perform an inverse Fourier transform on the target amplitude spectrum and the phase of the first audio signal in the target time frame to obtain a second audio signal.
  • in the audio signal processing device provided in the embodiment of the present application, since the device can divide a frequency band energy attenuation curve in the RIR energy attenuation curve into N segments of energy attenuation curves and perform linear fitting on each of them, and can determine the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curve based on the obtained N linear fitting curves, so as to suppress the reverberation part in the reverberation audio signal of the target time frame in the audio signal within the frequency band corresponding to the frequency band energy attenuation curve, the reverberation audio signal of each time frame can be accurately suppressed by piecewise linear fitting with a small fitting error and the reverberation suppression function corresponding to each time frame, thereby improving the effect of suppressing the reverberation audio signal.
  • the audio signal processing device in the embodiment of the present application can be an electronic device or a component in the electronic device, such as an integrated circuit or a chip.
  • the electronic device can be a terminal or a device other than a terminal.
  • the electronic device can be a mobile phone, a tablet computer, a laptop computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (Mobile Internet Device, MID), an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), etc.; it can also be a server, a network attached storage (Network Attached Storage, NAS) device, a personal computer (personal computer, PC), a television (television, TV), a teller machine or a self-service machine, etc., which is not specifically limited in the embodiment of the present application.
  • the audio signal processing device in the embodiment of the present application may be a device having an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiment of the present application.
  • the audio signal processing device provided in the embodiment of the present application can implement each process implemented in the method embodiments of Figures 4 to 6, and will not be described again here to avoid repetition.
  • an embodiment of the present application also provides an electronic device 800, including a processor 801 and a memory 802, and the memory 802 stores a program or instruction that can be run on the processor 801.
  • when the program or instruction is executed by the processor 801, the steps of the audio signal processing method embodiment described above are implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices mentioned above.
  • FIG. 9 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 1000 includes but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010 and other components.
  • the electronic device 1000 can also include a power supply (such as a battery) for supplying power to each component, and the power supply can be logically connected to the processor 1010 through a power management system, so that the power management system can manage charging, discharging, and power consumption.
  • the electronic device structure shown in FIG. 9 does not constitute a limitation on the electronic device, and the electronic device can include more or fewer components than shown, or combine certain components, or arrange components differently, which will not be described in detail here.
  • the processor 1010 can be used to divide the frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment of the energy attenuation curve to obtain N linear fitting curves, where N is an integer greater than or equal to 2; and based on the processed N linear fitting curves, determine the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curves; and based on the determined reverberation suppression function, suppress the reverberation part in the reverberation audio signal of the target time frame in the first audio signal to obtain the second audio signal.
  • the first audio signal is: an audio signal in the audio signal that is in the frequency band corresponding to the frequency band energy attenuation curve; and the frequency band energy attenuation curve is: a frequency band energy attenuation curve in the RIR energy attenuation curve.
  • the processor 1010 may be specifically configured to calculate, based on the slope of each of the N linear fitting curves, the reverberation weight corresponding to each linear fitting curve, to obtain N reverberation weights; and based on the N reverberation weights, calculate the early reverberation energy of the reverberation audio signal and the late reverberation energy of the reverberation audio signal; and based on the early reverberation energy and the late reverberation energy, determine the reverberation suppression function.
  • the processor 1010 may be specifically configured to calculate, for each time frame before the target time frame, the residual energy of the energy of the direct audio signal of the time frame in the first audio signal in the target time frame according to the time frame and the reverberation weight corresponding to the time frame, to obtain the residual energy corresponding to each time frame; and calculate the above-mentioned early reverberation energy and late reverberation energy according to the residual energy corresponding to each time frame.
  • the processor 1010 may be specifically configured to calculate a priori signal-to-noise ratio corresponding to a target time frame according to the early reverberation energy, the late reverberation energy, and the energy of the ambient noise audio signal of the target time frame in the first audio signal; and calculate a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy, the energy of the ambient noise audio signal, and the amplitude spectrum of the first audio signal in the target time frame; and determine the reverberation suppression function according to the priori signal-to-noise ratio and the posteriori signal-to-noise ratio.
  • the processor 1010 may be specifically configured to perform a dot multiplication operation on the reverberation suppression function and the amplitude spectrum of the first audio signal in a target time frame to obtain a target amplitude spectrum; and perform an inverse Fourier transform on the target amplitude spectrum and the phase of the first audio signal in the target time frame to obtain a second audio signal.
  • in the electronic device provided in the embodiment of the present application, since the electronic device can divide a frequency band energy attenuation curve in the RIR energy attenuation curve into N segments of energy attenuation curves and perform linear fitting on each of them, and can determine the reverberation suppression function corresponding to the target time frame in the N segments of the energy attenuation curve based on the obtained N linear fitting curves, so as to suppress the reverberation part in the reverberation audio signal of the target time frame in the audio signal within the frequency band corresponding to the frequency band energy attenuation curve, the reverberation audio signal of each time frame can be accurately suppressed by piecewise linear fitting with a smaller fitting error and the reverberation suppression function corresponding to each time frame, thereby improving the effect of suppressing the reverberation audio signal.
  • the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042, and the graphics processor 10041 processes the image data of the static picture or video obtained by the image capture device (such as a camera) in the video capture mode or the image capture mode.
  • the display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, etc.
  • the user input unit 1007 includes a touch panel 10071 and at least one of other input devices 10072.
  • the touch panel 10071 is also called a touch screen.
  • the touch panel 10071 may include two parts: a touch detection device and a touch controller.
  • Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key, a switch key, etc.), a trackball, a mouse, and a joystick, which will not be repeated here.
  • the memory 1009 can be used to store software programs and various data.
  • the memory 1009 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the memory 1009 may include a volatile memory or a non-volatile memory, or the memory 1009 may include both volatile and non-volatile memories.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDRSDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchronous link dynamic random access memory (SLDRAM) and a direct memory bus random access memory (DRRAM).
  • the memory 1009 in the embodiment of the present application includes but is not limited to these and any other suitable types of memory.
  • the processor 1010 may include one or more processing units; optionally, the processor 1010 integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to an operating system, a user interface, and application programs, and the modem processor mainly processes wireless communication signals, such as a baseband processor. It is understandable that the modem processor may not be integrated into the processor 1010.
  • An embodiment of the present application further provides a readable storage medium on which a program or instruction is stored; when the program or instruction is executed by a processor, each process of the above-mentioned audio signal processing method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device described in the above embodiment.
  • the readable storage medium includes a computer readable storage medium, such as a computer read-only memory ROM, a random access memory RAM, a magnetic disk or an optical disk.
  • An embodiment of the present application further provides a chip, which includes a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned audio signal processing method embodiment, and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
  • the chip mentioned in the embodiments of the present application can also be called a system-level chip, a system chip, a chip system or a system-on-chip chip, etc.
  • An embodiment of the present application provides a computer program product, which is stored in a storage medium.
  • the program product is executed by at least one processor to implement the various processes of the above-mentioned audio signal processing method embodiment, and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
  • the technical solution of the present application can be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal (which can be a mobile phone, a computer, a server, or a network device, etc.) to execute the methods described in each embodiment of the present application.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

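The present application discloses an audio signal processing method, device, electronic device and readable storage medium, and belongs to the field of audio technology. The method includes: dividing a frequency band energy attenuation curve into N segments of energy attenuation curves, and performing linear fitting on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2; determining, based on the N linear fitting curves, a reverberation suppression function corresponding to a target time frame in the N segments of energy attenuation curves; and suppressing, based on the reverberation suppression function, the reverberation part of the reverberant audio signal of the target time frame in a first audio signal to obtain a second audio signal; wherein the first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve, and the frequency band energy attenuation curve is one frequency band energy attenuation curve in a room impulse response (RIR) energy attenuation curve.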

Description

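Audio signal processing method, device, electronic device and readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202211314023.0, entitled "Audio signal processing method, device, electronic device and readable storage medium" and filed with the China Patent Office on October 25, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the field of audio technology, and specifically relates to an audio signal processing method, device, electronic device and readable storage medium.
Background
Speech dereverberation has become an important step in audio signal processing. An electronic device can suppress a reverberant audio signal by removing the late reverberation in it, thereby making the speech fuller.
At present, to obtain the late reverberation in a reverberant audio signal, an electronic device may perform linear fitting over the entire time axis on each frequency band energy attenuation curve in the room impulse response (Room Impulse Response, RIR) energy attenuation curve, obtain the slope of each subband energy attenuation curve by the least squares method, and then model the energy attenuation process of the RIR based on the obtained slopes, from which the late reverberation can be estimated.
However, with this method the fitting error of the linear fit is usually large in the first few frames, where the residual energy of the direct audio is high. This makes the late reverberation estimated from the linear fit inaccurate, so the reverberant audio signal is suppressed poorly.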
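Summary
The purpose of the embodiments of the present application is to provide an audio signal processing method, device, electronic device and readable storage medium that can solve the problem of poor suppression of reverberant audio signals.
In a first aspect, an embodiment of the present application provides an audio signal processing method, the method including: dividing a frequency band energy attenuation curve into N segments of energy attenuation curves, and performing linear fitting on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2; determining, based on the N linear fitting curves, a reverberation suppression function corresponding to a target time frame in the N segments of energy attenuation curves; and suppressing, based on the reverberation suppression function, the reverberation part of the reverberant audio signal of the target time frame in a first audio signal to obtain a second audio signal; wherein the first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve, and the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve.
In a second aspect, an embodiment of the present application provides an audio signal processing device, the device including a processing module, a determination module and a suppression module; the processing module is configured to divide a frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2; the determination module is configured to determine, based on the N linear fitting curves obtained by the processing module, a reverberation suppression function corresponding to a target time frame in the N segments of energy attenuation curves; the suppression module is configured to suppress, based on the reverberation suppression function determined by the determination module, the reverberation part of the reverberant audio signal of the target time frame in a first audio signal to obtain a second audio signal; wherein the first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve, and the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve.
In a third aspect, an embodiment of the present application provides an electronic device including a processor and a memory, the memory storing a program or instruction executable on the processor, and the program or instruction, when executed by the processor, implementing the steps of the method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a readable storage medium storing a program or instruction that, when executed by a processor, implements the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instruction to implement the method according to the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product stored in a storage medium, the program product being executed by at least one processor to implement the method according to the first aspect.
In the embodiments of the present application, a frequency band energy attenuation curve may be divided into N segments of energy attenuation curves, and linear fitting performed on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2; a reverberation suppression function corresponding to a target time frame in the energy attenuation curve is determined based on the N linear fitting curves; and, based on the reverberation suppression function, the reverberation part of the reverberant audio signal of the target time frame in a first audio signal is suppressed to obtain a second audio signal; wherein the first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve, and the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve. With this solution, because the electronic device can divide one frequency band energy attenuation curve in the RIR energy attenuation curve into N segments, fit each segment linearly, and determine from the N linear fitting curves the reverberation suppression function corresponding to the target time frame in the N segments, so as to suppress the reverberation part of the reverberant audio signal of the target time frame in the audio signal within the corresponding frequency band, the reverberant audio signal of each time frame can be suppressed accurately through piecewise linear fitting with a small fitting error and a reverberation suppression function for each time frame, which improves the suppression of the reverberant audio signal.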
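Brief Description of the Drawings
FIG. 1 is a schematic diagram of the generation of a reverberant audio signal;
FIG. 2 is a schematic diagram of an RIR energy attenuation curve;
FIG. 3 is a schematic diagram of linear fitting in conventional speech dereverberation;
FIG. 4 is a flowchart of the audio signal processing method provided in an embodiment of the present application;
FIG. 5 is the first schematic diagram of the audio signal processing method provided in an embodiment of the present application;
FIG. 6 is the second schematic diagram of the audio signal processing method provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of the audio signal processing device provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of the electronic device provided in an embodiment of the present application;
FIG. 9 is a hardware schematic diagram of the electronic device provided in an embodiment of the present application.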
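Detailed Description
The technical solutions in the embodiments of the present application are described clearly below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application fall within the protection scope of the present application.
The terms "first", "second" and the like in the specification and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here; objects distinguished by "first", "second" and the like are usually of one type, and their number is not limited; for example, the first object may be one or more. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
Some nouns or terms used in the specification and claims of the present application are explained first.
RT60 (Reverberation Time, 60 dB): the time required for the sound field to decay by 60 dB.
The audio signal processing method, device, electronic device and readable storage medium provided in the embodiments of the present application are described in detail below through specific embodiments and their application scenarios, with reference to the drawings.
Speech dereverberation is a technique widely used in audio devices, commonly found in mobile phones, loudspeakers, conference call equipment and the like.
In an enclosed space, a sound source continuously emits an audio signal. While propagating, the emitted audio signal is reflected repeatedly by obstacles, and its energy gradually attenuates in the process. The attenuated audio signal reaches the sound pickup device after a certain delay and is captured together with the direct audio signal of the current moment, so that the direct audio signal of the current moment is disturbed by the reflected audio signal, forming a reverberant audio signal. The energy of the reverberant audio signal increases as the distance between the sound source and the sound pickup device increases.
FIG. 1 shows a schematic diagram of how a reverberant audio signal is generated. As shown in FIG. 1, a microphone 11 and a loudspeaker 12 are placed in an enclosure 10, and the propagation medium is air. Assume that the attenuation coefficient of sound propagating in air is α and the reflection coefficient of the walls of the enclosure 10 is β. The loudspeaker 12 emits one audio signal at a first moment and another audio signal at a later, second moment; the signal emitted at the first moment reaches the microphone 11 at the second moment after being reflected, and the propagation time of the direct audio signal is ignored. At the first moment the microphone 11 then receives only the signal emitted at that moment, while at the second moment it receives the signal emitted at the second moment together with the attenuated, reflected copy of the signal emitted at the first moment; that reflected copy is the reverberant audio signal.
Because the presence of a reverberant audio signal greatly reduces speech quality and affects the user's subjective listening experience, and in some smart devices also affects the accuracy of speech recognition, speech dereverberation has become an important step in the field of audio signal processing.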
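Usually, a reverberant audio signal is generated by convolving clean speech with the RIR, as shown in formula (1):
$$z(n)=s(n)*h(n) \tag{1}$$
where z(n) is the reverberant audio signal, h(n) is the RIR, s(n) is the clean speech, and $*$ denotes convolution. Applying a Fourier transform to formula (1) converts it to the time-frequency domain, giving formula (2), in which m denotes the time frame and k the frequency bin. A reverberant audio signal is usually divided into an early reverberant audio signal and a late reverberant audio signal, and squaring formula (2) gives formula (3):
$$\lambda_z(m,k)=\lambda_{ze}(m,k)+\lambda_{zl}(m,k) \tag{3}$$
where $\lambda_z(m,k)$ denotes the energy of the reverberant audio signal at the k-th frequency bin of the m-th frame, that is, the spectral variance of the reverberant audio signal; $\lambda_{ze}(m,k)$ denotes the early reverberation energy (spectral variance) at the k-th frequency bin of the m-th frame; and $\lambda_{zl}(m,k)$ denotes the late reverberation energy (spectral variance) at the k-th frequency bin of the m-th frame. The part that usually degrades speech quality is the late reverberant audio signal; removing only the late reverberant audio signal during dereverberation while keeping the early reverberant audio signal makes the speech fuller and more pleasant to listen to. Generally, the reflected energy arriving within a delay of 50 ms to 80 ms after an impulse is emitted is early reverberation energy, and all energy after that is late reverberation energy. To remove the late reverberant audio signal well while keeping the early reverberant audio signal, the RIR must be described and modeled accurately.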
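The convolution model of formula (1) can be illustrated with a minimal sketch. The function name `reverberate` and the toy two-tap impulse response are illustrative assumptions, not part of the application; a real RIR has thousands of taps and would normally be convolved with an FFT-based routine.

```python
def reverberate(clean, rir):
    """Linear convolution z(n) = sum_l h(l) * s(n - l), as in formula (1).

    `clean` is the clean speech s(n) and `rir` is the room impulse
    response h(n); both are plain lists of samples (hypothetical inputs).
    """
    out = [0.0] * max(len(clean) + len(rir) - 1, 0)
    for n, s in enumerate(clean):
        for l, h in enumerate(rir):
            # Each clean sample is smeared over later samples by the RIR taps.
            out[n + l] += h * s
    return out


# Toy example: a direct path plus one reflection at half amplitude.
z = reverberate([1.0, 2.0], [1.0, 0.5])
```

The second tap of the toy response plays the role of the reflected copy in FIG. 1: each output sample is the current direct sample plus an attenuated earlier one.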
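FIG. 2 shows a schematic diagram of an RIR energy attenuation curve. As shown in FIG. 2, the RIR has an RT60 of about 900 ms; the horizontal axis is the time frame and the vertical axis is the energy in dB; the sampling rate is 16 kHz, and the short-time Fourier transform uses a frame length of 512 and a frame shift of 160. The RIR energy attenuation curve consists of multiple curves, each showing how the energy of one subband changes over time; each subband is the average of 32 frequency bins, and the DC component is removed from the first subband.
In conventional speech dereverberation, the RIR energy attenuation curve is linearly fitted over the entire time axis. For example, as shown in FIG. 3, curve 31 is the subband energy attenuation curve for frequency bins 65 to 96, and curve 32 is the curve obtained by linearly fitting curve 31 over the entire time axis. After the fitted curve is obtained, its slope can be obtained by the least squares method, and T60 can then be obtained from the slope through formula (4).
The frequency-dependent parameter α(k) is defined by formula (5):
$$\alpha(k)=\frac{3\ln 10}{T_{60}(k)\,f_s} \tag{5}$$
where $f_s$ is the sampling rate. From this, the energy $E(i,k)$ that remains of the direct audio signal energy $\lambda_s(m,k)$ of the m-th frame after i frames of attenuation can be obtained through formula (6):
$$E(i,k)=e^{-2\alpha(k)\cdot R\cdot i}\,\lambda_s(m,k) \tag{6}$$
where R denotes the frame shift. In this way, the energy attenuation process of the RIR can be modeled, and the late reverberation energy $\lambda_{zl}(m,k)$ can be estimated.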
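The chain from fitted slope to residual energy in formulas (4) to (6) can be sketched as follows. This is a sketch under stated assumptions: the slope is taken to be in dB per frame (the unit is not stated in this text), T60 is expressed in seconds, and the function names are hypothetical.

```python
import math


def t60_from_slope(slope_db_per_frame, hop, fs):
    """T60 in seconds from a fitted decay slope (assumed unit: dB per frame)."""
    if slope_db_per_frame >= 0:
        raise ValueError("a decaying curve needs a negative slope")
    # 60 dB of decay takes -60 / slope frames; each frame advances hop / fs seconds.
    return -60.0 * hop / (slope_db_per_frame * fs)


def alpha(t60, fs):
    """Decay parameter chosen so that e^(-2 * alpha * n) reaches -60 dB at n = T60 * fs samples."""
    return 3.0 * math.log(10.0) / (t60 * fs)


def residual_energy(direct_energy, i, t60, hop, fs):
    """Formula (6): energy left from a frame's direct energy after i frames."""
    return math.exp(-2.0 * alpha(t60, fs) * hop * i) * direct_energy


# With the parameters quoted for FIG. 2 (fs = 16 kHz, frame shift 160, RT60 about 0.9 s),
# 60 dB of decay spans 90 frames.
t60 = t60_from_slope(-60.0 / 90.0, hop=160, fs=16000)
left = residual_energy(1.0, i=90, t60=t60, hop=160, fs=16000)
```

With these numbers `t60` comes out at 0.9 s and `left` at one millionth of the starting energy, which is the 60 dB drop that defines RT60.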
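However, in this method the electronic device fits the RIR energy attenuation curve over the entire time axis, and such a global linear fit is not optimal everywhere. Specifically:
1. In the first few frames, where the residual energy of the direct audio is high, the fitting error of the linear fit is large.
2. From formula (5) and formula (6), formula (7) is obtained:
$$E(i,k)=e^{-\frac{6\ln 10\cdot R\cdot i}{T_{60}(k)\,f_s}}\,\lambda_s(m,k) \tag{7}$$
Writing $\varepsilon=e^{-\frac{6\ln 10\cdot R}{T_{60}(k)\,f_s}}$, formula (7) can be expressed as formula (8):
$$E(i,k)=\varepsilon^{i}\cdot\lambda_s(m,k) \tag{8}$$
Since $0<\varepsilon<1$, $\varepsilon^{i}$ decreases as i increases. The energy left from the direct audio signal of the m-th frame therefore differs from one later time frame to the next: the closer a frame is to the m-th frame, the larger the residual energy and the greater its influence on the reverberation component. The residual energy of the direct audio signal is clearly high in the first few frames, yet these are exactly the frames in which the linear fit has a large error. This strongly affects the estimate of the reverberation component and leads to poor suppression of the reverberant audio signal.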
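As a worked illustration of formula (8), take the parameters quoted for FIG. 2 ($f_s = 16$ kHz, $R = 160$, RT60 of about 0.9 s). The closed form of $\varepsilon$ used here follows from the standard definition of a 60 dB decay and is an assumption about the notation, not a quotation of the original formula image:

$$\varepsilon = 10^{-\frac{6R}{T_{60} f_s}} = 10^{-\frac{6\cdot 160}{0.9\cdot 16000}} = 10^{-1/15}\approx 0.858,\qquad 10\log_{10}\varepsilon\approx -0.67\ \text{dB per frame}$$

$$\varepsilon^{5} = 10^{-1/3}\approx 0.464\ (\approx -3.3\ \text{dB}),\qquad \varepsilon^{90} = 10^{-6}\ (-60\ \text{dB})$$

After five frames almost half of the direct energy is still present, so a fitting error in those frames carries far more weight in the reverberation estimate than the same error 90 frames later.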
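To solve the above problem, in the audio signal processing method provided in the embodiments of the present application, a frequency band energy attenuation curve may be divided into N segments of energy attenuation curves, and linear fitting performed on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2; a reverberation suppression function corresponding to a target time frame in the N segments of energy attenuation curves is determined based on the N linear fitting curves; and, based on the reverberation suppression function, the reverberation part of the reverberant audio signal of the target time frame in a first audio signal is suppressed to obtain a second audio signal; wherein the first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve, and the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve. With this solution, because the electronic device can divide one frequency band energy attenuation curve in the RIR energy attenuation curve into N segments, fit each segment linearly, and determine from the N linear fitting curves the reverberation suppression function corresponding to the target time frame in the N segments, so as to suppress the reverberation part of the reverberant audio signal of the target time frame in the audio signal within the corresponding frequency band, the reverberant audio signal of each time frame can be suppressed accurately through piecewise linear fitting with a small fitting error and a reverberation suppression function for each time frame, which improves the suppression of the reverberant audio signal.
An embodiment of the present application provides an audio signal processing method, and FIG. 4 shows a flowchart of the method. As shown in FIG. 4, the audio signal processing method provided in the embodiment of the present application may include the following steps 401 to 403. The method is described below by way of example with an electronic device executing it.
Step 401: The electronic device divides the frequency band energy attenuation curve into N segments of energy attenuation curves, and performs linear fitting on each segment to obtain N linear fitting curves.
Here, N is an integer greater than or equal to 2.
In the embodiments of the present application, the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve.
Optionally, in the embodiments of the present application, when N equals 3, that is, when the frequency band energy attenuation curve is divided into 3 segments, the best piecewise linear fitting effect can be achieved; in actual implementation, of course, N may be any integer greater than or equal to 2, which is not limited in the embodiments of the present application.
For how the electronic device divides the frequency band energy attenuation curve into N segments and performs piecewise linear fitting, reference may be made to the descriptions of piecewise linear regression in the related art; to avoid repetition, details are not given here.
The audio signal processing method provided in the embodiments of the present application is illustrated below with reference to the drawings.
For example, as shown in FIG. 5, the electronic device divides the energy attenuation curve 50 (that is, the frequency band energy attenuation curve) into 3 segments at time frame m1 and time frame m2 on the time axis, and after performing linear fitting on each segment obtains linear fitting curve 51, linear fitting curve 52 and linear fitting curve 53 (that is, the N linear fitting curves).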
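Step 401 can be sketched as an ordinary least-squares fit applied separately to each segment. This is a minimal sketch, not the application's implementation: the function names, the 0-based frame indexing, the half-open segment convention and the synthetic curve are all illustrative assumptions, and the breakpoints are taken as given rather than optimized.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points per segment")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def piecewise_fit(curve_db, breakpoints):
    """Fit one line per segment of an energy decay curve given in dB.

    `breakpoints` holds the frame indices (m1, m2, ...) at which a new
    segment starts; segment j covers frames [edges[j], edges[j + 1]).
    Returns a list of (slope, intercept) pairs, one per segment.
    """
    edges = [0] + list(breakpoints) + [len(curve_db)]
    fits = []
    for start, stop in zip(edges[:-1], edges[1:]):
        xs = list(range(start, stop))
        fits.append(fit_line(xs, curve_db[start:stop]))
    return fits


# Synthetic curve: steep at first, then progressively flatter (N = 3 segments).
curve = [0.0, -6.0, -12.0, -14.0, -16.0, -18.0, -18.5, -19.0, -19.5]
fits = piecewise_fit(curve, breakpoints=[2, 5])
```

A single line fitted to the same synthetic curve would underestimate the decay in the first two frames, which is the failure mode the segmentation is meant to avoid.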
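Step 402: The electronic device determines, based on the N linear fitting curves, the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves.
In the embodiments of the present application, the reverberation suppression function is used to suppress the reverberation part of the reverberant audio signal.
It should be noted that the reverberation part is not a separate audio signal but the reverberation energy in the reverberant audio signal, that is, the energy produced while the audio signal propagates in the enclosure; without the RIR, or without the clean audio signal in the audio signal, the reverberant audio signal in that audio signal would not exist either.
Optionally, in the embodiments of the present application, the target time frame may be any time frame.
Optionally, in the embodiments of the present application, the target time frame may be any time frame after the 5th frame.
Optionally, in the embodiments of the present application, step 402 may be implemented through the following steps 402a to 402c.
Step 402a: The electronic device calculates, based on the slope of each of the N linear fitting curves, the reverberation weight corresponding to each linear fitting curve, to obtain N reverberation weights.
Optionally, in the embodiments of the present application, the electronic device may calculate the reverberation weight corresponding to each linear fitting curve through formula (4) and formula (5).
For example, assuming the N linear fitting curves are linear fitting curve α, linear fitting curve β and linear fitting curve γ, the electronic device can, from the slope of each linear fitting curve and through formula (4) and formula (5), calculate the reverberation weight α(k) corresponding to linear fitting curve α, the reverberation weight β(k) corresponding to linear fitting curve β, and the reverberation weight γ(k) corresponding to linear fitting curve γ, as given by formula (9).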
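One way to read step 402a is that each weight is the per-frame energy factor implied by its segment's slope, in the same sense as ε in formula (8). The sketch below makes that assumption explicit; it also assumes the fitted curve is energy in dB (10·log10) with the slope in dB per frame, and the function name is hypothetical. Formula (9) itself is not reproduced in this text, so this is an interpretation rather than the application's formula.

```python
def reverb_weight(slope_db_per_frame):
    """Per-frame energy decay factor implied by a slope in dB per frame.

    A slope of p dB per frame multiplies the energy by 10 ** (p / 10)
    every frame, which equals the epsilon of formula (8) when T60 is
    derived from the same slope.
    """
    return 10.0 ** (slope_db_per_frame / 10.0)


# Three segment slopes (steep, medium, shallow) give three weights.
alpha_k, beta_k, gamma_k = (reverb_weight(p) for p in (-6.0, -2.0, -0.5))
```

Steeper segments give smaller weights, so energy emitted one or two frames ago is discounted more strongly per frame than energy that has already reached the shallow tail.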
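Step 402b: The electronic device calculates, based on the N reverberation weights, the early reverberation energy and the late reverberation energy of the reverberant audio signal.
In the embodiments of the present application, the reverberant audio signal is the reverberant audio signal of the target time frame in the first audio signal.
In the embodiments of the present application, the first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve.
It can be understood that the early reverberation energy and the late reverberation energy are determined by the direct audio signal (that is, the clean audio signal in the first audio signal) of each time frame before the target time frame.
Optionally, in the embodiments of the present application, step 402b may be implemented through the following steps 402b1 and 402b2.
Step 402b1: For each time frame before the target time frame, the electronic device calculates, according to that time frame and the reverberation weight corresponding to it, the residual energy that the energy of the direct audio signal of that time frame in the first audio signal still has in the target time frame, to obtain the residual energy corresponding to each time frame.
Optionally, in the embodiments of the present application, the electronic device may calculate the residual energy corresponding to each time frame through formula (6).
Step 402b2: The electronic device calculates the early reverberation energy and the late reverberation energy of the reverberant audio signal according to the residual energy corresponding to each time frame before the target time frame.
It should be noted that the embodiments of the present application all take as an example the case in which the frequency band energy attenuation curve is divided into 3 segments and the target time frame is the m-th frame; in actual implementation, neither the number of segments nor the target time frame is limited.
Optionally, in the embodiments of the present application, the electronic device may derive from formula (8) the expressions for the early reverberation energy $\lambda_{ze}(m,k)$ and the late reverberation energy $\lambda_{zl}(m,k)$, given as formula (10) and formula (11),
where m3 denotes the last frame at which the attenuated energy of the direct audio signal has not yet fallen below a preset threshold. For example, if the energy of the direct audio signal in the first frame is -10 dB, has decayed to -58 dB after 20 frames and to -65 dB at frame 21, and the preset threshold is -60 dB, then m3 = 20. It can be understood that once the energy of the direct audio signal has decayed far enough, its influence on the whole is negligible; m3 also makes formula (11) a finite sum, which is more practical for engineering.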
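A general form consistent with the frame-by-frame derivation given later for $m_1=2$, $m_2=5$ can be written with a lag weight $w_i$, where $i=1$ is the target frame itself. The symbol $w_i$ and the summation limits are assumptions introduced here for illustration and are not taken from the original formulas (10) and (11):

$$w_i(k)=\begin{cases}\alpha^{i}(k), & 1\le i\le m_1\\[2pt] \alpha^{m_1}(k)\,\beta^{\,i-m_1}(k), & m_1< i\le m_2\\[2pt] \alpha^{m_1}(k)\,\beta^{\,m_2-m_1}(k)\,\gamma^{\,i-m_2}(k), & i> m_2\end{cases}$$

$$\lambda_{ze}(m,k)=\sum_{i=1}^{\min(m,\,m_2)} w_i(k)\,\lambda_s(m-i+1,k),\qquad \lambda_{zl}(m,k)=\sum_{i=m_2+1}^{\min(m,\,m_3)} w_i(k)\,\lambda_s(m-i+1,k)$$

For $m_1=2$, $m_2=5$ and $m=6$ this gives five early terms with weights $\alpha,\ \alpha^2,\ \alpha^2\beta,\ \alpha^2\beta^2,\ \alpha^2\beta^3$ and one late term with weight $\alpha^2\beta^3\gamma$, matching the derivation for frame 6.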
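In the embodiments of the present application, because the electronic device can calculate the early reverberation energy and the late reverberation energy of the reverberant audio signal from the residual energy corresponding to each time frame, the accuracy with which the electronic device calculates the early and late reverberation energy can be improved.
Step 402c: The electronic device determines, based on the early reverberation energy and the late reverberation energy of the reverberant audio signal, the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves.
Optionally, in the embodiments of the present application, step 402c may be implemented through the following steps 402c1 to 402c3.
Step 402c1: The electronic device calculates the a priori signal-to-noise ratio corresponding to the target time frame according to the early reverberation energy of the reverberant audio signal, the late reverberation energy of the reverberant audio signal, and the energy of the ambient noise audio signal of the target time frame in the first audio signal.
Optionally, in the embodiments of the present application, the first audio signal may include a direct audio signal, a reverberant audio signal and an ambient noise audio signal, and can then be expressed as formula (12).
From formula (2) and formula (3), formula (13) is obtained:
$$|Y(m,k)|^{2}=\lambda_{ze}(m,k)+\lambda_{zl}(m,k)+\lambda_{v}(m,k) \tag{13}$$
where $|Y(m,k)|^{2}$ denotes the square of the magnitude spectrum of the first audio signal and $\lambda_{v}(m,k)$ denotes the energy of the ambient noise audio signal. The energy of the ambient noise audio signal can thus be calculated, and the a priori signal-to-noise ratio $\varepsilon(m,k)$ is calculated through formula (14).
Step 402c2: The electronic device calculates the a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy of the reverberant audio signal, the energy of the ambient noise audio signal of the target time frame in the first audio signal, and the magnitude spectrum of the first audio signal in the target time frame.
Optionally, in the embodiments of the present application, after obtaining $\lambda_{v}(m,k)$, the electronic device may calculate the a posteriori signal-to-noise ratio $\zeta(m,k)$ through formula (15).
Step 402c3: The electronic device determines, according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio corresponding to the target time frame, the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves.
Optionally, in the embodiments of the present application, after obtaining the a priori signal-to-noise ratio $\varepsilon(m,k)$ and the a posteriori signal-to-noise ratio $\zeta(m,k)$, the electronic device can determine the reverberation suppression function, which can be expressed as formula (16).
In the embodiments of the present application, because the electronic device can determine the reverberation suppression function based on the calculated a priori and a posteriori signal-to-noise ratios corresponding to the target time frame, the accuracy with which the electronic device determines the reverberation suppression function can be improved, so that the reverberant audio signal of the target time frame can be suppressed accurately through the reverberation suppression function.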
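Steps 402c1 to 402c3 can be sketched for a single time-frequency bin as follows. Formulas (14) to (16) are not reproduced in this text, so the sketch rests on assumptions: the two ratios use the definitions that are common in late-reverberation suppression (desired energy over interference energy, and observed power over interference energy), and the gain is the MMSE spectral power estimator, swapped in as one known gain rule that depends on both ratios. The function names, the gain floor and the clipping to 1 are also illustrative choices.

```python
import math

TINY = 1e-12  # guards against division by zero; an illustrative choice


def snrs(early, late, noise, magnitude):
    """A priori and a posteriori ratios for one bin (assumed definitions)."""
    interference = max(late + noise, TINY)
    prior = early / interference                      # assumed form of formula (14)
    posterior = magnitude * magnitude / interference  # assumed form of formula (15)
    return prior, posterior


def suppression_gain(prior, posterior, floor=0.1):
    """MMSE spectral power gain, used here as a stand-in for formula (16)."""
    if posterior <= 0.0:
        return floor
    nu = prior * posterior / (1.0 + prior)
    gain = math.sqrt(prior / (1.0 + prior) * (1.0 + nu) / posterior)
    # Keep the gain in [floor, 1] so the bin is attenuated, never amplified.
    return max(floor, min(1.0, gain))


prior, posterior = snrs(early=3.0, late=1.0, noise=0.0, magnitude=2.0)
gain = suppression_gain(prior, posterior)
```

Bins dominated by late reverberation get a small a priori ratio and therefore a gain near the floor, while bins dominated by direct and early energy pass almost unchanged.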
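In the embodiments of the present application, because the electronic device can calculate, based on the slope of each of the N linear fitting curves, the reverberation weight corresponding to each linear fitting curve, and calculate from the N reverberation weights the early and late reverberation energy of the reverberant audio signal in order to determine the reverberation suppression function, the accuracy with which the electronic device determines the reverberation suppression function can be further improved.
Step 403: The electronic device suppresses, based on the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves, the reverberation part of the reverberant audio signal of the target time frame in the first audio signal, to obtain a second audio signal.
In the embodiments of the present application, the second audio signal is the estimated direct audio signal of the target time frame after the reverberation part has been suppressed.
Optionally, in the embodiments of the present application, step 403 may be implemented through the following steps 403a and 403b.
Step 403a: The electronic device performs an element-wise multiplication of the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves and the magnitude spectrum of the first audio signal in the target time frame, to obtain a target magnitude spectrum.
Optionally, in the embodiments of the present application, the target magnitude spectrum is the magnitude spectrum of the first audio signal after the reverberant audio signal has been suppressed, and can be calculated through formula (17).
Step 403b: The electronic device performs an inverse Fourier transform on the target magnitude spectrum together with the phase of the first audio signal in the target time frame, to obtain the second audio signal.
Optionally, in the embodiments of the present application, the inverse Fourier transform restores the audio signal from the frequency domain to the time domain.
In the embodiments of the present application, because the electronic device can multiply the reverberation suppression function element-wise with the magnitude spectrum of the first audio signal in the target time frame to obtain the target magnitude spectrum, and perform an inverse Fourier transform on it together with the phase of the first audio signal in the target time frame to obtain the second audio signal, the reverberant audio signal can be suppressed accurately through the reverberation suppression function, which improves the robustness and flexibility of suppressing the reverberant audio signal.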
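Steps 403a and 403b can be sketched for one frame as below. The naive O(n²) inverse DFT and the function names are illustrative assumptions; a practical implementation would use an FFT-based inverse short-time Fourier transform with windowing and overlap-add, which this sketch leaves out.

```python
import cmath


def suppress_frame(spectrum, gains):
    """Scale each bin's magnitude by its gain and keep the original phase."""
    out = []
    for y, g in zip(spectrum, gains):
        magnitude, phase = abs(y), cmath.phase(y)
        # Target magnitude is gain * |Y|; the noisy phase is reused unchanged.
        out.append(cmath.rect(g * magnitude, phase))
    return out


def inverse_dft(spectrum):
    """Naive inverse DFT of a full-length spectrum, returning real samples.

    The result is real only if the spectrum (and therefore the gains) is
    conjugate-symmetric; the imaginary residue is discarded here.
    """
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# A unit impulse has a flat spectrum; halving every bin halves the impulse.
frame = inverse_dft(suppress_frame([1 + 0j] * 4, [0.5] * 4))
```

Only the magnitude is modified; reusing the observed phase is what lets the gain be applied as a simple element-wise product.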
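It should be noted that, through the above steps, the electronic device can suppress the reverberant audio signal of each time frame in the first audio signal, and in turn the reverberant audio signal in each frequency band of the captured audio signal, so that reverberation is suppressed across the entire captured audio signal.
In the audio signal processing method provided in the embodiments of the present application, because the electronic device can divide one frequency band energy attenuation curve in the RIR energy attenuation curve into N segments, fit each segment linearly, and determine from the N linear fitting curves the reverberation suppression function corresponding to the target time frame in the N segments, so as to suppress the reverberation part of the reverberant audio signal of the target time frame in the audio signal within the corresponding frequency band, the reverberant audio signal of each time frame can be suppressed accurately through piecewise linear fitting with a small fitting error and a reverberation suppression function for each time frame, which improves the suppression of the reverberant audio signal.
The audio signal processing method provided in the embodiments of the present application is illustrated below with reference to the drawings.
For example, assume a sampling rate of 16 kHz, a short-time Fourier transform frame length of 512 and a frame shift of 160, so that one frame corresponds to 10 ms, and take m1 = 2 and m2 = 5. If the 5th frame is the boundary between the early and the late reverberant audio signal, frames 1 to 5 belong to the early reverberation part and everything after frame 5 belongs to the late reverberation part. Ignoring background noise, the derivation is as follows: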
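Frame 1:
$$\begin{aligned}\lambda_z(1,k)&=\alpha(k)\lambda_s(1,k)\\ \lambda_{ze}(1,k)&=\lambda_z(1,k)\\ \lambda_{zl}(1,k)&=0\end{aligned}$$
Frame 2:
$$\begin{aligned}\lambda_z(2,k)&=\alpha(k)\lambda_s(2,k)+\alpha^2(k)\lambda_s(1,k)\\ \lambda_{ze}(2,k)&=\lambda_z(2,k)\\ \lambda_{zl}(2,k)&=0\end{aligned}$$
Frame 3:
$$\begin{aligned}\lambda_z(3,k)&=\alpha(k)\lambda_s(3,k)+\alpha^2(k)\lambda_s(2,k)+\alpha^2(k)\beta(k)\lambda_s(1,k)\\ \lambda_{ze}(3,k)&=\lambda_z(3,k)\\ \lambda_{zl}(3,k)&=0\end{aligned}$$
Frame 4:
$$\begin{aligned}\lambda_z(4,k)&=\alpha(k)\lambda_s(4,k)+\alpha^2(k)\lambda_s(3,k)+\alpha^2(k)\beta(k)\lambda_s(2,k)+\alpha^2(k)\beta^2(k)\lambda_s(1,k)\\ \lambda_{ze}(4,k)&=\lambda_z(4,k)\\ \lambda_{zl}(4,k)&=0\end{aligned}$$
Frame 5:
$$\begin{aligned}\lambda_z(5,k)&=\alpha(k)\lambda_s(5,k)+\alpha^2(k)\lambda_s(4,k)+\alpha^2(k)\beta(k)\lambda_s(3,k)\\ &\quad+\alpha^2(k)\beta^2(k)\lambda_s(2,k)+\alpha^2(k)\beta^3(k)\lambda_s(1,k)\\ \lambda_{ze}(5,k)&=\lambda_z(5,k)\\ \lambda_{zl}(5,k)&=0\end{aligned}$$
Frame 6:
$$\begin{aligned}\lambda_z(6,k)&=\alpha(k)\lambda_s(6,k)+\alpha^2(k)\lambda_s(5,k)+\alpha^2(k)\beta(k)\lambda_s(4,k)+\alpha^2(k)\beta^2(k)\lambda_s(3,k)\\ &\quad+\alpha^2(k)\beta^3(k)\lambda_s(2,k)+\alpha^2(k)\beta^3(k)\gamma(k)\lambda_s(1,k)\\ \lambda_{ze}(6,k)&=\alpha(k)\lambda_s(6,k)+\alpha^2(k)\lambda_s(5,k)+\alpha^2(k)\beta(k)\lambda_s(4,k)+\alpha^2(k)\beta^2(k)\lambda_s(3,k)\\ &\quad+\alpha^2(k)\beta^3(k)\lambda_s(2,k)\\ \lambda_{zl}(6,k)&=\alpha^2(k)\beta^3(k)\gamma(k)\lambda_s(1,k)\end{aligned}$$
Frame 7:
$$\begin{aligned}\lambda_z(7,k)&=\alpha(k)\lambda_s(7,k)+\alpha^2(k)\lambda_s(6,k)+\alpha^2(k)\beta(k)\lambda_s(5,k)+\alpha^2(k)\beta^2(k)\lambda_s(4,k)\\ &\quad+\alpha^2(k)\beta^3(k)\lambda_s(3,k)+\alpha^2(k)\beta^3(k)\gamma(k)\lambda_s(2,k)+\alpha^2(k)\beta^3(k)\gamma^2(k)\lambda_s(1,k)\\ \lambda_{ze}(7,k)&=\alpha(k)\lambda_s(7,k)+\alpha^2(k)\lambda_s(6,k)+\alpha^2(k)\beta(k)\lambda_s(5,k)+\alpha^2(k)\beta^2(k)\lambda_s(4,k)\\ &\quad+\alpha^2(k)\beta^3(k)\lambda_s(3,k)\\ \lambda_{zl}(7,k)&=\alpha^2(k)\beta^3(k)\gamma(k)\lambda_s(2,k)+\alpha^2(k)\beta^3(k)\gamma^2(k)\lambda_s(1,k)\end{aligned}$$
……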
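And so on. The derivation above shows that with m1 = 2 and m2 = 5, from frame 6 onward the number of terms in $\lambda_{ze}$ is fixed at 5, while the number of terms in $\lambda_{zl}$ grows with the frame index. Once the energy of the term newly added in a frame falls below the set threshold (usually -60 dB), that term is no longer included, so the number of terms in $\lambda_{zl}$ also becomes constant; this corresponds to the m3-th frame mentioned above and to formula (11).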
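The frame-by-frame derivation above can be reproduced with a short sketch. The function names, the 1-based frame convention and the optional `m3` cut-off argument are illustrative assumptions; the weights follow the pattern of the derivation (powers of α up to m1, then powers of β up to m2, then powers of γ).

```python
def lag_weight(i, a, b, g, m1, m2):
    """Weight applied to direct energy emitted i - 1 frames ago (i = 1 is the current frame)."""
    return (a ** min(i, m1)) * (b ** min(max(i - m1, 0), m2 - m1)) * (g ** max(i - m2, 0))


def early_late_energy(direct, m, a, b, g, m1=2, m2=5, m3=None):
    """Early and late reverberation energy at 1-based frame m for one frequency bin.

    direct[j] holds the direct energy of frame j + 1; a, b, g are the three
    reverberation weights; m3, if given, caps the number of lags that are summed.
    """
    last = m if m3 is None else min(m, m3)
    early = sum(lag_weight(i, a, b, g, m1, m2) * direct[m - i]
                for i in range(1, min(m, m2) + 1))
    late = sum(lag_weight(i, a, b, g, m1, m2) * direct[m - i]
               for i in range(m2 + 1, last + 1))
    return early, late


# Frame 6 with weights 0.5, 0.25, 0.125: five early terms and one late term.
direct = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
early6, late6 = early_late_energy(direct, 6, 0.5, 0.25, 0.125)
```

For frames 1 to 5 the late sum is empty and the function returns zero late energy, in line with the derivation.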
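FIG. 6 shows a schematic diagram of the effect of suppressing the reverberation part of a reverberant audio signal with the audio signal processing method of the embodiments of the present application. As shown in FIG. 6, region 61 is the spectrogram of the clean speech (that is, the direct audio signal), region 62 is the reverberant speech obtained by convolving the clean speech with the RIR (that is, the reverberant audio signal), and region 63 is the dereverberated speech (that is, the second audio signal). It can be seen that the dereverberated speech largely restores the harmonic structure of the clean speech and that the reverberant speech is effectively suppressed, which improves speech quality and intelligibility.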
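The audio signal processing method provided in the embodiments of the present application may be executed by an audio signal processing device. In the embodiments of the present application, the audio signal processing device is described by taking as an example the case in which it executes the audio signal processing method.
With reference to FIG. 7, an embodiment of the present application provides an audio signal processing device 70, which may include a processing module 71, a determination module 72 and a suppression module 73. The processing module 71 may be configured to divide a frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2. The determination module 72 may be configured to determine, based on the N linear fitting curves obtained by the processing module 71, the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves. The suppression module 73 may be configured to suppress, based on the reverberation suppression function determined by the determination module 72, the reverberation part of the reverberant audio signal of the target time frame in the first audio signal, to obtain a second audio signal. The first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve; the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve.
In a possible implementation, the determination module 72 may be specifically configured to calculate, based on the slope of each of the N linear fitting curves, the reverberation weight corresponding to each linear fitting curve, to obtain N reverberation weights; calculate, based on the N reverberation weights, the early reverberation energy and the late reverberation energy of the reverberant audio signal; and determine the reverberation suppression function based on the early reverberation energy and the late reverberation energy.
In a possible implementation, the determination module 72 may be specifically configured to calculate, for each time frame before the target time frame, according to that time frame and the reverberation weight corresponding to it, the residual energy that the energy of the direct audio signal of that time frame in the first audio signal still has in the target time frame, to obtain the residual energy corresponding to each time frame; and calculate the early reverberation energy and the late reverberation energy according to the residual energy corresponding to each time frame.
In a possible implementation, the determination module 72 may be specifically configured to calculate the a priori signal-to-noise ratio corresponding to the target time frame according to the early reverberation energy, the late reverberation energy, and the energy of the ambient noise audio signal of the target time frame in the first audio signal; calculate the a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy, the energy of the ambient noise audio signal, and the magnitude spectrum of the first audio signal in the target time frame; and determine the reverberation suppression function according to the a priori and a posteriori signal-to-noise ratios.
In a possible implementation, the suppression module 73 may be specifically configured to multiply the reverberation suppression function element-wise with the magnitude spectrum of the first audio signal in the target time frame to obtain a target magnitude spectrum; and perform an inverse Fourier transform on the target magnitude spectrum together with the phase of the first audio signal in the target time frame to obtain the second audio signal.
In the audio signal processing device provided in the embodiments of the present application, because the device can divide one frequency band energy attenuation curve in the RIR energy attenuation curve into N segments, fit each segment linearly, and determine from the N linear fitting curves the reverberation suppression function corresponding to the target time frame in the N segments, so as to suppress the reverberation part of the reverberant audio signal of the target time frame in the audio signal within the corresponding frequency band, the reverberant audio signal of each time frame can be suppressed accurately through piecewise linear fitting with a small fitting error and a reverberation suppression function for each time frame, which improves the suppression of the reverberant audio signal.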
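The audio signal processing device in the embodiments of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, the electronic device may be a mobile phone, a tablet computer, a laptop computer, a palmtop computer, a vehicle-mounted electronic device, a mobile Internet device (Mobile Internet Device, MID), an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook or a personal digital assistant (personal digital assistant, PDA), and may also be a server, a network attached storage (Network Attached Storage, NAS) device, a personal computer (personal computer, PC), a television (television, TV), a teller machine or a self-service machine, which is not specifically limited in the embodiments of the present application.
The audio signal processing device in the embodiments of the present application may be a device with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The audio signal processing device provided in the embodiments of the present application can implement each process implemented by the method embodiments of FIG. 4 to FIG. 6; to avoid repetition, details are not repeated here.
As shown in FIG. 8, an embodiment of the present application further provides an electronic device 800 including a processor 801 and a memory 802. The memory 802 stores a program or instruction executable on the processor 801; when the program or instruction is executed by the processor 801, the steps of the above audio signal processing method embodiment are implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 9 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to, a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, a processor 1010 and other components.
Those skilled in the art will understand that the electronic device 1000 may further include a power supply (such as a battery) that supplies power to each component. The power supply may be logically connected to the processor 1010 through a power management system, so that charging, discharging and power consumption are managed through the power management system. The electronic device structure shown in FIG. 9 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange components differently, which is not described in detail here.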
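The processor 1010 may be configured to divide a frequency band energy attenuation curve into N segments of energy attenuation curves, and perform linear fitting on each segment to obtain N linear fitting curves, N being an integer greater than or equal to 2; determine, based on the N linear fitting curves so obtained, the reverberation suppression function corresponding to the target time frame in the N segments of energy attenuation curves; and suppress, based on the determined reverberation suppression function, the reverberation part of the reverberant audio signal of the target time frame in the first audio signal, to obtain a second audio signal. The first audio signal is the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency band energy attenuation curve; the frequency band energy attenuation curve is one frequency band energy attenuation curve in the RIR energy attenuation curve.
In a possible implementation, the processor 1010 may be specifically configured to calculate, based on the slope of each of the N linear fitting curves, the reverberation weight corresponding to each linear fitting curve, to obtain N reverberation weights; calculate, based on the N reverberation weights, the early reverberation energy and the late reverberation energy of the reverberant audio signal; and determine the reverberation suppression function based on the early reverberation energy and the late reverberation energy.
In a possible implementation, the processor 1010 may be specifically configured to calculate, for each time frame before the target time frame, according to that time frame and the reverberation weight corresponding to it, the residual energy that the energy of the direct audio signal of that time frame in the first audio signal still has in the target time frame, to obtain the residual energy corresponding to each time frame; and calculate the early reverberation energy and the late reverberation energy according to the residual energy corresponding to each time frame.
In a possible implementation, the processor 1010 may be specifically configured to calculate the a priori signal-to-noise ratio corresponding to the target time frame according to the early reverberation energy, the late reverberation energy, and the energy of the ambient noise audio signal of the target time frame in the first audio signal; calculate the a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy, the energy of the ambient noise audio signal, and the magnitude spectrum of the first audio signal in the target time frame; and determine the reverberation suppression function according to the a priori and a posteriori signal-to-noise ratios.
In a possible implementation, the processor 1010 may be specifically configured to multiply the reverberation suppression function element-wise with the magnitude spectrum of the first audio signal in the target time frame to obtain a target magnitude spectrum; and perform an inverse Fourier transform on the target magnitude spectrum together with the phase of the first audio signal in the target time frame to obtain the second audio signal.
In the electronic device provided in the embodiments of the present application, because the electronic device can divide one frequency band energy attenuation curve in the RIR energy attenuation curve into N segments, fit each segment linearly, and determine from the N linear fitting curves the reverberation suppression function corresponding to the target time frame in the N segments, so as to suppress the reverberation part of the reverberant audio signal of the target time frame in the audio signal within the corresponding frequency band, the reverberant audio signal of each time frame can be suppressed accurately through piecewise linear fitting with a small fitting error and a reverberation suppression function for each time frame, which improves the suppression of the reverberant audio signal.
For the beneficial effects of the implementations in this embodiment, refer to those of the corresponding implementations in the method embodiment above; to avoid repetition, details are not repeated here.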
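It should be understood that, in the embodiments of the present application, the input unit 1004 may include a graphics processing unit (Graphics Processing Unit, GPU) 10041 and a microphone 10042; the graphics processing unit 10041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in video capture mode or image capture mode. The display unit 1006 may include a display panel 10061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also called a touch screen and may include two parts: a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key and a power key), a trackball, a mouse and a joystick, which are not described in detail here.
The memory 1009 may be used to store software programs and various data. The memory 1009 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, where the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playback function or an image playback function). In addition, the memory 1009 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synch link DRAM, SLDRAM) or direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 1009 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
The processor 1010 may include one or more processing units. Optionally, the processor 1010 integrates an application processor and a modem processor, where the application processor mainly handles operations involving the operating system, the user interface and application programs, and the modem processor, such as a baseband processor, mainly handles wireless communication signals. It can be understood that the modem processor may also not be integrated into the processor 1010.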
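An embodiment of the present application further provides a readable storage medium storing a program or instruction; when the program or instruction is executed by a processor, each process of the above audio signal processing method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
An embodiment of the present application further provides a chip including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instruction to implement each process of the above audio signal processing method embodiment with the same technical effect. To avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system or a system-on-chip.
An embodiment of the present application provides a computer program product stored in a storage medium; the program product is executed by at least one processor to implement each process of the above audio signal processing method embodiment with the same technical effect. To avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element. In addition, the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and steps may be added, omitted or combined. Features described with reference to certain examples may also be combined in other examples.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product that is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the drawings, but the present application is not limited to the specific implementations described, which are merely illustrative rather than restrictive. Inspired by the present application, a person of ordinary skill in the art may devise many further forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.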

Claims (14)

  1. 一种音频信号处理方法,所述方法包括:
    将频带能量衰减曲线划分为N段能量衰减曲线,并对每段所述能量衰减曲线进行线性拟合,得到N个线性拟合曲线,N为大于或等于2的整数;
    基于所述N个线性拟合曲线,确定所述N段能量衰减曲线中目标时间帧对应的混响抑制函数;
    基于所述混响抑制函数,抑制第一音频信号中所述目标时间帧的混响音频信号中的混响部分,以得到第二音频信号;
    其中,所述第一音频信号为:音频信号中处于所述频带能量衰减曲线对应的频带内的音频信号;
    所述频带能量衰减曲线为:房间冲击响应RIR能量衰减曲线中的一个频带能量衰减曲线。
  2. 根据权利要求1所述的方法,其中,所述基于所述N个线性拟合曲线,确定所述能量衰减曲线中目标时间帧对应的混响抑制函数,包括:
    基于所述N个线性拟合曲线中的每个线性拟合曲线的斜率,分别计算每个所述线性拟合曲线对应的混响权重,以得到N个混响权重;
    基于所述N个混响权重,计算所述混响音频信号的早期混响能量和所述混响音频信号的晚期混响能量;
    基于所述早期混响能量和所述晚期混响能量,确定所述混响抑制函数。
  3. 根据权利要求2所述的方法,其中,所述基于所述N个混响权重,计算所述混响音频信号的早期混响能量和所述混响音频信号的晚期混响能量,包括:
    对于所述目标时间帧之前的每个时间帧,根据一个时间帧以及所述一个时间帧对应的混响权重,计算所述第一音频信号中所述一个时间帧的直达音频信号的能量在所述目标时间帧的剩余能量,得到所述每个时间帧对应的剩余能量;
    根据所述每个时间帧对应的剩余能量,计算所述早期混响能量和所述晚期混响能量。
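  4. The method according to claim 2, wherein the determining the reverberation suppression function based on the early reverberation energy and the late reverberation energy comprises:
    calculating an a priori signal-to-noise ratio corresponding to the target time frame according to the early reverberation energy, the late reverberation energy, and the energy of the ambient noise audio signal of the target time frame in the first audio signal;
    calculating an a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy, the energy of the ambient noise audio signal, and the magnitude spectrum of the first audio signal at the target time frame; and
    determining the reverberation suppression function according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio.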
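  5. The method according to claim 1, wherein the suppressing, based on the reverberation suppression function, the reverberation part of the reverberant audio signal of the target time frame in the first audio signal, to obtain the second audio signal comprises:
    performing an element-wise multiplication of the reverberation suppression function and the magnitude spectrum of the first audio signal at the target time frame, to obtain a target magnitude spectrum; and
    performing an inverse Fourier transform on the target magnitude spectrum and the phase of the first audio signal at the target time frame, to obtain the second audio signal.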
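  6. An audio signal processing apparatus, the apparatus comprising a processing module, a determining module, and a suppression module;
    the processing module is configured to divide a frequency-band energy decay curve into N energy decay curve segments, and perform linear fitting on each of the energy decay curve segments to obtain N linear fitting curves, N being an integer greater than or equal to 2;
    the determining module is configured to determine, based on the N linear fitting curves obtained by the processing module, a reverberation suppression function corresponding to a target time frame in the N energy decay curve segments;
    the suppression module is configured to suppress, based on the reverberation suppression function determined by the determining module, a reverberation part of a reverberant audio signal of the target time frame in a first audio signal, to obtain a second audio signal;
    wherein the first audio signal is: the audio signal, within an audio signal, that lies in the frequency band corresponding to the frequency-band energy decay curve; and
    the frequency-band energy decay curve is: one frequency-band energy decay curve in an RIR energy decay curve.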
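  7. The apparatus according to claim 6, wherein
    the determining module is specifically configured to: calculate, based on the slope of each of the N linear fitting curves, a reverberation weight corresponding to each of the linear fitting curves, to obtain N reverberation weights; calculate, based on the N reverberation weights, early reverberation energy of the reverberant audio signal and late reverberation energy of the reverberant audio signal; and determine the reverberation suppression function based on the early reverberation energy and the late reverberation energy.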
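  8. The apparatus according to claim 7, wherein
    the determining module is specifically configured to: for each time frame before the target time frame, calculate, according to one time frame and the reverberation weight corresponding to the one time frame, the remaining energy, at the target time frame, of the energy of the direct audio signal of the one time frame in the first audio signal, to obtain the remaining energy corresponding to each time frame; and calculate the early reverberation energy and the late reverberation energy according to the remaining energy corresponding to each time frame.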
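  9. The apparatus according to claim 7, wherein
    the determining module is specifically configured to: calculate an a priori signal-to-noise ratio corresponding to the target time frame according to the early reverberation energy, the late reverberation energy, and the energy of the ambient noise audio signal of the target time frame in the first audio signal; calculate an a posteriori signal-to-noise ratio corresponding to the target time frame according to the late reverberation energy, the energy of the ambient noise audio signal, and the magnitude spectrum of the first audio signal at the target time frame; and determine the reverberation suppression function according to the a priori signal-to-noise ratio and the a posteriori signal-to-noise ratio.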
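  10. The apparatus according to claim 6, wherein
    the suppression module is specifically configured to: perform an element-wise multiplication of the reverberation suppression function and the magnitude spectrum of the first audio signal at the target time frame, to obtain a target magnitude spectrum; and perform an inverse Fourier transform on the target magnitude spectrum and the phase of the first audio signal at the target time frame, to obtain the second audio signal.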
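  11. An electronic device, comprising a processor and a memory, the memory storing a program or instructions executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the audio signal processing method according to any one of claims 1-5.
  12. A readable storage medium, the readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the audio signal processing method according to any one of claims 1-5.
  13. A chip, the chip comprising a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or instructions to implement the method according to any one of claims 1-5.
  14. A computer program product, the program product being stored in a non-transitory storage medium, and the program product being executed by at least one processor to implement the method according to any one of claims 1-5.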
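
As a rough illustration of the first and last steps recited in claims 1 and 5, the following Python sketch fits one straight line to each of N segments of an energy decay curve and then applies a per-bin suppression gain to one time frame. It is a minimal sketch under stated assumptions, not the applicant's implementation: the equal-length segmentation, the fitting in the dB domain, the function names `piecewise_linear_fit` and `suppress_frame`, and the use of NumPy's real FFT are all hypothetical choices. The suppression gain is taken as a given input, because computing it (claims 2 to 4) depends on formulas that are not reproduced in this section.

```python
import numpy as np


def piecewise_linear_fit(edc_db, n_segments, frame_period_s=1.0):
    """Split an energy decay curve (assumed to be in dB) into N contiguous
    segments and fit a straight line to each one.

    Returns a list of (slope, intercept) pairs, one per segment.
    Equal-length segments are an assumption of this sketch.
    """
    edc_db = np.asarray(edc_db, dtype=float)
    if n_segments < 2:
        raise ValueError("N must be an integer greater than or equal to 2")
    if edc_db.size < 2 * n_segments:
        raise ValueError("each segment needs at least two points")
    t = np.arange(edc_db.size) * frame_period_s
    fits = []
    # np.array_split yields N contiguous, nearly equal-length index ranges.
    for idx in np.array_split(np.arange(edc_db.size), n_segments):
        slope, intercept = np.polyfit(t[idx], edc_db[idx], deg=1)
        fits.append((float(slope), float(intercept)))
    return fits


def suppress_frame(frame, gain):
    """Apply a per-bin suppression gain to one real-valued time frame.

    `gain` must have len(frame) // 2 + 1 entries (one per rfft bin).
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Element-wise product of the gain and the magnitude spectrum.
    target_magnitude = np.asarray(gain, dtype=float) * magnitude
    # Recombine with the original phase and return to the time domain.
    return np.fft.irfft(target_magnitude * np.exp(1j * phase), n=frame.size)
```

In this sketch the slopes returned by `piecewise_linear_fit` would feed whatever weight computation is used to build the gain, and `suppress_frame` would be called once per target time frame and frequency band; windowing and overlap-add are omitted for brevity.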
PCT/CN2023/125312 2022-10-25 2023-10-19 Audio signal processing method and apparatus, electronic device, and readable storage medium WO2024088142A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211314023.0 2022-10-25
CN202211314023.0A CN115604627A (zh) 2022-10-25 2022-10-25 Audio signal processing method and apparatus, electronic device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024088142A1 (zh) 2024-05-02

Family

ID=84848503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/125312 WO2024088142A1 (zh) 2022-10-25 2023-10-19 Audio signal processing method and apparatus, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN115604627A (zh)
WO (1) WO2024088142A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115604627A (zh) * 2022-10-25 2023-01-13 Vivo Mobile Communication Co., Ltd. (CN) Audio signal processing method and apparatus, electronic device, and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
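CN101416237A (zh) * 2006-05-01 2009-04-22 Nippon Telegraph and Telephone Corporation Method and apparatus for speech dereverberation based on probabilistic models of source and room acoustics
CN105679330A (zh) * 2016-03-16 2016-06-15 Nanjing Institute of Technology Digital hearing aid noise reduction method based on improved sub-band signal-to-noise ratio estimation
CN106558315A (zh) * 2016-12-02 2017-04-05 Shenzhen Sahara Data Technology Co., Ltd. Automatic gain calibration method and system for heterogeneous microphones
US10440498B1 (en) * 2018-11-05 2019-10-08 Facebook Technologies, Llc Estimating room acoustic properties using microphone arrays
CN112382305A (zh) * 2020-10-30 2021-02-19 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device, and storage medium for adjusting an audio signal
CN115604627A (zh) * 2022-10-25 2023-01-13 Vivo Mobile Communication Co., Ltd. (CN) Audio signal processing method and apparatus, electronic device, and readable storage medium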

Also Published As

Publication number Publication date
CN115604627A (zh) 2023-01-13
