CN111445916A

CN111445916A - Audio dereverberation method, device and storage medium in conference system

Info

Publication number: CN111445916A
Application number: CN202010160669.2A
Authority: CN
Inventors: 黄景标; 林聚财; 殷俊
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2020-07-24
Anticipated expiration: 2040-03-10
Also published as: CN111445916B

Abstract

The invention discloses an audio dereverberation method, an audio dereverberation device and a storage medium in a conference system. The audio dereverberation method in the conference system comprises the following steps: calculating a first reverberation time under an audio scene by using an acoustic echo path; calculating a second reverberation time under the audio scene by using the audio signal received by the microphone; calculating a path deviation of the acoustic echo path, and performing weight distribution on the first reverberation time and the second reverberation time according to the path deviation, so that the first reverberation time and the second reverberation time after weight distribution are the same, and defining that the first reverberation time and the second reverberation time after weight distribution are third reverberation time; and performing dereverberation processing on the audio signal according to the third reverberation time. The invention can improve the effectiveness and robustness of estimating the reverberation time and more effectively remove the reverberation component in the audio signal.

Description

Audio dereverberation method, device and storage medium in conference system

Technical Field

The present invention relates to the field of audio signal processing technologies, and in particular, to an audio dereverberation method, device and storage medium in a conference system.

Background

In the case of sound signal collection or recording, the microphone receives not only the part of the sound wave emitted by the desired sound source and directly arriving, but also the sound wave emitted by the sound source and arriving by other routes, and the undesired sound wave (i.e. background noise) generated by other sound sources in the environment. Acoustically, the reflected wave with a delay time of about 50ms or more is called echo, and the effect of the remaining reflected wave is called reverberation. The reverberation phenomenon will have an effect on the reception effect of the desired acoustic signal. In many cases, reverberation tends to cause interference, resulting in poor performance of the acoustic receiving system. Therefore, it is important to reduce the influence of reverberation on the sound receiving system, i.e. dereverberation.

The inventors of the present application found that existing dereverberation methods are not sufficiently effective.

Disclosure of Invention

The invention provides a method and a device for removing reverberation of audio in a conference system and a storage medium, which can solve the technical problem of poor reverberation removing effect in the prior art.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

an audio dereverberation method in a conference system, comprising the steps of:

calculating a first reverberation time in the audio scene using the acoustic echo path; and

calculating a second reverberation time in the audio scene using the audio signal received by the microphone;

calculating a path deviation of the acoustic echo path, and performing weight distribution on the first reverberation time and the second reverberation time according to the path deviation, so that the first reverberation time and the second reverberation time after weight distribution are the same, and defining that the first reverberation time and the second reverberation time after weight distribution are third reverberation time;

and performing dereverberation processing on the audio signal according to the third reverberation time.

The technical scheme adopted by the invention also comprises the following steps: before calculating the first reverberation time in the audio scene using the acoustic echo path, the method further comprises:

acquiring the acoustic echo path between a loudspeaker and a microphone in the audio scene by using an echo cancellation algorithm.

The technical scheme adopted by the invention also comprises the following steps: said weight assigning the first and second reverberation times according to the path deviation further comprises:

if the path deviation is larger than a set path deviation threshold value, indicating that the convergence of the echo cancellation algorithm is unsuccessful, distributing a smaller weight to the first reverberation time; if the path deviation is smaller than a set path deviation threshold value, indicating that the convergence of the echo cancellation algorithm is successful, allocating a larger weight to the first reverberation time;

the weight of the second reverberation time is: the total weight minus the weight assigned to the first reverberation time.

The technical scheme adopted by the invention also comprises the following steps: the dereverberating the audio signal according to the third reverberation time further comprises:

calculating the late reverberation power spectral density of the audio signal by using the third reverberation time;

performing a short-time Fourier transform on the audio signal to obtain its representation in the short-time frequency domain, and calculating the noise power spectral density of the audio signal by using a noise estimation algorithm; and

based on the calculated noise power spectral density and late reverberation power spectral density, performing speech enhancement on each frequency bin of the audio signal to eliminate the reverberant part of the audio signal.

The technical scheme adopted by the invention also comprises the following steps: the speech enhancement method comprises spectral subtraction, Wiener filtering, or an MMSE estimator.

The invention adopts another technical scheme that: an audio dereverberation apparatus in a conference system, the apparatus comprising:

a first reverberation time estimation module: for calculating a first reverberation time in the audio scene using the acoustic echo path;

a second reverberation time estimation module: for calculating a second reverberation time in the audio scene using the audio signal received by the microphone;

a weight assignment module: for calculating a path deviation of the acoustic echo path, and performing weight distribution on the first reverberation time and the second reverberation time according to the path deviation, so that the first reverberation time and the second reverberation time after weight distribution are the same, and the first reverberation time and the second reverberation time after weight distribution are both defined as a third reverberation time;

a voice enhancement module: for dereverberating the audio signal in accordance with the third reverberation time.

In order to solve the technical problems, the invention adopts another technical scheme that: there is provided an audio dereverberation apparatus in a conference system, comprising a processor, a memory coupled to the processor, wherein,

the memory stores program instructions for implementing the audio dereverberation method in the conferencing system as set forth above;

the processor is to execute the program instructions stored by the memory to dereverberate an audio signal.

In order to solve the technical problems, the invention adopts another technical scheme that: a storage medium storing program instructions executable by a processor to perform the audio dereverberation method in a conference system as described above.

The invention has the beneficial effects that: according to the audio dereverberation method, the device and the storage medium in the conference system, the reverberation time of the audio signal is estimated by using the acoustic echo path, so that the effectiveness and robustness of the estimated reverberation time are improved; meanwhile, because the acoustic echo path varies over time, the calculated reverberation times are combined by weight distribution, so that the accuracy of reverberation time estimation is further improved, and reverberation components in the audio signal are removed more effectively.

Drawings

Fig. 1 is a flow chart illustrating an audio dereverberation method in a conference system according to a first embodiment of the present invention;

fig. 2 is a flow chart of an audio dereverberation method in a conference system according to a second embodiment of the present invention;

fig. 3 is a schematic diagram of a first structure of an audio dereverberation apparatus in a conference system according to an embodiment of the present invention;

fig. 4 is a schematic diagram of a second structure of an audio dereverberation apparatus in the conference system according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of a storage medium structure according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. All directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

Example one

Please refer to fig. 1, which is a flowchart illustrating an audio dereverberation method in a conference system according to a first embodiment of the present invention. The audio dereverberation method in the conference system of the first embodiment of the present invention includes the steps of:

s100: calculating a first reverberation time under an audio scene by using an acoustic echo path;

In S100, the acoustic echo path is first estimated by adaptive filtering:

In formula (1), ω₁ denotes the estimate of ω; μ_adp is a step-size factor in the range [0, 1];

the error term in formula (1) is the residual signal;

the average power of the reference audio signal is calculated as follows:

In formula (2), λ is a smoothing factor, usually set to 0.98.
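The adaptive-filter step can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the sketch assumes a standard NLMS-style update normalized by a recursively smoothed reference power, P(n) = λP(n-1) + (1-λ)x²(n). The function name `nlms_echo_path`, the `init_power` and `eps` parameters, and the synthetic test signal are illustrative assumptions, not taken from the patent.

```python
import random


def nlms_echo_path(x, d, num_taps, mu=0.5, lam=0.98, eps=1e-8, init_power=1.0):
    """Estimate an echo path with an NLMS-style update (illustrative sketch).

    x: reference (loudspeaker) samples, d: microphone samples.
    Returns (estimated taps, residual signal).
    """
    w = [0.0] * num_taps      # current echo path estimate
    buf = [0.0] * num_taps    # most recent reference samples, newest first
    power = init_power        # smoothed average power of the reference (assumed form)
    residual = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        power = lam * power + (1.0 - lam) * xn * xn
        y = sum(wi * bi for wi, bi in zip(w, buf))   # echo estimate
        e = dn - y                                    # residual signal
        step = mu / (num_taps * power + eps)          # power-normalized step size
        w = [wi + step * e * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return w, residual


# Usage: identify a known 3-tap path from noiseless synthetic data.
random.seed(0)
true_path = [0.6, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(true_path[k] * x[n - k] for k in range(3) if n - k >= 0)
     for n in range(len(x))]
w_hat, e = nlms_echo_path(x, d, num_taps=3)
```

Starting the smoothed power at a nonzero value avoids a very large step on the first samples; a real echo canceller would also need double-talk handling, which is outside the scope of this sketch.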

Then, calculating a first reverberation time using the acoustic echo path;

converting the acoustic echo path into a dB representation:

estimating the regression coefficient c by linear fitting; with the fitted line written as cn + b, the regression coefficient c is calculated as:

where:

the first reverberation time calculated using the acoustic echo path is:

where the scaling parameter is an adjustment factor.
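A minimal sketch of the slope-based estimate follows. Since the patent's formulas for the dB conversion, the regression coefficient, and the first reverberation time are not reproduced here, the sketch makes these assumptions: the level is 20·log10|ω(n)|, the line cn + b is fitted by ordinary least squares, and the reverberation time is the time for the fitted level to fall by 60 dB, multiplied by an adjustment factor. The name `reverberation_time_from_echo_path` and the `adjust` and `floor` parameters are hypothetical.

```python
import math


def reverberation_time_from_echo_path(w, fs, adjust=1.0, floor=1e-12):
    """Fit a line c*n + b to the echo path level in dB and convert the slope to T60."""
    n_taps = len(w)
    # Assumed dB conversion; the floor avoids log10(0) for zero-valued taps.
    level_db = [20.0 * math.log10(max(abs(wn), floor)) for wn in w]
    mean_n = (n_taps - 1) / 2.0
    mean_db = sum(level_db) / n_taps
    # Least-squares slope c (dB per sample) and intercept b.
    num = sum((n - mean_n) * (y - mean_db) for n, y in enumerate(level_db))
    den = sum((n - mean_n) ** 2 for n in range(n_taps))
    c = num / den
    b = mean_db - c * mean_n
    if c >= 0.0:
        raise ValueError("echo path does not decay; cannot estimate T60")
    # Time for the fitted level to drop by 60 dB, scaled by the adjustment factor.
    t60 = adjust * (-60.0 / (c * fs))
    return t60, c, b


# Usage: an ideal exponentially decaying path with a 0.4 s reverberation time.
fs = 8000
rho = 3.0 * math.log(10.0) / 0.4
w = [math.exp(-rho * n / fs) for n in range(2000)]
t60, c, b = reverberation_time_from_echo_path(w, fs)
```

For a measured echo path the raw taps oscillate, so a practical variant would probably fit an energy decay curve or a smoothed envelope instead of the tap magnitudes directly.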

S101: calculating a second reverberation time under the audio scene by using the audio signal received by the microphone;

in S101, it is assumed that the audio signal received by the microphone is represented as:

In formula (7), the loudspeaker term is the signal that reaches the microphone after the audio played by the loudspeaker propagates through the conference room; the acoustic echo path has length N and is ω(n) = [ω₀(n), …, ω_{N-1}(n)]^T;

x_revb(n) is the audio reverberation signal, and v(n) is the background noise.

The audio signal d(n) received by the microphone is processed by the echo cancellation algorithm to obtain d′(n), and the second reverberation time is estimated from d′(n):

d′(n)＝x_revb(n)+v(n) (8)

In formula (8), x_revb(n) may be represented as:

where T_s is the reciprocal of the sampling rate, and

ρ is referred to as the reverberation attenuation factor;

estimating the reverberation attenuation factor by using maximum likelihood estimation:

ρ＝arg{max{L(d′,ρ)}} (9)

In formula (9), L(d′, ρ) is the likelihood function, which also depends on

the power of the noise.
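The maximum-likelihood search of formula (9) can be illustrated with a simplified model. The sketch below assumes x_revb(n) = g(n)·exp(-ρ·n·T_s) with g(n) zero-mean white Gaussian, ignores the background noise v(n) entirely, and finds ρ by grid search; the patent's actual likelihood, which involves the noise power, is not reproduced in this text. The conversion T60 = 3·ln(10)/ρ follows from requiring a 60 dB energy decay under this model. All function names are hypothetical.

```python
import math
import random


def log_likelihood(x, rho, ts):
    """Log-likelihood (up to a constant) of x under x(n) = g(n) * exp(-rho*n*ts),
    with g(n) white Gaussian and its variance replaced by the ML value."""
    n_samples = len(x)
    var_hat = sum(xn * xn * math.exp(2.0 * rho * n * ts)
                  for n, xn in enumerate(x)) / n_samples
    return (-0.5 * n_samples * math.log(var_hat)
            + rho * ts * n_samples * (n_samples - 1) / 2.0)


def estimate_t60_ml(x, fs, t60_grid):
    """Grid search for the decay factor rho that maximizes the likelihood."""
    ts = 1.0 / fs
    best_t60 = max(
        t60_grid,
        key=lambda t: log_likelihood(x, 3.0 * math.log(10.0) / t, ts))
    return best_t60, 3.0 * math.log(10.0) / best_t60


# Usage: synthetic decaying noise with a true reverberation time of 0.5 s.
random.seed(1)
fs = 8000
true_rho = 3.0 * math.log(10.0) / 0.5
x = [random.gauss(0.0, 1.0) * math.exp(-true_rho * n / fs) for n in range(2000)]
grid = [0.10 + 0.01 * i for i in range(141)]   # candidates from 0.10 s to 1.50 s
t60_hat, rho_hat = estimate_t60_ml(x, fs, grid)
```

In a running system the estimate would have to be restricted to segments where the signal is actually decaying (for example after speech offsets), which this sketch does not attempt.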

S102: calculating a path deviation of an acoustic echo path, and performing weight distribution on the first reverberation time and the second reverberation time according to the path deviation so that the first reverberation time and the second reverberation time after weight distribution are the same, and defining that the first reverberation time and the second reverberation time after weight distribution are third reverberation time;

in S102, the weight distribution strategy adopted by the present invention is: firstly, calculating the path deviation of an acoustic echo path, and if the path deviation is greater than a set path deviation threshold value, indicating that the convergence of an echo cancellation algorithm is unsuccessful, distributing a smaller weight to the first reverberation time; if the path deviation is smaller than the set path deviation threshold value, indicating that the convergence of the echo cancellation algorithm is successful, distributing a larger weight to the first reverberation time; the weight of the second reverberation time is assigned as: the total weight minus the weight assigned to the first reverberation time.

S103: performing dereverberation processing on the audio signal according to the third reverberation time;

In S103, dereverberation specifically includes: calculating the noise power spectral density of the audio signal by using a noise estimation algorithm; calculating the late reverberation power spectral density of the audio signal by using the third reverberation time; and, based on the calculated noise power spectral density and late reverberation power spectral density, dereverberating the audio signal by speech enhancement.

According to the audio dereverberation method in the conference system, the first reverberation time of the audio signal is estimated by using the acoustic echo path, the second reverberation time is calculated by using the audio signal received by the microphone, then the two reverberation times are subjected to weight distribution, and the audio signal is subjected to dereverberation treatment according to the reverberation time after the weight distribution, so that the dereverberation effect of the audio signal is improved.

Example two

Please refer to fig. 2, which is a flowchart illustrating an audio dereverberation method in a conference system according to a second embodiment of the present invention. The audio dereverberation method in the conference system of the second embodiment of the present invention includes the steps of:

s200: acquiring an acoustic echo path between a loudspeaker and a microphone in a current audio scene;

In S200, the acoustic echo path is obtained by running an echo cancellation algorithm in the conference scene. The embodiment of the invention uses adaptive filtering to estimate the acoustic echo path:

The error term is the residual signal;

the average power of the reference audio signal is calculated as follows:

In formula (2), λ is a smoothing factor, usually set to 0.98.

S201: calculating a first reverberation time under a current audio scene by using an acoustic echo path;

In S201, the reverberation time is the time required for the sound energy to decay by 60 dB after the sound source stops emitting sound. It characterizes how reverberant the room is and can also be used to estimate the power of the late reverberation. The first reverberation time is calculated from the acoustic echo path as follows:

converting the acoustic echo path into a dB representation:

where:

the first reverberation time calculated using the acoustic echo path is:

where the scaling parameter is an adjustment factor.

S202: acquiring an audio signal received by a microphone, calculating a second reverberation time in the current audio scene by using the received audio signal, and respectively executing S203 and S204;

in S202, in a conference scene, there are usually a speaker and a microphone, where audio played by the speaker is from an audio signal sent by a network, and an audio signal received by the microphone includes the audio signal played by the speaker and an audio signal of a speaker in the current conference scene. Assume that the audio signal received by the microphone is represented as:

In formula (7), the loudspeaker term is the signal that reaches the microphone after the audio played by the loudspeaker propagates through the conference room; the acoustic echo path has length N and is ω(n) = [ω₀(n), …, ω_{N-1}(n)]^T;

x_revb(n) is the audio reverberation signal, and v(n) is the background noise.

d′(n)＝x_revb(n)+v(n) (8)

In formula (8), x_revb(n) may be represented as:

where T_s is the reciprocal of the sampling rate, and

ρ is referred to as the reverberation attenuation factor;

ρ＝arg{max{L(d′,ρ)}} (9)

In formula (9), L(d′, ρ) is the likelihood function, which also depends on

the power of the noise.

S203: analyzing the acoustic echo path, calculating a path deviation of the acoustic echo path, performing weight distribution on the first reverberation time and the second reverberation time according to the path deviation of the acoustic echo path, so that the first reverberation time and the second reverberation time after the weight distribution are the same, obtaining a final third reverberation time, and executing S205;

in S203, since the reverberation time is not affected by the distance between the microphone and the sound source, the accuracy of the first reverberation time calculated using the acoustic echo path is higher than that of the second reverberation time calculated using the audio signal. Due to environmental factors, the acoustic echo path is time-varying, so that the acoustic echo path calculated by using an echo cancellation algorithm changes during updating, and aiming at the situation, the weight distribution strategy adopted by the invention is as follows: firstly, calculating the path deviation of an acoustic echo path calculated by using an echo cancellation algorithm, and if the path deviation is greater than a set path deviation threshold value, indicating that the convergence of the echo cancellation algorithm is unsuccessful, distributing a smaller weight to a first reverberation time calculated by the acoustic echo path; on the contrary, if the path deviation is smaller than the set path deviation threshold value, which indicates that the convergence of the echo cancellation algorithm is successful, a larger weight is assigned to the first reverberation time calculated by the acoustic echo path. The first reverberation time and the second reverberation time are subjected to weight distribution through the path deviation of the acoustic echo path, namely the first reverberation time calculated by the acoustic echo path is corrected by the second reverberation time calculated by the audio signal, so that the final third reverberation time is more accurate, and the reverberation component in the audio signal can be more effectively removed in the subsequent dereverberation.

In the embodiment of the present invention, the weight of the first reverberation time is:

where s(x) = 1/(1 + e^{-x}) and w denotes the weight of the first reverberation time; the weight of the second reverberation time is 1 - w. It is understood that the weight distribution includes, but is not limited to, the above scheme, and can be adjusted or set according to the actual application.

S204: carrying out short-time Fourier transform on the audio signal received by the microphone to obtain the representation of the audio signal on a short-time frequency domain, and calculating the noise power spectral density in the audio signal by using a noise estimation algorithm;

s205: calculating to obtain the late reverberation power spectral density of the audio signal by using the third reverberation time;

in S205, the late reverberation power spectral density is calculated by:

η_nlk(n,k) = e^{-2β(n)R} η(n-N_e,k) (10)

In formula (10), η_nlk(n, k) is the power of the late reverberation component, n is the time-frame index, and k is the frequency bin; η(n, k) is the average power of the signal, η(n, k) = αη(n-1, k) + (1-α)|d′(n, k)|², where α is a smoothing factor, typically 0.95; N_e is an adjustment parameter, typically 8; and R is the hop size between successive time frames;

fs is the sampling rate;

where the two reverberation-time symbols denote, respectively,

the first reverberation time estimated from the acoustic transfer path, and

the second reverberation time estimated from the audio signal.
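The recursion behind formula (10) can be sketched per frequency bin as follows. The sketch reads the exponent of formula (10) as -2β(n)R and assumes β = 3·ln(10)/(T·fs), where T is the third reverberation time; the patent's own definition of β is not reproduced in this text. The class name `LateReverbPsd` and its parameters are hypothetical.

```python
import math
from collections import deque


class LateReverbPsd:
    """Late reverberation power estimate from delayed, smoothed signal power."""

    def __init__(self, num_bins, t60, fs, hop, delay_frames=8, alpha=0.95):
        # Assumed relation between the decay rate beta and the reverberation time.
        self.beta = 3.0 * math.log(10.0) / (t60 * fs)
        self.hop = hop            # R: hop size between frames, in samples
        self.alpha = alpha        # smoothing factor for the average power
        self.eta = [0.0] * num_bins
        # Holds eta(n - N_e, k) ... eta(n - 1, k); oldest entry first.
        self.history = deque([[0.0] * num_bins for _ in range(delay_frames)],
                             maxlen=delay_frames)

    def update(self, power_frame):
        """power_frame[k] = |d'(n, k)|^2 for the current frame n."""
        self.eta = [self.alpha * e + (1.0 - self.alpha) * p
                    for e, p in zip(self.eta, power_frame)]
        delayed = self.history[0]                       # eta(n - N_e, k)
        decay = math.exp(-2.0 * self.beta * self.hop)
        late = [decay * e for e in delayed]
        self.history.append(self.eta)                   # drops the oldest frame
        return late


# Usage: one bin, constant input power, delay of two frames.
est = LateReverbPsd(num_bins=1, t60=0.5, fs=16000, hop=160, delay_frames=2, alpha=0.5)
out = [est.update([1.0])[0] for _ in range(3)]
```

The first N_e frames return zero because no delayed power is available yet.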

S206: based on the calculation results of the noise power spectral density and the late reverberation power spectral density, performing voice enhancement processing on each frequency point in the audio signal by using a voice enhancement mode to eliminate a reverberation part in the audio signal;

In S206, the speech enhancement method includes, but is not limited to, spectral subtraction, Wiener filtering, or an MMSE estimator. Taking Wiener filtering as an example, the filter can be expressed as:

In formula (11), ξ(n, k) represents the a priori signal-to-noise ratio, and

ξ_min is the lower limit of the a priori signal-to-noise ratio, which can be set according to the actual situation.

Finally, the dereverberated audio signal is obtained: x_e(n, k) = H(n, k) d′(n, k).
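A minimal sketch of the Wiener-filter step follows. Formula (11) and the patent's estimator for ξ(n, k) are not reproduced in this text, so the sketch assumes the common gain H = ξ/(1 + ξ), treats late reverberation plus noise as the interference, and uses a simple floored estimate ξ = max(|d′|²/(η_late + η_noise) - 1, ξ_min). The function names and the `eps` guard are hypothetical.

```python
def wiener_gain(power, late_psd, noise_psd, xi_min=0.01, eps=1e-12):
    """Gain H = xi / (1 + xi) for one time-frequency bin (assumed form)."""
    interference = late_psd + noise_psd + eps
    # Floored signal-to-interference estimate standing in for the a priori SNR.
    xi = max(power / interference - 1.0, xi_min)
    return xi / (1.0 + xi)


def dereverberate_frame(spectrum, late_psd, noise_psd, xi_min=0.01):
    """Apply x_e(n, k) = H(n, k) * d'(n, k) to every bin of one STFT frame."""
    return [wiener_gain(abs(d) ** 2, l, v, xi_min) * d
            for d, l, v in zip(spectrum, late_psd, noise_psd)]


# Usage: a strong bin is mostly kept, a bin at the interference level is suppressed.
frame = [2.0 + 0.0j, 1.0 + 0.0j]
enhanced = dereverberate_frame(frame, late_psd=[0.5, 0.5], noise_psd=[0.5, 0.5])
```

The floor ξ_min bounds the maximum attenuation, which limits musical-noise artifacts at the cost of leaving some residual reverberation.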

The audio dereverberation method in the conference system of the second embodiment of the invention estimates the reverberation time of the audio signal by using the acoustic echo path, thereby improving the effectiveness and robustness of the estimation of the reverberation time; meanwhile, because the acoustic echo path varies over time, the calculated reverberation times are combined by weight distribution, so that the accuracy of reverberation time estimation is further improved, and reverberation components in the audio signal are removed more effectively.

Referring to fig. 3, fig. 3 is a schematic diagram illustrating a first structure of an audio dereverberation apparatus in a conference system according to an embodiment of the present invention. The apparatus 40 comprises:

the first reverberation time estimation module 41: for calculating a first reverberation time in the audio scene using the acoustic echo path;

the second reverberation time estimation module 42: for calculating a second reverberation time in the audio scene using the audio signal received by the microphone;

weight assignment module 43: for calculating the path deviation of the acoustic echo path, and performing weight distribution on the first reverberation time and the second reverberation time according to the path deviation, so that the first reverberation time and the second reverberation time after the weight distribution are the same, and the first reverberation time and the second reverberation time after the weight distribution are both defined as a third reverberation time;

the speech enhancement module 44: for dereverberating the audio signal in accordance with the third reverberation time.

Referring to fig. 4, fig. 4 is a schematic diagram illustrating a second structure of the audio dereverberation apparatus in the conference system according to the present invention. As shown in fig. 4, the apparatus 50 includes a processor 51, and a memory 52 coupled to the processor 51.

The memory 52 stores program instructions for implementing the audio dereverberation method in the conference system described above.

The processor 51 is operative to execute program instructions stored in the memory 52 to dereverberate the audio signal.

The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores a program file 61 capable of implementing all the methods described above, wherein the program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes: various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for audio dereverberation in a conference system, comprising the steps of:

calculating a first reverberation time under an audio scene by using an acoustic echo path, and calculating a second reverberation time under the audio scene by using an audio signal received by a microphone;

2. The method of claim 1, wherein the calculating the first reverberation time of the audio scene using the acoustic echo path comprises:

3. The audio dereverberation method in a conference system according to claim 2,

said weight assigning the first and second reverberation times according to the path deviation further comprises:

4. The audio dereverberation method in a conference system as claimed in any one of claims 1 to 3, wherein the dereverberation processing of the audio signal according to the third reverberation time further comprises:

5. The method of claim 4, wherein the dereverberating the audio signal according to the third reverberation time further comprises:

6. The method of claim 5, wherein the dereverberating the audio signal according to the third reverberation time further comprises:

7. The method as claimed in claim 6, wherein the speech enhancement mode comprises spectral subtraction, Wiener filtering, or an MMSE estimator.

8. An apparatus for audio dereverberation in a conference system, the apparatus comprising:

9. An audio dereverberation apparatus in a conference system, the apparatus comprising a processor, a memory coupled to the processor, wherein,

the memory stores program instructions for implementing an audio dereverberation method in a conference system as claimed in any one of claims 1 to 7;

10. A storage medium having stored thereon program instructions executable by a processor to perform the method of audio dereverberation in a conference system as claimed in any one of claims 1 to 7.