CN116543784A - Multi-sound source automatic gain control method based on sound field perception - Google Patents

Multi-sound source automatic gain control method based on sound field perception

Info

Publication number
CN116543784A
CN116543784A
Authority
CN
China
Prior art keywords
sound source
sound
frequency
automatic gain
gain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310648272.1A
Other languages
Chinese (zh)
Inventor
卢佳欣
陈枢茜
朱阳燕
王君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong Institute of Technology
Original Assignee
Nantong Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong Institute of Technology filed Critical Nantong Institute of Technology
Priority to CN202310648272.1A priority Critical patent/CN116543784A/en
Publication of CN116543784A publication Critical patent/CN116543784A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a multi-sound-source automatic gain control method based on sound field perception. The method comprises: spatial initialization and multi-sound-source automatic gain initialization; converting the multiple microphone signals into the frequency domain by short-time Fourier transform; obtaining the signal-to-noise ratio of the sound source angle of each region; selecting the sound source that dominates each frequency bin; iteratively solving the sound source spatial propagation parameters of the different regions at the current moment; calculating the inter-frame similarity of the sound sources' spatial distribution over time; updating the energy tracking of the sound sources and obtaining the spatial automatic gain; calculating the automatic compensation gain of each frequency band from the reverberation levels of the mel filter banks; and computing the spatial gain, applying it to the spectrum, and obtaining the processed audio with the inverse short-time Fourier transform. The method improves the effectiveness of volume equalization in multi-sound-source switching scenes, improves volume equalization in reverberant scenes, and addresses the unequal energy losses that sound propagation causes in different frequency bands.

Description

Multi-sound source automatic gain control method based on sound field perception
Technical Field
The invention belongs to the field of sound field control, and particularly relates to a sound field perception-based multi-sound source automatic gain control method.
Background
With the rapid development of Voice over Internet Protocol (VoIP) applications in recent years, audio/video communication schemes represented by WebRTC have become increasingly popular. As the core of audio processing in network calls, automatic gain control (AGC) adaptively tracks the energy envelope of the audio in the time or frequency domain and applies gain to the received data, effectively realizing volume control of the sound.
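As a minimal sketch of the envelope-tracking AGC behavior described above (function name and constants are illustrative, not from the patent):

```python
import numpy as np

def agc_gains(frames, target_db=-20.0, alpha=0.95, eps=1e-12):
    """Recursively track the audio energy envelope and derive a
    per-frame gain that pushes the level toward target_db.
    Generic AGC sketch; constants are illustrative."""
    env = eps
    gains = []
    for frame in frames:
        energy = float(np.mean(np.asarray(frame) ** 2))
        env = alpha * env + (1.0 - alpha) * energy  # smoothed energy envelope
        level_db = 10.0 * np.log10(env + eps)
        gains.append(10.0 ** ((target_db - level_db) / 20.0))
    return np.array(gains)
```

A quieter frame receives a larger gain than a louder one, which is the core equalizing behavior the patent builds upon.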
However, in audio communication scenes such as conference rooms, multi-person audio is collected through a microphone array. The loudness of different speakers is not uniform, and differences in position between each sound source and the microphone array also cause significant differences in the energy of the audio data received by the microphones.
For the multi-sound-source switching scenario, the flow of the conventional AGC algorithm is shown in fig. 1: first, sound source localization is performed; then, based on the localization result, beamforming is used to estimate the energy of each sound source; finally, automatic gain tracking is performed on each sound source's energy.
For the multi-sound-source switching scenario, the existing AGC, DAGC and ASGC schemes still face the following five problems in practice:
1. When the number of microphones is small (fewer than 3), sound source localization accuracy cannot be guaranteed, and the sound source energy tracked via beamforming is not stable enough.
2. When the speaker switches, the spatial sound source parameters cannot be estimated accurately.
3. The speed of spatial sound source energy tracking cannot be adjusted adaptively, so the energy of each source cannot be tracked quickly and stably, and the accuracy and timeliness of each source's gain calculation cannot be ensured.
4. The sound source localization search is often computationally expensive.
5. These schemes do not account for the sound distortion caused by reverberation.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a multi-sound-source automatic gain control method based on sound field perception, which improves the effectiveness of volume equalization in multi-sound-source switching scenes, improves volume equalization in reverberant scenes, and addresses the unequal energy losses that sound propagation causes in different frequency bands.
The technical scheme is as follows: in order to achieve the above purpose, the technical scheme of the invention is as follows:
a multi-sound source automatic gain control method based on sound field perception comprises the following steps:
s1: space initialization: setting the maximum number of sound sources, dividing the space, and dividing the plane by 0-180 degrees to form a plurality of areas;
initializing the automatic gain of multiple sound sources;
s2: and (3) time-frequency conversion: converting the plurality of microphone signals into a frequency domain through short-time Fourier transform;
s3: sound field perception: obtaining the signal to noise ratio of the sound source angle of each region;
selecting a sound source playing a leading role at a frequency point;
iteratively solving sound source space propagation parameters of different areas at the current moment;
s4: calculating the time-space similarity of sound sources: calculating the inter-frame similarity of the time space distribution of the sound source at the current moment;
s5: spatial automatic gain control: updating the energy tracking of the sound source and obtaining the space automatic gain;
s6: adaptive multiband automatic gain compensation: using the reverberation grade of the Mel filter group, calculating to obtain the automatic compensation gain of the frequency band;
s7: gain smoothing, time-frequency inverse transformation: the spatial gain is calculated and applied to the spectrum, and the processed audio is obtained by using short-time Fourier inverse transformation.
Further, based on step S1: the maximum number of sound sources is set, and the 0-180 degree plane is divided into 6 regions of 30 degrees each, with the region center angles initialized as η = {15, 45, 75, 105, 135, 165};
multi-sound-source automatic gain initialization: the automatic gain control energy smoothing factors α_min, α_max and the target energy level are initialized.
Further, based on step S2: the convolution model is widely used for sound propagation in a closed space; its mathematical form is
x_i(t) = Σ_{j=1}^{N} h_{ij}(t) * s_j(t)
where: x_i(t) is the audio signal received by the i-th microphone at time t; s_j(t) is sound source j; N is the number of sound sources; h_{ij}(t) is the transfer function from sound source j to microphone i, and * denotes convolution;
the microphone array signals x_i(t) are converted to the frequency-domain form X_i(ω, t); expanding the model for the two-microphone case and taking the first microphone as the reference gives
X_1(ω, t) = Σ_{j=1}^{N} S_j(ω, t),  X_2(ω, t) = Σ_{j=1}^{N} a_j e^{-jωδ_j} S_j(ω, t)
where a_j and δ_j are the attenuation and delay parameters of sound source j relative to the reference microphone.
further, based on step S3: assuming that the propagation parameter a of each sound source is a constant 1, and the spatial information of the sound source is reflected on the sound source propagation parameter delta, the iterative solution of the sound source spatial propagation parameter delta is as follows:
wherein:gamma denotes a forgetting factor and beta denotes the update speed of the sound source space parameter.
Further, based on step S4: based on equation (23), the region containing the currently significant sound source is determined;
then the cross-correlation of the sound source spatial distribution vectors at the current and previous moments is computed, yielding the time-space similarity factor ξ,
where, if DominateSource(t) ≠ DominateSource(t-1), then h_prob(t) = 0 and h_cnt(t) = 0.
Further, based on step S5: an adaptive energy smoothing algorithm based on the time-space similarity is designed,
where α(t) is the dynamic energy temporal smoothing factor and α_LT is a fixed long-term energy tracking factor,
α(t) = α_min + (α_max - α_min)·ξ(t)   (26)
from which the gain at time t can be obtained.
Further, based on step S6: when sound propagates indoors, the low-frequency components, having longer wavelengths, are more prone to diffuse reflection, so their reverberation in the audio data is larger and their energy loss smaller; the high-frequency components, having shorter wavelengths, are more prone to specular reflection, so their reverberation at the microphone is smaller than that of the low frequencies and their energy loss is large. To mitigate this, mel filter banks are used,
where M_F(ω) is the filter coefficient of the F-th mel filter bank at frequency ω,
and the reverberation levels K(F, τ) of the different mel filter banks are calculated.
Based on the relationship between the degree of reverberation and the distance of the sound source from the microphone (the farther the source from the microphone, the higher the reverberation), the low-frequency part of the sound is taken as the reference band, and automatic gain compensation based on the reverberation degree of the reference band is constructed,
where F_sum is the number of frequency bands.
Further, based on step S7:
the calculated spatial gain is smoothed and the update speed of the automatic gain is controlled; the gain is smoothed over both the time and frequency dimensions within the frequency range,
where K_min(t) is the minimum of K(t) tracked via an improved minima-controlled recursive averaging technique;
in equation (33), the gain is reset when the reverberation degree falls below a threshold, reducing gain-tracking errors caused by scattered noise and strong reverberation;
across frequency, a gain interpolation algorithm is employed;
finally, the output is produced by the inverse time-frequency transform:
X_out = ISTFT(Gain · X_in)   (35).
Beneficial effects: the invention has the following effects:
(1) When the number of microphones is small, the invention combines a cross-correlation-based sound source signal-to-noise ratio estimation technique with a speech separation technique based on W-disjoint orthogonality and degenerate unmixing estimation to address problems 1-3, and proposes a multi-sound-source maximum likelihood sound source localization method constrained by the directional signal-to-noise ratio. The method replaces the coarse search in sound source localization with solving for the direction of maximum signal-to-noise ratio, turning a search problem into a closed-form one, which improves coarse-search precision and reduces computation. Because the small number of microphones limits the value of traditional fine-search algorithms, the fine search is converted into maximum likelihood parameter estimation, and a more accurate sound source position is solved through an adaptive iterative process seeded by the maximum-direction signal-to-noise ratio estimate from the first step.
(2) For problem 4, compared with the traditional scheme of driving beamforming with sound source localization to compute directional gain, the invention controls the speed of sound source energy tracking based on the spatio-temporal similarity of the sources, improving the energy-tracking performance of automatic gain control and relaxing the accuracy requirement on sound source localization.
(3) For problem 5, the invention constructs an adaptive gain compensation curve over auditory frequencies under different reverberation intensities, mitigating the unequal energy losses that sound propagation causes in different frequency bands.
Drawings
FIG. 1 is the algorithm flow of a prior-art automatic gain control technique;
FIG. 2 is the flow of the multi-sound-source automatic gain control algorithm based on sound field perception;
FIG. 3 shows the curves of the function f(x) for different values of the hyperparameter const;
FIG. 4 shows the spectrum comparison before and after the frequency adaptive gain compensation of the invention is applied.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in fig. 2, a multi-sound source automatic gain control method based on sound field perception comprises the following steps:
S1: spatial initialization: the maximum number of sound sources is set and the 0-180 degree plane is divided into several angular regions;
multi-sound-source automatic gain initialization: the automatic gain control energy smoothing factors α_min, α_max and the target energy level are initialized;
S2: and (3) time-frequency conversion: converting the plurality of microphone signals into a frequency domain through short-time Fourier transform;
S3: sound field perception: the signal-to-noise ratio SNR of the sound source angle η of each region is obtained using equations (17)-(21);
the sound source dominant at frequency bin (ω, t) is selected using equation (22);
the sound source spatial propagation parameters of the different regions at time t are solved iteratively using equation (14);
S4: sound source time-space similarity calculation: the inter-frame similarity ξ of the sound source spatial distribution at the current moment is calculated using equations (23)-(25);
S5: spatial automatic gain control: the sound source energy tracking is updated using equations (26)-(28), and the spatial automatic gain is obtained;
S6: adaptive multiband automatic gain compensation: the reverberation levels K(F, τ) of the mel filter banks are computed using equations (2), (18) and (29), and the automatic compensation gain G_Compensate of each band F is calculated using equation (30);
S7: gain smoothing and inverse time-frequency transform: the spatial gain is calculated using equations (32) and (34) and applied to the spectrum, and the processed audio is obtained via the inverse short-time Fourier transform.
When the number of microphones is small, a cross-correlation-based sound source signal-to-noise ratio estimation technique is combined with a speech separation technique based on W-disjoint orthogonality and the degenerate unmixing estimation technique (DUET), and a multi-sound-source maximum likelihood sound source localization method constrained by the directional signal-to-noise ratio is proposed. The method replaces the coarse search in sound source localization with solving for the direction of maximum signal-to-noise ratio, turning a search problem into a closed-form one, which improves coarse-search precision and reduces computation. Because the small number of microphones limits the value of traditional fine-search algorithms, the fine search is converted into maximum likelihood parameter estimation, and a more accurate sound source position is solved through an adaptive iterative process seeded by the maximum-direction signal-to-noise ratio estimate from the first step.
The speed of sound source energy tracking is controlled based on the space-time similarity of the sound source, the effect of the automatic gain control technology on the sound source energy tracking is improved, and the accuracy requirement on sound source positioning estimation is reduced.
An adaptive gain compensation technique (Automatic Gain Compensate based on Reverb Level Estimation, AGC-RLE) based on auditory filter bank reverberant intensity estimation constructs an adaptive gain compensation curve based on auditory frequencies under different reverberant intensities, and solves the problem of different energy losses in different frequency bands caused by sound propagation.
Step S1: initializing:
Spatial initialization: the maximum number of sound sources is set, and the 0-180 degree plane is divided into 6 regions of 30 degrees each, with the region center angles initialized as η = {15, 45, 75, 105, 135, 165};
multi-sound-source automatic gain initialization: the automatic gain control energy smoothing factors α_min, α_max and the target energy level are initialized.
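The initialization of step S1 can be sketched as follows; the smoothing-factor and target-level values are assumptions for illustration, since the patent leaves these quantities symbolic:

```python
import numpy as np

REGION_WIDTH_DEG = 30
NUM_REGIONS = 180 // REGION_WIDTH_DEG          # 6 regions over the 0-180 degree plane
ETA = np.arange(15, 180, REGION_WIDTH_DEG)     # region center angles eta = {15, ..., 165}

# assumed illustrative values; the patent only names these symbols
ALPHA_MIN, ALPHA_MAX = 0.90, 0.999             # AGC energy smoothing factors
TARGET_LEVEL_DB = -20.0                        # target energy level
```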
Step two: microphone signal time-frequency conversion
The convolution model is widely used for sound propagation in a closed space; its mathematical form is
x_i(t) = Σ_{j=1}^{N} h_{ij}(t) * s_j(t)
where:
x_i(t) is the audio signal received by the i-th microphone at time t (an array of two microphones is taken as the example here);
s_j(t) is sound source j;
N is the number of sound sources;
h_{ij}(t) is the transfer function from sound source j to microphone i, and * denotes convolution.
The microphone array signals x_i(t) are converted to the frequency-domain form X_i(ω, t). Expanding the model for the two-microphone case and taking the first microphone as the reference gives
X_1(ω, t) = Σ_{j=1}^{N} S_j(ω, t),  X_2(ω, t) = Σ_{j=1}^{N} a_j e^{-jωδ_j} S_j(ω, t)
where a_j and δ_j are the attenuation and delay parameters of sound source j relative to the reference microphone.
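A minimal sketch of this time-frequency conversion for two microphones, using `scipy.signal.stft`; under the W-disjoint orthogonality assumption (one dominant source per bin), the ratio X2/X1 exposes the mixing parameters a and δ of the dominant source. Function names are mine, not the patent's:

```python
import numpy as np
from scipy.signal import stft

def mixing_parameters(x1, x2, fs, nperseg=512, eps=1e-12):
    """Convert both microphone signals to the frequency domain and,
    assuming one dominant source per bin, read off the attenuation
    |X2/X1| and the phase angle(X2/X1) = -omega*delta of that source,
    with microphone 1 as reference. Sketch only."""
    f, t, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
    ratio = X2 / (X1 + eps)
    return f, t, np.abs(ratio), np.angle(ratio)
```

For a source that reaches microphone 2 attenuated by a known factor and without delay, the amplitude map concentrates around that factor.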
step three: sound field perception
Since the spacing within the microphone array is much smaller than the distance between the microphones and the sound sources, and considering the influence of reverberation, we assume the propagation parameter a of each sound source is the constant 1, so the spatial information of each sound source is carried by its propagation parameter δ, and an objective function is constructed.
Based on the W-disjoint orthogonality assumption, only one sound source k dominates at each frequency bin (ω, τ),
where:
thus, the estimation of the spatial propagation parameters delta for each sound source in space can be performed by constructing likelihood functions
Wherein M represents the number of frequency bands,
π -1 (k) Representing the set of all the frequency bins at time t where the sound source k dominates.
Taking the logarithm of likelihood function (6), and since only one sound source dominates at each frequency bin, the log-likelihood is obtained,
which is maximized in equation (8).
To ensure continuity of the objective function, an auxiliary function is constructed,
so that equation (8) can be rewritten accordingly.
Differentiating the objective function J(t) yields
the iterative solution of the sound source spatial propagation parameter δ, where X_2*(ω, t) denotes the complex conjugate of X_2(ω, t),
where γ denotes a forgetting factor and β denotes the update speed of the sound source spatial parameter.
From the above derivation, the maximum-likelihood sound source spatial parameter estimation depends strictly on assumption (4) holding; therefore, to ensure the accuracy of the spatial parameter estimation, only frequency bins satisfying assumption (4) should be selected for the update in (13).
A cross-correlation-based directional sound source signal-to-noise ratio estimation method for reverberant scenes is adopted: for a directional sound source j with incidence direction η, the cross-correlation function between the microphones is assumed,
where f_s is the sampling rate and d is the distance between the microphones.
modeling reverberation as a scattered noise field with a cross-correlation function between microphones that approximates
For the signals received by the microphones, the required statistics are obtained in practice by recursive smoothing,
where α represents a temporal recursion constant.
Based on the scattered noise field model, the reverberation degree estimation based on the direct-to-diffuse ratio can be expressed as:
assuming that the signal-to-noise ratio of the sound source with the incident direction eta is SNR, the method can obtain
The equation (18) is developed by Euler's equation to obtain the sound source with the incident direction eta and the signal-to-noise ratio of SNR
Wherein, the liquid crystal display device comprises a liquid crystal display device,
substituting the formula (19) into the formula (4) to obtain
Step four: spatio-temporal similarity estimation
From the derivation of the previous part, we obtain the spatial propagation parameters δ_j of the multiple sound sources. In practice, to ensure stable and timely tracking of the multiple sources, a vector representing the spatial distribution of the sound sources is constructed:
V(t) = {f(ρ(δ_1, t)), ..., f(ρ(δ_N, t))}   (22)
where ω_L, ω_H denote the lower and upper limits of the frequency range used to represent sound source spatial information; here ω_L is set to 500 Hz, and ω_H is chosen as the highest frequency at which the microphone array is free of spatial aliasing; C denotes the number of frequency bins between ω_L and ω_H. A nonlinear function f(x), inversely proportional to its input x, is constructed to further widen the separation between the different spatial propagation parameters δ_j; to satisfy monotonicity we take p = 3, and FIG. 3 shows the curves of f(x) for different values of the hyperparameter const.
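The patent does not reproduce the exact form of f(x); as a hedged stand-in that satisfies the stated properties (monotonically decreasing, inversely proportional to its input, shaped by the hyperparameters const and p = 3), one could use:

```python
def f(x, const=1.0, p=3):
    """Assumed stand-in for the nonlinear mapping f: monotonically
    decreasing in x, shaped by the hyperparameters const and p.
    Not the patent's exact formula."""
    return 1.0 / (1.0 + (x / const) ** p)
```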
Specifically, FIG. 3 is a graph of y = f(x) for different values of the hyperparameter const when p = 3. The region containing the currently significant sound source is first determined based on equation (23);
then the cross-correlation of the sound source spatial distribution vectors at the current and previous moments is computed, yielding the time-space similarity factor ξ,
where, if DominateSource(t) ≠ DominateSource(t-1), then h_prob(t) = 0 and h_cnt(t) = 0.
Step five: space automatic gain calculation
To avoid abrupt gain changes and spatial-gain errors caused by deviations in the estimated spatial sound source energy during rapid switching, the energy of the raw microphone signal is used in place of the post-spatial-filter energy used for spatial automatic gain calculation in the DAGC scheme. Meanwhile, an adaptive energy smoothing algorithm based on the time-space similarity replaces the fixed long-term smoothing factor, enabling fast tracking of the sound source energy and ensuring the accuracy and continuity of the spatial automatic gain calculation,
where α(t) is the dynamic energy temporal smoothing factor and α_LT is a fixed long-term energy tracking factor,
α(t) = α_min + (α_max - α_min)·ξ(t)   (26)
from which the gain at time t can be obtained.
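Equation (26) and the similarity-driven energy update can be sketched as follows; the target level is an assumption, since the patent keeps it symbolic:

```python
import numpy as np

def adaptive_smooth(e_prev, e_now, xi, alpha_min=0.90, alpha_max=0.999):
    """Eq. (26): alpha(t) = alpha_min + (alpha_max - alpha_min)*xi(t).
    High time-space similarity (same speaker) -> slow, stable tracking;
    low similarity (source switch) -> fast re-tracking."""
    alpha_t = alpha_min + (alpha_max - alpha_min) * xi
    return alpha_t * e_prev + (1.0 - alpha_t) * e_now

def spatial_gain(e_tracked, target_db=-20.0, eps=1e-12):
    """Gain driving the tracked energy toward an assumed target level."""
    level_db = 10.0 * np.log10(e_tracked + eps)
    return 10.0 ** ((target_db - level_db) / 20.0)
```

With ξ = 1 the previous estimate dominates (stable tracking); with ξ = 0 the new energy is absorbed quickly, which is the intended behavior on a source switch.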
Step six: adaptive multiband automatic gain compensation
When sound propagates indoors, the low-frequency components, having longer wavelengths, are more prone to diffuse reflection, so their reverberation in the audio data is larger and their energy loss smaller; the high-frequency components, having shorter wavelengths, are more prone to specular reflection, so their reverberation at the microphone is smaller than that of the low frequencies and their energy loss is large. To mitigate this, mel filter banks are used herein,
where M_F(ω) is the filter coefficient of the F-th mel filter bank at frequency ω.
The reverberation levels K(F, τ) of the different mel filter banks are calculated by combining equations (28) and (17).
Based on the relationship between the degree of reverberation and the distance of the sound source from the microphone (the farther the source from the microphone, the higher the reverberation), the low-frequency part of the sound is taken as the reference band, and automatic gain compensation based on the reverberation degree of the reference band is constructed,
where F_sum is the number of frequency bands.
Step seven: adaptive multiband spatial gain smoothing and inverse fourier transform
After spatial automatic gain and multiband automatic gain compensation, the gains of the different frequency bands are obtained. Applying the calculated spatial gain directly, however, would cause sound distortion due to gain discontinuity over time and frequency. To address this, the invention uses the sound source reverberation level instead of an energy comparison to control the update rate of the automatic gain, while smoothing the gain over both the time and frequency dimensions within the frequency range, and synthesizes the gain differently from the DAGC algorithm,
where K_min(t) is the minimum of K(t) tracked via the improved minima-controlled recursive averaging (IMCRA) technique; in equation (33), the gain is reset when the reverberation degree falls below a threshold, reducing gain-tracking errors caused by scattered noise and strong reverberation.
Across frequency, a gain interpolation algorithm is employed.
Finally, the output is produced by the inverse time-frequency transform:
X_out = ISTFT(Gain · X_in)   (35).
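The frequency interpolation of band gains and the reconstruction via the inverse STFT (equations (34)-(35)) can be sketched with SciPy; band frequencies and gain values here are illustrative:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_band_gains(x, band_freqs, band_gains, fs, nperseg=512):
    """Interpolate per-band gains over the STFT frequency bins,
    apply them to the spectrum, and reconstruct the waveform:
    x_out = ISTFT(Gain * X_in)."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    gain_per_bin = np.interp(f, band_freqs, band_gains)  # eq. (34)-style interpolation
    _, y = istft(X * gain_per_bin[:, None], fs=fs, nperseg=nperseg)
    return y
```

With unit gains everywhere, the STFT/ISTFT pair reconstructs the input, so any audible change comes only from the gain curve itself.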
A sound source spatial parameter estimation algorithm based on maximum likelihood estimation under spatial signal-to-noise ratio control is constructed; it estimates the spatial sound source parameters accurately and adaptively adjusts the speed of spatial sound source energy tracking from those parameters, achieving fast and stable tracking of each source's energy and ensuring the accuracy and timeliness of each source's gain calculation. To further reduce the sound distortion caused by reverberation, a band-adaptive gain compensation technique based on reverberation parameters is constructed, reducing the influence of reverberation on spectral distortion of the sound.
To verify the effect of the proposed spatial automatic gain technique with frequency adaptive compensation, FIG. 4 compares the spectra before and after frequency adaptive gain compensation. As the figure shows, the proposed reverberation-based frequency adaptive gain compensation not only mitigates the attenuation of high-frequency energy but also strengthens the direct component of the sound, reducing the influence of reverberation on sound quality.
Furthermore, multiple experiments on real conference-room recordings and simulated data verify the effectiveness of volume equalization in the multi-sound-source switching scene and the improved volume equalization in the reverberant scene.
The foregoing is only a preferred embodiment of the invention, it being noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the invention.

Claims (8)

1. A multi-sound-source automatic gain control method based on sound field perception, characterized by comprising the following steps:
S1: spatial initialization: set the maximum number of sound sources and divide the 0°–180° plane into a plurality of regions;
initialize the automatic gain for the multiple sound sources;
S2: time-frequency transform: convert the multiple microphone signals into the frequency domain via the short-time Fourier transform;
S3: sound field perception: obtain the signal-to-noise ratio of the sound-source angle in each region;
select the sound source that dominates at each frequency bin;
iteratively solve the sound-source spatial propagation parameters of the different regions at the current moment;
S4: sound-source time-space similarity calculation: compute the inter-frame similarity of the sound source's time-space distribution at the current moment;
S5: spatial automatic gain control: update the energy tracking of each sound source and obtain the spatial automatic gain;
S6: adaptive multiband automatic gain compensation: compute the automatic band compensation gain from the reverberation level of each mel filter bank;
S7: gain smoothing and inverse time-frequency transform: smooth the calculated spatial gain, apply it to the spectrum, and obtain the processed audio via the inverse short-time Fourier transform.
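The step order of claim 1 can be illustrated with the region layout of step S1. A minimal sketch, assuming equal-width angular regions; the function names (`region_centers`, `region_of`) are illustrative, not from the patent:

```python
def region_centers(num_regions: int = 6, span_deg: float = 180.0):
    """S1: divide the 0-180 degree plane into equal regions and
    return the centre angle of each region (eta in claim 2)."""
    width = span_deg / num_regions
    return [width / 2 + k * width for k in range(num_regions)]

def region_of(angle_deg: float, num_regions: int = 6, span_deg: float = 180.0) -> int:
    """Map a source angle in [0, 180) degrees to its region index."""
    width = span_deg / num_regions
    return min(int(angle_deg // width), num_regions - 1)
```

With the default six regions this reproduces the initialization η = {15, 45, 75, 105, 135, 165} given in claim 2.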
2. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S1: the maximum number of sound sources is set, and the 0°–180° plane is divided into 6 regions of 30° each, with the centre angle of each region initialized as η = {15, 45, 75, 105, 135, 165};
multi-sound-source automatic gain initialization: the automatic-gain-control energy smoothing factors α_min and α_max and the target energy level are initialized.
3. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S2: the convolution model is widely used for sound propagation in an enclosed space; its mathematical form is:
x_i(t) = Σ_{j=1}^{N} h_{ij}(t) * s_j(t)
where: x_i(t) is the audio signal received by the i-th microphone at time t; s_j(t) is sound source j; N is the number of sound sources; h_{ij}(t) is the transfer function from sound source j to microphone i;
the microphone-array signal x_i(t) is converted to its frequency-domain form X_i(ω, t); the model is then specialized to the dual-microphone case, and taking the first microphone as the reference one obtains:
X_2(ω, t) = a·e^{−jωδ}·X_1(ω, t)
where a and δ are the relative propagation gain and delay of the sound source between the two microphones.
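The dual-microphone frequency-domain relation can be checked numerically. Assuming, consistently with claim 4, a relative gain a and delay δ between the two microphones, the delay can be read back from the phase of the ratio X2/X1. A hypothetical sketch, not the patent's estimator:

```python
import cmath

def mic2_spectrum(x1: complex, omega: float, a: float, delta: float) -> complex:
    """Dual-mic model: X2(w, t) = a * exp(-j*w*delta) * X1(w, t)."""
    return a * cmath.exp(-1j * omega * delta) * x1

def delay_from_ratio(x1: complex, x2: complex, omega: float) -> float:
    """Recover delta from the phase of X2/X1 (valid while |w*delta| < pi)."""
    return -cmath.phase(x2 / x1) / omega
```

For a synthetic bin with a = 1 the recovered δ matches the one used to build X2, illustrating why the spatial information survives in the phase of the inter-microphone ratio.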
4. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S3: assuming the propagation parameter a of each sound source is the constant 1, the spatial information of the sound source is reflected in the propagation parameter δ, and δ is solved iteratively,
where γ denotes a forgetting factor and β denotes the update speed of the sound-source spatial parameter.
5. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S4: based on formula (23), the region containing the currently dominant sound source is determined;
then the cross-correlation between the spatial distribution vectors of the sound source at the current and previous moments is calculated, yielding the time-space similarity factor ξ, where:
if DominateSource(t) ≠ DominateSource(t−1), then h_prob(t) = 0 and h_cnt(t) = 0.
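The inter-frame similarity of claims 1 (S4) and 5 can be sketched as a normalized cross-correlation between the spatial distribution vectors of consecutive frames, with the tracking state reset when the dominant region changes. The function names and the exact correlation form are assumptions:

```python
import math

def similarity(prev: list, curr: list) -> float:
    """Normalised cross-correlation of two spatial distribution vectors
    (a stand-in for the time-space similarity factor xi)."""
    dot = sum(p * c for p, c in zip(prev, curr))
    norm = math.sqrt(sum(p * p for p in prev)) * math.sqrt(sum(c * c for c in curr))
    return dot / norm if norm > 0.0 else 0.0

def update_state(prev_dom: int, curr_dom: int, h_prob: float, h_cnt: int):
    """Reset the tracking state when the dominant source region changes,
    as stated in claim 5: h_prob(t) = 0, h_cnt(t) = 0."""
    if curr_dom != prev_dom:
        return 0.0, 0
    return h_prob, h_cnt
```

Identical distributions give ξ = 1 (stable scene), orthogonal distributions give ξ = 0 (a source switch), which is exactly the signal the adaptive smoothing in claim 6 consumes.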
6. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S5: an adaptive energy smoothing algorithm based on the time-space similarity is designed, where α(t) is the dynamic energy time-smoothing factor and α_LT is a fixed long-term energy tracking factor:
α(t) = α_min + (α_max − α_min)·ξ(t)    formula (26)
from which the gain at time t can be obtained.
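Formula (26) maps the similarity factor onto a smoothing coefficient. The energy recursion E(t) = α(t)·E(t−1) + (1 − α(t))·P(t) and the square-root gain rule below are assumptions consistent with a standard first-order tracker, not taken verbatim from the patent:

```python
import math

def dynamic_alpha(alpha_min: float, alpha_max: float, xi: float) -> float:
    """Formula (26): alpha(t) = alpha_min + (alpha_max - alpha_min) * xi(t).
    High similarity -> large alpha -> slow, stable tracking;
    low similarity (source switch) -> small alpha -> fast tracking."""
    return alpha_min + (alpha_max - alpha_min) * xi

def track_energy(e_prev: float, power: float, alpha: float) -> float:
    """Assumed first-order recursive energy tracker."""
    return alpha * e_prev + (1.0 - alpha) * power

def spatial_gain(e_target: float, e_tracked: float) -> float:
    """Gain that drives the tracked energy toward the target level."""
    return math.sqrt(e_target / e_tracked) if e_tracked > 0.0 else 1.0
```

This makes the trade-off of claim 6 concrete: when a new source appears, ξ drops, α(t) falls toward α_min, and the tracker converges on the new source's energy within a few frames.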
7. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S6: when sound propagates indoors, the low-frequency components, having longer wavelengths, are more prone to diffuse reflection, so their reverberation in the audio data is larger and their energy loss is smaller; the high-frequency components, having shorter wavelengths, are more prone to specular reflection, so their reverberation at the microphone is smaller than that of the low-frequency components and their energy loss is larger; to mitigate this phenomenon, mel filter banks are used,
where M_F(ω) is the filter coefficient of the F-th mel filter bank at frequency ω,
and the reverberation levels K(F, τ) of the different mel filter banks are calculated;
based on the relationship between the degree of reverberation and the distance from the sound source to the microphone (the farther the sound source is from the microphone, the higher the degree of reverberation), and taking the low-frequency part of the sound as the reference band, automatic gain compensation based on the reverberation degree of the reference band is constructed,
where F_sum is the number of frequency bands.
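The mel scale underlying the filter banks of claim 7, and a compensation gain relative to the low-frequency reference band, can be sketched as follows. The HTK-style mel conversion and the ratio-with-clamping compensation rule are assumptions; the patent does not specify its mel variant or the exact form of the compensation:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Common (HTK-style) mel conversion - an assumed variant."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_compensation(k_ref: float, k_band: float, max_gain: float = 4.0) -> float:
    """Assumed compensation: boost a band whose reverberation level K(F, tau)
    indicates more energy loss than the low-frequency reference band,
    clamped to [1, max_gain] so quiet bands are never attenuated or over-boosted."""
    if k_band <= 0.0:
        return 1.0
    return min(max(k_ref / k_band, 1.0), max_gain)
```

The clamp mirrors the claim's intent: high-frequency bands, which lose more energy, receive gain; the reference band itself receives unity gain.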
8. The sound-field-perception-based multi-sound-source automatic gain control method according to claim 1, characterized in that, in step S7:
the calculated spatial gain is smoothed to control the update speed of the automatic gain; the gain over the frequency range is smoothed in both the time and frequency dimensions,
where K_min(t) is the tracked minimum of K(t), obtained by an improved minima-controlled recursive averaging technique;
in formula (33), the gain is reset when the reverberation degree falls below a certain threshold, which reduces gain-tracking errors caused by diffuse noise and strong reverberation;
in frequency, a gain interpolation algorithm is employed;
finally, the output is obtained via the inverse time-frequency transform:
X_out = ISTFT(Gain·X_in)    formula (35).
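Before formula (35), the per-band gains must be spread back over the STFT bins. A simple linear interpolation between band centres, followed by the bin-wise gain application of formula (35), could look like this; the interpolation scheme is an assumption, since the claim only states that a gain interpolation algorithm is used:

```python
def interpolate_gains(band_centers, band_gains, num_bins):
    """Linearly interpolate per-band gains onto num_bins frequency bins."""
    gains = []
    for k in range(num_bins):
        if k <= band_centers[0]:
            gains.append(band_gains[0])
        elif k >= band_centers[-1]:
            gains.append(band_gains[-1])
        else:
            # find the surrounding pair of band centres
            for b in range(len(band_centers) - 1):
                lo, hi = band_centers[b], band_centers[b + 1]
                if lo <= k <= hi:
                    w = (k - lo) / (hi - lo)
                    gains.append((1 - w) * band_gains[b] + w * band_gains[b + 1])
                    break
    return gains

def apply_gain(spectrum, gains):
    """Formula (35) before the ISTFT: X_out = Gain * X_in, bin by bin."""
    return [g * x for g, x in zip(gains, spectrum)]
```

Interpolating avoids staircase discontinuities at band edges, which would otherwise produce audible artifacts after the inverse STFT.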
CN202310648272.1A 2023-06-02 2023-06-02 Multi-sound source automatic gain control method based on sound field perception Withdrawn CN116543784A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310648272.1A CN116543784A (en) 2023-06-02 2023-06-02 Multi-sound source automatic gain control method based on sound field perception


Publications (1)

Publication Number Publication Date
CN116543784A true CN116543784A (en) 2023-08-04

Family

ID=87447112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310648272.1A Withdrawn CN116543784A (en) 2023-06-02 2023-06-02 Multi-sound source automatic gain control method based on sound field perception

Country Status (1)

Country Link
CN (1) CN116543784A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20230804