CN115714948A - Audio signal processing method and device and storage medium - Google Patents

Audio signal processing method and device and storage medium

Info

Publication number
CN115714948A
CN115714948A (application CN202211215359.1A)
Authority
CN
China
Prior art keywords
sound pressure
gain
threshold
signal
noise ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211215359.1A
Other languages
Chinese (zh)
Inventor
万成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202211215359.1A priority Critical patent/CN115714948A/en
Publication of CN115714948A publication Critical patent/CN115714948A/en
Pending legal-status Critical Current

Abstract

The disclosure relates to an audio signal processing method, an audio signal processing device, and a storage medium. The audio signal processing method includes: collecting an audio signal and determining the signal-to-noise ratio of the audio signal; determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold; and performing gain processing on the audio signal based on the target gain. The present disclosure improves the gain effect obtained when applying gain to the audio signal.

Description

Audio signal processing method and device and storage medium
Technical Field
The present disclosure relates to the field of acoustoelectric technologies, and in particular, to an audio signal processing method, an audio signal processing apparatus, and a storage medium.
Background
The number of people with hearing loss worldwide is rising sharply, and the most common device for assisting hearing-impaired listeners is the digital hearing aid. Its core technologies include wide dynamic range compression (WDRC), speech enhancement, echo suppression, frequency-lowering algorithms, scene recognition, sound source localization, and the like. In daily life, mild to moderate hearing loss accounts for the largest proportion of cases, and auxiliary listening earphones are most effective for this group.
In the related art, adaptive WDRC methods that use the uncomfortable loudness level and the hearing threshold of hearing-impaired listeners take the personalized parameters of different people into account, so that the WDRC gain is better matched to the wearer. However, the WDRC gain curve is usually designed directly from the fitting formulas used in digital hearing aids; commonly used formulas include the prescription of gain and output (POGO), NAL-RP, FIG6, NAL-NL2, and CAM2. These fitting formulas do not consider the quality of the received signal in the actual environment, and designing an auxiliary listening earphone entirely on their basis may amplify the noise signal by the same amount as the speech, so the goal of improving the hearing-impaired listener's understanding of the sound signal is not achieved.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an audio signal processing method, apparatus, and storage medium.
According to a first aspect of embodiments of the present disclosure, there is provided an audio signal processing method, the method comprising:
collecting an audio signal and determining the signal-to-noise ratio of the audio signal; determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold; performing gain processing on the audio signal based on the target gain.
In one embodiment, the determining a target gain according to the snr and a preset snr threshold includes:
if the signal-to-noise ratio is smaller than a first signal-to-noise ratio threshold value, determining that the target gain comprises a linear gain and a Wide Dynamic Range Compression (WDRC) gain; if the signal-to-noise ratio is larger than a second signal-to-noise ratio threshold value, determining that the target gain is a linear gain value; if the signal-to-noise ratio is larger than the first signal-to-noise ratio threshold and smaller than the second signal-to-noise ratio threshold, determining that the target gain is a Wide Dynamic Range Compression (WDRC) gain; wherein the second signal-to-noise ratio threshold is greater than the first signal-to-noise ratio threshold.
In one embodiment, the target gain comprises a linear gain and a wide dynamic range compression, WDRC, gain;
the performing gain processing on the audio signal based on the target gain mode includes:
determining a self-adaptive compensation coefficient according to the signal-to-noise ratio; determining a target frequency segment to which the audio signal belongs, wherein different frequency segments correspond to different gain compensation functions, and the gain compensation functions are used for representing the relationship among a self-adaptive compensation coefficient, a linear gain and a wide dynamic range compression gain; and determining linear gain and wide dynamic range compression gain based on the target gain compensation function corresponding to the target frequency segment and the self-adaptive compensation coefficient.
In one embodiment, the determining an adaptive compensation coefficient according to the signal-to-noise ratio includes:
if the signal-to-noise ratio is smaller than a third signal-to-noise ratio threshold value, the self-adaptive compensation coefficient is 0; and if the signal-to-noise ratio is greater than a third signal-to-noise ratio threshold and smaller than the first signal-to-noise ratio threshold, determining an adaptive compensation coefficient based on an e function of the signal-to-noise ratio.
In one embodiment, the wide dynamic range compression gain is determined as follows:
determining an input sound pressure level of the audio signal; determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve; determining a difference between the output sound pressure level and the input sound pressure level as a wide dynamic range compression gain.
In one embodiment, the determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve comprises:
determining parameter values of a wide dynamic range compression curve, wherein the parameter values comprise a lowest sound pressure input threshold value, a lowest sound pressure output threshold value, a first inflection point sound pressure threshold value, a second inflection point sound pressure threshold value, a first inflection point sound pressure threshold value gain and a second inflection point sound pressure threshold value gain, and the first inflection point sound pressure threshold value is smaller than the second inflection point sound pressure threshold value; if the input sound pressure level is greater than 0 and less than the lowest sound pressure input threshold, determining that the output sound pressure level is 0; if the input sound pressure level is greater than a lowest sound pressure input threshold and less than a lowest sound pressure output threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure input threshold, and the lowest sound pressure output threshold; if the input sound pressure level is greater than a lowest sound pressure output threshold and less than a first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure output threshold, the first inflection point sound pressure threshold gain and a compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain and the second inflection point sound pressure threshold gain; and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.
In one embodiment, the method further comprises:
determining the pain threshold of the human ear according to the hearing threshold range of the human ear; and adjusting the output sound pressure level according to the pain threshold.
According to a second aspect of embodiments of the present disclosure, there is provided an audio signal processing apparatus, the apparatus comprising:
the determining unit is used for acquiring an audio signal and determining the signal-to-noise ratio of the audio signal; determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold;
a processing unit for performing gain processing on the audio signal based on the target gain.
In one embodiment, the determining unit determines the target gain according to the snr and a preset snr threshold as follows:
if the signal-to-noise ratio is smaller than a first signal-to-noise ratio threshold value, determining that the target gain comprises a linear gain and a Wide Dynamic Range Compression (WDRC) gain; if the signal-to-noise ratio is larger than a second signal-to-noise ratio threshold value, determining that the target gain is a linear gain value; if the signal-to-noise ratio is larger than the first signal-to-noise ratio threshold and smaller than the second signal-to-noise ratio threshold, determining that the target gain is a Wide Dynamic Range Compression (WDRC) gain; wherein the second signal-to-noise ratio threshold is greater than the first signal-to-noise ratio threshold.
In one embodiment, the processing unit performs gain processing on the audio signal based on the target gain mode as follows:
determining a self-adaptive compensation coefficient according to the signal-to-noise ratio; determining a target frequency segment to which the audio signal belongs, wherein different frequency segments correspond to different gain compensation functions, and the gain compensation functions are used for representing the relationship among a self-adaptive compensation coefficient, a linear gain and a wide dynamic range compression gain; and determining linear gain and wide dynamic range compression gain based on the target gain compensation function corresponding to the target frequency band and the self-adaptive compensation coefficient.
In one embodiment, the determining unit determines the adaptive compensation coefficient according to the signal-to-noise ratio in the following manner:
if the signal-to-noise ratio is smaller than a third signal-to-noise ratio threshold value, the self-adaptive compensation coefficient is 0; and if the signal-to-noise ratio is greater than a third signal-to-noise ratio threshold and smaller than the first signal-to-noise ratio threshold, determining an adaptive compensation coefficient based on an e function of the signal-to-noise ratio.
In one embodiment, the determining unit determines the wide dynamic range compression gain as follows:
determining an input sound pressure level of the audio signal; determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve; determining a difference between the output sound pressure level and the input sound pressure level as a wide dynamic range compression gain.
In one embodiment, the determining unit determines the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve in the following manner, including:
determining parameter values of a wide dynamic range compression curve, wherein the parameter values comprise a lowest sound pressure input threshold value, a lowest sound pressure output threshold value, a first inflection point sound pressure threshold value, a second inflection point sound pressure threshold value, a first inflection point sound pressure threshold value gain and a second inflection point sound pressure threshold value gain, and the first inflection point sound pressure threshold value is smaller than the second inflection point sound pressure threshold value; if the input sound pressure level is greater than 0 and less than the lowest sound pressure input threshold, determining that the output sound pressure level is 0; if the input sound pressure level is greater than a lowest sound pressure input threshold and less than a lowest sound pressure output threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure input threshold, and the lowest sound pressure output threshold; if the input sound pressure level is greater than a lowest sound pressure output threshold and less than a first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure output threshold, the first inflection point sound pressure threshold gain and a compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain and the second inflection point sound pressure threshold gain; and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.
In one embodiment,
the determining unit is also used for determining the pain threshold of the human ear according to the hearing threshold range of the human ear; the device further comprises: and the adjusting unit is used for adjusting the output sound pressure level according to the pain threshold.
According to a third aspect of the embodiments of the present disclosure, there is provided an audio signal processing apparatus including:
a processor; a memory for storing processor-executable instructions;
wherein the processor is configured to: performing the method of the first aspect or any one of the embodiments of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions stored therein, where the instructions when executed by a processor of a terminal enable the terminal to perform the method described in the first aspect or any one of the implementation manners of the first aspect.
The technical scheme provided by the embodiments of the disclosure can have the following beneficial effects: the signal-to-noise ratio of the audio signal is estimated from the collected audio signal, and a target gain used for performing gain processing on the audio signal is then determined according to the signal-to-noise ratio. Because the target gain takes the influence of the SNR, and thus the reception quality of the actual audio signal, into consideration, equal amplification of noise is avoided, so performing gain processing on the audio signal based on the target gain improves the resulting gain effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic general structural diagram of an auxiliary listening device provided in an embodiment of the present disclosure.
Fig. 2 is a flow chart illustrating a method of audio signal processing according to an exemplary embodiment.
Fig. 3 is a schematic diagram of a simulation result of SNR estimation provided by the embodiment of the present disclosure.
Fig. 4 is a flow chart illustrating a method of audio signal processing according to an exemplary embodiment.
Fig. 5 is a flow chart illustrating a method of audio signal processing according to an exemplary embodiment.
Fig. 6 is a flow chart illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 7 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 8 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 9 is a schematic diagram of a three-segment WDRC for a certain frequency segment according to an embodiment of the present disclosure.
Fig. 10 is a schematic diagram of processing results of speech by different methods provided by the embodiment of the present disclosure.
Fig. 11 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment.
Fig. 12 is a schematic view of a threshold curve provided by an embodiment of the present disclosure.
Fig. 13 is a schematic view of an application scenario of a secondary listening headset according to an embodiment of the present disclosure.
Fig. 14 is a flowchart illustrating an audio signal processing method according to an embodiment of the disclosure.
Fig. 15 is a flowchart illustrating an audio signal processing method according to an embodiment of the disclosure.
Fig. 16 is a flowchart illustrating an audio signal processing method according to an embodiment of the disclosure.
Fig. 17 is a block diagram 100 illustrating an audio signal processing apparatus according to an exemplary embodiment.
Fig. 18 is a block diagram 200 illustrating an apparatus for audio signal processing according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure.
The audio signal processing method provided by the embodiments of the disclosure can be used in various audio signal processing devices, and in particular can be applied to hearing assistance devices that help hearing-impaired users listen. The most common hearing assistance devices are digital hearing aids, including spectacle-type, box-type, behind-the-ear, and in-ear hearing aids, as well as wired or wireless head-mounted devices and earphones. The core technologies of auxiliary listening include the wide dynamic range compression algorithm, speech enhancement, echo suppression, frequency-lowering algorithms, scene recognition, sound source localization, and the like. In daily life, mild to moderate hearing loss accounts for the largest proportion of cases, and auxiliary listening earphones are most effective for this group.
In one implementation of the audio signal processing method provided by the embodiments of the disclosure, the various sound signals in a daily environment are collected by a microphone of an audio processing device, converted into electrical signals, and sent to the audio processing device; the internal system of the audio processing device processes and amplifies the electrical signals; a speaker in the audio processing device serves as the sound output device, converting the electrical signals back into sound signals that are finally output.
Fig. 1 is a schematic diagram of a general structure of an auxiliary listening device according to an embodiment of the present disclosure, and as shown in fig. 1, an audio signal is input into a microphone, processed by an amplifier, and output through an earphone.
The audio signal processing method provided by the embodiments of the disclosure is suitable for scenarios in which an audio signal gain is formed using one or more microphones and one or more loudspeakers. A typical application scenario of the embodiments of the present disclosure is an auxiliary listening earphone containing one loudspeaker. An auxiliary listening earphone generally includes a small speaker and a microphone. The embodiments of the present disclosure take an auxiliary listening earphone as an example for explanation.
Fig. 2 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, as shown in fig. 2, including the following steps.
In step S11, an audio signal is acquired and a signal-to-noise ratio of the audio signal is determined.
In step S12, a target gain is determined according to the snr and a preset snr threshold.
In step S13, gain processing is performed on the audio signal based on the target gain.
In the embodiment of the disclosure, the signal-to-noise ratio of the audio signal collected by the microphone is determined, the gain mode and the target gain are determined according to the signal-to-noise ratio, and the audio signal is subjected to gain processing based on the target gain. The method disclosed by the invention can not only improve the loudness of the voice, but also improve the intelligibility of the voice.
In the embodiments of the present disclosure, an auxiliary listening earphone is taken as an example for explanation. Before the sound signal is amplified, steps such as beamforming, blind source separation, and speech enhancement may be carried out. Assume that the sound signal the auxiliary earphone finally needs to amplify is t(n). The sampling rate f_s of a speech signal processing system is typically 16 kHz, 44.1 kHz, or 48 kHz; due to hardware and computational limitations, the present disclosure uses a 16 kHz sampling rate. Since a speech signal is approximately short-time stationary over 10-40 ms, information such as its second-order statistics can be used, so the received signal is subjected to an STFT, i.e., framing, windowing, and an FFT. The present disclosure selects a frame length of 32 ms, so the frame length L = 512; the window function w(n) is a Hanning window whose length equals the frame length; the frame shift is 50% of the frame length, i.e., inc = 256. The windowed frame signal st(n, m) is obtained as in formula (1):
st(n, m) = t((m-1)*inc + n)*w(n), 0 ≤ n ≤ (L-1) (1)
where m denotes the frame index and n denotes the data-point index within the m-th frame of audio data. An FFT is then applied to st(n, m) to obtain the microphone spectrum data T(k, m), where k denotes the frequency-bin index.
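The framing and windowing described above can be illustrated with the short Python sketch below; the function and variable names are illustrative and not part of the disclosure. It applies formula (1) with the parameters stated here (16 kHz sampling rate, 32 ms frames with L = 512, 50% frame shift with inc = 256, Hanning window) and then takes an FFT.

```python
import numpy as np

def stft_frames(t, L=512, inc=256):
    """Split signal t(n) into windowed frames st(n, m) per formula (1),
    then apply an FFT to obtain the spectrum T(k, m)."""
    w = np.hanning(L)                       # window w(n), same length as the frame
    n_frames = (len(t) - L) // inc + 1
    st = np.empty((L, n_frames))
    for m in range(n_frames):               # m: frame index (0-based here)
        st[:, m] = t[m * inc : m * inc + L] * w
    T = np.fft.rfft(st, axis=0)             # T(k, m): spectrum, k is the frequency bin
    return st, T

# Usage: one second of audio at fs = 16 kHz
fs = 16000
t = np.random.randn(fs)                     # stand-in for the captured audio signal t(n)
st, T = stft_frames(t)
print(st.shape, T.shape)                    # (512, 61) and (257, 61)
```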
In the embodiments of the present disclosure, an SNR estimation method is provided. Since time-domain energy differs from frequency-domain energy only by a factor of L in theory, the method can be used in either the time domain or the frequency domain; to reduce the amount of computation, the present disclosure uses a time-domain estimate. Let the speech energy, the noise energy, and the average noise energy of the received signal in frame m be σ²(m), σ_n²(m), and σ̄_n²(m), respectively, where σ²(m) is the mean energy of the windowed frame st(n, m). The first 5 frames of the signal are assumed to contain only noise, i.e., σ_n²(m) = σ²(m) for those frames. Starting from the 6th frame, the average noise energy σ̄_n²(m) is calculated and the frame's noise energy is set to this average. From the 7th frame onward, the noise energy of the current frame is updated as a weighted combination of the noise energy of the previous frame and the speech energy of the current frame, as shown in formula (2):
σ_n²(m) = β(m)*σ_n²(m-1) + (1-β(m))*σ²(m) (2)
where β(m) denotes the noise estimation coefficient generated by an S-shaped (sigmoid) function of ξ(m), the ratio of the speech energy to the average noise energy; a and T respectively denote the signal-to-noise-ratio adjustment parameter and the offset parameter of the noise estimation coefficient, and a value of 2 is used in this disclosure. The SNR of the current frame can then be calculated, as shown in formula (3):
SNR(m) = 10*log10(σ²(m) / σ_n²(m)) (3)
fig. 3 is a schematic diagram of SNR estimation simulation results provided by an embodiment of the present disclosure, as shown in fig. 3, which is a result of simulation performed by the SNR estimation method, and (a) and (b) in fig. 3 respectively show results of a background noise being white noise plus Babble (Babble can be understood as low-frequency noise in an office environment), and a white noise plus Cafeteria (Cafeteria can be understood as noise in a dining room environment). The calculated SNR result and the real SNR result have small difference under most conditions, and the calculated result is smaller than the real result when the difference is large. The SNR estimate can only be smaller since the present disclosure uses larger gain values at high SNR, otherwise the noise signal would be abnormally amplified.
In the embodiments of the present disclosure, the gain mode is determined based on the signal-to-noise ratio and the signal-to-noise ratio thresholds. In general, the larger the signal-to-noise ratio, the less noise is mixed into the signal and the higher the quality of the reproduced sound, and vice versa. In the present disclosure, the first signal-to-noise ratio threshold is 10 dB and the second signal-to-noise ratio threshold is 15 dB.
Fig. 4 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, as shown in fig. 4, including the following steps.
In step S21, a target gain is determined.
In step S22a, if the signal-to-noise ratio is less than the first signal-to-noise ratio threshold, it is determined that the target gain includes a linear gain and a wide dynamic range compression WDRC gain.
In step S22b, if the snr is greater than the second snr threshold, the target gain is determined to be a linear gain value.
In step S22c, if the signal-to-noise ratio is greater than the first signal-to-noise ratio threshold and smaller than the second signal-to-noise ratio threshold, the target gain is determined to be the wide dynamic range compression WDRC gain.
In the embodiments of the present disclosure, after the current SNR is obtained, if the SNR is greater than 15 dB, the speech information dominates the current received signal and a linear gain can be used directly, i.e., a gain of the same magnitude is applied to all frequency-band signals. The advantage of linear amplification is that, for input signals of medium sound pressure level, it provides an appropriate gain without distortion, with good speech quality and high intelligibility. Its disadvantages are that for low-sound-pressure-level input signals the gain it provides may be insufficient, while for high-sound-pressure-level input signals the excessive gain often makes the wearer uncomfortable. If 10 dB < SNR < 15 dB, the speech information still dominates the current received signal, and the WDRC gain value is used without compensation. If the SNR is below 10 dB, the influence of noise signals, especially in the low frequency band, currently needs to be considered.
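For clarity, the threshold logic above can be summarized as follows (Python sketch; the mode labels are illustrative, the 10 dB and 15 dB thresholds are those stated in this disclosure).

```python
def select_gain_mode(snr_db, snr_low=10.0, snr_high=15.0):
    """Pick the gain mode for a frame from its estimated SNR (dB)."""
    if snr_db > snr_high:
        return "linear"            # speech clearly dominant: same gain for all bands
    if snr_db < snr_low:
        return "linear+wdrc"       # noise influence must be considered: combined gain
    return "wdrc"                  # 10 dB <= SNR <= 15 dB: WDRC gain without compensation
```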
Fig. 5 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, as shown in fig. 5, including the following steps.
In step S31, an adaptive compensation coefficient is determined based on the signal-to-noise ratio.
In the embodiments of the present disclosure, γ is an adaptive compensation coefficient based on the SNR result, given by formula (4): γ is 0 when the SNR is below 5 dB, and is an e-based function of the SNR, taking values in (0, 1), when the SNR lies between 5 dB and 10 dB.
in step S32, a target frequency segment to which the audio signal belongs is determined, and different frequency segments correspond to different gain compensation functions, and the gain compensation functions are used to represent the relationship between the adaptive compensation coefficients, the linear gains, and the wide dynamic range compression gains.
In step S33, a linear gain and a wide dynamic range compression gain are determined based on the target gain compensation function and the adaptive compensation coefficient corresponding to the target frequency segment.
In the embodiments of the disclosure, if the SNR is less than 10 dB, it must be further determined whether the SNR is below 5 dB or between 5 dB and 10 dB. When the sound pressure level of the speech is above the hearing threshold of the hearing-impaired listener, linear amplification gives a better speech recognition rate than wide dynamic compression, while at low speech signal-to-noise ratios wide dynamic compression allows moderately to severely hearing-impaired patients to obtain a higher speech recognition rate. Therefore, when the signal-to-noise ratio is below 5 dB, a high recognition rate is obtained by compensating the speech with the wide dynamic compression algorithm alone; when the signal-to-noise ratio is between 5 dB and 10 dB, a high recognition rate is obtained by compensating the speech with the wide dynamic compression algorithm and linear amplification at the same time. Since the human ear has a high recognition rate for low-frequency speech, the present disclosure exploits the respective advantages of linear amplification and conventional wide dynamic compression in frequency-response compensation, and determines the target gain used for gain processing of the audio signal based on the adaptive compensation coefficient, the determined linear gain value, and the wide dynamic range compression gain value.
In the embodiments of the present disclosure, if the SNR is less than 10 dB, the influence of the noise signal currently needs to be considered, especially in the low frequency band. Since most environmental noise in real environments is concentrated in the low frequency band, the final gain G is given by formula (5):
G = 0.8*(γ*G_L + (1-γ)*G_W) for audio signal frequencies below 0.5 kHz, and G = γ*G_L + (1-γ)*G_W for frequencies at or above 0.5 kHz (5)
where G_L and G_W denote the linear gain value and the preset WDRC gain value, respectively; the present disclosure takes G_L = 25.
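A minimal Python sketch of formula (5), assuming the WDRC gain G_W has already been obtained from the compression curve; G_L = 25 is the value quoted above, and the 0.8 factor applies below 0.5 kHz.

```python
def combined_gain(gamma, g_wdrc, freq_hz, g_linear=25.0):
    """Final gain per formula (5): a gamma-weighted mix of linear gain G_L and
    WDRC gain G_W, attenuated by 0.8 in the low band (< 0.5 kHz)."""
    g = gamma * g_linear + (1.0 - gamma) * g_wdrc
    return 0.8 * g if freq_hz < 500.0 else g

# e.g. combined_gain(0.3, g_wdrc=18.0, freq_hz=400.0) -> 0.8*(0.3*25 + 0.7*18) = 16.08 dB
```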
How to determine the adaptive compensation coefficients is described in detail below. Fig. 6 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, including the following steps as shown in fig. 6.
In step S41, an adaptive compensation coefficient is determined.
In step S42a, if the snr is less than the third snr threshold, the adaptive compensation factor is 0.
In step S42b, if the snr is greater than the third snr threshold and less than the first snr threshold, the adaptive compensation coefficient is determined based on the e-function of the snr.
In the embodiments of the present disclosure, γ is an adaptive compensation coefficient based on the SNR result, as shown in formula (4): γ = 0 when the SNR is below 5 dB, and γ is an e-based function of the SNR, with values in (0, 1), when the SNR lies in [5 dB, 10 dB].
It can be seen from formula (4) that the weights given to linear amplification and wide dynamic compression are adaptively adjusted according to the signal-to-noise ratio of the speech in a given frequency band. When the speech signal-to-noise ratio is below 5 dB, the adaptive compensation coefficient γ is 0; that is, below this threshold the linear gain contribution is 0 and the gain compensation mode is pure wide dynamic compression. When the speech signal-to-noise ratio lies in the interval [5 dB, 10 dB], the adaptive compensation coefficient γ is an e-based function of the signal-to-noise ratio with values in (0, 1), and linear amplification and wide dynamic compression are combined for adaptive frequency-response compensation: the closer the signal-to-noise ratio is to 10 dB, the closer γ is to 1, i.e., the larger the share of the linear amplification gain; conversely, the closer the signal-to-noise ratio is to 5 dB, the closer γ is to 0, i.e., the larger the share of the wide dynamic compression gain.
Fig. 7 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, as shown in fig. 7, including the following steps.
In step S51, an input sound pressure level of the audio signal is determined.
In step S52, an output sound pressure level is determined based on the input sound pressure level and the wide dynamic range compression curve.
In step S53, the difference between the output sound pressure level and the input sound pressure level is determined as the wide dynamic range compression gain.
Fig. 8 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, as shown in fig. 8, including the following steps.
In step S61, a parameter value of the wide dynamic range compression curve is determined, where the parameter value includes a lowest sound pressure input threshold, a lowest sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain, and a second inflection point sound pressure threshold gain, and the first inflection point sound pressure threshold is smaller than the second inflection point sound pressure threshold.
In step S62, an output sound pressure level is determined based on the input sound pressure level and the wide dynamic range compression curve.
In step S63a, if the input sound pressure level is greater than 0 and less than the lowest sound pressure input threshold, it is determined that the output sound pressure level is 0.
In step S63b, if the input sound pressure level is greater than the lowest sound pressure input threshold and less than the lowest sound pressure output threshold, the output sound pressure level is determined based on the input sound pressure level, the lowest sound pressure input threshold, and the lowest sound pressure output threshold.
In step S63c, if the input sound pressure level is greater than the lowest sound pressure output threshold and less than the first inflection point sound pressure threshold, the output sound pressure level is determined based on the input sound pressure level, the lowest sound pressure output threshold, the first inflection point sound pressure threshold gain, and the compression ratio, which is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain, and the second inflection point sound pressure threshold gain.
In step S63d, if the input sound pressure level is greater than the second inflection point sound pressure threshold, the output sound pressure level is determined based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.
Fig. 9 is a schematic diagram of a three-segment WDRC for a certain frequency band provided by an embodiment of the present disclosure. As shown in Fig. 9, THi, THo, LK, HK, LKG, and HKG respectively denote the minimum input threshold, the minimum output threshold, the low inflection point (knee), the high inflection point (knee), the gain at the low knee, and the gain at the high knee. Since human hearing is complex, the LK-HK segment may be obtained from multiple compression curves; the disclosure uses only a standard three-segment WDRC as an example. Fig. 9 can be described using formula (6), which maps the input sound pressure level SPL_in to the output sound pressure level SPL_out piecewise: the output is 0 below THi, a linear-gain section applies below LK, a compression section with compression ratio CR applies between LK and HK, and the output is limited above HK.
Here SPL_in and SPL_out are the input and output sound pressure levels, i.e., the original sound pressure level and the amplified sound pressure level, respectively; the WDRC gain is G_W = SPL_out - SPL_in; and the compression ratio is
CR = (HK - LK) / ((HK + HKG) - (LK + LKG))
which means that for every 1 dB increase in the input, the output increases by 1/CR dB.
In the embodiments of the present disclosure, since the speech signal is non-stationary, the sound pressure level of the signal in each channel changes with time, so the compensation gain in each channel also changes with time. The input/output curve of the in-channel WDRC is shown in Fig. 9 and consists of three processing sections. In the linear gain section, input sound pressure levels below the low knee point (LK) receive a linear gain. In the wide dynamic range compression section, input sound pressure levels between LK and the high knee point (HK) are compressed with compression ratio CR, i.e., the output increases by 1/CR dB for every 1 dB increase in the input; the gain at LK is LKG and the gain at HK is HKG. In the limiting compression section, input sound pressure levels above HK are limited. If the input sound pressure level is below the input threshold THi, the output sound pressure level is 0.
In the embodiments of the present disclosure, THi may be understood as the hearing threshold of a normal listener, THo as the hearing threshold of a hearing-impaired listener, LK as the most comfortable level of a normal listener, LK + LKG as the most comfortable level of a hearing-impaired listener, HK as the pain threshold of a normal listener, and HK + HKG as the pain threshold of a hearing-impaired listener. When the input sound pressure level SPL_in is less than the hearing threshold of a normal listener, SPL_out = SPL_in. This is because sounds that a normal ear cannot hear are not mapped into the hearing range of the hearing-impaired listener, and in this case the hearing-impaired listener does not hear them.
When the input sound pressure level satisfies THi ≤ SPL_in ≤ LK, the sound intensity is above the hearing threshold but below the most comfortable level and a larger gain is needed, so the auxiliary listening earphone applies linear-gain amplification and the hearing-impaired listener can hear a weak sound.
When the input sound pressure level satisfies LK ≤ SPL_in ≤ HK, the loudness of the sound is above the most comfortable level but below the discomfort threshold. The sound is already fairly strong, so only weak compensation is needed while keeping the output below the discomfort threshold; the auxiliary listening earphone therefore applies the WDRC gain processing algorithm so that the hearing-impaired listener can hear the stronger sound. When SPL_in = LK, SPL_out = LK + LKG, and the hearing-impaired listener is at the most comfortable level.
When the input sound pressure level satisfies SPL_in ≥ HK, i.e., it reaches the discomfort threshold of the hearing-impaired listener, the excessively loud sound must be suppressed to protect the impaired hearing from further damage. The output of the auxiliary listening earphone is therefore held constant, since a hearing-impaired listener feels discomfort when the sound intensity reaches the pain-threshold range.
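Putting the four regions together, the input/output mapping can be sketched as follows (Python). The linear-section gain of LKG and the constant limiting output of HK + HKG are assumptions consistent with the description above, not expressions taken verbatim from formula (6), and the knee values in the usage example are hypothetical.

```python
def wdrc_output_spl(spl_in, thi, lk, hk, lkg, hkg):
    """Map an input sound pressure level (dB SPL) to the output level using a
    three-segment WDRC curve (mute / linear / compression / limiting)."""
    cr = (hk - lk) / ((hk + hkg) - (lk + lkg))   # compression ratio from the knee gains
    if spl_in < thi:                              # below the minimum input threshold: mute
        return 0.0
    if spl_in < lk:                               # linear-gain section (gain LKG assumed)
        return spl_in + lkg
    if spl_in < hk:                               # compression: +1 dB in -> +1/CR dB out
        return (lk + lkg) + (spl_in - lk) / cr
    return hk + hkg                               # limiting section: output held constant

def wdrc_gain(spl_in, **knees):
    """WDRC gain G_W = SPL_out - SPL_in."""
    return wdrc_output_spl(spl_in, **knees) - spl_in

# Example with hypothetical knee values (dB SPL): THi=20, LK=45, HK=85, LKG=20, HKG=10
knees = dict(thi=20.0, lk=45.0, hk=85.0, lkg=20.0, hkg=10.0)
print(wdrc_gain(60.0, **knees))   # compression section: output 76.25 dB, gain 16.25 dB
```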
The NAL-RP fitting formula used in the present disclosure is an improvement of the NAL-R fitting formula; it is likewise based on the half-gain rule and corrects the loudness equalization for compensating steeply sloping hearing loss. The WDRC parameters above are given by the NAL-RP fitting formula, and no specific values are given in the present disclosure.
Fig. 10 is a schematic diagram of the processing results of different methods on speech provided by an embodiment of the present disclosure. As shown in Fig. 10, panels (a)-(c) show, respectively, the original speech, the result of DRC (the same gain applied to the signals of all frequency bands), and the result of WDRC based on the NAL-RP fitting formula. It can be seen that, relative to DRC, the method used by the present disclosure applies a smaller gain to the low frequencies than to the mid band. This is mainly because, in real environments, low-frequency noise accounts for a large proportion of the signal and contributes less to semantic understanding than the mid band.
Fig. 11 is a flowchart illustrating an audio signal processing method according to an exemplary embodiment, as shown in fig. 11, including the following steps.
In step S71, the pain threshold of the human ear is determined according to the hearing threshold range of the human ear.
In the embodiments of the disclosure, the pain threshold of the human ear is determined from the hearing threshold range of the human ear. Hearing loss raises the patient's hearing threshold and reduces the temporal and frequency resolution of the patient's perception of sound signals. The perception of sound by the patient's cochlea changes and the hearing threshold rises, but the discomfort threshold remains essentially unchanged. One characteristic of hearing-impaired patients is that the raised hearing threshold reduces the audibility of the signal, which in turn reduces its intelligibility. Patients with mild hearing loss can hear some everyday sounds but not others; patients with severe or profound hearing loss can hear almost no everyday sounds unless someone shouts at close range. The present disclosure provides a method for estimating the pain threshold DCL from the hearing threshold HL, see formula (7):
DCL = 105 for HL ≤ 60, and DCL = 105 + (HL - 60)/2 for HL > 60 (7)
That is, when the hearing threshold HL is less than or equal to 60, the pain threshold DCL is estimated as 105 dB; when the hearing threshold is greater than 60, the pain threshold DCL is estimated as 105 + (HL - 60)/2.
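Formula (7) translates directly into a one-line sketch (Python):

```python
def pain_threshold_dcl(hl_db):
    """Estimate the pain threshold DCL (dB) from the hearing threshold HL, per formula (7)."""
    return 105.0 if hl_db <= 60.0 else 105.0 + (hl_db - 60.0) / 2.0

# pain_threshold_dcl(55.0) -> 105.0, pain_threshold_dcl(80.0) -> 115.0
```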
Fig. 12 is a schematic view of a hearing threshold curve provided by an embodiment of the present disclosure. As shown in Fig. 12, it is the average of the left-ear and right-ear hearing threshold curves of 15 test subjects. The difference between the left-ear and right-ear threshold curves lies mainly in the low and high frequency bands; the present disclosure ultimately uses the right-ear result.
In step S72, the output sound pressure level is adjusted according to the pain threshold.
In the embodiments of the present disclosure, since the pain threshold is the level the human ear cannot tolerate, it must be ensured that the final output sound pressure level of each frequency band is below the pain threshold, and the final gain is adjusted using formula (8):
SPL_out < DCL - 10 for frequency bands below 0.5 kHz, and SPL_out < DCL - 5 for frequency bands at or above 0.5 kHz (8)
That is, when the audio signal is below 0.5 kHz the output sound pressure level must be less than the pain threshold minus 10, and when the audio signal is at or above 0.5 kHz the output sound pressure level must be less than the pain threshold minus 5. By setting the pain threshold, the output sound pressure level is adjusted so that it does not exceed it.
Fig. 13 is a schematic view of an application scenario of an auxiliary listening earphone provided by an embodiment of the present disclosure. As shown in Fig. 13, the auxiliary listening earphone needs to enhance the sound signal to be processed so as to improve the wearer's understanding of it. First, the signal-to-noise ratio of the received signal is calculated and different WDRC modes and compensation coefficients are selected; then the gain values for different frequency bands and different input sound pressure levels are calculated according to the WDRC gain regions; finally, the signal is amplified and output.
Fig. 14 is a schematic flow chart of an audio signal processing method provided by an embodiment of the present disclosure. As shown in Fig. 14, the present disclosure calculates the gains for different frequency bands and different input sound pressure levels according to a preset WDRC gain curve, and adaptively adjusts the per-band gains using SNR estimation and pain threshold estimation. The basic flow is as follows. 1. Data preprocessing: perform an STFT (framing, windowing, and FFT) on the audio data input to the microphone to obtain the corresponding frequency-domain signal. 2. SNR estimation: estimate the SNR of the received signal in real time. 3. WDRC gain calculation: the key technology of the auxiliary listening earphone is the WDRC algorithm, whose core idea is to give different gains to signals with different sound pressure levels within a specific frequency band. Since the human ear perceives frequency non-linearly, each frequency band to be processed covers a different bandwidth and can be determined by a Bark-band, Mel, or Gammatone filter bank. The present disclosure uses a Mel filter bank: several band-pass filters with triangular filtering characteristics are arranged over the frequency range of the speech signal, with center frequencies equally spaced on the Mel scale; the sound pressure level of each frequency band is then calculated and the band gains are obtained from the preset gain curve, as sketched after this paragraph. A typical three-segment WDRC gain curve comprises four regions: a mute region, a linear amplification region, a compression region, and a limiting region. The mute region sets the signal to 0, the gain is the same for all signals in the linear amplification region, the gain decreases as the input sound pressure level increases in the compression region, and the limiting region limits or negatively gains the sound signal. 4. Pain threshold judgment and adjustment: estimate the pain threshold from the hearing threshold curve and fine-tune the gain values obtained in step 3.
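To illustrate the per-band processing in step 3, the sketch below builds a small Mel-spaced triangular filter bank and computes a sound pressure level for each band of one frame's power spectrum; the band count, reference level, and filter construction are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def mel_band_spl(power_spectrum, fs=16000, n_bands=8, ref=1e-12):
    """Sound pressure level (dB) per Mel-spaced band of one frame's power spectrum."""
    n_bins = len(power_spectrum)
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_bands + 2)
    edges_bin = np.round(mel_to_hz(edges_mel) / (fs / 2.0) * (n_bins - 1)).astype(int)
    spl = np.zeros(n_bands)
    for b in range(n_bands):
        lo, mid, hi = edges_bin[b], edges_bin[b + 1], edges_bin[b + 2]
        tri = np.zeros(n_bins)                      # triangular band-pass weight
        tri[lo:mid + 1] = np.linspace(0.0, 1.0, mid - lo + 1)
        tri[mid:hi + 1] = np.linspace(1.0, 0.0, hi - mid + 1)
        band_energy = float(np.sum(tri * power_spectrum))
        spl[b] = 10.0 * np.log10(max(band_energy, ref) / ref)  # dB relative to an assumed reference
    return spl

# Usage: power spectrum of one frame from the STFT step, |T(k, m)|^2
frame_power = np.abs(np.fft.rfft(np.random.randn(512) * np.hanning(512))) ** 2
print(mel_band_spl(frame_power))
```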
Fig. 15 is a schematic flow chart of an audio signal processing method provided by an embodiment of the disclosure. As shown in Fig. 15, the signal is first decomposed into multiple channels using a filter bank; second, the envelope of the signal in each channel is extracted and its sound pressure level is calculated; third, the gain to be compensated is determined from that sound pressure level and the patient's hearing threshold in that channel; finally, the channel signal is compensated.
Fig. 16 is a schematic flow chart of an audio signal processing method provided by an embodiment of the disclosure. As shown in Fig. 16, an audio signal is input and split by the Mel filter bank into several preset frequency bands, and the SNR is evaluated. If SNR > 15 dB, the speech information dominates the current received signal and a linear gain can be used directly, i.e., a gain of the same magnitude is applied to all frequency-band signals. If 10 dB < SNR < 15 dB, the speech information still dominates the current received signal and the WDRC gain value is used without compensation. If SNR < 10 dB, the influence of noise signals, especially in the low frequency band, needs to be considered, and the frequency of the audio signal is further judged: if the frequency is below 0.5 kHz the gain is calculated as 0.8*(γ*G_L + (1-γ)*G_W), and if the frequency is at or above 0.5 kHz the gain is calculated as γ*G_L + (1-γ)*G_W. In this case, if SNR < 5 dB, the adaptive compensation coefficient γ is 0, i.e., below this threshold the linear gain contribution is 0 and the gain takes only the WDRC gain value; if 5 dB ≤ SNR ≤ 10 dB, the adaptive compensation coefficient γ is an e-based function of the signal-to-noise ratio. Finally, the pain threshold of the human ear is determined from the hearing threshold range of the human ear, the output sound pressure level is adjusted according to the pain threshold, and the signal is output.
In summary, the method collects an audio signal and determines its signal-to-noise ratio; calculates the target gains for different frequency bands and different input sound pressure levels according to the signal-to-noise ratio and a preset WDRC gain curve; and performs gain processing on the audio signal based on the target gain, while adaptively adjusting the gain of each frequency band using the pain-threshold estimate. This improves the audio gain effect and thus the understanding of speech in noisy environments by hearing-impaired listeners.
Based on the same conception, the embodiment of the disclosure also provides an audio signal processing device.
It is understood that, in order to implement the above functions, the audio signal processing apparatus provided by the embodiments of the present disclosure includes corresponding hardware structures and/or software modules for performing the respective functions. In combination with the exemplary units and algorithm steps disclosed in the embodiments of the present disclosure, the embodiments can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered to depart from the scope of the technical solutions of the embodiments of the present disclosure.
Fig. 17 is a block diagram 100 illustrating an audio signal processing apparatus according to an exemplary embodiment. Referring to fig. 17, the apparatus 100 includes a determination unit 101, a processing unit 102, and an adjustment unit 103.
The determining unit 101 is configured to acquire an audio signal and determine a signal-to-noise ratio of the audio signal; and determining the target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold value.
A processing unit 102, configured to perform gain processing on the audio signal based on the target gain.
In one embodiment, the determining unit 101 determines the target gain used for gain processing of the audio signal according to the signal-to-noise ratio in the following manner:
if the signal-to-noise ratio is smaller than a first signal-to-noise ratio threshold value, determining that the target gain comprises a linear gain and a Wide Dynamic Range Compression (WDRC) gain; if the signal-to-noise ratio is larger than the second signal-to-noise ratio threshold value, determining that the target gain is a linear gain value; if the signal-to-noise ratio is larger than the first signal-to-noise ratio threshold and smaller than the second signal-to-noise ratio threshold, determining that the target gain is a wide dynamic range compression WDRC gain; wherein the second signal-to-noise ratio threshold is greater than the first signal-to-noise ratio threshold.
In one embodiment, the determining unit 101 determines the target gain used for gain processing of the audio signal based on the linear gain value and the wide dynamic range compression gain value in the following manner:
determining a self-adaptive compensation coefficient according to the signal-to-noise ratio; determining a target frequency segment to which the audio signal belongs, wherein different frequency segments correspond to different gain compensation functions, and the gain compensation functions are used for representing the relationship among a self-adaptive compensation coefficient, linear gain and wide dynamic range compression gain; and determining linear gain and wide dynamic range compression gain based on the target gain compensation function corresponding to the target frequency segment and the self-adaptive compensation coefficient.
In one embodiment, the determining unit 101 determines the adaptive compensation coefficient according to the signal-to-noise ratio in the following manner:
if the signal-to-noise ratio is smaller than the third signal-to-noise ratio threshold value, the self-adaptive compensation coefficient is 0; and if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and smaller than the first signal-to-noise ratio threshold, determining the self-adaptive compensation coefficient based on the e function of the signal-to-noise ratio.
In one embodiment, the determining unit 101 determines the wide dynamic range compression gain value as follows:
determining an input sound pressure level of the audio signal; determining an output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve; the difference between the output and input sound pressure levels is determined as the wide dynamic range compression gain.
In one embodiment, the determining unit 101 determines the output sound pressure level based on the input sound pressure level and the wide dynamic range compression curve in the following manner, including:
determining parameter values of a wide dynamic range compression curve, wherein the parameter values comprise a lowest sound pressure input threshold value, a lowest sound pressure output threshold value, a first inflection point sound pressure threshold value, a second inflection point sound pressure threshold value, a first inflection point sound pressure threshold value gain and a second inflection point sound pressure threshold value gain, and the first inflection point sound pressure threshold value is smaller than the second inflection point sound pressure threshold value; if the input sound pressure level is greater than 0 and less than the lowest sound pressure input threshold, determining that the output sound pressure level is 0; if the input sound pressure level is greater than the lowest sound pressure input threshold and less than the lowest sound pressure output threshold, determining an output sound pressure level based on the input sound pressure level, the lowest sound pressure input threshold and the lowest sound pressure output threshold; if the input sound pressure level is greater than the lowest sound pressure output threshold and less than the first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure output threshold, the first inflection point sound pressure threshold gain and the compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain and the second inflection point sound pressure threshold gain; and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.
In one embodiment, the determining unit 101 is further configured to determine a pain threshold of the human ear according to a hearing threshold range of the human ear; the apparatus further includes: an adjusting unit 103, configured to adjust the output sound pressure level according to the pain threshold.
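The disclosure states only that the output sound pressure level is adjusted according to the pain threshold. A simple clamp is one possible adjustment, shown here as an assumption rather than the claimed method.

```python
def clamp_to_pain_threshold(output_spl_db, pain_threshold_db):
    """Keep the compressed output sound pressure level at or below the
    listener's pain threshold (one assumed form of the adjustment)."""
    return min(output_spl_db, pain_threshold_db)
```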
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 18 is a block diagram of an apparatus 200 for audio signal processing according to an exemplary embodiment. For example, the apparatus 200 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 18, the apparatus 200 may include one or more of the following components: a processing component 202, a memory 204, a power component 206, a multimedia component 208, an audio component 210, an input/output (I/O) interface 212, a sensor component 214, and a communication component 216.
The processing component 202 generally controls overall operation of the device 200, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 202 may include one or more processors 220 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 202 can include one or more modules that facilitate interaction between the processing component 202 and other components. For example, the processing component 202 may include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.
The memory 204 is configured to store various types of data to support operations at the device 200. Examples of such data include instructions for any application or method operating on the device 200, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 204 may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, and magnetic or optical disks.
Power components 206 provide power to the various components of device 200. Power components 206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 200.
The multimedia component 208 includes a screen that provides an output interface between the device 200 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 208 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 200 is in an operating mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 210 is configured to output and/or input audio signals. For example, audio component 210 includes a Microphone (MIC) configured to receive external audio signals when apparatus 200 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 204 or transmitted via the communication component 216. In some embodiments, audio component 210 also includes a speaker for outputting audio signals.
The I/O interface 212 provides an interface between the processing component 202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 214 includes one or more sensors for providing various aspects of status assessment for the device 200. For example, the sensor assembly 214 may detect an open/closed state of the device 200 and the relative positioning of components, such as the display and keypad of the device 200. The sensor assembly 214 may also detect a change in the position of the device 200 or of a component of the device 200, the presence or absence of user contact with the device 200, the orientation or acceleration/deceleration of the device 200, and a change in the temperature of the device 200. The sensor assembly 214 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 216 is configured to facilitate wired or wireless communication between the apparatus 200 and other devices. The device 200 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 216 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 216 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as memory 204, that are executable by processor 220 of device 200 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is to be understood that "a plurality" in this disclosure means two or more, and other quantifiers are to be construed similarly. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "first," "second," and the like are used to describe various types of information, but such information should not be limited by these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the terms "first," "second," and the like are fully interchangeable. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
It will be further understood that, unless otherwise specified, "connected" includes direct connections between two elements without any intervening element, as well as indirect connections between two elements with one or more intervening elements.
It is further to be understood that while operations are depicted in the drawings in a particular order, this is not to be understood as requiring that such operations be performed in the particular order shown or in serial order, or that all illustrated operations be performed, to achieve desirable results. In certain environments, multitasking and parallel processing may be advantageous.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the scope of the appended claims.

Claims (16)

1. A method of audio signal processing, the method comprising:
collecting an audio signal and determining the signal-to-noise ratio of the audio signal;
determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold;
performing gain processing on the audio signal based on the target gain.
2. The method of claim 1, wherein the determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold comprises:
if the signal-to-noise ratio is smaller than a first signal-to-noise ratio threshold, determining that the target gain comprises a linear gain and a Wide Dynamic Range Compression (WDRC) gain;
if the signal-to-noise ratio is larger than a second signal-to-noise ratio threshold, determining that the target gain is a linear gain;
if the signal-to-noise ratio is larger than the first signal-to-noise ratio threshold and smaller than the second signal-to-noise ratio threshold, determining that the target gain is a Wide Dynamic Range Compression (WDRC) gain;
wherein the second signal-to-noise ratio threshold is greater than the first signal-to-noise ratio threshold.
3. The method of claim 2, wherein the target gain comprises a linear gain and a Wide Dynamic Range Compression (WDRC) gain;
the performing gain processing on the audio signal based on the target gain comprises:
determining an adaptive compensation coefficient according to the signal-to-noise ratio;
determining a target frequency segment to which the audio signal belongs, wherein different frequency segments correspond to different gain compensation functions, and the gain compensation functions are used for representing the relationship among an adaptive compensation coefficient, a linear gain and a wide dynamic range compression gain;
and determining the linear gain and the wide dynamic range compression gain based on the target gain compensation function corresponding to the target frequency segment and the adaptive compensation coefficient.
4. The method of claim 3, wherein the determining an adaptive compensation coefficient according to the signal-to-noise ratio comprises:
if the signal-to-noise ratio is smaller than a third signal-to-noise ratio threshold, the adaptive compensation coefficient is 0;
and if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and smaller than the first signal-to-noise ratio threshold, determining the adaptive compensation coefficient based on an exponential (e) function of the signal-to-noise ratio.
5. The method of any of claims 2 to 4, wherein the wide dynamic range compression gain is determined by:
determining an input sound pressure level of the audio signal;
determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve;
determining a difference between the output sound pressure level and the input sound pressure level as a wide dynamic range compression gain.
6. The method of claim 5, wherein determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve comprises:
determining parameter values of a wide dynamic range compression curve, wherein the parameter values comprise a lowest sound pressure input threshold, a lowest sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain and a second inflection point sound pressure threshold gain, and the first inflection point sound pressure threshold is smaller than the second inflection point sound pressure threshold;
if the input sound pressure level is greater than 0 and less than the lowest sound pressure input threshold, determining that the output sound pressure level is 0;
if the input sound pressure level is greater than a lowest sound pressure input threshold and less than a lowest sound pressure output threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure input threshold, and the lowest sound pressure output threshold;
if the input sound pressure level is greater than a lowest sound pressure output threshold and less than a first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure output threshold, the first inflection point sound pressure threshold gain and a compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain and the second inflection point sound pressure threshold gain;
and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.
7. The method of claim 5, further comprising:
determining the pain threshold of the human ear according to the hearing threshold range of the human ear;
and adjusting the output sound pressure level according to the pain threshold.
8. An audio signal processing apparatus, characterized in that the apparatus comprises:
the determining unit is used for acquiring an audio signal and determining the signal-to-noise ratio of the audio signal; determining a target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold value;
a processing unit for performing gain processing on the audio signal based on the target gain.
9. The apparatus of claim 8, wherein the determining unit determines the target gain according to the signal-to-noise ratio and a preset signal-to-noise ratio threshold as follows:
if the signal-to-noise ratio is smaller than a first signal-to-noise ratio threshold, determining that the target gain comprises a linear gain and a Wide Dynamic Range Compression (WDRC) gain;
if the signal-to-noise ratio is larger than a second signal-to-noise ratio threshold, determining that the target gain is a linear gain;
if the signal-to-noise ratio is larger than the first signal-to-noise ratio threshold and smaller than the second signal-to-noise ratio threshold, determining that the target gain is a Wide Dynamic Range Compression (WDRC) gain;
wherein the second signal-to-noise ratio threshold is greater than the first signal-to-noise ratio threshold.
10. The apparatus of claim 9, wherein the processing unit performs gain processing on the audio signal based on the target gain by:
determining an adaptive compensation coefficient according to the signal-to-noise ratio;
determining a target frequency segment to which the audio signal belongs, wherein different frequency segments correspond to different gain compensation functions, and the gain compensation functions are used for representing the relationship among the adaptive compensation coefficient, the linear gain and the wide dynamic range compression gain;
and determining, based on the target gain compensation function corresponding to the target frequency segment, the target gain used for gain processing of the audio signal from the adaptive compensation coefficient, the linear gain and the wide dynamic range compression gain.
11. The apparatus of claim 10, wherein the determining unit determines the adaptive compensation coefficient according to the signal-to-noise ratio by:
if the signal-to-noise ratio is smaller than a third signal-to-noise ratio threshold, the adaptive compensation coefficient is 0;
and if the signal-to-noise ratio is greater than the third signal-to-noise ratio threshold and smaller than the first signal-to-noise ratio threshold, determining the adaptive compensation coefficient based on an exponential (e) function of the signal-to-noise ratio.
12. The apparatus according to any of claims 9 to 11, wherein the determining unit determines the wide dynamic range compression gain as follows:
determining an input sound pressure level of the audio signal;
determining an output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve;
determining a difference between the output sound pressure level and the input sound pressure level as a wide dynamic range compression gain.
13. The apparatus of claim 12, wherein the determining unit determines the output sound pressure level based on the input sound pressure level and a wide dynamic range compression curve in the following manner:
determining parameter values of a wide dynamic range compression curve, wherein the parameter values comprise a lowest sound pressure input threshold, a lowest sound pressure output threshold, a first inflection point sound pressure threshold, a second inflection point sound pressure threshold, a first inflection point sound pressure threshold gain and a second inflection point sound pressure threshold gain, and the first inflection point sound pressure threshold is smaller than the second inflection point sound pressure threshold;
if the input sound pressure level is greater than 0 and less than the lowest sound pressure input threshold, determining that the output sound pressure level is 0;
if the input sound pressure level is greater than a lowest sound pressure input threshold and less than a lowest sound pressure output threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure input threshold, and the lowest sound pressure output threshold;
if the input sound pressure level is greater than a lowest sound pressure output threshold and less than a first inflection point sound pressure threshold, determining the output sound pressure level based on the input sound pressure level, the lowest sound pressure output threshold, the first inflection point sound pressure threshold gain and a compression ratio, wherein the compression ratio is determined based on the first inflection point sound pressure threshold, the second inflection point sound pressure threshold, the first inflection point sound pressure threshold gain and the second inflection point sound pressure threshold gain;
and if the input sound pressure level is greater than the second inflection point sound pressure threshold, determining the output sound pressure level based on the second inflection point sound pressure threshold and the second inflection point sound pressure threshold gain.
14. The apparatus of claim 12, wherein
the determining unit is further configured to determine the pain threshold of the human ear according to the hearing threshold range of the human ear; and
the apparatus further comprises: an adjusting unit configured to adjust the output sound pressure level according to the pain threshold.
15. An audio signal processing apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 7.
16. A storage medium having stored therein instructions that, when executed by a processor of a device, enable the device to perform the method of any one of claims 1 to 7.
CN202211215359.1A 2022-09-30 2022-09-30 Audio signal processing method and device and storage medium Pending CN115714948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211215359.1A CN115714948A (en) 2022-09-30 2022-09-30 Audio signal processing method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211215359.1A CN115714948A (en) 2022-09-30 2022-09-30 Audio signal processing method and device and storage medium

Publications (1)

Publication Number Publication Date
CN115714948A true CN115714948A (en) 2023-02-24

Family

ID=85231009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211215359.1A Pending CN115714948A (en) 2022-09-30 2022-09-30 Audio signal processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN115714948A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116847245A (en) * 2023-06-30 2023-10-03 杭州雄迈集成电路技术股份有限公司 Digital audio automatic gain method, system and computer storage medium
CN116847245B (en) * 2023-06-30 2024-04-09 浙江芯劢微电子股份有限公司 Digital audio automatic gain method, system and computer storage medium

Similar Documents

Publication Publication Date Title
US8855343B2 (en) Method and device to maintain audio content level reproduction
US8315400B2 (en) Method and device for acoustic management control of multiple microphones
US8081780B2 (en) Method and device for acoustic management control of multiple microphones
US9197181B2 (en) Loudness enhancement system and method
US11276384B2 (en) Ambient sound enhancement and acoustic noise cancellation based on context
US10701494B2 (en) Hearing device comprising a speech intelligibility estimator for influencing a processing algorithm
RU2568281C2 (en) Method for compensating for hearing loss in telephone system and in mobile telephone apparatus
US11605395B2 (en) Method and device for spectral expansion of an audio signal
US11153677B2 (en) Ambient sound enhancement based on hearing profile and acoustic noise cancellation
CN112037825B (en) Audio signal processing method and device and storage medium
US11551704B2 (en) Method and device for spectral expansion for an audio signal
CN114630239A (en) Method, device and storage medium for reducing earphone blocking effect
CN115714948A (en) Audio signal processing method and device and storage medium
WO2020023856A1 (en) Forced gap insertion for pervasive listening
WO2020044377A1 (en) Personal communication device as a hearing aid with real-time interactive user interface
CN113810828A (en) Audio signal processing method and device, readable storage medium and earphone
Kąkol et al. A study on signal processing methods applied to hearing aids
US11902747B1 (en) Hearing loss amplification that amplifies speech and noise subsignals differently
CN114979889A (en) Method and device for reducing occlusion effect of earphone, earphone and storage medium
Patel Acoustic Feedback Cancellation and Dynamic Range Compression for Hearing Aids and Its Real-Time Implementation
US20210337301A1 (en) Method at an electronic device involving a hearing device
CN115550791A (en) Audio processing method, device, earphone and storage medium
CN116017250A (en) Data processing method, device, storage medium, chip and hearing aid device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination