CN113079440B - Audio signal processing method and device, terminal and storage medium

Audio signal processing method and device, terminal and storage medium

Info

Publication number
CN113079440B
Authority
CN
China
Prior art keywords: audio signal, signal, amplitude, dynamic, fusion
Legal status: Active
Application number: CN202110301715.0A
Other languages: Chinese (zh)
Other versions: CN113079440A
Inventors: 许逸君, 郭华
Current Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110301715.0A
Publication of CN113079440A
Priority to PCT/CN2022/076756 (WO2022199288A1)
Application granted
Publication of CN113079440B

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00 - Circuits for transducers, loudspeakers or microphones
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11C - STATIC STORES
    • G11C 7/00 - Arrangements for writing information into, or reading information out from, a digital store
    • G11C 7/16 - Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters

Abstract

The embodiment of the application discloses an audio signal processing method and device, a terminal and a storage medium, and belongs to the technical field of audio. The method comprises the following steps: acquiring a first audio signal output by a first channel and a second audio signal output by a second channel, wherein the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to a first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on the original audio signal subjected to a second analog gain, and the first analog gain is greater than the second analog gain; and performing signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal. With the scheme provided by the embodiment of the application, signal pickup of both non-high-sound-pressure-level and high-sound-pressure-level signals is taken into account, the dynamic range of the audio signal is expanded, and high-dynamic-range recording is realized.

Description

Audio signal processing method and device, terminal and storage medium
Technical Field
The embodiment of the application relates to the technical field of audio, in particular to a method and a device for processing an audio signal, a terminal and a storage medium.
Background
Recording is a process of converting an analog audio signal collected by a microphone into a digital audio signal and storing the digital audio signal.
In the related art, after the analog audio signal collected by a microphone is amplified by a gain stage, an analog-to-digital converter (ADC) converts it into a digital audio signal, and a digital signal processor (DSP) then processes the digital audio signal before it is output and stored.
Disclosure of Invention
The embodiment of the application provides an audio signal processing method, an audio signal processing device, a terminal and a storage medium. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for processing an audio signal, where the method includes:
acquiring a first audio signal output by a first channel and a second audio signal output by a second channel, wherein the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to second analog gain, and the first analog gain is greater than the second analog gain;
and performing signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal.
In another aspect, an embodiment of the present application provides an apparatus for processing an audio signal, where the apparatus includes:
the signal acquisition module is used for acquiring a first audio signal output by a first channel and a second audio signal output by a second channel, wherein the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to second analog gain, and the first analog gain is greater than the second analog gain;
a signal fusion module, configured to perform signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, where a dynamic range of the target audio signal is a superposition of dynamic ranges of the first audio signal and the second audio signal.
In another aspect, the present application provides a computer-readable storage medium, in which at least one program code is stored, and the program code is loaded and executed by a processor to implement the method for processing an audio signal according to the above aspect.
In another aspect, the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method of processing the audio signal provided in the various alternative implementations of the above aspect.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
in the embodiment of the application, the same original audio signal is respectively input into a first channel and a second channel with different analog gains to obtain a first audio signal and a second audio signal, so that signal pickup of both non-high-sound-pressure-level and high-sound-pressure-level signals is taken into account; furthermore, when a target audio signal is output by signal fusion based on the first audio signal and the second audio signal, the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal, so that the dynamic range of the audio signal is expanded, the recording quality of both non-high-sound-pressure-level and high-sound-pressure-level signals is improved, and High Dynamic Range (HDR) recording is realized.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of an audio recording process in the related art;
Fig. 2 is a schematic diagram of an audio recording process in an embodiment of the present application;
Fig. 3 is a flowchart of a method of processing an audio signal provided by an exemplary embodiment of the present application;
Fig. 4 is a flowchart of a method of processing an audio signal provided by another exemplary embodiment of the present application;
Fig. 5 is a schematic diagram of an audio recording process shown in an exemplary embodiment of the present application;
Fig. 6 is a schematic diagram of a signal compensation process shown in an exemplary embodiment of the present application;
Fig. 7 is a flowchart of a method for processing an audio signal according to another exemplary embodiment of the present application;
Fig. 8 is a schematic diagram of an implementation of an audio signal processing process shown in an exemplary embodiment of the present application;
Fig. 9 is a block diagram illustrating an audio signal processing apparatus according to an embodiment of the present application;
Fig. 10 is a block diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the related art, the audio recording process is shown in fig. 1. The original audio signal (an analog signal) output by the microphone 101 is first input to a Programmable Gain Amplifier (PGA) 102, and the PGA 102 applies an analog gain to the original audio signal in order to reduce the equivalent input noise of the ADC 103 (the ADC itself has quantization noise and electrical noise, which raise the signal noise floor). The original audio signal subjected to the analog gain is input to the ADC 103, the ADC 103 converts it from an analog audio signal into a digital audio signal, and the digital audio signal is input to the DSP 104. The DSP 104 further processes the digital audio signal and finally outputs the processed digital audio signal for generating an audio file.
Although the analog gain can reduce the equivalent input noise of the ADC, since the analog gain increases the amplitude of the original audio signal, if the amplitude of the original audio signal after the analog gain is too large (especially when picking up a signal with a high sound pressure level), signal clipping may be caused at the ADC, resulting in audio distortion. Therefore, the audio recording by adopting the above method cannot simultaneously reduce the equivalent input noise of the ADC and pick up the audio signal (especially the high sound pressure level signal) without distortion.
In the scheme provided by the embodiment of the application, two channels are provided so that the original audio signal undergoes analog gain and analog-to-digital conversion to different degrees, taking into account signal pickup of both non-high-sound-pressure-level and high-sound-pressure-level signals; furthermore, the audio signals output by the two channels are fused, so that the dynamic range of the finally output audio signal is expanded (the dynamic ranges of the two audio signals are superposed) and high-dynamic-range recording is realized.
As shown in fig. 2, the original audio signals output from the microphone 201 are input to the first channel and the second channel, respectively. In the first channel, after the high-gain PGA 202 performs analog gain on the original audio signal, the first ADC 203 performs analog-to-digital conversion on the gained original audio signal to obtain a first audio signal; in the second channel, after the low-gain PGA 204 performs analog gain on the original audio signal, the second ADC 205 performs analog-to-digital conversion on the gained original audio signal to obtain a second audio signal. Further, the DSP 206 fuses the first audio signal and the second audio signal through an algorithm, and finally outputs a target audio signal with a high dynamic range.
The method for processing an audio signal provided in the embodiment of the present application may be applied to a computer device with an audio signal processing capability, where the computer device may be a smart phone, a tablet computer, a wearable device, a personal computer, and the like, and this embodiment does not limit this.
Moreover, the computer device may collect signals through a built-in microphone or an external microphone, which is not limited in the embodiments of the present application.
In addition, the solution provided in this embodiment of the present application may be executed by a processor in a computer device, where the processor may be a DSP or an Application Processor (AP), which is not limited in this embodiment of the present application. For convenience of description, the following embodiments are described by taking the case where the method of processing an audio signal is performed by a computer device as an example.
Referring to fig. 3, a flowchart of a method for processing an audio signal according to an exemplary embodiment of the present application is shown, where the method includes:
step 302, a first audio signal output by a first channel and a second audio signal output by a second channel are obtained, the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to a first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to a second analog gain, and the first analog gain is greater than the second analog gain.
In order to reduce the clipping probability of a high-sound-pressure-level signal while still reducing the equivalent input noise, in the embodiment of the application the original audio signal is subjected to two analog gains of different magnitudes in two channels. In the first channel, the original audio signal is subjected to the first analog gain, and in the second channel, the original audio signal is subjected to the second analog gain.
Since the first analog gain is greater than the second analog gain, the first channel has lower equivalent input noise and is suitable for picking up non-high-sound-pressure-level signals (after the high analog gain, the probability that a non-high-sound-pressure-level signal clips because of excessive amplitude is low), while the second channel is suitable for picking up high-sound-pressure-level signals (after the low analog gain, the probability that a high-sound-pressure-level signal clips because of excessive amplitude is also low).
In a possible embodiment, the computer device performs analog gain on the original audio signal through the PGA provided on each channel, and the first analog gain and the second analog gain are fixed gains, or can be adjusted according to requirements. Illustratively, the first analog gain is 30 dB and the second analog gain is 10 dB.
Further, the ADC in the first channel performs analog-to-digital conversion on the original audio signal (an analog signal) subjected to the first analog gain to obtain the first audio signal (a digital signal), and the ADC in the second channel performs analog-to-digital conversion on the original audio signal subjected to the second analog gain to obtain the second audio signal. When the original audio signal is a non-high-sound-pressure-level signal, the probability of signal clipping in either channel is small, and the equivalent input noise floor of the first channel is lower than that of the second channel; when the original audio signal is a high-sound-pressure-level signal, the probability of signal clipping in the second channel is lower than that in the first channel. Pickup quality of both non-high-sound-pressure-level and high-sound-pressure-level signals is therefore taken into account when acquiring signals with a large dynamic range.
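For illustration only, the following minimal Python sketch (not part of the patent; the ±1.0 ADC full-scale range, the 30 dB and 10 dB gains, the test tone and all function names are assumptions) models each channel as a gain stage followed by an ADC that clips at full scale, and shows why the high-gain channel clips on a high-sound-pressure-level passage while the low-gain channel does not:

```python
import numpy as np

def channel(x, gain_db, full_scale=1.0):
    """Apply an analog gain (in dB) and model ADC clipping at full scale."""
    g = 10 ** (gain_db / 20.0)               # dB -> linear amplitude factor
    return np.clip(x * g, -full_scale, full_scale)

# Illustrative original audio signal: a quiet passage followed by a loud one.
t = np.linspace(0, 1, 48000, endpoint=False)
quiet = 0.001 * np.sin(2 * np.pi * 440 * t)   # non-high sound pressure level
loud = 0.2 * np.sin(2 * np.pi * 440 * t)      # high sound pressure level
x = np.concatenate([quiet, loud])

x1 = channel(x, gain_db=30)   # first channel: high gain, clips on the loud part
x2 = channel(x, gain_db=10)   # second channel: low gain, remains unclipped here

print("clipped samples, first channel :", int(np.sum(np.abs(x1) >= 1.0)))
print("clipped samples, second channel:", int(np.sum(np.abs(x2) >= 1.0)))
```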
Step 304, performing signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal.
Because two audio signals exist and a single audio signal needs to be output in the end, the computer device needs to further fuse the two audio signals to obtain the target audio signal. During signal fusion, the computer device fuses the first audio signal and the second audio signal based on the signal pickup advantage of each of the first channel and the second channel. The first channel has the advantage in non-high-sound-pressure-level signal pickup, and the second channel has the advantage in high-sound-pressure-level signal pickup, so the first audio signal contributes more to the non-high-sound-pressure-level portions of the target audio signal, and the second audio signal contributes more to the high-sound-pressure-level portions of the target audio signal.
In a possible implementation manner, in the signal fusion process, the computer device performs sample point level fusion or sample frame level fusion on the two audio signals, which is not limited in this embodiment.
For example, when the sampling rate is 48 kHz, the computer device fuses the sample points of the first audio signal and the second audio signal taken at the same sampling instant every 1/48000 seconds; alternatively, the computer device fuses the sample points of the first audio signal and the second audio signal within the same 10 ms sampling frame (each frame containing 480 sample points).
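As a small illustration of the frame-level variant, the sketch below cuts both signals into 10 ms frames of 480 sample points at 48 kHz and hands each pair of frames to a fusion routine; fuse_frames is a placeholder introduced here for illustration, not a function defined by the embodiment:

```python
import numpy as np

SAMPLE_RATE = 48000
FRAME_LEN = SAMPLE_RATE // 100   # 10 ms frames -> 480 sample points per frame

def fuse_frames(frame1, frame2):
    """Placeholder fusion rule; a real implementation would apply the
    weighted fusion described in the later embodiments."""
    return 0.5 * frame1 + 0.5 * frame2

def frame_level_fusion(x1, x2):
    """Fuse two equal-length signals frame by frame."""
    y = np.empty_like(x1)
    for start in range(0, len(x1), FRAME_LEN):
        stop = min(start + FRAME_LEN, len(x1))
        y[start:stop] = fuse_frames(x1[start:stop], x2[start:stop])
    return y
```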
Optionally, the computer device performs signal fusion by executing a fusion algorithm through the DSP or the AP, and the specific process of signal fusion will be described in detail in the following embodiments.
Since the signal pickup advantages of the two channels are fused, the dynamic range of the target audio signal (i.e., the range within which it has good signal pickup quality) combines the dynamic ranges of the first audio signal and the second audio signal; that is, the target audio signal has a larger dynamic range than the audio signal output by either single channel. Illustratively, if the dynamic range of the first audio signal is 30 dB to 90 dB and the dynamic range of the second audio signal is 60 dB to 120 dB, the dynamic range of the target audio signal is 30 dB to 120 dB.
Optionally, the computer device may further perform noise reduction processing, dynamic range processing, amplitude limiting processing, and spectrum equalization processing on the target audio signal, and finally store the processed audio signal as a recording file, which is not described herein again.
Compared with the scheme in the related art, the scheme provided by the embodiment of the application can realize recording with a larger dynamic range. For example, when recording a drum kit performance or a concert (a high-sound-pressure-level scene), clipping distortion during recording can be reduced; when recording in a quiet environment (a non-high-sound-pressure-level scene), more sound detail can be preserved.
To sum up, in the embodiment of the present application, the same original audio signal is respectively input to the first channel and the second channel with different analog gains to obtain the first audio signal and the second audio signal, so that signal pickup of both non-high-sound-pressure-level and high-sound-pressure-level signals is taken into account; furthermore, when one target audio signal is output by signal fusion based on the first audio signal and the second audio signal, the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal, so that the dynamic range of the audio signal is expanded, the recording quality of both non-high-sound-pressure-level and high-sound-pressure-level signals is improved, and high-dynamic-range recording is realized.
Because the same original audio signal is subjected to analog gains of different magnitudes, a large difference exists between the first audio signal and the second audio signal, and if they are fused directly, abrupt signal jumps may appear in the fused target audio signal. In order to improve signal fusion quality, in one possible embodiment the computer device first performs signal compensation on the second audio signal before signal fusion, so as to reduce the signal difference between the first audio signal and the second audio signal. This is described in the following exemplary embodiments.
Referring to fig. 4, a flowchart of a method for processing an audio signal according to another exemplary embodiment of the present application is shown, where the method includes:
step 401, obtaining a first audio signal output by a first channel and a second audio signal output by a second channel, where the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to a first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to a second analog gain, and the first analog gain is greater than the second analog gain.
The step 302 may be referred to in the implementation manner of this step, and this embodiment is not described herein again.
As shown in fig. 5, the original audio signal is subjected to a first analog gain and analog-to-digital conversion to output a first audio signal, and the original audio signal is subjected to a second analog gain and analog-to-digital conversion to output a second audio signal.
Step 402, performing signal compensation on the second audio signal to obtain a third audio signal, where the signal compensation is used to compensate for a signal difference caused by the first analog gain and the second analog gain.
After the same original audio signal passes through analog gains of different magnitudes, the resulting signal amplitudes differ. The computer device therefore first needs to compensate the second audio signal based on the first audio signal, so that the compensated second audio signal is as close to the first audio signal as possible and the subsequent signal fusion is performed on two audio signals of similar amplitude.
Since the difference between the first audio signal and the second audio signal is mainly caused by the analog gain, the computer device performs gain compensation on the second audio signal (different from the analog gain of the PGA, which is a digital gain here). Further, since the first analog gain is higher than the second analog gain, it is necessary to increase the gain of the second audio signal to obtain the third audio signal when performing the gain compensation.
Since the first and second analog gains in the first and second channels are known, in one possible embodiment, the computer device determines an analog gain difference of the first and second analog gains, and performs gain compensation on the second audio signal based on the analog gain difference to obtain a third audio signal.
Optionally, the computer device determines a gain multiple based on the analog gain difference between the first analog gain and the second analog gain, so as to perform gain compensation on the second audio signal based on the gain multiple to obtain the third audio signal.
In one illustrative example, when the first audio signal is x1, the second audio signal is x2, the first analog gain is g1 dB, and the second analog gain is g2 dB, the computer device determines the gain multiple to be 10^((g1-g2)/20), and the gain-compensated signal is x2_g = x2 × 10^((g1-g2)/20).
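As a concrete illustration of this fixed gain compensation, the following sketch applies the standard dB-to-amplitude conversion; the 30 dB and 10 dB values and the function name are illustrative assumptions:

```python
import numpy as np

def fixed_gain_compensation(x2, g1_db, g2_db):
    """Scale the low-gain channel by the linear equivalent of the analog
    gain difference (g1 - g2) dB so that its amplitude approaches the
    high-gain channel."""
    gain_multiple = 10 ** ((g1_db - g2_db) / 20.0)
    return x2 * gain_multiple

# Example with the illustrative gains used earlier: 30 dB and 10 dB,
# i.e. a 20 dB difference and therefore a gain multiple of 10.
x2 = np.array([0.01, -0.02, 0.03])            # second audio signal (illustrative)
x2_g = fixed_gain_compensation(x2, 30, 10)    # fourth audio signal, x2 scaled by 10
```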
However, in practical applications, it is found that, because there may be a difference between the actual analog gain and the preset analog gain (possibly due to the PGA), there is a certain amplitude difference between the fixed-gain compensated second audio signal and the first audio signal. Therefore, to further reduce the difference between the audio signals, as shown in fig. 5, the second audio signal is first subjected to fixed gain compensation to obtain a fourth audio signal, and then subjected to adaptive amplitude compensation (fine tuning) to obtain a third audio signal.
Specifically, during adaptive amplitude compensation, as shown in fig. 6, the computer device calculates respective signal amplitudes of the first audio signal and the fourth audio signal, so as to determine a signal amplitude difference therebetween, and then performs amplitude compensation on the fourth audio signal based on the signal amplitude difference, so as to obtain a third audio signal.
In one possible embodiment, as shown in fig. 7, the step may include the steps of:
step 402A, determine a difference in analog gain between the first analog gain and the second analog gain.
Step 402B, performing gain compensation on the second audio signal based on the analog gain difference to obtain a fourth audio signal.
The implementation of steps 402A and 402B may refer to the above embodiments, which are not described herein again.
Step 402C, a signal amplitude ratio of the first audio signal and the fourth audio signal is determined.
Optionally, the computer device calculates the signal amplitude of the first audio signal and the signal amplitude of the fourth audio signal, and determines the signal amplitude difference by calculating the signal amplitude ratio between the first audio signal and the fourth audio signal.
In a possible embodiment, the signal amplitude of the audio signal is represented by an energy envelope, and correspondingly, the signal amplitude ratio of the first audio signal to the fourth audio signal may be represented by an energy envelope ratio, i.e. signal amplitude ratio = first audio signal energy envelope/fourth audio signal energy envelope.
Of course, the signal amplitude of the audio signal may also be represented by other parameters besides the energy envelope, which is not limited in this embodiment.
Step 402D, performing amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain a third audio signal.
Further, the computer device performs amplitude compensation on the fourth audio signal based on the determined signal amplitude ratio to obtain a third audio signal.
In connection with the example in the above step, the amplitude-compensated third audio signal may be represented as x2_c = x2_g × ratio.
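The adaptive amplitude compensation described above can be sketched as follows; using a frame RMS value as the energy envelope and adding a small epsilon guard are illustrative assumptions, since the embodiment only requires some measure of the signal amplitude:

```python
import numpy as np

def adaptive_amplitude_compensation(x1, x2_g, eps=1e-12):
    """Fine-tune the fixed-gain-compensated signal x2_g toward x1 using the
    ratio of their energy envelopes (approximated here by frame RMS)."""
    env1 = np.sqrt(np.mean(np.square(x1))) + eps    # envelope of first audio signal
    env4 = np.sqrt(np.mean(np.square(x2_g))) + eps  # envelope of fourth audio signal
    ratio = env1 / env4                             # signal amplitude ratio
    x2_c = x2_g * ratio                             # third audio signal
    return x2_c, ratio
```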
Step 403, performing signal fusion on the first audio signal and the third audio signal to obtain a target audio signal.
Since the signal collected by the microphone may change from moment to moment, i.e. the high sound pressure level signal and the non-high sound pressure level signal may appear alternately, the computer device needs to dynamically fuse the first audio signal and the third audio signal based on the real-time sound pressure level of the signal.
In order to improve the overall recording quality over the high dynamic range, in one possible implementation, when the signal pickup quality of the first channel is higher than that of the second channel (a non-high-sound-pressure-level signal), the computer device increases the fusion weight of the first audio signal and decreases the fusion weight of the third audio signal during signal fusion, improving the recording effect for non-high-sound-pressure-level signals; when the signal pickup quality of the second channel is higher than that of the first channel (a high-sound-pressure-level signal), the computer device increases the fusion weight of the third audio signal and decreases the fusion weight of the first audio signal during signal fusion, improving the recording quality of high-sound-pressure-level signals. Optionally, this step may include the following steps.
1. Determine the fusion weights corresponding to the first audio signal and the third audio signal respectively.
The sum of the fusion weights of the first audio signal and the third audio signal is 1. For example, if the fusion weight of the first audio signal is a, the fusion weight of the third audio signal is 1-a, where a is greater than or equal to 0 and less than or equal to 1.
Because the first analog gain of the first channel is large, clipping may occur when the original audio signal subjected to the first analog gain is analog-to-digital converted in the first channel, whereas clipping is unlikely in the second channel because the second analog gain is small. If the first audio signal were analyzed directly to determine whether clipping has occurred in the first channel (for example, by judging that the first audio signal is clipped when it is at the maximum amplitude or above a preset amplitude), the probability of misjudgment would be high. In order to improve the accuracy of identifying clipping in the first channel, in the embodiment of the application the computer device determines the clipping condition of the first channel based on the third audio signal, and further dynamically determines the fusion weight of each of the first audio signal and the third audio signal based on the clipping condition of the first channel and the real-time signal amplitude of the third audio signal.
In one possible implementation, the fusion weights corresponding to the first audio signal and the third audio signal are determined according to the following cases.
In case one, in response to the signal amplitude of the third audio signal being smaller than a first amplitude threshold while a first clipping identifier is set, it is determined that the fusion weight of the first audio signal is 1 and the fusion weight of the third audio signal is 0, where the first clipping identifier is used to indicate that the first channel is not clipped.
In this embodiment, two levels of amplitude thresholds are set in the computer device, which are a first amplitude threshold and a second amplitude threshold, respectively, where the second amplitude threshold is greater than the first amplitude threshold. Optionally, when the signal amplitude of the third audio signal is less than the first amplitude threshold, the computer device determines that clipping has not occurred in the first channel; when the signal amplitude of the third audio signal is greater than or equal to the first amplitude threshold but less than the second amplitude threshold, the computer device determines that clipping may occur in the first channel; when the signal amplitude of the third audio signal is greater than or equal to the second amplitude threshold, the computer device determines that clipping occurs in the first channel.
In some embodiments, the computer device sets the first amplitude threshold and the second amplitude threshold based on analog gain conditions of the two channels, which is not limited by this embodiment.
Optionally, the computer device is provided with a clipping flag bit (flag) for characterizing the clipping condition of the first channel: when the clipping flag bit is set to the first clipping flag (flag = 0), the first channel is not clipped, and when the clipping flag bit is set to the second clipping flag (flag = 1), the first channel is clipped.
Regarding the setting of the clipping identifier, in a possible implementation the clipping flag bit is initially set to the first clipping identifier. When the signal amplitude of the third audio signal is greater than the first amplitude threshold, the computer device determines that clipping may occur in the first channel and changes the first clipping identifier to the second clipping identifier. In order to improve the smoothness of the fused target audio signal, when the second clipping identifier is currently set and the signal amplitude of the third audio signal falls below the first amplitude threshold, the computer device does not immediately change the second clipping identifier back to the first clipping identifier; instead, it changes the second clipping identifier to the first clipping identifier only after the signal amplitude of the third audio signal has remained below the first amplitude threshold for a certain duration.
Optionally, when the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the first clipping flag is set, it indicates that the first channel has not been clipped in the last period of time, and since the signal pickup quality of the first channel is better than the signal pickup quality of the second channel (because the equivalent input noise of the first channel is smaller) in the case that the clipping has not occurred, the computer device determines that the fusion weight of the first audio signal is 1 and the fusion weight of the third audio signal is 0, that is, the third audio signal is not fused during signal fusion.
Illustratively, the signal amplitude of the third audio signal x2_c is mag = abs(x2_c); when mag < thrd1 and flag = 0, the computer device determines that the fusion weight of the first audio signal x1 is 1 and the fusion weight of the third audio signal x2_c is 0.
In case two, in response to the signal amplitude of the third audio signal being greater than a second amplitude threshold, it is determined that the fusion weight of the first audio signal is 0 and the fusion weight of the third audio signal is 1, where the second amplitude threshold is greater than the first amplitude threshold.
Because the first channel has necessarily clipped when the signal amplitude of the third audio signal is greater than the second amplitude threshold (the second amplitude threshold being greater than the first amplitude threshold), and the first audio signal is distorted once clipping has occurred, in order to avoid distortion of the finally output audio signal the computer device determines that the fusion weight of the first audio signal is 0 and the fusion weight of the third audio signal is 1, that is, the first audio signal is not used during signal fusion.
Illustratively, the signal amplitude of the third audio signal x2_c is mag = abs(x2_c); when mag ≥ thrd2, the computer device determines that the fusion weight of the first audio signal x1 is 0 and the fusion weight of the third audio signal x2_c is 1.
In case three, in response to the signal amplitude of the third audio signal being smaller than the first amplitude threshold while a second clipping identifier is set, or the signal amplitude of the third audio signal being greater than the first amplitude threshold and smaller than the second amplitude threshold, it is determined that the fusion weight of the first audio signal is a first dynamic weight and the fusion weight of the third audio signal is a second dynamic weight, where the second clipping identifier is used to indicate that clipping has occurred in the first channel, and the sum of the first dynamic weight and the second dynamic weight is 1.
When the signal amplitude of the third audio signal is smaller than the first amplitude threshold but the second clipping identifier is set, or when the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, it indicates that clipping may have occurred in the first channel within the recent period of time.
In one possible embodiment, the computer device sets the first dynamic weight to g (g is initially 1) and the second dynamic weight to 1-g. In case three, the computer device performs signal fusion based on the first dynamic weight and the second dynamic weight.
And after the signal fusion is completed under the above condition, the computer device updates the first dynamic weight and the second dynamic weight based on the magnitude relationship between the signal amplitude of the third audio signal and the first amplitude threshold and the second amplitude threshold. Wherein the computer device increases the first dynamic weight and decreases the second dynamic weight when it is determined that the clipping probability of the first channel decreases based on a magnitude relation between the signal amplitude of the third audio signal and the first amplitude threshold and the second amplitude threshold; when it is determined that the first channel clipping probability increases, the computer device decreases the first dynamic weight and increases the second dynamic weight.
In one possible implementation, in response to the signal amplitude of the third audio signal being less than the first amplitude threshold and the second clipping flag being set, the computer device updates the first dynamic weight and the second dynamic weight based on a first update step size, wherein the updated first dynamic weight is greater than the first dynamic weight before the update and the updated second dynamic weight is less than the second dynamic weight before the update.
Wherein the first update step size may be determined based on a sampling rate of the audio signal. Optionally, the first update step size and the sampling rate have a negative correlation, that is, the higher the sampling rate, the smaller the first update step size, and the lower the sampling rate, the larger the first update step size.
In an illustrative example, when the first dynamic weight is g, the second dynamic weight is 1-g, and the first update step is delta_rel (0 < delta_rel < 1), if the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, the computer device updates g to g + delta_rel after completing signal fusion based on g and 1-g, thereby increasing the fusion weight of the first audio signal and reducing the fusion weight of the third audio signal during subsequent signal fusion.
Optionally, after each update of the dynamic weights, the computer device detects whether the first dynamic weight is greater than or equal to 1. If so, the signal amplitude of the third audio signal has been below the first amplitude threshold for a period of time and the first channel has not clipped during that period, so the second clipping identifier is replaced with the first clipping identifier.
Further, since the maximum value of the first dynamic weight is 1, the computer device sets the first dynamic weight to 1 when the updated first dynamic weight is greater than or equal to 1.
In another possible embodiment, in response to the signal amplitude of the third audio signal being greater than the first amplitude threshold and less than the second amplitude threshold, the computer device updates the first dynamic weight and the second dynamic weight based on a second update step, where the updated first dynamic weight is smaller than the first dynamic weight before the update and the updated second dynamic weight is larger than the second dynamic weight before the update.
When the signal amplitude of the third audio signal is greater than the first amplitude threshold and the first clipping identifier is set, the computer device replaces the first clipping identifier with the second clipping identifier to indicate that clipping may have occurred in the first channel.
The second update step may be determined based on the sampling rate of the audio signal. Optionally, the second update step has a negative correlation with the sampling rate: the higher the sampling rate, the smaller the second update step, and the lower the sampling rate, the larger the second update step.
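One assumed way to realize this negative correlation is to derive each update step from a target release or attack time, so that the first dynamic weight ramps over roughly that time regardless of the sampling rate; the 200 ms and 20 ms time constants below are made-up examples, not values given by the embodiment:

```python
def update_steps(sample_rate, release_time_s=0.2, attack_time_s=0.02):
    """Assumed mapping from sampling rate to per-sample update steps: the
    first dynamic weight then ramps over roughly the given times, and a
    higher sampling rate yields a smaller step (negative correlation)."""
    delta_rel = 1.0 / (release_time_s * sample_rate)  # first update step
    delta_att = 1.0 / (attack_time_s * sample_rate)   # second update step
    return delta_rel, delta_att

delta_rel, delta_att = update_steps(48000)  # ~1.0e-4 and ~1.0e-3 at 48 kHz
```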
In an illustrative example, when the first dynamic weight is g, the second dynamic weight is 1-g, and the second update step is delta_att (0 < delta_att < 1), if the signal amplitude of the third audio signal is greater than or equal to the first amplitude threshold and less than the second amplitude threshold, the computer device updates g to g - delta_att after completing signal fusion based on g and 1-g, so as to reduce the fusion weight of the first audio signal and increase the fusion weight of the third audio signal during subsequent signal fusion.
Further, since the minimum value of the first dynamic weight is 0, the computer device sets the first dynamic weight to 0 when the updated first dynamic weight is less than or equal to 0.
2. Perform signal fusion on the first audio signal and the third audio signal based on the fusion weights to obtain the target audio signal.
With reference to the three cases in the above step, when the first audio signal is x1, the third audio signal is x2_c, the first dynamic weight is g, and the second dynamic weight is 1-g: in case one, since the fusion weight of the first audio signal is 1 and the fusion weight of the third audio signal is 0, the target audio signal is the first audio signal, that is, y = x1.
In case two, since the fusion weight of the first audio signal is 0 and the fusion weight of the third audio signal is 1, the target audio signal is the third audio signal, that is, y = x2_c.
In case three, when the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, the computer device obtains the fused target audio signal y = g × x1 + (1-g) × x2_c, and then updates g to g + delta_rel.
When the signal amplitude of the third audio signal is greater than the first amplitude threshold and less than the second amplitude threshold, the computer device obtains the fused target audio signal y = g × x1 + (1-g) × x2_c, updates g to g - delta_att, and sets flag = 1.
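Putting the three cases and the weight and flag updates together, a per-sample fusion loop might look like the sketch below. The variable names follow the text (x1, x2_c, thrd1, thrd2, g, flag, delta_rel, delta_att); the threshold and step values are illustrative assumptions, and setting the clipping flag when mag ≥ thrd2 is an assumption the text implies but does not state explicitly:

```python
import numpy as np

def fuse(x1, x2_c, thrd1=0.25, thrd2=0.5, delta_rel=1e-4, delta_att=1e-3):
    """Dynamically fuse the first audio signal x1 and the compensated third
    audio signal x2_c into the target audio signal y, sample by sample."""
    g = 1.0          # first dynamic weight (second dynamic weight is 1 - g)
    flag = 0         # 0 = first clipping identifier, 1 = second clipping identifier
    y = np.empty_like(x1)
    for n in range(len(x1)):
        mag = abs(x2_c[n])
        if mag < thrd1 and flag == 0:       # case one: no recent clipping
            y[n] = x1[n]
        elif mag >= thrd2:                  # case two: first channel has clipped
            y[n] = x2_c[n]
            flag = 1                        # assumption: mark clipping here as well
        else:                               # case three: transition region
            y[n] = g * x1[n] + (1.0 - g) * x2_c[n]
            if mag < thrd1:                 # still flagged: release toward channel 1
                g = min(g + delta_rel, 1.0)
                if g >= 1.0:
                    flag = 0                # back to the first clipping identifier
            else:                           # thrd1 <= mag < thrd2: attack
                g = max(g - delta_att, 0.0)
                flag = 1                    # set the second clipping identifier
    return y
```

Updating g only after the current sample has been fused matches the order described above, which keeps the transition between the two channels gradual.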
In this embodiment, the computer device performs fixed gain compensation and adaptive amplitude compensation on the second audio signal, so that the third audio signal obtained after compensation is as close to the first audio signal as possible, which is beneficial to improving subsequent signal fusion quality.
In addition, in this embodiment, the computer device determines the clipping condition of the first channel based on the signal amplitude of the third audio signal, and then dynamically determines the fusion weights of the first audio signal and the third audio signal used in signal fusion based on the clipping condition of the first channel and the signal amplitude, so as to avoid abrupt jumps in the target audio signal and improve the fusion quality and smoothness of the fused target audio signal.
In connection with the above embodiments, in one illustrative example, the process of audio signal processing by a computer device when an original audio signal is gradually increased from a non-high sound pressure level to a high sound pressure level and then gradually decreased to a non-high sound pressure level is shown in fig. 8.
Step 801, obtain a first audio signal output by a first channel and a second audio signal output by a second channel.
Step 802, performing fixed gain compensation and adaptive amplitude compensation on the second audio signal to obtain a third audio signal.
Steps 801 to 802 are signal compensation processes.
Step 803, when the signal amplitude of the third audio signal is smaller than the first amplitude threshold (the first clipping flag is set initially), outputting the first audio signal.
When the non-high sound pressure level signal is acquired, the audio signal of the first channel is directly output.
Step 804, when the signal amplitude of the third audio signal is greater than or equal to the first amplitude threshold and smaller than the second amplitude threshold, performing signal fusion on the first audio signal and the third audio signal based on the first dynamic weight and the second dynamic weight and outputting the result.
Step 805, reducing the first dynamic weight (and increasing the second dynamic weight) based on the second update step, and setting the second clipping flag.
In steps 804 to 805, when the non-high-sound-pressure-level signal is gradually changed into the high-sound-pressure-level signal, the fusion weight of the second channel is dynamically increased, the fusion weight of the first channel is decreased, and signal fusion is performed.
In step 806, the signal amplitude of the third audio signal is greater than or equal to the second amplitude threshold, and the third audio signal is output.
When the high sound pressure level signal is collected, the audio signal of the second channel is directly output.
Step 807, when the signal amplitude of the third audio signal is smaller than the first amplitude threshold (the second clipping flag is set at this point), performing signal fusion on the first audio signal and the third audio signal based on the first dynamic weight and the second dynamic weight and outputting the result.
Step 808, increasing the first dynamic weight (decreasing the second dynamic weight) based on the first update step.
Step 809, when the first dynamic weight is greater than or equal to 1, setting the first clipping flag.
In steps 807 to 809, as the high-sound-pressure-level signal gradually changes back into a non-high-sound-pressure-level signal, the fusion weight of the first channel is dynamically increased, the fusion weight of the second channel is reduced, and signal fusion is performed.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 9, a block diagram of an apparatus for processing an audio signal according to an embodiment of the present application is shown. The apparatus has the function of performing the steps performed by the computer device in the above method embodiments, and the function may be implemented by hardware or by hardware executing corresponding software. As shown in fig. 9, the apparatus may include:
a signal obtaining module 901, configured to obtain a first audio signal output by a first channel and a second audio signal output by a second channel, where the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal with a first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal with a second analog gain, and the first analog gain is greater than the second analog gain;
a signal fusion module 902, configured to perform signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, where a dynamic range of the target audio signal is a superposition of dynamic ranges of the first audio signal and the second audio signal.
Optionally, the signal fusion module 902 includes:
the compensation unit is used for performing signal compensation on the second audio signal to obtain a third audio signal, wherein the signal compensation is used for compensating a signal difference caused by the first analog gain and the second analog gain;
and the fusion unit is used for carrying out signal fusion on the first audio signal and the third audio signal to obtain the target audio signal.
Optionally, the fusion unit is specifically configured to:
determining fusion weights corresponding to the first audio signal and the third audio signal respectively;
and performing signal fusion on the first audio signal and the third audio signal based on the fusion weights to obtain the target audio signal.
Optionally, when determining the fusion weights corresponding to the first audio signal and the third audio signal respectively, the fusion unit is specifically configured to:
in response to that the signal amplitude of the third audio signal is smaller than a first amplitude threshold value and a first clipping identifier is set, determining that the fusion weight of the first audio signal is 1 and the fusion weight of the third audio signal is 0, wherein the first clipping identifier is used for representing that clipping does not occur in the first channel;
in response to the signal amplitude of the third audio signal being greater than a second amplitude threshold, determining that the fusion weight of the first audio signal is 0, the fusion weight of the third audio signal is 1, and the second amplitude threshold is greater than the first amplitude threshold;
in response to the signal amplitude of the third audio signal being smaller than the first amplitude threshold while a second clipping flag is set, or the signal amplitude of the third audio signal being greater than the first amplitude threshold and smaller than the second amplitude threshold, determine that the fusion weight of the first audio signal is a first dynamic weight and the fusion weight of the third audio signal is a second dynamic weight, where the second clipping flag is used to indicate that clipping has occurred in the first channel, and the sum of the first dynamic weight and the second dynamic weight is 1.
Optionally, the apparatus further comprises:
a first weight update module, configured to update the first dynamic weight and the second dynamic weight based on a first update step length in response to that the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, where the updated first dynamic weight is larger than the first dynamic weight before update and the updated second dynamic weight is smaller than the second dynamic weight before update;
a second weight update module, configured to update the first dynamic weight and the second dynamic weight based on a second update step size in response to the signal amplitude of the third audio signal being greater than the first amplitude threshold and less than the second amplitude threshold, where the updated first dynamic weight is smaller than the first dynamic weight before the update and the updated second dynamic weight is larger than the second dynamic weight before the update.
Optionally, the apparatus further comprises:
a first identifier replacing module, configured to replace the second clipping identifier with the first clipping identifier in response to the updated first dynamic weight being greater than or equal to 1.
Optionally, the apparatus further comprises:
and the second identification replacing module is used for replacing the first clipping identification with the second clipping identification in response to the signal amplitude of the third audio signal being larger than the first amplitude threshold and the first clipping identification being set.
Optionally, the compensation unit is specifically configured to:
determining an analog gain difference between the first analog gain and the second analog gain;
and performing gain compensation on the second audio signal based on the analog gain difference value to obtain the third audio signal.
Optionally, the compensation unit is specifically configured to:
determining an analog gain difference between the first analog gain and the second analog gain;
performing gain compensation on the second audio signal based on the analog gain difference value to obtain a fourth audio signal;
determining a signal amplitude ratio of the first audio signal and the fourth audio signal;
and performing amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain the third audio signal.
To sum up, in the embodiment of the present application, the same original audio signal is respectively input to the first channel and the second channel with different analog gains to obtain the first audio signal and the second audio signal, so that signal pickup of both non-high-sound-pressure-level and high-sound-pressure-level signals is taken into account; furthermore, signal fusion is performed based on the first audio signal and the second audio signal to output one target audio signal whose dynamic range is the superposition of the dynamic ranges of the first audio signal and the second audio signal, so that the dynamic range of the audio signal is expanded, the recording quality of both non-high-sound-pressure-level and high-sound-pressure-level signals is improved, and high-dynamic-range recording is realized.
In this embodiment, the computer device performs fixed gain compensation and adaptive amplitude compensation on the second audio signal, so that the third audio signal obtained after compensation is as close to the first audio signal as possible, which is beneficial to improving subsequent signal fusion quality.
In addition, in this embodiment, the computer device determines the clipping condition of the first channel based on the signal amplitude of the third audio signal, and then dynamically determines the fusion weights of the first audio signal and the third audio signal used in signal fusion based on the clipping condition of the first channel and the signal amplitude, so as to avoid abrupt jumps in the target audio signal and improve the fusion quality and smoothness of the fused target audio signal.
Referring to fig. 10, a block diagram of a computer device according to an exemplary embodiment of the present application is shown. The computer device 1000 may be a smartphone, a tablet computer, a wearable device, or the like. The computer device 1000 in the present application may include one or more of the following components: a processor 1010 and a memory 1020.
Processor 1010 may include one or more processing cores. The processor 1010 connects various parts within the computer device 1000 using various interfaces and lines, and performs various functions of the computer device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1020 and calling data stored in the memory 1020. Alternatively, the processor 1010 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1010 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-network Processing Unit (NPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed by the display screen; the NPU is used for realizing Artificial Intelligence (AI) functions; and the modem is used to handle wireless communications. It is understood that the modem may also not be integrated into the processor 1010 and may be implemented by a separate chip.
The memory 1020 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1020 includes a non-transitory computer-readable medium. The memory 1020 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1020 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like; and the data storage area may store data (such as audio data or a phonebook) created according to the use of the computer device 1000, and the like.
Optionally, the computer device 1000 is provided with a microphone 1030, and the microphone 1030 may be a built-in microphone of the computer device 1000, or an external microphone connected to the computer device 1000 through a microphone interface.
In this embodiment, the computer device 1000 is further provided with a first channel and a second channel (audio circuits), and the microphone 1030 is connected to both channels. The first channel is provided with a high-gain programmable gain amplifier (PGA) and a first analog-to-digital converter (ADC), and the second channel is provided with a low-gain PGA and a second ADC. Each channel applies its analog gain and analog-to-digital conversion to the original audio signal output by the microphone 1030, and the two converted audio signals are input to the processor 1010, which performs the audio signal processing.
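For readers who want to see how these pieces fit together, the following end-to-end sketch simulates the two channels of fig. 10 in software and reuses the helper functions sketched above; the gain values, the frame length, and the modelling of each ADC as simple clipping to full scale are all assumptions made for illustration, not parameters taken from the patent.

    import numpy as np

    def process_recording(mic_signal, high_gain_db=30.0, low_gain_db=6.0,
                          frame_len=256, full_scale=1.0):
        """Simulate both channels, then fuse frame by frame (illustrative only)."""
        # Each channel: analog gain followed by an ADC, modelled here as clipping.
        first = np.clip(mic_signal * 10 ** (high_gain_db / 20), -full_scale, full_scale)
        second = np.clip(mic_signal * 10 ** (low_gain_db / 20), -full_scale, full_scale)

        w1, clipped, out = 1.0, False, []
        for i in range(0, len(mic_signal), frame_len):
            x1, x2 = first[i:i + frame_len], second[i:i + frame_len]
            x3 = compensate_second_channel(x1, x2, high_gain_db - low_gain_db)
            amp3 = np.abs(x3).max()
            w1, w2, clipped = select_fusion_weights(amp3, clipped, w1)
            out.append(w1 * x1 + w2 * x3)   # fused target audio signal frame
        return np.concatenate(out)

Frames in which the first channel is clipped are dominated by the compensated low-gain channel, while quiet frames are dominated by the high-gain channel, which is what lets the target audio signal cover the superposed dynamic range of both channels.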
In addition, those skilled in the art will appreciate that the configuration of the computer device 1000 illustrated in the above figure does not constitute a limitation on the computer device, which may include more or fewer components than those illustrated, may combine some components, or may adopt a different arrangement of components. For example, the computer device 1000 further includes components such as a display screen, a sensor, a speaker, and a power supply, which are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium storing at least one program code, where the program code is loaded and executed by a processor to implement the audio signal processing method described in the above embodiments.
According to an aspect of the application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method for processing the audio signal provided in the various alternative implementations of the above aspect.
It should be understood that reference herein to "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. In addition, the step numbers described herein only exemplarily show one possible execution order of the steps; in some other embodiments, the steps may also be executed out of the numbered order, for example, two steps with different numbers may be executed simultaneously, or two steps with different numbers may be executed in an order reverse to that shown in the figures, which is not limited in the embodiments of the present application.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. A method of processing an audio signal, the method comprising:
acquiring a first audio signal output by a first channel and a second audio signal output by a second channel, wherein the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to second analog gain, and the first analog gain is greater than the second analog gain;
determining an analog gain difference of the first analog gain and the second analog gain;
performing gain compensation on the second audio signal based on the analog gain difference value to obtain a fourth audio signal;
determining a signal amplitude ratio of the first audio signal and the fourth audio signal;
performing amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain a third audio signal;
in response to the signal amplitude of the third audio signal being smaller than a first amplitude threshold and a first clipping flag being set, determining that the fusion weight of the first audio signal is 1 and the fusion weight of the third audio signal is 0, wherein the first clipping flag is used to indicate that clipping does not occur in the first channel;
in response to the signal amplitude of the third audio signal being greater than a second amplitude threshold, determining that the fusion weight of the first audio signal is 0 and the fusion weight of the third audio signal is 1, wherein the second amplitude threshold is greater than the first amplitude threshold;
in response to the signal amplitude of the third audio signal being smaller than the first amplitude threshold and a second clipping flag being set, or the signal amplitude of the third audio signal being larger than the first amplitude threshold and smaller than the second amplitude threshold, determining that the fusion weight of the first audio signal is a first dynamic weight and the fusion weight of the third audio signal is a second dynamic weight, wherein the second clipping flag is used to indicate that clipping occurs in the first channel, and the sum of the first dynamic weight and the second dynamic weight is 1;
and performing signal fusion on the first audio signal and the third audio signal based on the fusion proportion to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal.
2. The method of claim 1, wherein after the determining that the fusion weight of the first audio signal is a first dynamic weight and the fusion weight of the third audio signal is a second dynamic weight, the method further comprises:
in response to the signal amplitude of the third audio signal being less than the first amplitude threshold and the second clipping flag being set, updating the first dynamic weight and the second dynamic weight based on a first update step size, wherein the first dynamic weight after updating is greater than the first dynamic weight before updating, and the second dynamic weight after updating is less than the second dynamic weight before updating;
updating the first dynamic weight and the second dynamic weight based on a second update step size in response to the signal amplitude of the third audio signal being greater than the first amplitude threshold and less than the second amplitude threshold, wherein the updated first dynamic weight is less than the first dynamic weight before updating, and the updated second dynamic weight is greater than the second dynamic weight before updating.
3. The method according to claim 2, wherein after the updating of the first dynamic weight and the second dynamic weight based on the first update step size, the method further comprises:
replacing the second clipping flag with the first clipping flag in response to the updated first dynamic weight being greater than or equal to 1.
4. The method of claim 1, further comprising:
in response to the signal amplitude of the third audio signal being greater than the first amplitude threshold and the first clipping flag being set, replacing the first clipping flag with the second clipping flag.
5. An apparatus for processing an audio signal, the apparatus comprising:
the signal acquisition module is used for acquiring a first audio signal output by a first channel and a second audio signal output by a second channel, wherein the first audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to first analog gain, the second audio signal is obtained by performing analog-to-digital conversion on an original audio signal subjected to second analog gain, and the first analog gain is greater than the second analog gain;
a signal fusion module for determining a difference in analog gain between the first analog gain and the second analog gain;
the signal fusion module is further configured to perform gain compensation on the second audio signal based on the analog gain difference value to obtain a fourth audio signal;
the signal fusion module is further configured to determine a signal amplitude ratio of the first audio signal to the fourth audio signal;
the signal fusion module is further configured to perform amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain a third audio signal;
the signal fusion module is further configured to determine that the fusion weight of the first audio signal is 1 and the fusion weight of the third audio signal is 0 in response to that the signal amplitude of the third audio signal is smaller than a first amplitude threshold and a first clipping flag is set, where the first clipping flag is used to indicate that the first channel is not clipped;
the signal fusion module is further configured to determine, in response to the signal amplitude of the third audio signal being greater than a second amplitude threshold, that the fusion weight of the first audio signal is 0 and the fusion weight of the third audio signal is 1, wherein the second amplitude threshold is greater than the first amplitude threshold;
the signal fusion module is further configured to determine that the fusion weight of the first audio signal is a first dynamic weight and the fusion weight of the third audio signal is a second dynamic weight in response to the signal amplitude of the third audio signal being smaller than the first amplitude threshold and a second clipping flag being set, or the signal amplitude of the third audio signal being larger than the first amplitude threshold and smaller than the second amplitude threshold, wherein the second clipping flag is used to indicate that clipping occurs in the first channel, and the sum of the first dynamic weight and the second dynamic weight is 1;
the signal fusion module is further configured to perform signal fusion on the first audio signal and the third audio signal based on the fusion proportion to obtain a target audio signal, where a dynamic range of the target audio signal is a superposition of dynamic ranges of the first audio signal and the second audio signal.
6. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the method of processing an audio signal as claimed in any one of claims 1 to 4.
7. A computer-readable storage medium, in which at least one program code is stored, the program code being loaded and executed by a processor to implement the method of processing an audio signal according to any one of claims 1 to 4.
CN202110301715.0A 2021-03-22 2021-03-22 Audio signal processing method and device, terminal and storage medium Active CN113079440B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110301715.0A CN113079440B (en) 2021-03-22 2021-03-22 Audio signal processing method and device, terminal and storage medium
PCT/CN2022/076756 WO2022199288A1 (en) 2021-03-22 2022-02-18 Audio signal processing method and apparatus, terminal, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110301715.0A CN113079440B (en) 2021-03-22 2021-03-22 Audio signal processing method and device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN113079440A CN113079440A (en) 2021-07-06
CN113079440B true CN113079440B (en) 2022-12-06

Family

ID=76613145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110301715.0A Active CN113079440B (en) 2021-03-22 2021-03-22 Audio signal processing method and device, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN113079440B (en)
WO (1) WO2022199288A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113079440B (en) * 2021-03-22 2022-12-06 Oppo广东移动通信有限公司 Audio signal processing method and device, terminal and storage medium
CN113709626A (en) * 2021-08-04 2021-11-26 Oppo广东移动通信有限公司 Audio recording method, device, storage medium and computer equipment
CN114584902B (en) * 2022-03-17 2023-05-16 睿云联(厦门)网络通讯技术有限公司 Method and device for eliminating nonlinear echo of intercom equipment based on volume control

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110113019A (en) * 2019-04-29 2019-08-09 深圳锐越微技术有限公司 Two-stage audio gain circuit and voice frequency terminal based on analog-to-digital conversion

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8731210B2 (en) * 2009-09-21 2014-05-20 Mediatek Inc. Audio processing methods and apparatuses utilizing the same
US10284217B1 (en) * 2014-03-05 2019-05-07 Cirrus Logic, Inc. Multi-path analog front end and analog-to-digital converter for a signal processing system
CN109115245B (en) * 2014-03-28 2021-10-01 意法半导体股份有限公司 Multi-channel transducer apparatus and method of operating the same
US10727798B2 (en) * 2018-08-17 2020-07-28 Invensense, Inc. Method for improving die area and power efficiency in high dynamic range digital microphones
CN111699700A (en) * 2019-04-17 2020-09-22 深圳市大疆创新科技有限公司 Audio signal processing method, apparatus and storage medium
WO2020211004A1 (en) * 2019-04-17 2020-10-22 深圳市大疆创新科技有限公司 Audio signal processing method and device, and storage medium
CN112199070B (en) * 2020-10-14 2022-11-25 展讯通信(上海)有限公司 Audio processing method and device, storage medium and computer equipment
CN113079440B (en) * 2021-03-22 2022-12-06 Oppo广东移动通信有限公司 Audio signal processing method and device, terminal and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110113019A (en) * 2019-04-29 2019-08-09 深圳锐越微技术有限公司 Two-stage audio gain circuit and voice frequency terminal based on analog-to-digital conversion

Also Published As

Publication number Publication date
CN113079440A (en) 2021-07-06
WO2022199288A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
CN113079440B (en) Audio signal processing method and device, terminal and storage medium
JP6123503B2 (en) Audio correction apparatus, audio correction program, and audio correction method
CN101166018B (en) Audio reproducing apparatus and method
JP4922428B2 (en) Image processing device
CN104066036A (en) Pick-up device and method
CN101669284A (en) The automatic volume of mobile audio devices and dynamic range adjustment
CN111885458B (en) Audio playing method, earphone and computer readable storage medium
CN111508510B (en) Audio processing method and device, storage medium and electronic equipment
CN110265056B (en) Sound source control method, loudspeaker device and apparatus
CN113099352B (en) Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN114143660A (en) Wireless earphone control method and device and electronic equipment
KR101253708B1 (en) Hearing aid for screening envirronmental noise and method for screening envirronmental noise of hearing aid
CN113421583B (en) Noise reduction method, storage medium, chip and electronic device
WO2023070792A1 (en) Volume balancing method and device for talk-on doorbell, and readable storage medium
JP2009134102A (en) Object sound extraction apparatus, object sound extraction program and object sound extraction method
CN111669682A (en) Method for optimizing sound quality of loudspeaker equipment
US20120016505A1 (en) Electronic audio device
CN109963235B (en) Sound signal processing method and mobile terminal
CN113132880A (en) Impact noise suppression method and system based on dual-microphone architecture
CN114449394A (en) Hearing assistance device and method for adjusting output sound of hearing assistance device
JP2001324989A (en) Device for shaping signals, especially audio signals
CN113421580B (en) Noise reduction method, storage medium, chip and electronic device
WO2021120247A1 (en) Hearing compensation method and device, and computer readable storage medium
US11184715B1 (en) Hearing devices and methods for implementing an adaptively adjusted cut-off frequency
CN112866889B (en) Self-adaptive multi-channel loudness compensation method for hearing aid and hearing aid chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant