WO2022199288A1

WO2022199288A1 - Audio signal processing method and apparatus, terminal, and storage medium

Info

Publication number: WO2022199288A1
Application number: PCT/CN2022/076756
Authority: WO
Inventors: 许逸君; 郭华
Original assignee: Oppo广东移动通信有限公司
Priority date: 2021-03-22
Filing date: 2022-02-18
Publication date: 2022-09-29
Also published as: CN113079440A; CN113079440B

Abstract

An audio signal processing method and apparatus, a terminal, and a storage medium, relating to the technical field of audio. The method comprises: obtaining a first audio signal outputted by a first channel and a second audio signal outputted by a second channel, the first audio signal being a signal obtained by performing analog-to-digital conversion on an original audio signal subjected to a first analog gain, the second audio signal being a signal obtained by performing analog-to-digital conversion on the original audio signal subjected to a second analog gain, and the first analog gain being greater than the second analog gain (302); and performing signal fusion on the basis of the first audio signal and the second audio signal to obtain a target audio signal, a dynamic range of the target audio signal being a superposition of dynamic ranges of the first audio signal and the second audio signal (304). With the solution provided by embodiments of the present application, signal pickup of both non-high sound pressure level signals and high sound pressure level signals is taken into account, the dynamic range of the audio signal is expanded, and high dynamic range recording is realized.

Description

Audio signal processing method, device, terminal and storage medium

This application claims priority to Chinese Patent Application No. 202110301715.0, filed on March 22, 2021 and titled "Audio Signal Processing Method, Device, Terminal and Storage Medium", the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of the present application relate to the field of audio technology, and in particular, to a method, device, terminal, and storage medium for processing audio signals.

Background

Recording is the process of converting an analog audio signal captured by a microphone into a digital audio signal and storing it.

In the related art, the analog audio signal collected by the microphone is amplified by a gain stage and then input to an analog-to-digital converter (Analog to Digital Converter, ADC), which converts the analog audio signal into a digital audio signal. The digital audio signal is then input to a digital signal processor (Digital Signal Processor, DSP) for digital audio signal processing, and the processed signal is output and saved.

Summary of the invention

Embodiments of the present application provide an audio signal processing method, device, terminal, and storage medium. The technical solution is as follows:

On the one hand, an embodiment of the present application provides a method for processing an audio signal, the method comprising:

Obtaining a first audio signal output by a first channel and a second audio signal output by a second channel, where the first audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone a first analog gain, the second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone a second analog gain, and the first analog gain is greater than the second analog gain;

Performing signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal.

On the other hand, an embodiment of the present application provides an apparatus for processing an audio signal, and the apparatus includes:

A signal acquisition module, configured to acquire a first audio signal output by a first channel and a second audio signal output by a second channel, where the first audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone a first analog gain, the second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone a second analog gain, and the first analog gain is greater than the second analog gain;

A signal fusion module, configured to perform signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal.

On the other hand, an embodiment of the present application provides a computer-readable storage medium, where at least one piece of program code is stored in the computer-readable storage medium, and the program code is loaded and executed by a processor to implement the audio signal processing method described in the above aspect.

On the other hand, an embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the audio signal processing methods provided in various optional implementations of the above aspects.

Description of drawings

FIG. 1 is a schematic diagram of an audio recording process in the related art;

FIG. 2 is a schematic diagram of an audio recording process in an embodiment of the present application;

FIG. 3 is a flowchart of an audio signal processing method provided by an exemplary embodiment of the present application;

FIG. 4 is a flowchart of an audio signal processing method provided by another exemplary embodiment of the present application;

FIG. 5 is a schematic diagram of the principle of an audio recording process shown in an exemplary embodiment of the present application;

FIG. 6 is a schematic diagram of a signal compensation process shown in an exemplary embodiment of the present application;

FIG. 7 is a flowchart of an audio signal processing method provided by another exemplary embodiment of the present application;

FIG. 8 is a schematic diagram of the implementation of an audio signal processing process shown in an exemplary embodiment of the present application;

FIG. 9 shows a structural block diagram of an apparatus for processing an audio signal provided by an embodiment of the present application;

FIG. 10 shows a structural block diagram of a computer device provided by an exemplary embodiment of the present application.

Detailed description

In order to make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

In the related art, the audio recording process is shown in FIG. 1. The original audio signal (an analog signal) output by the microphone 101 is first input to the programmable gain amplifier (Programmable Gain Amplifier, PGA) 102, and the PGA 102 applies an analog gain to the original audio signal, thereby reducing the equivalent input noise of the ADC 103 (because the ADC itself has quantization noise and electrical noise, it raises the noise floor of the signal). The analog-gained original audio signal is input to the ADC 103, which converts it from an analog audio signal to a digital audio signal, and the digital audio signal is input to the DSP 104. The DSP 104 further processes the digital audio signal and finally outputs the processed digital audio signal for generating an audio file.

Although analog gain can reduce the equivalent input noise of the ADC, it also increases the amplitude of the original audio signal. If the amplitude of the original audio signal after the analog gain is too large (especially when picking up high sound pressure level signals), signal clipping occurs at the ADC, resulting in audio distortion. It can be seen that the above audio recording method cannot simultaneously reduce the equivalent input noise of the ADC and pick up the audio signal (especially a high sound pressure level signal) without distortion.

In contrast, in the solution provided by the embodiments of the present application, two channels are set up to apply different degrees of analog gain, followed by analog-to-digital conversion, to the original audio signal, so that signal pickup of both non-high sound pressure level signals and high sound pressure level signals is taken into account. Further, by fusing the audio signals output by the two channels, the dynamic range of the final output audio signal is expanded (to the superposition of the dynamic ranges of the two audio signals), and high dynamic range recording is realized.

As shown in FIG. 2, the original audio signal output by the microphone 201 is input to the first channel and the second channel respectively. In the first channel, after the high-gain PGA 202 applies an analog gain to the original audio signal, the first ADC 203 performs analog-to-digital conversion on the gained original audio signal to obtain the first audio signal; in the second channel, after the low-gain PGA 204 applies an analog gain to the original audio signal, the second ADC 205 performs analog-to-digital conversion on the gained original audio signal to obtain the second audio signal. Further, the DSP 206 fuses the first audio signal and the second audio signal through an algorithm and finally outputs a high dynamic range target audio signal.

The audio signal processing method provided in the embodiments of the present application can be applied to a computer device with audio signal processing capability. The computer device may be a smart phone, a tablet computer, a wearable device, a personal computer, a vehicle-mounted terminal, etc., which is not limited in this embodiment.

In addition, the computer device may collect signals through a built-in microphone, or may collect signals through an external microphone, which is not limited in this embodiment of the present application.

In addition, the solutions provided by the embodiments of the present application may be executed by a processor in the computer device, and the processor may be a DSP or an application processor (Application Processor, AP), which is not limited in the embodiments of the present application. For convenience of description, the following embodiments are described by taking a computer device performing the audio signal processing method as an example.

Please refer to FIG. 3, which shows a flowchart of an audio signal processing method provided by an exemplary embodiment of the present application, and the method includes:

Step 302: Obtain the first audio signal output by the first channel and the second audio signal output by the second channel, where the first audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal with the first analog gain, The second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal with the second analog gain, and the first analog gain is greater than the second analog gain.

In order to reduce the clipping probability of the high sound pressure level signal on the premise of reducing the equivalent input noise, in the embodiment of the present application, the original audio signal passes through two channels of analog gains of different degrees. Wherein, under the first channel, the original audio signal passes through the first analog gain, and under the second channel, the original audio signal passes through the second analog gain.

Since the first analog gain is greater than the second analog gain, the first channel has lower equivalent input noise and is suitable for picking up non-high sound pressure level signals (after the high analog gain, a non-high sound pressure level signal has a low probability of clipping due to excessive amplitude), while the second channel is suitable for picking up high sound pressure level signals (after the low analog gain, a high sound pressure level signal has a low probability of clipping due to excessive amplitude).

In a possible implementation manner, the computer device applies the analog gain to the original audio signal through the PGA set on each channel, and the first analog gain and the second analog gain are either fixed gains or can be adjusted as required. Illustratively, the first analog gain is 30 dB, and the second analog gain is 10 dB.

Further, the ADC in the first channel performs analog-to-digital conversion on the original audio signal (an analog signal) that has undergone the first analog gain to obtain the first audio signal (a digital signal); the ADC in the second channel performs analog-to-digital conversion on the original audio signal that has undergone the second analog gain to obtain the second audio signal. When the original audio signal is a non-high sound pressure level signal, the probability of signal clipping is small in both channels, and the equivalent input noise floor of the first channel is lower than that of the second channel; when the original audio signal is a high sound pressure level signal, the probability of signal clipping in the second channel is lower than that in the first channel. In this way, in a large dynamic range signal acquisition scenario, the pickup quality of both non-high sound pressure level signals and high sound pressure level signals is taken into account.
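The trade-off between the two channels can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the ADC is modelled as an ideal hard clipper followed by uniform 16-bit quantization with a full scale of 1.0, and the function names, test tones, and amplitudes are hypothetical. Only the 30 dB and 10 dB gains come from the illustrative values above.

```python
import numpy as np

def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear multiplier (20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)

def capture_channel(analog, gain_db, full_scale=1.0, bits=16):
    """Apply an analog gain, then model an idealized ADC (assumption):
    hard clipping at +/- full_scale followed by uniform quantization."""
    amplified = np.asarray(analog, dtype=float) * db_to_linear(gain_db)
    clipped = np.clip(amplified, -full_scale, full_scale)
    step = full_scale / (2 ** (bits - 1))  # quantization step size
    return np.round(clipped / step) * step

# Hypothetical test tones: 10 ms of a 1 kHz sine sampled at 48 kHz.
t = np.arange(480) / 48000.0
quiet = 0.001 * np.sin(2 * np.pi * 1000.0 * t)  # stands in for a non-high SPL signal
loud = 0.2 * np.sin(2 * np.pi * 1000.0 * t)     # stands in for a high SPL signal

x1_loud = capture_channel(loud, 30.0)  # first channel: high gain, clips on the loud tone
x2_loud = capture_channel(loud, 10.0)  # second channel: low gain, stays below full scale

# Fraction of samples in each channel that sit at full scale (i.e. are clipped).
clipped_fraction_ch1 = float(np.mean(np.abs(x1_loud) >= 1.0))
clipped_fraction_ch2 = float(np.mean(np.abs(x2_loud) >= 1.0))
```

In this model the loud tone saturates the high-gain channel while the low-gain channel preserves its waveform; for the quiet tone, the high-gain channel has the smaller quantization error when referred back to the input.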

Step 304: Perform signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal.

Since there are two audio signals and ultimately a single audio signal needs to be output, the computer device needs to further fuse the two audio signals to obtain the target audio signal. In the process of signal fusion, the computer device performs signal fusion on the first audio signal and the second audio signal based on the respective signal pickup advantages of the first channel and the second channel. The first channel has a signal pickup advantage for non-high sound pressure level signals, and the second channel has a signal pickup advantage for high sound pressure level signals. Therefore, the first audio signal has a greater influence on the non-high sound pressure level signal in the target audio signal, and the second audio signal has a greater influence on the high sound pressure level signal in the target audio signal.

In a possible implementation manner, during the signal fusion process, the computer device performs sampling point level fusion or sampling frame level fusion on the two audio signals, which is not limited in this embodiment.

For example, at a sampling rate of 48 kHz, with sampling point-level fusion the computer device fuses, every 1/48000 second, the sampling point signals of the first audio signal and the second audio signal at the same sampling instant; with sampling frame-level fusion, the computer device fuses the sampling point signals of the first audio signal and the second audio signal within the same sampling frame (for example, a frame containing 480 sampling point signals, which corresponds to 10 ms at 48 kHz).
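The difference between the two fusion granularities can be sketched as follows. This is a minimal illustration, not the method of the application: the helper names are hypothetical, the fusion rule is passed in as a placeholder callback (`average` is only a stand-in), and a trailing partial frame is simply dropped.

```python
import numpy as np

FRAME_LEN = 480  # 480 sampling points per frame, i.e. 10 ms at 48 kHz

def split_frames(signal, frame_len=FRAME_LEN):
    """Reshape a 1-D signal into (num_frames, frame_len); a trailing partial frame is dropped."""
    signal = np.asarray(signal, dtype=float)
    num_frames = len(signal) // frame_len
    return signal[: num_frames * frame_len].reshape(num_frames, frame_len)

def fuse_by_frame(x1, x2, fuse_fn, frame_len=FRAME_LEN):
    """Apply fuse_fn to time-aligned frames of the two signals and concatenate the results."""
    frames1 = split_frames(x1, frame_len)
    frames2 = split_frames(x2, frame_len)
    fused = [fuse_fn(f1, f2) for f1, f2 in zip(frames1, frames2)]
    return np.concatenate(fused) if fused else np.empty(0)

def average(f1, f2):
    """Placeholder fusion rule used only for illustration."""
    return 0.5 * (f1 + f2)
```

Calling `fuse_by_frame(x1, x2, average)` fuses frame by frame, while `fuse_by_frame(x1, x2, average, frame_len=1)` degenerates to sampling point-level fusion.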

Optionally, the computer device performs signal fusion by executing a fusion algorithm through a DSP or an AP, and the specific process of the signal fusion will be described in detail in the following embodiments.

Since the respective signal pickup advantages of the two channels are merged, the dynamic range of the target audio signal (that is, the range within which the signal pickup quality is good) combines the respective dynamic ranges of the first audio signal and the second audio signal; that is, the target audio signal has a larger dynamic range than the audio signal output by a single channel. Illustratively, the dynamic range of the first audio signal is 30 dB to 90 dB, the dynamic range of the second audio signal is 60 dB to 120 dB, and the dynamic range of the target audio signal is 30 dB to 120 dB.
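The arithmetic behind the illustrative figures can be written as a union of two overlapping intervals. The helper below is a hypothetical sketch; the requirement that the two ranges overlap is an added assumption, so that the combined range has no gap.

```python
def combined_dynamic_range(range_a, range_b):
    """Return the union of two overlapping (low_db, high_db) dynamic ranges."""
    low_a, high_a = range_a
    low_b, high_b = range_b
    if max(low_a, low_b) > min(high_a, high_b):
        # Assumption: disjoint ranges would leave a gap that neither channel covers.
        raise ValueError("dynamic ranges do not overlap")
    return (min(low_a, low_b), max(high_a, high_b))

# Illustrative values from the text: (30, 90) and (60, 120) combine to (30, 120).
target_range = combined_dynamic_range((30, 90), (60, 120))
```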

Optionally, the computer device may further perform noise reduction processing, dynamic range processing, amplitude limiting processing, and spectral equalization processing on the target audio signal, and finally save the processed audio signal as a recording file, which will not be repeated in this embodiment.

Compared with the solutions in the related art, using the solutions provided by the embodiments of the present application can realize recording with a larger dynamic range. For example, when recording drum performances or concerts (high sound pressure level scenes), it can reduce the problem of broken sound during recording; when recording in a quiet environment (non-high sound pressure level scenes), it can retain more sound detail.

To sum up, in the embodiments of the present application, the same original audio signal is input to the first channel and the second channel, which have different analog gains, to obtain the first audio signal and the second audio signal, so that signal pickup of both non-high sound pressure level signals and high sound pressure level signals is taken into account. Further, signal fusion is performed based on the first audio signal and the second audio signal to output a target audio signal. Because the dynamic range of the target audio signal is the superposition of the dynamic ranges of the first audio signal and the second audio signal, the dynamic range of the audio signal is expanded, which helps improve the recording quality of both non-high sound pressure level signals and high sound pressure level signals and realizes high dynamic range recording.

In a possible implementation, signal fusion is performed based on the first audio signal and the second audio signal to obtain a target audio signal, including:

performing signal compensation on the second audio signal to obtain a third audio signal, wherein the signal compensation is used to compensate the signal difference caused by the first analog gain and the second analog gain;

Perform signal fusion on the first audio signal and the third audio signal to obtain a target audio signal.

In a possible implementation manner, signal fusion is performed on the first audio signal and the third audio signal to obtain the target audio signal, including:

Determine the respective fusion proportions corresponding to the first audio signal and the third audio signal;

Signal fusion is performed on the first audio signal and the third audio signal based on the fusion proportion to obtain a target audio signal.

In a possible implementation manner, determining the respective corresponding fusion proportions of the first audio signal and the third audio signal, including:

When the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the first clipping flag is set, determining that the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, where the first clipping flag is used to indicate that the first channel is not clipped;

When the signal amplitude of the third audio signal is greater than the second amplitude threshold, determining that the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, where the second amplitude threshold is greater than the first amplitude threshold;

When the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, or the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, determining that the fusion proportion of the first audio signal is a first dynamic proportion and that the fusion proportion of the second audio signal is a second dynamic proportion, where the second clipping flag is used to indicate that the first channel is clipped, and the sum of the first dynamic proportion and the second dynamic proportion is 1.

In a possible implementation manner, after determining that the fusion proportion of the first audio signal is the first dynamic proportion and that the fusion proportion of the second audio signal is the second dynamic proportion, the method further includes:

In the case where the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, the first dynamic proportion and the second dynamic proportion are updated based on a first update step size, wherein the updated first dynamic proportion is greater than the first dynamic proportion before the update, and the updated second dynamic proportion is smaller than the second dynamic proportion before the update;

In the case where the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, the first dynamic proportion and the second dynamic proportion are updated based on a second update step size, wherein the updated first dynamic proportion is smaller than the first dynamic proportion before the update, and the updated second dynamic proportion is greater than the second dynamic proportion before the update.

In a possible implementation manner, after updating the first dynamic proportion and the second dynamic proportion based on the first update step size, the method further includes:

In the case that the updated first dynamic proportion is greater than or equal to 1, the second clipping flag is replaced with the first clipping flag.

Optionally, the method further includes:

In the case that the signal amplitude of the third audio signal is greater than the first amplitude threshold and the first clipping flag is set, the first clipping flag is replaced with the second clipping flag.
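The proportion-selection and update rules listed above can be read as a small state machine. The following is a minimal sketch under stated assumptions rather than a definitive implementation: the names are hypothetical, the flag values 0 and 1 stand for the first and second clipping flags, boundary cases use "greater than or equal to", the first dynamic proportion is carried over between calls and clamped to [0, 1], and the second proportion is always taken as one minus the first.

```python
from dataclasses import dataclass

@dataclass
class FusionState:
    a: float = 1.0  # fusion proportion of the first audio signal
    flag: int = 0   # 0 = first clipping flag (not clipped), 1 = second clipping flag (clipped)

def update_proportions(state, amp, t1, t2, step_up, step_down):
    """Update state from the amplitude of the third audio signal; return (a, 1 - a)."""
    if amp >= t1 and state.flag == 0:
        state.flag = 1  # amplitude reached the first threshold: switch to the second flag
    if amp >= t2:
        state.a = 0.0   # rely entirely on the compensated low-gain signal
    elif amp >= t1:
        state.a = max(0.0, state.a - step_down)  # second update step: shift toward the third signal
    elif state.flag == 1:
        state.a = min(1.0, state.a + step_up)    # first update step: shift back toward the first signal
        if state.a >= 1.0:
            state.flag = 0  # fully recovered: restore the first clipping flag
    else:
        state.a = 1.0   # below the first threshold and not clipped
    return state.a, 1.0 - state.a

# Hypothetical thresholds and step sizes, chosen only for illustration.
state = FusionState()
amplitudes = [0.1, 0.6, 0.6, 0.95, 0.1, 0.1, 0.1]
history = [update_proportions(state, amp, 0.5, 0.9, 0.5, 0.25) for amp in amplitudes]
```

Updating the proportions gradually, instead of switching abruptly between 1 and 0, lets the output cross-fade between the two signals when the level moves across the thresholds.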

In a possible implementation manner, performing signal compensation on the second audio signal to obtain a third audio signal, including:

determining the analog gain difference between the first analog gain and the second analog gain;

Gain compensation is performed on the second audio signal based on the analog gain difference to obtain a third audio signal.

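In a possible implementation manner, performing signal compensation on the second audio signal to obtain a third audio signal includes:

determining the analog gain difference between the first analog gain and the second analog gain;

performing gain compensation on the second audio signal based on the analog gain difference to obtain a fourth audio signal;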

determining the signal amplitude ratio of the first audio signal and the fourth audio signal;

Amplitude compensation is performed on the fourth audio signal based on the signal amplitude ratio to obtain a third audio signal.

Because the same original audio signal has undergone different degrees of analog gain, there is a large difference between the first audio signal and the second audio signal. If the first audio signal and the second audio signal are fused directly, the fused target audio signal will exhibit abrupt signal changes. In order to improve the quality of signal fusion, in a possible implementation manner, before performing signal fusion, the computer device first performs signal compensation on the second audio signal so as to reduce the signal difference between the first audio signal and the second audio signal. Exemplary embodiments are used for description below.

Please refer to FIG. 4, which shows a flowchart of an audio signal processing method provided by another exemplary embodiment of the present application, the method includes:

Step 401: Obtain the first audio signal output by the first channel and the second audio signal output by the second channel. The first audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone the first analog gain. The second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal with the second analog gain, and the first analog gain is greater than the second analog gain.

For the implementation of this step, reference may be made to the foregoing step 302, and details are not described herein again in this embodiment.

As shown in FIG. 5, the first audio signal is output after the original audio signal undergoes the first analog gain and analog-to-digital conversion, and the second audio signal is output after the original audio signal undergoes the second analog gain and analog-to-digital conversion.

Step 402: Perform signal compensation on the second audio signal to obtain a third audio signal, where the signal compensation is used to compensate for the signal difference caused by the first analog gain and the second analog gain.

After the same original audio signal undergoes different degrees of analog gain, its signal amplitude differs. The computer device therefore first compensates the second audio signal based on the first audio signal, so that the compensated second audio signal is as close as possible to the first audio signal and signal fusion can subsequently be performed on two audio signals with similar amplitudes.

Since the difference between the first audio signal and the second audio signal is mainly caused by the analog gain, the computer device performs gain compensation on the second audio signal (unlike the analog gain applied by the PGA, this is a digital gain). Moreover, since the first analog gain is higher than the second analog gain, the gain of the second audio signal needs to be increased during gain compensation to obtain the third audio signal.

Since the first analog gain of the first channel and the second analog gain of the second channel are known, in a possible implementation, the computer device determines the analog gain difference between the first analog gain and the second analog gain, and performs gain compensation on the second audio signal based on the analog gain difference to obtain the third audio signal.

Optionally, the computer device determines the gain multiplier based on the analog gain difference between the first analog gain and the second analog gain, so as to perform gain compensation on the second audio signal based on the gain multiplier to obtain the third audio signal.

In an illustrative example, when the first audio signal is x1, the second audio signal is x2, the first analog gain is g1 dB, and the second analog gain is g2 dB, the computer device determines the gain multiple to be 10^((g1-g2)/20).

Correspondingly, the third audio signal is x2_g = x2 × 10^((g1-g2)/20).
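The fixed gain compensation can be sketched as below. This assumes the analog gains are amplitude gains expressed in dB (the usual 20·log10 convention), so a difference of g1 - g2 dB corresponds to a linear multiple of 10^((g1-g2)/20); the function names are hypothetical.

```python
import numpy as np

def gain_multiple(g1_db: float, g2_db: float) -> float:
    """Linear gain multiple corresponding to the analog gain difference g1 - g2 (in dB)."""
    return 10.0 ** ((g1_db - g2_db) / 20.0)

def gain_compensate(x2, g1_db: float, g2_db: float):
    """Digitally amplify the second audio signal so its level matches the first channel."""
    return np.asarray(x2, dtype=float) * gain_multiple(g1_db, g2_db)

# With the illustrative gains of 30 dB and 10 dB, the difference is 20 dB,
# so the second audio signal is multiplied by 10.
x2_g = gain_compensate([0.01, -0.02, 0.03], 30.0, 10.0)
```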

However, in practical applications, the actual analog gain may differ from the preset analog gain (which may be caused by the PGA), so there is still some difference in amplitude between the first audio signal and the second audio signal after fixed gain compensation. Therefore, to further reduce the difference between the audio signals, as shown in FIG. 5, the second audio signal is first compensated by a fixed gain to obtain the fourth audio signal, and the third audio signal is then obtained by adaptive amplitude compensation (fine adjustment).

Specifically, during the adaptive amplitude compensation, as shown in FIG. 6 , the computer device calculates the respective signal amplitudes of the first audio signal and the fourth audio signal, so as to determine the signal amplitude difference between the two, and then based on the signal amplitude difference Amplitude compensation is performed on the fourth audio signal to obtain a third audio signal.

In a possible implementation, as shown in FIG. 7 , this step may include the following steps:

Step 402A: Determine the analog gain difference between the first analog gain and the second analog gain.

Step 402B: Perform gain compensation on the second audio signal based on the analog gain difference to obtain a fourth audio signal.

For the implementation of steps 402A and 402B, reference may be made to the foregoing embodiments, and details are not described herein again in this embodiment.

Step 402C: Determine the signal amplitude ratio of the first audio signal and the fourth audio signal.

Optionally, the computer device calculates the signal amplitude of the first audio signal and the signal amplitude of the fourth audio signal, and determines the signal amplitude difference by obtaining the signal amplitude ratio between the two.

In a possible implementation manner, the signal amplitude of the audio signal is represented by an energy envelope, and correspondingly, the signal amplitude ratio of the first audio signal and the fourth audio signal can be represented by the energy envelope ratio, that is, ratio = (energy envelope of the first audio signal) / (energy envelope of the fourth audio signal).

Certainly, the signal amplitude of the audio signal may also be represented by other parameters than the energy envelope, which is not limited in this embodiment.

Step 402D: Perform amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain a third audio signal.

Further, the computer device performs amplitude compensation on the fourth audio signal based on the determined signal amplitude ratio to obtain the third audio signal.

With reference to the examples in the above steps, where x2_g denotes the gain-compensated signal (here, the fourth audio signal), the third audio signal obtained through amplitude compensation can be expressed as x2_c = x2_g × ratio.
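Steps 402C and 402D can be sketched as follows. The text does not specify how the energy envelope is computed, so this sketch assumes a per-frame root-mean-square value, which keeps the ratio on an amplitude scale; the function names and the small `eps` guard against division by zero are also assumptions.

```python
import numpy as np

def energy_envelope(x, eps=1e-12):
    """Per-frame RMS value, used here as a stand-in for the energy envelope (assumption)."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2) + eps))

def amplitude_compensate(x1, x2_g):
    """Scale the fourth audio signal x2_g by the envelope ratio to obtain x2_c."""
    ratio = energy_envelope(x1) / energy_envelope(x2_g)
    x2_c = np.asarray(x2_g, dtype=float) * ratio
    return x2_c, ratio

# Hypothetical frame: the gain-compensated signal is 10% too quiet, for example
# because the actual PGA gain deviates from its preset value.
t = np.arange(480) / 48000.0
x1 = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
x2_g = 0.9 * x1
x2_c, ratio = amplitude_compensate(x1, x2_g)
```

In this example the ratio is approximately 1/0.9, and the compensated signal x2_c closely matches x1.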

Step 403: Perform signal fusion on the first audio signal and the third audio signal to obtain a target audio signal.

Since the signal collected by the microphone may change at any time, that is, high sound pressure level signals and non-high sound pressure level signals may appear alternately, the computer device needs to perform dynamic fusion based on the real-time sound pressure level of the signal.

In order to improve the overall recording quality across the high dynamic range, in a possible implementation, when the signal pickup quality of the first channel is higher than that of the second channel (non-high sound pressure level signal), the computer device increases the proportion of the first audio signal and reduces the proportion of the third audio signal when performing signal fusion, thereby improving the recording quality of non-high sound pressure level signals; when the signal pickup quality of the second channel is higher than that of the first channel (high sound pressure level signal), the computer device increases the proportion of the third audio signal and reduces the proportion of the first audio signal when performing signal fusion, thereby improving the recording quality of high sound pressure level signals. Optionally, this step may include the following steps.

1. Determine the respective fusion proportions corresponding to the first audio signal and the third audio signal.

The fusion proportions of the first audio signal and the third audio signal sum to 1. For example, if the fusion proportion of the first audio signal is a, the fusion proportion of the third audio signal is 1-a, where 0≤a≤1.

Since the first analog gain of the first channel is relatively large, clipping may occur during the analog-to-digital conversion of the original audio signal that has undergone the first analog gain, whereas the second analog gain of the second channel is relatively small, so the second channel is less prone to clipping. If the first audio signal is analyzed directly to determine whether clipping occurs in the first channel (for example, by determining that clipping occurs when the first audio signal reaches the highest amplitude or exceeds a preset amplitude), the probability of misjudgment is high. To improve the accuracy of identifying clipping in the first channel, in this embodiment of the present application the computer device determines the clipping situation of the first channel based on the third audio signal, and then dynamically determines the respective fusion proportions of the first audio signal and the third audio signal based on the clipping situation of the first channel and the real-time signal amplitude of the third audio signal.
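One natural reading of the fusion proportions is a weighted sum of the two signals. The text here only defines the proportions a and 1-a, so the linear combination below is an assumption, and the function name is hypothetical.

```python
import numpy as np

def fuse(x1, x2_c, a):
    """Weighted fusion a * x1 + (1 - a) * x2_c; a may be a scalar or a per-sample array."""
    a = np.asarray(a, dtype=float)
    if np.any(a < 0.0) or np.any(a > 1.0):
        raise ValueError("fusion proportion must satisfy 0 <= a <= 1")
    x1 = np.asarray(x1, dtype=float)
    x2_c = np.asarray(x2_c, dtype=float)
    return a * x1 + (1.0 - a) * x2_c

# a = 1 selects the first audio signal, a = 0 selects the third audio signal,
# and intermediate values blend the two.
y = fuse([1.0, 1.0], [0.0, 0.0], 0.25)
```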

In a possible implementation, the respective fusion proportions of the first audio signal and the third audio signal are determined according to the following situations.

Situation 1: When the signal amplitude of the third audio signal is less than the first amplitude threshold and the first clipping flag is set, it is determined that the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, where the first clipping flag indicates that the first channel is not clipped.

In this embodiment, two levels of amplitude thresholds are set in the computer device, namely a first amplitude threshold and a second amplitude threshold, where the second amplitude threshold is greater than the first amplitude threshold. Optionally, when the signal amplitude of the third audio signal is less than the first amplitude threshold, the computer device determines that no clipping occurs in the first channel; when the signal amplitude of the third audio signal is greater than or equal to the first amplitude threshold but less than the second amplitude threshold, the computer device determines that clipping may occur in the first channel; when the signal amplitude of the third audio signal is greater than or equal to the second amplitude threshold, the computer device determines that clipping occurs in the first channel.
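The two-level threshold comparison can be sketched as follows. The function name `clipping_state` and the returned labels are hypothetical and are used only for illustration; thrd1 and thrd2 stand for the first and second amplitude thresholds.

```python
def clipping_state(mag: float, thrd1: float, thrd2: float) -> str:
    """Map the amplitude of the third audio signal to the presumed clipping
    state of the first channel."""
    if not 0.0 < thrd1 < thrd2:
        raise ValueError("expected 0 < thrd1 < thrd2")
    if mag < thrd1:
        return "no clipping"        # below the first amplitude threshold
    if mag < thrd2:
        return "possible clipping"  # between the two thresholds
    return "clipping"               # at or above the second amplitude threshold
```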

In some embodiments, the computer device sets the first amplitude threshold and the second amplitude threshold based on the analog gain conditions of the two channels, which is not limited in this embodiment.

Optionally, the computer device is provided with a clipping flag bit (flag) that characterizes the clipping situation of the first channel. When the clipping flag bit is set to the first clipping flag (flag=0), the first channel is not clipped; when the clipping flag bit is set to the second clipping flag (flag=1), the first channel is clipped.

Regarding how the clipping flag is set, in a possible implementation, the clipping flag bit is initially set to the first clipping flag. When the signal amplitude of the third audio signal is greater than the first amplitude threshold, the computer device determines that clipping may occur in the first channel and changes the first clipping flag to the second clipping flag. To improve the smoothness of the fused target audio signal, when the second clipping flag is currently set and the signal amplitude of the third audio signal falls below the first amplitude threshold, the computer device does not immediately change the second clipping flag back to the first clipping flag; it does so only after the signal amplitude of the third audio signal has remained below the first amplitude threshold for a certain duration.
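One way to picture this delayed reset is a counter of consecutive quiet samples. The sketch below is an assumption-laden illustration: the class name `ClipFlag` and the parameter `hold_samples` (the "certain duration" expressed in samples) are hypothetical, and the flag is raised when the amplitude is greater than or equal to the first amplitude threshold.

```python
class ClipFlag:
    """Clipping flag of the first channel: 0 = not clipped, 1 = clipped."""

    def __init__(self, thrd1: float, hold_samples: int) -> None:
        self.thrd1 = thrd1
        self.hold_samples = hold_samples  # hypothetical: quiet samples needed before clearing
        self.flag = 0                     # initially the first clipping flag
        self._quiet = 0                   # consecutive samples below thrd1 while flag == 1

    def update(self, mag: float) -> int:
        if mag >= self.thrd1:
            # Amplitude reached the first threshold: raise the flag at once.
            self.flag = 1
            self._quiet = 0
        elif self.flag == 1:
            # Below the threshold: clear the flag only after a sustained quiet period.
            self._quiet += 1
            if self._quiet >= self.hold_samples:
                self.flag = 0
                self._quiet = 0
        return self.flag
```

The flag rises immediately but falls slowly, which keeps the fusion from switching back and forth on short dips in amplitude.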

Optionally, when the signal amplitude of the third audio signal is less than the first amplitude threshold and the first clipping flag is set, the first channel has not clipped in the recent period. When no clipping occurs, the signal pickup quality of the first channel is better than that of the second channel (because the equivalent input noise of the first channel is smaller), so the computer device determines that the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, that is, the third audio signal is not included in the signal fusion.

Illustratively, let the signal amplitude of the third audio signal x2_c be mag=abs(x2_c). When mag<thrd1 and flag=0, the computer device determines that the fusion proportion of the first audio signal x1 is 1 and the fusion proportion of the third audio signal x2_c is 0.

Situation 2: When the signal amplitude of the third audio signal is greater than the second amplitude threshold, it is determined that the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, where the second amplitude threshold is greater than the first amplitude threshold.

When the signal amplitude of the third audio signal is greater than the second amplitude threshold (because the first amplitude threshold is reached before the second amplitude threshold is exceeded, the clipping flag bit is already set to the second clipping flag at this point), clipping inevitably occurs in the first channel, and the first audio signal is distorted as a result. Therefore, to avoid distortion in the final output audio signal, the computer device determines that the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, that is, the first audio signal is not included in the signal fusion.

Illustratively, let the signal amplitude of the third audio signal x2_c be mag=abs(x2_c). When mag≥thrd2, the computer device determines that the fusion proportion of the first audio signal x1 is 0 and the fusion proportion of the third audio signal x2_c is 1.

Situation 3: When the signal amplitude of the third audio signal is less than the first amplitude threshold and the second clipping flag is set, or the signal amplitude of the third audio signal is greater than the first amplitude threshold and less than the second amplitude threshold, it is determined that the fusion proportion of the first audio signal is a first dynamic proportion and the fusion proportion of the second audio signal is a second dynamic proportion, where the second clipping flag indicates that the first channel is clipped, and the sum of the first dynamic proportion and the second dynamic proportion is 1.

When the signal amplitude of the third audio signal is less than the first amplitude threshold and the second clipping flag is set, or when the signal amplitude of the third audio signal is greater than the first amplitude threshold and less than the second amplitude threshold, the first channel may have clipped in the recent period. Because the signal amplitude does not change abruptly, to improve the smoothness of the fused target audio signal, the computer device determines the fusion proportions of the first audio signal and the second audio signal as dynamic proportions, that is, as the signal amplitude changes, the fusion proportions of the first audio signal and the second audio signal change accordingly.

In a possible implementation, the computer device maintains a first dynamic proportion and a second dynamic proportion, where the first dynamic proportion is g (g is initially 1) and the second dynamic proportion is 1-g. In situation 3 above, the computer device performs signal fusion based on the first dynamic proportion and the second dynamic proportion.

Moreover, after the signal fusion is completed in the above situation, the computer device updates the first dynamic proportion and the second dynamic proportion based on how the signal amplitude of the third audio signal compares with the first amplitude threshold and the second amplitude threshold. When this comparison indicates that the clipping probability of the first channel has decreased, the computer device increases the first dynamic proportion and reduces the second dynamic proportion; when it indicates that the clipping probability of the first channel has increased, the computer device reduces the first dynamic proportion and increases the second dynamic proportion.

In a possible implementation, when the signal amplitude of the third audio signal is less than the first amplitude threshold and the second clipping flag is set, the computer device updates the first dynamic proportion and the second dynamic proportion based on a first update step size, where the updated first dynamic proportion is greater than the first dynamic proportion before the update, and the updated second dynamic proportion is smaller than the second dynamic proportion before the update.

The first update step size may be determined based on the sampling rate of the audio signal. Optionally, the first update step size is negatively correlated with the sampling rate, that is, the higher the sampling rate, the smaller the first update step size, and the lower the sampling rate, the larger the first update step size.

In an illustrative example, the first dynamic proportion is g, the second dynamic proportion is 1-g, and the first update step size is delta_rel (0<delta_rel<1). If the signal amplitude of the third audio signal is less than the first amplitude threshold and the second clipping flag is set, then after completing signal fusion based on g and 1-g, the computer device updates g to g+delta_rel, thereby increasing the fusion proportion of the first audio signal and reducing the fusion proportion of the third audio signal in subsequent signal fusion.

Optionally, after each update of the dynamic proportions, the computer device detects whether the first dynamic proportion is greater than or equal to 1. If so, the signal amplitude of the third audio signal has remained below the first amplitude threshold for a period of time and the first channel has not clipped during that period, so the computer device replaces the second clipping flag with the first clipping flag.

In addition, since the maximum value of the first dynamic proportion is 1, when the updated first dynamic proportion is greater than or equal to 1, the computer device sets the first dynamic proportion to 1.
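The release-side update, including the clamp at 1 and the flag replacement, can be sketched as follows. The function name `release_update` is hypothetical; g, flag, and delta_rel follow the notation of the illustrative example above.

```python
from typing import Tuple


def release_update(g: float, flag: int, delta_rel: float) -> Tuple[float, int]:
    """Update applied after fusion when the amplitude is below the first
    amplitude threshold and the second clipping flag (flag == 1) is set."""
    g = g + delta_rel      # raise the proportion of the first audio signal
    if g >= 1.0:
        g = 1.0            # the first dynamic proportion cannot exceed 1
        flag = 0           # replace the second clipping flag with the first
    return g, flag
```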

In another possible implementation, when the signal amplitude of the third audio signal is greater than the first amplitude threshold and less than the second amplitude threshold, the computer device updates the first dynamic proportion and the second dynamic proportion based on a second update step size, where the updated first dynamic proportion is smaller than the first dynamic proportion before the update, and the updated second dynamic proportion is greater than the second dynamic proportion before the update.

Moreover, when the signal amplitude of the third audio signal is greater than the first amplitude threshold and the first clipping flag is set, the computer device replaces the first clipping flag with the second clipping flag, indicating that the first channel may be clipped.

The second update step size may be determined based on the sampling rate of the audio signal. Optionally, the second update step size is negatively correlated with the sampling rate, that is, the higher the sampling rate is, the smaller the second update step size is, and the lower the sampling rate is, the larger the second update step size is.
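The application states only that an update step size is negatively correlated with the sampling rate. One simple relationship with that property, shown here purely as an assumption, is to choose a ramp time and take the reciprocal of the number of samples in that time; the function name `update_step` and the parameter `ramp_time_s` are hypothetical.

```python
def update_step(sample_rate_hz: float, ramp_time_s: float) -> float:
    """Per-sample step that moves g across its full 0..1 range in ramp_time_s
    seconds (an assumed rule, not one given in the application)."""
    ramp_samples = sample_rate_hz * ramp_time_s
    if ramp_samples <= 1.0:
        raise ValueError("ramp must span more than one sample so that 0 < step < 1")
    # Higher sampling rate -> more samples per ramp -> smaller step.
    return 1.0 / ramp_samples
```

Under this rule the transition between the two channels takes the same wall-clock time regardless of the sampling rate.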

In an illustrative example, the first dynamic proportion is g, the second dynamic proportion is 1-g, and the second update step size is delta_att (0<delta_att<1). If the signal amplitude of the third audio signal is greater than or equal to the first amplitude threshold and less than the second amplitude threshold, then after completing signal fusion based on g and 1-g, the computer device updates g to g-delta_att, thereby reducing the fusion proportion of the first audio signal and increasing the fusion proportion of the third audio signal in subsequent signal fusion.

In addition, since the minimum value of the first dynamic proportion is 0, when the updated first dynamic proportion is less than or equal to 0, the computer device sets the first dynamic proportion to 0.

2. Perform signal fusion on the first audio signal and the third audio signal based on the fusion proportion to obtain a target audio signal.

Combining the three situations above, let the first audio signal be x1, the third audio signal be x2_c, the first dynamic proportion be g, and the second dynamic proportion be 1-g. In situation 1, because the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, the target audio signal is the first audio signal, that is, y=x1.

In situation 2, because the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, the target audio signal is the third audio signal, that is, y=x2_c.

In situation 3, when the signal amplitude of the third audio signal is less than the first amplitude threshold and the second clipping flag is set, the computer device obtains the target audio signal y=g*x1+(1-g)*x2_c by fusion and updates g to g+delta_rel.

When the signal amplitude of the third audio signal is greater than the first amplitude threshold and less than the second amplitude threshold, the computer device obtains the target audio signal y=g*x1+(1-g)*x2_c by fusion, updates g to g-delta_att, and sets flag=1.
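Taken together, the three situations amount to a small per-sample state machine. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `fuse` is hypothetical, the boundary cases use mag≥thrd1 and mag≥thrd2 as in the illustrative examples, and setting flag=1 in situation 2 is an assumption that covers an amplitude jumping past both thresholds in a single sample.

```python
from typing import List, Sequence


def fuse(x1: Sequence[float], x2_c: Sequence[float], thrd1: float, thrd2: float,
         delta_att: float, delta_rel: float) -> List[float]:
    """Fuse the first audio signal x1 with the third audio signal x2_c."""
    g, flag = 1.0, 0  # first dynamic proportion and clipping flag bit
    y: List[float] = []
    for a, b in zip(x1, x2_c):
        mag = abs(b)  # amplitude of the third audio signal
        if mag >= thrd2:
            # Situation 2: the first channel is clipped, output x2_c only.
            y.append(b)
            flag = 1  # assumption: make sure the second clipping flag is set
        elif mag < thrd1 and flag == 0:
            # Situation 1: no recent clipping, output x1 only.
            y.append(a)
        else:
            # Situation 3: blend with the dynamic proportions g and 1 - g.
            y.append(g * a + (1.0 - g) * b)
            if mag < thrd1:
                # Release: amplitude is low but the second flag is still set.
                g = min(1.0, g + delta_rel)
                if g >= 1.0:
                    flag = 0
            else:
                # Attack: amplitude lies between the two thresholds.
                g = max(0.0, g - delta_att)
                flag = 1
    return y
```

Because g moves by at most one step per sample, the output crosses from one channel to the other gradually instead of switching on a single sample.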

In this embodiment, the computer device performs fixed gain compensation and adaptive amplitude compensation on the second audio signal, so that the third audio signal obtained after compensation is as close to the first audio signal as possible, which helps to improve the quality of subsequent signal fusion.

In addition, in this embodiment, the computer device determines the clipping situation of the first channel based on the signal amplitude of the third audio signal, and then dynamically determines the fusion proportions of the first audio signal and the third audio signal during signal fusion based on the clipping situation of the first channel and the signal amplitude. This avoids jumps in the target audio signal and improves the fusion quality and smoothness of the fused target audio signal.

In conjunction with the above embodiments, in an illustrative example, when the original audio signal gradually increases from a non-high sound pressure level to a high sound pressure level and then gradually decreases back to a non-high sound pressure level, the audio signal processing performed by the computer device is as shown in FIG. 8.

Step 801: Obtain the first audio signal output by the first channel and the second audio signal output by the second channel.

Step 802, performing fixed gain compensation and adaptive amplitude compensation on the second audio signal to obtain a third audio signal.

Among them, steps 801 to 802 are signal compensation processes.

Step 803, when the signal amplitude of the third audio signal is less than the first amplitude threshold (the first clipping flag is initially set), output the first audio signal.

When a non-high sound pressure level signal is collected, the audio signal of the first channel is directly output.

Step 804, when the signal amplitude of the third audio signal is greater than or equal to the first amplitude threshold and less than the second amplitude threshold, fuse the first audio signal and the third audio signal based on the first dynamic proportion and the second dynamic proportion, and output the result.

Step 805, based on the second update step size, reduce the first dynamic proportion (increase the second dynamic proportion), and set the second clipping flag.

In steps 804 to 805, when the non-high sound pressure level signal gradually becomes a high sound pressure level signal, the fusion proportion of the second channel is dynamically increased, the fusion proportion of the first channel is decreased, and signal fusion is performed.

Step 806, when the signal amplitude of the third audio signal is greater than or equal to the second amplitude threshold, output the third audio signal.

When a high sound pressure level signal is collected, the audio signal of the second channel is directly output.

Step 807, when the signal amplitude of the third audio signal is less than the first amplitude threshold (the second clipping flag is set at this time), fuse the first audio signal and the third audio signal based on the first dynamic proportion and the second dynamic proportion, and output the result.

Step 808, based on the first update step size, increase the first dynamic proportion (decrease the second dynamic proportion).

Step 809, when the first dynamic proportion is greater than or equal to 1, set the first clipping flag.

In steps 807 to 809, when the high sound pressure level signal gradually becomes a non-high sound pressure level signal, the fusion proportion of the first channel is dynamically increased, the fusion proportion of the second channel is decreased, and signal fusion is performed.

The following are apparatus embodiments of the present application, which can be used to execute the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Please refer to FIG. 9, which shows a structural block diagram of an audio signal processing apparatus provided by an embodiment of the present application. The apparatus has the function of implementing the operations performed by the computer device in the above method embodiments, and the function may be implemented by hardware, or by hardware executing corresponding software. As shown in FIG. 9, the apparatus may include:

A signal acquisition module 901, configured to acquire the first audio signal output by the first channel and the second audio signal output by the second channel, where the first audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone the first analog gain, the second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone the second analog gain, and the first analog gain is greater than the second analog gain;

A signal fusion module 902, configured to perform signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is a superposition of the dynamic ranges of the first audio signal and the second audio signal.

Optionally, the signal fusion module 902 includes:

a compensation unit, configured to perform signal compensation on the second audio signal to obtain a third audio signal, wherein the signal compensation is used to compensate the signal difference caused by the first analog gain and the second analog gain;

A fusion unit, configured to perform signal fusion on the first audio signal and the third audio signal to obtain the target audio signal.

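Optionally, the fusion unit is specifically configured to:

Determine the respective fusion proportions corresponding to the first audio signal and the third audio signal;

Perform signal fusion on the first audio signal and the third audio signal based on the fusion proportions to obtain the target audio signal.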

Optionally, when determining the respective fusion proportions of the first audio signal and the third audio signal, the fusion unit is specifically configured to:

In the case that the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the first clipping flag is set, determine that the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, where the first clipping flag indicates that the first channel is not clipped;

In the case that the signal amplitude of the third audio signal is greater than the second amplitude threshold, determine that the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, where the second amplitude threshold is greater than the first amplitude threshold;

In the case that the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, or the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, determine that the fusion proportion of the first audio signal is the first dynamic proportion and the fusion proportion of the second audio signal is the second dynamic proportion, where the second clipping flag indicates that the first channel is clipped, and the sum of the first dynamic proportion and the second dynamic proportion is 1.

Optionally, the device further includes:

A first proportion updating module, configured to update the first dynamic proportion and the second dynamic proportion based on a first update step size when the signal amplitude of the third audio signal is less than the first amplitude threshold and the second clipping flag is set, wherein the first dynamic proportion after the update is greater than the first dynamic proportion before the update, and the second dynamic proportion after the update is smaller than the second dynamic proportion before the update;

A second proportion updating module, configured to update the first dynamic proportion based on a second update step size when the signal amplitude of the third audio signal is greater than the first amplitude threshold and less than the second amplitude threshold and the second dynamic proportion, wherein the first dynamic proportion after the update is smaller than the first dynamic proportion before the update, and the second dynamic proportion after the update is greater than the second dynamic proportion before the update.

Optionally, the device further includes:

A first flag replacement module, configured to replace the second clipping flag with the first clipping flag when the updated first dynamic proportion is greater than or equal to 1.

Optionally, the device further includes:

A second flag replacement module, configured to replace the first clipping flag with the second clipping flag when the signal amplitude of the third audio signal is greater than the first amplitude threshold.

Optionally, the compensation unit is specifically configured to:

determining an analog gain difference between the first analog gain and the second analog gain;

Gain compensation is performed on the second audio signal based on the analog gain difference to obtain the third audio signal.

Optionally, the compensation unit is specifically configured to:

determining a signal amplitude ratio of the first audio signal to the fourth audio signal;

Perform amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain the third audio signal.
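The two compensation stages can be sketched as follows. This is an illustration under several assumptions that the text here does not fix: the analog gains are expressed in dB, the fourth audio signal is the output of the fixed gain compensation, and the signal amplitude ratio is estimated from the summed absolute amplitudes of one frame in which the first channel is not clipped. Both function names are hypothetical.

```python
from typing import List, Sequence


def fixed_gain_compensation(x2: Sequence[float], gain1_db: float,
                            gain2_db: float) -> List[float]:
    """Scale the second audio signal by the analog gain difference (in dB)."""
    factor = 10.0 ** ((gain1_db - gain2_db) / 20.0)  # dB difference -> linear factor
    return [factor * s for s in x2]


def adaptive_amplitude_compensation(x1: Sequence[float], x4: Sequence[float],
                                    eps: float = 1e-12) -> List[float]:
    """Scale the fourth audio signal by the amplitude ratio of x1 to x4,
    estimated over one frame (an assumed estimator)."""
    num = sum(abs(s) for s in x1)
    den = sum(abs(s) for s in x4)
    ratio = num / den if den > eps else 1.0  # leave silent frames unscaled
    return [ratio * s for s in x4]
```

The second stage absorbs any residual mismatch left by the nominal gain difference, for example from component tolerances.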

In addition, in this embodiment, the computer device determines the clipping situation of the first channel based on the signal amplitude of the third audio signal, and then dynamically determines the fusion proportions of the first audio signal and the third audio signal during signal fusion based on the clipping situation of the first channel and the signal amplitude. This avoids jumps in the target audio signal and improves the fusion quality and smoothness of the fused target audio signal.

Please refer to FIG. 10, which shows a structural block diagram of a computer device provided by an exemplary embodiment of the present application. The computer device 1000 may be a smartphone, a tablet computer, a wearable device, or the like. The computer device 1000 in this application may include one or more of the following components: a processor 1010 and a memory 1020.

Processor 1010 may include one or more processing cores. The processor 1010 uses various interfaces and lines to connect the parts of the computer device 1000, and performs the various functions of the computer device 1000 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 1020 and calling the data stored in the memory 1020. Optionally, the processor 1010 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 1010 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a neural network processor (Neural-network Processing Unit, NPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed on the touch display screen 1030; the NPU is used for implementing artificial intelligence (Artificial Intelligence, AI) functions; the modem is used for handling wireless communications. It can be understood that the modem may also not be integrated into the processor 1010 and may instead be implemented by a separate chip.

The memory 1020 may include random access memory (Random Access Memory, RAM), or may include read-only memory (Read-Only Memory, ROM). Optionally, the memory 1020 includes a non-transitory computer-readable storage medium. The memory 1020 may be used to store instructions, programs, codes, sets of codes, or sets of instructions. The memory 1020 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the method embodiments described above, and the like; the data storage area may store data (such as audio data and a phone book) created through use of the computer device 1000.

Optionally, the computer device 1000 is provided with a microphone 1030, and the microphone 1030 may be a built-in microphone of the computer device 1000 or an external microphone connected to the computer device 1000 through a microphone interface.

In this embodiment of the present application, the computer device 1000 is further provided with a first channel and a second channel (audio circuits), and the microphone 1030 is connected to the first channel and the second channel respectively, where the first channel is provided with a high-gain PGA and a first ADC, and the second channel is provided with a low-gain PGA and a second ADC. The first channel and the second channel each apply analog gain and analog-to-digital conversion to the original audio signal output by the microphone, and input the two resulting audio signals to the processor 1010, which performs the audio signal processing.
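The behavior of the two channels can be imitated with a simple model, shown here as a sketch under stated assumptions: each channel is reduced to a gain in dB followed by hard clipping at the ADC full scale, and quantization and noise are ignored. The function name `pga_adc_channel` and the parameter `full_scale` are hypothetical.

```python
from typing import List, Sequence


def pga_adc_channel(samples: Sequence[float], gain_db: float,
                    full_scale: float = 1.0) -> List[float]:
    """Model one channel as a PGA gain followed by an ADC that hard-clips
    at +/- full_scale (assumed simplification)."""
    gain = 10.0 ** (gain_db / 20.0)
    return [max(-full_scale, min(full_scale, gain * s)) for s in samples]


# The same microphone signal passes through both channels.
mic = [0.01, 0.5]
x1 = pga_adc_channel(mic, 20.0)  # high-gain channel: the loud sample clips
x2 = pga_adc_channel(mic, 0.0)   # low-gain channel: the loud sample survives
```

In this model the quiet sample is better resolved by the high-gain channel while the loud sample is preserved only by the low-gain channel, which is the situation the signal fusion is designed to exploit.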

In addition, those skilled in the art can understand that the structure of the computer device 1000 shown in the above drawings does not constitute a limitation on the computer device, and the computer device may include more or fewer components than those shown in the drawings, combine certain components, or have a different component arrangement. For example, the computer device 1000 may also include components such as a display screen, a sensor, a speaker, and a power supply, which will not be repeated here.

Embodiments of the present application further provide a computer-readable storage medium, where at least one piece of program code is stored in the computer-readable storage medium, and the program code is loaded and executed by a processor to implement the audio signal processing method according to the above embodiments.

According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the audio signal processing methods provided in various optional implementations of the above aspects.

It should be understood that "a plurality" herein means two or more. "And/or" describes the association relationship between associated objects and indicates that three kinds of relationships are possible; for example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. In addition, the numbering of the steps described herein only shows one possible execution order by way of example. In some other embodiments, the steps may be executed out of numerical order, for example, two steps with different numbers are performed at the same time, or two steps with different numbers are performed in the reverse of the order shown in the figure, which is not limited in the embodiments of the present application.

The above descriptions are only optional embodiments of the present application and are not intended to limit the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims

A method for processing an audio signal, the method comprising:

obtaining a first audio signal output by a first channel and a second audio signal output by a second channel, where the first audio signal is a signal obtained by performing analog-to-digital conversion on an original audio signal that has undergone a first analog gain, the second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal that has undergone a second analog gain, and the first analog gain is greater than the second analog gain;

performing signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is a superposition of the dynamic ranges of the first audio signal and the second audio signal.
The method according to claim 1, wherein the performing signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal comprises:

performing signal compensation on the second audio signal to obtain a third audio signal, wherein the signal compensation is used to compensate the signal difference caused by the first analog gain and the second analog gain;

Perform signal fusion on the first audio signal and the third audio signal to obtain the target audio signal.
The method according to claim 2, wherein the performing signal fusion on the first audio signal and the third audio signal to obtain the target audio signal comprises:

Determine the respective fusion proportions corresponding to the first audio signal and the third audio signal;

Perform signal fusion on the first audio signal and the third audio signal based on the fusion ratio to obtain the target audio signal.
The method according to claim 3, wherein the determining the respective fusion proportions corresponding to the first audio signal and the third audio signal comprises:

in the case that the signal amplitude of the third audio signal is smaller than a first amplitude threshold and a first clipping flag is set, determining that the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, wherein the first clipping flag indicates that the first channel is not clipped;

in a case where the signal amplitude of the third audio signal is greater than a second amplitude threshold, determining that the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, wherein the second amplitude threshold is greater than the first amplitude threshold;

in a case where the signal amplitude of the third audio signal is smaller than the first amplitude threshold and a second clipping flag is set, or the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, determining that the fusion proportion of the first audio signal is a first dynamic proportion and the fusion proportion of the second audio signal is a second dynamic proportion, wherein the second clipping flag indicates that the first channel is clipped, and the sum of the first dynamic proportion and the second dynamic proportion is 1.
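The three cases above might be sketched, under stated assumptions, as a single selection function. The names `fusion_proportions`, `amp3`, `clipped`, `dyn_first`, `t1` and `t2` are hypothetical. The claim does not state what happens when the amplitude exactly equals a threshold, so this sketch assumes such samples use the dynamic proportions. It also assumes the second returned proportion is applied to the compensated (third) signal.

```python
def fusion_proportions(amp3, clipped, dyn_first, t1, t2):
    """Return (proportion for the first signal, proportion for the other signal).

    amp3      -- signal amplitude of the third audio signal
    clipped   -- False stands for the first clipping flag, True for the second
    dyn_first -- current first dynamic proportion, assumed to lie in [0, 1]
    t1, t2    -- first and second amplitude thresholds, with t2 > t1
    """
    if amp3 > t2:
        # High amplitude: rely entirely on the low-gain path.
        return 0.0, 1.0
    if amp3 < t1 and not clipped:
        # Low amplitude and the first channel is not clipped: high-gain path only.
        return 1.0, 0.0
    # Remaining cases (assumed to include equality with a threshold):
    # dynamic proportions that sum to 1.
    return dyn_first, 1.0 - dyn_first
```

For example, with thresholds 0.5 and 0.8, an amplitude of 0.1 without clipping selects the first signal alone, while an amplitude of 0.6 selects the dynamic proportions.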
The method according to claim 4, wherein after determining that the fusion proportion of the first audio signal is the first dynamic proportion and that the fusion proportion of the second audio signal is the second dynamic proportion, the method further comprises:

in a case where the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, updating the first dynamic proportion and the second dynamic proportion based on a first update step size, wherein the updated first dynamic proportion is greater than the first dynamic proportion before the update, and the updated second dynamic proportion is smaller than the second dynamic proportion before the update;

in a case where the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, updating the first dynamic proportion and the second dynamic proportion based on a second update step size, wherein the updated first dynamic proportion is smaller than the first dynamic proportion before the update, and the updated second dynamic proportion is greater than the second dynamic proportion before the update.
The method according to claim 5, wherein after updating the first dynamic proportion and the second dynamic proportion based on the first update step size, the method further comprises:

in a case where the updated first dynamic proportion is greater than or equal to 1, replacing the second clipping flag with the first clipping flag.
The method of claim 4, wherein the method further comprises:

When the signal amplitude of the third audio signal is greater than the first amplitude threshold and the first clipping flag is set, the first clipping flag is replaced with the second clipping flag.
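The proportion updates and flag replacements recited in the preceding claims might be sketched as one state-update step. This is only a sketch under several assumptions that the claims do not state: the step sizes are applied additively, the first dynamic proportion is clamped to the range 0 to 1, the flag replacement is evaluated before the proportion update, and the second dynamic proportion is derived as 1 minus the first. The names `FusionState`, `update_state`, `step_up` and `step_down` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FusionState:
    clipped: bool = False   # False: first clipping flag; True: second clipping flag
    dyn_first: float = 1.0  # first dynamic proportion; the second is 1 - dyn_first


def update_state(state, amp3, t1, t2, step_up, step_down):
    """Update the clipping flag and the dynamic proportions for one amplitude value."""
    # Amplitude above the first threshold while flagged as not clipped:
    # replace the first clipping flag with the second.
    if amp3 > t1 and not state.clipped:
        state.clipped = True

    if amp3 < t1 and state.clipped:
        # Amplitude has dropped: shift weight back toward the first signal.
        state.dyn_first = min(1.0, state.dyn_first + step_up)
        if state.dyn_first >= 1.0:
            # Fully restored: replace the second clipping flag with the first.
            state.clipped = False
    elif t1 < amp3 < t2:
        # Transition band: shift weight toward the compensated signal.
        state.dyn_first = max(0.0, state.dyn_first - step_down)
    return state


state = FusionState()
update_state(state, 0.6, 0.5, 0.8, 0.25, 0.5)  # enters the transition band
update_state(state, 0.1, 0.5, 0.8, 0.25, 0.5)  # amplitude drops, weight recovers
```

Using a smaller step for recovery than for attack, as in this usage example, is merely one possible choice; the claims only require that the two updates move the proportions in opposite directions.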
The method according to claim 2, wherein the performing signal compensation on the second audio signal to obtain a third audio signal comprises:

determining an analog gain difference between the first analog gain and the second analog gain;

Gain compensation is performed on the second audio signal based on the analog gain difference to obtain the third audio signal.
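As an illustration of gain compensation based on the analog gain difference, the following sketch assumes that both analog gains are expressed in decibels and that compensation is a digital multiplication by the equivalent linear factor. Neither assumption is stated in the claim, and the name `gain_compensate` and its parameters are hypothetical.

```python
def gain_compensate(second, first_gain_db, second_gain_db):
    """Scale the low-gain signal so its level matches the high-gain path."""
    # Analog gain difference between the two channels (assumed to be in dB).
    diff_db = first_gain_db - second_gain_db
    # Convert the dB difference to a linear amplitude factor.
    factor = 10.0 ** (diff_db / 20.0)
    return [factor * s for s in second]


# A 20 dB gain difference corresponds to a linear factor of 10.
third = gain_compensate([0.01, -0.02], 20.0, 0.0)
```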
The method according to claim 2, wherein the performing signal compensation on the second audio signal to obtain a third audio signal comprises:

determining an analog gain difference between the first analog gain and the second analog gain;

Perform gain compensation on the second audio signal based on the analog gain difference to obtain a fourth audio signal;

determining a signal amplitude ratio of the first audio signal to the fourth audio signal;

Perform amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain the third audio signal.
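The additional amplitude compensation step might be sketched as follows. The claim does not specify how the signal amplitude ratio is measured, so this sketch assumes a ratio of root-mean-square values over one frame, and it assumes the frame is chosen so that the first audio signal is not clipped. The names `rms` and `amplitude_compensate` are hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of a frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def amplitude_compensate(first, fourth):
    """Scale the gain-compensated (fourth) signal by the measured amplitude ratio."""
    denom = rms(fourth)
    if denom == 0.0:
        # Silent frame: no meaningful ratio, so return the signal unchanged.
        return list(fourth)
    ratio = rms(first) / denom
    return [ratio * s for s in fourth]


# The fourth signal is slightly low after gain compensation; the ratio corrects it.
third = amplitude_compensate([0.5, -0.5], [0.4, -0.4])
```

Such a residual correction would account for any mismatch between the nominal analog gain difference and the gain difference actually realized by the two channels.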
An audio signal processing apparatus, wherein the apparatus comprises:

a signal acquisition module, configured to acquire a first audio signal output by a first channel and a second audio signal output by a second channel, wherein the first audio signal is a signal obtained by performing analog-to-digital conversion on an original audio signal subjected to a first analog gain, the second audio signal is a signal obtained by performing analog-to-digital conversion on the original audio signal subjected to a second analog gain, and the first analog gain is greater than the second analog gain;

a signal fusion module, configured to perform signal fusion based on the first audio signal and the second audio signal to obtain a target audio signal, wherein the dynamic range of the target audio signal is a superposition of the dynamic ranges of the first audio signal and the second audio signal.
The apparatus according to claim 10, wherein the signal fusion module comprises:

a compensation unit, configured to perform signal compensation on the second audio signal to obtain a third audio signal, wherein the signal compensation is used to compensate the signal difference caused by the first analog gain and the second analog gain;

A fusion unit, configured to perform signal fusion on the first audio signal and the third audio signal to obtain the target audio signal.
The apparatus according to claim 11, wherein the fusion unit is used for:

Determine the respective fusion proportions corresponding to the first audio signal and the third audio signal;

Perform signal fusion on the first audio signal and the third audio signal based on the fusion ratio to obtain the target audio signal.
The apparatus according to claim 12, wherein, in the process of determining the respective fusion proportions corresponding to the first audio signal and the third audio signal, the fusion unit is configured to:

in a case where the signal amplitude of the third audio signal is smaller than a first amplitude threshold and a first clipping flag is set, determine that the fusion proportion of the first audio signal is 1 and the fusion proportion of the third audio signal is 0, wherein the first clipping flag indicates that the first channel is not clipped;

in a case where the signal amplitude of the third audio signal is greater than a second amplitude threshold, determine that the fusion proportion of the first audio signal is 0 and the fusion proportion of the third audio signal is 1, wherein the second amplitude threshold is greater than the first amplitude threshold;

in a case where the signal amplitude of the third audio signal is smaller than the first amplitude threshold and a second clipping flag is set, or the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, determine that the fusion proportion of the first audio signal is a first dynamic proportion and the fusion proportion of the second audio signal is a second dynamic proportion, wherein the second clipping flag indicates that the first channel is clipped, and the sum of the first dynamic proportion and the second dynamic proportion is 1.
The apparatus of claim 13, wherein the apparatus further comprises:

a first proportion update module, configured to update the first dynamic proportion and the second dynamic proportion based on a first update step size when the signal amplitude of the third audio signal is smaller than the first amplitude threshold and the second clipping flag is set, wherein the updated first dynamic proportion is greater than the first dynamic proportion before the update, and the updated second dynamic proportion is smaller than the second dynamic proportion before the update;

a second proportion update module, configured to update the first dynamic proportion and the second dynamic proportion based on a second update step size when the signal amplitude of the third audio signal is greater than the first amplitude threshold and smaller than the second amplitude threshold, wherein the updated first dynamic proportion is smaller than the first dynamic proportion before the update, and the updated second dynamic proportion is greater than the second dynamic proportion before the update.
The apparatus of claim 14, wherein the apparatus further comprises:

a first flag replacement module, configured to replace the second clipping flag with the first clipping flag when the updated first dynamic proportion is greater than or equal to 1.
The apparatus of claim 13, wherein the apparatus further comprises:

a second flag replacement module, configured to replace the first clipping flag with the second clipping flag when the signal amplitude of the third audio signal is greater than the first amplitude threshold.
The apparatus according to claim 11, wherein the compensation unit is used for:

determining an analog gain difference between the first analog gain and the second analog gain;

Gain compensation is performed on the second audio signal based on the analog gain difference to obtain the third audio signal.
The apparatus according to claim 11, wherein the compensation unit is used for:

determining an analog gain difference between the first analog gain and the second analog gain;

Perform gain compensation on the second audio signal based on the analog gain difference to obtain a fourth audio signal;

determining a signal amplitude ratio of the first audio signal to the fourth audio signal;

Perform amplitude compensation on the fourth audio signal based on the signal amplitude ratio to obtain the third audio signal.
A computer device comprising a processor and a memory, wherein the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the audio signal processing method according to any one of claims 1 to 9.
A computer-readable storage medium, wherein at least one piece of program code is stored in the computer-readable storage medium, and the program code is loaded and executed by a processor to implement the audio signal processing method according to any one of claims 1 to 9.
A computer program product comprising computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to implement the audio signal processing method according to any one of claims 1 to 9.