CN116627377A - Audio processing method, device, electronic equipment and storage medium - Google Patents
Audio processing method, device, electronic equipment and storage medium Download PDFInfo
- Publication number
- CN116627377A (application number CN202310597099.7A)
- Authority
- CN
- China
- Prior art keywords
- audio
- envelope
- volume
- standard
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
- H04L12/1827—Network arrangements for conference optimisation or adaptation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/185—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast with management of multicast group membership
Abstract
The application discloses an audio processing method and apparatus, an electronic device, and a storage medium, and belongs to the technical field of audio signal processing. The method includes: receiving a first input of a user; in response to the first input, determining a second audio whose volume is to be adjusted from K paths of first audio, where K is an integer greater than 1; performing volume adjustment on the second audio to obtain a third audio; and playing the third audio and the first audio other than the second audio among the K paths of first audio.
Description
Technical Field
The application belongs to the field of audio signal processing, and particularly relates to an audio processing method, an audio processing device, electronic equipment and a storage medium.
Background
The development of internet technology has led to the popularization of group chat methods such as group voice and group video. However, in the group chat scenario, once the voice quality of the member is poor, the conversation experience of all the members is affected.
For example, in a group chat scenario, the voice volume of different members differs, but the volume adjustment available to the user applies a fixed adjustment to all members as a whole, for example increasing the volume of all members by a fixed 5 dB. This may result in a member whose voice volume is already high being turned up even further, while a member whose volume is low would still need to be turned up further before being audible.
Therefore, the user cannot effectively adjust the volume in the group chat scene through the terminal, and the conversation experience of the user is directly affected.
Disclosure of Invention
The embodiments of the application aim to provide an audio processing method, an audio processing apparatus, an electronic device and a storage medium, which can solve the problem of ineffective volume adjustment in a group chat scene.
In a first aspect, an embodiment of the present application provides an audio processing method, including:
receiving a first input of a user;
responding to the first input, determining second audio with volume to be adjusted from K paths of first audio, wherein K is an integer greater than 1;
performing volume adjustment on the second audio to obtain a third audio;
and playing the third audio and the first audio except the second audio in the K paths of first audio.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including:
a first receiving unit for receiving a first input of a user;
the audio determining unit is used for responding to the first input, determining second audio with volume to be adjusted from K paths of first audio, wherein K is an integer greater than 1;
the adjusting unit is used for adjusting the volume of the second audio to obtain a third audio;
And the playing unit is used for playing the third audio and the first audio except the second audio in the K paths of first audio.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiment of the application, the second audio whose volume is to be adjusted is determined from the K paths of first audio based on the first input, so that the volume of one or more designated paths of second audio can be adjusted in the K-path playback scenario. This avoids the listening problems caused by turning the overall volume of the K paths of audio up or down, ensures the effectiveness of volume adjustment, and optimizes the sound quality experience when the K paths of audio are played synchronously.
Drawings
Fig. 1 is a first flowchart of an audio processing method according to an embodiment of the present application;
Fig. 2 is a first display schematic diagram according to an embodiment of the present application;
Fig. 3 is a flowchart of a volume adjustment method based on standard audio according to an embodiment of the present application;
Fig. 4 is a second display schematic diagram according to an embodiment of the present application;
Fig. 5 is a second flowchart of an audio processing method according to an embodiment of the present application;
Fig. 6 is a third display schematic diagram according to an embodiment of the present application;
Fig. 7 is a third flowchart of an audio processing method according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type not limited to the number of objects, for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The audio processing method provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
In the group chat scene, the voice volume of different members differs, but the volume adjustment available to the user uniformly turns the whole of all members up or down; that is, the adjustment is performed on the overall voice obtained after the voices of all members are mixed. For example, when the voice volume of all members is uniformly increased, it is likely that one member's voice is too quiet to be heard and needs to be turned up, while another member's voice is already loud and should not be turned up further. Alternatively, a certain member's voice may be too loud, but if the voice volume of all members is uniformly reduced, the voices of members whose current volume is appropriate are reduced as well, causing the problem that some members' voices can no longer be heard.
Therefore, the user cannot effectively adjust the volume in the group chat scene through the terminal, and the conversation experience of the user is directly affected.
In order to solve the problem, the embodiment of the application provides an audio processing method which can be applied to the scene of chat or online meeting of group members such as group voice, group video and the like. Fig. 1 is a schematic flow chart of an audio processing method according to an embodiment of the present application, as shown in fig. 1, the method includes:
Step 110, receiving a first input of a user;
and step 120, in response to the first input, determining a second audio with volume to be adjusted from K paths of first audio, wherein K is an integer greater than 1.
Specifically, the first audio is audio being played. The first audio may be audio collected by a microphone built into the terminal, audio collected by an external microphone connected by wire or wirelessly, or audio received by the terminal over a network. There are K paths of first audio played synchronously, and the sources of the K paths of first audio may be the same or different. Here, K is an integer greater than 1, so the K paths of first audio are at least two paths of first audio. The synchronous playback of the K paths of first audio may be based on K parallel audio tracks, or the K paths of first audio may be mixed so that the mixed audio is played.
It can be understood that a group chat scene generally includes at least two chat members, and the K paths of first audio may be the voices respectively corresponding to the chat members; in this case, the K paths of first audio may be audio received by the terminal over a network. Alternatively, in a multi-user live-streaming scene, the K paths of first audio may be the audio respectively corresponding to each live source, such as the voice of host A, pre-recorded product introduction audio, and the like.
In the case of K-way first audio synchronized playback, the terminal may receive a first input. The first input is used for indicating the audio needing to be subjected to volume adjustment, namely, the second audio in the K paths of first audio.
The first input may specifically be an operation specified by the user for audio objects corresponding to the K paths of first audio displayed on the terminal screen, where: the audio object herein may be an object capable of identifying the first audio, such as an avatar of the first audio corresponding to a sound source, or a name of the first audio corresponding to a sound source, or the like; the specified operation herein may be any one of a single-click gesture, a swipe gesture, a drag gesture, a pressure recognition gesture, a long-press gesture, an area change gesture, a double-press gesture, and a double-click gesture, which is not particularly limited by the embodiment of the present application.
The first input may also be a voice command spoken by the user for at least one of the K paths of first audio, for example, "turn down the volume of user a", or "adjust the bar according to the volume of user B", which is not particularly limited in the embodiment of the present application.
After receiving the first input, the terminal may determine a second audio of which volume is to be adjusted from the K paths of first audio in response to the first input, that is, based on an indication of the first input.
In an alternative embodiment, the terminal may directly determine the second audio of the volume to be adjusted based on the indication of the first input, for example, if the first input is an audio object corresponding to a certain first audio by long pressing, the first audio may be directly recorded as the second audio; for another example, if the first input is to pick up one or several audio objects corresponding to the first audio, the picked up first audio may be directly recorded as the second audio.
In another alternative embodiment, the terminal may determine, based on the indication of the first input, that the first audio that does not need to be adjusted in volume, so as to record the remaining first audio as the second audio, e.g., if the first input is an audio object corresponding to a long press of a certain first audio, then the remaining first audio except the first audio may be directly recorded as the second audio; for another example, if the first input is to pick up one or several audio objects corresponding to the first audio, the first audio that is not picked up may be directly recorded as the second audio.
It should be noted that, the second audio determined based on the first input may be a single audio or multiple audio.
And 130, adjusting the volume of the second audio to obtain a third audio.
Specifically, after the second audio is determined, the volume adjustment can be performed on the second audio, and in the embodiment of the present application, the second audio after the volume adjustment is recorded as the third audio.
Here, the volume adjustment of the second audio may be based on a standard volume, that is, the volume of the second audio is adjusted to the standard volume, where the standard volume may be a preset default value, or may be a volume preferred by the user obtained from statistics on the volume of the audio the user has historically played. The volume adjustment of the second audio may also be performed based on a volume specified by the user: the user may input the specified volume through a volume key, indicate the direction of adjustment ("turn up" or "turn down") by voice, or enter an adjustment multiple such as "0.5", "1.5" or "2" on the screen, so that the terminal adjusts the volume of the second audio to the volume specified by the user. Alternatively, audio with an appropriate volume among the K paths of first audio may be used as standard audio, and the volume of the second audio may be adjusted automatically to follow the volume of the standard audio. The standard audio here may be specified by the user, or may be selected by the terminal after counting the volume range of each path of first audio and choosing the path whose volume is stable and of an appropriate level, which is not specifically limited in the embodiment of the present application.
And 140, playing the third audio and the first audio except the second audio in the multiple paths of first audio.
Specifically, after the volume adjustment for the second audio is completed, the adjusted K paths of first audio can be synchronously played. The K paths of first audios comprise third audios obtained through volume adjustment and first audios except the second audios without volume adjustment. The synchronous playing of the audio may be based on K parallel audio tracks, or may be playing audio after mixing the K channels of audio, which is not particularly limited in the embodiment of the present application.
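As a small illustration of the mixed-playback option (the parallel-track option needs no extra processing), the sketch below sums the adjusted third audio with the remaining first audio and clips the result. The function name and the simple sum-and-clip mixing rule are assumptions for illustration, not the patent's prescribed mixer.

```python
import numpy as np

def mix_tracks(tracks):
    """Mix K synchronized, equal-length float tracks (samples in [-1, 1])
    into a single signal by summing and clipping."""
    mixed = np.sum(np.stack(tracks, axis=0), axis=0)
    return np.clip(mixed, -1.0, 1.0)

# Usage: play either the K parallel tracks directly, or the mixed signal:
# output = mix_tracks([third_audio] + remaining_first_audios)
```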
According to the method provided by the embodiment of the application, the second audio with the volume to be adjusted is determined from the K paths of first audio based on the first input, so that the volume of one or more paths of second audio is designated to be adjusted in the K paths of first audio playing scenes, the problem of hearing caused by overall volume up or overall volume down of the K paths of audio is avoided, the effectiveness of volume adjustment is ensured, and the tone quality experience in the K paths of volume synchronous playing scenes is optimized.
Based on the above embodiment, step 120 includes:
And determining standard audio from the K paths of first audio in response to the first input, and determining first audio except the standard audio in the K paths of first audio as the second audio.
Specifically, the standard audio is first audio, specified by the first input, whose volume can serve as the reference for volume adjustment. The terminal may determine one or more paths of standard audio from the K paths of first audio based on the indication of the first input. On this basis, the terminal can determine the remaining first audio, other than the standard audio, among the K paths of first audio as the second audio whose volume is to be adjusted.
For example, if the first input is an audio object corresponding to a certain first audio by long-pressing, the first audio may be recorded as standard audio, and the rest of the first audio except the first audio may be recorded as second audio; for another example, if the first input is to pick up one or several audio objects corresponding to the first audio, the first audio that is directly picked up may be referred to as standard audio, and the first audio that is not picked up may be referred to as second audio.
For another example, fig. 2 is one of the display schematic diagrams provided in the embodiment of the present application. As shown in fig. 2, when the first audio corresponding to 4 audio objects is played synchronously, the user may long-press the avatar of one or more audio objects to check those audio objects. In fig. 2, 201 is an unchecked audio object and 202 is a checked audio object. After the terminal receives the first input checking an audio object, it displays a pop-up window 203 asking "Confirm that the volume of the selected speaker is to be used as the standard audio?", and when the user clicks "Confirm", the first audio corresponding to the checked audio object is recorded as standard audio, and the first audio corresponding to the unchecked audio objects is recorded as second audio.
Accordingly, step 130 includes:
and adjusting the volume of the second audio based on the standard audio to obtain the third audio.
Specifically, after the standard audio and the second audio are determined, the volume of the standard audio can be used as a reference, and the volume of the second audio can be adjusted, so that the volume of the third audio obtained by adjusting the volume can be close to the volume of the standard audio, and the volume of the K paths of audio is ensured to be equivalent.
When the volume of the second audio is adjusted based on the standard audio, the average volume value of the standard audio can be counted, so that the volume of the second audio is adjusted to that average volume value. Alternatively, volume envelope estimation may be performed on the standard audio and the second audio, and automatic gain control (Automatic Gain Control, AGC) or another volume adjustment method may be applied to automatically adjust the volume envelope of the second audio based on the volume envelope of the standard audio, ensuring that the adjusted third audio always remains comparable in volume to the standard audio. Here, AGC attenuates larger signals and amplifies smaller signals so as to bring the signal gain into a suitable range, thereby achieving the best adjustment effect.
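As a hedged sketch of the first option above (adjusting the second audio to the average volume of the standard audio), the snippet below matches RMS levels; treating RMS as the "average volume value", the function name, and the epsilon guard are illustrative assumptions rather than the patent's exact procedure.

```python
import numpy as np

def match_mean_volume(second_audio, standard_audio, eps=1e-8):
    """Scale second_audio so that its RMS level matches that of
    standard_audio; a simplified alternative to the envelope-following
    AGC described later."""
    rms_standard = np.sqrt(np.mean(standard_audio ** 2))
    rms_second = np.sqrt(np.mean(second_audio ** 2))
    return second_audio * (rms_standard / (rms_second + eps))
```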
According to the method provided by the embodiment of the application, the standard audio and the second audio are determined from K paths of first audio based on the first input, so that the volume adjustment standard is provided for the volume adjustment of the second audio, automatic volume adjustment is realized, and therefore, the volume of the third audio obtained by the volume adjustment can be ensured to meet the expectations of users. In the process, the user only needs to specify standard audio, does not need to participate in the volume adjustment process, and ensures the effectiveness of volume adjustment while reducing the operation difficulty of the user.
In addition, unlike the related-art volume adjustment scheme of turning the overall volume of K paths of audio up or down, the method in the embodiment of the application adjusts the volume of at least one path of second audio toward the volume of the standard audio. For example, suppose there are 2 paths of second audio, second audio A and second audio B, where the volume of second audio A is lower than that of the standard audio and the volume of second audio B is higher; then second audio A needs to be turned up and second audio B needs to be turned down. That is, in the volume adjustment of each path of second audio in the embodiment of the application, each path of second audio is adjusted with reference to the standard audio, and the volume adjustment parameters of each path, including the adjustment direction (up or down) and the adjustment amplitude, are based on the difference between its own volume and the volume of the standard audio, so the volume adjustment parameters of different paths of second audio may differ.
Based on any of the above embodiments, fig. 3 is a flowchart of a volume adjustment method based on standard audio according to an embodiment of the present application, as shown in fig. 3, in step 130, the performing volume adjustment on the second audio based on the standard audio to obtain the third audio includes:
step 131, estimating the volume envelope of each path of the first audio to obtain a standard volume envelope of the standard audio and a second volume envelope of the second audio;
and step 132, adjusting the second volume envelope by taking the standard volume envelope as a standard to obtain the third audio.
Here, the envelope is a curve of amplitude of a random process with time, the volume envelope is an acoustic feature of audio, and the volume envelope is a curve formed by connecting highest points of amplitude of different frequencies in the audio, and reflects low-frequency change of volume with time. In step 131, the volume envelope estimation is performed on the K paths of first audio, which specifically includes performing the volume envelope estimation on the standard audio in the K paths of first audio, so as to obtain the volume envelope of the standard audio, which is herein denoted as the standard volume envelope; and performing volume envelope estimation on the second audio to obtain a volume envelope of the second audio, which is herein denoted as a second volume envelope. Here, the manner of volume envelope estimation is consistent for each path of audio.
In order to ensure that the adjusted third audio is always equivalent to the standard audio in terms of volume, i.e., the third audio is always equivalent to the standard audio in terms of volume at all times, the second volume envelope may be adjusted by taking as a standard volume envelope capable of reflecting the volume change of the standard audio over time, so that the adjusted second volume envelope can be changed synchronously with the standard volume envelope. Here, the second volume envelope is adjusted using the standard volume envelope as a standard, and may be implemented by an automatic gain control AGC or other volume adjustment method.
According to the method provided by the embodiment of the application, the standard volume envelope is taken as a standard, the second volume envelope is adjusted, so that the volume adjustment of the second audio is realized, the volume of the third audio obtained by adjustment can be ensured to be synchronous and real-time equivalent to the volume of the standard audio, and the tone quality experience of synchronous playing of K paths of audio is optimized.
Based on any of the foregoing embodiments, in step 131, the performing volume envelope estimation on each path of the first audio includes:
determining a module value of a voice frame at each moment in the first audio;
When the modulus value of the voice frame at any one of the moments is smaller than or equal to the envelope value at the moment before the moment, the envelope value at any moment is determined based on the attenuation factor and the envelope value at the moment before.
In particular, the volume envelope estimation may be performed based on envelope detection (envelope-demodulation). Here, the envelope detection is a vibration signal processing method based on filtering detection, and the peak points of the high-frequency signal of a certain period of time are connected, so that an upper (positive) line and a lower (negative) line can be obtained, and the envelope is obtained.
Volume envelope estimation first frames the audio x(n) into frames of N points, yielding x(n, l), where l denotes the speech-frame index and n denotes the n-th point within the frame. The maximum volume value in each speech frame is selected as the modulus value of that frame. It can be understood that, in audio, each time corresponds to a speech frame, and the modulus value of the speech frame at the l-th time, i.e. the l-th speech frame, is:

mo(l) = max_{1 ≤ n ≤ N} |x(n, l)|

where mo(l) is the modulus value of the speech frame at the l-th time and |x(n, l)| is the absolute value of x(n, l).
In the related art, after the modulus value of the speech frame at each time in the audio is obtained, the modulus values can be used directly as the volume envelope values to form the volume envelope. However, if the modulus value determined from the maximum volume value is taken directly as the envelope value, an instantaneous volume peak at a certain moment may prevent the envelope value from tracking the actual volume, which affects the authenticity of the volume envelope.
Based on this, in the embodiment of the application, after the modulus value of the speech frame at each time in the first audio is obtained, the envelope value at each time needs to be determined moment by moment: the envelope value at each moment can be determined by comparing the modulus value of the speech frame at that moment with the envelope value at the previous moment, and combining this magnitude relation with a preset attenuation factor.
When determining the envelope value moment by moment, taking any time, i.e. the l-th time, as an example, the modulus value mo(l) of the speech frame at that time can be compared with the envelope value B(l-1) at the previous time, i.e. the (l-1)-th time:
If mo(l) ≤ B(l-1), B(l-1) may correspond to an instantaneous volume maximum, and directly taking mo(l) as the envelope value at that moment could distort the volume envelope. In this case, according to the characteristic that the volume envelope decays slowly due to the limitation of the human vocal cord frequency, B(l-1) can be attenuated by a preset attenuation factor, and the attenuated value is taken as the envelope value at that moment, that is, B(l) = β·B(l-1), where β is an attenuation factor, specifically a fraction smaller than and close to 1, for example 0.99 or 0.9.
The method provided by the embodiment of the application combines the processing mode of framing envelope detection with the attenuation factor to obtain the volume envelope, can ensure that the calculated envelope value can immediately keep up with the peak value of the audio amplitude based on the framing calculation module value, can also ensure that the envelope value presents the characteristic of slow attenuation on the millisecond level based on the attenuation factor, is simple to calculate, has high fitting degree on the volume envelope, and is beneficial to improving the accuracy of automatic volume adjustment.
Based on any of the foregoing embodiments, in step 131, after determining the modulus value of the speech frame at each time in the first audio, the method further includes:
and when the modulus value of the voice frame at any moment is larger than the envelope value at the previous moment, determining the modulus value of the voice frame at any moment as the envelope value at any moment.
That is, if the modulus value mo(l) of the speech frame at any time is greater than the envelope value B(l-1) at the previous time, B(l-1) does not correspond to an instantaneous volume maximum, and mo(l) may be taken directly as the envelope value at that time, that is, B(l) = mo(l).
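A minimal sketch of this envelope estimator follows, assuming a frame length of N = 256 samples (illustrative; the description leaves it open) and β = 0.99 (one of the values mentioned above):

```python
import numpy as np

def estimate_volume_envelope(x, frame_len=256, beta=0.99):
    """Frame-wise volume envelope with slow decay: mo(l) is the maximum
    absolute value in the l-th frame; the envelope rises instantly to mo(l)
    when it exceeds the previous value and otherwise decays by beta."""
    n_frames = len(x) // frame_len
    envelope = np.zeros(n_frames)
    prev = 0.0
    for l in range(n_frames):
        frame = x[l * frame_len:(l + 1) * frame_len]
        mo = np.max(np.abs(frame))            # modulus value of the l-th frame
        prev = mo if mo > prev else beta * prev
        envelope[l] = prev
    return envelope
```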
Based on any of the above embodiments, step 132 includes:
adjusting the envelope value of any moment in the second volume envelope based on a dynamic step length and the standard envelope value of any moment in the standard volume envelope;
The dynamic step size is determined based on at least one of an envelope error at the arbitrary time and a volume value of the second audio at the arbitrary time;
the envelope error at any one time is determined based on the standard envelope value and the envelope value adjusted at any one time, or is determined based on the standard envelope value, the envelope value adjusted at any one time, and the envelope error at a previous time to the any one time.
Specifically, the second volume envelope is adjusted using the standard volume envelope as the standard, and this may be implemented by an automatic gain control (AGC) algorithm, for example an AGC algorithm based on the minimum mean square error criterion applied to the volume envelope. This AGC algorithm may be embodied by the following formulas:

y(n, l) = x(n, l)·G(l), 1 ≤ n ≤ N

B_y(l) = G(l)·B_x(l)

e(l) = B_e(l) − B_y(l)

G(l) = G(l−1) + μ(l)·B_x(l)·e(l)

In the formulas, taking any time as the l-th time, G(l) is the gain coefficient at the l-th time, obtained by iterating over successive times; μ(l) is the step factor at the l-th time; and e(l) is the difference between the standard envelope B_e(l) and the output envelope B_y(l) after the gain is applied. Here, the standard envelope B_e(l) is the standard envelope value at the l-th time in the standard volume envelope, and the output envelope B_y(l) after the gain is applied is the gain-adjusted envelope value at the l-th time in the second volume envelope; B_y(l) is the product of the input envelope B_x(l) and the gain coefficient G(l). The input envelope B_x(l) is the envelope value at the l-th time in the second volume envelope, i.e. the envelope value before gain adjustment.
Each point of the audio signal x(n, l) at the l-th time of the second audio is multiplied by the gain coefficient G(l) to obtain the gain-adjusted output audio signal y(n, l), i.e. the audio signal at the l-th time of the third audio.
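The following sketch implements the update rules above with a fixed step factor, interpreting B_y(l) as the second-audio envelope under the previous gain (the usual LMS ordering); the frame length, initial gain, and step value are assumptions, and the dynamic step described next can be substituted for mu:

```python
import numpy as np

def envelope_agc(x, b_x, b_e, frame_len=256, mu=0.01, g_init=1.0):
    """Per-frame gain adaptation: e(l) = B_e(l) - G(l-1) * B_x(l),
    G(l) = G(l-1) + mu * B_x(l) * e(l), then y(n, l) = x(n, l) * G(l)."""
    y = x.astype(float).copy()
    g = g_init
    n_frames = min(len(b_x), len(b_e), len(x) // frame_len)
    for l in range(n_frames):
        e = b_e[l] - g * b_x[l]       # envelope error under the current gain
        g = g + mu * b_x[l] * e       # LMS-style gain update
        y[l * frame_len:(l + 1) * frame_len] *= g
    return y
```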
Under the minimum mean square error algorithm, gradient amplification is caused when the audio signal is large, and algorithm convergence is slowed down when the audio signal is small, thereby causing unstable convergence speed of the above AGC algorithm.
In order to stabilize the convergence speed of the AGC algorithm and thus ensure the stability and reliability of automatic volume adjustment, in the embodiment of the application, the step factor which is fixed and unchanged in the AGC algorithm is adjusted.
For the adjustment of the step size factor, the embodiment of the application can perform step size normalization based on the sound volume value of the second audio so as to perform the AGC algorithm based on the dynamic step size. Further, the square of the volume value of the second audio, or the absolute value of the volume value of the second audio, may be used as the denominator of the dynamic step, so that when the volume value is larger, the dynamic step is relatively smaller, thereby avoiding gradient amplification; when the volume value is smaller, the dynamic step length is relatively larger, so that the gradient is amplified, and the reduction of the convergence speed of the algorithm is avoided.
For example, the dynamic step size can be expressed as:

μ(l) = μ0 / Σ_{n=1}^{N} x(n, l)²

where μ(l) is the dynamic step size at the l-th time, μ0 is a preset step size, and x(n, l) is the l-th frame of the second audio x(n).
In addition, in order to minimize the error between the standard volume envelope and the adjusted second volume envelope, the magnitude of the envelope error may also be introduced into the dynamic adjustment of the step factor. It will be appreciated that the step size may be smaller when the error is small, and larger when the error is large, so as to reduce the error as quickly as possible. Based on this, the envelope error at a given time may be determined from the adjusted envelope value at that time in the second volume envelope and the standard envelope value at that time in the standard volume envelope, or from these two values together with the envelope error at the previous time. Dynamic step adjustment is then performed based on this envelope error. For example, the square of the difference between the two envelope values, e(l)², may be recorded as the mean square error and used as the envelope error for dynamic step adjustment.
The volume value of the second audio and the envelope error at any time can also be used together to determine the dynamic step size. For example, the dynamic step size may be expressed as:

μ(l) = γ·ē(l) / (Σ_{n=1}^{N} x(n, l)² + φ)

where μ(l) is the dynamic step size at the l-th time, x(n, l) is the l-th frame of the second audio, ē(l) is the envelope error at the l-th time, and γ and φ are preset parameters.
It can be understood that after the step factor after the dynamic adjustment is obtained, that is, after the dynamic step is obtained, the dynamic step can be substituted as the step factor into the AGC algorithm, so as to implement adjustment of the second volume envelope with the standard volume envelope as a standard, thereby obtaining the third audio synchronized with the standard audio volume.
The method provided by the embodiment of the application realizes the dynamic adjustment of the step factor required by the volume adjustment based on the envelope error and/or the volume value of the second audio, thereby ensuring the stability of the convergence speed of the algorithm during the automatic volume adjustment, and further ensuring the stability and reliability of the automatic volume adjustment.
Based on any of the above embodiments, in step 132, the envelope error at any time is determined based on the standard envelope value, the envelope value adjusted at any time, and the envelope error at a time previous to the any time, including:
and determining the envelope error at any moment based on the envelope error at the moment before any moment and the error between the standard envelope value and the envelope value adjusted at any moment.
Specifically, in consideration of strong correlation between frames before and after audio, the error generated by the standard envelope value at any time in the standard volume envelope and the envelope value adjusted at the time in the second volume envelope can be subjected to recursive smoothing, that is, the error after the recursive smoothing is applied, so as to perform dynamic step adjustment. That is, the envelope error at any time for realizing the dynamic step adjustment in step 132 is an error after the recursive smoothing.
Here, the recursive smoothing may combine the recursively smoothed envelope error at the time preceding any given time with the error between the adjusted envelope value at that time in the second volume envelope and the standard envelope value at that time in the standard volume envelope; that is, the envelope error at the previous time is used to smooth the envelope error at the current time. Thus, the envelope error at any time can be expressed as:

ē(l) = α·ē(l−1) + (1 − α)·e(l)²

where ē(l) is the envelope error at the l-th time, ē(l−1) is the envelope error at the (l−1)-th time, and e(l) is the difference between the standard envelope value B_e(l) at the l-th time in the standard volume envelope and the adjusted envelope value B_y(l) at the l-th time in the second volume envelope. The preset weight α takes a value between 0 and 1; further, in order to smooth the envelope errors at adjacent times as much as possible, α may be set to 0.8 or 0.9, etc.
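A sketch of the two quantities above follows. The closed form of μ(l) appears only as an image in the original filing, so the energy-normalized, error-scaled form below (with γ and φ as the preset parameters) follows the textual description and is an assumption:

```python
import numpy as np

def smooth_envelope_error(err_prev, b_e_l, b_y_l, alpha=0.9):
    """Recursive smoothing of the squared envelope error:
    err(l) = alpha * err(l-1) + (1 - alpha) * (B_e(l) - B_y(l)) ** 2."""
    e = b_e_l - b_y_l
    return alpha * err_prev + (1.0 - alpha) * e ** 2

def dynamic_step(frame, err_smooth, gamma=0.1, phi=1e-6):
    """Dynamic step factor mu(l): scaled by the smoothed envelope error and
    normalized by the frame energy of the second audio (assumed form)."""
    return gamma * err_smooth / (np.sum(frame ** 2) + phi)
```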
The method provided by the embodiment of the application combines the envelope error of the previous moment at any moment to determine the envelope error of the moment, can combine the strong correlation between the previous moment and the next moment of the audio to realize the recursion smoothing of the envelope error, thereby enabling the second audio envelope under the dynamic step length adjusted based on the method to be more close to the standard audio envelope and improving the reliability and the authenticity of automatic volume adjustment.
Based on any of the above embodiments, step 130 includes:
responsive to the first input, displaying a volume adjustment interface;
receiving a second input of the user to the volume adjustment interface;
determining a reference volume parameter in response to the second input;
and adjusting the volume of the second audio based on the reference volume parameter to obtain a third audio.
Specifically, the terminal may directly determine the second audio of the volume to be adjusted based on the indication of the first input, for example, if the first input is an audio object corresponding to a certain first audio by long press, the first audio may be directly recorded as the second audio; for another example, if the first input is to pick up one or several audio objects corresponding to the first audio, the picked up first audio may be directly recorded as the second audio. And, the terminal may also display a volume adjustment interface based on the first input.
The volume adjustment interface here, that is, an interface for prompting the user to input the reference volume parameter, may present the audio object of the second audio determined based on the first input, together with prompt text asking the user to input the reference volume parameter, for example, "Please use the volume keys to turn the volume of the selected caller up or down" or "Volume adjustment multiple: please input a number between 0.1 and 5", etc.
The terminal may receive a second input under the prompt of the volume adjustment interface. The second input may specifically be an input operation of a reference volume parameter performed by the user with respect to a prompt given by a volume adjustment interface displayed on the terminal screen. The reference volume parameter here, i.e. the parameter of the volume control desired by the user for the second audio, may be, for example, the direction of the volume adjustment "up" or "down", or the multiples of the volume adjustment "0.5", "1.5", "2". Based on this, the terminal can adjust the volume of the second audio based on the reference volume parameter, thereby obtaining the third audio satisfying the user's desire.
For example, fig. 4 is a second display schematic diagram provided in the embodiment of the present application. As shown in fig. 4, when the first audio corresponding to 4 audio objects is played synchronously, the user may check the avatar of one or more of the audio objects. In fig. 4, 401 represents the avatar of an audio object that is not checked, and 402 represents the avatar of an audio object that is checked. After the terminal receives the first input checking an audio object, it displays a pop-up window 403 asking the user to confirm the selected speaker; when the user clicks "Confirm", the first audio corresponding to the checked audio object is recorded as the second audio, and the volume adjustment interface is displayed.
On the volume adjustment interface, the terminal displays the prompt text "Volume adjustment multiple: please input a number between 0.1 and 5" and a volume adjustment multiple input window 404. The user may input the desired volume adjustment multiple through the input window, and after receiving the second input of the volume adjustment multiple through the input window 404, the terminal may use the input multiple as the reference volume parameter and perform volume adjustment on the second audio based on it.
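A trivial sketch of applying the reference volume parameter entered through window 404 follows, assuming the parameter is a linear amplitude multiple in the prompted 0.1-5 range (the function name and the clipping step are illustrative):

```python
import numpy as np

def apply_volume_multiple(second_audio, multiple):
    """Scale the selected second audio by a user-specified multiple and
    clip to the valid sample range."""
    if not 0.1 <= multiple <= 5.0:
        raise ValueError("volume adjustment multiple must be between 0.1 and 5")
    return np.clip(second_audio * multiple, -1.0, 1.0)
```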
According to the method provided by the embodiment of the application, under the interaction mode realized by combining the first input and the second input, the volume of one or more paths of second audio is designated to be adjusted, so that the problem of hearing caused by overall volume up or overall volume down of K paths of audio is avoided, the effectiveness of volume adjustment is ensured, and the tone quality experience under a K paths of volume synchronous playing scene is optimized.
In addition, considering that the frequency bands of the voices of different members may differ slightly in the group chat scene, when the voice of one or more members is overall higher in frequency, the listener may perceive the audio as "harsh" because it contains too much high-frequency content; the listening experience is poor, and the conversation experience is also affected.
Based on any of the above embodiments, the method further comprises:
and performing high-frequency suppression on target audio, wherein the target audio is at least one of the first audio, the third audio and the fourth audio, and the fourth audio is audio obtained by mixing the third audio and the first audio except the second audio in the K paths of first audio.
Specifically, in order to avoid the hearing problems caused by excessive high-frequency signal, high-frequency suppression may be performed on the target audio. The target audio may be any one or more of the K paths of first audio, any one or more paths of the third audio, or the audio obtained by mixing the third audio with the first audio other than the second audio among the K paths of first audio when they are played in step 140. That is, the high-frequency suppression may be performed before or after the volume adjustment; it may be performed on a single path of audio to suppress high frequencies in that path, or performed on the audio obtained after mixing multiple paths to suppress high frequencies in all of the audio, which is not specifically limited in the embodiment of the present application.
The high-frequency suppression referred to here, that is, suppression of the high-frequency signal in the target audio, may specifically convert the time-domain target audio x(n) into the frequency-domain signal X(f) and then suppress the high-frequency signal in X(f). The time-frequency transformation can be realized by means of a Fourier transform or a fast Fourier transform. Suppressing the high-frequency signal in the frequency-domain signal X(f) yields a suppressed frequency-domain signal, which is then converted back to the time domain to obtain the target audio after high-frequency suppression. In the embodiment of the application, frequencies f > 3500 Hz are the high-frequency signals that need to be suppressed.
After the above processing, the high-frequency signal is smoothly suppressed. Since higher-frequency audio is more uncomfortable to the human ear, the suppression is stronger for higher-frequency signals, while the maximum suppression is set to 0.5 in order to avoid over-attenuating the audio. In this way, audio with a better listening experience after high-frequency suppression can be obtained. After an inverse Fourier transform, the suppressed frequency-domain signal is converted into a time-domain signal to obtain the processed target audio.
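Since the suppression formula itself is reproduced only as an image in the original filing, the sketch below stands in with an assumed attenuation curve that increases with frequency above 3500 Hz and is capped at 0.5, matching the properties described in the text:

```python
import numpy as np

def suppress_high_frequencies(x, sample_rate, f_cut=3500.0, max_supp=0.5):
    """FFT the target audio, attenuate content above f_cut (more strongly at
    higher frequencies, never by more than max_supp), and transform back."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    # Assumed curve: grows linearly from 0 at f_cut toward max_supp at Nyquist.
    supp = np.clip((freqs - f_cut) / (freqs[-1] - f_cut + 1e-9), 0.0, max_supp)
    return np.fft.irfft(spectrum * (1.0 - supp), n=len(x))
```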
According to the method provided by the embodiment of the application, performing high-frequency suppression on the target audio maintains the balance between the mid-low frequencies and the high frequencies, improving the listening experience and the saturation of the audio, and thus the sound quality experience in a group chat scene can be improved.
Based on any one of the above embodiments, in the above method, high-frequency suppressing the target audio includes:
performing high-frequency energy duty cycle detection on the target audio, wherein the high frequency is a frequency higher than a frequency threshold;
and under the condition that the high-frequency energy duty ratio of the target audio is larger than a duty ratio threshold value, suppressing the high-frequency signal in the target audio.
Specifically, the high-frequency energy duty ratio detection may be performed on the target audio before the high-frequency suppression is performed, so as to determine whether the high-frequency suppression is necessary for the target audio, thereby avoiding that the unnecessary high-frequency suppression introduces additional calculation amount and causes audio playing delay.
Before the high-frequency energy duty ratio detection, the frequency range needs to be divided in order to determine how high a frequency counts as high frequency. For example, in the group chat scenario, speech is distributed over 200-4000 Hz, and the band can be divided according to this speech frequency range, with 3500 Hz taken as the frequency threshold and frequencies higher than 3500 Hz taken as the high frequency band. It will be appreciated that in a non-human-voice scenario, the frequency threshold may be determined adaptively to achieve high-frequency suppression better suited to the actual scenario, which is not particularly limited in the embodiment of the present application.
When performing the high-frequency energy duty ratio detection, the time-domain target audio x(n) can be converted into the frequency-domain signal X(f); the self-power spectral density over the full band and over the high-frequency range are computed respectively through self-power spectrum estimation, and the proportion of the high-frequency self-power spectral density within the full-band self-power spectral density is calculated to obtain the high-frequency energy duty ratio. Here, the high-frequency energy duty ratio can be expressed as:

ρ = Σ_{high frequency} S(f) / Σ_{full band} S(f)

where ρ is the high-frequency energy duty ratio, Σ_{high frequency} S(f) is the self-power spectral density summed over the high-frequency range, Σ_{full band} S(f) is the self-power spectral density summed over the full band, and S(f) is the self-power spectrum of the frequency-domain signal X(f).
After the high-frequency energy duty ratio is detected, the high-frequency energy duty ratio may be compared with a preset duty ratio threshold, and if the high-frequency energy duty ratio is greater than the duty ratio threshold, the target audio is considered to be high-frequency audio, that is, high-frequency suppression is required for the target audio. It will be appreciated that the duty cycle threshold herein may be set adaptively according to a specific scenario, for example, in a group chat scenario, the duty cycle threshold may be set to 0.4, i.e. if the high frequency energy duty cycle ρ >0.4, i.e. high frequency suppression of the target audio is required.
According to the method provided by the embodiment of the application, the high-frequency energy duty ratio of the target audio is detected to judge whether the high-frequency suppression of the target audio is necessary, so that the problem that the audio playing time delay is caused by introducing extra calculation amount into unnecessary high-frequency suppression is avoided, and the hearing experience is further optimized.
Based on any of the above embodiments, in the above high-frequency energy duty ratio detection process, the self-power spectrum estimation may be performed in a recursive smoothing manner:

Ŝ(f) = γ·Ŝ_prev(f) + (1 − γ)·S(f)

where S(f) is the self-power spectrum of the frequency-domain signal X(f), γ is a smoothing coefficient with a value between 0 and 1, Ŝ_prev(f) is the recursively smoothed self-power spectrum from the previous frame, and Ŝ(f) is the recursively smoothed self-power spectrum.

Accordingly, the calculation of the high-frequency energy duty ratio can be expressed as:

ρ = Σ_{high frequency} Ŝ(f) / Σ_{full band} Ŝ(f)
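A sketch combining the smoothed self-power spectrum with the ratio calculation follows; the frame-by-frame interface and γ = 0.9 are assumptions, while the 3500 Hz threshold and 0.4 duty ratio threshold come from the description above:

```python
import numpy as np

def high_freq_energy_ratio(frame, sample_rate, s_prev=None,
                           f_cut=3500.0, gamma=0.9):
    """Return (rho, smoothed spectrum): rho is the share of self-power
    spectral density above f_cut after recursive smoothing
    S_hat = gamma * S_prev + (1 - gamma) * |X(f)|**2."""
    s = np.abs(np.fft.rfft(frame)) ** 2
    s_hat = s if s_prev is None else gamma * s_prev + (1.0 - gamma) * s
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    rho = np.sum(s_hat[freqs > f_cut]) / (np.sum(s_hat) + 1e-12)
    return rho, s_hat

# The frame is treated as high-frequency audio when rho exceeds the duty
# ratio threshold (0.4 in the group chat example above).
```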
based on any of the above embodiments, fig. 5 is a second flowchart of an audio processing method according to an embodiment of the present application, as shown in fig. 5, the method includes:
step 510, turn on the group chat tone quality optimization button.
Fig. 6 is a third display schematic diagram provided in the embodiment of the present application. As shown in fig. 6, the terminal may add a "group chat tone quality optimization" button 601 to the settings interface of the chat software, so that the user can choose whether to enable the "group chat tone quality optimization" function. Alternatively, the "group chat tone quality optimization" button 601 may be added to the settings interface of the terminal system, so that the user can choose whether to enable the function. It will be appreciated that when the "group chat tone quality optimization" button 601 is on, the subsequent steps can be performed to optimize tone quality in a group chat scene. In addition, the following group chat tone quality optimization can be applied to tone quality optimization within a particular chat application, or applied at the terminal level to optimize tone quality in various group chat scenarios on the terminal.
Step 520, a first input is received selecting standard audio, and the first audio other than standard audio is used as second audio.
In particular, the sound quality optimization may be performed according to user selection at the time of group voice/group video. When the group voice/group video is used, there may be a case where the volume of a person speaking is small, and at this time, the user may select the audio corresponding to the speaker with the appropriate volume as the standard audio by pressing the speaker head portrait for a long time as the first input. After receiving the first input, the terminal can determine the first audio indicated by the first input from K paths of first audio in the group chat scene as standard audio, and determine the rest first audio except the standard audio in the K paths of first audio as second audio with volume to be adjusted.
In step 530, an audio envelope estimate is performed for each first audio.
Specifically, audio envelope estimation may be performed on each path of first audio. Here, considering that the speech envelope decays slowly due to the limitation of the human vocal cord frequency, the speech envelope is estimated by combining frame-wise envelope detection with the attenuation factor β, thereby obtaining the standard volume envelope of the standard audio and the second volume envelope of the second audio.
Step 540, the second volume envelope of the second audio is adjusted with the standard volume envelope of the standard audio as the standard, to obtain a third audio.
Specifically, for the envelopes estimated in step 530, an AGC algorithm based on a variable-step normalized least-mean-square approach may be applied to perform automatic volume adjustment, so as to obtain a third audio whose volume is synchronized with that of the standard audio, ensuring that the volumes of all callers are comparable.
Here, the variable step factor may be obtained by means of the dynamic step acquisition provided in the above embodiment, where the step factor is adjusted according to the envelope error and the volume value of the second audio. The envelope error may be the envelope error at each moment obtained in a recursive smoothing manner, and the volume value of the second audio may be used for step normalization. The calculation of the dynamic step can be implemented through the formula provided in the above embodiment, which is not repeated here.
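The exact formula is the one given in the earlier embodiment; the sketch below only illustrates the general shape of such a variable-step adjustment, in which the step is driven by a recursively smoothed envelope error and normalized by the second audio's volume. The constants mu, alpha, and eps, and the per-frame gain structure, are illustrative assumptions.

```python
import numpy as np

def variable_step_volume_sync(second_frames, second_env, standard_env,
                              mu=0.05, alpha=0.9, eps=1e-8):
    """Adjust the second audio's volume toward the standard volume envelope.

    second_frames : list of frames of the second audio (1-D float arrays)
    second_env    : volume envelope of the second audio, one value per frame
    standard_env  : volume envelope of the standard audio, one value per frame
    mu, alpha, eps: illustrative constants (adaptation rate, error smoothing,
                    numerical floor) -- not values taken from this application
    """
    gain = 1.0
    err_smooth = 0.0
    adjusted_frames = []
    for frame, env2, env_std in zip(second_frames, second_env, standard_env):
        err = env_std - gain * env2                            # instantaneous envelope error
        err_smooth = alpha * err_smooth + (1.0 - alpha) * err  # recursively smoothed error
        # Dynamic step: scaled by the smoothed envelope error and normalized by the
        # second audio's volume so that loud passages get proportionally smaller steps.
        step = mu * err_smooth / (env2 * env2 + eps)
        gain += step * env2                                    # NLMS-style gain update
        adjusted_frames.append(gain * np.asarray(frame, dtype=np.float64))
    return adjusted_frames
```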
Step 550, high-frequency energy ratio detection is performed on the standard audio and the third audio to determine high-frequency speech.
Specifically, all the first audio, that is, each path of audio including the standard audio and the third audio, may be taken as target audio, and the high-frequency energy ratio of each path of target audio may be detected; the target audio whose high-frequency energy ratio is greater than the ratio threshold is then selected from the target audio as high-frequency speech.
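A sketch of this per-path selection is shown below; the ratio threshold of 0.4 and the 4 kHz boundary are illustrative assumptions rather than values specified by this application.

```python
import numpy as np

def select_high_frequency_speech(target_spectra, freqs,
                                 ratio_threshold=0.4, f_threshold=4000.0):
    """Return the indices of target audio paths classified as high-frequency speech.

    target_spectra : list of smoothed self-power spectra, one array per audio path
    freqs          : frequency (Hz) of each spectral bin
    """
    selected = []
    for i, S_hat in enumerate(target_spectra):
        total = np.sum(S_hat)
        ratio = float(np.sum(S_hat[freqs > f_threshold]) / total) if total > 0 else 0.0
        if ratio > ratio_threshold:   # ratio threshold exceeded: mark as high-frequency speech
            selected.append(i)
    return selected
```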
Step 560, high frequency suppression is performed on the high frequency speech.
Specifically, after high-frequency suppression of the high-frequency speech is completed, the K paths of audio can be played synchronously, at which point the group chat tone quality optimization is complete.
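High-frequency suppression itself can take many forms; the sketch below simply attenuates spectral bins above an assumed 4 kHz boundary by an assumed fixed 6 dB per frame, which is one possible realization rather than the specific suppression used in this application.

```python
import numpy as np

def suppress_high_frequencies(frame, sample_rate, f_threshold=4000.0, attenuation_db=6.0):
    """Attenuate the spectral content of one frame above f_threshold.

    frame          : 1-D float array of time-domain samples
    sample_rate    : sampling rate in Hz
    f_threshold    : boundary of the "high frequency" band (assumed value)
    attenuation_db : fixed attenuation applied above the boundary (assumed value)
    """
    X = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    gain = 10.0 ** (-attenuation_db / 20.0)
    X[freqs > f_threshold] *= gain            # suppress the high-frequency bins
    return np.fft.irfft(X, n=len(frame))      # back to the time domain for playback
```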
According to the method provided by the embodiment of the application, volume adjustment for a particular member in a group chat scene can be realized algorithmically, and high-frequency suppression is performed to improve the listening comfort and fullness of the voice, thereby improving the tone quality experience in group voice calls.
Based on any of the above embodiments, fig. 7 is a third flowchart of an audio processing method according to an embodiment of the present application. As shown in fig. 7, the method includes:
Step 710, turn on the "group chat tone quality optimization" button.
As shown in fig. 6, the terminal may add a "group chat tone quality optimization" button 601 on the setting interface of the chat software, so that the user may select whether to enable the "group chat tone quality optimization" function. Alternatively, the "group chat tone quality optimization" button 601 may be added to the setting interface of the terminal system. It can be appreciated that, while the "group chat tone quality optimization" button 601 is on, the subsequent steps may be performed to achieve tone quality optimization in a group chat scenario. In addition, the following group chat tone quality optimization may be applied within a particular chat application, or at the terminal level to optimize tone quality in various group chat scenarios on the terminal.
Step 720, a first input selecting the second audio is received, and a second input indicating a reference volume parameter is received.
Here, the terminal may directly determine the second audio whose volume is to be adjusted based on the first input, and may further display a volume adjustment interface in response to the first input to prompt the user to enter the reference volume parameter. The terminal then receives the second input on the volume adjustment interface and derives the reference volume parameter from it.
Step 730, volume adjustment is performed on the second audio based on the reference volume parameter to obtain a third audio.
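One possible interpretation of this step is sketched below, where the reference volume parameter is treated as a relative gain in decibels; the parameter's actual form is left open by the application, so this mapping is an assumption.

```python
import numpy as np

def apply_reference_volume(frames, reference_gain_db):
    """Apply a fixed gain derived from the user-supplied reference volume parameter.

    The parameter is interpreted here as a relative gain in decibels, and the
    frames are assumed to be float arrays normalized to [-1, 1]; both are
    assumptions about an interface the application leaves open.
    """
    gain = 10.0 ** (reference_gain_db / 20.0)
    return [np.clip(gain * np.asarray(f, dtype=np.float64), -1.0, 1.0) for f in frames]
```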
Step 740, high-frequency energy ratio detection is performed on the first audio that is not volume-adjusted and on the third audio, so as to determine high-frequency speech.
Specifically, all the first audio, that is, each path of audio including the first audio without volume adjustment and the third audio with volume adjustment, may be taken as target audio, and the high-frequency energy ratio of each path of target audio may be detected; the target audio whose high-frequency energy ratio is greater than the ratio threshold is then selected from the target audio as high-frequency speech.
Step 750, high frequency suppression is performed on the high frequency speech.
Specifically, after high-frequency suppression of the high-frequency speech is completed, the K paths of audio can be played synchronously, at which point the group chat tone quality optimization is complete.
According to the method provided by the embodiment of the application, volume adjustment for a particular member in a group chat scene can be realized through user interaction, and high-frequency suppression is performed to improve the listening comfort and fullness of the voice, thereby improving the tone quality experience in group voice calls.
The execution body of the audio processing method provided by the embodiments of the application may be an audio processing device. In the embodiments of the present application, the audio processing device provided by the embodiments is described by taking, as an example, the case where the audio processing device executes the audio processing method.
Fig. 8 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application. As shown in fig. 8, the apparatus includes:
a first receiving unit 810 for receiving a first input of a user;
an audio determining unit 820, configured to determine, in response to the first input, a second audio of the volume to be adjusted from the K paths of first audio;
an adjusting unit 830, configured to adjust the volume of the second audio to obtain a third audio;
and a playing unit 840, configured to play the third audio, and the first audio except the second audio in the K paths of first audio.
Optionally, the audio determining unit is configured to:
determining standard audio from the K paths of first audio in response to the first input, and determining first audio except the standard audio in the K paths of first audio as the second audio;
The adjusting unit is used for:
and adjusting the volume of the second audio based on the standard audio to obtain the third audio.
Optionally, the adjusting unit includes:
the envelope estimation subunit is used for carrying out volume envelope estimation on each path of first audio to obtain a standard volume envelope of the standard audio and a second volume envelope of the second audio;
and the envelope adjustment subunit is used for adjusting the second volume envelope by taking the standard volume envelope as a standard to obtain the third audio.
Optionally, the envelope estimation subunit comprises:
the modulus value determining module is used for determining the modulus value of the voice frame at each moment in the first audio;
a first estimation module, configured to determine, when a modulus value of a speech frame at any one of the moments is less than or equal to an envelope value at a time preceding the any one moment, an envelope value at the any one moment based on an attenuation factor and the envelope value at the preceding moment;
and the second estimation module is used for determining the modulus value of the voice frame at any moment as the envelope value at any moment when the modulus value of the voice frame at any moment is larger than the envelope value at the previous moment.
Optionally, the envelope adjustment subunit is configured to:
adjusting the envelope value of any moment in the second volume envelope based on a dynamic step length and the standard envelope value of any moment in the standard volume envelope;
the dynamic step size is determined based on at least one of an envelope error of the any one time and a volume value of the second audio;
the envelope error at any one time is determined based on the standard envelope value and the envelope value adjusted at any one time, or is determined based on the standard envelope value, the envelope value adjusted at any one time, and the envelope error at a previous time to the any one time.
Optionally, the envelope adjustment subunit is further configured to:
and determining the envelope error at any moment based on the envelope error at the moment before any moment and the error between the standard envelope value and the envelope value adjusted at any moment.
Optionally, the adjusting unit is configured to:
responsive to the first input, displaying a volume adjustment interface;
receiving a second input of the user to the volume adjustment interface;
determining a reference volume parameter in response to the second input;
and adjusting the volume of the second audio based on the reference volume parameter to obtain a third audio.
Optionally, the apparatus further comprises:
the high-frequency suppression unit is used for performing high-frequency suppression on target audio, where the target audio is at least one of the first audio, the third audio, and the fourth audio, and the fourth audio is audio obtained by mixing the third audio with the first audio other than the second audio among the K paths of first audio.
Optionally, the high frequency suppression unit is configured to:
performing high-frequency energy ratio detection on the target audio, where high frequency refers to frequencies above a frequency threshold;
and suppressing the high-frequency signal in the target audio when the high-frequency energy ratio of the target audio is greater than a ratio threshold.
In the embodiment of the application, the second audio whose volume is to be adjusted is determined from the K paths of first audio based on the first input, so that, in a scenario where K paths of first audio are played, the volume of one or more designated paths of second audio can be adjusted. This avoids the listening problems caused by turning the overall volume of all K paths of audio up or down, ensures the effectiveness of the volume adjustment, and optimizes the tone quality experience when the K paths of audio are played synchronously.
The audio processing device in the embodiment of the application may be an electronic device, or may be a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, mobile internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), or may be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service machine, etc.; the embodiments of the present application are not specifically limited in this respect.
The audio processing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The audio processing device provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to 7, and in order to avoid repetition, a description is omitted here.
Optionally, fig. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in fig. 9, an embodiment of the present application further provides an electronic device 900, including a processor 901 and a memory 902, where the memory 902 stores a program or instructions that can run on the processor 901. When executed by the processor 901, the program or instructions implement the steps of the above audio processing method embodiments and can achieve the same technical effects, which are not repeated here to avoid repetition.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: radio frequency unit 1001, network module 1002, audio output unit 1003, input unit 1004, sensor 1005, display unit 1006, user input unit 1007, interface unit 1008, memory 1009, and processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may also include a power source (e.g., a battery) for powering the various components, and the power source may be logically connected to the processor 1010 through a power management system so as to manage charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 10 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components, which will not be described in detail here.
Wherein, the input unit 1004 is configured to receive a first input of a user;
the processor 1010 is configured to determine, in response to the first input, a second audio whose volume is to be adjusted from K paths of first audio, where K is an integer greater than 1, and to perform volume adjustment on the second audio to obtain a third audio;
the audio output unit 1003 is configured to play the third audio, and first audio except the second audio in the K paths of first audio.
Optionally, the display unit 1006 is configured to display a volume adjustment interface in response to the first input;
the input unit 1004 is further configured to receive a second input of the user to the volume adjustment interface;
The processor 1010 is further configured to determine a reference volume parameter in response to the second input; and adjusting the volume of the second audio based on the reference volume parameter to obtain a third audio.
In the embodiment of the application, the second audio whose volume is to be adjusted is determined from the K paths of first audio based on the first input, so that, in a scenario where K paths of first audio are played, the volume of one or more designated paths of second audio can be adjusted. This avoids the listening problems caused by turning the overall volume of all K paths of audio up or down, ensures the effectiveness of the volume adjustment, and optimizes the tone quality experience when the K paths of audio are played synchronously.
It should be appreciated that in an embodiment of the present application, the input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 10041 and a microphone 10042, and the graphics processor 10041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 can include two portions, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system and application programs or instructions required for at least one function (such as a sound playing function, an image playing function, etc.). Further, the memory 1009 may include volatile memory or non-volatile memory, or the memory 1009 may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch link DRAM (SLDRAM), or direct Rambus RAM (DRRAM). The memory 1009 in the embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
The processor 1010 may include one or more processing units; optionally, the processor 1010 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, and the like, and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 1010.
The embodiment of the application also provides a readable storage medium, on which a program or instructions are stored, and when executed by a processor, the program or instructions implement each process of the above audio processing method embodiments and can achieve the same technical effects, which are not repeated here to avoid repetition.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, which includes a processor and a communication interface, where the communication interface is coupled with the processor, and the processor is configured to run programs or instructions to implement each process of the above audio processing method embodiments and achieve the same technical effects, which are not repeated here to avoid repetition.
It should be understood that the chip referred to in the embodiments of the present application may also be referred to as a system-on-chip, a chip system, or a system-on-a-chip, etc.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement each process of the above audio processing method embodiments and achieve the same technical effects, which are not repeated here to avoid repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
Claims (16)
1. An audio processing method, comprising:
receiving a first input of a user;
responding to the first input, determining second audio with volume to be adjusted from K paths of first audio, wherein K is an integer greater than 1;
performing volume adjustment on the second audio to obtain a third audio;
and playing the third audio and the first audio except the second audio in the K paths of first audio.
2. The audio processing method according to claim 1, wherein the determining, in response to the first input, a second audio of a volume to be adjusted from K paths of first audio includes:
determining standard audio from the K paths of first audio in response to the first input, and determining first audio except the standard audio in the K paths of first audio as the second audio;
and adjusting the volume of the second audio to obtain a third audio, including:
and adjusting the volume of the second audio based on the standard audio to obtain the third audio.
3. The audio processing method according to claim 2, wherein the performing volume adjustment on the second audio based on the standard audio to obtain the third audio includes:
performing volume envelope estimation on each path of first audio to obtain a standard volume envelope of the standard audio and a second volume envelope of the second audio;
and adjusting the second volume envelope by taking the standard volume envelope as a standard to obtain the third audio.
4. The audio processing method according to claim 3, wherein the performing volume envelope estimation on each path of the first audio includes:
determining a modulus value of a voice frame at each moment in the first audio;
when the modulus value of the voice frame at any one of the moments is smaller than or equal to the envelope value at the moment before the moment, determining the envelope value at any moment based on an attenuation factor and the envelope value at the moment before;
and when the modulus value of the voice frame at any moment is larger than the envelope value at the previous moment, determining the modulus value of the voice frame at any moment as the envelope value at any moment.
5. The audio processing method according to claim 3, wherein adjusting the second volume envelope with the standard volume envelope as a standard to obtain the third audio includes:
adjusting the envelope value of any moment in the second volume envelope based on a dynamic step length and the standard envelope value of any moment in the standard volume envelope;
the dynamic step size is determined based on at least one of an envelope error of the any one time and a volume value of the second audio;
the envelope error at any one time is determined based on the standard envelope value and the envelope value adjusted at any one time, or is determined based on the standard envelope value, the envelope value adjusted at any one time, and the envelope error at a previous time to the any one time.
6. The audio processing method according to claim 5, wherein the envelope error at any one time is determined based on the standard envelope value, the envelope value adjusted at any one time, and the envelope error at a previous time of the any one time, comprising:
and determining the envelope error at any moment based on the envelope error at the moment before any moment and the error between the standard envelope value and the envelope value adjusted at any moment.
7. The audio processing method according to claim 1, wherein the performing volume adjustment on the second audio to obtain a third audio includes:
responsive to the first input, displaying a volume adjustment interface;
receiving a second input of the user to the volume adjustment interface;
determining a reference volume parameter in response to the second input;
and adjusting the volume of the second audio based on the reference volume parameter to obtain a third audio.
8. An audio processing apparatus, comprising:
a first receiving unit for receiving a first input of a user;
the audio determining unit is used for responding to the first input, determining second audio with volume to be adjusted from K paths of first audio, wherein K is an integer greater than 1;
the adjusting unit is used for adjusting the volume of the second audio to obtain a third audio;
and the playing unit is used for playing the third audio and the first audio except the second audio in the K paths of first audio.
9. The audio processing apparatus according to claim 8, wherein the audio determination unit is configured to:
determining standard audio from the K paths of first audio in response to the first input, and determining first audio except the standard audio in the K paths of first audio as the second audio;
The adjusting unit is used for:
and adjusting the volume of the second audio based on the standard audio to obtain the third audio.
10. The audio processing apparatus according to claim 9, wherein the adjusting unit includes:
the envelope estimation subunit is used for carrying out volume envelope estimation on each path of first audio to obtain a standard volume envelope of the standard audio and a second volume envelope of the second audio;
and the envelope adjustment subunit is used for adjusting the second volume envelope by taking the standard volume envelope as a standard to obtain the third audio.
11. The audio processing apparatus of claim 10, wherein the envelope estimation subunit comprises:
the modulus value determining module is used for determining the modulus value of the voice frame at each moment in the first audio;
a first estimation module, configured to determine, when a modulus value of a speech frame at any one of the moments is less than or equal to an envelope value at a time preceding the any one moment, an envelope value at the any one moment based on an attenuation factor and the envelope value at the preceding moment;
and the second estimation module is used for determining the modulus value of the voice frame at any moment as the envelope value at any moment when the modulus value of the voice frame at any moment is larger than the envelope value at the previous moment.
12. The audio processing apparatus of claim 10, wherein the envelope adjustment subunit is configured to:
adjusting the envelope value of any moment in the second volume envelope based on a dynamic step length and the standard envelope value of any moment in the standard volume envelope;
the dynamic step size is determined based on at least one of an envelope error of the any one time and a volume value of the second audio;
the envelope error at any one time is determined based on the standard envelope value and the envelope value adjusted at any one time, or is determined based on the standard envelope value, the envelope value adjusted at any one time, and the envelope error at a previous time to the any one time.
13. The audio processing apparatus of claim 12, wherein the envelope adjustment subunit is further configured to:
and determining the envelope error at any moment based on the envelope error at the moment before any moment and the error between the standard envelope value and the envelope value adjusted at any moment.
14. The audio processing apparatus according to claim 8, wherein the adjusting unit is configured to:
responsive to the first input, displaying a volume adjustment interface;
receiving a second input of the user to the volume adjustment interface;
determining a reference volume parameter in response to the second input;
and adjusting the volume of the second audio based on the reference volume parameter to obtain a third audio.
15. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the audio processing method of any of claims 1-7.
16. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the audio processing method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310597099.7A CN116627377A (en) | 2023-05-24 | 2023-05-24 | Audio processing method, device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310597099.7A CN116627377A (en) | 2023-05-24 | 2023-05-24 | Audio processing method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116627377A (en) | 2023-08-22
Family
ID=87602098
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310597099.7A Pending CN116627377A (en) | 2023-05-24 | 2023-05-24 | Audio processing method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116627377A (en) |
- 2023-05-24 CN CN202310597099.7A patent/CN116627377A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |