CN113613143B

CN113613143B - Audio processing method, device and storage medium suitable for mobile terminal

Info

Publication number: CN113613143B
Application number: CN202110770953.6A
Authority: CN
Inventors: 李好博; 魏耀都; 陈华
Original assignee: Beijing Xiaochang Technology Co ltd
Current assignee: Beijing Xiaochang Technology Co ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2023-06-13
Anticipated expiration: 2041-07-08
Also published as: CN113613143A

Abstract

The invention provides an audio processing method, device and storage medium suitable for a mobile terminal. The method comprises the following steps: acquiring acoustic feedback loop characteristics in real time; performing voice beautification processing on input first multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain second multi-channel audio; and performing master mixdown processing on the second multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain a single-channel audio output. With this technical scheme, the reverberation and equalization parameters can be adjusted dynamically according to the acoustic feedback loop characteristics, which reduces the difference in sound effect and listening impression between environments. The dynamic adjustment also takes the pose information of the mobile terminal into account, which makes the scheme more adaptable. In addition, during voice beautification, the voice of the mobile terminal user is separated, environmental noise and other people's voices are suppressed, and the sound quality is compensated.

Description

Audio processing method, device and storage medium suitable for mobile terminal

Technical Field

The present invention relates to the field of audio data processing technologies, and in particular, to an audio processing method and apparatus suitable for a mobile terminal, and a storage medium.

Background

With the rapid development of science, technology and music culture, the ways of singing karaoke (K song) have become increasingly diverse. Karaoke on a mobile terminal is convenient and easy to operate, and is therefore the most popular. Most current karaoke software offers voice beautification, but it can only beautify passively: the beautification applied to every voice is similar, and different beautification modes cannot be adopted for different environments.

Disclosure of Invention

The embodiments of the invention provide an audio processing method, an audio processing device and a storage medium suitable for a mobile terminal, which can apply different, targeted voice beautification modes in different environments and scenes.

In a first aspect of an embodiment of the present invention, there is provided an audio processing method suitable for a mobile terminal, including:

acquiring the acoustic feedback loop characteristics of the mobile terminal in real time;

performing voice beautifying processing on the input first multi-channel audio in real time based on the characteristics of the acoustic feedback loop to obtain second multi-channel audio;

and performing master mixdown processing on the second multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain a single-channel audio output.

Optionally, in a possible implementation manner of the first aspect, performing voice beautification processing on the input first multi-channel audio based on the acoustic feedback loop characteristics to obtain the second multi-channel audio includes:

acquiring an environmental reverberation impulse response from the acoustic feedback loop characteristics, and generating an inverse signal based on the environmental reverberation impulse response;

the first multi-channel audio comprises accompaniment audio and a single-channel voice signal, and signal cancellation processing is performed on the single-channel voice signal based on the inverse signal to obtain a single-channel voice signal without environmental reverberation;

and beautifying the single-channel voice signal after the cancellation processing to obtain stereo vocal audio.

Optionally, in a possible implementation manner of the first aspect, performing master mixdown processing on the second multi-channel audio based on the acoustic feedback loop characteristics to obtain a single-channel audio output includes:

acquiring the environmental reverberation impulse response, the acoustic feedback loop frequency response curve and the spatial modal standing wave frequency from the acoustic feedback loop characteristics;

mixing the stereo audio and the accompaniment audio to obtain single-channel audio;

and processing the single-channel audio based on the environmental reverberation impulse response, the acoustic feedback loop frequency response curve and the spatial modal standing wave frequency to obtain the processed stereo audio.

Optionally, in one possible implementation manner of the first aspect, a time-varying enhanced frequency point sequence and a time-varying gain sequence are calculated according to the acoustic feedback loop frequency response, the real-time change of the acoustic feedback loop frequency response curve, and the room impulse response obtained by echo tracking;

an environment equalization filter bank is configured, into which the mixed signal of the stereo audio and the accompaniment audio is input for equalization compensation, the compensation being based on the enhanced frequency point sequence and the gain sequence;

and an exciter is configured to generate corresponding higher harmonics that compensate for the frequency loss caused by the spatial mode, the compensation being based on the enhanced frequency point sequence and the gain sequence.

Optionally, in a possible implementation manner of the first aspect, acquiring the acoustic feedback loop characteristics of the mobile terminal in real time includes:

controlling a loudspeaker of the mobile terminal to play at least one acoustic feedback detection signal and recording the acoustic feedback detection signal;

collecting an acoustic feedback detection signal through a microphone of the mobile terminal and recording the collected acoustic feedback detection signal;

and comparing the played acoustic feedback detection signal with the acquired acoustic feedback detection signal to obtain an acoustic feedback detection result, wherein the acoustic feedback detection result comprises an acoustic feedback loop characteristic.

Optionally, in a possible implementation manner of the first aspect, comparing the played acoustic feedback detection signal with the collected acoustic feedback detection signal to obtain an acoustic feedback detection result includes:

comparing the played acoustic feedback detection signal with the acquired acoustic feedback detection signal in an echo power dimension, a frequency response dimension and an environment reverberation dimension respectively to obtain an acoustic feedback detection result;

the acoustic feedback probe results include echo power information, frequency response information, and ambient reverberation impulse response information.

Optionally, in one possible implementation manner of the first aspect, the method further includes:

acquiring any one or more of attitude information, position information and speed information of a mobile terminal in real time;

acquiring pose information of the mobile terminal at the current moment based on the attitude information, the position information and the speed information;

and updating the acoustic feedback detection result based on the pose information at the current moment.

In a second aspect of the embodiments of the present invention, there is provided an audio processing apparatus adapted for a mobile terminal, including:

the acquisition module is used for acquiring the acoustic feedback loop characteristics of the mobile terminal in real time;

the voice beautification module is used for performing voice beautification processing on the input first multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain second multi-channel audio;

and the master mixdown module is used for performing master mixdown processing on the second multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain a single-channel audio output.

Optionally, in one possible implementation manner of the second aspect, the voice beautifying module is further configured to perform the following steps, including:

receiving an audio feature of a target user;

extracting, from the first multi-channel audio, the audio information corresponding to the audio feature of the target user as target voice information that contains only the target user's voice;

and generating a second multi-channel audio after carrying out tone quality enhancement processing on the target voice information.

In a third aspect of the embodiments of the present invention, there is provided a readable storage medium having stored therein a computer program for implementing the method of the first aspect and the various possible designs of the first aspect when the computer program is executed by a processor.

The audio processing method, device and storage medium suitable for a mobile terminal provided by the invention can dynamically adjust the reverberation and equalization parameters according to the acoustic feedback loop characteristics, which reduces the difference in sound effect and listening impression between environments. The dynamic adjustment also takes the pose information of the mobile terminal into account, which makes the scheme more adaptable. In addition, during voice beautification, the voice of the mobile terminal user is separated, environmental noise and other people's voices are suppressed, and the sound quality is compensated.

Drawings

FIG. 1 is a flow chart of a first embodiment of an audio processing method suitable for a mobile terminal;

FIG. 2 is a schematic diagram of a first embodiment of a human voice beautification process and a master mix process;

FIG. 3 is a schematic diagram of a second embodiment of a human voice beautification process and a master mix process;

FIG. 4 is a schematic diagram of a measurement phase of an acoustic feedback loop feature;

FIG. 5 is a schematic diagram of a tracking phase of an acoustic feedback loop feature;

FIG. 6 is a block diagram of a first embodiment of an audio processing apparatus suitable for a mobile terminal.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.

It should be understood that, in various embodiments of the present invention, the sequence number of each process does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.

It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are included; "comprising A, B or C" means that one of A, B and C is included; and "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are included.

It should be understood that in the present invention, "B corresponding to A", "A corresponding to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information. A matches B when the similarity between A and B is greater than or equal to a preset threshold.

As used herein, "if" may be interpreted as "when" or "upon" or "in response to a determination" or "in response to detection", depending on the context.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

The invention provides an audio processing method suitable for a mobile terminal, as shown in fig. 1 and fig. 2, comprising the following steps:

Step S110, acquiring the acoustic feedback loop characteristics of the mobile terminal in real time. Because different environments and scenes affect the audio differently, the invention acquires the acoustic feedback loop characteristics in real time so that it can adapt to those environments and scenes.

Step S120, performing voice beautification processing on the input first multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain second multi-channel audio. After the acoustic feedback loop characteristics are obtained, voice beautification is applied to the input multi-channel audio. During singing the accompaniment is fixed and it is mainly the human voice that is beautified, so the voice needs to be processed before mixing. While the voice is processed, it can be adjusted in real time according to the acoustic feedback loop characteristics, so that the voice is not disturbed by the environment.

Step S130, performing master mixdown processing on the second multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain a single-channel audio output. After the beautified voice is obtained, the multi-channel audio is mixed down into a single-channel audio output. During the mixdown, the single-channel audio is adjusted in real time according to the acoustic feedback loop characteristics, so that it is not disturbed by the environment.

Further, as shown in fig. 3, performing the human voice beautification processing on the input first multi-path audio based on the acoustic feedback loop feature to obtain second multi-path audio includes:

The acoustic feedback loop detection module provided by the invention transmits the environmental reverberation impulse response to the voice beautification module. The voice beautification module generates an inverse signal from the environmental reverberation impulse response and uses it to cancel the component of the recorded signal produced by that response. This yields a pure dry voice (without reverberation), namely a single-channel voice signal without environmental reverberation. Conventional voice beautification, such as equalization, compression and reverberation, is then applied to obtain the voice with sound effects. In this way the invention removes the interference that environment-induced reverberation in the first multi-channel audio would otherwise cause to voice beautification, which improves the beautification result.
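The text does not say how the inverse signal is generated. As a minimal sketch only, assuming the recording can be modeled as the dry voice convolved with the environmental reverberation impulse response, the cancellation could be approximated by regularized inverse filtering in the frequency domain. The function name `dereverberate` and the regularization constant `eps` are illustrative assumptions, not part of the described scheme.

```python
import numpy as np

def dereverberate(recorded, room_ir, eps=1e-3):
    """Approximate the dry signal by regularized inverse filtering (assumed model:
    recorded = dry convolved with room_ir)."""
    n = len(recorded)
    # Power-of-two FFT size at least as long as the recording, so the
    # frequency-domain product corresponds to linear convolution.
    nfft = 1 << (n - 1).bit_length()
    rec_spec = np.fft.rfft(recorded, nfft)
    ir_spec = np.fft.rfft(room_ir, nfft)
    # conj(H) / (|H|^2 + eps) acts as the "inverse signal" and stays bounded
    # at frequencies where the room response is close to zero.
    inverse = np.conj(ir_spec) / (np.abs(ir_spec) ** 2 + eps)
    dry = np.fft.irfft(rec_spec * inverse, nfft)
    return dry[:n]
```

A larger `eps` trades cancellation accuracy for robustness to noise and to errors in the measured impulse response.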

Further, performing master mixdown processing on the second multi-channel audio based on the acoustic feedback loop characteristics to obtain a single-channel audio output includes:

The acoustic feedback loop detection module transmits the environmental reverberation impulse response, the real-time change of the acoustic feedback loop frequency response curve, and the spatial modal standing wave frequency at the spatial position of the mobile terminal to the master mixdown module. The master mixdown mixes the stereo audio produced by the voice beautification module with the accompaniment audio and processes the mix according to the characteristics supplied by the acoustic feedback loop detection module. The processing counteracts the frequency bands that environmental reverberation makes sound boosted and compensates for the frequency loss caused by the spatial mode, so that the sound effect is perceived with less difference across environments.
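The mixing step itself can be sketched as a weighted sum of the beautified stereo voice and the accompaniment. The gains `vocal_gain` and `accomp_gain`, the `(samples, 2)` array layout and the peak normalization are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def mix_down(vocal_stereo, accomp_stereo, vocal_gain=1.0, accomp_gain=0.7):
    """Mix two (samples, 2) float arrays into one stream, avoiding clipping."""
    n = min(len(vocal_stereo), len(accomp_stereo))
    mix = vocal_gain * vocal_stereo[:n] + accomp_gain * accomp_stereo[:n]
    # Scale down only if the summed signal would exceed full scale (+/-1.0).
    peak = np.max(np.abs(mix)) if n else 0.0
    if peak > 1.0:
        mix = mix / peak
    return mix
```

The environment-dependent equalization and harmonic compensation would then be applied to the returned mix.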

Further, a sequence of enhanced frequency points and a sequence of gain values which change with time are calculated according to the frequency response of the acoustic feedback loop, the real-time change of the frequency response curve of the acoustic feedback loop and the room environment impulse response of echo tracking.

And configuring an environment equalization filter bank, wherein the environment equalization filter bank is used for inputting a mixed signal of stereo audio and accompaniment audio into the filter bank for equalization compensation, and the compensation mode is based on the enhanced frequency point sequence and the gain quantity sequence.

According to the frequency response of the acoustic feedback loop, the real-time change of the acoustic feedback loop frequency response curve, and the room impulse response obtained by echo tracking, a time-varying enhanced frequency point sequence and a time-varying gain sequence are calculated, and a cascaded environment gain equalization filter bank is designed. The parameters of the filter bank change over time, its operating bands coincide with the enhanced frequency point sequence, and its filter gains are the opposite of the gains in the gain sequence caused by the environmental impulse response. The mixed signal of the voice and the accompaniment is passed through the filter bank to obtain the equalization-compensated audio signal.
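One way to picture such a filter bank is a cascade of peaking biquad filters, one per enhanced frequency point, each with the negative of the gain the environment adds at that frequency. The sketch below uses the widely known RBJ "Audio EQ Cookbook" peaking-filter formulas, which are a swapped-in technique and not something the text names; the function names and the fixed `q` are assumptions, and a time-varying version would recompute the coefficients as the sequences change.

```python
import math

def peaking_biquad(freq_hz, gain_db, fs, q=1.0):
    """RBJ-cookbook peaking filter coefficients, normalized so a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a

def run_biquad(b, a, x):
    """Filter a list of samples with one biquad (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def environment_equalize(x, freqs_hz, env_gains_db, fs, q=1.0):
    """Cascade one peaking filter per enhanced frequency with the opposite gain."""
    y = list(x)
    for freq, env_gain in zip(freqs_hz, env_gains_db):
        b, a = peaking_biquad(freq, -env_gain, fs, q)
        y = run_biquad(b, a, y)
    return y
```

For example, if the environment boosts 250 Hz by 4 dB, `environment_equalize(mix, [250.0], [4.0], 48000.0)` applies a 4 dB cut centered at 250 Hz.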

For the frequencies lost because of the spatial mode, the invention uses a conventional exciter to generate higher harmonics of the affected band. These harmonics create the impression that the band's energy is present and thereby compensate for the frequency loss caused by the spatial mode.
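The exciter is not detailed in the text. A minimal sketch, assuming the affected band is isolated from the source mix and passed through a simple polynomial waveshaper, could look like this. The function name `excite_band`, the FFT-mask band isolation, the square-plus-cube nonlinearity and the `amount` parameter are all illustrative assumptions.

```python
import numpy as np

def excite_band(x, fs, lost_hz, bandwidth_hz=40.0, amount=0.5):
    """Add 2nd and 3rd harmonics of the band around lost_hz to the signal x."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Keep only the source content near the frequency the room mode removes.
    mask = np.abs(freqs - lost_hz) <= bandwidth_hz / 2.0
    band = np.fft.irfft(spectrum * mask, n)
    # Squaring creates a 2nd harmonic (plus DC); cubing creates a 3rd harmonic
    # (plus a component at the fundamental).
    harmonics = band ** 2 + band ** 3
    harmonics = harmonics - harmonics.mean()  # remove the DC offset
    return x + amount * harmonics
```

Because the added components sit at multiples of the lost frequency, they fall outside the band the spatial mode attenuates.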

Further, step S110 includes:

Controlling a loudspeaker of the mobile terminal to play at least one acoustic feedback detection signal and recording the played acoustic feedback detection signal. In the present invention, as shown in fig. 4, an acoustic detection signal library may be preconfigured, and the acoustic feedback detection signal is a signal stored in advance in that library. The acoustic detection signal library stores signals for estimating at least echo power, frequency response, and environmental reverberation.

Acquiring the acoustic feedback detection signal through a microphone of the mobile terminal and recording the acquired acoustic feedback detection signal. After the mobile terminal plays the acoustic feedback detection signal, the environment and other factors strengthen and/or weaken it, and the strengthened and/or weakened signal is captured by the microphone.

Comparing the played acoustic feedback detection signal with the acquired acoustic feedback detection signal to obtain an acoustic feedback detection result, wherein the acoustic feedback detection result comprises the acoustic feedback loop characteristics. In this step, the strengthened and/or weakened signal is compared with the original acoustic feedback detection signal, and the resulting detection result reflects how the signal was strengthened and/or weakened.
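The comparison can be illustrated by estimating the loop's transfer function and impulse response from the played and captured probe signals. This is a sketch under the assumption that the loop behaves as a linear time-invariant system; the regularized spectral division, the function name `estimate_loop_response` and the parameters `ir_len` and `eps` are not taken from the text.

```python
import numpy as np

def estimate_loop_response(played, captured, ir_len, eps=1e-8):
    """Estimate the transfer function H and impulse response of the loop
    speaker -> room -> microphone from a played and a captured probe signal."""
    nfft = 1 << (len(played) + len(captured) - 1).bit_length()
    play_spec = np.fft.rfft(played, nfft)
    cap_spec = np.fft.rfft(captured, nfft)
    # H = C * conj(P) / (|P|^2 + eps): spectral division that stays finite
    # at frequencies where the probe carries little energy.
    transfer = cap_spec * np.conj(play_spec) / (np.abs(play_spec) ** 2 + eps)
    impulse_response = np.fft.irfft(transfer, nfft)[:ir_len]
    return transfer, impulse_response
```

A broadband probe such as noise or a sweep gives a usable estimate across the whole spectrum, whereas a narrowband probe only constrains the frequencies it contains.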

Further, comparing the played acoustic feedback detection signal with the collected acoustic feedback detection signal to obtain an acoustic feedback detection result includes:

and respectively comparing the played acoustic feedback detection signal with the acquired acoustic feedback detection signal in the echo power dimension, the frequency response dimension and the environment reverberation dimension to obtain an acoustic feedback detection result. According to the technical scheme provided by the invention, the acoustic feedback detection result can be obtained in the echo power dimension, the frequency response dimension and the environment reverberation dimension, so that the effect of performing multidimensional processing on human voice is achieved.
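The three dimensions can be illustrated with simple measurements: a power ratio for echo power, a per-frequency magnitude ratio for frequency response, and an energy decay curve (Schroeder backward integration) for environmental reverberation. The specific formulas and function names below are assumptions chosen for illustration; the text does not define how each dimension is computed.

```python
import numpy as np

def echo_power_db(played, captured):
    """Echo power dimension: captured power relative to played power, in dB."""
    return 10.0 * np.log10(np.mean(np.square(captured)) / np.mean(np.square(played)))

def frequency_response_db(played, captured, eps=1e-12):
    """Frequency response dimension: magnitude ratio per FFT bin, in dB."""
    n = max(len(played), len(captured))
    play_mag = np.abs(np.fft.rfft(played, n))
    cap_mag = np.abs(np.fft.rfft(captured, n))
    return 20.0 * np.log10((cap_mag + eps) / (play_mag + eps))

def energy_decay_curve_db(impulse_response):
    """Reverberation dimension: Schroeder backward-integrated energy, in dB."""
    energy = np.cumsum(np.square(impulse_response)[::-1])[::-1]
    return 10.0 * np.log10(energy / energy[0])

def decay_time_s(impulse_response, fs, drop_db=60.0):
    """Time at which the decay curve first falls by drop_db, or None."""
    below = np.nonzero(energy_decay_curve_db(impulse_response) <= -drop_db)[0]
    return below[0] / fs if len(below) else None
```

The impulse response passed to the last two functions would come from the environmental reverberation impulse response information in the detection result.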

In one possible embodiment, the method further comprises:

Acquiring any one or more of the attitude information, the position information and the speed information of the mobile terminal in real time. As shown in fig. 5, the pose information in the present invention covers the attitude of the mobile terminal as held by the user, the position of the user and the mobile terminal, and the speed at which the user moves the mobile terminal. This makes the consideration of acoustic feedback more comprehensive and helps the invention suppress and avoid howling.

Obtaining the pose information of the mobile terminal at the current moment based on the attitude information, the position information and the speed information. Any one or more of these may change from moment to moment, so the pose information is updated whenever any of them changes.

And updating the acoustic feedback detection result based on the pose information at the current moment. By the method, the dynamic update of the acoustic feedback detection result is realized, and the audio is dynamically processed based on the characteristics of the acoustic feedback loop.
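The update logic can be sketched as a tracker that re-runs the acoustic feedback measurement only when the pose has changed noticeably. The `Pose` fields, the thresholds, and the `measure` callback are hypothetical names introduced for illustration; the text does not specify how a pose change is detected, and angle wrap-around is ignored here for brevity.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    attitude_deg: tuple  # (roll, pitch, yaw) of the terminal
    position_m: tuple    # (x, y, z) position
    speed_mps: float     # speed of the terminal

def pose_changed(prev, cur, angle_tol_deg=10.0, dist_tol_m=0.3, speed_tol_mps=0.5):
    """True if attitude, position or speed moved beyond its (assumed) tolerance."""
    d_angle = max(abs(p - c) for p, c in zip(prev.attitude_deg, cur.attitude_deg))
    d_pos = math.dist(prev.position_m, cur.position_m)
    d_speed = abs(prev.speed_mps - cur.speed_mps)
    return d_angle > angle_tol_deg or d_pos > dist_tol_m or d_speed > speed_tol_mps

class FeedbackTracker:
    """Keeps the latest detection result and refreshes it on pose changes."""

    def __init__(self, measure, initial_pose):
        self._measure = measure  # callback: Pose -> detection result
        self.pose = initial_pose
        self.result = measure(initial_pose)

    def update(self, new_pose):
        if pose_changed(self.pose, new_pose):
            self.pose = new_pose
            self.result = self._measure(new_pose)
        return self.result
```

Calling `update` with every new sensor reading keeps the detection result current without repeating the measurement while the terminal is held still.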

An embodiment of the present invention further provides an audio processing apparatus for a mobile terminal, as shown in fig. 6, including:

In one embodiment, the human voice beautifying module is further configured to perform the steps comprising:

receiving an audio feature of a target user;

The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media can be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. In the alternative, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application specific integrated circuit (ASIC). In addition, the ASIC may reside in a user device. Alternatively, the processor and the readable storage medium may reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, the execution instructions being executed by the at least one processor to cause the device to implement the methods provided by the various embodiments described above.

In the above embodiments of the terminal or the server, it should be understood that the processor may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or the like. A general purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. An audio processing method suitable for a mobile terminal, comprising:

performing master mixdown processing on the second multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain a single-channel audio output;

performing voice beautification processing on the input first multi-channel audio based on the acoustic feedback loop characteristics to obtain second multi-channel audio comprises:

beautifying the single-channel voice signal after the cancellation processing to obtain stereo vocal audio;

performing master mixdown processing on the second multi-channel audio based on the acoustic feedback loop characteristics to obtain a single-channel audio output includes:

2. The method for audio processing for a mobile terminal according to claim 1, wherein,

calculating an enhanced frequency point sequence and a gain sequence which change along with time according to the frequency response of the acoustic feedback loop, the real-time change of the frequency response curve of the acoustic feedback loop and the room environment impulse response of echo tracking;

3. The method for audio processing for a mobile terminal according to claim 1, wherein,

the real-time acquisition of the acoustic feedback loop characteristics of the mobile terminal comprises the following steps:

4. An audio processing method suitable for a mobile terminal according to claim 3,

comparing the played acoustic feedback detection signal with the collected acoustic feedback detection signal to obtain an acoustic feedback detection result comprises the following steps:

5. The audio processing method for a mobile terminal according to claim 3, further comprising:

6. An audio processing apparatus adapted for use in a mobile terminal, comprising:

the master mixdown module is used for performing master mixdown processing on the second multi-channel audio in real time based on the acoustic feedback loop characteristics to obtain a single-channel audio output;

7. The audio processing apparatus for a mobile terminal according to claim 6, wherein,

the voice beautification module is further configured to perform the following steps:

receiving an audio feature of a target user;

8. A readable storage medium, characterized in that the readable storage medium has stored therein a computer program for implementing the method of any of claims 1 to 5 when being executed by a processor.