CN116112736A - Audio processing method, device, computer equipment and storage medium - Google Patents

Audio processing method, device, computer equipment and storage medium Download PDF

Info

Publication number
CN116112736A
CN116112736A CN202211576726.0A
Authority
CN
China
Prior art keywords
audio
target
basic
initial
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211576726.0A
Other languages
Chinese (zh)
Inventor
王劲鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuer Network Technology Co ltd
Original Assignee
Shanghai Yuer Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuer Network Technology Co ltd filed Critical Shanghai Yuer Network Technology Co ltd
Priority to CN202211576726.0A priority Critical patent/CN116112736A/en
Publication of CN116112736A publication Critical patent/CN116112736A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003Changing voice quality, e.g. pitch or formants
    • G10L21/007Changing voice quality, e.g. pitch or formants characterised by the process used
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed

Abstract

The present application relates to an audio processing method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: obtaining a virtual resource multimedia data packet and basic audio; separating the virtual resource multimedia data packet to obtain initial audio; processing the initial audio according to the target characteristics of the basic audio to obtain target audio; and mixing the target audio and the basic audio to obtain mixed audio. By adopting the method, the two audio streams can be fused.

Description

Audio processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to an audio processing method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
With the development of mobile terminal technology, people increasingly engage in social activities through mobile terminals. While a mobile terminal is playing an audio/video file, it may receive a multimedia file containing a virtual resource and play that file as well. However, when the native player of the mobile terminal (Android/iOS) plays the received multimedia file, its playback and the playback of the terminal's original audio/video file affect each other, so that an echo exists.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an audio processing method, apparatus, computer device, computer-readable storage medium, and computer program product capable of canceling echo.
In a first aspect, the present application provides an audio processing method, the method comprising:
obtaining a virtual resource multimedia data packet and basic audio;
separating the virtual resource multimedia data packet to obtain initial audio;
processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and mixing the target audio and the basic audio to obtain mixed audio.
In one embodiment, the target features include audio format, sampling rate, bit depth, and channel count; the processing the initial audio according to the target characteristics of the basic audio to obtain target audio comprises the following steps:
and resampling the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
In one embodiment, the method further comprises:
establishing a buffer area according to the initial audio, and placing the initial audio into the buffer area; the size of the buffer is positively correlated with the size of the base audio;
the processing the initial audio according to the target characteristics of the basic audio to obtain target audio comprises the following steps:
and processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
In one embodiment, the size of the buffer is an integer multiple of the base audio frame length.
In one embodiment, the mixing the target audio and the base audio to obtain mixed audio includes:
and adjusting the volume of the target audio according to the volume of the basic audio, and mixing the adjusted target audio with the basic audio to obtain the mixed audio.
In one embodiment, the frame length of the initial audio is the same as the frame length of the base audio.
In a second aspect, the present application also provides an audio processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring the virtual resource multimedia data packet and the basic audio;
the separation module is used for separating the virtual resource multimedia data packet to obtain initial audio;
the processing module is used for processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and the mixing module is used for mixing the target audio with the basic audio to obtain mixed audio.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method described above when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method described above.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the method described above.
According to the audio processing method, the audio processing device, the computer equipment, the storage medium and the computer program product, the initial audio is processed according to the target features so that the obtained target audio is consistent with the target features of the basic audio; the target audio and the basic audio are then mixed to obtain mixed audio, which achieves a good mixing effect and thereby eliminates the echo.
Drawings
FIG. 1 is a diagram of an application environment for an audio processing method in one embodiment;
FIG. 2 is a flow chart of an audio processing method according to an embodiment;
FIG. 3 is a flow chart of an audio processing method according to another embodiment;
FIG. 4 is a block diagram of an audio processing device in one embodiment;
fig. 5 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The audio processing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 acquires a virtual resource multimedia data packet and basic audio; separating the virtual resource multimedia data packet to obtain initial audio; processing the initial audio according to the target characteristics of the basic audio to obtain target audio; and mixing the target audio with the basic audio to obtain mixed audio. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, an audio processing method is provided, and the method is applied to the terminal in fig. 1 for illustration, and includes the following steps:
it should be noted that, the audio gift mentioned in this embodiment may be understood as a virtual audio gift that a user rewards a host in a living room. The audio gift can be a live broadcast platform entrusted to be developed by a third party, and comprises a video picture and a corresponding sound effect of the audio gift, so that in order to fuse the sound of the live broadcast room with the sound of the audio gift when the audio gift is played in the live broadcast platform, the audio processing needs to be carried out on the audio gift and the sound of the live broadcast room.
Step 202, obtaining a virtual resource multimedia data packet and basic audio.
The virtual resource multimedia data packet is a data packet corresponding to an audio gift, and the basic audio is real-time audio of the live broadcasting room.
Optionally, the terminal acquires a data packet corresponding to the audio gift and real-time audio of the live broadcast room.
And 204, separating the virtual resource multimedia data packet to obtain initial audio.
Wherein the initial audio is audio in an audio gift.
Optionally, the terminal separates the data packet corresponding to the audio gift to obtain the initial audio and the initial video. Wherein the initial video is a video of an audio gift.
And 206, processing the initial audio according to the target characteristics of the basic audio to obtain target audio.
Wherein the target characteristics of the target audio are the same as the target characteristics of the basic audio.
Optionally, the terminal processes the initial audio according to the target characteristics of the basic audio to obtain the target audio.
Optionally, the target features include audio format, sample rate, bit depth, and channel count. Wherein the audio format is the PCM format. Sample rates include, but are not limited to, 44.1 kHz and 48 kHz. The bit depth is also referred to as Bit-Depth. The channel count may be, but is not limited to, mono or binaural (stereo).
Processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and the terminal resamples the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
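For illustration, the following is a minimal sketch of such a resampling step using FFmpeg's libswresample (FFmpeg is the framework referenced later in this description); the function name resample_to_base and the use of the older swr_alloc_set_opts / channel-count API are assumptions for illustration, not part of the claimed method.

```cpp
extern "C" {
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/samplefmt.h>
#include <libavutil/frame.h>
}

// Resample one decoded gift-audio frame (initial audio) to the base audio's
// sample format, sample rate and channel count (its target features).
// Returns the number of samples written into *outBuf (caller frees with av_freep),
// or a negative value on error.
static int resample_to_base(const AVFrame* in, uint8_t** outBuf,
                            int outRate, AVSampleFormat outFmt, int outChannels) {
    SwrContext* swr = swr_alloc_set_opts(
        nullptr,
        av_get_default_channel_layout(outChannels), outFmt, outRate,          // target: base audio
        av_get_default_channel_layout(in->channels),
        (AVSampleFormat)in->format, in->sample_rate,                          // source: gift audio
        0, nullptr);
    if (!swr || swr_init(swr) < 0) return -1;

    int outSamples = swr_get_out_samples(swr, in->nb_samples);
    av_samples_alloc(outBuf, nullptr, outChannels, outSamples, outFmt, 0);
    int written = swr_convert(swr, outBuf, outSamples,
                              (const uint8_t**)in->extended_data, in->nb_samples);
    swr_free(&swr);
    return written;
}
```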
And step 208, mixing the target audio and the basic audio to obtain mixed audio.
Optionally, the terminal aligns the target audio with the base audio according to the target features and mixes them to obtain the mixed audio.
According to the audio processing method, the initial audio is processed according to the target features so that the target audio is consistent with the target features of the basic audio, and the target audio and the basic audio are mixed to obtain the mixed audio. The mixed audio combines the basic audio with the target audio derived from the virtual resource multimedia data packet, fusing the characteristics of both.
In one embodiment, the method further comprises: establishing a buffer area according to the initial audio, and placing the initial audio into the buffer area; the size of the buffer is positively correlated with the size of the base audio; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
Optionally, after the terminal separates the virtual resource multimedia data packet to obtain the initial audio, a corresponding buffer area is dynamically established, and the size of the buffer area is positively correlated with the size of the basic audio to be mixed with the initial audio. Optionally, the size of the buffer is an integer multiple of the base audio frame length. For example, when the base audio frame length is 10 ms, the buffer size is 5000 times the base audio frame length.
Optionally, the terminal processes the initial audio in the buffer according to the target characteristics of the basic audio to obtain the target audio.
According to the audio processing method, the initial audio is stored by dynamically establishing the buffer area, so that memory is occupied in a more flexible manner.
In one embodiment, mixing the target audio and the base audio to obtain mixed audio includes: and adjusting the volume of the target audio according to the volume of the basic audio, and mixing the adjusted target audio with the basic audio to obtain mixed audio.
Optionally, the terminal uses a mixing algorithm to adjust the volume of the target audio according to the volume of the base audio, so that the ratio between the volume of the base audio and the volume of the target audio equals a preset proportion. It should be noted that the preset proportion is not limited in the present application; optionally, the preset proportion is 6:4 or 7:3. The terminal then mixes the volume-adjusted target audio with the basic audio to obtain the mixed audio.
According to the audio processing method, the volume of the basic audio and the volume of the target audio are adjusted, so that the mixed audio sounds relatively balanced.
In one embodiment, the frame length of the initial audio is the same as the frame length of the base audio.
In one embodiment, as shown in fig. 3, the audio processing method is implemented by the mobile terminal. The mobile terminal for realizing the audio processing method comprises a network data receiving module, a demultiplexing and decoding module, an audio buffer module, an audio resampling module and an audio mixing module.
Wherein, the network data receiving module: virtual resource multimedia data packets, including the audio and video data packets of an audio gift, are received via the HTTP (HyperText Transfer Protocol)-based HLS (HTTP Live Streaming, adaptive-bitrate streaming) protocol.
Demultiplexing and decoding module: the audio and video data of the audio gift are separated, resulting in an AVFrame data structure containing audio bare data (initial audio).
Audio buffer module: memory is dynamically allocated by setting an appropriate buffer size, and the parsed initial audio and target audio are cached.
Audio resampling module: the audio data (initial audio) buffered by the audio buffer module is resampled according to the incoming parameters (the target features and their values), such as the sampling rate, of the cloud provider base audio (the basic audio, i.e. the real-time audio of the live broadcast room).
Audio mixing module: the audio (target audio) output by the audio resampling module is mixed with the cloud provider base audio (basic audio).
Audio/video rendering module: the audio data (mixed audio) is played by the cloud provider player, which simultaneously performs echo cancellation processing. The video data (the video data in the virtual resource multimedia data packet, i.e. the video of the audio gift) is played by a self-built player adapted for transparency optimization.
Before the audio and video are separated in the first step of playing an audio gift (virtual resource multimedia data packet), the target features of the cloud provider base audio (basic audio) need to be acquired. The target features include four parameters: sampling rate, bit depth, sampling format and channel count, which are transmitted to the audio resampling module.
Audio and video separation link: the audio source data PCM (initial audio) is obtained from the AudioFrame data structure generated after decoding by the decapsulation and decoding module. The decapsulation module separates the audio and video data (virtual resource multimedia data packet) and generates a VideoPacket (the video packet corresponding to the video data in the virtual resource multimedia data packet) and an AudioPacket (the audio packet corresponding to the initial audio). The decoding module performs decompression, generating an AudioFrame data structure from an AudioPacket data structure, where the AudioFrame contains audio bare data PCM (initial audio) of a certain frame length, i.e. the input of the resampling module in the next link.
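As a sketch of this decapsulation and decoding link, under the assumption that FFmpeg's demuxing and decoding API is used (as the following paragraph suggests); the callback name on_initial_audio is a placeholder and error handling is trimmed:

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}

// Demultiplex the gift multimedia packet and decode its audio stream into
// AVFrames carrying bare PCM (the "initial audio").
static void demux_and_decode_audio(const char* url,
                                   void (*on_initial_audio)(AVFrame*)) {
    AVFormatContext* fmt = nullptr;
    avformat_open_input(&fmt, url, nullptr, nullptr);       // open the HLS/file source
    avformat_find_stream_info(fmt, nullptr);

    int aIdx = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
    const AVCodec* dec = avcodec_find_decoder(fmt->streams[aIdx]->codecpar->codec_id);
    AVCodecContext* ctx = avcodec_alloc_context3(dec);
    avcodec_parameters_to_context(ctx, fmt->streams[aIdx]->codecpar);
    avcodec_open2(ctx, dec, nullptr);

    AVPacket* pkt = av_packet_alloc();
    AVFrame* frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {                   // decapsulation: AVPacket
        if (pkt->stream_index == aIdx && avcodec_send_packet(ctx, pkt) == 0) {
            while (avcodec_receive_frame(ctx, frame) == 0)    // decoding: AVFrame (PCM)
                on_initial_audio(frame);
        }
        av_packet_unref(pkt);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&ctx);
    avformat_close_input(&fmt);
}
```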
Obtaining the audio bare data PCM from the AudioFrame also needs to follow certain rules. Basically, audio data in FFmpeg (an audio and video processing framework) has two storage modes, Packed and Planar. For binaural audio (audio with 2 channels), the Packed mode stores the data of the two channels interleaved, while the Planar mode stores the two channels separately. In the Packed format, frame.data[0] or frame.extended_data[0] contains all the audio data; in the Planar format, frame.data[i] or frame.extended_data[i] contains the data of the i-th channel. For either format, after the complete PCM data (initial audio) containing both channels is obtained from frame.data or frame.extended_data, the PCM data is cached into a dynamically created audio buffer to await processing by the subsequent resampling module.
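To make the Packed/Planar distinction concrete, a minimal sketch that flattens a two-channel AVFrame into one interleaved 16-bit PCM buffer; it assumes S16/S16P sample formats, and other formats would first go through the resampling step described earlier:

```cpp
#include <cstdint>
#include <vector>
extern "C" {
#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>
}

// Flatten a stereo AVFrame into interleaved 16-bit PCM (L R L R ...).
// Packed frames already interleave both channels in data[0];
// Planar frames keep channel i in extended_data[i].
static std::vector<int16_t> to_interleaved_pcm(const AVFrame* f) {
    std::vector<int16_t> out;
    out.reserve(f->nb_samples * 2);
    if (!av_sample_fmt_is_planar((AVSampleFormat)f->format)) {
        const int16_t* p = (const int16_t*)f->data[0];            // packed: already interleaved
        out.assign(p, p + f->nb_samples * 2);
    } else {
        const int16_t* l = (const int16_t*)f->extended_data[0];   // planar: one plane per channel
        const int16_t* r = (const int16_t*)f->extended_data[1];
        for (int i = 0; i < f->nb_samples; ++i) {
            out.push_back(l[i]);
            out.push_back(r[i]);
        }
    }
    return out;
}
```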
The length of the dynamically created audio buffer needs to be chosen in advance with subsequent mixing in mind. The generation and processing of the audio and video data of the audio gift follow a typical producer-consumer model: the decapsulation and decoding module can be regarded as the producer, the mixing module as the consumer, and the data produced and consumed is the audio data (initial audio) cached in the dynamically created audio buffer. The audio buffer size should be an integer multiple of the base audio frame length provided by the cloud provider. Meanwhile, considering the strict storage-space constraints of the mobile terminal, the audio buffer is set, based on an empirical value, to 5000 times the base audio frame length provided by the cloud provider, corresponding to an audio length of about 5 s.
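A minimal producer/consumer sketch of such a buffer; the class name AudioPcmBuffer, the sample-count units and the drop-oldest overflow policy are assumptions, and only the sizing rule (an integer multiple, e.g. 5000 times, of the base frame length) comes from the text above:

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Dynamically created audio buffer shared by the decoder (producer) and the
// mixer (consumer). Its capacity is an integer multiple of the base audio
// frame length supplied by the cloud provider.
class AudioPcmBuffer {
public:
    explicit AudioPcmBuffer(size_t baseFrameSamples, size_t multiple = 5000)
        : capacity_(baseFrameSamples * multiple) {}

    // Producer: the decoder writes decoded PCM; the oldest data is dropped if full.
    void push(const std::vector<int16_t>& pcm) {
        std::lock_guard<std::mutex> lk(mu_);
        for (int16_t s : pcm) {
            if (fifo_.size() == capacity_) fifo_.pop_front();
            fifo_.push_back(s);
        }
    }

    // Consumer: the mixer pulls exactly one base-audio frame worth of samples.
    std::vector<int16_t> popFrame(size_t frameSamples) {
        std::lock_guard<std::mutex> lk(mu_);
        size_t n = std::min(frameSamples, fifo_.size());
        std::vector<int16_t> frame(fifo_.begin(), fifo_.begin() + n);
        fifo_.erase(fifo_.begin(), fifo_.begin() + n);
        frame.resize(frameSamples, 0);   // pad with silence on underrun
        return frame;
    }

private:
    std::mutex mu_;
    std::deque<int16_t> fifo_;
    size_t capacity_;
};
```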
The decoding module, as the producer, produces the audio bare data PCM (initial audio) and writes it into the dynamically allocated audio buffer. The terminal parses the four parameters (sampling rate, bit depth, sampling format and channel count) of the base audio provided through the cloud provider player, compares them with the corresponding parameters of the audio bare data PCM (initial audio), and judges whether the audio in the buffer needs to be resampled; if so, the corresponding parameters are resampled before mixing, otherwise no processing is performed.
The mixing function of the mixing module is realized by mixing two or more audio streams together to form one audio stream. However, not just any two audio streams can be mixed directly. The two audio streams must meet the following conditions to be mixed (a validation sketch follows the list):
1) The audio formats are the same and need to be decompressed into a PCM format;
2) The sampling rate is the same, or is converted to the same sampling rate. Mainstream sampling rates include 44.1 kHz and 48 kHz, etc.;
3) The frame length is the same, the frame length is determined by the coding format, the PCM has no concept of the frame length, and 10ms is used as the frame length in order to keep the same with the frame length of the mainstream audio coding format;
4) The Bit Depth (Bit-Depth) or sampling Format (Sample Format) is the same, and the Bit number of carrying each sampling point data is the same;
5) The number of channels is the same and must be mono or binaural (stereo).
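A minimal validation sketch for these five conditions; the AudioStreamInfo field names are assumed for illustration:

```cpp
#include <string>

// Assumed descriptor for the target features of one PCM stream.
struct AudioStreamInfo {
    std::string format;   // e.g. "PCM"
    int sampleRate;       // e.g. 44100 or 48000
    int frameLengthMs;    // e.g. 10 ms
    int bitDepth;         // e.g. 16
    int channels;         // 1 (mono) or 2 (stereo)
};

// Two streams may be mixed only if all five conditions above hold.
static bool canMix(const AudioStreamInfo& a, const AudioStreamInfo& b) {
    return a.format == b.format &&
           a.sampleRate == b.sampleRate &&
           a.frameLengthMs == b.frameLengthMs &&
           a.bitDepth == b.bitDepth &&
           a.channels == b.channels;
}
```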
Optionally, the mixing module is implemented as follows: the terminal obtains the pcmBuffer of the base audio and the pcmBuffer length through the callback function onaudiobiccmbufferlistener of the cloud provider player. The pcmBuffer and the pcmBuffer length are passed to the mixing module, and at the same time the resampled audio gift audio data (target audio), of the same size as the pcmBuffer length, is also passed to the mixing module to await mixing.
When the mixing module performs the mixing processing, the amplitudes of the two audio streams represent the energy of the sound. The terminal adjusts the weights of the amplitudes of the target audio and the basic audio according to the mixing algorithm, that is, adjusts their volumes. Optionally, the volume of the cloud provider base audio data (basic audio) is kept unchanged while the volume of the accompanying gift audio (target audio) is adjusted, and the two are then mixed to obtain the mixed audio.
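A per-sample sketch of that weighting step for 16-bit PCM; the targetGain value of 0.6 and the saturation clamp are illustrative assumptions rather than values fixed by the method:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Mix one frame of base audio (kept at full volume) with the resampled gift
// audio (target audio) scaled by targetGain, clamping to the int16 range.
static std::vector<int16_t> mixFrame(const std::vector<int16_t>& base,
                                     const std::vector<int16_t>& target,
                                     float targetGain = 0.6f) {
    std::vector<int16_t> mixed(base.size());
    for (size_t i = 0; i < base.size(); ++i) {
        float t = (i < target.size()) ? target[i] * targetGain : 0.0f;
        float s = base[i] + t;                                 // weighted sum
        s = std::max(-32768.0f, std::min(32767.0f, s));        // saturate to int16
        mixed[i] = static_cast<int16_t>(s);
    }
    return mixed;
}
```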
The audio data (mixed audio) obtained after the mixing process is the audio to be played by the cloud provider player, which can perform processing such as echo cancellation and noise reduction; the separated video data (the video data of the audio gift) can be played by the self-built player.
In one embodiment, the audio processing method is implemented with the following threads: a decapsulation thread, an audio decoding thread, a resampling thread, an audio mixing thread and a rendering thread;
separating the virtual resource multimedia data packet through a decapsulation thread to obtain a first audio; the first audio is an audio data packet in the virtual resource multimedia data packet;
decoding the first audio through an audio decoding thread to obtain initial audio;
processing the initial audio through the resampling thread and the audio mixing thread to obtain mixed audio;
and rendering the mixed audio through a rendering thread.
In one embodiment, the decoding thread uses a hardware decoding toolbox (VideoToolBox or MediaCodec).
In one embodiment, decoding the first audio by the audio decoding thread to obtain the initial audio includes:
creating a decoding session by a first tool (VTDecompressionSessionCreate);
decoding the first audio in the decoding session by a second tool (VTDecompressionSessionDecodeFrame) to obtain the initial audio;
releasing the decoding session by a third tool (VTDecompressionSessionInvalidate).
In one embodiment, the terminal implements the audio processing method through multi-thread processing, and the processing flow of each module implementing the audio processing method is an independent thread, including the decapsulation thread, the resampling thread, the audio mixing thread, and the like. The decapsulation thread (demuxerThread) obtains the AudioPacket (first audio) and the VideoPacket (video data of the audio gift). The AudioPacket is delivered to the audio decoding thread for decoding, and the Frame (initial audio) obtained after decoding is delivered, after resampling and mixing, to the AudioRender rendering thread for rendering. The frame rate per second is calculated as: global frame number / global time, where global time = Max { rendering time, decapsulation time, audio resampling time, audio mixing time }. The per-second frame rate is an indicator for measuring the mixed audio.
It will be appreciated that most current mobile phones support hardware decoding; for example, the hardware decoding API currently open on iOS is VideoToolBox and the one open on Android is MediaCodec. Taking iOS as an example, a decoding session is created by VTDecompressionSessionCreate, the frame is then decoded by VTDecompressionSessionDecodeFrame to obtain a Frame, and finally the decoding session is released by VTDecompressionSessionInvalidate. By adopting the hardware decoding scheme, the decoding speed is improved, the CPU occupation is reduced, and the performance is improved.
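An abbreviated C++ sketch of those three VideoToolbox calls on iOS; how the CMVideoFormatDescriptionRef and CMSampleBufferRef are obtained, and what the output callback does with the decoded frame, are omitted assumptions:

```cpp
#include <VideoToolbox/VideoToolbox.h>

// Output callback invoked by VideoToolbox with each decoded image buffer.
static void onDecodedFrame(void* refCon, void* sourceFrameRefCon, OSStatus status,
                           VTDecodeInfoFlags flags, CVImageBufferRef imageBuffer,
                           CMTime pts, CMTime duration) {
    // Hand the decoded frame to the rendering thread (omitted).
}

static void decodeWithVideoToolbox(CMVideoFormatDescriptionRef formatDesc,
                                   CMSampleBufferRef sampleBuffer) {
    // 1) Create the decompression session.
    VTDecompressionOutputCallbackRecord cb = { onDecodedFrame, nullptr };
    VTDecompressionSessionRef session = nullptr;
    VTDecompressionSessionCreate(kCFAllocatorDefault, formatDesc,
                                 nullptr /*decoder spec*/, nullptr /*dest attrs*/,
                                 &cb, &session);

    // 2) Decode one frame inside the session.
    VTDecodeInfoFlags infoFlags = 0;
    VTDecompressionSessionDecodeFrame(session, sampleBuffer,
                                      kVTDecodeFrame_EnableAsynchronousDecompression,
                                      nullptr, &infoFlags);
    VTDecompressionSessionWaitForAsynchronousFrames(session);

    // 3) Release the session.
    VTDecompressionSessionInvalidate(session);
    CFRelease(session);
}
```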
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiments of the present application also provide an audio processing apparatus for implementing the above-mentioned audio processing method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the audio processing device provided below may be referred to the limitation of the audio processing method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 4, there is provided an audio processing apparatus including: an acquisition module 100, a separation module 200, a processing module 300, and a mixing module 400, wherein:
an acquisition module 100, configured to acquire a virtual resource multimedia data packet and a basic audio;
the separation module 200 is configured to separate the virtual resource multimedia data packet to obtain initial audio;
the processing module 300 is configured to process the initial audio according to the target feature of the basic audio to obtain a target audio;
and the mixing module 400 is configured to mix the target audio and the base audio to obtain mixed audio.
In one embodiment, a processing module includes: and the first processing module is used for resampling the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
In one embodiment, the apparatus further comprises: the building module is used for building a buffer area according to the initial audio frequency and placing the initial audio frequency into the buffer area; the size of the buffer is positively correlated with the size of the base audio; a processing module, comprising: and the first processing module is used for processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
In one embodiment, the size of the buffer of the device is an integer multiple of the base audio frame length.
In one embodiment, a mixing module includes: and the adjusting and mixing module is used for adjusting the volume of the target audio according to the volume of the basic audio and mixing the adjusted target audio with the basic audio to obtain mixed audio.
In one embodiment, the frame length of the initial audio of the device is the same as the frame length of the base audio.
The respective modules in the above-described audio processing apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an audio processing method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
obtaining a virtual resource multimedia data packet and basic audio;
separating the virtual resource multimedia data packet to obtain initial audio;
processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and mixing the target audio with the basic audio to obtain mixed audio.
In one embodiment, the target features implemented when the processor executes the computer program include audio format, sample rate, bit depth, and number of channels; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and resampling the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
In one embodiment, the processor when executing the computer program further performs the steps of: establishing a buffer area according to the initial audio, and placing the initial audio into the buffer area; the size of the buffer is positively correlated with the size of the base audio; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
In one embodiment, the size of the buffer implemented when the processor executes the computer program is an integer multiple of the base audio frame length.
In one embodiment, a method for mixing target audio and base audio to obtain mixed audio implemented when a processor executes a computer program includes: and adjusting the volume of the target audio according to the volume of the basic audio, and mixing the adjusted target audio with the basic audio to obtain mixed audio.
In one embodiment, the frame length of the initial audio implemented when the processor executes the computer program is the same as the frame length of the base audio.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining a virtual resource multimedia data packet and basic audio;
separating the virtual resource multimedia data packet to obtain initial audio;
processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and mixing the target audio with the basic audio to obtain mixed audio.
In one embodiment, the target features implemented when the computer program is executed by the processor include audio format, sample rate, bit depth, and number of channels; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and resampling the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
In one embodiment, the computer program when executed by the processor further performs the steps of: establishing a buffer area according to the initial audio, and placing the initial audio into the buffer area; the size of the buffer is positively correlated with the size of the base audio; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
In one embodiment, the size of the buffer implemented when the computer program is executed by the processor is an integer multiple of the base audio frame length.
In one embodiment, a method for mixing target audio and base audio to obtain mixed audio implemented when a computer program is executed by a processor includes: and adjusting the volume of the target audio according to the volume of the basic audio, and mixing the adjusted target audio with the basic audio to obtain mixed audio.
In one embodiment, the frame length of the initial audio implemented when the computer program is executed by the processor is the same as the frame length of the base audio.

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
obtaining a virtual resource multimedia data packet and basic audio;
separating the virtual resource multimedia data packet to obtain initial audio;
processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and mixing the target audio with the basic audio to obtain mixed audio.
In one embodiment, the target features implemented when the computer program is executed by the processor include audio format, sample rate, bit depth, and number of channels; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and resampling the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
In one embodiment, the computer program when executed by the processor further performs the steps of: establishing a buffer area according to the initial audio, and placing the initial audio into the buffer area; the size of the buffer is positively correlated with the size of the base audio; processing the initial audio according to the target characteristics of the basic audio to obtain target audio, including: and processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
In one embodiment, the size of the buffer implemented when the computer program is executed by the processor is an integer multiple of the base audio frame length.
In one embodiment, a method for mixing target audio and base audio to obtain mixed audio implemented when a computer program is executed by a processor includes: and adjusting the volume of the target audio according to the volume of the basic audio, and mixing the adjusted target audio with the basic audio to obtain mixed audio.
In one embodiment, the frame length of the initial audio implemented when the computer program is executed by the processor is the same as the frame length of the base audio.

It should be noted that the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the relevant laws, regulations and standards of the relevant countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of audio processing, the method comprising:
obtaining a virtual resource multimedia data packet and basic audio;
separating the virtual resource multimedia data packet to obtain initial audio;
processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and mixing the target audio and the basic audio to obtain mixed audio.
2. The method of claim 1, wherein the target features include audio format, sampling rate, bit depth, and channel count; the processing the initial audio according to the target characteristics of the basic audio to obtain target audio comprises the following steps:
and resampling the initial audio according to the audio format, the sampling rate, the bit depth and the channel number of the basic audio to obtain target audio.
3. The method according to claim 1, wherein the method further comprises:
establishing a buffer area according to the initial audio, and placing the initial audio into the buffer area; the size of the buffer is positively correlated with the size of the base audio;
the processing the initial audio according to the target characteristics of the basic audio to obtain target audio comprises the following steps:
and processing the initial audio in the buffer zone according to the target characteristics of the basic audio to obtain target audio.
4. A method according to claim 3, wherein the size of the buffer is an integer multiple of the base audio frame length.
5. The method of claim 1, wherein the mixing the target audio and the base audio to obtain mixed audio comprises:
and adjusting the volume of the target audio according to the volume of the basic audio, and mixing the adjusted target audio with the basic audio to obtain the mixed audio.
6. The method of claim 1, wherein a frame length of the initial audio is the same as a frame length of the base audio.
7. An audio processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring the virtual resource multimedia data packet and the basic audio;
the separation module is used for separating the virtual resource multimedia data packet to obtain initial audio;
the processing module is used for processing the initial audio according to the target characteristics of the basic audio to obtain target audio;
and the mixing module is used for mixing the target audio with the basic audio to obtain mixed audio.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202211576726.0A 2022-12-09 2022-12-09 Audio processing method, device, computer equipment and storage medium Pending CN116112736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211576726.0A CN116112736A (en) 2022-12-09 2022-12-09 Audio processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211576726.0A CN116112736A (en) 2022-12-09 2022-12-09 Audio processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116112736A true CN116112736A (en) 2023-05-12

Family

ID=86257068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211576726.0A Pending CN116112736A (en) 2022-12-09 2022-12-09 Audio processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116112736A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6714826B1 (en) * 2000-03-13 2004-03-30 International Business Machines Corporation Facility for simultaneously outputting both a mixed digital audio signal and an unmixed digital audio signal multiple concurrently received streams of digital audio data
CN102724584A (en) * 2012-06-18 2012-10-10 Tcl集团股份有限公司 Method and device for playing network videos online and smart television
KR101260017B1 (en) * 2012-01-30 2013-05-06 주식회사 금영 Audio playing apparatus for synchronizing between audio channels
CN107105096A (en) * 2017-04-28 2017-08-29 努比亚技术有限公司 A kind of audio-frequency processing method, terminal and storage medium
CN110430330A (en) * 2019-08-08 2019-11-08 北京云中融信网络科技有限公司 A kind of audio data processing method and device based on call
CN112735463A (en) * 2020-12-16 2021-04-30 杭州小伴熊科技有限公司 Audio playing delay AI correction method and device
CN113423018A (en) * 2021-08-24 2021-09-21 腾讯科技(深圳)有限公司 Game data processing method, device and storage medium

Similar Documents

Publication Publication Date Title
US8380333B2 (en) Methods, apparatuses and computer program products for facilitating efficient browsing and selection of media content and lowering computational load for processing audio data
CN110140170B (en) Distributed audio recording adapted for end user free viewpoint monitoring
WO2020037810A1 (en) Bluetooth-based audio transmission method and system, audio playing device and computer-readable storage medium
JP2014072894A (en) Camera driven audio spatialization
US9332370B2 (en) Method and apparatus for using spatial audio rendering for a parallel playback of call audio and multimedia content
CN110545887B (en) Streaming of augmented/virtual reality space audio/video
TW202127916A (en) Soundfield adaptation for virtual reality audio
US11429340B2 (en) Audio capture and rendering for extended reality experiences
US20210006976A1 (en) Privacy restrictions for audio rendering
TW202110201A (en) Timer-based access for audio streaming and rendering
CN114730564A (en) Priority-based sound field coding and decoding for virtual reality audio
CN114424587A (en) Controlling presentation of audio data
TW202110197A (en) Adapting audio streams for rendering
TW202107905A (en) Password-based authorization for audio rendering
JP2019514050A (en) Interactive audio metadata manipulation
WO2014160717A1 (en) Using single bitstream to produce tailored audio device mixes
US10727858B2 (en) Error resiliency for entropy coded audio data
WO2022262576A1 (en) Three-dimensional audio signal encoding method and apparatus, encoder, and system
CN116112736A (en) Audio processing method, device, computer equipment and storage medium
CN117837173A (en) Signal processing method and device for audio rendering and electronic equipment
CN116569255A (en) Vector field interpolation of multiple distributed streams for six degree of freedom applications
US20240119945A1 (en) Audio rendering system and method, and electronic device
US20240119946A1 (en) Audio rendering system and method and electronic device
CN115794022B (en) Audio output method, apparatus, device, storage medium, and program product
JPWO2020153092A1 (en) Information processing equipment and information processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination