CN106782576B - Audio mixing method and device - Google Patents


Info

Publication number
CN106782576B
Authority
CN
China
Prior art keywords
audio, audio data, data, length, decoded
Prior art date
Legal status
Active
Application number
CN201710081724.7A
Other languages
Chinese (zh)
Other versions
CN106782576A (en)
Inventor
朱煜鹏
黄曙光
刘显铭
顾思斌
杨伟东
潘柏宇
项青
Current Assignee
Alibaba China Co Ltd
Youku Network Technology Beijing Co Ltd
Original Assignee
Youku Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Youku Network Technology Beijing Co Ltd
Priority to CN201710081724.7A
Publication of CN106782576A
Application granted
Publication of CN106782576B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/04: Speech or audio signal analysis-synthesis techniques using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/167: Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L19/18: Vocoders using multiple modes
    • G10L19/20: Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure relates to an audio mixing method and apparatus. The method comprises the following steps: decoding a plurality of audio files respectively to obtain a plurality of pieces of decoded audio data; storing the decoded audio data in a buffer space; and obtaining audio data of a first data length from each piece of audio data in the buffer space and performing mixing processing to obtain mixed audio data. According to the embodiments of the present disclosure, by decoding the audio files separately, storing the decoded data in the buffer space, and taking audio data of the first data length from each piece of audio data for mixing, audio files of different formats can be mixed quickly.

Description

Audio mixing method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an audio mixing method and apparatus.
Background
With the rapid development of computer technology, audio mixing is applied ever more widely. A typical mixing method sends multiple audio streams to a decoder, obtains the original PCM audio data of each stream by decoding, superimposes the PCM data to obtain mixed data, and then either sends the result to an external device for rendering or writes it directly to an audio file. In the related art, however, only audio in the same format (for example, the MP3 format) can be mixed, and the mixing speed is not ideal.
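The superposition step described above can be sketched as follows, assuming 16-bit PCM samples; the function name and the clipping policy are illustrative choices, not taken from the patent:

```python
import array

def superpose_pcm(streams):
    """Mix equal-length 16-bit PCM sample arrays by addition, clipping to int16."""
    mixed = array.array('h', [0] * len(streams[0]))
    for stream in streams:
        for i, sample in enumerate(stream):
            # clamp the running sum to the signed 16-bit range to avoid overflow
            mixed[i] = max(-32768, min(32767, mixed[i] + sample))
    return mixed

a = array.array('h', [1000, -2000, 30000])
b = array.array('h', [500, -500, 10000])
print(list(superpose_pcm([a, b])))  # [1500, -2500, 32767] -- last sample clips
```

Clipping is only one possible overflow policy; real mixers may instead attenuate each stream before summing.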
Disclosure of Invention
In view of this, the present disclosure provides an audio mixing method and apparatus that enable fast mixing of audio in different formats.
According to an aspect of the present disclosure, there is provided an audio mixing method including:
decoding a plurality of audio files respectively to obtain a plurality of pieces of decoded audio data;
storing the decoded audio data in a buffer space; and
obtaining audio data of a first data length from each piece of audio data in the buffer space and performing mixing processing to obtain mixed audio data.
For the above method, in a possible implementation, obtaining audio data of a first data length from each piece of audio data in the buffer space for mixing processing includes:
when the data length of first audio data among the plurality of audio data in the buffer space is smaller than the first data length, obtaining all of the first audio data in the buffer space and obtaining audio data of the first data length from each of the other pieces of audio data, and performing mixing processing,
wherein the first audio data is any of the plurality of audio data.
For the above method, in one possible implementation manner, decoding each of the plurality of audio files includes:
and calling an audio decoder corresponding to the type of each audio file to decode the plurality of audio files respectively.
For the above method, in a possible implementation, obtaining audio data of a first data length from each piece of audio data in the buffer space for mixing processing includes:
when the data length of first audio data among the plurality of audio data in the buffer space is smaller than the first data length, obtaining decoded audio data from the audio decoder corresponding to the first audio data and storing it in the buffer space,
and, if no decoded audio data remains in the corresponding audio decoder, obtaining all of the first audio data in the buffer space and obtaining audio data of the first data length from each of the other pieces of audio data, and performing mixing processing,
wherein the first audio data is any of the plurality of audio data.
For the above method, in one possible implementation, the first data length is obtained by the following formula:
first data length = audio sampling rate of the audio playing device × number of channels of the audio playing device × audio sample length (in bytes) supported by the audio playing device / frame rate
For the above method, in one possible implementation, the decoded plurality of audio data are PCM audio data.
For the above method, in one possible implementation, the method further includes:
and acquiring the time stamp of the audio data with the first data length.
For the above method, in one possible implementation, the time stamp of the audio data of the first data length is obtained by the following formula:
time stamp of the audio data of the first data length = length of the already-mixed audio data / (first data length × frame rate)
According to another aspect of the present disclosure, there is provided an audio mixing apparatus including:
a decoding module configured to decode a plurality of audio files respectively to obtain a plurality of pieces of decoded audio data;
a storage module configured to store the decoded audio data in a buffer space; and
a mixing module configured to obtain audio data of a first data length from each piece of audio data in the buffer space and perform mixing processing to obtain mixed audio data.
For the above apparatus, in a possible implementation manner, the mixing module includes:
a first mixing sub-module configured to, when the data length of first audio data among the plurality of audio data in the buffer space is smaller than the first data length, obtain all of the first audio data in the buffer space and obtain audio data of the first data length from each of the other pieces of audio data, and perform mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
For the above apparatus, in one possible implementation manner, the decoding module includes:
and the decoding calling submodule is used for calling an audio decoder corresponding to the type of each audio file to decode the plurality of audio files respectively.
For the above apparatus, in a possible implementation manner, the mixing module includes:
a data obtaining sub-module configured to, when the data length of first audio data among the plurality of audio data in the buffer space is smaller than the first data length, obtain decoded audio data from the audio decoder corresponding to the first audio data and store it in the buffer space; and
a second mixing sub-module configured to, if no decoded audio data remains in the corresponding audio decoder, obtain all of the first audio data in the buffer space and obtain audio data of the first data length from each of the other pieces of audio data, and perform mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
For the above apparatus, in one possible implementation, the first data length is obtained by the following formula:
first data length = audio sampling rate of the audio playing device × number of channels of the audio playing device × audio sample length (in bytes) supported by the audio playing device / frame rate
For the above apparatus, in one possible implementation manner, the decoded plurality of audio data are PCM audio data.
For the above apparatus, in one possible implementation manner, the apparatus further includes:
and the time stamp obtaining module is used for obtaining the time stamp of the audio data with the first data length.
For the above apparatus, in one possible implementation, the time stamp of the audio data of the first data length is obtained by the following formula:
time stamp of the audio data of the first data length = length of the already-mixed audio data / (first data length × frame rate)
According to another aspect of the present disclosure, there is provided an audio mixing apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
decoding a plurality of audio files respectively to obtain a plurality of pieces of decoded audio data;
storing the decoded audio data in a buffer space; and
obtaining audio data of a first data length from each piece of audio data in the buffer space and performing mixing processing to obtain mixed audio data.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions stored thereon, which, when executed by a processor of a terminal and/or a server, enable the terminal and/or the server to perform an audio mixing method, the method including:
decoding a plurality of audio files respectively to obtain a plurality of pieces of decoded audio data;
storing the decoded audio data in a buffer space; and
obtaining audio data of a first data length from each piece of audio data in the buffer space and performing mixing processing to obtain mixed audio data.
According to the audio mixing method and apparatus of the embodiments of the present disclosure, a plurality of audio files can be decoded respectively and stored in the buffer space, and audio data of a first data length can be obtained from each piece of audio data for mixing to obtain mixed audio data, so that audio files of different formats can be mixed quickly.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating an audio mixing method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating an audio mixing method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating an audio mixing method according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating a step S13 of an audio mixing method according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an audio mixing apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating an audio mixing apparatus according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating an audio mixing apparatus according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating an audio mixing apparatus according to an exemplary embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Example 1
Fig. 1 is a flowchart illustrating an audio mixing method according to an exemplary embodiment. The method can be applied to terminal equipment (such as a smart phone, a computer and the like) or a server. As shown in fig. 1, an audio mixing method according to an embodiment of the present disclosure includes:
step S11, decoding a plurality of audio files respectively to obtain a plurality of pieces of decoded audio data;
step S12, storing the decoded audio data in a buffer space;
step S13, obtaining audio data of a first data length from each piece of audio data in the buffer space and performing mixing processing to obtain mixed audio data.
According to the embodiments of the present disclosure, a plurality of audio files can be decoded respectively and stored in the buffer space, and audio data of a first data length can be obtained from each piece of audio data for mixing to obtain mixed audio data. This reduces the write overhead during mixing and allows audio files of different formats to be mixed quickly.
For example, for a plurality of audio files of different formats, such as mp3, aac, and 3gpp files, corresponding audio decoders may be invoked respectively to decode them, thereby obtaining a plurality of pieces of audio data. Each piece of audio data may also be preprocessed, for example by adjusting parameters (such as volume) of the different audio streams; the decoded audio data is obtained after this preprocessing. The present disclosure does not limit the specific types of audio files and their corresponding audio decoders, or the specific manner of preprocessing the audio data.
In one possible implementation, the decoded plurality of audio data may be PCM audio data. In the terminal device or the server, a buffer space may be created for each of the plurality of audio data, and the plurality of decoded audio data may be stored in the buffer space, respectively. The buffer space can store audio data (for example, PCM audio data) with a certain length, and the length of the buffer space can be customized according to the actual environment and different devices.
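A per-stream buffer of this kind might be sketched roughly as below. The class name, the fixed capacity, and the store/take interface are all illustrative assumptions; the patent does not specify a data structure:

```python
class PcmBuffer:
    """Fixed-capacity byte buffer holding decoded PCM data for one audio stream."""

    def __init__(self, capacity):
        self.capacity = capacity   # e.g. enough bytes for 20 audio frames
        self.data = bytearray()

    def free_space(self):
        return self.capacity - len(self.data)

    def store(self, pcm_bytes):
        """Store what fits; return the remainder that did not fit."""
        room = self.free_space()
        self.data.extend(pcm_bytes[:room])
        return pcm_bytes[room:]

    def take(self, n):
        """Remove and return up to n bytes from the front of the buffer."""
        chunk = bytes(self.data[:n])
        del self.data[:n]
        return chunk

buf = PcmBuffer(10)
leftover = buf.store(b'abcdefghijkl')
print(leftover)     # b'kl' -- did not fit in the 10-byte buffer
print(buf.take(4))  # b'abcd'
```

Returning the unbuffered remainder from `store` lets the caller decide when to resume decoding, matching the idea that the buffer length is tuned per device.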
In one possible implementation, audio data of a first data length may be obtained from each piece of audio data in the buffer space for mixing processing. The first data length may be the length of one audio frame, set by default or by the user according to actual conditions; for example, it may be set to 4096 bytes, 8192 bytes, and so on. Because audio files of different formats and different types of decoders yield PCM data of different lengths per decode, the decoded pieces of audio data (PCM data) may differ in length. The audio data in the buffer space may therefore be length-aligned in units of one audio frame, for example starting from the first audio frame of each piece of audio data. Audio data of the first data length (one audio frame) may then be obtained from each aligned piece of audio data and superimposed to obtain mixed audio data. Each audio stream is queried and processed in turn: one audio frame of data is taken from each stream's buffer space for superposition and mixing, and if the remaining data of a stream is less than one frame, all of its remaining data is taken for superposition and mixing.
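The frame-aligned mixing loop just described might look roughly as follows, assuming 16-bit PCM. The function name and buffer representation are illustrative; a stream whose buffer holds less than one frame contributes whatever it has left, as the text states:

```python
import array

def mix_one_frame(buffers, frame_len):
    """Take up to frame_len bytes from each stream's buffer and superimpose them."""
    chunks = []
    for buf in buffers:                       # query each audio stream in turn
        n = min(frame_len, len(buf))          # less than one frame left? take it all
        chunks.append(bytes(buf[:n]))
        del buf[:n]
    longest = max(len(c) for c in chunks)
    mixed = array.array('h', bytes(longest))  # zero-filled 16-bit samples
    for chunk in chunks:
        samples = array.array('h')
        samples.frombytes(chunk)
        for i, s in enumerate(samples):
            mixed[i] = max(-32768, min(32767, mixed[i] + s))  # clip to int16
    return mixed.tobytes()

s1 = bytearray(array.array('h', [100, 200, 300, 400]).tobytes())  # 8 bytes buffered
s2 = bytearray(array.array('h', [10, 20]).tobytes())              # only 4 bytes left
frame = mix_one_frame([s1, s2], 8)
print(array.array('h', frame).tolist())  # [110, 220, 300, 400]
```

The short stream only contributes to the first two samples of the frame; the rest of the frame carries the longer stream unchanged.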
In one possible implementation, the first data length may be obtained by the following formula:
the first data length is equal to an audio sampling rate of the audio playing device, multiplied by the number of channels of the audio playing device, multiplied by an audio sample length/frame rate supported by the audio playing device.
The audio sampling rate of a common audio playing device (e.g., a terminal device) is 22050 Hz, 32000 Hz, 44100 Hz, 48000 Hz, or the like. The number of sound channels is generally two, the audio sample length supported by the device is generally 8-bit, 16-bit, or 32-bit PCM data, and the frame rate (the number of audio frames played per unit time) can be customized. For a terminal device, for example, the audio sampling rate may be 44100 Hz (a sampling rate with good portability across platforms), the number of channels may be two, and the audio sample length may be 16 bits. With these commonly used values and a frame rate defined as, say, 20, a recommended first data length (audio frame length) is 8820 bytes. Accordingly, the buffer space may store audio data (PCM audio data) for multiple audio frames; for example, it may store PCM audio data for 20 audio frames.
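The recommended figure can be checked directly from the formula with the example values above; all values come from the text, and only the variable names are invented:

```python
sample_rate = 44100          # Hz, a portable cross-platform sampling rate
channels = 2                 # two-channel (stereo)
bytes_per_sample = 16 // 8   # 16-bit PCM samples, expressed in bytes
frame_rate = 20              # audio frames per second, user-defined

first_data_length = sample_rate * channels * bytes_per_sample // frame_rate
print(first_data_length)     # 8820 bytes, the recommended audio frame length

buffer_capacity = 20 * first_data_length  # buffer sized for 20 audio frames
print(buffer_capacity)       # 176400 bytes
```

Note the sample length enters the formula in bytes, not bits; 16-bit samples contribute a factor of 2.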
By the method, the writing overhead in the audio data mixing process can be reduced, and the audio files with different formats can be quickly mixed.
In one possible implementation, the method further includes: obtaining a time stamp for the audio data of the first data length.
For example, the mixed audio data may be subjected to post-mixing processing, such as removing plosives, and parameters of the mixed audio data may be calculated, such as a time stamp, to ensure that the mixed audio data can be played and exported normally. Because the audio data is rearranged and aligned during the decoding, storing, and mixing of the audio files in steps S11 to S13, a time stamp must be calculated for the audio data of the first data length (one audio frame) in some cases. The present disclosure does not limit the post-mixing processing or the specific types of parameters of the mixed audio data.
In one possible implementation, the time stamp of the audio data of the first data length may be obtained by the following formula:
time stamp of the audio data of the first data length = length of the mixed audio data / (first data length × frame rate)
Here, the time stamp of the audio data of the first data length may represent a time stamp of an audio frame currently being processed, the length of the mixed audio data may represent a total length of the audio data that has been mixed before the audio frame currently being processed, and the frame rate may be a frame rate in the formula of the first data length. After calculation, the corresponding time stamp may be added to the audio frame currently being processed. In this way, a time stamp can be calculated for each of the plurality of audio frames, so that all the mixed audio data have a corresponding time stamp, thereby facilitating subsequent processing (e.g., playing or generating a file).
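Applied with the example figures used earlier (8820-byte frames at 20 frames per second), the formula yields timestamps in seconds; the function name and units are illustrative assumptions:

```python
def frame_timestamp(mixed_length, first_data_length, frame_rate):
    """Timestamp (in seconds) of the audio frame currently being processed.

    mixed_length is the total byte length of the audio data already mixed
    before this frame, per the formula in the text.
    """
    return mixed_length / (first_data_length * frame_rate)

# After 40 frames of 8820 bytes have been mixed, the next frame starts at 2.0 s.
print(frame_timestamp(40 * 8820, 8820, 20))  # 2.0
```

The units work out because first data length × frame rate is the mixed output's byte rate, so dividing the bytes already produced by it gives elapsed playback time.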
In a possible implementation, the mixed audio data may be rendered and played, or exported to generate a mixed audio file for the user to store or play.
By the method, the parameters of the mixed audio data can be calculated and added, so that the subsequent processing is facilitated, and the practicability of the mixed audio data is improved.
Fig. 2 is a flowchart illustrating an audio mixing method according to an exemplary embodiment. As shown in fig. 2, in one possible implementation, step S13 includes:
step S131, in the case that the data length of the first audio data in the plurality of audio data in the buffer space is smaller than the first data length, acquiring all the audio data of the first audio data in the buffer space, and acquiring the audio data of the first data length from each of the audio data except the first audio data, performing mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
For example, when there is free space in the buffer, the audio files may be continuously decoded and preprocessed, and the resulting decoded audio data stored in the buffer space. Each piece of audio data in the buffer space may be queried and processed in turn, with the audio data of one audio frame (audio data of the first data length) taken from each piece for mixing. If the remaining data of one or several pieces of audio data (the first audio data) is less than one audio frame, the audio file corresponding to that first audio data can be considered fully decoded. At this point, all of the first audio data in the buffer space may be obtained, and audio data of the first data length (one audio frame) may be obtained from each of the other pieces of audio data, for mixing processing. The first audio data may be any one or several of the plurality of audio data in the buffer space.
By the method, when the residual data of the first audio data is less than one audio frame, the residual data and other audio data can be mixed, and the efficiency and flexibility of mixing processing are improved.
Fig. 3 is a flowchart illustrating an audio mixing method according to an exemplary embodiment. As shown in fig. 3, in one possible implementation, step S11 includes:
step S111, calling an audio decoder corresponding to the type of each audio file to decode the plurality of audio files respectively.
For example, for different types of audio files, an audio decoder corresponding to the type of each audio file is required for decoding. In this way, when a plurality of audio files are decoded respectively, an audio decoder corresponding to the type of each audio file can be called from the system to decode the plurality of audio files respectively. The present disclosure is not limited to a particular type of audio file and corresponding audio decoder.
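One common way to realize per-type decoder selection is a dispatch table keyed on file type. The `decode_*` functions below are placeholders for whatever codec library the platform supplies; only the dispatch pattern itself is illustrated, not any real decoder API:

```python
import os

def decode_mp3(path):   # placeholder for a real MP3 decoder
    return b''

def decode_aac(path):   # placeholder for a real AAC/3GPP decoder
    return b''

# map file extension -> decoder routine (illustrative type set)
DECODERS = {'.mp3': decode_mp3, '.aac': decode_aac, '.3gp': decode_aac}

def decode_file(path):
    """Invoke the audio decoder corresponding to the file's type."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in DECODERS:
        raise ValueError(f'no decoder registered for {ext!r} files')
    return DECODERS[ext](path)

print(decode_file('song.mp3'))  # b'' from the placeholder decoder
```

On a real platform the table entries would wrap the system's codec API, so adding a format means registering one more entry rather than changing the mixing code.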
By the method, the corresponding audio decoder can be called to decode the audio files respectively, so that the decoding efficiency is improved, and the audio mixing among different types of audio files can be realized conveniently.
Fig. 4 is a flowchart illustrating a step S13 of an audio mixing method according to an exemplary embodiment. As shown in fig. 4, in one possible implementation, step S13 includes:
step S132, when the data length of the first audio data in the plurality of audio data in the buffer space is smaller than the first data length, the decoded audio data is obtained from the audio decoder corresponding to the first audio data and stored in the buffer space,
step S133, if there is no decoded audio data in the corresponding audio decoder, acquiring all audio data of the first audio data in the buffer space, and acquiring audio data of a first data length from each of the audio data other than the first audio data, performing mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
For example, a plurality of audio files may be decoded and preprocessed, respectively, to obtain a plurality of decoded audio data, and the audio data is stored in the buffer space. For a plurality of audio data in the buffer space, each audio data may be queried and processed in turn, and the audio data of an audio frame (audio data of a first data length) may be taken out from each audio data for mixing. If the remaining data of one or several audio data (first audio data) is less than one audio frame, the audio decoder corresponding to the first audio data can be queried. If the decoded audio data exists in the corresponding audio decoder, the decoded and preprocessed audio data can be obtained and stored in the buffer space. Then, a normal mixing procedure may be performed, that is, audio data of the first data length is obtained from each of the plurality of audio data to perform mixing processing.
In one possible implementation, if the decoded audio data is not already present in the audio decoder corresponding to the first audio data, the audio file corresponding to the first audio data may be considered to have been completely decoded. At this time, all audio data of the first audio data in the buffer space may be acquired, and audio data (one audio frame) of the first data length may be acquired from each of the audio data other than the first audio data, to perform mixing processing. Wherein the first audio data may be any one or several of the plurality of audio data in the buffer space.
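Steps S132 and S133 can be sketched as follows; `decoder_output` here is a stand-in for the decoder's queue of decoded PCM chunks, not the patent's actual interface, and the function name is invented:

```python
def refill_stream(buf, frame_len, decoder_output):
    """Top up a stream's buffer when it holds less than one audio frame.

    Returns True if the stream is finished: the decoder had no more decoded
    data and the buffer still holds less than frame_len bytes, so whatever
    remains should be mixed as-is (step S133).
    """
    while len(buf) < frame_len and decoder_output:
        buf.extend(decoder_output.pop(0))   # step S132: store newly decoded data
    return len(buf) < frame_len

buf = bytearray(b'\x01\x02')                # less than one 4-byte frame buffered
pending = [b'\x03\x04', b'\x05\x06']
print(refill_stream(buf, 4, pending))       # False -- refilled to a full frame
print(len(buf))                             # 4
```

When the function returns True, the caller falls back to the partial-frame mixing path; otherwise the normal fixed-length mixing procedure continues.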
By the method, the decoded data can be acquired from the decoder when the residual data of the first audio data is less than one audio frame, and the residual data of the first audio data and other audio data are mixed when the decoded audio data does not exist in the decoder, so that the efficiency and flexibility of mixing processing are improved.
Example 2
Fig. 5 is a block diagram illustrating an audio mixing apparatus according to an exemplary embodiment. As shown in fig. 5, the audio mixing apparatus includes: a decoding module 51, a storage module 52 and a mixing module 53.
A decoding module 51, configured to decode the multiple audio files respectively to obtain multiple decoded audio data;
a storage module 52, configured to store the decoded multiple audio data into a buffer space;
the audio mixing module 53 is configured to obtain audio data with a first data length from each of the plurality of audio data in the buffer space, perform audio mixing processing, and obtain audio data that has been subjected to audio mixing.
Fig. 6 is a block diagram illustrating an audio mixing apparatus according to an exemplary embodiment. As shown in fig. 6, in a possible implementation manner, the mixing module 53 includes:
a first mixing sub-module 531 for acquiring all audio data of the first audio data in the buffer space and acquiring audio data of a first data length from each of the audio data other than the first audio data to perform mixing processing, in a case where the data length of the first audio data among the plurality of audio data in the buffer space is smaller than the first data length,
wherein the first audio data is any audio data of the plurality of audio data.
As shown in fig. 6, in one possible implementation, the decoding module 51 includes:
the decoding calling sub-module 511 is configured to call an audio decoder corresponding to the type of each audio file to decode the plurality of audio files, respectively.
As shown in fig. 6, in a possible implementation manner, the mixing module 53 includes:
a data obtaining sub-module 532, configured to, in a case that a data length of first audio data in the plurality of audio data in the buffer space is smaller than a first data length, obtain decoded audio data from an audio decoder corresponding to the first audio data and store the decoded audio data in the buffer space;
a second mixing sub-module 533 configured to, if there is no decoded audio data in the corresponding audio decoder, acquire all audio data of the first audio data in the buffer space, and acquire audio data of a first data length from each of the audio data other than the first audio data, perform mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
In one possible implementation, the first data length is obtained by the following formula:
first data length = audio sampling rate of the audio playing device × number of channels of the audio playing device × audio sample length (in bytes) supported by the audio playing device / frame rate
In one possible implementation, the plurality of decoded audio data are PCM (pulse-code modulation) audio data.
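For PCM data, the mixing step itself can be as simple as sample-wise summation with clipping. The following pure-Python sketch is an assumption — the patent does not specify the mixing formula:

```python
def mix_pcm16(streams):
    """Sum equal-length lists of 16-bit PCM samples, clamping to int16 range.

    Sketch only: plain sample-wise summation with clipping is assumed here;
    real mixers often attenuate or normalize instead to avoid distortion.
    """
    mixed = []
    for samples in zip(*streams):            # one sample per stream, in lockstep
        s = sum(samples)                     # Python ints cannot overflow
        mixed.append(max(-32768, min(32767, s)))
    return mixed

print(mix_pcm16([[1000, 30000, -30000], [2000, 10000, -10000]]))
# [3000, 32767, -32768]
```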
In one possible implementation, the apparatus further includes a time stamp obtaining module for obtaining the time stamp of the audio data of the first data length.
In one possible implementation, the time stamp of the audio data of the first data length is obtained by the following formula:
time stamp = length of the mixed audio data / (first data length × frame rate).
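A minimal sketch of this timestamp formula, assuming the first data length is in bytes so that first data length × frame rate is the byte rate of the mixed output (function and parameter names are hypothetical):

```python
def mixed_timestamp_seconds(mixed_length_bytes, first_data_length, frame_rate):
    """Timestamp of the next mixed frame, per the patent's formula.

    mixed_length_bytes / (bytes per frame * frames per second) = seconds.
    """
    return mixed_length_bytes / (first_data_length * frame_rate)

# After mixing 10 frames of 5880 bytes each at 30 frames per second:
print(mixed_timestamp_seconds(10 * 5880, 5880, 30))  # ~0.333 s
```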
According to the embodiments of the present disclosure, a plurality of audio files can be decoded respectively and stored in the buffer space, and audio data of the first data length can be acquired from each audio data for mixing processing to obtain mixed audio data, so that audio files of different formats can be mixed quickly.
Example 3
Fig. 7 is a block diagram illustrating an audio mixing apparatus 800 according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the device 800 to perform the above-described method.
Fig. 8 is a block diagram illustrating an audio mixing apparatus 1900 according to an exemplary embodiment. For example, the apparatus 1900 may be provided as a server. Referring to FIG. 8, the device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the audio mixing method described above.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 1932 that includes instructions, which are executable by the processing component 1922 of the apparatus 1900 to perform the above-described method.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

1. An audio mixing method, comprising:
respectively decoding the plurality of audio files to obtain a plurality of decoded audio data;
storing the decoded plurality of audio data into a buffer space;
obtaining audio data of a first data length from each of the plurality of audio data in the buffer space and performing audio mixing processing to obtain mixed audio data,
wherein the first data length is determined according to an audio playing apparatus playing the mixed audio data,
the first data length is obtained by the following formula:
the first data length = (an audio sampling rate of the audio playback device × the number of channels of the audio playback device × an audio sample length supported by the audio playback device) / frame rate.
2. The method of claim 1, wherein obtaining audio data of a first data length from each of the plurality of audio data in the buffer space for mixing processing comprises:
in a case where a data length of first audio data of the plurality of audio data in the buffer space is smaller than the first data length, acquiring all audio data of the first audio data in the buffer space, acquiring audio data of the first data length from each of the audio data other than the first audio data, and performing mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
3. The method of claim 1, wherein decoding each of the plurality of audio files comprises:
calling an audio decoder corresponding to the type of each audio file to decode the plurality of audio files respectively.
4. The method of claim 3, wherein obtaining audio data of a first data length from each of the plurality of audio data in the buffer space for mixing processing comprises:
in a case where a data length of first audio data among the plurality of audio data in the buffer space is smaller than the first data length, acquiring decoded audio data from an audio decoder corresponding to the first audio data and storing the decoded audio data in the buffer space, and
if no decoded audio data exists in the corresponding audio decoder, acquiring all audio data of the first audio data in the buffer space, acquiring audio data of the first data length from each of the audio data other than the first audio data, and performing mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
5. The method of any of claims 1-4, wherein the decoded plurality of audio data are PCM audio data.
6. The method of claim 1, further comprising:
acquiring the time stamp of the audio data of the first data length.
7. The method according to claim 6, wherein the time stamp of the audio data of the first data length is obtained by the following formula:
the time stamp of the audio data of the first data length = the length of the mixed audio data/(the first data length × the frame rate).
8. An audio mixing apparatus, comprising:
the decoding module is used for respectively decoding the plurality of audio files to obtain a plurality of decoded audio data;
the storage module is used for storing the decoded audio data into a cache space;
a sound mixing module, configured to obtain audio data with a first data length from each of the plurality of audio data in the buffer space, perform sound mixing processing to obtain sound-mixed audio data,
wherein the first data length is determined according to an audio playing apparatus playing the mixed audio data,
the first data length is obtained by the following formula:
the first data length = (an audio sampling rate of the audio playback device × the number of channels of the audio playback device × an audio sample length supported by the audio playback device) / frame rate.
9. The apparatus of claim 8, wherein the mixing module comprises:
a first mixing sub-module, configured to, in a case where a data length of first audio data of the plurality of audio data in the buffer space is smaller than the first data length, acquire all audio data of the first audio data in the buffer space, acquire audio data of the first data length from each of the audio data other than the first audio data, and perform mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
10. The apparatus of claim 8, wherein the decoding module comprises:
a decoding calling sub-module, configured to call an audio decoder corresponding to the type of each audio file to decode the plurality of audio files respectively.
11. The apparatus of claim 10, wherein the mixing module comprises:
a data acquisition sub-module, configured to, in a case where the data length of first audio data among the plurality of audio data in the buffer space is smaller than the first data length, acquire decoded audio data from an audio decoder corresponding to the first audio data and store the decoded audio data in the buffer space; and
a second mixing sub-module, configured to, if no decoded audio data exists in the corresponding audio decoder, acquire all audio data of the first audio data in the buffer space, acquire audio data of the first data length from each of the audio data other than the first audio data, and perform mixing processing,
wherein the first audio data is any audio data of the plurality of audio data.
12. The apparatus according to any of claims 8-11, wherein the decoded plurality of audio data are PCM audio data.
13. The apparatus of claim 8, further comprising:
a time stamp obtaining module, configured to obtain the time stamp of the audio data of the first data length.
14. The apparatus of claim 13, wherein the time stamp of the audio data of the first data length is obtained by the following formula:
the time stamp of the audio data of the first data length = the length of the mixed audio data/(first data length × frame rate).
15. An audio mixing apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
respectively decoding the plurality of audio files to obtain a plurality of decoded audio data;
storing the decoded plurality of audio data into a buffer space;
obtaining audio data of a first data length from each of the plurality of audio data in the buffer space and performing audio mixing processing to obtain mixed audio data,
wherein the first data length is determined according to an audio playing apparatus playing the mixed audio data,
the first data length is obtained by the following formula:
the first data length = (an audio sampling rate of the audio playback device × the number of channels of the audio playback device × an audio sample length supported by the audio playback device) / frame rate.
CN201710081724.7A 2017-02-15 2017-02-15 Audio mixing method and device Active CN106782576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710081724.7A CN106782576B (en) 2017-02-15 2017-02-15 Audio mixing method and device

Publications (2)

Publication Number Publication Date
CN106782576A CN106782576A (en) 2017-05-31
CN106782576B true CN106782576B (en) 2020-05-22

Family

ID=58957328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710081724.7A Active CN106782576B (en) 2017-02-15 2017-02-15 Audio mixing method and device

Country Status (1)

Country Link
CN (1) CN106782576B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112863474A (en) * 2017-09-26 2021-05-28 华为技术有限公司 Real-time digital audio signal sound mixing method and device
CN108881957A (en) * 2017-11-02 2018-11-23 北京视联动力国际信息技术有限公司 A kind of mixed method and device of multimedia file
CN113658602A (en) * 2021-08-16 2021-11-16 广州大彩光电科技有限公司 Real-time sound mixing method and device
CN115250367B (en) * 2021-11-12 2024-05-28 稿定(厦门)科技有限公司 Method and device for mixing multimedia files

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1614590A (en) * 2004-11-16 2005-05-11 萧学文 Method and system for realizing sound stream playing based on BREW platform
CN101697644A (en) * 2009-10-29 2010-04-21 青岛海信移动通信技术股份有限公司 Mixed sound output method and mixed sound output related device of mobile terminal
CN102045461A (en) * 2009-10-09 2011-05-04 杭州华三通信技术有限公司 Sound mixing method and device for voice signal
CN102572588A (en) * 2011-12-14 2012-07-11 中兴通讯股份有限公司 Method and device for realizing audio mixing of set-top box
CN103578514A (en) * 2012-08-01 2014-02-12 北大方正集团有限公司 Song playing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070260780A1 (en) * 2006-04-11 2007-11-08 Nokia Corporation Media subsystem, method and computer program product for adaptive media buffering


Also Published As

Publication number Publication date
CN106782576A (en) 2017-05-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee after: Youku network technology (Beijing) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: 1VERGE INTERNET TECHNOLOGY (BEIJING) Co.,Ltd.

CP01 Change in the name or title of a patent holder
TR01 Transfer of patent right

Effective date of registration: 20200526

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: Youku network technology (Beijing) Co.,Ltd.

TR01 Transfer of patent right