CN108932948B

CN108932948B - Audio data processing method and device, computer equipment and computer readable storage medium

Info

Publication number: CN108932948B
Application number: CN201710386977.5A
Authority: CN
Inventors: 赵晓强; 罗程; 李斌
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-05-26
Filing date: 2017-05-26
Publication date: 2021-12-14
Anticipated expiration: 2037-05-26
Also published as: CN108932948A

Abstract

The invention relates to an audio data processing method, an audio data processing apparatus, a computer device and a storage medium. The audio data processing method comprises: acquiring an audio data stream through a unified interface, and acquiring a current audio communication state corresponding to the audio data stream; acquiring, from a unified coding module, a target coding algorithm matched with the current audio communication state, and encoding the audio data stream with the target coding algorithm to obtain encoded audio data; and executing the following steps by a unified processing module: determining a matched target audio processing mode from selectable audio processing modes according to the current audio communication state, wherein the selectable audio processing modes comprise an audio file mode and a real-time audio frame mode, and processing the encoded audio data according to the target audio processing mode to obtain audio data to be sent; and determining a target network channel from selectable network channels according to the current audio communication state, and sending the audio data to be sent through the target network channel. The method reduces cost and improves efficiency.

Description

Audio data processing method and device, computer equipment and computer readable storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to an audio data processing method, an audio data processing apparatus, a computer device, and a computer-readable storage medium.

Background

With the development of computer technology, applications that combine network and multimedia technologies have become increasingly common in daily life. To communicate and interact, users can input voice, music and the like through a microphone to hold instant communication sessions and for entertainment, work and study.

Conventional audio applications implement asynchronous audio messages and real-time audio communication with two different voice framework schemes. Because several sets of code then access the system audio resources at the same time, management is complex and the implementation cost is high.

Disclosure of Invention

In view of the foregoing, it is necessary to provide an audio data processing method, an apparatus, a computer device and a computer-readable storage medium that implement asynchronous audio and real-time audio with a unified technical architecture, so as to reduce cost and improve the efficiency of audio resource management.

A method of audio data processing, the method comprising:

acquiring an audio data stream through a unified interface, and acquiring a current audio communication state corresponding to the audio data stream;

acquiring, from a unified coding module, a target coding algorithm matched with the current audio communication state, and encoding the audio data stream with the target coding algorithm to obtain encoded audio data;

executing the following steps by a unified processing module:

determining a matched target audio processing mode from selectable audio processing modes according to the current audio communication state, wherein the selectable audio processing modes comprise an audio file mode and a real-time audio frame mode, and processing the encoded audio data according to the target audio processing mode to obtain audio data to be sent;

and determining a target network channel from selectable network channels according to the current audio communication state, and sending the audio data to be sent through the target network channel.

An audio data processing apparatus, the apparatus comprising:

an acquisition module, configured to acquire an audio data stream through a unified interface and acquire a current audio communication state corresponding to the audio data stream;

a unified coding module, configured to acquire a target coding algorithm matched with the current audio communication state and encode the audio data stream with the target coding algorithm to obtain encoded audio data;

a unified processing module, comprising:

a packing unit, configured to determine a matched target audio processing mode from selectable audio processing modes according to the current audio communication state, where the selectable audio processing modes include an audio file mode and a real-time audio frame mode, and process the encoded audio data according to the target audio processing mode to obtain audio data to be sent;

and a transmission unit, configured to determine a target network channel from the selectable network channels according to the current audio communication state and send the audio data to be sent through the target network channel.

A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions, which, when executed by the processor, cause the processor to perform the steps of the audio data processing method of any one of the above embodiments.

A computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, cause the processor to perform the steps of the audio data processing method of any one of the above embodiments.

According to the audio data processing method, apparatus, computer device and computer-readable storage medium, an audio data stream and its current audio communication state are obtained through a unified interface; a target coding algorithm matched with that state is obtained from a unified coding module and used to encode the stream into encoded audio data; and a unified processing module determines, according to the same state, a target audio processing mode (the audio file mode or the real-time audio frame mode) that turns the encoded audio data into audio data to be sent, and a target network channel through which that data is sent. Because acquisition, encoding, packing and transmission in every audio communication state go through the same interface and modules, audio data in different audio communication states is managed and dispatched uniformly. This unifies the architecture, makes audio resource management more convenient, allows audio data in different audio communication states to be processed with a single set of code, and greatly reduces the cost of integrating heterogeneous voice systems.

Drawings

FIG. 1 is a diagram of an application environment of an audio data processing method in one embodiment;

FIG. 2 is a diagram illustrating an internal structure of the terminal of FIG. 1 according to one embodiment;

FIG. 3 is a flow diagram of a method for audio data processing in one embodiment;

FIG. 4 is a flowchart of an audio data processing method in another embodiment;

FIG. 5 is a flow diagram of adjusting the audio state when the audio communication state is switched, in one embodiment;

FIG. 6 is a flow diagram of processing and transmission according to an audio communication state in one embodiment;

FIG. 7 is a schematic diagram of an audio communication interface in one embodiment;

FIG. 8 is a block diagram of an audio data processing system in accordance with one embodiment;

FIG. 9 is a block diagram showing the structure of an audio data processing apparatus according to an embodiment;

FIG. 10 is a block diagram showing the construction of an audio data processing apparatus according to another embodiment;

FIG. 11 is a block diagram showing the configuration of an audio communication state switching module according to an embodiment;

FIG. 12 is a block diagram showing the configuration of an audio communication state switching module in another embodiment;

FIG. 13 is a block diagram of the structure of a unified processing module in one embodiment;

FIG. 14 is a block diagram showing the construction of an audio data processing apparatus according to still another embodiment;

FIG. 15 is a block diagram showing the structure of an audio data processing apparatus according to still another embodiment.

Detailed Description

FIG. 1 is a diagram of an application environment in which an audio data processing method runs in one embodiment. As shown in FIG. 1, the application environment includes a first terminal 110, a server 120 and a second terminal 130, which communicate over a network. The first terminal 110 can transmit real-time audio data or asynchronous audio data through the server 120, and can switch seamlessly between the real-time audio communication state and the asynchronous audio communication state. Depending on the current audio communication state, the first terminal 110 uses unified code and a unified voice framework to select a coding algorithm and an audio processing mode, packs the audio data according to the target audio processing mode, selects a network channel, and sends the audio data to be sent to the server through the target network channel, so that audio resources are managed and called uniformly. The server then forwards the audio data to be sent to the target terminal, namely the second terminal 130. The first terminal 110 can also receive real-time or asynchronous audio data sent by the second terminal 130 and play it locally.

The first terminal 110 and the second terminal 130 may be, but are not limited to, smartphones, tablet computers, notebook computers, desktop computers and the like. The first terminal 110 and the second terminal 130 may send audio forwarding requests to the server 120 over the network, and the server 120 may return the corresponding audio resources in response. There may be one or more first terminals 110 and second terminals 130, and the server 120 may be a single server or a server cluster.

In one embodiment, the internal structure of the first terminal 110 in FIG. 1 is as shown in FIG. 2: the first terminal 110 includes a processor, a graphics processing unit, a storage medium, a memory, a network interface, a display screen and an input device, connected through a system bus. The storage medium of the first terminal 110 stores an operating system and an audio data processing apparatus, which implements an audio data processing method suitable for the terminal. The processor provides computing and control capabilities to support the operation of the entire first terminal 110. The graphics processing unit provides at least the rendering capability for the display interface, the memory provides the running environment for the audio data processing apparatus in the storage medium, and the network interface performs network communication with the server 120. The display screen displays application interfaces, such as a real-time communication interface, and the input device, which includes a microphone, receives user commands, audio data and the like. For a first terminal 110 with a touch screen, the display screen and the input device may be the touch screen. The structure shown in FIG. 2 is a block diagram of only the part of the structure related to the solution of the present application and does not limit the terminals to which the solution applies; a specific terminal may include more or fewer components than shown, combine some components, or arrange the components differently.

In one embodiment, as shown in FIG. 3, an audio data processing method is provided. Taking its application to the first terminal or the second terminal in the above application environment as an example, the method includes the following steps:

Step S210, obtaining the audio data stream through the unified interface, and obtaining the current audio communication state corresponding to the audio data stream.

Specifically, the audio data stream is data captured by an audio acquisition device, such as voice recorded on site or music played from an audio file. The original voice data captured by the system hardware is an analog signal; the audio data stream may be waveform data obtained by basic encoding of that original voice data, and it is a digital byte stream. The unified interface is a logically encapsulated method exposed for external calls: a caller completes the corresponding function through it without needing to know the internal implementation. The unified interface may include an audio recording interface, a recording permission interface, an audio playback interface and the like. Both asynchronous audio data and real-time audio data are acquired through the unified interface, which guarantees a unified data stream format for audio in different audio communication states and makes audio resource management across those states more convenient. In one embodiment, the audio data stream is a PCM-encoded data stream, and the same acquisition process is used through the unified interface to obtain it for both asynchronous and real-time audio.
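The acquisition path can be sketched in Python as follows. This is a minimal illustration, assuming the captured signal arrives as float samples in [-1.0, 1.0] and is quantized to 16-bit little-endian PCM; `to_pcm16_bytes`, `UnifiedAudioInterface` and the 16 kHz default are hypothetical names and values, not part of the described method.

```python
import struct
from typing import Iterable


def to_pcm16_bytes(samples: Iterable[float]) -> bytes:
    """Quantize float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples
        out += struct.pack("<h", int(round(s * 32767)))
    return bytes(out)


class UnifiedAudioInterface:
    """Hypothetical unified capture interface shared by real-time and asynchronous audio."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate  # assumed capture rate, not given in the text

    def read_stream(self, captured: Iterable[float]) -> bytes:
        # The same acquisition path is used whatever the communication state,
        # so every downstream module receives one PCM byte-stream format.
        return to_pcm16_bytes(captured)
```

Each sample becomes two bytes, so ten samples yield a 20-byte stream regardless of whether the caller later treats it as real-time or asynchronous audio.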

The current audio communication state refers to the type of the current audio communication, either real-time audio communication or asynchronous audio communication. Real-time audio communication transmits and plays sound data in real time through real-time voice acquisition, real-time encoding, data transmission, decoding, noise reduction and similar techniques, and what is transmitted is an audio byte stream. Real-time audio is generally noise-reduced and quantized before being encoded, which increases the data compression ratio. For real-time audio communication, a real-time communication link is established through a call setup process (calling and connecting) and kept in a connected state; telephone voice communication is an example. Asynchronous audio communication transmits sound data asynchronously through voice acquisition, voice encoding, voice file transmission and voice playback: the complete audio data is first recorded into an audio file, and the recorded audio file is then transmitted to the peer.

The current audio communication state can be determined, and switched between different audio communication states, through operations on the communication interface. For example, on the real-time audio communication interface, a first preset operation switches the current audio communication state from the closed state to the real-time audio communication state, and a second preset operation switches it from the real-time audio communication state to the asynchronous audio communication state. The first and second preset operations may be gesture operations, touch-screen operations, voice commands and the like. The current audio communication state can be identified by preset characters, for example 0 for real-time audio communication and 1 for asynchronous audio communication, and the identifier can be associated with the input audio data stream so that the state corresponding to the stream can be determined quickly. The current audio communication state can also be obtained by detecting the interface state in real time, for example by detecting whether an asynchronous audio key is pressed.
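The state switching and tagging described above behave like a small state machine. The Python sketch below is illustrative only: it uses the 0/1 identifiers from the text, while the `CLOSED = -1` code and the operation names `first_preset_op` and `second_preset_op` are hypothetical stand-ins for the closed state and the preset operations.

```python
from enum import Enum


class CommState(Enum):
    CLOSED = -1    # hypothetical code for "no audio communication"
    REALTIME = 0   # 0 represents real-time audio communication
    ASYNC = 1      # 1 represents asynchronous audio communication


# Hypothetical names standing in for the first and second preset operations
# (a gesture, touch-screen operation or voice command in the text).
TRANSITIONS = {
    (CommState.CLOSED, "first_preset_op"): CommState.REALTIME,
    (CommState.REALTIME, "second_preset_op"): CommState.ASYNC,
}


def switch_state(current: CommState, operation: str) -> CommState:
    """Return the new state, or keep the current one if the operation does not apply."""
    return TRANSITIONS.get((current, operation), current)


def tag_stream(stream: bytes, state: CommState) -> tuple:
    """Associate a captured audio stream with its state identifier."""
    return (state.value, stream)
```

Operations that are not valid in the current state leave it unchanged, so a stray gesture cannot jump from the closed state straight into asynchronous communication.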

In one embodiment, the audio data processing method is implemented as a single set of unified code with a two-layer framework of an interface layer and an implementation layer. The interface layer includes an interface for determining the current audio communication state, and the implementation layer carries out different processing logic according to the state determined by the interface layer, so as to complete the corresponding encoding, packing and transmission.

Step S220, acquiring a target coding algorithm matched with the current audio communication state from the unified coding module, and encoding the audio data stream with the target coding algorithm to obtain encoded audio data.

Specifically, the target coding algorithm may be configured through the interface layer; for example, asynchronous audio communication may be matched with the MP3 codec and real-time audio communication with the SILK codec. The target coding algorithm matched with each audio communication state can be customized in advance as needed, or matched dynamically according to the real-time network state and the characteristics of the audio data. The unified coding module is an independent, extensible module that integrates the encoders available in the different audio communication states; when a new coding requirement calls for an additional encoder, that encoder only needs to be implemented and added through the extension interface. The unified coding module manages and dispatches the encoders for all audio communication states, which unifies the architecture.
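One way to realize such an extensible unified coding module is an encoder registry plus a state-to-encoder binding, sketched below in Python. The class and method names are hypothetical, and the two lambdas only tag the data; they are placeholders for real MP3 and SILK encoders, which are not implemented here.

```python
from typing import Callable, Dict

Encoder = Callable[[bytes], bytes]


class UnifiedCodingModule:
    """Registry of encoders keyed by name, plus a state -> encoder binding."""

    def __init__(self) -> None:
        self._encoders: Dict[str, Encoder] = {}
        self._by_state: Dict[str, str] = {}

    def register_encoder(self, name: str, encoder: Encoder) -> None:
        # Extension point: a new encoder is added here without touching callers.
        self._encoders[name] = encoder

    def bind_state(self, state: str, encoder_name: str) -> None:
        if encoder_name not in self._encoders:
            raise KeyError(f"encoder {encoder_name!r} is not registered")
        self._by_state[state] = encoder_name

    def target_encoder(self, state: str) -> Encoder:
        return self._encoders[self._by_state[state]]

    def encode(self, state: str, pcm: bytes) -> bytes:
        return self.target_encoder(state)(pcm)


# Placeholder "encoders" that only tag the data; real MP3/SILK codecs would go here.
module = UnifiedCodingModule()
module.register_encoder("mp3", lambda pcm: b"MP3:" + pcm)
module.register_encoder("silk", lambda pcm: b"SILK:" + pcm)
module.bind_state("async", "mp3")
module.bind_state("realtime", "silk")
```

Because the binding is data rather than code, a dynamic policy (for example one driven by network state) could call `bind_state` at runtime without changing any caller of `encode`.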

Step S230 and step S240 are performed by the unified processing module:

Step S230, determining a matched target audio processing mode from the selectable audio processing modes according to the current audio communication state, where the selectable audio processing modes include an audio file mode and a real-time audio frame mode, and processing the encoded audio data according to the target audio processing mode to obtain audio data to be sent.

Specifically, the audio processing mode determines how the encoded audio data is processed into audio data to be sent that matches the current audio communication state. Different audio processing modes can be set for different audio communication states according to their requirements, for example the audio file mode for asynchronous audio communication and the real-time audio frame mode for real-time audio communication. In the audio file mode, the complete encoded audio data is written into an audio file of a preset format. In the real-time audio frame mode, the encoded audio data is packed into audio frames and sent out in real time; streaming packing and sending ensures real-time performance.
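The two processing modes can be contrasted with a short Python sketch. It assumes a hypothetical `AUDF` header plus a 4-byte length as the "preset format" file, and a 2-byte sequence number per frame; neither layout comes from the text. The frame mode is written as a generator so each frame can be sent as soon as it is packed.

```python
from typing import Iterator

FILE_MAGIC = b"AUDF"  # hypothetical header for the "preset format" audio file


def audio_file_mode(encoded: bytes) -> bytes:
    """Write the complete encoded data into one file payload (length-prefixed)."""
    return FILE_MAGIC + len(encoded).to_bytes(4, "big") + encoded


def realtime_frame_mode(encoded: bytes, frame_len: int = 4) -> Iterator[bytes]:
    """Yield sequence-numbered frames as soon as each chunk is available (streaming)."""
    for seq, start in enumerate(range(0, len(encoded), frame_len)):
        yield seq.to_bytes(2, "big") + encoded[start:start + frame_len]


def process(state: str, encoded: bytes):
    """Pick the target audio processing mode from the current communication state."""
    if state == "async":
        return audio_file_mode(encoded)
    if state == "realtime":
        return realtime_frame_mode(encoded)
    raise ValueError(f"unknown audio communication state: {state!r}")
```

The file mode needs the whole recording before producing output, whereas the frame mode emits its first frame after only `frame_len` bytes, which is the property real-time communication depends on.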

In one embodiment, the current network state and characteristics of the encoded audio data, such as its length, are obtained, and the processing parameters of the target audio processing mode are determined dynamically from them. For the real-time audio frame mode, the processing parameters include the frame length of the audio frames, redundant-data policy parameters and the like; for the audio file mode, they include the file compression ratio and the like.
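Dynamic parameter selection might look like the Python sketch below. Every threshold (64 kbps, 5 % loss, 1 MB) and the 1 to 9 compression scale are illustrative assumptions; the text names only the kinds of parameters (frame length, redundancy policy, file compression ratio), not their values or the rules for choosing them.

```python
from dataclasses import dataclass


@dataclass
class FrameParams:
    frame_ms: int     # duration of audio carried by one frame
    redundancy: int   # previous frames repeated in each packet (assumed policy)


@dataclass
class FileParams:
    compression_level: int  # assumed 1-9 scale


def choose_params(mode: str, bandwidth_kbps: float, loss_rate: float, data_len: int):
    """Derive processing parameters from network state and encoded-data length.

    All thresholds below are illustrative assumptions, not values from the text.
    """
    if mode == "realtime_frame":
        # Longer frames on narrow links cut per-packet header overhead.
        frame_ms = 20 if bandwidth_kbps >= 64 else 60
        # Repeat the previous frame when loss is noticeable.
        redundancy = 1 if loss_rate > 0.05 else 0
        return FrameParams(frame_ms, redundancy)
    if mode == "audio_file":
        # Compress large recordings harder; small ones are cheap to send anyway.
        level = 9 if data_len > 1_000_000 else 5
        return FileParams(level)
    raise ValueError(f"unknown processing mode: {mode!r}")
```

The trade-off sketched here is the usual one for streamed audio: longer frames reduce overhead but add latency, and redundancy spends bandwidth to mask packet loss.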

Step S240, determining a target network channel from the selectable network channels according to the current audio communication state, and sending the audio data to be sent through the target network channel.

Specifically, a network channel is a channel for transmitting network data, and different network channels correspond to different network transmission protocols. The matching relationship between audio communication states and network channels can be configured in advance, for example asynchronous audio communication with a first network channel and real-time audio communication with a second network channel, so that the target network channel is obtained from the current audio communication state through this configuration. The audio data to be sent is then sent through the target network channel, so the audio data of different audio communication states travels over different network channels.
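A preconfigured state-to-channel table can be sketched in Python as follows. The first/second channel assignment follows the example in the text; the protocol labels are assumptions for illustration only, and `send` is a hypothetical callback standing in for the actual network layer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Channel:
    name: str
    protocol: str  # assumed protocol; the text only says channels differ by protocol


# Preconfigured state -> channel table (first channel for async, second for real-time).
CHANNEL_TABLE = {
    "async": Channel("first_channel", "HTTP file upload"),
    "realtime": Channel("second_channel", "UDP media stream"),
}


def target_channel(state: str) -> Channel:
    return CHANNEL_TABLE[state]


def transmit(state: str, payload: bytes, send: Callable[[Channel, bytes], None]) -> Channel:
    """Look up the target channel for the state and hand the payload to it."""
    channel = target_channel(state)
    send(channel, payload)
    return channel
```

Keeping the mapping in a table means adding a state or changing a protocol is a configuration change rather than a new code path.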

Because steps S230 and S240 are both handled by the unified processing module, real-time audio communication and asynchronous audio communication are implemented with unified code. Audio resources are called through that unified code, which makes them easy to manage, and asynchronous and real-time audio communication share a unified technical architecture.
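The point that one code path serves both states can be illustrated with a Python sketch in which steps S220 to S240 are a single function and only a per-state profile differs. The profiles, the placeholder encoders and packers, and the `send` callback are all hypothetical.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class StateProfile(NamedTuple):
    encode: Callable[[bytes], bytes]          # from the unified coding module
    pack: Callable[[bytes], Iterable[bytes]]  # audio processing mode
    channel: str                              # target network channel


# Placeholder profiles; real encoders and packers would be plugged in here.
PROFILES: Dict[str, StateProfile] = {
    "realtime": StateProfile(
        encode=lambda d: b"S" + d,
        pack=lambda d: [d[i:i + 2] for i in range(0, len(d), 2)],  # small frames
        channel="second_channel",
    ),
    "async": StateProfile(
        encode=lambda d: b"M" + d,
        pack=lambda d: [b"FILE" + d],  # one complete file
        channel="first_channel",
    ),
}


def handle(state: str, pcm: bytes, send: Callable[[str, bytes], None]) -> None:
    """One code path for every state: only the looked-up profile differs."""
    profile = PROFILES[state]
    encoded = profile.encode(pcm)          # S220
    for unit in profile.pack(encoded):     # S230
        send(profile.channel, unit)        # S240
```

For example, `handle("realtime", b"abc", send)` emits two small frames on the second channel, while `handle("async", b"abc", send)` emits one file payload on the first channel, yet both run the same three lines.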

In one embodiment, the unified processing module includes a first data acquisition callback processing interface, corresponding to real-time audio communication, and a second data acquisition callback processing interface, corresponding to asynchronous audio communication. The first interface defines the audio processing mode and network channel matched with real-time audio communication, and the second defines those matched with asynchronous audio communication. It is therefore only necessary to select the data acquisition callback processing interface according to the current audio communication state, and the corresponding processing flow is invoked through that interface, which keeps the processing flows of different audio communication states independent of each other.
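The two callback processing interfaces map naturally onto an abstract base class with one subclass per state. The Python sketch below is a hypothetical rendering: class names, channel names and the buffering behavior of the asynchronous callback (accumulate until the recording ends, then send one payload) are assumptions consistent with the audio file mode described above.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Outgoing = List[Tuple[str, bytes]]  # (network channel, payload) pairs


class DataCallback(ABC):
    """Hypothetical data acquisition callback processing interface."""
    mode = ""
    channel = ""

    @abstractmethod
    def on_data(self, encoded: bytes) -> Outgoing: ...

    def finish(self) -> Outgoing:
        return []


class RealTimeCallback(DataCallback):
    """First interface: real-time audio frame mode on the second channel."""
    mode = "realtime_frame"
    channel = "second_channel"

    def on_data(self, encoded: bytes) -> Outgoing:
        return [(self.channel, encoded)]  # forward each chunk immediately as a frame


class AsyncCallback(DataCallback):
    """Second interface: audio file mode on the first channel."""
    mode = "audio_file"
    channel = "first_channel"

    def __init__(self) -> None:
        self._buffer = bytearray()

    def on_data(self, encoded: bytes) -> Outgoing:
        self._buffer += encoded  # accumulate until the recording ends
        return []

    def finish(self) -> Outgoing:
        data, self._buffer = bytes(self._buffer), bytearray()
        return [(self.channel, data)]


CALLBACKS = {"realtime": RealTimeCallback, "async": AsyncCallback}


def callback_for(state: str) -> DataCallback:
    """Only this lookup depends on the state; the flow lives in the callback."""
    return CALLBACKS[state]()
```

Since each flow is encapsulated in its own class, changing how asynchronous audio is packed cannot affect the real-time path, which is the independence the embodiment aims for.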

In this embodiment, the audio data stream and its current audio communication state are obtained through a unified interface; the stream is encoded with the target coding algorithm that the unified coding module matches to that state; and the unified processing module determines the target audio processing mode (the audio file mode or the real-time audio frame mode) that produces the audio data to be sent, and the target network channel through which that data is sent. Acquisition, encoding, packing and transmission in every audio communication state therefore go through the same interface and modules, so audio data in different states is managed and dispatched uniformly. This unifies the architecture, makes audio resource management more convenient, lets a single set of code process audio data in different audio communication states, and greatly reduces the cost of integrating heterogeneous voice systems.

In one embodiment, as shown in FIG. 4, before step S210, the method further includes:

Step S310, detecting the audio communication state; when a switch of the audio communication state is detected, determining audio state management parameters from the audio communication state switching data through a unified audio configuration management interface, and adjusting the current audio state according to the audio state management parameters.

Specifically, whether the audio communication state is switched can be judged by recognizing user operations such as gesture operations, touch operations and preset voice commands. If it is switched, the original audio communication state before the switch and the target audio communication state to switch to are recorded, generating the audio communication state switching data. The audio configuration management interface determines the audio state management parameters used to adjust the audio state. Audio configuration management covers audio permission management, recording management, playback management and audio processing management. Audio permission management includes device authorization management, recording authorization and the like. Recording management includes the recorded audio type, recording audio parameter settings, the recording state and the like. Playback management includes the playback type, such as asynchronous audio file playback or real-time audio stream playback, and the playback state, such as playback stopped. Audio processing management mainly determines the callback processing interface corresponding to the current audio communication state; for example, the first data acquisition callback processing interface corresponding to the current audio communication state is dispatched to the unified processing module through audio processing management. Each management part determines its corresponding audio state management parameters, such as the sampling rate or the playback state parameter (for example, 0 for play and 1 for stop playing), and the current audio state is adjusted according to these parameters, for example by changing the received audio code stream to a discarded audio code stream. Adjusting the current audio state in this way realizes the conversion between different audio communication types.
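The switching data and the parameters derived from it can be sketched in Python as below. The playback codes 0/1 follow the text, and the callback names mirror the first (real-time) and second (asynchronous) data acquisition callback processing interfaces; the 16 kHz sampling rate and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

PLAY, STOP_PLAY = 0, 1  # playback state parameter: 0 = play, 1 = stop playing


@dataclass(frozen=True)
class SwitchData:
    original: str  # state before the switch
    target: str    # state to switch to


@dataclass
class AudioStateParams:
    sample_rate: int               # recording management (assumed value)
    play_state: int                # playback management
    callback: str                  # audio processing management
    discard_incoming_stream: bool  # how a received real-time code stream is handled


def detect_switch(previous: str, current: str) -> Optional[SwitchData]:
    """Record original and target states when the state actually changes."""
    return SwitchData(previous, current) if previous != current else None


def management_params(switch: SwitchData) -> AudioStateParams:
    """Hypothetical unified configuration management: one place derives every parameter."""
    if switch.target == "async":
        return AudioStateParams(16000, STOP_PLAY, "second_callback", True)
    return AudioStateParams(16000, PLAY, "first_callback", False)
```

Deriving every parameter in one function reflects the intent of a single configuration management interface: no other module needs to know which states exist.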

In this embodiment, the audio state management parameters are determined from the audio communication state switching data through the unified audio configuration management interface. This provides unified management of the system audio resources when switching between different audio communication types and avoids complex management of those resources.

In one embodiment, as shown in FIG. 5, the part of step S310 that detects the audio communication state and, when a switch is detected, determines the audio state management parameters from the audio communication state switching data through the unified audio configuration management interface includes:

Step S311, when the audio communication state is switched from real-time audio communication to asynchronous audio communication, keeping the real-time audio communication link connected, and modifying the playback parameter, the decoding state parameter and the recording configuration parameter through the unified audio configuration management interface.

Specifically, real-time audio communication may be switched to asynchronous audio communication by an operation on the real-time audio communication interface. A target communication user for the asynchronous audio communication may be designated; if none is designated, all users in the current real-time audio communication session are the target communication users by default. Keeping the real-time audio communication link connected ensures a seamless return from asynchronous audio communication to real-time audio communication.

In one embodiment, the users in the real-time audio communication session are divided into different groups, and the group identifier for the current asynchronous audio communication is determined when switching from real-time to asynchronous audio communication, so that the corresponding target communication users can be determined quickly from the group identifier. Users may be grouped by user level, by friend intimacy and the like; the groups may be fixed groups configured in advance, or groups set dynamically for the real-time audio communication session according to historical communication behavior data.
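Resolving the target communication users can be sketched in Python as follows. The fallback to all session users follows the text; the priority order (designated users first, then the group, then everyone) and the function signature are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Set


def target_users(
    session_users: List[str],
    groups: Dict[str, Set[str]],
    group_id: Optional[str] = None,
    designated: Optional[List[str]] = None,
) -> List[str]:
    """Resolve recipients of the asynchronous message.

    Priority (an assumption of this sketch): explicitly designated users,
    then the members of the given group, then everyone in the session.
    """
    if designated:
        return list(designated)
    if group_id is not None:
        members = groups.get(group_id, set())
        # Keep session order so recipients are listed deterministically.
        return [u for u in session_users if u in members]
    return list(session_users)  # default: all users in the real-time session
```

With `session_users = ["a", "b", "c"]` and a group `{"a", "c"}`, passing that group's identifier yields `["a", "c"]`, while passing nothing yields the whole session.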

Step S312, setting the playback parameter to stop real-time audio playback, setting the decoding state parameter to stop real-time audio decoding, and updating the recording configuration parameter to a state matching asynchronous audio communication.

Specifically, because real-time audio communication is being switched to asynchronous audio communication, playback of the decoded real-time audio must be terminated, so the playback parameter is set to stop real-time audio playback. Decoding of the received encoded real-time audio data must also stop, so the decoding state parameter is set to stop real-time audio decoding. Finally, the recording configuration parameter must be modified to ensure that the asynchronous audio data is recorded correctly, so it is updated to a state matching asynchronous audio communication.
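Steps S311 and S312 amount to deriving a new parameter set while leaving the link untouched. The Python sketch below models this with an immutable dataclass and `dataclasses.replace`; the field names and the string encoding of the recording configuration are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AudioConfig:
    realtime_playing: bool   # playback parameter
    realtime_decoding: bool  # decoding state parameter
    recording_target: str    # recording configuration parameter (hypothetical encoding)
    link_connected: bool     # real-time communication link


def params_for_async(cfg: AudioConfig) -> AudioConfig:
    """S311/S312: stop real-time playback and decoding, retarget the recorder.

    The link field is deliberately left untouched, so the real-time
    link stays connected for a later seamless return.
    """
    return replace(cfg, realtime_playing=False, realtime_decoding=False,
                   recording_target="async")
```

Because `replace` copies every field not named, the connected link carries over automatically, which is exactly the property that makes the later return to real-time communication seamless.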

The step of adjusting the current audio state according to the audio state management parameter in step S310 includes:

Step S313, stopping playback of the decoded real-time audio data according to the playback parameter, stopping decoding of real-time audio data, discarding subsequently received real-time audio data awaiting decoding, and starting acquisition of asynchronous audio data according to the updated recording configuration parameter.

Specifically, stopping playback of the decoded real-time audio data prevents real-time audio from interfering with the recording of asynchronous audio data. Stopping real-time decoding and discarding subsequently received real-time audio data awaiting decoding avoids useless processing that does not match the current audio communication state. Acquisition of asynchronous audio data is then started according to the updated recording configuration parameter, which begins the asynchronous audio communication.
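Applying these parameters, in particular discarding real-time packets that arrive after the switch, can be sketched in Python as follows. `AudioEngine` is a hypothetical stand-in for the terminal's audio engine, and appending to `decoded` stands in for real decoding.

```python
from typing import List


class AudioEngine:
    """Hypothetical audio engine used to apply the parameters of S313."""

    def __init__(self) -> None:
        self.playing_realtime = True
        self.decoding_realtime = True
        self.recording = "realtime"
        self.decoded: List[bytes] = []
        self.discarded = 0

    def apply_async_switch(self) -> None:
        self.playing_realtime = False  # no real-time sound leaks into the recording
        self.decoding_realtime = False
        self.recording = "async"       # start capturing asynchronous audio

    def on_realtime_packet(self, packet: bytes) -> None:
        if not self.decoding_realtime:
            self.discarded += 1        # drop data that no longer matches the state
            return
        self.decoded.append(packet)    # stand-in for real decoding
```

A packet received before the switch is decoded; packets received afterwards are only counted and dropped, so no decoding work is spent on audio that will not be played.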

In this embodiment, the switch from real-time audio communication to asynchronous audio communication is performed through the unified audio configuration management interface. Because the audio state management parameters are adjusted automatically and the real-time audio connection is never stopped, the whole switch is seamless.

In one embodiment, after the step of adjusting the current audio state according to the audio state management parameter, the method further includes:

When the audio communication state is switched from asynchronous audio communication to real-time audio communication and the real-time audio communication link is in a connected state, the playing parameter, the decoding state parameter and the recording configuration parameter are modified, through the unified audio configuration management interface, into parameters matching real-time audio communication; the current audio state is adjusted according to these audio state management parameters, and real-time audio communication is resumed.

Specifically, when switching between audio states, it can first be determined whether the switch satisfies a switching condition. For example, when the real-time audio communication link is disconnected, the terminal cannot switch directly from asynchronous audio communication to real-time audio communication, because re-establishing the link requires the complete calling, answering and connecting procedure. Seamless switching from asynchronous to real-time audio communication is therefore enabled only while the real-time audio communication link is connected. The playing parameter, the decoding state parameter and the recording configuration parameter are then modified through the unified audio configuration management interface into parameters matching real-time audio communication: for example, the playing parameter is set to "real-time audio start playing", the decoding state parameter is set to "real-time audio start decoding", and the recording configuration parameter is updated to a state matching real-time audio communication. Acquisition of real-time audio data is started; at the same time, playback of asynchronous audio and downloading and decoding of asynchronous audio data are stopped, and real-time audio communication is resumed.
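The switching condition and the parameter changes for the reverse direction can be sketched in Python as follows. This is a minimal illustration under the assumption that each parameter is a simple flag; SwitchController and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SwitchController:
    # Hypothetical flags; mode is "async" or "real_time".
    mode: str = "async"
    link_connected: bool = True
    realtime_playing: bool = False
    realtime_decoding: bool = False
    async_playing: bool = False
    async_downloading: bool = False
    recording_config: str = "async"

    def resume_real_time(self) -> bool:
        """Return True if real-time communication was resumed."""
        if self.mode != "async":
            return False
        if not self.link_connected:
            # A disconnected link needs the full call/answer/connect setup,
            # so the seamless switch condition is not met.
            return False
        # Parameters matching real-time audio communication.
        self.realtime_playing = True
        self.realtime_decoding = True
        self.recording_config = "real_time"
        # Stop asynchronous playback, download and decoding.
        self.async_playing = False
        self.async_downloading = False
        self.mode = "real_time"
        return True
```

Returning a boolean lets the caller fall back to a normal call setup when the condition fails.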

In this embodiment, switching from asynchronous audio communication to real-time audio communication is performed through the unified audio configuration management interface, and the automatic adjustment of the audio state management parameters makes the whole switch seamless.

In one embodiment, as shown in fig. 6, steps S230 and S240 include:

In step S230a, when the current audio communication state is asynchronous audio communication, the audio processing mode is determined to be the audio file mode, and the encoded audio data is written into an audio file and transmitted through a first network channel, where the first network channel includes at least one of an HTTP protocol channel and a TCP protocol channel.

Specifically, asynchronous audio communication processes complete audio data: the terminal detects whether the asynchronous audio recording has finished and, if so, encodes the complete audio data to obtain encoded audio data and writes it into an audio file to generate the audio data to be sent. The generated audio file may also be compressed. The audio file is then sent to a server through an HTTP (Hypertext Transfer Protocol) channel or a TCP (Transmission Control Protocol) channel. The server can generate a URL corresponding to the audio file, and after receiving the URL, the target communication terminal for the audio data can download the audio file using HTTP resumable (breakpoint) download.
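HTTP resumable download is usually done with a Range request. The following minimal Python sketch shows how the receiving terminal might resume fetching the audio file from the server URL; the URL is a placeholder and the helper names are hypothetical, while the Range header syntax and the 206 Partial Content status are standard HTTP.

```python
import urllib.request


def range_header(bytes_have: int) -> dict:
    """Headers asking the server to resume after the bytes already stored."""
    if bytes_have < 0:
        raise ValueError("bytes_have must be non-negative")
    # "bytes=N-" requests everything from offset N to the end of the file.
    return {"Range": f"bytes={bytes_have}-"} if bytes_have else {}


def build_resume_request(url: str, partial: bytes) -> urllib.request.Request:
    # Building the Request sends nothing; the caller passes it to
    # urllib.request.urlopen() and expects 206 Partial Content on resume.
    return urllib.request.Request(url, headers=range_header(len(partial)))


def append_chunk(partial: bytes, status: int, body: bytes) -> bytes:
    # 206: the body continues the partial file. 200: the server ignored the
    # Range header and returned the whole file, so start over.
    if status == 206:
        return partial + body
    if status == 200:
        return body
    raise ValueError(f"unexpected status {status}")
```

Handling a plain 200 response matters because servers are allowed to ignore Range and return the full file.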

In step S230b, when the current audio communication state is real-time audio communication, the audio processing mode is determined to be the real-time audio frame mode, the encoded audio data is assembled into audio frames, and the audio frames are transmitted in real time through a second network channel, where the second network channel includes a UDP protocol channel.

Specifically, during real-time audio communication, the received audio data is encoded in real time, so encoded audio data is produced continuously as acquisition proceeds. The encoded audio data is assembled into audio frames by a preset algorithm, and the continuously generated audio frames are transmitted in real time to the server through the second network channel, for example through a socket. The server forwards the audio frames to the target communication terminal in real time, thereby achieving streaming data transmission.
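The patent does not specify the "preset algorithm" for frame assembly. As an assumption for illustration only, the Python sketch below prefixes each payload with a hypothetical 8-byte header (sequence number, timestamp, payload length) and splits the encoded stream into fixed-size chunks; a real codec would split on its own frame boundaries.

```python
import struct

# Hypothetical frame header: sequence (uint16), timestamp in ms (uint32),
# payload length (uint16), big-endian network byte order, 8 bytes total.
HEADER = struct.Struct("!HIH")


def assemble_frames(encoded: bytes, chunk_size: int, frame_ms: int = 20):
    """Split a continuously produced encoded stream into framed packets."""
    frames = []
    for seq, offset in enumerate(range(0, len(encoded), chunk_size)):
        payload = encoded[offset:offset + chunk_size]
        header = HEADER.pack(seq & 0xFFFF,
                             (seq * frame_ms) & 0xFFFFFFFF,
                             len(payload))
        frames.append(header + payload)
    return frames


def parse_frame(frame: bytes):
    """Inverse of assemble_frames for a single packet."""
    seq, ts, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    return seq, ts, payload
```

Each frame could then be sent with `sock.sendto(frame, (host, port))` on a `SOCK_DGRAM` socket; the sequence number lets the receiver detect loss and reordering, which UDP does not prevent.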

In one embodiment, the method further comprises: acquiring encoded audio data, which comprises at least one of file audio data and real-time audio frame data, and processing it to generate an audio data stream; acquiring a matching decoding algorithm from the unified decoding module according to the network channel through which the encoded audio data was acquired; and decoding the audio data stream according to the decoding algorithm to obtain the original audio data.

Specifically, in real-time audio communication, the encoded audio data is obtained by directly receiving, in real time, the audio frames sent by the server, and this encoded audio data is itself the audio data stream. Using a uniform data stream format for audio decoding in all audio communication states makes audio resources easier to manage across those states.

In real-time audio communication, the two communicating parties agree on an encoding algorithm and the corresponding decoding algorithm through the communication protocol. In asynchronous audio communication, the encoding algorithm can be indicated within the encoded data by preset characters at preset byte positions, so that the matching decoding algorithm can be obtained from the unified decoding module. The unified decoding module is an independently extensible module that integrates the decoders available in the different audio communication states; when a new encoding requirement calls for an additional decoder, that decoder only needs to be implemented and added through the extension interface. The unified decoding module manages and dispatches the decoders for the different audio communication states uniformly, which unifies the architecture.
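A minimal Python sketch of such a unified decoding module follows. It assumes the "preset characters" are file signatures: "#!AMR\n" is the AMR storage-format magic, "#!SILK_V3" is the header commonly written by the SILK SDK, and "ID3" marks an ID3v2-tagged MP3. The class, the channel strings and the stub decoders are hypothetical.

```python
from typing import Callable, Dict, Optional

Decoder = Callable[[bytes], bytes]

# Assumed file signatures used to recognise the codec of a downloaded file.
MAGIC = {b"#!AMR\n": "amr", b"#!SILK_V3": "silk", b"ID3": "mp3"}


class UnifiedDecoder:
    def __init__(self) -> None:
        self._decoders: Dict[str, Decoder] = {}

    def register(self, name: str, decoder: Decoder) -> None:
        # Extension point: a new format only needs one new registration.
        self._decoders[name] = decoder

    def detect(self, data: bytes) -> Optional[str]:
        for magic, name in MAGIC.items():
            if data.startswith(magic):
                return name
        return None

    def decode(self, data: bytes, channel: str, negotiated: str = "") -> bytes:
        if channel == "udp":
            # Real-time frames: codec agreed in advance by the call protocol.
            name = negotiated
        else:
            # Files fetched over HTTP/TCP: codec read from the file header.
            name = self.detect(data) or ""
        if name not in self._decoders:
            raise LookupError(f"no decoder for {name!r}")
        return self._decoders[name](data)
```

Selecting by acquisition channel first mirrors the text: real-time frames rely on negotiation, while files carry their own identification.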

In one embodiment, the method further comprises: obtaining an asynchronous audio start operation through an asynchronous audio communication key on a real-time audio communication interface, and switching the current audio communication state from real-time audio communication to asynchronous audio communication; and, while the real-time audio communication link is kept connected, obtaining an asynchronous audio end operation through the asynchronous audio communication key, and restoring the current audio communication state from asynchronous audio communication to real-time audio communication.

Specifically, the real-time audio communication interface may be a terminal screen interface, or a three-dimensional real-time audio communication interface formed in three-dimensional space by a virtual reality device. The asynchronous audio communication key may be a virtual key or a physical key on the terminal screen interface, or a three-dimensional virtual key in the three-dimensional real-time audio communication interface. An asynchronous audio start instruction can be generated by a first preset operation: for example, pressing the asynchronous audio communication key is taken as the asynchronous audio start operation, the current audio communication state is switched from real-time audio communication to asynchronous audio communication, and asynchronous audio acquisition starts. While the real-time audio communication link is kept connected, an asynchronous audio end instruction can be generated by a second preset operation: for example, releasing the asynchronous audio communication key generates the end instruction, the release is taken as the asynchronous audio end operation, and the current audio communication state is restored from asynchronous audio communication to real-time audio communication. The first and second preset operations may be gesture operations or touch operations.

As shown in fig. 7, the audio communication interface is a three-dimensional virtual session interface that includes a real-time audio communication trigger button 320 and an asynchronous audio communication trigger button 330. Real-time audio communication is entered through the real-time audio communication trigger button 320. On the real-time audio communication interface, the asynchronous audio communication trigger button 330 can be pressed to start asynchronous audio acquisition; when acquisition is complete, releasing button 330 serves as the asynchronous audio end operation, which completes the asynchronous communication and restores the current audio communication state from asynchronous audio communication to real-time audio communication.
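The press and release handling can be sketched as a small Python state machine. This is an illustration only: the event method names and the "idle" state are assumptions, and the real-time audio path itself is omitted.

```python
class AsyncKeyController:
    """Hypothetical press-to-talk handling for the asynchronous key."""

    def __init__(self) -> None:
        self.state = "idle"              # idle -> real_time <-> async
        self.link_connected = False
        self.recorded: list = []

    def on_realtime_button(self) -> None:
        # Entering real-time communication (button 320 in fig. 7).
        self.link_connected = True
        self.state = "real_time"

    def on_async_key_down(self) -> None:
        # Start operation: only meaningful during a real-time session.
        if self.state == "real_time":
            self.state = "async"
            self.recorded = []

    def on_audio(self, chunk: bytes) -> None:
        # Only asynchronous recording is collected here.
        if self.state == "async":
            self.recorded.append(chunk)

    def on_async_key_up(self) -> bytes:
        # End operation: return the recording, then resume real-time
        # communication because the link was kept connected.
        if self.state != "async":
            return b""
        message = b"".join(self.recorded)
        self.state = "real_time" if self.link_connected else "idle"
        return message
```

Tying the end operation to key release keeps the asynchronous segment bounded by a single gesture.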

In this embodiment, different interface operations switch seamlessly between audio communication states, which is simple and convenient.

In one embodiment, the audio data processing method is applied to a multi-person conversation scenario, and step S230a includes: writing the encoded audio data into an audio file in association with user information corresponding to a target user in the multi-person conversation, and transmitting the audio file to the server through the first network channel, so that the server determines a target receiving terminal according to the user information in the audio file and sends the audio file to the target user in the multi-person conversation.

Specifically, a multi-person conversation is a scenario in which a conversation has multiple communicating parties; during real-time audio communication, the audio data of one conversation user is sent to all other users in the conversation. If a first conversation user wants to communicate only with a second conversation user and exclude the other users, real-time audio communication can be switched to asynchronous audio communication, and the encoded audio data is written into an audio file in association with user information corresponding to the target user in the multi-person conversation. The user information identifies the target user and may be, for example, a user identifier. The server determines the target receiving terminal corresponding to the target user according to the user information in the audio file and sends the audio file to that target user. Because the other users do not receive the asynchronous audio data, the terminal can switch seamlessly into asynchronous audio communication during real-time audio communication, exclude the other conversation users, and send asynchronous audio only to the target user. Because the real-time audio communication link stays connected, real-time audio communication can be resumed quickly after the asynchronous audio has been sent to the target user, so private messages can be sent to a target user during a real-time audio conference.
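One way to associate user information with the audio file is a small header in front of the encoded audio. The Python sketch below assumes a hypothetical container (4-byte length, then a JSON header with sender and target identifiers, then the audio bytes) and shows the server-side routing that delivers the file only to the named targets.

```python
import json
import struct


def pack_private_message(audio: bytes, sender: str, targets: list) -> bytes:
    # Hypothetical container: 4-byte big-endian header length, a JSON header
    # carrying the user information, then the encoded audio bytes.
    header = json.dumps({"from": sender, "to": targets}).encode("utf-8")
    return struct.pack("!I", len(header)) + header + audio


def unpack(message: bytes):
    (n,) = struct.unpack_from("!I", message)
    header = json.loads(message[4:4 + n].decode("utf-8"))
    return header, message[4 + n:]


def route(message: bytes, session_members: list) -> dict:
    """Server side: deliver only to targets that belong to the session."""
    header, audio = unpack(message)
    return {user: audio for user in header["to"]
            if user in session_members and user != header["from"]}
```

Other session members are simply absent from the returned mapping, which is how the other conversation users are excluded.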

In a specific embodiment, the audio data processing method is implemented by the audio data processing system architecture shown in fig. 8, which includes an interface layer 410 and an implementation layer 420. The interface layer 410 provides the externally exposed interfaces: an asynchronous voice start interface 411, an asynchronous voice end interface 412, an asynchronous voice play interface 413, a real-time voice on interface 414 and a real-time voice off interface 415.

The implementation layer 420 includes audio configuration management 421, codec 422, unified processing mode management 423, and network module 424.

The audio configuration management 421 provides system audio permission management, recording management, playback management and audio processing management behind a unified external interface for ease of use, and obtains system recording data uniformly as a PCM data stream. Through the unified audio configuration management interface, it determines the audio state management parameters from the audio communication state switching data and adjusts the current audio state according to those parameters.

Codec 422 provides an extensible set of codec implementations, selected according to system requirements, which encode the original audio data stream into encoded audio data. Supporting a new format only requires adding a new codec. This embodiment includes an AMR codec, an MP3 codec and a SILK codec.
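The extensibility of codec 422 can be sketched as a registry in Python. The preference lists that map each communication state to codecs are an assumption for illustration (the patent does not say which codec serves which state), and the lambda encoders stand in for real codec bindings.

```python
from typing import Callable, Dict, List

Encoder = Callable[[bytes], bytes]


class CodecRegistry:
    def __init__(self) -> None:
        self._encoders: Dict[str, Encoder] = {}
        # Hypothetical preference order per communication state.
        self._preferences: Dict[str, List[str]] = {
            "real_time": ["silk", "amr"],
            "async": ["amr", "mp3"],
        }

    def add(self, name: str, encoder: Encoder) -> None:
        # Supporting a new format: implement one encoder, register it here.
        self._encoders[name] = encoder

    def select(self, state: str) -> str:
        # First registered codec in the state's preference list wins.
        for name in self._preferences.get(state, []):
            if name in self._encoders:
                return name
        raise LookupError(f"no encoder available for state {state!r}")

    def encode(self, pcm: bytes, state: str) -> bytes:
        return self._encoders[self.select(state)](pcm)
```

Because selection falls through the preference list, registering a better codec later changes the choice without touching callers.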

The unified processing mode management 423 includes a mode manager, an audio file mode module and a real-time audio frame mode module. It provides voice communication mode management and controls the voice communication state: it determines a matching target audio processing mode from the selectable audio processing modes according to the current audio communication state and processes the acquired encoded audio data according to that mode. For example, in the audio file mode the encoded bitstream is written into a sound file of a preset format, and in the real-time audio frame mode audio frames of a preset format are assembled for subsequent transmission or reading.

The network module 424 provides network transceiving management, including policy management, sending management and receiving management. It transmits the audio data to be sent through the target network channel, handles uploading and downloading of asynchronous voice files, and handles sending and receiving of real-time audio frame streams.

In this specific embodiment, the system's audio resource management is unified: in every audio communication state, the system audio resources are managed by a single set of code, which avoids the complex state management that arises when several sets of code access the system audio resources at the same time. Audio data acquisition is unified: asynchronous voice and real-time voice follow the same acquisition process, obtaining the original PCM data stream, which is then processed by the built-in unified processing mode manager. The codec model is unified: built-in encoders and decoders encode and decode the PCM data stream as required, and the encoded data is written into an audio file or assembled into voice frames as required before entering the subsequent processing flow. The codec module is a set of independently extensible modules, so a new codec requirement only needs a new codec implementation. Playback is unified: received real-time or asynchronous voice data is converted into a voice data stream, the corresponding decoding algorithm is determined by the mode manager, the voice data stream is decoded, and the decoded data is handed to the system player for unified playback, which avoids playback contention.
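Playback contention can be avoided by giving a single object ownership of the audio output. The Python sketch below is a hypothetical illustration: the mode manager names the active source, decoded PCM from any other source is dropped, and a source change flushes queued audio. UnifiedPlayer and its methods are not from the patent.

```python
from collections import deque


class UnifiedPlayer:
    """Hypothetical single owner of the system audio output."""

    def __init__(self) -> None:
        self.active_source = None        # "real_time", "async" or None
        self.queue = deque()
        self.dropped = 0

    def set_active(self, source) -> None:
        # The mode manager decides which source may play; switching
        # flushes anything queued by the previous source.
        if source != self.active_source:
            self.queue.clear()
        self.active_source = source

    def submit(self, source: str, pcm: bytes) -> None:
        # Decoded PCM from an inactive source never reaches the output.
        if source != self.active_source:
            self.dropped += 1
            return
        self.queue.append(pcm)

    def drain(self) -> bytes:
        # Stand-in for handing buffered PCM to the system player.
        out = b"".join(self.queue)
        self.queue.clear()
        return out
```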

In one embodiment, as shown in fig. 9, there is provided an audio data processing apparatus including:

The obtaining module 510 is configured to obtain an audio data stream through the unified interface and obtain the current audio communication state corresponding to the audio data stream.

The unified coding module 520 is configured to obtain a target coding algorithm matching the current audio communication state and code the audio data stream with the target coding algorithm to obtain coded audio data.

A unified processing module 530, comprising:

The packaging unit 531 is configured to determine a matching target audio processing mode from the selectable audio processing modes according to the current audio communication state, where the selectable audio processing modes include an audio file mode and a real-time audio frame mode, and to process the encoded audio data according to the target audio processing mode to obtain audio data to be sent.

A transmission unit 532, configured to determine a target network channel from the selectable network channels according to the current audio communication state, and transmit the audio data to be sent through the target network channel.
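The flow through modules 510 to 532 can be summarised in a short Python sketch. The route table, mode and channel strings, and the fixed frame size are hypothetical simplifications; the identity encoder in the usage stands in for a real codec.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Outgoing:
    mode: str              # "audio_file" or "real_time_frame"
    channel: str           # "http" (or "tcp") for files, "udp" for frames
    payloads: List[bytes]


# Hypothetical mode/channel table keyed by communication state.
ROUTES: Dict[str, Tuple[str, str]] = {
    "async": ("audio_file", "http"),
    "real_time": ("real_time_frame", "udp"),
}


def process(pcm: bytes, state: str, encode: Callable[[bytes], bytes],
            frame_size: int = 4) -> Outgoing:
    # Unified coding module: encoder already chosen for this state.
    encoded = encode(pcm)
    # Packaging unit and transmission unit: both keyed on the same state.
    mode, channel = ROUTES[state]
    if mode == "audio_file":
        # Whole recording becomes one file payload.
        payloads = [encoded]
    else:
        # Continuous stream becomes many small frames.
        payloads = [encoded[i:i + frame_size]
                    for i in range(0, len(encoded), frame_size)]
    return Outgoing(mode, channel, payloads)
```

Driving both the processing mode and the network channel from one state value is what keeps the two paths in a single pipeline.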

In one embodiment, as shown in fig. 10, the apparatus further comprises:

The audio communication state switching module 540 is configured to detect the audio communication state and, when a switch of audio communication state is detected, determine audio state management parameters from the audio communication state switching data through a unified audio configuration management interface and adjust the current audio state according to those parameters.

In one embodiment, as shown in fig. 11, the audio communication state switching module 540 includes:

A real-time-to-asynchronous switching unit 541 is configured to, when the audio communication state is switched from real-time audio communication to asynchronous audio communication: keep the real-time audio communication link connected; modify the playing parameter, the decoding state parameter and the recording configuration parameter through the unified audio configuration management interface, setting the playing parameter to real-time audio stop playing and the decoding state parameter to real-time audio stop decoding, and updating the recording configuration parameter to a state matching asynchronous audio communication; stop playing the decoded real-time audio data according to the playing parameter; stop decoding the real-time audio data according to the decoding state parameter; discard subsequently received real-time audio data awaiting decoding; and start acquiring asynchronous audio data according to the updated recording configuration parameter.

In one embodiment, as shown in fig. 12, the audio communication state switching module 540 includes:

An asynchronous-to-real-time switching module 542 is configured to, when the audio communication state is switched from asynchronous audio communication to real-time audio communication and the real-time audio communication link is connected, modify the playing parameter, the decoding state parameter and the recording configuration parameter into parameters matching real-time audio communication through the unified audio configuration management interface, adjust the current audio state according to the audio state management parameters, and resume real-time audio communication.

In one embodiment, as shown in fig. 13, the unified processing module 530 includes:

The asynchronous audio processing unit 533 is configured to, when the current audio communication state is asynchronous audio communication, determine the audio processing mode to be the audio file mode, write the encoded audio data into an audio file, and transmit it through a first network channel, where the first network channel includes at least one of an HTTP protocol channel and a TCP protocol channel.

The real-time audio processing unit 534 is configured to, when the current audio communication state is real-time audio communication, determine the audio processing mode to be the real-time audio frame mode, assemble the encoded audio data into audio frames, and transmit the audio frames in real time through a second network channel, where the second network channel includes a UDP protocol channel.

In one embodiment, as shown in fig. 14, the apparatus further comprises:

The unified decoding module 550 is configured to acquire encoded audio data, which includes at least one of file audio data and real-time audio frame data, process it to generate an audio data stream, acquire a matching decoding algorithm according to the network channel through which the encoded audio data was acquired, and decode the audio data stream according to the decoding algorithm to obtain the original audio data.

In one embodiment, as shown in fig. 15, the apparatus further comprises:

The interface operation module 560 is configured to obtain an asynchronous audio start operation through the asynchronous audio communication key on the real-time audio communication interface and switch the current audio communication state from real-time audio communication to asynchronous audio communication; and, while the real-time audio communication link is kept connected, to obtain an asynchronous audio end operation through the asynchronous audio communication key and restore the current audio communication state from asynchronous audio communication to real-time audio communication.

In an embodiment, the apparatus is applied to a multi-person conversation scenario, and the asynchronous audio processing unit 533 is further configured to write the encoded audio data into an audio file in association with user information corresponding to a target user in the multi-person conversation, and transmit the audio file to the server through the first network channel, so that the server determines a target receiving terminal according to the user information in the audio file, and sends the audio file to the target user in the multi-person conversation.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps: obtaining an audio data stream through a unified interface and obtaining the current audio communication state corresponding to the audio data stream; obtaining a target coding algorithm matching the current audio communication state from a unified coding module and coding the audio data stream with the target coding algorithm to obtain coded audio data; and performing the following steps through a unified processing module: determining a matching target audio processing mode from the selectable audio processing modes according to the current audio communication state, the selectable audio processing modes comprising an audio file mode and a real-time audio frame mode; processing the coded audio data according to the target audio processing mode to obtain audio data to be sent; determining a target network channel from the selectable network channels according to the current audio communication state; and transmitting the audio data to be sent through the target network channel.

In one embodiment, before the audio data stream is obtained through the unified interface, the computer-readable instructions further cause the processor to perform the following steps: detecting the audio communication state and, when a switch of audio communication state is detected, determining audio state management parameters from the audio communication state switching data through a unified audio configuration management interface, and adjusting the current audio state according to those parameters.

In one embodiment, detecting the audio communication state and, when a switch of audio communication state is detected, determining audio state management parameters from the audio communication state switching data through the unified audio configuration management interface includes: when the audio communication state is switched from real-time audio communication to asynchronous audio communication, keeping the real-time audio communication link connected and modifying the playing parameter, the decoding state parameter and the recording configuration parameter through the unified audio configuration management interface, where the playing parameter is set to real-time audio stop playing, the decoding state parameter is set to real-time audio stop decoding, and the recording configuration parameter is updated to a state matching asynchronous audio communication.

Adjusting the current audio state according to the audio state management parameters includes: stopping playback of the decoded real-time audio data according to the playing parameter, stopping decoding of the real-time audio data according to the decoding state parameter, discarding subsequently received real-time audio data awaiting decoding, and starting acquisition of asynchronous audio data according to the updated recording configuration parameter.

In one embodiment, after adjusting the current audio state according to the audio state management parameters, the processor further performs the following steps: when the audio communication state is switched from asynchronous audio communication to real-time audio communication and the real-time audio communication link is connected, modifying the playing parameter, the decoding state parameter and the recording configuration parameter into parameters matching real-time audio communication through the unified audio configuration management interface, adjusting the current audio state according to the audio state management parameters, and resuming real-time audio communication.

In one embodiment, determining a matching target audio processing mode from the selectable audio processing modes according to the current audio communication state (the selectable audio processing modes comprising an audio file mode and a real-time audio frame mode), processing the encoded audio data according to the target audio processing mode to obtain audio data to be sent, determining a target network channel from the selectable network channels according to the current audio communication state, and transmitting the audio data to be sent through the target network channel includes: when the current audio communication state is asynchronous audio communication, determining the audio processing mode to be the audio file mode, writing the encoded audio data into an audio file and transmitting it through a first network channel, the first network channel comprising at least one of an HTTP protocol channel and a TCP protocol channel; and when the current audio communication state is real-time audio communication, determining the audio processing mode to be the real-time audio frame mode, assembling the encoded audio data into audio frames and transmitting the audio frames in real time through a second network channel, the second network channel comprising a UDP protocol channel.

In one embodiment, the computer-readable instructions further cause the processor to perform the following steps: acquiring encoded audio data, which comprises at least one of file audio data and real-time audio frame data, and processing it to generate an audio data stream; acquiring a matching decoding algorithm from the unified decoding module according to the network channel through which the encoded audio data was acquired; and decoding the audio data stream according to the decoding algorithm to obtain the original audio data.

In one embodiment, the computer-readable instructions further cause the processor to perform the following steps: obtaining an asynchronous audio start operation through an asynchronous audio communication key on a real-time audio communication interface, and switching the current audio communication state from real-time audio communication to asynchronous audio communication; and, while the real-time audio communication link is kept connected, obtaining an asynchronous audio end operation through the asynchronous audio communication key, and restoring the current audio communication state from asynchronous audio communication to real-time audio communication.

In one embodiment, applied to a multi-person conversation scenario, the step of, when the current audio communication state is asynchronous audio communication, determining the audio processing mode to be the audio file mode and writing the encoded audio data into an audio file transmitted through the first network channel comprises: writing the encoded audio data into an audio file in association with user information corresponding to a target user in the multi-person conversation, and transmitting the audio file to the server through the first network channel, so that the server determines a target receiving terminal according to the user information in the audio file and sends the audio file to the target user in the multi-person conversation.

In one embodiment, a computer-readable storage medium is provided, having computer-executable instructions stored thereon that, when executed by a processor, cause the processor to perform the following steps: obtaining an audio data stream through a unified interface and obtaining the current audio communication state corresponding to the audio data stream; obtaining a target coding algorithm matching the current audio communication state from a unified coding module and coding the audio data stream with the target coding algorithm to obtain coded audio data; and performing the following steps through a unified processing module: determining a matching target audio processing mode from the selectable audio processing modes according to the current audio communication state, the selectable audio processing modes comprising an audio file mode and a real-time audio frame mode; processing the coded audio data according to the target audio processing mode to obtain audio data to be sent; determining a target network channel from the selectable network channels according to the current audio communication state; and transmitting the audio data to be sent through the target network channel.

It will be understood by those skilled in the art that all or part of the processes in the above method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium, for example the storage medium of a computer system, and executed by at least one processor in the computer system to implement the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present invention. Although they are described specifically and in detail, they shall not be construed as limiting the scope of the invention. A person skilled in the art can make several variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. An audio data processing method, applied to a terminal, the method comprising the following steps:

acquiring an audio data stream through a unified interface, and acquiring a current audio communication state corresponding to the audio data stream, wherein the current audio communication state refers to the type of the current audio communication, the types comprising real-time audio communication and asynchronous audio communication;

acquiring, from a unified coding module, a target coding algorithm matching the current audio communication state, and encoding the audio data stream with the target coding algorithm to obtain encoded audio data;

executing the following steps by a unified processing module:

determining a matching target audio processing mode from selectable audio processing modes according to the current audio communication state, wherein the selectable audio processing modes comprise an audio file mode and a real-time audio frame mode, and processing the encoded audio data according to the target audio processing mode to obtain audio data to be sent, wherein different audio processing modes are set for different audio communication states, the audio file mode corresponding to asynchronous audio communication and the real-time audio frame mode corresponding to real-time audio communication; and

determining a target network channel from selectable network channels according to the current audio communication state, and transmitting the audio data to be sent through the target network channel.

2. The method of claim 1, wherein the step of acquiring an audio data stream through a unified interface is preceded by:

detecting the audio communication state and, when a switch of the audio communication state is detected, determining an audio state management parameter from the audio communication state switching data through a unified audio configuration management interface;

and adjusting the current audio state according to the audio state management parameter.

3. The method of claim 2, wherein the step of detecting the audio communication state and, when a switch of the audio communication state is detected, determining an audio state management parameter from the audio communication state switching data through a unified audio configuration management interface comprises:

when the audio communication state switches from real-time audio communication to asynchronous audio communication, keeping the real-time audio communication link connected, and modifying a playback parameter, a decoding state parameter, and a recording configuration parameter through the unified audio configuration management interface;

setting the playback parameter to stop real-time audio playback, setting the decoding state parameter to stop decoding real-time audio data, and updating the recording configuration parameter to a state matching asynchronous audio communication;

the step of adjusting the current audio state according to the audio state management parameter comprises:

stopping playback of the decoded real-time audio data according to the playback parameter;

stopping decoding of the real-time audio data according to the decoding state parameter, and discarding real-time audio data subsequently received for decoding;

and starting acquisition of asynchronous audio data according to the updated recording configuration parameter.

4. The method of claim 3, wherein the step of adjusting the current audio state according to the audio state management parameter is followed by:

when the audio communication state switches from asynchronous audio communication back to real-time audio communication and the real-time audio communication link is still connected, modifying the playback parameter, the decoding state parameter, and the recording configuration parameter, through the unified audio configuration management interface, into parameters matching real-time audio communication;

and adjusting the current audio state according to the audio state management parameters to resume real-time audio communication.

5. The method according to claim 1, wherein the steps of determining a matching target audio processing mode from selectable audio processing modes according to the current audio communication state, the selectable audio processing modes comprising an audio file mode and a real-time audio frame mode, processing the encoded audio data according to the target audio processing mode to obtain audio data to be sent, determining a target network channel from selectable network channels according to the current audio communication state, and transmitting the audio data to be sent through the target network channel comprise:

when the current audio communication state is asynchronous audio communication, determining the audio processing mode to be the audio file mode, writing the encoded audio data into an audio file, and transmitting it through a first network channel, wherein the first network channel comprises at least one of an HTTP (Hypertext Transfer Protocol) channel and a TCP (Transmission Control Protocol) channel;

and when the current audio communication state is real-time audio communication, determining the audio processing mode to be the real-time audio frame mode, assembling the encoded audio data into audio frames, and transmitting the audio frames in real time through a second network channel, wherein the second network channel comprises a UDP (User Datagram Protocol) channel.

6. The method of claim 1, further comprising:

acquiring encoded audio data and processing it to generate an audio data stream, wherein the encoded audio data comprises at least one of file audio data and real-time audio frame data;

and acquiring a matching decoding algorithm from a unified decoding module according to the network channel through which the encoded audio data was acquired, and decoding the audio data stream with the decoding algorithm to obtain the original audio data.

7. The method of claim 1, further comprising:

receiving an asynchronous-audio start operation through an asynchronous audio communication button on a real-time audio communication interface, and switching the current audio communication state from real-time audio communication to asynchronous audio communication;

and, while the real-time audio communication link remains connected, receiving an asynchronous-audio end operation through the asynchronous audio communication button, and restoring the current audio communication state from asynchronous audio communication to real-time audio communication.

8. The method of claim 5, wherein the method is applied to a multi-person conversation scenario, and the step of, when the current audio communication state is asynchronous audio communication, determining the audio processing mode to be the audio file mode, writing the encoded audio data into an audio file, and transmitting it through the first network channel comprises:

writing the encoded audio data, in association with user information corresponding to a target user in the multi-person conversation, into an audio file, and transmitting the audio file to a server through the first network channel, so that the server determines a target receiving terminal from the user information in the audio file and sends the audio file to the target user in the multi-person conversation.

9. An audio data processing apparatus, applied to a terminal, the apparatus comprising:

an acquisition module, configured to acquire an audio data stream through a unified interface and acquire a current audio communication state corresponding to the audio data stream, wherein the current audio communication state refers to the type of the current audio communication, the types comprising real-time audio communication and asynchronous audio communication;

a unified processing module, comprising:

a packaging unit, configured to determine a matching target audio processing mode from selectable audio processing modes according to the current audio communication state, wherein the selectable audio processing modes comprise an audio file mode and a real-time audio frame mode, and to process the encoded audio data according to the target audio processing mode to obtain audio data to be sent, wherein different audio processing modes are set for different audio communication states, asynchronous audio communication corresponding to the audio file mode and real-time audio communication corresponding to the real-time audio frame mode;

10. The apparatus of claim 9, further comprising:

an audio communication state switching module, configured to detect the audio communication state, determine an audio state management parameter from the audio communication state switching data through a unified audio configuration management interface when a switch of the audio communication state is detected, and adjust the current audio state according to the audio state management parameter.

11. The apparatus of claim 10, wherein the audio communication state switching module comprises:

a real-time-to-asynchronous switching unit, configured to: when the audio communication state switches from real-time audio communication to asynchronous audio communication, keep the real-time audio communication link connected; modify a playback parameter, a decoding state parameter, and a recording configuration parameter through the unified audio configuration management interface, setting the playback parameter to stop real-time audio playback, setting the decoding state parameter to stop decoding real-time audio, and updating the recording configuration parameter to a state matching asynchronous audio communication; stop playback of the decoded real-time audio data according to the playback parameter; stop decoding of the real-time audio data according to the decoding state parameter and discard real-time audio data subsequently received for decoding; and start acquisition of asynchronous audio data according to the updated recording configuration parameter.

12. The apparatus of claim 11, wherein the audio communication state switching module comprises:

an asynchronous-to-real-time switching module, configured to: when the audio communication state switches from asynchronous audio communication back to real-time audio communication and the real-time audio communication link is still connected, modify the playback parameter, the decoding state parameter, and the recording configuration parameter, through the unified audio configuration management interface, into parameters matching real-time audio communication; adjust the current audio state according to the audio state management parameters; and resume real-time audio communication.

13. The apparatus of claim 9, wherein the unified processing module comprises:

an asynchronous audio processing unit, configured to, when the current audio communication state is asynchronous audio communication, determine the audio processing mode to be the audio file mode, write the encoded audio data into an audio file, and transmit it through a first network channel, wherein the first network channel comprises at least one of an HTTP (Hypertext Transfer Protocol) channel and a TCP (Transmission Control Protocol) channel;

and a real-time audio processing unit, configured to, when the current audio communication state is real-time audio communication, determine the audio processing mode to be the real-time audio frame mode, assemble the encoded audio data into audio frames, and transmit the audio frames in real time through a second network channel, wherein the second network channel comprises a UDP (User Datagram Protocol) channel.

14. The apparatus of claim 9, further comprising:

a unified decoding module, configured to acquire encoded audio data and process it to generate an audio data stream, wherein the encoded audio data comprises at least one of file audio data and real-time audio frame data; acquire a matching decoding algorithm according to the network channel through which the encoded audio data was acquired; and decode the audio data stream with the decoding algorithm to obtain the original audio data.

15. The apparatus of claim 9, further comprising:

an interface operation module, configured to: receive an asynchronous-audio start operation through an asynchronous audio communication button on the real-time audio communication interface and switch the current audio communication state from real-time audio communication to asynchronous audio communication; and, while the real-time audio communication link remains connected, receive an asynchronous-audio end operation through the asynchronous audio communication button and restore the current audio communication state from asynchronous audio communication to real-time audio communication.

16. The apparatus of claim 13, wherein the apparatus is applied to a multi-person conversation scenario, and the asynchronous audio processing unit is further configured to write the encoded audio data, in association with user information corresponding to a target user in the multi-person conversation, into an audio file, and transmit the audio file to a server through the first network channel, so that the server determines a target receiving terminal from the user information in the audio file and sends the audio file to the target user in the multi-person conversation.

17. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the method of any one of claims 1 to 8.

18. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, cause the processor to perform the steps of the method of any one of claims 1 to 8.
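The state switching of claims 3 and 4 (and the corresponding units of claims 11 and 12) can be modelled as a small state object. The Python sketch below is an illustrative assumption, not the patentee's code: `AudioSession`, `AudioStateParams`, and `configure` (a stand-in for the unified audio configuration management interface) are hypothetical names, the decoder is a pass-through placeholder, and the real-time network link is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioStateParams:
    """Hypothetical audio state management parameters (names are illustrative)."""
    playback_on: bool = True            # play decoded real-time audio?
    decoding_on: bool = True            # decode incoming real-time audio?
    recording_config: str = "realtime"  # capture configuration profile


@dataclass
class AudioSession:
    link_connected: bool = True  # stands in for the real-time audio link
    params: AudioStateParams = field(default_factory=AudioStateParams)
    played: List[bytes] = field(default_factory=list)
    discarded: int = 0

    def configure(self, playback_on: bool, decoding_on: bool, recording_config: str) -> None:
        """Single entry point for changing all three parameters together."""
        self.params = AudioStateParams(playback_on, decoding_on, recording_config)

    def switch_to_async(self) -> None:
        # Real-time -> asynchronous: the link flag is deliberately left untouched,
        # only playback, decoding and recording configuration change.
        self.configure(playback_on=False, decoding_on=False, recording_config="async")

    def switch_to_realtime(self) -> bool:
        # Asynchronous -> real-time: resume only if the link is still connected.
        if not self.link_connected:
            return False
        self.configure(playback_on=True, decoding_on=True, recording_config="realtime")
        return True

    def on_realtime_packet(self, packet: bytes) -> None:
        if not self.params.decoding_on:
            self.discarded += 1  # real-time data received while paused is dropped
            return
        decoded = packet  # placeholder decoder: pass-through
        if self.params.playback_on:
            self.played.append(decoded)
```

As a usage example, a session that plays packet `b"a"`, switches to asynchronous mode, receives `b"b"`, switches back, and receives `b"c"` ends with `played == [b"a", b"c"]` and one discarded packet. In this model, because `switch_to_async` never clears `link_connected`, returning to the call is only a parameter change and no new connection has to be set up.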