CN111755017B - Audio recording method and device for cloud conference, server and storage medium

Info

Publication number: CN111755017B
Application number: CN202010643341.6A
Authority: CN
Inventors: 唐国华
Original assignee: G Net Cloud Service Co Ltd
Current assignee: G Net Cloud Service Co Ltd
Priority date: 2020-07-06
Filing date: 2020-07-06
Publication date: 2021-01-26
Anticipated expiration: 2040-07-06
Also published as: CN111755017A

Abstract

The application provides an audio recording method, an audio recording device, a server and a storage medium for a cloud conference, and relates to the technical field of audio processing. The method comprises the following steps: analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients; decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode; and carrying out audio recording coding on the PCM data to obtain an audio recording file. In the scheme of the application, the PCM data is used as transition data, and the conversion from various coding modes to the recorded final audio coding in the transmission process can be realized, so that the recording server can support various audio coding modes in the transmission process, and the user experience is improved.

Description

Audio recording method and device for cloud conference, server and storage medium

Technical Field

The invention relates to the technical field of audio processing, in particular to an audio recording method, an audio recording device, a server and a storage medium for a cloud conference.

Background

The cloud conference is an efficient, convenient and low-cost conference form based on a cloud computing technology, and can be used for remote communication and remote assistance in various terminal modes such as telephones, mobile phones, computers, special terminals and the like all over the world. The multimedia data of each terminal are transmitted to the cloud server through the network, the cloud server synthesizes multiple audio streams into one audio stream through the cloud computing technology and then forwards the audio stream to each terminal, and each terminal receives the audio data after confluence and then carries out corresponding processing to hear the sound of each other.

Currently, most cloud audio recording is based on the FFmpeg (Fast Forward MPEG) codec technology, which provides only the encoding and decoding of audio data.

However, existing cloud audio recording systems cannot support all of the audio coding types that may be encountered, so business requirements cannot be met. In cloud conference scenarios in particular, such as remote negotiation, remote consultation, remote classrooms and remote appointments, the user experience suffers as a result.

Disclosure of Invention

The present invention aims to provide an audio recording method, an audio recording apparatus, a server, and a storage medium for a cloud conference, so as to support the various possible types of audio coding and improve the user experience.

In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:

in a first aspect, an embodiment of the present application provides an audio recording method for a cloud conference, where the method includes:

analyzing a first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients;

decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM (Pulse Code Modulation) data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode;

and carrying out audio recording coding on the PCM data to obtain an audio recording file.

Optionally, the analyzing the first audio data packet to obtain the audio encoding mode of the first audio data packet includes:

and analyzing a preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.

Optionally, before the decoding the first audio data packet according to the decoding library corresponding to the audio coding manner to obtain PCM data, the method further includes:

and determining a decoding library corresponding to the audio coding mode according to the audio coding mode and the corresponding relation between a preset coding mode and the decoding library.

Optionally, the performing audio recording and encoding on the PCM data to obtain an audio recording file includes:

determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet;

initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet;

and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain the audio recording file.

Optionally, the PCM data is PCM data obtained from a first audio data packet.

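Optionally, the performing audio recording coding on the PCM data to obtain an audio recording file includes:

writing the PCM data into a preset storage queue;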

and sequentially reading each data in the storage queue, and carrying out audio recording coding on the read data to obtain the audio recording file.

Optionally, the method further comprises:

and when the number of the data in the storage queue reaches the number of samples required for encoding one frame, clearing the data in the storage queue.

Optionally, the writing the PCM data into a preset storage queue includes:

determining the number of silent supplementary packets corresponding to the first audio data packet according to the time difference between the first audio data packet and a second audio data packet, wherein the second audio data packet is an audio data packet received before the first audio data packet;

and writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into the storage queue.

In a second aspect, an embodiment of the present application further provides an audio recording apparatus for a cloud conference, where the apparatus includes an analysis module, a decoding module and an encoding module;

the analysis module is used for analyzing a first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients;

the decoding module is configured to decode the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, where the decoding library stores a decoding mode corresponding to the audio coding mode;

and the coding module is used for carrying out audio recording coding on the PCM data to obtain an audio recording file.

Optionally, the parsing module is specifically configured to parse a preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.

Optionally, the apparatus further comprises: a determination module;

and the determining module is used for determining a decoding library corresponding to the audio coding mode according to the audio coding mode and the corresponding relation between a preset coding mode and the decoding library.

Optionally, the determining module is further configured to determine that a sampling rate corresponding to the audio coding mode is a sampling rate of the first audio data packet;

the encoding module is specifically configured to initialize a sampling rate of a preset encoder according to the sampling rate of the first audio data packet;

Optionally, the PCM data is PCM data obtained from a first audio data packet.

Optionally, the encoding module is further configured to write the PCM data into a preset storage queue;

Optionally, the encoding module is further configured to clear the data in the storage queue when the number of data in the storage queue reaches the number of samples required for encoding one frame.

Optionally, the encoding module is further specifically configured to determine, according to a time difference between the first audio data packet and a second audio data packet, a silence complement number corresponding to the first audio data packet, where the second audio data packet is an audio data packet received before the first audio data packet;

In a third aspect, an embodiment of the present application further provides an audio recording server, including a memory and a processor, where the memory stores a computer program executable by the processor, and the processor implements the audio recording method for the cloud conference provided by any one of the above first aspects when executing the computer program.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is executed by a processor to perform the audio recording method for a cloud conference provided in any one of the above first aspects.

The beneficial effect of this application is:

the application provides an audio recording method, an audio recording device, a server and a storage medium for a cloud conference, wherein the method comprises the following steps: analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients; decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode; and carrying out audio recording coding on the PCM data to obtain an audio recording file. According to the scheme, firstly, the audio coding mode of the first audio data packet is obtained by analyzing the first audio data packet, then the first audio data packet is decoded according to the decoding library corresponding to the audio coding mode, PCM data is obtained, finally, audio recording coding is carried out on the PCM data, an audio recording file is obtained, and therefore the PCM data serves as transition data, conversion from multiple coding modes to recording final audio coding in the transmission process can be achieved, the purpose that the recording server supports various audio coding modes in the transmission process is achieved, and user experience is improved.

Secondly, determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet; initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet; and according to the initialized encoder, audio recording encoding is carried out on the PCM data to obtain an audio recording file, so that the recording audio quality is improved, the recording time is shortened, and the rapid real-time recording of the audio data is realized.

In addition, PCM data is written into a preset storage queue, each data in the storage queue is read in sequence, and the read data is subjected to audio recording coding to obtain an audio recording file, so that data loss caused by blocking can be avoided, the integrity of coding is effectively guaranteed, and the correctness of the obtained audio recording file is improved.

And finally, determining the number of silent supplementary packets corresponding to the first audio data packet according to the time difference between the first audio data packet and the second audio data packet, writing the silent packet data corresponding to the number of silent supplementary packets and the PCM data into a storage queue together, and encoding, so that the recorded audio recording file can be ensured to be the real environment of a conference site, and the site can be accurately restored when the audio recording file is played.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a schematic flowchart of an audio recording method for a cloud conference according to an embodiment of the present disclosure;

fig. 2 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present disclosure;

fig. 3 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application;

fig. 4 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of an audio recording apparatus for a cloud conference according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of another audio recording apparatus for a cloud conference according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of an audio recording server according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.

The audio recording method for the cloud conference provided by the present application will be described in detail through a plurality of specific embodiments as follows.

Fig. 1 is a schematic flowchart of an audio recording method for a cloud conference according to an embodiment of the present disclosure; the audio recording method of the cloud conference can be executed by an audio recording device, and the audio recording device can be a cloud conference server or a recording server. In a possible implementation manner, the functions of the cloud conference server and the recording server may be implemented in the same server, or may be implemented in different servers. As shown in fig. 1, the method includes:

S101, analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet.

The first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients.

Generally, before the first audio data packet is parsed, the sound at each client is sampled and the sampled audio data is transmitted to the cloud conference server. To reduce bandwidth and improve transmission efficiency, the sampled audio data is usually encoded first. Many audio coding modes exist, for example AMR (Adaptive Multi-Rate) and OPUS (a lossy audio coding format), and a coding mode can be selected according to actual requirements.

In some embodiments, after the audio data is encoded, it is transmitted to the cloud server over the network. The cloud server decodes and merges the audio data, encodes the merged audio data using the original coding mode, and transmits it over the network to each client and, at the same time, to the recording server. After receiving the encoded audio data, the recording server parses the first audio data packet to obtain its audio coding mode.

S102, decoding the first audio data packet according to the decoding library corresponding to the audio coding mode to obtain PCM data.

The decoding library stores the decoding mode corresponding to the audio coding mode.

It should be noted that the audio data is decoded because it must be uniformly encoded and then encapsulated into an audio file format that can be played once recording ends, and the data to be encoded is the decoded audio data, that is, PCM data.

Specifically, a corresponding decoding library is selected according to the audio coding mode, the first audio data packet is decoded, and the PCM data is obtained after decoding.

S103, carrying out audio recording and coding on the PCM data to obtain an audio recording file.

It can be understood that after the recording server obtains the PCM data by decoding, the PCM data needs to be encoded to obtain an audio recording file.

In some embodiments, PCM data may be used as transition data to realize conversion from multiple encoding modes to recording final audio encoding during transmission, so as to achieve the purpose that the recording server supports various audio encoding modes during transmission.

Specifically, after the first audio data packet is analyzed in step S101, the obtained original audio coding mode of the first audio data packet is used to perform audio recording coding on PCM data, so as to obtain an audio recording file.

To sum up, an embodiment of the present application provides an audio recording method for a cloud conference, where the method includes: analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients; decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode; and carrying out audio recording coding on the PCM data to obtain an audio recording file. In the method, firstly, the audio coding mode of the first audio data packet is obtained by analyzing the first audio data packet, then the first audio data packet is decoded according to a decoding library corresponding to the audio coding mode, PCM data is obtained, finally, audio recording coding is carried out on the PCM data, and an audio recording file is obtained.

Optionally, parsing the first audio data packet to obtain an audio encoding mode of the first audio data packet includes: and analyzing the preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.

When the first audio data packet is analyzed, the preset field in the first audio data packet may be analyzed, and the audio coding mode indicated by the preset field is obtained, so that the audio coding mode of the first audio data packet may be obtained.
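As an illustration of parsing a preset field, the sketch below reads the coding mode from a fixed packet header. The header layout (a one-byte mode identifier followed by a four-byte timestamp), the numeric identifiers and the names `CodingMode`, `HEADER` and `parse_coding_mode` are all assumptions made for the example; the source does not specify the packet format.

```python
import struct
from enum import IntEnum


class CodingMode(IntEnum):
    # Hypothetical numeric identifiers; the source does not define them.
    AMR = 1
    OPUS = 2


# Hypothetical header: 1-byte coding mode, 4-byte big-endian timestamp (ms).
HEADER = struct.Struct(">BI")


def parse_coding_mode(packet: bytes) -> CodingMode:
    """Read the preset field and return the coding mode it indicates."""
    if len(packet) < HEADER.size:
        raise ValueError("packet is shorter than the header")
    mode_id, _timestamp_ms = HEADER.unpack_from(packet)
    return CodingMode(mode_id)  # raises ValueError for an unknown identifier


packet = HEADER.pack(CodingMode.OPUS, 40) + b"\x00" * 8
print(parse_coding_mode(packet).name)  # OPUS
```

An unknown identifier raises an error instead of being guessed, which lets the recording side reject coding modes it has no decoding library for.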

Optionally, before decoding the first audio data packet according to a decoding library corresponding to the audio coding method to obtain PCM data, the method further includes:

and determining a decoding library corresponding to the audio coding mode according to the audio coding mode and the corresponding relation between the preset coding mode and the decoding library.

In some possible embodiments, the decoding library corresponding to the audio encoding mode may be further determined according to the audio encoding mode and a corresponding relationship between a preset encoding mode and the decoding library.

For example, the audio coding methods include: AMR, OPUS, etc., and correspondingly, the above audio coding schemes all have respective decoding libraries corresponding thereto, that is, according to the obtained audio coding scheme, the decoding library corresponding to the audio coding scheme can be obtained.
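The preset correspondence between coding mode and decoding library can be pictured as a lookup table. In the sketch below the two decoder functions are placeholders that only tag their input; in a real system each entry would call into the corresponding codec library. The names `DECODER_FOR_MODE` and `select_decoder` are hypothetical.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], bytes]  # encoded payload -> PCM bytes


def decode_amr_stub(payload: bytes) -> bytes:
    """Placeholder for a call into an AMR decoding library."""
    return b"pcm-from-amr:" + payload


def decode_opus_stub(payload: bytes) -> bytes:
    """Placeholder for a call into an Opus decoding library."""
    return b"pcm-from-opus:" + payload


# Preset correspondence between coding mode and decoding library.
DECODER_FOR_MODE: Dict[str, Decoder] = {
    "AMR": decode_amr_stub,
    "OPUS": decode_opus_stub,
}


def select_decoder(coding_mode: str) -> Decoder:
    """Return the decoder registered for coding_mode."""
    try:
        return DECODER_FOR_MODE[coding_mode]
    except KeyError:
        raise ValueError(
            f"no decoding library registered for {coding_mode!r}"
        ) from None
```

Supporting a further coding mode then amounts to registering one more entry in the table, while the rest of the recording path, which only ever sees PCM data, stays unchanged.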

Fig. 2 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present disclosure; as shown in fig. 2, performing audio recording and encoding on PCM data to obtain an audio recording file includes:

S201, determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet.

It should be noted that, in the conventional encoding process, the PCM data is resampled to the sampling rate of the output coding mode, and the resampled data is then encoded through the ffmpeg interface. However, resampling is a lossy conversion that increases distortion, and because the data must be resampled every time, the recording conversion time increases accordingly.

To solve this problem, that is, to improve the recorded audio quality and reduce the recording time, the fact that the encoder's sampling rate is configurable can be exploited. To keep the audio duration unchanged when encoding without resampling, a coding mode that supports multiple sampling rates can be selected, so that the encoder's sampling rate matches the sampling rate of the data before encoding. AAC (Advanced Audio Coding) supports multiple sampling rates, such as 8 kHz, 16 kHz and 48 kHz, and the mainstream coding modes currently in use also sample at 8 kHz, 16 kHz or 48 kHz, so AAC can meet these sampling-rate requirements.

In addition, selecting AAC encoding requires that the sampling rate of the encoder is not fixed, i.e., the encoder can be initialized with the sampling rate of the audio data.

Specifically, when performing audio recording and encoding on PCM data, the sampling rate of the first audio data packet may be determined according to the sampling rate corresponding to the audio encoding mode.

S202, initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet.

After determining the sampling rate of the first audio data packet according to the audio coding mode, the sampling rate of the preset encoder may be initialized by using the obtained sampling rate of the first audio data packet.

For example, on the basis of the above embodiment, after the first audio data packet is decoded according to the decoding library corresponding to its audio coding mode, the original coding mode of the first audio data packet is also known. If the original coding mode is AMR, the sampling rate of the first audio data packet is 8 kHz, so the sampling rate of the preset encoder can be initialized to 8 kHz.

It should be noted that the sampling rate of the audio data depends on the audio coding mode; for example, the sampling rate of the OPUS coding mode is 48 kHz. Other coding modes are not listed one by one here.

S203, carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain an audio recording file.

After the sampling rate of the first audio data is adopted to initialize the sampling rate of the preset encoder, audio recording encoding can be performed on the PCM data obtained in the step S102, and an audio recording file is obtained, so that the quality of recorded audio can be improved, the recording time can be reduced, and the fast real-time recording of the audio data can be realized.
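Steps S201 and S202 can be illustrated with a minimal sketch. The two sampling rates are the ones named in the text (AMR at 8 kHz, OPUS at 48 kHz); the `EncoderConfig` type, the `init_encoder_config` function and the mono channel count are hypothetical and stand in for whatever settings a real AAC encoder takes when it is opened.

```python
from dataclasses import dataclass

# Sampling rates named in the description: AMR at 8 kHz, OPUS at 48 kHz.
SAMPLE_RATE_HZ = {"AMR": 8000, "OPUS": 48000}


@dataclass
class EncoderConfig:
    """Hypothetical settings handed to an AAC encoder when it is opened."""
    sample_rate_hz: int
    channels: int = 1  # assumption: the merged conference audio is mono


def init_encoder_config(coding_mode: str) -> EncoderConfig:
    # S201: the sampling rate of the packet follows from its coding mode.
    rate = SAMPLE_RATE_HZ[coding_mode]
    # S202: the encoder is initialized with that same rate, so the PCM
    # data can be encoded without a resampling step.
    return EncoderConfig(sample_rate_hz=rate)
```

Because the encoder is opened at the rate of the incoming data, the PCM samples can be passed to it unchanged, which is the point of avoiding resampling.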

Optionally, the PCM data is PCM data obtained from a first audio data packet.

Generally, within one cloud conference, the coding mode of the audio being recorded is fixed, that is, the sampling rate does not change. The sampling rate of the encoder can therefore be initialized based on the sampling rate of the PCM data obtained from the first audio data packet.

In some embodiments, to ensure that an accurate audio recording file is obtained, the sampling rate of each received audio data packet can be checked against the sampling rate of the previous audio data packet. If the two are the same, the sampling rate has not changed during the conference recording, and the encoder remains initialized with the sampling rate of the first PCM data. If they differ, the sampling rate has changed during the recording, and the encoder is reinitialized with the new sampling rate and reopened.
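The per-packet sampling-rate check can be sketched as follows. The `RecordingEncoder` class and its `ensure_rate` method are hypothetical; `_open` is a stand-in for the comparatively expensive step of initializing and opening a real encoder.

```python
from typing import Optional


class RecordingEncoder:
    """Hypothetical wrapper that remembers the rate the encoder was opened with."""

    def __init__(self) -> None:
        self.sample_rate_hz: Optional[int] = None
        self.open_count = 0

    def _open(self, rate_hz: int) -> None:
        # Stand-in for initializing and opening a real encoder.
        self.sample_rate_hz = rate_hz
        self.open_count += 1

    def ensure_rate(self, packet_rate_hz: int) -> bool:
        """Open on the first packet, reopen on a rate change.

        Returns True when the encoder was (re)opened for this packet.
        """
        if packet_rate_hz == self.sample_rate_hz:
            return False  # rate unchanged: keep the encoder as it is
        self._open(packet_rate_hz)
        return True
```

With a constant sampling rate the encoder is opened once; it is reopened only when a packet arrives whose rate differs from that of the previous packet.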

Fig. 3 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application; as shown in fig. 3, performing audio recording and encoding on PCM data to obtain an audio recording file includes:

S301, writing the PCM data into a preset storage queue.

It can be understood that initializing and opening the encoder is a relatively time-consuming operation. If the encoder were initialized and opened only when PCM data arrives, subsequent data could be blocked, or even lost because of the blocking. A queue is therefore needed so that the conversion thread can store the data, and data is taken out of the queue for encoding only after the encoder has been initialized and opened successfully.

In this embodiment, for example, all decoded PCM data may be pushed into a preset queue, for example, the preset queue may be an avfifo buffer of ffmpeg, which is a queue for storing data, and a corresponding API (Application Programming Interface) is provided for reading and writing and calculating the size of the queue data.

S302, reading each data in the storage queue in sequence, and carrying out audio recording coding on the read data to obtain an audio recording file.

Specifically, after the decoded PCM data is written into the preset queue, it is determined whether the number of samples in the queue has reached the number required for one frame of AAC coding. If it has, the data is read from the queue in sequence and encoded, so that data loss caused by blocking is avoided, the integrity of the encoding is effectively guaranteed, and the correctness of the obtained audio recording file is improved.

It should be noted that, when recording and encoding the audio of the read data, the sampling rate set for the frame structure AVFrame needs to be the same as that set for the initialization encoder. It can be understood that, in two encoding modes with the same sampling rate, the number of samples represents the time, even if the number of samples in each frame is different, the total number of samples is unchanged, and as long as the number of samples is ensured to be unchanged, the corresponding time will remain unchanged, so that the obtained audio recording file can be ensured to be correct.
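The queue behaviour of S301 and S302 can be sketched in a few lines. This is an illustration under stated assumptions, not the FFmpeg queue itself: the `PcmQueue` class is hypothetical, the PCM is assumed to be 16-bit mono, and the default of 1024 samples per frame is the usual AAC-LC frame length (a real encoder reports the frame size it expects).

```python
from typing import Iterator


class PcmQueue:
    """Byte FIFO for PCM data (assumed format: 16-bit mono samples)."""

    BYTES_PER_SAMPLE = 2

    def __init__(self, frame_samples: int = 1024) -> None:
        self.frame_bytes = frame_samples * self.BYTES_PER_SAMPLE
        self._buf = bytearray()

    def write(self, pcm: bytes) -> None:
        """S301: append decoded PCM data to the queue."""
        self._buf.extend(pcm)

    def read_frames(self) -> Iterator[bytes]:
        """S302: yield complete frames in order; keep any partial frame queued."""
        while len(self._buf) >= self.frame_bytes:
            frame = bytes(self._buf[:self.frame_bytes])
            del self._buf[:self.frame_bytes]
            yield frame

    def __len__(self) -> int:
        """Number of samples currently waiting in the queue."""
        return len(self._buf) // self.BYTES_PER_SAMPLE
```

For example, four 40 ms packets at 8 kHz contribute 320 samples each, 1280 in total, which yields one 1024-sample frame and leaves 256 samples queued for the next frame. No sample is dropped or added, so the total duration is preserved even though the packet size and the frame size differ.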

Optionally, on the basis of the foregoing embodiment, the method further includes: and when the number of the data in the storage queue reaches the number of samples required for encoding one frame, clearing the data in the storage queue.

Fig. 4 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application; as shown in fig. 4, on the basis of the above embodiment, writing PCM data into a preset storage queue includes:

S401, determining the number of silent supplementary packets corresponding to the first audio data packet according to the time difference between the first audio data packet and the second audio data packet.

The second audio data packet is an audio data packet received before the first audio data packet.

It can be understood that, during a cloud conference, there may be periods in which the recording server receives no sound, and there may even be no audio input for the entire conference. If no additional processing is performed, that is, if the recording server simply decodes each received audio data packet and directly encodes and writes it into the audio recording file, playback produces one uninterrupted run of sound. The silent intervals that occurred during recording are lost, and the recording no longer matches what actually happened at the recording site.

In this embodiment, to solve this problem, the number of silence packets to be padded between adjacent audio data packets may be calculated, with the silence packets occupying the time during which no audio was input.

Specifically, after the multi-channel audio data of the multiple clients reach the cloud server to be decoded, the cloud server selects a period of fixed time to merge the audio data during the merging, for example, each time 40 milliseconds of multi-channel audio data is selected to perform sound mixing and merging, and correspondingly, the recording server receives an audio data packet after the merging every 40 milliseconds at a fixed time. Therefore, when there is no audio data input, it can be considered that one silence packet exists every 40 msec.

It can be understood that, ideally, an audio data packet is received every 40 ms. On a real network, however, the arrival interval jitters, for example 38 ms, 39 ms, 40 ms, 41 ms or 42 ms, so the packet-padding algorithm must tolerate such variation.

For example, the number of silent supplementary packets is determined from the times at which the received audio data packets arrive at the recording server. Specifically, the time difference DT (Difference Time) between two adjacent packets is recorded, for example the time difference between the first audio data packet and the second audio data packet, with the fixed time period being 40 ms. The fill time FT (Fill Time) is the difference between DT and 40, that is, FT = DT - 40. When DT is greater than 40, FT is positive, indicating that time needs to be filled; when DT is equal to 40, FT is 0, indicating that no packet needs to be padded; when DT is less than 40, FT is negative, indicating that no time needs to be filled and that the shortfall is offset against positive FT values. The FT calculated each time is accumulated into the total time TT (Total Time). The number of packets to be padded, FN (Fill Number), is the integer quotient of TT divided by 40, and the time left over after padding, MT, is TT modulo 40, that is, MT = TT - 40 × FN. TT is then set to MT, so that TT holds the time remaining after padding, which is not enough for one packet. By performing this calculation between the current audio data packet and the previous audio data packet each time, the number FN of silence packets to pad is obtained; that is, the number of silent supplementary packets corresponding to the first audio data packet can be determined according to the time difference between the first audio data packet and the second audio data packet.
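The accumulation of DT, FT, TT and FN can be made concrete with a short sketch. The `SilencePadder` class and its method name are hypothetical. One point the text leaves open is what happens while the accumulated total is negative; the sketch assumes that no packets are padded in that case and that the negative balance is simply carried forward.

```python
class SilencePadder:
    """Turns packet inter-arrival times into a count of silence packets."""

    def __init__(self, period_ms: int = 40) -> None:
        self.period_ms = period_ms  # fixed merging period (40 ms in the text)
        self.total_ms = 0           # TT: accumulated fill time

    def packets_to_fill(self, dt_ms: int) -> int:
        """Return FN for a packet that arrived dt_ms after the previous one."""
        ft = dt_ms - self.period_ms   # FT = DT - 40, negative for early packets
        self.total_ms += ft           # TT accumulates every FT
        if self.total_ms < self.period_ms:
            # Less than one whole period (assumption: also when TT is
            # negative): pad nothing and carry the balance forward.
            return 0
        fn = self.total_ms // self.period_ms   # FN = TT div 40
        self.total_ms -= fn * self.period_ms   # TT = MT = TT mod 40
        return fn


padder = SilencePadder()
intervals_ms = [40, 42, 38, 120, 39, 81]
print([padder.packets_to_fill(dt) for dt in intervals_ms])  # [0, 0, 0, 2, 0, 1]
```

In this run the jitter of 42 ms and 38 ms cancels out and pads nothing, the 120 ms gap pads two silence packets, and the 1 ms deficit left by the 39 ms interval reduces the later 81 ms gap to exactly one packet.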

S402, writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into a storage queue.

After the number of the silent supplementary packets corresponding to the first audio data packet is determined according to the time difference between the first audio data packet and the second audio data packet, the silent packet data corresponding to the number of the silent supplementary packets and the PCM data are written into a storage queue together and encoded, so that the recorded audio recording file can be ensured to be the real environment of a conference site, and the site can be accurately restored when the audio recording file is played.
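Step S402 can be sketched as follows, assuming signed 16-bit mono PCM, in which silence is a run of zero bytes, and a 40 ms packet period. The function names are hypothetical, and writing the silence ahead of the PCM data is an assumption based on the gap having occurred before the current packet arrived; the text only states that both are written into the queue.

```python
def silence_packet(sample_rate_hz: int, period_ms: int = 40,
                   bytes_per_sample: int = 2) -> bytes:
    """One period of digital silence (assumed format: signed 16-bit mono PCM)."""
    samples = sample_rate_hz * period_ms // 1000
    return bytes(samples * bytes_per_sample)  # all-zero bytes


def enqueue_with_silence(queue: bytearray, pcm: bytes,
                         fill_count: int, sample_rate_hz: int) -> None:
    """Append fill_count silence packets, then the decoded PCM, to the queue."""
    queue.extend(silence_packet(sample_rate_hz) * fill_count)
    queue.extend(pcm)
```

At 8 kHz, one 40 ms silence packet is 320 samples, or 640 bytes, so padding two packets ahead of a decoded packet extends the recording by 80 ms of silence.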

The following describes a device, a server, and a storage medium for executing the audio recording method for a cloud conference, where specific implementation processes and technical effects of the device, the server, and the storage medium are referred to above, and are not described in detail below.

Fig. 5 is a schematic structural diagram of an audio recording apparatus for a cloud conference according to an embodiment of the present application; as shown in fig. 5, the audio recording apparatus 500 for a cloud conference includes: an analysis module 501, a decoding module 502 and an encoding module 503;

the analysis module 501 is configured to analyze the first audio data packet to obtain an audio coding mode of the first audio data packet, where the first audio data packet is a data packet obtained by merging audio data acquired by multiple cloud conference clients;

the decoding module 502 is configured to decode the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, where the decoding library stores a decoding mode corresponding to the audio coding mode;

and the encoding module 503 is configured to perform audio recording and encoding on the PCM data to obtain an audio recording file.

Optionally, the analysis module 501 is specifically configured to parse a preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.
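As a rough illustration of parsing a preset field, the sketch below assumes that the first byte of the packet carries a codec identifier. This layout, the identifier values and the names `CODEC_NAMES` and `parse_coding_mode` are all hypothetical; the text does not define the packet format.

```python
import struct

# Hypothetical identifier-to-mode table; real values depend on the protocol.
CODEC_NAMES = {0: "codec_a", 1: "codec_b", 2: "codec_c"}


def parse_coding_mode(packet: bytes) -> str:
    """Read the assumed one-byte preset field and return the coding mode."""
    if len(packet) < 1:
        raise ValueError("packet is too short to contain the preset field")
    (codec_id,) = struct.unpack_from("!B", packet, 0)
    if codec_id not in CODEC_NAMES:
        raise ValueError(f"unknown audio coding mode id: {codec_id}")
    return CODEC_NAMES[codec_id]
```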

Fig. 6 is a schematic structural diagram of another audio recording apparatus for a cloud conference according to an embodiment of the present application; as shown in fig. 6, the apparatus further includes: a determination module 601;

the determining module 601 is configured to determine the decoding library corresponding to the audio coding mode according to the audio coding mode and a preset correspondence between coding modes and decoding libraries.

Optionally, the determining module 601 is further configured to determine the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet;
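The preset correspondence between coding modes and decoding libraries can be pictured as a lookup table. In the sketch below the two decoders are simple stand-ins (a pass-through and an 8-bit to 16-bit widening), not real codec libraries, and every name is an assumption made for illustration.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], bytes]


def decode_passthrough(payload: bytes) -> bytes:
    """Stand-in decoder: the payload is already 16-bit PCM."""
    return payload


def decode_u8_to_s16(payload: bytes) -> bytes:
    """Stand-in decoder: widen unsigned 8-bit samples to signed 16-bit LE PCM."""
    out = bytearray()
    for value in payload:
        out += ((value - 128) << 8).to_bytes(2, "little", signed=True)
    return bytes(out)


# Hypothetical preset correspondence between coding modes and decoders.
DECODER_LIBRARIES: Dict[str, Decoder] = {
    "pcm_s16le": decode_passthrough,
    "pcm_u8": decode_u8_to_s16,
}


def determine_decoder(coding_mode: str) -> Decoder:
    """Look up the decoder registered for the given coding mode."""
    try:
        return DECODER_LIBRARIES[coding_mode]
    except KeyError:
        raise ValueError(f"no decoding library for mode {coding_mode!r}") from None
```

Supporting a further coding mode then only requires registering one more entry, which matches the role of PCM as the common intermediate format.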

the encoding module 503 is specifically configured to initialize a sampling rate of a preset encoder according to the sampling rate of the first audio data packet; and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain an audio recording file.
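The idea of initializing the encoder from the packet's sampling rate can be sketched with Python's standard `wave` module. A WAV writer is used here only as a stand-in for the recording encoder, which the text does not name; the mode names, the sampling rates in `SAMPLE_RATES` and the function `record_pcm` are likewise assumptions.

```python
import io
import wave

# Hypothetical correspondence between coding modes and sampling rates.
SAMPLE_RATES = {"narrowband_codec": 8000, "wideband_codec": 16000}


def record_pcm(pcm: bytes, coding_mode: str) -> bytes:
    """Encode 16-bit mono PCM into an in-memory WAV file."""
    sample_rate = SAMPLE_RATES[coding_mode]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as encoder:
        encoder.setnchannels(1)             # mono, assumed
        encoder.setsampwidth(2)             # 16-bit samples, assumed
        encoder.setframerate(sample_rate)   # initialize from the packet's rate
        encoder.writeframes(pcm)
    return buffer.getvalue()
```

Because the encoder takes its rate from the coding mode rather than from a fixed constant, the recording plays back at the correct speed regardless of which codec was used in transmission.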

Optionally, the PCM data is PCM data obtained from a first audio data packet.

Optionally, the encoding module 503 is further configured to write the PCM data into a preset storage queue; and sequentially reading each data in the storage queue, and carrying out audio recording coding on the read data to obtain an audio recording file.

Optionally, the encoding module 503 is further configured to clear the data in the storage queue when the amount of data in the storage queue reaches the number of samples required for encoding one frame.
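One possible reading of this queue behaviour is sketched below: PCM data accumulates until a full frame is available, the frame is handed to the encoder, and the consumed data is removed from the queue. The class `FrameQueue`, the 16-bit sample width and the frame size used in the usage note are assumptions, not details from the text.

```python
from typing import List


class FrameQueue:
    """Hypothetical storage queue that releases data one encoder frame at a time."""

    def __init__(self, frame_samples: int, sample_width: int = 2) -> None:
        self.frame_bytes = frame_samples * sample_width
        self.buffer = bytearray()

    def push(self, pcm: bytes) -> List[bytes]:
        """Append PCM data and return every complete frame now available."""
        self.buffer += pcm
        frames: List[bytes] = []
        while len(self.buffer) >= self.frame_bytes:
            frames.append(bytes(self.buffer[:self.frame_bytes]))
            del self.buffer[:self.frame_bytes]  # clear the consumed data
        return frames
```

With an assumed frame size of 1024 samples, pushing two 640-sample packets yields one frame after the second push and leaves 256 samples queued for the next frame.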

Optionally, the encoding module 503 is further specifically configured to determine, according to a time difference between a first audio data packet and a second audio data packet, a silence supplementary packet number corresponding to the first audio data packet, where the second audio data packet is an audio data packet received before the first audio data packet; and writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into a storage queue.

The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.

The above modules may be one or more integrated circuits configured to implement the above methods, such as: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

Fig. 7 is a schematic structural diagram of an audio recording server according to an embodiment of the present application, where the audio recording server may include: a processor 701 and a memory 702.

The memory 702 is used for storing programs, and the processor 701 calls the programs stored in the memory 702 to execute the above method embodiments. The specific implementation and technical effects are similar, and are not described herein again.

Optionally, the invention also provides a program product, for example a computer-readable storage medium, comprising a program which, when being executed by a processor, is adapted to carry out the above-mentioned method embodiments.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is only one logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and various other media capable of storing program code.

Claims

1. An audio recording method for a cloud conference, the method comprising:

decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain Pulse Code Modulation (PCM) data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode;

carrying out audio recording coding on the PCM data to obtain an audio recording file;

the audio recording and encoding the PCM data to obtain an audio recording file comprises the following steps:

2. The method of claim 1, wherein the parsing the first audio packet to obtain the audio encoding mode of the first audio packet comprises:

3. The method of claim 1, wherein the PCM data is PCM data obtained from a first audio packet.

4. The method of claim 1, wherein said audio recording and encoding the PCM data to obtain an audio recording file comprises:

writing the PCM data into a preset storage queue;

5. The method of claim 4, further comprising:

6. The method of claim 4, wherein writing the PCM data into a preset storage queue comprises:

7. An audio recording apparatus for a cloud conference, the apparatus comprising: an analysis module, a decoding module and an encoding module;

the coding module is used for carrying out audio recording coding on the PCM data to obtain an audio recording file;

8. An audio recording server, comprising: a memory storing a computer program executable by the processor, and a processor implementing the audio recording method of a cloud conference according to any one of claims 1 to 6 when the computer program is executed by the processor.

9. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, performs the audio recording method of a cloud conference according to any one of claims 1 to 6.