CN111755017B - Audio recording method and device for cloud conference, server and storage medium - Google Patents

Info

Publication number
CN111755017B
Authority
CN
China
Prior art keywords
audio
data packet
data
audio data
audio recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010643341.6A
Other languages
Chinese (zh)
Other versions
CN111755017A (en
Inventor
唐国华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Net Cloud Service Co Ltd
Original Assignee
G Net Cloud Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Net Cloud Service Co Ltd filed Critical G Net Cloud Service Co Ltd
Priority to CN202010643341.6A priority Critical patent/CN111755017B/en
Publication of CN111755017A publication Critical patent/CN111755017A/en
Application granted granted Critical
Publication of CN111755017B publication Critical patent/CN111755017B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The application provides an audio recording method, an audio recording device, a server and a storage medium for a cloud conference, and relates to the technical field of audio processing. The method comprises the following steps: analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients; decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode; and carrying out audio recording coding on the PCM data to obtain an audio recording file. In the scheme of the application, the PCM data is used as transition data, and the conversion from various coding modes to the recorded final audio coding in the transmission process can be realized, so that the recording server can support various audio coding modes in the transmission process, and the user experience is improved.

Description

Audio recording method and device for cloud conference, server and storage medium
Technical Field
The invention relates to the technical field of audio processing, in particular to an audio recording method, an audio recording device, a server and a storage medium for a cloud conference.
Background
The cloud conference is an efficient, convenient and low-cost conference form based on a cloud computing technology, and can be used for remote communication and remote assistance in various terminal modes such as telephones, mobile phones, computers, special terminals and the like all over the world. The multimedia data of each terminal are transmitted to the cloud server through the network, the cloud server synthesizes multiple audio streams into one audio stream through the cloud computing technology and then forwards the audio stream to each terminal, and each terminal receives the audio data after confluence and then carries out corresponding processing to hear the sound of each other.
Currently, most cloud audio recording is based on the ffmpeg (Fast Forward Mpeg) codec technology, which provides only the encoding and decoding of audio data.
However, existing cloud audio recording systems cannot support all the types of audio coding that may be in use, so business requirements cannot be met. In the field of cloud conferences in particular, such as remote negotiation, remote consultation, remote classrooms, and remote appointments, this degrades the user experience.
Disclosure of Invention
The present invention is directed to provide an audio recording method, an audio recording apparatus, a server, and a storage medium for a cloud conference, so as to support various possible types of audio encoding and improve user experience.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides an audio recording method for a cloud conference, where the method includes:
analyzing a first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients;
decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM (Pulse Code Modulation) data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode;
and carrying out audio recording coding on the PCM data to obtain an audio recording file.
Optionally, the analyzing the first audio data packet to obtain the audio encoding mode of the first audio data packet includes:
and analyzing a preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.
Optionally, before the decoding the first audio data packet according to the decoding library corresponding to the audio coding manner to obtain PCM data, the method further includes:
and determining a decoding library corresponding to the audio coding mode according to the audio coding mode and the corresponding relation between a preset coding mode and the decoding library.
Optionally, the performing audio recording and encoding on the PCM data to obtain an audio recording file includes:
determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet;
initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet;
and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain the audio recording file.
Optionally, the PCM data is PCM data obtained from a first audio data packet.
Optionally, the performing audio recording and encoding on the PCM data to obtain an audio recording file includes:
writing the PCM data into a preset storage queue;
and sequentially reading each data in the storage queue, and carrying out audio recording coding on the read data to obtain the audio recording file.
Optionally, the method further comprises:
and when the number of the data in the storage queue reaches the number of samples required for encoding one frame, clearing the data in the storage queue.
Optionally, the writing the PCM data into a preset storage queue includes:
determining the number of silent supplementary packets corresponding to the first audio data packet according to the time difference between the first audio data packet and a second audio data packet, wherein the second audio data packet is an audio data packet received before the first audio data packet;
and writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into the storage queue.
In a second aspect, an embodiment of the present application further provides an audio recording apparatus for a cloud conference, where the apparatus includes: the device comprises an analysis module, a decoding module and an encoding module;
the analysis module is used for analyzing a first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients;
the decoding module is configured to decode the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, where the decoding library stores a decoding mode corresponding to the audio coding mode;
and the coding module is used for carrying out audio recording coding on the PCM data to obtain an audio recording file.
Optionally, the parsing module is specifically configured to parse a preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.
Optionally, the apparatus further comprises: a determination module;
and the determining module is used for determining a decoding library corresponding to the audio coding mode according to the audio coding mode and the corresponding relation between a preset coding mode and the decoding library.
Optionally, the determining module is further configured to determine that a sampling rate corresponding to the audio coding mode is a sampling rate of the first audio data packet;
the encoding module is specifically configured to initialize a sampling rate of a preset encoder according to the sampling rate of the first audio data packet;
and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain the audio recording file.
Optionally, the PCM data is PCM data obtained from a first audio data packet.
Optionally, the encoding module is further configured to write the PCM data into a preset storage queue;
and sequentially reading each data in the storage queue, and carrying out audio recording coding on the read data to obtain the audio recording file.
Optionally, the encoding module is further configured to clear the data in the storage queue when the number of data in the storage queue reaches the number of samples required for encoding one frame.
Optionally, the encoding module is further specifically configured to determine, according to a time difference between the first audio data packet and a second audio data packet, the number of silent supplementary packets corresponding to the first audio data packet, where the second audio data packet is an audio data packet received before the first audio data packet;
and writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into the storage queue.
In a third aspect, an embodiment of the present application further provides an audio recording server, including a processor and a memory storing a computer program, where the audio recording server implements the audio recording method for the cloud conference provided by any one of the above first aspects when the processor executes the computer program.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is executed by a processor to perform the audio recording method for a cloud conference provided in any one of the above first aspects.
The beneficial effect of this application is:
the application provides an audio recording method, an audio recording device, a server and a storage medium for a cloud conference, wherein the method comprises the following steps: analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients; decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode; and carrying out audio recording coding on the PCM data to obtain an audio recording file. According to the scheme, firstly, the audio coding mode of the first audio data packet is obtained by analyzing the first audio data packet, then the first audio data packet is decoded according to the decoding library corresponding to the audio coding mode, PCM data is obtained, finally, audio recording coding is carried out on the PCM data, an audio recording file is obtained, and therefore the PCM data serves as transition data, conversion from multiple coding modes to recording final audio coding in the transmission process can be achieved, the purpose that the recording server supports various audio coding modes in the transmission process is achieved, and user experience is improved.
Secondly, determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet; initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet; and according to the initialized encoder, audio recording encoding is carried out on the PCM data to obtain an audio recording file, so that the recording audio quality is improved, the recording time is shortened, and the rapid real-time recording of the audio data is realized.
In addition, PCM data is written into a preset storage queue, each data in the storage queue is read in sequence, and the read data is subjected to audio recording coding to obtain an audio recording file, so that data loss caused by blocking can be avoided, the integrity of coding is effectively guaranteed, and the correctness of the obtained audio recording file is improved.
Finally, the number of silent supplementary packets corresponding to the first audio data packet is determined according to the time difference between the first audio data packet and the second audio data packet, and the silence packet data corresponding to that number is written into the storage queue together with the PCM data and encoded. This ensures that the audio recording file reflects the real conditions of the conference site, so that the site can be accurately reproduced when the audio recording file is played.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of an audio recording method for a cloud conference according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an audio recording apparatus for a cloud conference according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of another audio recording apparatus for a cloud conference according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an audio recording server according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
The audio recording method for the cloud conference provided by the present application will be described in detail through a plurality of specific embodiments as follows.
Fig. 1 is a schematic flowchart of an audio recording method for a cloud conference according to an embodiment of the present disclosure; the audio recording method of the cloud conference can be executed by an audio recording device, and the audio recording device can be a cloud conference server or a recording server. In a possible implementation manner, the functions of the cloud conference server and the recording server may be implemented in the same server, or may be implemented in different servers. As shown in fig. 1, the method includes:
s101, analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet.
The first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients.
Before the first audio data packet is parsed, the sound at each client is first sampled, and the sampled audio data is transmitted to the cloud conference server. To reduce bandwidth and improve transmission efficiency, the sampled audio data generally needs to be encoded. Many audio coding modes exist, for example AMR (Adaptive Multi-Rate) and OPUS (a format for lossy audio coding), and different coding modes can be selected according to actual requirements.
In some embodiments, after the audio data is encoded, the encoded audio data is transmitted over the network to the cloud server, which decodes and merges the audio data. The merged audio data is encoded again in the original coding mode and transmitted over the network to each client and, at the same time, to the recording server. After receiving the encoded audio data, the recording server parses the first audio data packet to obtain the audio coding mode of the first audio data packet.
And S102, decoding the first audio data packet according to the decoding library corresponding to the audio coding mode to obtain PCM data.
Wherein, the decoding library stores the decoding mode corresponding to the audio coding mode.
It should be noted that the audio data is decoded because, by the end of recording, it must be uniformly encoded and then encapsulated into a playable audio file format, and the data to be encoded is the decoded audio data, that is, PCM data.
Specifically, a corresponding decoding library is selected according to the audio coding mode, the first audio data packet is decoded, and the PCM data is obtained after decoding.
S103, carrying out audio recording and coding on the PCM data to obtain an audio recording file.
It can be understood that after the recording server decodes the packet into PCM data, the PCM data needs to be encoded to obtain an audio recording file.
In some embodiments, PCM data may be used as transition data to realize conversion from multiple encoding modes to recording final audio encoding during transmission, so as to achieve the purpose that the recording server supports various audio encoding modes during transmission.
Specifically, after the first audio data packet is analyzed in step S101, the obtained original audio coding mode of the first audio data packet is used to perform audio recording coding on PCM data, so as to obtain an audio recording file.
To sum up, an embodiment of the present application provides an audio recording method for a cloud conference, where the method includes: analyzing the first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients; decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode; and carrying out audio recording coding on the PCM data to obtain an audio recording file. In the method, firstly, the audio coding mode of the first audio data packet is obtained by analyzing the first audio data packet, then the first audio data packet is decoded according to a decoding library corresponding to the audio coding mode, PCM data is obtained, finally, audio recording coding is carried out on the PCM data, and an audio recording file is obtained.
Optionally, parsing the first audio data packet to obtain an audio encoding mode of the first audio data packet includes: and analyzing the preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.
When the first audio data packet is analyzed, the preset field in the first audio data packet may be analyzed, and the audio coding mode indicated by the preset field is obtained, so that the audio coding mode of the first audio data packet may be obtained.
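As an illustration of reading a preset field, the following Python sketch parses a packet header and returns the coding mode it indicates. The application does not specify the packet layout, so the header format (`HEADER`), the codec id values (`CODEC_IDS`), and the name `parse_packet` are all hypothetical.

```python
import struct
from typing import NamedTuple

# Hypothetical header layout (not specified in the application):
# 1 byte codec id, 4 bytes timestamp in ms, 2 bytes payload length, big-endian.
HEADER = struct.Struct(">BIH")
CODEC_IDS = {1: "AMR", 2: "OPUS"}  # assumed id values


class ParsedPacket(NamedTuple):
    coding_mode: str
    timestamp_ms: int
    payload: bytes


def parse_packet(packet: bytes) -> ParsedPacket:
    """Read the preset field and return the coding mode it indicates."""
    codec_id, timestamp_ms, length = HEADER.unpack_from(packet, 0)
    if codec_id not in CODEC_IDS:
        raise ValueError(f"unknown codec id {codec_id}")
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return ParsedPacket(CODEC_IDS[codec_id], timestamp_ms, payload)
```

Rejecting unknown ids early keeps an unsupported coding mode from reaching the decoding step.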
Optionally, before decoding the first audio data packet according to a decoding library corresponding to the audio coding method to obtain PCM data, the method further includes:
and determining a decoding library corresponding to the audio coding mode according to the audio coding mode and the corresponding relation between the preset coding mode and the decoding library.
In some possible embodiments, the decoding library corresponding to the audio encoding mode may be further determined according to the audio encoding mode and a corresponding relationship between a preset encoding mode and the decoding library.
For example, the audio coding methods include: AMR, OPUS, etc., and correspondingly, the above audio coding schemes all have respective decoding libraries corresponding thereto, that is, according to the obtained audio coding scheme, the decoding library corresponding to the audio coding scheme can be obtained.
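The preset correspondence between coding mode and decoding library can be pictured as a lookup table. The sketch below is a minimal Python illustration under assumptions: `register_decoder`, `decoder_for`, and the placeholder decoders are hypothetical names, and a real table would point at wrappers around actual AMR and OPUS decoding libraries.

```python
from typing import Callable, Dict

Decoder = Callable[[bytes], bytes]

# Preset correspondence between coding mode and decoding library.
_DECODERS: Dict[str, Decoder] = {}


def register_decoder(mode: str, decoder: Decoder) -> None:
    """Add one coding-mode-to-decoder entry to the preset table."""
    _DECODERS[mode.upper()] = decoder


def decoder_for(mode: str) -> Decoder:
    """Look up the decoder that corresponds to the parsed coding mode."""
    try:
        return _DECODERS[mode.upper()]
    except KeyError:
        raise LookupError(f"no decoding library registered for {mode!r}") from None


# Placeholder decoders that only tag the payload; they do not decode audio.
register_decoder("AMR", lambda payload: b"pcm8k:" + payload)
register_decoder("OPUS", lambda payload: b"pcm48k:" + payload)
```

With a table of this kind, supporting a further coding mode only requires registering one more entry.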
Fig. 2 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present disclosure; as shown in fig. 2, performing audio recording and encoding on PCM data to obtain an audio recording file includes:
s201, determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet.
It should be noted that, in the conventional encoding process, the PCM data is resampled to the sampling rate of the output coding mode, and the resampled data is then encoded through the ffmpeg interface. However, resampling is a lossy conversion that increases distortion, and because the data must be resampled every time, it also increases the recording conversion time.
Solving this problem improves the recorded audio quality and reduces the recording time. Since the sampling rate set for the encoder is variable, and the duration must remain the same when encoding without resampling, a coding mode that supports multiple sampling rates can be selected so that the sampling rate after encoding is the same as the sampling rate before encoding. AAC (Advanced Audio Coding) supports multiple sampling rates such as 8k, 16k, and 48k, and the mainstream coding modes currently in use also sample at 8k, 16k, or 48k, so AAC coding meets the requirements of these sampling rates.
In addition, selecting AAC encoding requires that the sampling rate of the encoder is not fixed, i.e., the encoder can be initialized with the sampling rate of the audio data.
Specifically, when performing audio recording and encoding on PCM data, the sampling rate of the first audio data packet may be determined according to the sampling rate corresponding to the audio encoding mode.
S202, initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet.
After determining the sampling rate of the first audio data packet according to the audio coding mode, the sampling rate of the preset encoder may be initialized by using the obtained sampling rate of the first audio data packet.
For example, on the basis of the above embodiment, after the audio data packet is decoded according to the decoding library corresponding to the audio coding method to obtain the first audio data, correspondingly, the original coding method of the first audio data may also be obtained, for example, if the original coding method of the first audio data is AMR, the sampling rate of the first audio data is 8k, that is, the sampling rate of the preset encoder may be initialized by using 8 k.
It should be noted that the sampling rate of the audio data depends on the audio coding mode; for example, the sampling rate of the OPUS coding mode is 48k. Other coding modes are not enumerated here.
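Steps S201 and S202 can be sketched as a lookup followed by encoder configuration. The Python sketch below is illustrative only: `EncoderConfig` and `init_encoder_config` are hypothetical names, mono output is an assumption, and the only rates included are the two named in the text.

```python
from dataclasses import dataclass

# Sampling rates named in the text: AMR at 8k and OPUS at 48k.
SAMPLE_RATE_BY_MODE = {"AMR": 8000, "OPUS": 48000}


@dataclass
class EncoderConfig:
    """Hypothetical settings for the preset recording encoder."""
    codec: str
    sample_rate: int
    channels: int = 1  # assumed mono


def init_encoder_config(coding_mode: str) -> EncoderConfig:
    """Take the rate from the coding mode so that no resampling is needed."""
    sample_rate = SAMPLE_RATE_BY_MODE[coding_mode]
    return EncoderConfig(codec="aac", sample_rate=sample_rate)
```

Because the encoder is configured with the rate of the incoming audio, the PCM data can be passed to it unchanged.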
And S203, carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain an audio recording file.
After the sampling rate of the first audio data is adopted to initialize the sampling rate of the preset encoder, audio recording encoding can be performed on the PCM data obtained in the step S102, and an audio recording file is obtained, so that the quality of recorded audio can be improved, the recording time can be reduced, and the fast real-time recording of the audio data can be realized.
Optionally, the PCM data is PCM data obtained from a first audio data packet.
Generally, during a cloud conference, the coding mode of the audio in a given recording scene is fixed, that is, the sampling rate does not change. The sampling rate of the encoder can therefore be initialized from the sampling rate of the PCM data obtained from the first audio data packet.
In some embodiments, to ensure that an accurate audio recording file is obtained, the sampling rate of each received audio data packet can be checked against the sampling rate of the previous audio data packet. If they are the same, the sampling rate has not changed during the conference recording, and the encoder remains initialized according to the sampling rate of the first data in the PCM data. If they differ, the sampling rate has changed during the conference recording; in that case the encoder is initialized again with the new sampling rate and opened again.
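The per-packet check can be sketched as a small state holder that reopens the encoder only when the rate changes. This Python sketch is an assumption-laden illustration: `EncoderSession`, `on_packet`, and `_open_encoder` are hypothetical names, and the open step is a placeholder rather than a real encoder call.

```python
from typing import Optional


class EncoderSession:
    """Tracks the encoder's sample rate and counts (re)initializations."""

    def __init__(self) -> None:
        self.sample_rate: Optional[int] = None
        self.init_count = 0

    def _open_encoder(self, sample_rate: int) -> None:
        # Placeholder for the costly initialize-and-open step.
        self.sample_rate = sample_rate
        self.init_count += 1

    def on_packet(self, packet_sample_rate: int) -> bool:
        """Return True if the encoder was (re)initialized for this packet."""
        if packet_sample_rate == self.sample_rate:
            return False
        self._open_encoder(packet_sample_rate)
        return True
```

In the common case of a fixed coding mode, the encoder is opened once for the first packet and never again.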
Fig. 3 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application; as shown in fig. 3, performing audio recording and encoding on PCM data to obtain an audio recording file includes:
and S301, writing the PCM data into a preset storage queue.
It can be understood that initializing and opening the encoder is a relatively time-consuming operation. If the encoder is initialized and opened only when PCM data arrives, the data that arrives later may be blocked, or lost because of the blocking. A queue therefore needs to be designed: the data is stored for the conversion thread, and only after the encoder has been initialized and opened successfully is the data taken out of the queue for encoding conversion.
In this embodiment, all decoded PCM data may be pushed into a preset queue. For example, the preset queue may be an avfifo buffer of ffmpeg, which is a queue for storing data and provides a corresponding API (Application Programming Interface) for reading, writing, and calculating the size of the queued data.
S302, reading each data in the storage queue in sequence, and carrying out audio recording coding on the read data to obtain an audio recording file.
Specifically, after the decoded PCM data is written into the preset queue, it is determined whether the number of samples in the queue has reached the number required for one frame of AAC coding. If it has, the data in the storage queue is read from the queue in sequence and encoded. This avoids data loss caused by blocking, effectively guarantees the integrity of the encoding, and improves the correctness of the obtained audio recording file.
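The write-then-read-by-frame behavior of the storage queue can be sketched in Python as follows. This is not the ffmpeg FIFO API; `SampleQueue` is a hypothetical stand-in built on `collections.deque`, and the frame size of 1024 samples assumes AAC-LC.

```python
from collections import deque
from typing import Deque, List

AAC_FRAME_SAMPLES = 1024  # samples per channel in one AAC-LC frame (assumed profile)


class SampleQueue:
    """FIFO of decoded PCM samples, drained one encoder frame at a time."""

    def __init__(self, frame_samples: int = AAC_FRAME_SAMPLES) -> None:
        self.frame_samples = frame_samples
        self._samples: Deque[int] = deque()

    def write(self, pcm_samples: List[int]) -> None:
        """Append decoded samples in arrival order."""
        self._samples.extend(pcm_samples)

    def read_frames(self) -> List[List[int]]:
        """Pop every complete frame, leaving any remainder queued."""
        frames = []
        while len(self._samples) >= self.frame_samples:
            frames.append(
                [self._samples.popleft() for _ in range(self.frame_samples)]
            )
        return frames

    def __len__(self) -> int:
        return len(self._samples)
```

Samples that do not yet fill a frame stay in the queue and are combined with the next packet, so no samples are dropped at packet boundaries.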
It should be noted that, when recording and encoding the audio of the read data, the sampling rate set for the frame structure AVFrame needs to be the same as that set for the initialization encoder. It can be understood that, in two encoding modes with the same sampling rate, the number of samples represents the time, even if the number of samples in each frame is different, the total number of samples is unchanged, and as long as the number of samples is ensured to be unchanged, the corresponding time will remain unchanged, so that the obtained audio recording file can be ensured to be correct.
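The claim that the sample count fixes the duration can be checked with simple arithmetic. The Python sketch below uses illustrative numbers that are assumptions rather than values from the application: 40 ms packets at 8 kHz (320 samples each) and 1024-sample AAC frames; `duration_ms` and `frames_needed` are hypothetical helper names.

```python
def duration_ms(total_samples: int, sample_rate: int) -> float:
    """Playback time represented by a number of samples at a given rate."""
    return total_samples * 1000 / sample_rate


def frames_needed(total_samples: int, samples_per_frame: int) -> int:
    """Number of frames required to hold the samples (ceiling division)."""
    return -(-total_samples // samples_per_frame)


# 32 input packets of 320 samples carry the same 10240 samples as
# 10 output frames of 1024 samples, so the duration is identical.
input_samples = 32 * 320
output_samples = frames_needed(input_samples, 1024) * 1024
same_duration = duration_ms(input_samples, 8000) == duration_ms(output_samples, 8000)
```

When the total is not a multiple of the output frame size, the final frame is only partly filled, which is why the remainder has to be kept in the queue rather than discarded.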
Optionally, on the basis of the foregoing embodiment, the method further includes: and when the number of the data in the storage queue reaches the number of samples required for encoding one frame, clearing the data in the storage queue.
Fig. 4 is a schematic flowchart of another audio recording method for a cloud conference according to an embodiment of the present application; as shown in fig. 4, on the basis of the above embodiment, writing PCM data into a preset storage queue includes:
s401, determining the number of silent supplementary packets corresponding to the first audio data packet according to the time difference between the first audio data packet and the second audio data packet.
The second audio data packet is an audio data packet received before the first audio data packet.
It can be understood that during a cloud conference there may be periods in which no sound reaches the recording server, and there may even be no audio input for the whole conference. Without additional processing, that is, if the recording server decodes the received audio data and directly encodes and writes it into the audio recording file, playback produces one continuous stretch of audio in which the silent periods are lost, so the recording does not match the actual conference site.
To solve this problem, this embodiment calculates the silence packets to be padded between audio data packets; each silence packet occupies a period in which no audio was input.
Specifically, after the multi-channel audio data of the multiple clients reach the cloud server to be decoded, the cloud server selects a period of fixed time to merge the audio data during the merging, for example, each time 40 milliseconds of multi-channel audio data is selected to perform sound mixing and merging, and correspondingly, the recording server receives an audio data packet after the merging every 40 milliseconds at a fixed time. Therefore, when there is no audio data input, it can be considered that one silence packet exists every 40 msec.
It will be appreciated that ideally an audio data packet will be received every 40 ms, but in view of real network considerations, similar surge times of 38 ms, 39 ms, 40 ms, 41 ms, 42 ms, etc. may occur, requiring the use of a complementary packet algorithm compatible with such complications.
For example, the number of silent supplementary packets is determined from the times at which the received audio data packets arrive at the recording service. Specifically, the time difference DT (Difference Time, DT for short) between two adjacent packets is recorded, for example the time difference between the first audio data packet and the second audio data packet, with a fixed time period of 40 ms. The difference between DT and 40 is the fill time FT (Fill Time, FT for short). When DT is greater than 40, FT is positive, which indicates that time needs to be filled. When DT is equal to 40, FT is 0, which indicates that no packet needs to be filled. When DT is less than 40, FT is negative, which indicates that no time needs to be filled and that the negative value offsets positive FT values. The FT calculated each time is accumulated in the total time TT (Total Time, TT for short). Dividing TT by 40 gives the number FN (Fill Number, FN for short) of silence packets to be filled, and TT modulo 40 gives the time MT left over after filling. TT is then updated to TT - 40 × FN, that is, TT = MT, so that TT holds the time remaining after packet filling, which is not enough for one packet. If this calculation is performed each time between the current audio data packet and the previous audio data packet, the fill number FN of the silence packet data is obtained; that is, the number of silent supplementary packets corresponding to the first audio data packet can be determined according to the time difference between the first audio data packet and the second audio data packet.
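The DT / FT / TT / FN / MT bookkeeping described above can be sketched as follows. This is a hedged illustration rather than the patented implementation: the class name `SilenceFiller`, the use of millisecond arrival timestamps, and the choice to return 0 while the accumulated TT is below one period (including while it is negative) are assumptions made for the example.

```python
class SilenceFiller:
    """Tracks accumulated gap time and reports how many silence packets to fill."""

    def __init__(self, period_ms: int = 40):
        self.period_ms = period_ms      # fixed merge period (40 ms in the text)
        self.tt = 0                     # TT: accumulated fill time, in ms
        self.last_arrival_ms = None     # arrival time of the previous packet

    def on_packet(self, arrival_ms: int) -> int:
        """Return FN, the number of silence packets to fill before this packet."""
        if self.last_arrival_ms is None:
            self.last_arrival_ms = arrival_ms
            return 0                    # first packet: no previous packet to compare with
        dt = arrival_ms - self.last_arrival_ms   # DT: gap between adjacent packets
        self.last_arrival_ms = arrival_ms
        ft = dt - self.period_ms        # FT: positive = time to fill, negative = offsets it
        self.tt += ft
        if self.tt < self.period_ms:
            return 0                    # assumed: less than one full packet accumulated
        fn, mt = divmod(self.tt, self.period_ms)  # FN = TT // 40, MT = TT % 40
        self.tt = mt                    # keep the remainder for later packets
        return fn


filler = SilenceFiller()
arrivals = [0, 40, 81, 120, 240]        # illustrative arrival times in ms
print([filler.on_packet(t) for t in arrivals])   # [0, 0, 0, 0, 2]
```

In this run the 41 ms and 39 ms intervals cancel each other out, so ordinary jitter produces no padding, while the 120 ms gap yields two silence packets.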
S402, writing silent packet data corresponding to the number of silent supplementary packets, together with the PCM data, into the storage queue.
After the number of silent supplementary packets corresponding to the first audio data packet is determined according to the time difference between the first audio data packet and the second audio data packet, the silent packet data corresponding to that number and the PCM data are written into the storage queue together and encoded, so that the recorded audio recording file reflects the real conditions at the conference site, and the site can be accurately reproduced when the audio recording file is played.
The following describes a device, a server, and a storage medium for executing the audio recording method for a cloud conference. For their specific implementation processes and technical effects, refer to the description above; details are not repeated below.
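Step S402 can be sketched as below, assuming 16-bit signed mono PCM (for which all-zero bytes are silence), a 40 ms packet period, and that the silence is written ahead of the newly decoded PCM because the gap preceded the packet. The original text does not state the sample format or the write order, and the function names `silence_packet` and `enqueue` are hypothetical.

```python
def silence_packet(sample_rate: int, period_ms: int = 40,
                   channels: int = 1, sample_bytes: int = 2) -> bytes:
    """One packet of digital silence: zero-valued samples covering period_ms."""
    samples = sample_rate * period_ms // 1000
    return bytes(samples * channels * sample_bytes)   # bytes(n) yields n zero bytes


def enqueue(queue: bytearray, pcm: bytes, fill_count: int, sample_rate: int) -> None:
    """Write fill_count silence packets, then the decoded PCM, into the queue."""
    for _ in range(fill_count):
        queue.extend(silence_packet(sample_rate))     # assumed order: silence first
    queue.extend(pcm)


storage_queue = bytearray()
enqueue(storage_queue, pcm=b"\x01\x02", fill_count=2, sample_rate=16000)
print(len(storage_queue))   # 2562: two 1280-byte silence packets plus 2 bytes of PCM
```

At 16 kHz, one 40 ms packet holds 640 samples, or 1280 bytes at two bytes per sample, so the padding restores exactly the missing time.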
Fig. 5 is a schematic structural diagram of an audio recording apparatus for a cloud conference according to an embodiment of the present application; as shown in fig. 5, the audio recording apparatus 500 for a cloud conference includes: an analysis module 501, a decoding module 502 and an encoding module 503;
the analysis module 501 is configured to analyze the first audio data packet to obtain an audio coding mode of the first audio data packet, where the first audio data packet is a data packet obtained by merging audio data acquired by multiple cloud conference clients;
the decoding module 502 is configured to decode the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, where the decoding library stores a decoding mode corresponding to the audio coding mode;
and the encoding module 503 is configured to perform audio recording and encoding on the PCM data to obtain an audio recording file.
Optionally, the parsing module 501 is specifically configured to parse a preset field in the first audio data packet to obtain an audio coding mode indicated by the preset field.
Fig. 6 is a schematic structural diagram of another audio recording apparatus for a cloud conference according to an embodiment of the present application; as shown in fig. 6, the apparatus further includes: a determination module 601;
the determining module 601 is configured to determine a decoding library corresponding to the audio coding mode according to the audio coding mode and a corresponding relationship between a preset coding mode and the decoding library.
Optionally, the determining module 601 is further configured to determine that a sampling rate corresponding to the audio coding mode is a sampling rate of the first audio data packet;
the encoding module 503 is specifically configured to initialize a sampling rate of a preset encoder according to the sampling rate of the first audio data packet; and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain an audio recording file.
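The cooperation of the analysis, determining and decoding modules can be sketched as a table lookup. Everything concrete here is an assumption for illustration: the text does not specify the packet layout, so the sketch pretends the preset field is the first byte of the packet; the codec identifiers, names and sampling rates in `CODEC_TABLE` are made up; and `_passthrough` merely stands in for a real decoding library.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CodecEntry:
    name: str                          # hypothetical coding-mode name
    sample_rate: int                   # sampling rate associated with the coding mode
    decode: Callable[[bytes], bytes]   # stand-in for the matching decoding library


def _passthrough(payload: bytes) -> bytes:
    """Placeholder decoder; a real system would call the codec's decoding library."""
    return payload


# Assumed correspondence between coding modes and decoding libraries.
CODEC_TABLE = {
    0x01: CodecEntry("codec_a", 8000, _passthrough),
    0x02: CodecEntry("codec_b", 16000, _passthrough),
}


def parse_coding_mode(packet: bytes) -> int:
    """Read the (assumed) one-byte preset field that indicates the coding mode."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0]


def handle_packet(packet: bytes) -> tuple[bytes, int]:
    """Return (PCM data, sampling rate to initialize the recording encoder with)."""
    mode = parse_coding_mode(packet)
    entry = CODEC_TABLE.get(mode)
    if entry is None:
        raise ValueError(f"unsupported coding mode: {mode:#x}")
    return entry.decode(packet[1:]), entry.sample_rate


pcm, rate = handle_packet(b"\x02abc")
print(pcm, rate)   # b'abc' 16000
```

Keeping the sampling rate next to the decoder in one table means the recording encoder is always initialized with the rate that belongs to the packet's coding mode.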
Optionally, the PCM data is PCM data obtained from a first audio data packet.
Optionally, the encoding module 503 is further configured to write the PCM data into a preset storage queue; and sequentially reading each data in the storage queue, and carrying out audio recording coding on the read data to obtain an audio recording file.
Optionally, the encoding module 503 is further configured to clear the data in the storage queue when the number of data in the storage queue reaches the number of samples required for encoding one frame.
Optionally, the encoding module 503 is further specifically configured to determine, according to a time difference between a first audio data packet and a second audio data packet, a silence supplementary packet number corresponding to the first audio data packet, where the second audio data packet is an audio data packet received before the first audio data packet; and writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into a storage queue.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 7 is a schematic structural diagram of an audio recording server according to an embodiment of the present application, where the audio recording server may include: a processor 701, a memory 702.
The memory 702 is used for storing programs, and the processor 701 calls the programs stored in the memory 702 to execute the above method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the invention also provides a program product, for example a computer-readable storage medium, comprising a program which, when being executed by a processor, is adapted to carry out the above-mentioned method embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program code.

Claims (9)

1. An audio recording method for a cloud conference, the method comprising:
analyzing a first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients;
decoding the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain Pulse Code Modulation (PCM) data, wherein the decoding library stores the decoding mode corresponding to the audio coding mode;
carrying out audio recording coding on the PCM data to obtain an audio recording file;
the audio recording and encoding the PCM data to obtain an audio recording file comprises the following steps:
determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet;
initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet;
and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain the audio recording file.
2. The method of claim 1, wherein the parsing the first audio packet to obtain the audio encoding mode of the first audio packet comprises:
and analyzing a preset field in the first audio data packet to obtain the audio coding mode indicated by the preset field.
3. The method of claim 1, wherein the PCM data is PCM data obtained from a first audio packet.
4. The method of claim 1, wherein said audio recording and encoding the PCM data to obtain an audio recording file comprises:
writing the PCM data into a preset storage queue;
and sequentially reading each data in the storage queue, and carrying out audio recording coding on the read data to obtain the audio recording file.
5. The method of claim 4, further comprising:
and when the number of the data in the storage queue reaches the number of samples required for encoding one frame, clearing the data in the storage queue.
6. The method of claim 4, wherein writing the PCM data into a predetermined store queue comprises:
determining the number of silent supplementary packets corresponding to the first audio data packet according to the time difference between the first audio data packet and a second audio data packet, wherein the second audio data packet is an audio data packet received before the first audio data packet;
and writing the silent packet data and the PCM data corresponding to the silent supplementary packet number into the storage queue.
7. An audio recording apparatus for a cloud conference, the apparatus comprising: the device comprises an analysis module, a decoding module and an encoding module;
the analysis module is used for analyzing a first audio data packet to obtain an audio coding mode of the first audio data packet, wherein the first audio data packet is a data packet obtained by converging audio data acquired by a plurality of cloud conference clients;
the decoding module is configured to decode the first audio data packet according to a decoding library corresponding to the audio coding mode to obtain PCM data, where the decoding library stores a decoding mode corresponding to the audio coding mode;
the coding module is used for carrying out audio recording coding on the PCM data to obtain an audio recording file;
the audio recording and encoding the PCM data to obtain an audio recording file comprises the following steps:
determining the sampling rate corresponding to the audio coding mode as the sampling rate of the first audio data packet;
initializing the sampling rate of a preset encoder according to the sampling rate of the first audio data packet;
and carrying out audio recording and encoding on the PCM data according to the initialized encoder to obtain the audio recording file.
8. An audio recording server, comprising: a memory storing a computer program executable by the processor, and a processor implementing the audio recording method of a cloud conference according to any one of claims 1 to 6 when the computer program is executed by the processor.
9. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, performs the audio recording method of a cloud conference according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010643341.6A CN111755017B (en) 2020-07-06 2020-07-06 Audio recording method and device for cloud conference, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010643341.6A CN111755017B (en) 2020-07-06 2020-07-06 Audio recording method and device for cloud conference, server and storage medium

Publications (2)

Publication Number Publication Date
CN111755017A CN111755017A (en) 2020-10-09
CN111755017B true CN111755017B (en) 2021-01-26

Family

ID=72679599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010643341.6A Active CN111755017B (en) 2020-07-06 2020-07-06 Audio recording method and device for cloud conference, server and storage medium

Country Status (1)

Country Link
CN (1) CN111755017B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 100010 room 203-35, 2 / F, building 2, No.1 and 3, Qinglong Hutong, Dongcheng District, Beijing

Patentee after: G-NET CLOUD SERVICE Co.,Ltd.

Address before: Room 1102, Ninth Floor, Pengyuan International Building, Building 4, No. 1 Courtyard, Shangdi East Road, Haidian District, Beijing

Patentee before: G-NET CLOUD SERVICE Co.,Ltd.