CN112437315A - Audio adaptation method and system adapting to multiple system versions - Google Patents


Info

Publication number: CN112437315A (granted as CN112437315B)
Application number: CN202010911906.4A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 陈建宇, 徐胜, 朱林伟
Assignee: Shanghai Hode Information Technology Co Ltd
Legal status: Active (granted)
Classifications

    • H04N21/2187 — Live feed (source of audio or video content; selective content distribution, e.g. VOD)
    • G06F16/68 — Information retrieval of audio data; retrieval characterised by using metadata
    • H04N21/233 — Server-side processing of audio elementary streams
    • H04N21/4392 — Client-side processing of audio elementary streams involving audio buffer management
    • H04N21/4394 — Client-side processing of audio elementary streams involving analysing the audio stream, e.g. detecting features or characteristics
    • H04N21/4398 — Client-side processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/8547 — Content authoring involving timestamps for synchronizing content
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses an audio adaptation method adapted to multiple system versions, including the following steps: acquiring the ith batch of audio data provided by a system live-recording tool, where i is a positive integer; and obtaining corresponding first and second data segments based at least on the ith batch of audio data. The method is compatible with audio data of inconsistent sizes caused by system version differences and ensures correct encoding.

Description

Audio adaptation method and system adapting to multiple system versions
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio adaptation method, system, computer device, and computer-readable storage medium adapted to multiple system versions.
Background
Live webcasting is one of the most popular internet services today. A large number of live-streaming apps have been developed on the Android or iOS operating systems. However, as Android and iOS versions iterate, these apps may not work across multiple versions of the operating system. Taking Apple's iOS as an example, live apps need to call the system live-recording tool ReplayKit to obtain audio data, but the audio data output by ReplayKit before iOS 13.0 differs greatly from that output by ReplayKit on iOS 13.0 and later. This discrepancy easily causes audio adaptation problems and, in turn, encoding errors.
Disclosure of Invention
An object of the embodiments of the present application is to provide an audio adaptation method, system, computer device, and computer-readable storage medium adapted to multiple system versions, to solve the problem of encoding errors caused by audio non-adaptation due to system version differences.
An aspect of the embodiments of the present application provides an audio adaptation method adapted to multiple system versions, the method including: acquiring the ith batch of audio data provided by a system live-recording tool, where i is a positive integer; and obtaining corresponding first and second data segments based at least on the ith batch of audio data. When i = 1, the first batch of audio data is divided into a first first-data segment and a first second-data segment, where the data amount of the first first-data segment is the largest integral multiple of the slice data amount that the first batch of audio data can provide, and the first second-data segment is the remainder of the first batch of audio data after the first first-data segment is removed; the first first-data segment is sent to a next audio processing module, and the first second-data segment is staged in an audio buffer. When i ≥ 2, an ith first-data segment and an ith second-data segment are formed from the (i-1)th second-data segment left over from the (i-1)th batch of audio data together with the ith batch of audio data, where the data amount of the ith first-data segment is the largest integral multiple of the slice data amount that this combined audio data can provide, and the ith second-data segment is the remainder of the combined audio data after the ith first-data segment is removed; the ith first-data segment is sent to the next audio processing module, and the ith second-data segment is staged in the audio buffer.
Optionally, the method further includes: determining the timestamp of the audio, where the audio timestamp equals the master timestamp minus the staged-data timestamp; the master timestamp is the sum of the timestamp increments of the 1st through ith batches of audio data, and the staged-data timestamp is the timestamp increment corresponding to the ith second-data segment.
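The timestamp rule above can be sketched as follows. The function name, the use of seconds as the timestamp unit, and the byte-rate parameter used to convert the staged bytes into a timestamp increment are illustrative assumptions, not part of the patent:

```python
def audio_timestamp(batch_durations, staged_bytes, bytes_per_second):
    """Audio timestamp = master timestamp - staged-data timestamp.

    batch_durations: timestamp increments (seconds) of batches 1..i;
    staged_bytes: size of the ith second-data segment still in the buffer;
    bytes_per_second: byte rate used to express the staged data as a
    timestamp increment (illustrative assumption).
    """
    master = sum(batch_durations)             # sum of increments of batches 1..i
    staged = staged_bytes / bytes_per_second  # increment of the staged segment
    return master - staged

# Two 0.5 s batches with 680 bytes still staged, at 88200 bytes/s:
ts = audio_timestamp([0.5, 0.5], 680, 88200)
```

Subtracting the staged increment keeps the reported timestamp aligned with the data actually sent onward, rather than with everything received.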
Optionally, the method further includes: processing the ith batch of audio data through multiple tasks, each task corresponding to one processing operation; and putting the tasks into a serial queue to execute the processing operations on the ith batch of audio data asynchronously.
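A serial queue with asynchronous submission can be approximated in plain Python as below. The patent targets a system dispatch queue; this class is only an illustrative stand-in showing the key property that tasks are submitted without blocking but executed one at a time, in order:

```python
import queue
import threading

class SerialTaskQueue:
    """Minimal analogue of a serial dispatch queue: tasks are submitted
    asynchronously but executed one at a time, in submission order."""

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            task = self._q.get()
            task()                 # execute serially on the worker thread
            self._q.task_done()

    def submit(self, task):
        self._q.put(task)          # returns immediately (asynchronous)

    def join(self):
        self._q.join()             # wait until all submitted tasks ran

order = []
sq = SerialTaskQueue()
for n in range(5):
    sq.submit(lambda n=n: order.append(n))
sq.join()
```

Because a single worker drains the queue, per-batch processing operations never overlap, which is the property the serial queue provides here.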
Optionally, the method further includes: increasing the priority of the serial queue.
Optionally, the method further includes: acquiring the data format of the ith batch of audio data; and if the data format is not the preset data format, performing data format conversion on the ith first-data segment.
Optionally, the data format includes the byte order (endianness); if the data format is not the preset data format, performing data format conversion on the ith first-data segment includes: if the byte order is not the preset byte order, swapping the jth and (j+1)th bytes in the ith first-data segment, where j is a positive integer.
Optionally, the data format includes the number of channels; if the data format is not the preset data format, performing data format conversion on the ith first-data segment includes: if the data format is mono and the preset data format is stereo, performing channel-number conversion on the ith first-data segment: copying the kth datum of the ith first-data segment to the (k × 2)th and (k × 2 + 2)th addresses of a two-channel pointer, and copying the (k + 1)th datum to the (k × 2 + 1)th and (k × 2 + 3)th addresses, where k is a positive integer.
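Reading the claim's addressing rule for 16-bit samples, with k stepping over sample boundaries (even byte indices), a mono-to-stereo conversion might look like the sketch below; the function name and the 16-bit-PCM assumption are illustrative:

```python
def mono_to_stereo(mono: bytes) -> bytes:
    """Duplicate each 16-bit mono sample into left and right channels.

    Follows the byte-addressing rule of the claim: byte k of the mono
    data lands at stereo offsets k*2 and k*2+2, and byte k+1 at k*2+1
    and k*2+3, with k stepping over even indices (sample boundaries).
    """
    assert len(mono) % 2 == 0, "expects 16-bit samples"
    out = bytearray(len(mono) * 2)
    for k in range(0, len(mono), 2):
        out[k * 2]     = mono[k]      # left channel, low byte
        out[k * 2 + 1] = mono[k + 1]  # left channel, high byte
        out[k * 2 + 2] = mono[k]      # right channel, low byte
        out[k * 2 + 3] = mono[k + 1]  # right channel, high byte
    return bytes(out)
```

The output is exactly twice the input size, consistent with interleaved two-channel PCM at the same sample rate.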
Optionally, the data format includes the sampling rate; if the data format is not the preset data format, performing data format conversion on the ith first-data segment includes: if the sampling rate is not equal to the preset sampling rate, performing sampling rate conversion on the ith first-data segment with a fast sampling rate conversion strategy to obtain the ith first-data segment at the preset sampling rate.
Optionally, the system live-recording tool is the ReplayKit of the iOS system.
An aspect of the embodiments of the present application further provides an audio adaptation system adapted to multiple system versions, including: an acquisition module, configured to acquire the ith batch of audio data provided by a system live-recording tool, where i is a positive integer; and a processing module, configured to obtain corresponding first and second data segments based at least on the ith batch of audio data. When i = 1, the first batch of audio data is divided into a first first-data segment and a first second-data segment, where the data amount of the first first-data segment is the largest integral multiple of the slice data amount that the first batch of audio data can provide, and the first second-data segment is the remainder of the first batch of audio data after the first first-data segment is removed; the first first-data segment is sent to a next audio processing module, and the first second-data segment is staged in an audio buffer. When i ≥ 2, an ith first-data segment and an ith second-data segment are formed from the (i-1)th second-data segment left over from the (i-1)th batch of audio data together with the ith batch of audio data, where the data amount of the ith first-data segment is the largest integral multiple of the slice data amount that this combined audio data can provide, and the ith second-data segment is the remainder of the combined audio data after the ith first-data segment is removed; the ith first-data segment is sent to the next audio processing module, and the ith second-data segment is staged in the audio buffer.
An aspect of the embodiments of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the audio adaptation method adapted to multiple system versions as described above when executing the computer program.
An aspect of the embodiments of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the audio adaptation method adapted to multiple system versions described above.
The audio adaptation method, system, device, and computer-readable storage medium adapted to multiple system versions provided by the embodiments of the present application divide each batch of audio data so that the audio data sent to the next audio processing module is regular, making the method compatible with audio data of inconsistent sizes caused by system version differences. That is, audio input of any size undergoes data regularization to ensure correct encoding.
Drawings
Fig. 1 schematically shows an application environment diagram of an audio adaptation method adapted to multiple system versions according to an embodiment of the present application;
fig. 2 schematically shows a flowchart of an audio adaptation method adapted to multiple system versions according to an embodiment of the present application;
fig. 3 schematically shows a flowchart of additional steps of the audio adaptation method according to an embodiment of the present application;
fig. 4 schematically shows the sub-steps of step S302 in fig. 3;
fig. 5 schematically shows another flowchart of the audio adaptation method according to an embodiment of the present application;
fig. 6 schematically shows a flowchart of further additional steps of the audio adaptation method according to an embodiment of the present application;
fig. 7 schematically shows a flowchart of further additional steps of the audio adaptation method according to an embodiment of the present application;
fig. 8 schematically shows a block diagram of an audio adaptation system adapted to multiple system versions according to a second embodiment of the present application; and
fig. 9 schematically shows a hardware architecture diagram of a computer device suitable for implementing the audio adaptation method according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that descriptions involving "first", "second", etc. in the embodiments of the present application are for description only and are not to be construed as indicating or implying relative importance or the number of technical features indicated; a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. Technical solutions of different embodiments may be combined with each other, but only where a person skilled in the art can realize the combination; when a combination is contradictory or unrealizable, it should be considered not to exist and is not within the protection scope of the present application.
In the description of the present application, it should be understood that the numerical references before the steps do not identify the order of performing the steps, but merely serve to facilitate the description of the present application and to distinguish each step, and therefore should not be construed as limiting the present application.
Fig. 1 schematically shows an application environment diagram of an audio adaptation method adapted to multiple system versions according to an embodiment of the present application. In a live scene, the anchor terminal 2 may push live data to the viewer terminal 4 in real time.

The anchor terminal 2 is used to generate live data in real time and push the live data as a stream. The live data may include audio data or video data. The anchor terminal 2 may be an iOS-based smartphone, tablet computer, or the like; in other embodiments, it may be a live device based on a system such as Android.

The viewer terminal 4 may be configured to receive the live data of the anchor terminal 2 in real time. It may be any type of computing device, such as a smartphone, tablet device, laptop computer, set-top box, or smart TV. The viewer terminal 4 may have a built-in browser or dedicated program through which the live data is received and the content output to the user. The content may include video, audio, comments, text data, and/or the like.
The anchor terminal 2 has a built-in system live-recording tool and a live app (e.g., bilibili link).

The anchor can open the live app and perform live operations. The live app is an application-layer app; in a live scene it needs to call a system-layer live-recording tool to obtain live data. This raises the problem of adapting the live app to the system live-recording tool: the tools corresponding to different system versions may differ, making it difficult for a live app to be compatible with multiple system versions and causing encoding errors, frame loss, audio-video desynchronization, stuttering pull streams, unstable push streams, and the like.

Taking the iOS system as an example, the system live-recording tool is ReplayKit. Between iOS before version 13.0 and iOS 13.0 and later, the amount of audio data, the sampling rate, and the number of channels used by ReplayKit differ greatly. Specifically: in iOS before 13.0, the ReplayKit audio callback outputs 441000 bytes of data every 0.5 s, the audio is mono, and the microphone sampling rate is 44100 Hz; in iOS 13.0 and later, the callback outputs 4096 bytes every 0.023220 s, the audio is stereo, and the microphone sampling rate becomes 48000 Hz. These ReplayKit differences across system versions can cause encoding errors, frame loss, audio-video desynchronization, stuttering pull streams, unstable push streams, and similar problems.

The application aims to provide an audio adaptation scheme for live scenes to resolve these system version differences. The embodiments below may be used to solve one or more of the above technical problems and achieve stable push streaming, no frame loss, audio-video synchronization, and stutter-free pull streaming for screen-recording live broadcast with a live app (e.g., bilibili link).
Embodiment One
The iOS system and its ReplayKit are described below as an example. It should be understood that the audio adaptation method of the present application is not limited to the iOS system.
Fig. 2 schematically shows a flowchart of an audio adaptation method adapted to multiple system versions according to an embodiment of the present application. It should be noted that the following description takes a computer device (the anchor terminal 2) as the execution subject. As shown in fig. 2, the audio adaptation method may include steps S200 to S206, where:
step S200, the ith batch of audio data provided by the system live recording tool is obtained, wherein i is a positive integer.
The ith batch of audio data is the audio data called back in real time by the system live-recording tool in a live scene. It is also the ith audio data packet, the smallest data packet the tool provides at a time. The ReplayKit in iOS before version 13.0 provides 441000 bytes at a time, and the ReplayKit in iOS 13.0 and later provides 2048 bytes at a time. Thus: if the current system is iOS before 13.0, the data amount of the ith batch of audio data is 441000 bytes; if the current system is iOS 13.0 or later, the data amount is 2048 bytes.
Step S202, obtaining corresponding first and second data segments based at least on the ith batch of audio data:

When i = 1, the first batch of audio data is divided into a first first-data segment and a first second-data segment, where the data amount of the first first-data segment is the largest integral multiple of the slice data amount that the first batch of audio data can provide, and the first second-data segment is the remainder of the first batch of audio data after the first first-data segment is removed; the first first-data segment is sent to the next audio processing module, and the first second-data segment is staged in an audio buffer;

When i ≥ 2, an ith first-data segment and an ith second-data segment are formed from the (i-1)th second-data segment left over from the (i-1)th batch of audio data together with the ith batch of audio data, where the data amount of the ith first-data segment is the largest integral multiple of the slice data amount that this combined audio data can provide, and the ith second-data segment is the remainder of the combined audio data after the ith first-data segment is removed; the ith first-data segment is sent to the next audio processing module, and the ith second-data segment is staged in the audio buffer.
The amount of audio data provided to the application layer at a time may differ across operating system versions, and this difference easily causes encoding errors, frame loss, and similar problems in the mixer, the encoder, and so on. Continuing with the iOS system as an example: the callback data amount of ReplayKit in iOS 13.0 and later is 1024 audio frames, i.e. 2048 bytes, which is equivalent to the system layer having already unpacked the audio data, so it can be used directly for subsequent mixing or encoding. The callback data amount of ReplayKit in iOS before 13.0 is 441000 bytes per 0.5 s. If the 441000 bytes are split into 2048-byte slices, a remainder smaller than one slice is left each time; this remainder cannot be mixed or sent to the encoder immediately, which leads to encoding errors and frame loss.
In view of this, the present application performs a regularization process on the ith batch of audio data to resolve the inconsistency in data amount. Reference may be made to the following exemplary steps: (1) Calculate the slice data amount; the slice data amount may be the data amount of one audio frame, namely 1024 × bytesPerFrame × number of channels, where bytesPerFrame indicates the number of bytes contained in each audio frame. (2) Establish an iteration that executes (data amount of the ith batch / slice data amount) times to unpack the ith batch of audio data. For example, in the iOS system the encoder requires a fixed amount of data per input audio frame, so the "iteration" determines how many audio frames the ith batch is sliced into for subsequent encoding. (3) Obtain the ith first-data segment A1 and the ith second-data segment A2. The data amount of A1 is the largest integral multiple of the slice data amount, namely A1 contains M × slice data amount bytes, where M is the largest positive integer such that M × slice data amount does not exceed the available data. A1 is a regular data segment used for subsequent encoding, mixing, and other processing, so encoding is correct; A2 is an irregular data segment that is temporarily withheld from subsequent encoding and mixing, preventing encoding errors and frame loss.
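Steps (1) to (3) can be sketched as follows, assuming byte strings for the audio data and a fixed slice size; the class and method names are illustrative:

```python
class AudioRegularizer:
    """Sketch of the regularization step: keep a buffer, prepend the
    leftover of the previous batch, emit the largest integral multiple
    of the slice size, and stage the remainder."""

    def __init__(self, slice_bytes: int):
        self.slice_bytes = slice_bytes
        self.buffer = b""   # the "audio buffer" holding the second-data segment

    def feed(self, batch: bytes) -> bytes:
        total = self.buffer + batch                # leftover + ith batch
        n_slices = len(total) // self.slice_bytes  # the "iteration" count
        cut = n_slices * self.slice_bytes          # largest integral multiple
        first_segment = total[:cut]                # goes to the next module
        self.buffer = total[cut:]                  # staged for the next batch
        return first_segment
```

Feeding 441000-byte batches with a 2048-byte slice yields a 440320-byte first segment and a 680-byte staged remainder on the first call, matching the worked example later in the description.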
It should be understood that the next audio processing module may include various audio processing modules such as an encoding module, a mixing module, and the like.
It should be understood that the data amount of the ith second-data segment is smaller than one slice data amount and may be 0.
The computer device 2 may create the audio buffer in advance. The audio buffer may be a char-type pointer, and its initial size should be large enough to prevent data overflow. For example, the computer device 2 may create a blank pointer array as the audio buffer to receive the ith second-data segment A2. The blank pointer array is a temporary variable whose lifetime is limited to the current iteration.
The computer device 2 merges the ith second-data segment A2 left over from the ith batch of audio data with the (i+1)th batch of audio data and performs the iterative operation on the merged total audio data. The iteration may have the following results:

First, the total of the ith second-data segment A2 and the (i+1)th batch of audio data is exactly an integral multiple of the slice data amount. In this case the (i+1)th first-data segment B1 is A2 plus the entire (i+1)th batch of audio data, and nothing is left to stage.

Second, the total of A2 and the (i+1)th batch of audio data is not divisible by the slice data amount. Then B1 is A2 plus most of the (i+1)th batch, and the (i+1)th second-data segment B2 is the data left over after the total audio data has been iterated.
For example, let i = 1, the slice data amount be 2048 bytes, and the data amount of the 1st batch of audio data be 441000 bytes. In time order:
(1) acquire the 1st batch of audio data;
(2) obtain the first first-data segment A1 of the 1st batch: 440320 bytes (2048 × 215);
(3) obtain the first second-data segment A2 of the 1st batch: 680 bytes (the data at the end of the batch);
(4) send the first first-data segment A1 from step (2) to the next audio processing module;
(5) stage the first second-data segment A2 from step (3) in the audio buffer;
(6) acquire the 2nd batch of audio data;
(7) merge the 680-byte segment A2 staged in step (5) with the 2nd batch of audio data and perform the iterative operation, where:
(7.1) the first iteration yields a first block of one slice data amount, namely the 680-byte segment A2 from step (5) plus the first 1368 bytes of the 2nd batch;
(7.2) the second iteration yields a second block of one slice data amount;
… and so on, yielding:
a second first-data segment B1 (the 680-byte first second-data segment plus 439640 bytes of the 2nd batch) and a second second-data segment B2 (the last 1360 bytes of the 2nd batch).
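The byte counts in this example follow from simple arithmetic; the short check below recomputes them (variable names are illustrative):

```python
slice_bytes = 2048   # slice data amount
batch = 441000       # bytes per batch in iOS before 13.0

# Batch 1: largest integral multiple of the slice size.
a1 = (batch // slice_bytes) * slice_bytes  # first first-data segment
a2 = batch - a1                            # staged first second-data segment

# Batch 2: the staged leftover plus the next 441000-byte batch.
total = a2 + batch
b1 = (total // slice_bytes) * slice_bytes  # second first-data segment
b2 = total - b1                            # second second-data segment
```

This reproduces A1 = 440320 bytes, A2 = 680 bytes, B1 = 440320 bytes, and B2 = 1360 bytes, as in the steps above.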
The audio adaptation method adapted to multiple system versions thus establishes a unified logic: audio input of any size is processed uniformly to guarantee data regularity and correct encoding. This resolves the system version differences and achieves unified audio adaptation across versions.

The data regularization above solves one aspect of the audio adaptation problem caused by system version differences. The following embodiments address further aspects.
In an exemplary embodiment, to solve the problem that differences between system versions lead to non-uniform data formats, and hence to audio that is not adapted, as shown in fig. 3, the audio adaptation method adapted to multiple system versions further includes steps S300 to S302, where: step S300, acquiring the data format of the ith batch of audio data; step S302, if the data format is not the preset data format, performing data format conversion on the ith first data segment.
The data format may involve multiple aspects, such as endianness, number of channels, sampling rate, bit depth, bytesPerFrame, and so on.
(1) Regarding endianness:
Big-endian means that the high byte of the data is stored at the low address of the memory and the low byte at the high address.
Little-endian means that the high byte of the data is stored at the high address of the memory and the low byte at the low address.
The data format includes an endianness. If the endianness in the data format differs from the endianness in the preset data format, endianness conversion is required. As shown in fig. 4, step S302 may include the following step: S402, if the endianness is not the preset endianness, swapping the j-th data and the (j+1)-th data in the ith first data segment, where j is a positive integer.
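For 16-bit PCM, the swap in step S402 exchanges each adjacent byte pair; a short Python sketch (the function name is ours, and a real implementation would typically operate on the segment in place):

```python
def swap_endianness_16bit(segment: bytes) -> bytes:
    """Swap the j-th and (j+1)-th bytes of every 16-bit sample, turning
    big-endian samples into little-endian ones (and vice versa)."""
    out = bytearray(len(segment))
    for j in range(0, len(segment) - 1, 2):
        out[j] = segment[j + 1]      # high byte moves to the low position
        out[j + 1] = segment[j]      # low byte moves to the high position
    return bytes(out)
```

Applying the swap twice returns the original data, as expected of an endianness flip.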
(2) Regarding the sampling rate:
The data format includes a sampling rate. If the sampling rate in the data format differs from the preset sampling rate, sampling rate conversion is required. As shown in fig. 4, step S302 may include the following step: S404, if the sampling rate is not equal to the preset sampling rate, performing sampling rate conversion on the ith first data segment through a fast sampling rate conversion strategy to obtain an ith first data segment at the preset sampling rate. The fast sampling rate conversion strategy meets the real-time requirement of a live broadcast scene without affecting the performance of the screen recording system. It should be noted that the sampling rate conversion may use the libresample library, and the output array may be initialized to a sufficiently large space.
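The text delegates the conversion to the libresample library; purely to illustrate what a sample-rate conversion does, here is a crude linear-interpolation stand-in (not the fast strategy the embodiment uses, and without the anti-aliasing filtering a production converter needs):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of PCM sample values from src_rate to dst_rate by
    linear interpolation between neighboring input samples."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```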
(3) Regarding the number of channels:
The data format includes a number of channels. If the number of channels in the data format differs from the number of channels in the preset data format, channel number conversion is required. As shown in fig. 4, step S302 may include the following step: S406, if the data format is single-channel and the preset data format is dual-channel, performing channel number conversion on the ith first data segment: copying the k-th data of the ith first data segment to the (k×2)-th and (k×2+2)-th addresses of a dual-channel pointer, and copying the (k+1)-th data of the ith first data segment to the (k×2+1)-th and (k×2+3)-th addresses of the dual-channel pointer, where k is a positive integer.
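The address arithmetic above (for 16-bit samples: byte k goes to positions k×2 and k×2+2, byte k+1 to positions k×2+1 and k×2+3) can be sketched in Python; the function name is illustrative:

```python
def mono_to_stereo_16bit(mono: bytes) -> bytes:
    """Duplicate each 16-bit mono sample into both channels of an
    interleaved stereo buffer."""
    stereo = bytearray(len(mono) * 2)
    for k in range(0, len(mono) - 1, 2):   # k indexes the first byte of a sample
        stereo[k * 2] = mono[k]            # left channel, byte k
        stereo[k * 2 + 1] = mono[k + 1]    # left channel, byte k+1
        stereo[k * 2 + 2] = mono[k]        # right channel, byte k
        stereo[k * 2 + 3] = mono[k + 1]    # right channel, byte k+1
    return bytes(stereo)
```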
The inventors have also found that data regularization solves one aspect of the audio adaptation problem caused by system version differences, but may introduce another problem: audio and picture falling out of sync. The reason is as follows. Taking the first batch of audio data as an example, during the slicing iteration 440320 bytes of data (i.e., the first first data segment A1) are delivered to the subsequent audio processing module, while the remaining 680 bytes (i.e., the first second data segment A2) are held in the audio buffer and are not immediately delivered. As a result, the audio timestamp keeps accumulating extra increments during the live broadcast, so that sound and picture drift out of sync.
In an exemplary embodiment, to avoid audio-video desynchronization, as shown in fig. 5, the audio adaptation method adapted to multiple system versions further includes step S500: determining the timestamp of the audio. The timestamp of the audio is equal to the master timestamp minus the staged-data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the staged-data timestamp is the timestamp increment corresponding to the ith second data segment. For the first second data segment A2, the staged-data timestamp is the timestamp increment × remainder / (1024 × bytesPerFrame × number of channels), that is, the timestamp increment × 680 / (1024 × bytesPerFrame × number of channels). It should be understood that the timestamp increment is the timestamp increment corresponding to each batch of audio data.
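Numerically, this bookkeeping can be sketched as follows (parameter names such as bytes_per_frame and remainder_bytes are assumptions mirroring the bytesPerFrame and remainder mentioned in the text):

```python
def audio_timestamp(increments, remainder_bytes, bytes_per_frame, channels):
    """Timestamp of the audio after batch i: the master timestamp (sum of the
    per-batch timestamp increments) minus the staged-data timestamp, i.e. the
    share of one increment corresponding to the bytes still held back in the
    audio buffer."""
    master = sum(increments)   # increments of batches 1..i
    staged = increments[-1] * remainder_bytes / (1024 * bytes_per_frame * channels)
    return master - staged
```

For example, with a per-batch increment of 1.0, a 680-byte remainder, 2 bytes per frame, and 2 channels, the staged-data timestamp is 680/4096 of one increment.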
Different players have different decoding behaviors. The timestamp of the first audio frame may be set as the zero point; for each subsequent audio frame, one audio-frame timestamp increment is added per iteration, and the time corresponding to less than one audio frame (the staged-data timestamp) is subtracted, so that the timestamps of picture and audio stay synchronized and the pulled stream does not stutter.
The inventors have also found that when the computer device 2 processes a large amount of audio data, stream pushing tends to become unstable. Continuing with the example of the iOS system, ReplayKit imposes a 50 MB per-process memory limit; exceeding it may crash the program and thereby interrupt the live stream. Therefore, to achieve stable stream pushing, as shown in fig. 6, the audio adaptation method adapted to multiple system versions further includes steps S600 to S602, where: step S600, processing the ith batch of audio data through a plurality of tasks, each task corresponding to one processing operation; step S602, putting the plurality of tasks into a serial queue to perform the processing operations on the ith batch of audio data asynchronously. That is, when the application processes the audio data stream, it puts the tasks into a self-managed serial queue for asynchronous execution, so that the data-callback thread is not blocked by time-consuming processing. The advantage of this embodiment is that the growth of memory consumption at run time is reduced to a certain extent, which keeps stream pushing stable.
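On iOS the self-managed serial queue would be a GCD dispatch queue; as a language-neutral sketch, a single-worker executor shows the behavior: tasks submitted from the data callback return immediately and run one at a time, in submission order (all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# A single-worker executor behaves like a self-managed serial queue:
# submissions return immediately, so the data-callback thread is not
# blocked, and the tasks run asynchronously, strictly one after another.
processed = []

def make_task(batch_id, op_name):
    def task():
        processed.append((batch_id, op_name))  # stands in for a real processing operation
    return task

with ThreadPoolExecutor(max_workers=1) as serial_queue:
    for op in ("regularize", "convert_format", "encode"):
        serial_queue.submit(make_task(1, op))  # returns without waiting
# leaving the with-block waits for all queued tasks to finish
```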
To further improve stability, as shown in fig. 7, the audio adaptation method adapted to multiple system versions further includes step S604: raising the priority of the serial queue. This embodiment avoids task accumulation and thereby further improves the stability of stream pushing.
Example two
Fig. 8 schematically shows a block diagram of an audio adaptation system adapted to multiple system versions according to a second embodiment of the present application. The audio adaptation system may be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the second embodiment of the present application. A program module in the embodiments of the present application refers to a series of computer program instruction segments capable of performing a specific function; the following description details the function of each program module in this embodiment.
As shown in fig. 8, the multi-system version adaptive audio adaptation system 800 may include an obtaining module 810 and a processing module 820, wherein:
an obtaining module 810, configured to obtain an ith batch of audio data provided by a system live recording tool, where i is a positive integer;
a processing module 820, configured to obtain corresponding first data segments and second data segments based on at least the ith batch of audio data: when i is 1, the 1st batch of audio data is divided into a first first data segment and a first second data segment, wherein the data amount of the first first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the first second data segment is the data segment of the 1st batch of audio data other than the first first data segment; the first first data segment is sent to a next audio processing module; and the first second data segment is temporarily stored in an audio buffer; when i is greater than or equal to 2, an ith first data segment and an ith second data segment are formed based on the (i-1)-th second data segment left over from the (i-1)-th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)-th second data segment and the ith batch of audio data can provide, and the ith second data segment is the data segment remaining in the total audio data after the ith first data segment is removed; the ith first data segment is sent to the next audio processing module; and the ith second data segment is temporarily stored in the audio buffer.
In an exemplary embodiment, the audio adaptation system 800 further includes a timestamp determination module (not shown) configured to: determine the timestamp of the audio, wherein the timestamp of the audio is equal to the master timestamp minus the staged-data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the staged-data timestamp is the timestamp increment corresponding to the ith second data segment.
In an exemplary embodiment, the audio adaptation system 800 further includes a task processing module (not shown). The task processing module is used for: processing the ith batch of audio data through a plurality of tasks, wherein each task corresponds to one processing operation; and putting the tasks into a serial queue to execute asynchronous processing operation on the ith batch of audio data.
In an exemplary embodiment, the task processing module is further configured to: and increasing the priority of the serial queue.
In an exemplary embodiment, the audio adaptation system 800 further includes a format conversion module (not shown). The format conversion module is configured to: acquire the data format of the ith batch of audio data; and, if the data format is not the preset data format, perform data format conversion on the ith first data segment.
In an exemplary embodiment, the data format includes an endianness; the format conversion module is configured to: if the endianness is not the preset endianness, swap the j-th data and the (j+1)-th data in the ith first data segment, where j is a positive integer.
In an exemplary embodiment, the data format includes a number of channels; the format conversion module is configured to: if the data format is single-channel and the preset data format is dual-channel, perform channel number conversion on the ith first data segment: copy the k-th data of the ith first data segment to the (k×2)-th and (k×2+2)-th addresses of a dual-channel pointer; and copy the (k+1)-th data of the ith first data segment to the (k×2+1)-th and (k×2+3)-th addresses of the dual-channel pointer, where k is a positive integer.
In an exemplary embodiment, the data format includes a sampling rate; the format conversion module is used for: and if the sampling rate is not equal to the preset sampling rate, performing sampling rate conversion on the ith first data segment by a fast sampling rate conversion strategy to obtain the ith first data segment corresponding to the preset sampling rate.
In an exemplary embodiment, the system live recording tool is the ReplayKit of the iOS system.
EXAMPLE III
Fig. 9 schematically shows a hardware architecture diagram of a computer device 2 suitable for implementing the audio adaptation method adapted to multiple system versions according to a third embodiment of the present application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, for example a smartphone or a tablet computer. As shown in fig. 9, the computer device 2 at least includes, but is not limited to: a memory 910, a processor 920, and a network interface 930, which may be communicatively linked to one another via a system bus. Wherein:
the memory 910 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 910 may be an internal storage module of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 910 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device 2. Of course, the memory 910 may also include both internal and external memory modules of the computer device 2. In this embodiment, the memory 910 is generally used for storing an operating system installed in the computer device 2 and various types of application software, such as program codes of audio adaptation methods adapted to multiple system versions. In addition, the memory 910 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 920 may be, in some embodiments, a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or other data Processing chip. The processor 920 is generally configured to control the overall operation of the computer device 2, such as performing control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 920 is configured to execute program codes stored in the memory 910 or process data.
The network interface 930 may include a wireless network interface or a wired network interface, and is typically used to establish communication links between the computer device 2 and other computer devices. For example, the network interface 930 is used to connect the computer device 2 to an external terminal via a network, establishing a data transmission channel and a communication link between them. The network may be a wireless or wired network such as an Intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It is noted that fig. 9 only shows a computer device having components 910 to 930, but it is to be understood that not all of the shown components are required; more or fewer components may be implemented instead.
In this embodiment, the audio adaptation method adapted to multiple system versions stored in the memory 910 may also be divided into one or more program modules and executed by one or more processors (in this embodiment, the processor 920) to implement the embodiments of the present application.
Example four
The present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the audio adaptation method adapted to multiple system versions in the embodiments.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device. Of course, the computer-readable storage medium may also include both internal and external storage devices of the computer device. In this embodiment, the computer-readable storage medium is generally used to store an operating system and various types of application software installed in the computer device, for example, program codes of an audio adaptation method that is adapted to multiple system versions in the embodiment, and the like. Further, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be separately fabricated into individual integrated circuit modules, or multiple of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
It should be noted that the above mentioned embodiments are only preferred embodiments of the present application, and not intended to limit the scope of the present application, and all the equivalent structures or equivalent flow transformations made by the contents of the specification and the drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (12)

1. A method for audio adaptation to multiple system versions, the method comprising:
acquiring ith batch of audio data provided by a system live recording tool, wherein i is a positive integer;
obtaining corresponding first data segments and second data segments based on at least the ith batch of audio data:
when i is 1, dividing the 1st batch of audio data into a first first data segment and a first second data segment, wherein the data amount of the first first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the first second data segment is the data segment of the 1st batch of audio data other than the first first data segment; sending the first first data segment to a next audio processing module; and temporarily storing the first second data segment in an audio buffer;
when i is greater than or equal to 2, forming an ith first data segment and an ith second data segment based on the (i-1)-th second data segment left over from the (i-1)-th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)-th second data segment and the ith batch of audio data can provide, and the ith second data segment is the data segment remaining in the total audio data after the ith first data segment is removed; sending the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer.
2. The multi-system version adaptive audio adaptation method according to claim 1, further comprising:
determining a timestamp of the audio, wherein the timestamp of the audio is equal to the master timestamp minus the staged-data timestamp; the master timestamp is the sum of the timestamp increments of the 1st to ith batches of audio data, and the staged-data timestamp is the timestamp increment corresponding to the ith second data segment.
3. The multi-system version adaptive audio adaptation method according to claim 1, further comprising:
processing the ith batch of audio data through a plurality of tasks, wherein each task corresponds to one processing operation; and
and putting the tasks into a serial queue to execute asynchronous processing operation on the ith batch of audio data.
4. The multi-system version adaptive audio adaptation method according to claim 3, further comprising:
and increasing the priority of the serial queue.
5. The multi-system version adaptive audio adaptation method according to claim 1, further comprising:
acquiring a data format of the ith batch of audio data; and
and if the data format is not the preset data format, performing data format conversion on the ith first data segment.
6. The multi-system version compliant audio adaptation method of claim 5, wherein the data format comprises an endianness;
the performing data format conversion on the ith first data segment if the data format is not the preset data format comprises:
if the endianness is not the preset endianness, swapping the j-th data and the (j+1)-th data in the ith first data segment, wherein j is a positive integer.
7. The multi-system version compliant audio adaptation method of claim 5, wherein the data format comprises a number of channels;
the performing data format conversion on the ith first data segment if the data format is not the preset data format comprises:
if the data format is single-channel and the preset data format is dual-channel, performing channel number conversion on the ith first data segment:
copying the k-th data of the ith first data segment to the (k×2)-th and (k×2+2)-th addresses of a dual-channel pointer; and copying the (k+1)-th data of the ith first data segment to the (k×2+1)-th and (k×2+3)-th addresses of the dual-channel pointer, wherein k is a positive integer.
8. The multi-system version adaptive audio adaptation method according to claim 5, wherein the data format comprises a sampling rate;
the performing data format conversion on the ith first data segment if the data format is not the preset data format comprises:
if the sampling rate is not equal to the preset sampling rate, performing sampling rate conversion on the ith first data segment through a fast sampling rate conversion strategy to obtain an ith first data segment corresponding to the preset sampling rate.
9. The method of any of claims 1 to 8, wherein the system live recording tool is the ReplayKit of the iOS system.
10. An audio adaptation system that accommodates multiple system versions, comprising:
the acquisition module is used for acquiring the ith batch of audio data provided by a system live recording tool, wherein i is a positive integer;
a processing module, configured to obtain corresponding first data segments and second data segments based on at least the ith batch of audio data:
when i is 1, dividing the 1st batch of audio data into a first first data segment and a first second data segment, wherein the data amount of the first first data segment is the largest integer multiple of the slice data amount that the 1st batch of audio data can provide, and the first second data segment is the data segment of the 1st batch of audio data other than the first first data segment; sending the first first data segment to a next audio processing module; and temporarily storing the first second data segment in an audio buffer;
when i is greater than or equal to 2, forming an ith first data segment and an ith second data segment based on the (i-1)-th second data segment left over from the (i-1)-th batch of audio data and the ith batch of audio data, wherein the data amount of the ith first data segment is the largest integer multiple of the slice data amount that the total audio data of the (i-1)-th second data segment and the ith batch of audio data can provide, and the ith second data segment is the data segment remaining in the total audio data after the ith first data segment is removed; sending the ith first data segment to the next audio processing module; and temporarily storing the ith second data segment in the audio buffer.
11. A computer arrangement comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, is adapted to carry out the steps of the method for multi-system version adaptive audio adaptation according to any one of claims 1 to 9.
12. A computer-readable storage medium, in which a computer program is stored which is executable by at least one processor to cause the at least one processor to perform the steps of the method for multi-system version compliant audio adaptation according to any of claims 1 to 9.
CN202010911906.4A 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions Active CN112437315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010911906.4A CN112437315B (en) 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions


Publications (2)

Publication Number Publication Date
CN112437315A true CN112437315A (en) 2021-03-02
CN112437315B CN112437315B (en) 2023-06-27

Family

ID=74689976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010911906.4A Active CN112437315B (en) 2020-09-02 2020-09-02 Audio adaptation method and system for adapting to multiple system versions

Country Status (1)

Country Link
CN (1) CN112437315B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113923065B (en) * 2021-09-06 2023-11-24 贵阳语玩科技有限公司 Cross-version communication method, system, medium and server based on chat room audio

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101344887A (en) * 2008-06-06 2009-01-14 网易有道信息技术(北京)有限公司 Audio search method and device
US20110150099A1 (en) * 2009-12-21 2011-06-23 Calvin Ryan Owen Audio Splitting With Codec-Enforced Frame Sizes
WO2012163304A1 (en) * 2011-06-02 2012-12-06 华为终端有限公司 Audio decoding method and device
CN108235052A (en) * 2018-01-09 2018-06-29 安徽小马创意科技股份有限公司 Multi-audio-frequency channel hardware audio mixing, acquisition and the method for broadcasting may be selected based on IOS
CN110335615A (en) * 2019-05-05 2019-10-15 北京字节跳动网络技术有限公司 Processing method, device, electronic equipment and the storage medium of audio data
CN110415723A (en) * 2019-07-30 2019-11-05 广州酷狗计算机科技有限公司 Method, apparatus, server and the computer readable storage medium of audio parsing


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yao Yi et al., "Research and Implementation of a Recording System in a Multimedia Collaboration Platform", Computer Technology and Development, no. 09, 10 September 2007 (2007-09-10) *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant