CN109243471B - Method for quickly coding digital audio for broadcasting

Info

Publication number: CN109243471B
Application number: CN201811124426.2A
Authority: CN
Inventors: 陈永泽; 吕连新; 赵凡
Original assignee: Hangzhou Linker Technology Co ltd
Current assignee: Hangzhou Linker Technology Co ltd
Priority date: 2018-09-26
Filing date: 2018-09-26
Publication date: 2022-09-23
Anticipated expiration: 2038-09-26
Also published as: CN109243471A

Abstract

The invention discloses a method for quickly coding digital audio for broadcasting, comprising the following steps: S1, converting the source audio file into PCM data; S2, dividing the PCM data into a plurality of blocks and labeling each block in sequence; S3, sending the segmented data to a CPU or a GPU for parallel coding; and S4, merging the coded audio data in label order to generate the final digital audio file. In this scheme the data are segmented before being sent to the processor for parallel processing. Because current processors are almost always multi-core, each core can process one block in its own thread, which effectively increases processing speed. The scheme is suitable for coding and decoding large audio data files, such as those used in broadcasting.

Description

Method for quickly coding digital audio for broadcasting

Technical Field

The invention relates to the technical field of digital audio coding and decoding, and in particular to a method for quickly coding digital audio for broadcasting that supports parallel processing.

Background

Audio data files for broadcasting need to be coded and decoded. When an audio file is large, for example 24 hours of audio, coding and decoding take a long time and processor utilization is low.

Disclosure of Invention

The invention mainly solves the technical problems of long time consumption and low efficiency of audio file coding in the prior art, and provides a method for quickly coding digital audio for broadcasting, which can make full use of a multi-core CPU or GPU and has high processor utilization rate.

The invention mainly solves the technical problems through the following technical scheme: a method for rapidly encoding digital audio for broadcasting, comprising the steps of:

S1, converting the source audio file into PCM data;

S2, dividing the PCM data into a plurality of blocks, and labeling each block of data in sequence;

S3, sending the segmented data to a CPU or a GPU for parallel coding operation;

S4, merging the coded audio data in label order to generate the final digital audio file.

In this scheme the data are segmented before being sent to the processor for parallel processing. Because current processors are almost always multi-core, each core can process one block in its own thread, which effectively increases processing speed.
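The four steps can be sketched as follows. This is a minimal illustration, not the patented implementation: `split_into_chunks`, `encode_chunk` and `parallel_encode` are hypothetical names, and `encode_chunk` is a placeholder that returns its input unchanged where a real MP2 encoder call would go.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_into_chunks(pcm: bytes, chunk_size: int) -> List[Tuple[int, bytes]]:
    """S2: cut the PCM data into blocks and label each block in order."""
    return [
        (label, pcm[offset:offset + chunk_size])
        for label, offset in enumerate(range(0, len(pcm), chunk_size))
    ]


def encode_chunk(labelled: Tuple[int, bytes]) -> Tuple[int, bytes]:
    """S3 placeholder (hypothetical): a real audio encoder would go here.

    The bytes are returned unchanged so that the sketch stays runnable.
    """
    label, data = labelled
    return label, data


def parallel_encode(pcm: bytes, chunk_size: int, workers: int) -> bytes:
    chunks = split_into_chunks(pcm, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(encode_chunk, chunks))
    # S4: merge strictly by label, independent of completion order.
    encoded.sort(key=lambda item: item[0])
    return b"".join(data for _, data in encoded)
```

With a pure-Python encoder, threads would not run truly in parallel; the sketch assumes the real encoder is a native library or an external process.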

Preferably, the step S1 is specifically:

S101, judging whether the source audio is PCM data; if so, jumping to step S103, otherwise entering step S102;

S102, decoding the source audio data to generate PCM data, and then entering step S103;

S103, judging whether the sampling rate, bit depth and number of channels of the PCM data are consistent with those of the target audio MP2; if any parameter is inconsistent, entering step S104, and if all parameters are consistent, entering step S2; the sampling rate, bit depth and number of channels of the target audio MP2 are parameters entered manually or set by program default before coding;

S104, resampling and requantizing the source data, and then entering step S2. The resampled and requantized data are still PCM data, so no decoding operation is required.
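The branching in S101 to S104 can be expressed as a small decision function. The sketch below only plans which operations are needed; the names `PcmFormat` and `plan_s1` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PcmFormat:
    sample_rate: int  # Hz
    bit_depth: int    # bits per sample
    channels: int


def plan_s1(source_is_pcm: bool, pcm: PcmFormat, target: PcmFormat) -> List[str]:
    """Return the operations S1 performs before handing over to S2.

    `pcm` is the format of the PCM data (after decoding, if decoding is needed).
    """
    steps = []
    if not source_is_pcm:   # S101 -> S102
        steps.append("decode")
    if pcm != target:       # S103 -> S104: any mismatch triggers conversion
        steps.append("resample_requantize")
    steps.append("split")   # continue with S2
    return steps
```

For example, `plan_s1(False, PcmFormat(44100, 16, 2), PcmFormat(48000, 16, 2))` yields decoding, then resampling and requantization, then splitting.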

Preferably, in step S1, the source audio is any audio data that can be decoded into PCM data by a general-purpose open source library such as ffmpeg or libav.
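The patent does not give a command line. As one possible illustration, the ffmpeg command-line tool can decode a file to raw 16-bit PCM with the options below; the helper name `ffmpeg_decode_command` and the file names are hypothetical, and the sketch only builds the argument list without running it.

```python
from typing import List


def ffmpeg_decode_command(src: str, dst: str,
                          sample_rate: int = 48000,
                          channels: int = 2) -> List[str]:
    """Build an ffmpeg argument list that decodes `src` to raw PCM in `dst`."""
    return [
        "ffmpeg", "-i", src,
        "-f", "s16le",           # raw output, signed 16-bit little-endian
        "-acodec", "pcm_s16le",  # PCM codec matching the raw format
        "-ar", str(sample_rate), # output sampling rate in Hz
        "-ac", str(channels),    # output channel count
        dst,
    ]
```

Passing the list to `subprocess.run` would perform the decoding on a machine where ffmpeg is installed.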

Preferably, the size S_chunk of each block of data is determined by the following set of equations:

where P is the minimum period value of frame padding, C_f is the number of samples contained in a unit frame, N_bitdepth is the bit depth, N_channel is the number of channels, ceil(float) is the round-up function, S_pcm is the total size of the PCM data, and S_frame is the unit frame data size. C_f: fixed at 384 samples per frame for MP1 and 1152 samples per frame for MP2. N_bitdepth: the bit depth is a parameter entered manually or set by program default before coding, generally 16 bits by default. N_channel: the number of channels is a parameter entered manually or set by program default before coding, typically stereo, i.e. 2 channels.
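A sketch of how a block size could be derived from these symbols is given below. It rests on two assumptions that are not taken from the equations themselves: that S_frame is the number of PCM bytes feeding one frame (C_f × N_bitdepth / 8 × N_channel), and that every block except the last is a whole multiple of P × S_frame, so that block boundaries fall on frame boundaries and on the padding period. The function names are illustrative.

```python
def frame_size_bytes(c_f: int, bit_depth: int, channels: int) -> int:
    """PCM bytes consumed by one frame (assumed meaning of S_frame)."""
    return c_f * (bit_depth // 8) * channels


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def chunk_size_bytes(s_pcm: int, n_threads: int, p: int, s_frame: int) -> int:
    """Assumed block size: an equal share rounded up to a multiple of P * S_frame."""
    unit = p * s_frame                    # smallest self-contained unit
    units_total = ceil_div(s_pcm, unit)   # units needed to cover all PCM data
    return ceil_div(units_total, n_threads) * unit
```

For one hour of 48000 Hz, 16-bit stereo PCM (691,200,000 bytes) with P = 1 and four threads, S_frame is 4608 bytes and each block is 172,800,000 bytes.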

Preferably, the minimum period value P of the frame padding is determined by the following set of equations:

in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and gcd(number₁, number₂) is the function returning the greatest common divisor of number₁ and number₂. R_b: the bit rate can be entered manually or by a program before coding, or calculated from parameters such as the number of samples per unit frame, the sampling rate and the number of channels; it is generally treated as an input. S_s: one slot occupies 4 bytes for MP1 and 1 byte for MP2. f_s: the sampling rate is a parameter entered manually or by a program before coding.

Preferably, the minimum period value P of the frame padding is determined by the following formula:

in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and lcm(number₁, number₂) is the function returning the least common multiple of number₁ and number₂.
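As a minimal sketch of how such a period can be computed, assume that P is the smallest number of frames whose combined length is a whole number of slots, i.e. the denominator of C_f × R_b / (8 × S_s × f_s) in lowest terms. This reading is an assumption built from the symbol definitions and ordinary MPEG audio frame-length arithmetic, not a transcription of the formulas. The function names are illustrative. Under this assumption the gcd form and the lcm form give the same value.

```python
from math import gcd


def padding_period_gcd(r_b: int, s_s: int, f_s: int, c_f: int) -> int:
    """Assumed P via gcd: denominator of the reduced slots-per-frame fraction."""
    num = c_f * r_b        # bits per frame, scaled by f_s
    den = 8 * s_s * f_s    # bits per slot, scaled by f_s
    return den // gcd(num, den)


def padding_period_lcm(r_b: int, s_s: int, f_s: int, c_f: int) -> int:
    """Assumed P via lcm: lcm(num, den) / num equals den / gcd(num, den)."""
    num = c_f * r_b
    den = 8 * s_s * f_s
    lcm = num * den // gcd(num, den)
    return lcm // num
```

For MP2 at 256000 bit/s this gives P = 49 at 44100 Hz and P = 1 at 48000 Hz, where every frame is exactly 768 slots and no padding is needed.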

The substantial effect of the invention is to shorten the coding and decoding process and to improve processor utilization.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.

Example: the method for quickly coding digital audio for broadcasting of this embodiment comprises the following steps:

Step one, decoding into PCM data: the source audio file is decoded into PCM data, and then step two is performed.

Step two, dividing the PCM data into N blocks: the minimum period value P of the frame padding is calculated according to formula set 1 or formula set 2. The PCM data are then divided into N blocks (corresponding to N threads) according to formula set 3, each of size S_chunk (the last block is not always equal to S_chunk). After the blocks are labeled in sequence, step three is performed.

The algorithm formula is as follows:

formula set 1:

formula set 2:

formula set 3:

The symbols in the formula:

P: minimum period value of frame padding.

lcm(number₁, number₂): mathematical function returning the least common multiple of number₁ and number₂.

gcd(number₁, number₂): mathematical function returning the greatest common divisor of number₁ and number₂.

f_s: sampling rate in Hz; typically 32 kHz, 44.1 kHz, 48 kHz, etc.

C_f: the number of samples contained in a unit frame.

S_s: the number of bytes occupied by a unit slot; 1 byte = 8 bits.

R_b: the bit rate, in bit/s.

S_pcm: the total size of the PCM data, i.e. the total number of bytes.

N_bitdepth: bit depth, for example 8, 16, 24 or 32 bits.

N_channel: the number of channels, for example 1 for mono and 2 for stereo.

ceil(float): mathematical function, rounding up.

S_frame: the unit frame data size, in bytes.

S_chunk: PCM block data size, in bytes.

N: the number of threads started for parallel processing; when N lies in the range [1 × number of processor cores, 1.5 × number of processor cores], CPU utilization can reach up to 100%.

*: the multiplication operator.

%: the modulus operator.
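The recommended range for N can be computed from the core count reported by the operating system. The sketch below is illustrative; `thread_count_bounds` and `default_thread_count` are hypothetical names, and choosing the lower bound as the default is an assumption.

```python
import os
from typing import Tuple


def thread_count_bounds(cores: int) -> Tuple[int, int]:
    """Return (1 x cores, 1.5 x cores), with the upper bound rounded down."""
    return cores, (cores * 3) // 2


def default_thread_count() -> int:
    """Pick the lower bound of the range for the current machine."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return thread_count_bounds(cores)[0]
```

On a 4-core processor the range is 4 to 6 threads.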

Step three: the blocks are sent to a CPU or a GPU for parallel coding, and the coded audio data are merged in label order (the merging is done in a multithreaded manner) to generate the final digital audio file.

Fig. 1 is a flowchart of the present embodiment.

An example follows:

Test: transcoding PCM original audio data into MP2.

(1) System environment:

Operating system: Windows Server 2008 R2 x64 SP1

Processor: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz

Memory (RAM): 12 GB

(2) Source audio:

PCM, 48000 Hz, 16 bit, 2 channels, duration 1:00:00, 691,200,000 bytes.

(3) Target audio:

MP2, 48000 Hz, 256000 bit/s, 2 channels, duration 1:00:00, 115,200,000 bytes.
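The two file sizes follow directly from the stated parameters, as the short check below shows. The function names are illustrative; the calculation assumes uncompressed interleaved PCM for the source and a constant bit rate for the target.

```python
def pcm_size_bytes(sample_rate: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size of uncompressed PCM: samples/s x bytes/sample x channels x duration."""
    return sample_rate * (bit_depth // 8) * channels * seconds


def cbr_size_bytes(bit_rate: int, seconds: int) -> int:
    """Size of a constant-bit-rate stream: bits/s x duration / 8."""
    return bit_rate * seconds // 8


one_hour = 3600
source_bytes = pcm_size_bytes(48000, 16, 2, one_hour)  # 691,200,000
target_bytes = cbr_size_bytes(256000, one_hour)        # 115,200,000
```

The target is therefore exactly one sixth of the source size.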

Test results:

(1) During the test, the test PC was also running other resource-hungry programs such as development tools, so the results only serve to show that multithreaded coding is markedly faster than single-threaded coding.

(2) In addition, different audio codecs handle packaging differently. Taking ffmpeg as an example, when the PCM data of each block (except the first) are encoded, the first encoded frame must either be re-encoded, or one extra frame of data must be taken and that frame discarded.
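One possible reading of the second option is that each block after the first is read with one extra frame of PCM in front of it, and the first encoded frame of that block is then dropped. The text does not say which frame is the extra one, so the sketch below is an assumption, and `chunk_ranges_with_lead_in` is a hypothetical helper that only computes byte ranges.

```python
from typing import List, Tuple


def chunk_ranges_with_lead_in(s_pcm: int, s_chunk: int,
                              s_frame: int) -> List[Tuple[int, int, int, int]]:
    """Return (label, read_start, read_end, frames_to_discard) per block.

    Assumption: blocks after the first read one extra frame before their
    nominal start, and the encoder output for that frame is discarded.
    """
    ranges = []
    for label, start in enumerate(range(0, s_pcm, s_chunk)):
        end = min(start + s_chunk, s_pcm)
        lead_in = s_frame if label > 0 else 0  # first block needs no lead-in
        discard = 1 if label > 0 else 0
        ranges.append((label, start - lead_in, end, discard))
    return ranges
```

With ten 4608-byte frames and blocks of four frames, the second block is read from byte 13824 instead of 18432 and its first encoded frame is dropped.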

(3) GPU coding: taking the NVIDIA GeForce GTX 1080 Ti graphics card as an example, it has 3584 cores and a boost clock of 1582 MHz, so 3584 threads could be started simultaneously for parallel processing. In theory, the speed-up from a GPU would therefore be even more pronounced.

Related terms

PCM: pulse-code modulation (PCM) is a method of digitizing analog signals.

Bitstream: in GB/T 17191, the bitstream is the coded representation of an audio signal.

Encoding: the process by which information is converted from one form or format to another. GB/T 17191 does not specify the encoding process, i.e. the process that reads a stream of input audio samples and produces a valid bitstream as defined in GB/T 17191.

Decoding: the inverse of encoding. The process defined in GB/T 17191 reads the coded bitstream and produces decoded audio sample values.

Channel: the number of channels is the number of sound sources when sound is recorded, or the number of corresponding loudspeakers when it is played back.

Sample rate: also called sampling speed or sampling frequency; it defines the number of samples per second extracted from a continuous signal to form a discrete signal, expressed in hertz (Hz). It is commonly denoted f_s.

Bit rate: also called code rate; the number of bits transmitted or processed per unit time, expressed in bits per second (bit/s or bps). It can be denoted R_b.

Bit depth: in PCM digital audio, the bit depth is the number of bits of information in each sample, which corresponds directly to the resolution of each sample. Examples include 16 bits per sample for compact disc digital audio, and up to 24 bits per sample for DVD-Audio and Blu-ray discs. The term means essentially the same as quantization precision, quantization bits, sampling precision, sampling bits and so on. Bit depth is meaningful only for PCM digital signals; non-PCM formats (e.g. lossy compression formats) have no associated bit depth.

Layer: in GB/T 17191, a layer is one of the coding layers of the audio system.

Audio access unit: in GB/T 17191, for Layers I and II, an audio access unit is defined as the smallest part of the coded bitstream that can be decoded by itself, where decoding means fully reconstructing the sound.

Frame: in GB/T 17191, the part of the audio signal that corresponds to the audio PCM samples from one audio access unit. The number of samples contained in one frame can be denoted C_f.

Slot: in GB/T 17191, a slot is an elementary part of the bitstream. In Layer I one slot is 4 bytes; in Layer II one slot is 1 byte. The number of bytes in one slot can be denoted S_s.

Padding: in GB/T 17191, a method of adjusting the average length in time of an audio frame to the duration of the corresponding PCM samples by conditionally adding a slot to the audio frame.
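To illustrate how conditional padding keeps the average frame length correct, the sketch below uses a simple remainder accumulator: the fractional part of the slots-per-frame value is carried from frame to frame, and a slot is added whenever a whole slot has accumulated. This is a generic illustration, not the exact rule of the standard for deciding which frames are padded, and `padding_pattern` is a hypothetical name.

```python
from typing import List


def padding_pattern(r_b: int, f_s: int, c_f: int, s_s: int, frames: int) -> List[int]:
    """Return 1 for each frame that receives an extra slot, else 0."""
    num = c_f * r_b       # bits per frame, scaled by f_s
    den = 8 * s_s * f_s   # bits per slot, scaled by f_s
    step = num % den      # fractional slots per frame, scaled by den
    acc = 0
    pattern = []
    for _ in range(frames):
        acc += step
        if acc >= den:    # a whole slot has accumulated: pad this frame
            acc -= den
            pattern.append(1)
        else:
            pattern.append(0)
    return pattern
```

For MP2 at 44100 Hz and 256000 bit/s, the exact frame length is 835 + 45/49 slots, so 45 of every 49 frames are padded and the pattern repeats every 49 frames. At 48000 Hz no frame is padded.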

Least common multiple (lcm): lcm is shorthand for lowest common multiple. lcm(number₁, number₂, ..., number_n) is the least common multiple of number₁, number₂, ..., number_n.

Greatest common divisor (gcd): gcd is shorthand for greatest common divisor, also known as the greatest common factor (gcf). gcd(number₁, number₂, ..., number_n) is the greatest common divisor of number₁, number₂, ..., number_n.

The scheme places no strict requirement on the format of the source audio: any commonly used audio format can be decoded into PCM data with a general-purpose open source library such as ffmpeg/libav.

Whether the decoded PCM data need resampling and requantization is judged as follows: the sampling rate, bit depth and number of channels of the PCM data are compared with those of the target audio MP2; if they all match, resampling and requantization are not required, otherwise they are required.

Parallel encoding: N threads are started for parallel processing and the PCM data are divided into N blocks, each of size S_chunk (the last block is not always equal to S_chunk). MP2 encoding itself is a general-purpose operation and can be done with an open source library such as ffmpeg/libav.

The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Although terms such as PCM, block and frame are used frequently herein, the possibility of using other terms is not excluded. These terms are used merely to describe and explain the nature of the invention more conveniently; construing them as imposing any additional limitation would be contrary to the spirit of the invention.

Claims

1. A method for rapidly encoding digital audio for broadcasting, comprising the steps of:

S1, converting the source audio file into PCM data;

S2, dividing the PCM data into a plurality of blocks, and labeling each block of data in sequence;

S3, sending the segmented data to a CPU or a GPU for parallel coding operation;

S4, merging the coded audio data according to the label sequence to generate a final digital audio file;

in step S2, the size S_chunk of each block of data is determined by the following set of equations:

wherein P is the minimum period value of frame padding, C_f is the number of samples contained in a unit frame, N_bitdepth is the bit depth, N_channel is the number of channels, ceil(float) is the round-up function, S_pcm is the total size of the PCM data, and S_frame is the unit frame data size.

2. The method of claim 1, wherein step S1 specifically comprises:

S103, judging whether the sampling rate, bit depth and number of channels of the PCM data are consistent with those of the target audio MP2; if any parameter is inconsistent, entering step S104, and if all parameters are consistent, entering step S2;

S104, resampling and requantizing the source data, and then entering step S2.

3. The method of claim 1 or 2, wherein in step S1 the source audio is audio data that can be decoded into PCM data by a general-purpose open source library such as ffmpeg or libav.

4. The method of claim 1, wherein the minimum period value P of the frame padding is determined by the following formula:

in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and gcd(number₁, number₂) is the function returning the greatest common divisor of number₁ and number₂.

5. The method of claim 1, wherein the minimum period value P of the frame padding is determined by the following formula:

in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and lcm(number₁, number₂) is the function returning the least common multiple of number₁ and number₂.