CN109243471B - Method for quickly coding digital audio for broadcasting - Google Patents

Method for quickly coding digital audio for broadcasting

Info

Publication number
CN109243471B
CN109243471B CN201811124426.2A CN201811124426A CN109243471B CN 109243471 B CN109243471 B CN 109243471B CN 201811124426 A CN201811124426 A CN 201811124426A CN 109243471 B CN109243471 B CN 109243471B
Authority
CN
China
Prior art keywords
data
audio
pcm
frame
pcm data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811124426.2A
Other languages
Chinese (zh)
Other versions
CN109243471A (en)
Inventor
陈永泽
吕连新
赵凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Linker Technology Co ltd
Original Assignee
Hangzhou Linker Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-26
Filing date
2018-09-26
Publication date
2022-09-23
Application filed by Hangzhou Linker Technology Co ltd
Priority to CN201811124426.2A
Publication of CN109243471A
Application granted
Publication of CN109243471B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture

Abstract

The invention discloses a method for rapidly coding digital audio for broadcasting, which comprises the following steps: S1, converting the source audio file into PCM data; S2, dividing the PCM data into a plurality of blocks and labeling each block in sequence; S3, sending the segmented data to a CPU or a GPU for parallel coding; and S4, merging the coded audio data in label order to generate the final digital audio file. In this scheme, the data are segmented and then sent to the processor for parallel processing. Since current processors are almost all multi-core, each core can process one block of data in concurrent threads, which effectively improves processing speed. The scheme is suitable for encoding and decoding large audio data files, such as those used in broadcasting.

Description

Method for quickly coding digital audio for broadcasting
Technical Field
The invention relates to the technical field of digital audio coding and decoding, and in particular to a method for quickly coding digital audio for broadcasting that supports parallel processing.
Background
Audio data files for broadcasting need to be encoded and decoded. When an audio file is large, for example 24 hours of audio, encoding and decoding take a long time and processor utilization is low.
Disclosure of Invention
The invention mainly solves the technical problem that audio file coding in the prior art is time-consuming and inefficient, and provides a method for quickly coding digital audio for broadcasting that makes full use of a multi-core CPU or a GPU and achieves high processor utilization.
The invention solves this technical problem mainly through the following technical scheme: a method for rapidly encoding digital audio for broadcasting, comprising the steps of:
S1, converting the source audio file into PCM data;
S2, dividing the PCM data into a plurality of blocks, and labeling each block of data in sequence;
S3, sending the segmented data to a CPU or a GPU for parallel coding operation;
S4, merging the coded audio data according to the label sequence to generate a final digital audio file.
In this scheme, the data are segmented and then sent to the processor for parallel processing. Since current processors are almost all multi-core, each core can process one block of data in concurrent threads, which effectively improves processing speed.
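The four steps S1 to S4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and `encode_block` is a stand-in that merely keeps every second byte where a real MP2 encoder would run. In CPython, a real encoder would need worker processes or a native library that releases the global interpreter lock to occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(pcm: bytes, block_size: int):
    """S2: cut the PCM data into consecutive blocks, each labeled with its index."""
    return [(label, pcm[offset:offset + block_size])
            for label, offset in enumerate(range(0, len(pcm), block_size))]


def encode_block(labelled_block):
    """S3 stand-in (hypothetical): a real audio encoder would be called here."""
    label, data = labelled_block
    return label, data[::2]  # placeholder "codec": keep every second byte


def encode_parallel(pcm: bytes, block_size: int, workers: int) -> bytes:
    """S2 to S4: split, encode the blocks concurrently, merge in label order."""
    blocks = split_into_blocks(pcm, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(encode_block, blocks))
    # S4: sort by label so the output order never depends on thread timing.
    encoded.sort(key=lambda item: item[0])
    return b"".join(data for _, data in encoded)
```

Sorting by label before joining is what makes the merge independent of the order in which the workers finish.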
Preferably, step S1 specifically comprises:
S101, judging whether the source audio is PCM data; if so, jumping to step S103, otherwise entering step S102;
S102, decoding the source audio data to generate PCM data, and then entering step S103;
S103, judging whether the sampling rate, bit depth and number of channels of the PCM data are consistent with those of the target MP2 audio; if any parameter is inconsistent, entering step S104, and if all parameters are consistent, entering step S2. The sampling rate, bit depth and number of channels of the target MP2 audio are parameters entered manually or taken from program defaults before encoding;
S104, resampling and requantizing the source data, and then entering step S2. The resampled and requantized data is PCM data, so no decoding operation is required.
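The branching in S101 to S104 can be sketched as a small planning function. The `PcmFormat` type and `plan_s1` name are hypothetical, introduced only for illustration; the function reports which steps would run without doing any audio processing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PcmFormat:
    sample_rate: int  # Hz
    bit_depth: int    # bits per sample
    channels: int


def plan_s1(source_is_pcm: bool, source: PcmFormat, target: PcmFormat) -> List[str]:
    """Return the sub-steps of S1 that apply to a given source and target."""
    steps = []
    if not source_is_pcm:
        steps.append("S102 decode to PCM")          # S101 answered "no"
    if source != target:                             # S103: any parameter differs
        steps.append("S104 resample and requantize")
    steps.append("S2 split into blocks")
    return steps
```

Here `source` describes the PCM data as it exists after any decoding, so the comparison in S103 is a plain equality check on the three parameters.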
Preferably, in step S1, the source audio is any audio data that can be decoded into PCM data with an open source library such as ffmpeg or libav.
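As an illustration of such a decode, the sketch below builds an ffmpeg command line that writes raw signed 16-bit little-endian PCM. The helper name and file names are hypothetical, and the choice of 16-bit output is an assumption taken from the default bit depth mentioned in the description; the command is only constructed here, not executed.

```python
def ffmpeg_decode_command(src: str, dst: str, sample_rate: int = 48000, channels: int = 2):
    """Build an ffmpeg argument list that decodes `src` to raw 16-bit PCM in `dst`."""
    return [
        "ffmpeg",
        "-i", src,               # input file in any format ffmpeg can decode
        "-f", "s16le",           # raw output container: signed 16-bit little-endian
        "-acodec", "pcm_s16le",  # PCM codec matching the container
        "-ar", str(sample_rate), # output sampling rate in Hz
        "-ac", str(channels),    # output channel count
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.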
Preferably, the size S_chunk of each block of data is determined by the following set of equations:
[Equation images: Figures BDA0001812022420000021 to BDA0001812022420000023]
where P is the minimum period value of frame padding, C_f is the number of samples contained in a unit frame, N_bitdepth is the bit depth, N_channel is the number of channels, ceil(float) is the round-up function, S_pcm is the total size of the PCM data, and S_frame is the unit frame data size.
C_f: fixed at 384 samples per frame for MP1 and 1152 samples per frame for MP2.
N_bitdepth: the bit depth is a parameter entered manually or taken from program defaults before encoding, generally 16 bits by default.
N_channel: the number of channels is a parameter entered manually or taken from program defaults before encoding, typically stereo, i.e. 2 channels.
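The equations themselves survive only as image references, so the sketch below is one reading consistent with the symbol definitions, not the patent's exact formulas. It assumes that S_frame is the PCM byte count of one frame (C_f × N_bitdepth / 8 × N_channel) and that S_chunk is rounded up to a whole number of padding periods, i.e. a multiple of P × S_frame. The thread count N and both function names are likewise assumptions.

```python
def frame_size_bytes(C_f: int, N_bitdepth: int, N_channel: int) -> int:
    """Assumed S_frame: PCM bytes covered by one encoded frame."""
    return C_f * (N_bitdepth // 8) * N_channel


def chunk_size_bytes(S_pcm: int, S_frame: int, P: int, N: int) -> int:
    """Assumed S_chunk: the per-thread share, rounded up to a multiple of P * S_frame."""
    period_bytes = P * S_frame
    periods_per_chunk = -(-S_pcm // (N * period_bytes))  # integer ceil
    return periods_per_chunk * period_bytes
```

Rounding to whole padding periods would let every block start at the same point in the padding cycle, which is why P appears in the block size at all under this reading.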
Preferably, the minimum period value P of the frame padding is determined by the following set of equations:
[Equation images: Figures BDA0001812022420000024 and BDA0001812022420000031]
in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and gcd(number_1, number_2) is the function returning the greatest common divisor of number_1 and number_2.
R_b: the bit rate can be entered manually or by a program before encoding, or calculated from parameters such as the number of samples per unit frame, the sampling rate and the number of channels; it is generally treated as an input.
S_s: one slot occupies 4 bytes for MP1 and 1 byte for MP2.
f_s: the sampling rate is a parameter entered manually or by a program before encoding.
Preferably, the minimum period value P of the frame padding is determined by the following formula:
[Equation images: Figures BDA0001812022420000032 and BDA0001812022420000033]
in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and lcm(number_1, number_2) is the function returning the least common multiple of number_1 and number_2.
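Because the formula images are missing, the following is a hedged reconstruction rather than the patent's own equations. It assumes the frame length in slots is C_f × R_b / (8 × S_s × f_s), which matches the MPEG-1 audio expressions 12 × R_b / f_s for Layer I and 144 × R_b / f_s for Layer II, and takes P to be the denominator of that fraction in lowest terms. The gcd and lcm variants then give the same value; both function names are hypothetical.

```python
from math import gcd


def padding_period_gcd(C_f: int, R_b: int, S_s: int, f_s: int) -> int:
    """P as the reduced denominator of (C_f * R_b) / (8 * S_s * f_s)."""
    numerator = C_f * R_b
    denominator = 8 * S_s * f_s
    return denominator // gcd(numerator, denominator)


def padding_period_lcm(C_f: int, R_b: int, S_s: int, f_s: int) -> int:
    """Same P via the least common multiple: lcm(a, b) / a equals b / gcd(a, b)."""
    numerator = C_f * R_b
    denominator = 8 * S_s * f_s
    lcm = numerator * denominator // gcd(numerator, denominator)
    return lcm // numerator
```

Under this reading, MP2 at 256000 bit/s needs no padding at 48000 Hz (P = 1), while at 44100 Hz the padding pattern repeats every 49 frames.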
The substantial effect of the invention is to shorten the encoding and decoding process and to improve processor utilization.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical scheme of the invention is further specifically described by the following embodiments and the accompanying drawings.
Embodiment: the method for rapidly coding digital audio for broadcasting of this embodiment comprises the following steps:
Step one, decoding into PCM data: the source audio file is decoded into PCM data, and then step two is performed.
Step two, dividing the PCM data into N blocks: the minimum period value P of frame padding is calculated according to formula set 1 or formula set 2. The PCM data is then divided into N blocks (corresponding to N threads) according to formula set 3, each of size S_chunk (the size of the last block is not always equal to S_chunk). After the blocks are labeled in sequence, step three is performed.
The formulas are as follows:
Formula set 1:
[Equation images: Figures BDA0001812022420000041 and BDA0001812022420000042]
Formula set 2:
[Equation images: Figures BDA0001812022420000043 and BDA0001812022420000044]
Formula set 3:
[Equation images: Figures BDA0001812022420000045 to BDA0001812022420000047]
The symbols in the formulas:
P: minimum period value of frame padding.
lcm(number_1, number_2): mathematical function returning the least common multiple of number_1 and number_2.
gcd(number_1, number_2): mathematical function returning the greatest common divisor of number_1 and number_2.
f_s: sampling rate, in Hz; typically 32 kHz, 44.1 kHz, 48 kHz, etc.
C_f: the number of samples contained in a unit frame.
S_s: the number of bytes occupied by a unit slot, where 1 byte = 8 bits.
R_b: the bit rate, in bit/s.
S_pcm: the total size of the PCM data, i.e. the total number of bytes.
N_bitdepth: bit depth, for example 8, 16, 24 or 32 bits.
N_channel: the number of channels, for example 1 for mono and 2 for stereo.
ceil(float): mathematical function, rounding up.
S_frame: the unit frame data size, in bytes.
S_chunk: PCM block data size, in bytes.
N: the number of threads opened for parallel processing. N lies in the range [1 × number of processor cores, 1.5 × number of processor cores], which allows CPU utilization to reach 100%.
*: the multiplication operator.
%: the modulus operator.
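The range given for N in the symbol list can be computed from the machine's core count. A minimal sketch, with a hypothetical function name and the assumption that the upper bound is rounded down to an integer:

```python
import os


def thread_count_range(cores=None):
    """Return (minimum, maximum) thread counts: 1x to 1.5x the core count."""
    cores = cores or os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return cores, int(cores * 1.5)
```

For the 4-core processor used in the test environment this gives a range of 4 to 6 threads.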
Step three, parallel coding and merging: the audio data is sent to a CPU or a GPU for parallel coding, and the coded audio data is merged in label order (the merging is also multithreaded) to generate the final digital audio file.
Fig. 1 is a flowchart of the present embodiment.
An example follows:
Test: transcoding original PCM audio data into MP2.
(1) System environment:
Operating system: Windows Server 2008 R2 x64 SP1
Processor: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
Memory (RAM): 12 GB
(2) Source audio:
PCM, 48000 Hz, 16 bit, 2 channels, duration 1:00:00, 691,200,000 bytes.
(3) Target audio:
MP2, 48000 Hz, 256000 bit/s, 2 channels, duration 1:00:00, 115,200,000 bytes.
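The two file sizes quoted above follow directly from the stated parameters, as this short check shows. The function names are hypothetical; the MP2 size assumes a constant bit rate with no container overhead.

```python
def pcm_size_bytes(sample_rate: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Raw PCM size: samples per second x bytes per sample x channels x duration."""
    return sample_rate * (bit_depth // 8) * channels * seconds


def cbr_size_bytes(bit_rate: int, seconds: int) -> int:
    """Constant-bit-rate stream size: bits per second x duration, in bytes."""
    return bit_rate * seconds // 8


source_bytes = pcm_size_bytes(48000, 16, 2, 3600)  # one hour of 48 kHz stereo PCM
target_bytes = cbr_size_bytes(256000, 3600)        # one hour at 256000 bit/s
```

The source is exactly six times the size of the target, so the encoder compresses the audio 6:1.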
Test results:
[Table image: Figure BDA0001812022420000061]
(1) During the test, the test PC was also running other resource-consuming programs such as development tools; the results are intended only to show that multithreaded coding is clearly faster than single-threaded coding.
(2) In addition, different audio codecs package their output differently. Taking ffmpeg as an example, when the PCM data of each block other than the first is encoded, the first encoded frame must either be re-encoded, or one extra frame of data must be taken and the resulting frame discarded.
(3) GPU coding: taking the NVIDIA GeForce GTX 1080 Ti graphics card as an example, it has 3584 cores and a boost clock of 1582 MHz, so 3584 threads may be started simultaneously for parallel processing. In theory, the speed-up on a GPU would therefore be even more pronounced.
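Note (2) can be read as giving every block except the first one extra leading frame of PCM, then dropping the first encoded frame of that block. The sketch below shows only the byte ranges such a scheme would read; it is one possible interpretation under that assumption, and the function name and tuple layout are hypothetical.

```python
def block_ranges_with_lead_in(total_bytes: int, chunk_bytes: int, frame_bytes: int):
    """Return (label, read_start, read_end, frames_to_discard) for each block."""
    ranges = []
    for label, start in enumerate(range(0, total_bytes, chunk_bytes)):
        end = min(start + chunk_bytes, total_bytes)
        # Blocks after the first read one extra frame before their nominal start;
        # the encoded frame produced from it is discarded when merging.
        lead_in = frame_bytes if label > 0 else 0
        ranges.append((label, start - lead_in, end, 1 if label > 0 else 0))
    return ranges
```

With 10 frames of 4608 bytes and 4 frames per block, the second block would read from byte 13824 rather than 18432 and discard its first encoded frame.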
Related terms
PCM (pulse-code modulation): a method of digitizing analog signals.
Bitstream: in GB/T 17191, the bitstream is an encoded representation of an audio signal.
Encoding: the process by which information is converted from one form or format to another. In GB/T 17191, encoding is a process, not specified by the standard, that reads a stream of input audio samples and produces a valid bitstream as defined in GB/T 17191.
Decoding: the inverse of encoding. In GB/T 17191, decoding is the defined process that reads an encoded bitstream and produces decoded audio sample values.
Channel: the number of channels is the number of sound sources when sound is recorded, or the number of corresponding loudspeakers when it is played back.
Sample rate: the sampling rate, also called sampling speed or sampling frequency, is the number of samples per second extracted from a continuous signal to form a discrete signal, in hertz (Hz). The usual symbol is f_s.
Bit rate: the bit rate, also called code rate, is the number of bits transmitted or processed per unit time, in bits per second (bit/s or bps). It may be denoted by the symbol R_b.
Bit depth: in PCM digital audio, the bit depth is the number of bits of information in each sample, which corresponds directly to the resolution of each sample. Examples include 16 bits per sample for digital audio on compact disc, and up to 24 bits per sample for DVD audio and Blu-ray discs. The term means essentially the same as quantization precision, number of quantization bits, sampling precision and number of sampling bits. Bit depth is meaningful only for PCM digital signals; non-PCM formats (e.g., lossy compression formats) have no associated bit depth.
Layer: in GB/T 17191, a layer is one of the coding levels of the audio system.
Audio access unit: in GB/T 17191, for Layers I and II, an audio access unit is the smallest part of the encoded bitstream that can be decoded by itself, where decoding means fully reconstructing the sound.
Frame: in GB/T 17191, the part of the audio signal that corresponds to the audio PCM samples from one audio access unit. The number of samples contained in one frame may be denoted by the symbol C_f.
Slot: in GB/T 17191, a slot is an elementary part of the bitstream. In Layer I, one slot is 4 bytes; in Layer II, one slot is 1 byte. The number of bytes in one slot may be denoted by the symbol S_s.
Padding: in GB/T 17191, a method of adjusting the average length in time of an audio frame to the duration of the corresponding PCM samples by conditionally adding a slot to the audio frame.
Least common multiple (lcm): lcm(number_1, number_2, ..., number_n) is the least common multiple of number_1, number_2, ..., number_n.
Greatest common divisor (gcd): also known as the greatest common factor (gcf). gcd(number_1, number_2, ..., number_n) is the greatest common divisor of number_1, number_2, ..., number_n.
The scheme places no strict requirement on the format of the source audio. Any commonly used audio format can be decoded into PCM data with an open source library such as ffmpeg or libav.
Whether the decoded PCM data needs resampling and requantization is judged as follows: the sampling rate, bit depth and number of channels of the PCM data are compared with those of the target MP2 audio. If all of them match, resampling and requantization are not required; if any of them differs, they are required.
Parallel encoding: N threads are started for parallel processing and the PCM data is divided into N blocks, each of size S_chunk (the size of the last block is not always equal to S_chunk). The MP2 encoding itself is standard encoding and can be performed with an open source library such as ffmpeg or libav.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Although terms such as PCM, block and frame are used frequently herein, the possibility of using other terms is not excluded. These terms are used merely to describe and explain the nature of the invention more conveniently; construing them as any additional limitation would be contrary to the spirit of the invention.

Claims (5)

1. A method for rapidly encoding digital audio for broadcasting, comprising the steps of:
S1, converting the source audio file into PCM data;
S2, dividing the PCM data into a plurality of blocks, and labeling each block of data in sequence;
S3, sending the segmented data to a CPU or a GPU for parallel coding operation;
S4, merging the coded audio data according to the label sequence to generate a final digital audio file;
in step S2, the size S_chunk of each block of data is determined by the following set of equations:
[Equation images: Figures FDA0003771688550000011 to FDA0003771688550000013]
wherein P is the minimum period value of frame padding, C_f is the number of samples contained in a unit frame, N_bitdepth is the bit depth, N_channel is the number of channels, ceil(float) is the round-up function, S_pcm is the total size of the PCM data, and S_frame is the unit frame data size.
2. The method as claimed in claim 1, wherein step S1 specifically comprises:
S101, judging whether the source audio is PCM data; if so, jumping to step S103, otherwise entering step S102;
S102, decoding the source audio data to generate PCM data, and then entering step S103;
S103, judging whether the sampling rate, bit depth and number of channels of the PCM data are consistent with those of the target MP2 audio; if any parameter is inconsistent, entering step S104, and if all parameters are consistent, entering step S2;
S104, resampling and requantizing the source data, and then entering step S2.
3. The method of claim 1 or 2, wherein in step S1 the source audio is audio data that can be decoded into PCM data using the ffmpeg or libav open source library.
4. The method of claim 1, wherein the minimum period value P of the frame padding is determined by the following formula:
[Equation images: Figures FDA0003771688550000021 and FDA0003771688550000022]
in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and gcd(number_1, number_2) is the function returning the greatest common divisor of number_1 and number_2.
5. The method of claim 1, wherein the minimum period value P of the frame padding is determined by the following formula:
[Equation images: Figures FDA0003771688550000023 and FDA0003771688550000024]
in the formula, R_b is the bit rate, S_s is the number of bytes occupied by a unit slot, f_s is the sampling rate, and lcm(number_1, number_2) is the function returning the least common multiple of number_1 and number_2.
CN201811124426.2A 2018-09-26 2018-09-26 Method for quickly coding digital audio for broadcasting Active CN109243471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811124426.2A CN109243471B (en) 2018-09-26 2018-09-26 Method for quickly coding digital audio for broadcasting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811124426.2A CN109243471B (en) 2018-09-26 2018-09-26 Method for quickly coding digital audio for broadcasting

Publications (2)

Publication Number Publication Date
CN109243471A (en) 2019-01-18
CN109243471B (en) 2022-09-23

Family

ID=65056230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811124426.2A Active CN109243471B (en) 2018-09-26 2018-09-26 Method for quickly coding digital audio for broadcasting

Country Status (1)

Country Link
CN (1) CN109243471B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689876B (en) * 2019-10-14 2022-04-12 腾讯科技(深圳)有限公司 Voice recognition method and device, electronic equipment and storage medium
CN112380173B (en) * 2020-11-20 2023-10-20 中国直升机设计研究所 Intelligent correction rapid PCM decoding calculation method
CN112767920A (en) * 2020-12-31 2021-05-07 深圳市珍爱捷云信息技术有限公司 Method, device, equipment and storage medium for recognizing call voice
CN113747236A (en) * 2021-10-19 2021-12-03 江下信息科技(惠州)有限公司 Multithreading-based high-speed audio format conversion method and system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4922537A (en) * 1987-06-02 1990-05-01 Frederiksen & Shu Laboratories, Inc. Method and apparatus employing audio frequency offset extraction and floating-point conversion for digitally encoding and decoding high-fidelity audio signals
EP2259634A2 (en) * 1995-06-30 2010-12-08 Interdigital Technology Corporation Code acquisition in a CDMA communication system
CN1848241A (en) * 1995-12-01 2006-10-18 数字剧场系统股份有限公司 Multi-channel audio frequency coder
CN101027717A (en) * 2004-03-25 2007-08-29 Dts公司 Lossless multi-channel audio codec
CN104795073A (en) * 2015-03-26 2015-07-22 无锡天脉聚源传媒科技有限公司 Method and device for processing audio data
CN107731238A (en) * 2016-08-10 2018-02-23 华为技术有限公司 The coding method of multi-channel signal and encoder
CN108550369A (en) * 2018-04-14 2018-09-18 全景声科技南京有限公司 A kind of panorama acoustical signal decoding method of variable-length

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
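Tian Bo. A GOP Size Control Method for Distributed Video Coding. 2015 8th International Conference on Intelligent Computation Technology and Automation (ICICTA). 2016, full text. *
Duan Mintao. Research on the Optimized Design and Implementation of a G.723.1 Real-Time Speech Coder Based on Fixed-Point DSP. China Master's Theses Full-text Database. 2006, (No. 5), full text. *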

Also Published As

Publication number Publication date
CN109243471A (en) 2019-01-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant