CN112420059B - Audio coding quantization control method combining code rate layering and quality layering

Info

Publication number: CN112420059B
Application number: CN202011105481.4A
Authority: CN
Inventors: 梅元刚; 刘宇新; 朱政
Original assignee: Hangzhou Microframe Information Technology Co ltd
Current assignee: Hangzhou Microframe Information Technology Co ltd
Priority date: 2020-10-15
Filing date: 2020-10-15
Publication date: 2022-04-19
Anticipated expiration: 2040-10-15
Also published as: CN112420059A

Abstract

The invention discloses an audio coding quantization control method combining code rate layering and quality layering, and belongs to the field of audio coding. The method comprises the following steps: first, the audio is pre-coded according to a predicted quality control factor (ACRF) and an initial code rate layer; then the code rate is adjusted according to the coding result, so that the BDrate (the relation between code rate consumption and quality improvement) is highest at that code rate and the relative quality is the best within the interval. A reasonable linear mapping between the quality control factor and the code rate is thereby obtained, so that quality is balanced while the code rate corresponding to the quality is adjusted.

Description

Audio coding quantization control method combining code rate layering and quality layering

Technical Field

The invention relates to the technical field of audio coding, in particular to an audio coding quantization control method combining code rate layering and quality layering.

Background

The main purpose of audio coding is to remove as much statistical and perceptual redundancy from the input signal as possible, compressing the data volume while preserving a given subjective listening quality, so as to meet the requirements of different transmission and storage conditions.

The following scenarios call for layered control of the audio code rate: a different code rate is applied under each condition, and the audio quality under each condition is maintained as far as possible.

1) Audio content is slimmed down and stored by quality grade, saving storage space while maintaining audio quality as far as possible.

2) Online voice interaction, including real-time communication, live broadcasting, mobile on-demand playback, and online voice forwarding in large-scale audio service systems. These scenarios need to save traffic, control concurrent bandwidth, and keep services stable while maintaining audio quality.

The limitations of current methods are as follows.

When audio is coded at a constant bit rate (CBR), the code rate is effectively controlled, but audio quality and detail cannot be guaranteed as the code rate is reduced. The code rate is capped even for content that should not lose quality, and although the code rate is controlled overall, it is not accurate within individual code rate intervals and shows some jitter; quality is unstable and fluctuates widely; and certain sound scenes (speech, music, and the like) are badly degraded, with poor subjective results.

When audio is coded at a variable bit rate (VBR), quality stays relatively stable, but the overall code rate is high and cannot be controlled accurately, and layered control across different code rates is lacking. Specifically, the range of the variable code rate is narrow; for certain content or certain code rate intervals the code rate fluctuates widely, and the extra code rate contributes little to quality; so the requirements of real products for code rate layering and stable quality control cannot be met.

In short, neither CBR nor VBR coding can both control the code rate linearly and keep quality stable across different tasks.

Disclosure of Invention

In view of the above disadvantages, the present invention provides an audio coding quantization control method combining code rate layering and quality layering. The core idea is to combine the code rate with the quality quantization control factor (ACRF), find a reasonable linear mapping between them, and set reasonable boundary points for the code rate control intervals and the quality control intervals, so that quality is balanced while the code rate is adjusted according to the quality requirement.

The invention provides a method for audio coding quantization control by combining code rate layering and quality layering, which comprises the following steps:

(1) Establish an initial code rate grading code table according to the sampling rate of the input audio; the grading code table contains the sampling rate, the single-channel code rate, and the two-channel code rate.

(2) Determine which code rate gear the input audio uses according to its frame length, its sampling rate, and the code rate requirement of each channel.

(3) Determine the target quality layers and the corresponding code rate layers.

(4) According to the different quality control factors ACRF and the initial code rate grading code table, find the boundaries of the corresponding controlled code rate interval, and determine the boundary points of the quality layering and the code rate layering.

(5) Pre-code with the ACRF and the code rate determined in step (4) to obtain the pre-coded code rate range, that is, the actual code rate range corresponding to the quality.

(6) From the optimal quality quantization factor acrf_best obtained for each gear, the minimum quality quantization factor min_f, and the maximum quality quantization factor max_f, calculate the final coding quality quantization factor by linear mapping:

acrf = min_f + (max_f - min_f) × (max_f - min_f) × acrf_best / (max_acrf - min_acrf).

The code rate and the quality are controlled through ACRF, whose interval is (min_acrf, max_acrf); acrf is the currently selected coding quality quantization factor.

(7) Code at the code rate corresponding to the acrf obtained in step (6): apply overall code rate smoothing control at the frame coding level; generate code rate deviation corrections and collect them in a trend table, which serves as the reference for the next control period and for frame-level fine control; and score the coded audio with PESQ. Deviation correction stops once the PESQ quality deviation is within 0.15, which indicates that the BDrate (the relation between code rate consumption and quality improvement) is highest at that code rate and that the relative quality is the best within the interval.

(8) According to the global summary trend table, automatically classify by sampling rate, original audio code rate, and number of channels, and match the initial code rate gear and the quality gear as quickly as possible to form a quality curve and a code rate table. The initial code rate gear refers to the initial code rate grading code table established from the sampling rate in step (1), and the quality gear refers to the quality layering in steps (3) and (4).
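The stopping rule of step (7) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the text: `encode_and_score` is a hypothetical stand-in for the encoder plus a PESQ scorer, the quality deviation is measured against a caller-supplied target score, and the next request is simply shifted by the observed code rate deviation. Only the 0.15 threshold and the idea of collecting corrections in a trend table come from the description.

```python
PESQ_TOLERANCE = 0.15  # threshold stated in step (7)


def run_correction(encode_and_score, target_kbps, target_pesq, max_periods=10):
    """Repeat coding until the PESQ deviation is within tolerance.

    encode_and_score(kbps) -> (actual_kbps, pesq) is a hypothetical callable
    standing in for the real encoder and PESQ scorer.
    """
    trend_table = []              # one entry per control period
    request_kbps = target_kbps
    for period in range(max_periods):
        actual_kbps, pesq = encode_and_score(request_kbps)
        rate_deviation = target_kbps - actual_kbps
        trend_table.append({
            "period": period,
            "requested_kbps": request_kbps,
            "rate_deviation": rate_deviation,
            "pesq": pesq,
        })
        # Stop correcting once the quality deviation is small enough.
        if abs(pesq - target_pesq) <= PESQ_TOLERANCE:
            break
        # Assumed update rule: shift the next request by the observed deviation.
        request_kbps += rate_deviation
    return trend_table
```

The returned trend table is what a later control period would consult; how the actual method weights or smooths those entries is not specified in the text.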

Drawings

Fig. 1 is a flowchart of a method for audio coding quantization control combining rate layering and quality layering according to the present invention.

Fig. 2 is a schematic diagram of a quality curve.

Fig. 3 is a code rate table.

Detailed Description

In order to make the technical solutions in the present specification better understood, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in one or more embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present specification without any creative effort shall fall within the protection scope of the present specification.

The present invention will be further described with reference to the accompanying drawings.

As shown in fig. 1, a method for controlling quantization in audio coding combining rate layering and quality layering according to the present invention includes the following steps:

(1) An initial code rate grading code table is established according to the sampling rate of the input audio. Each row of the table contains a sampling rate, a single-channel code rate, and a two-channel code rate. The preferred sampling rate entries run from 0 through 12000 up to 576001, with corresponding single-channel code rates from 3700 through 5000 up to 17000 and two-channel code rates from 5000 through 6400 up to 17000. The full table, in the order {sampling rate, single-channel code rate, two-channel code rate}, is:

{0, 3700, 5000}
{12000, 5000, 6400}
{20000, 6900, 9640}
{28000, 9600, 13050}
{40000, 12060, 14260}
{56000, 13950, 15500}
{72000, 14200, 16120}
{96000, 17000, 17000}
{576001, 17000, 17000}

(2) The code rate gear used by the input audio is determined according to the frame length and sampling rate of the input audio and the code rate requirement of each channel; for example, the code rate gear with a sampling rate of 40000 is selected: {40000, 12060, 14260}.

(3) The target quality layers and the corresponding code rate layers are determined; the preferred quality layers are {acrf_q0, acrf_q1, acrf_q2, acrf_q3, acrf_q4, acrf_q5, acrf_q6}, and the corresponding code rate layers are {26 kbps, 32 kbps, 40 kbps, 50 kbps, 60 kbps, 80 kbps, 100 kbps}.

(4) According to the different quality control factors ACRF and the initial code rate grading code table, the boundaries of the corresponding controlled code rate interval are found, and the boundary points of the quality layering and the code rate layering are determined. For example, if the quality layer of the current audio is acrf_q0 and ACRF = 50, the corresponding controlled code rate boundary is (26 kbps, 32 kbps).

(5) Pre-coding is performed with the ACRF and the code rate determined in step (4) to obtain the pre-coded code rate range, that is, the actual code rate range corresponding to the quality. For example, suppose acrf = 51 is a candidate value for acrf_q0 and its pre-coded code rate interval is (27 kbps, 30 kbps); because (27 kbps, 30 kbps) lies within (26 kbps, 32 kbps), acrf = 51 is taken as the optimal ACRF value within acrf_q0.

(6) From the optimal quality quantization factor acrf_best obtained for each gear, the minimum quality quantization factor min_f, and the maximum quality quantization factor max_f, the final coding quality quantization factor is calculated by linear mapping:

acrf = min_f + (max_f - min_f) × (max_f - min_f) × acrf_best / (max_acrf - min_acrf)

The code rate and the quality are controlled through ACRF, where min_acrf is the minimum of the whole coding quality quantization factor interval, preferably 1, and max_acrf is the maximum of that interval, preferably 51; the interval is (min_acrf, max_acrf), and acrf is the currently selected coding quality quantization factor.

(8) According to the global summary trend table, the sampling rate, the original audio code rate, and the number of channels are classified automatically, and the initial code rate gear and the quality gear are matched as quickly as possible to form a quality curve and a code rate table. The initial code rate gear refers to the initial code rate grading code table established from the sampling rate in step (1), and the quality gear refers to the quality layering in steps (3) and (4). The overall correspondence is shown in the quality curve of Fig. 2 and the code rate table of Fig. 3.
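Steps (1) and (2) above can be sketched in Python as a table lookup. The rows are copied from the grading code table in step (1). The selection rule itself is an assumption, since the text does not state it: this sketch picks the row whose first column is the largest value not exceeding the input sampling rate, and it ignores the frame length. The names `RATE_TABLE` and `select_gear` are hypothetical.

```python
from bisect import bisect_right

# Rows from step (1): (sampling rate, single-channel code rate, two-channel code rate)
RATE_TABLE = [
    (0, 3700, 5000),
    (12000, 5000, 6400),
    (20000, 6900, 9640),
    (28000, 9600, 13050),
    (40000, 12060, 14260),
    (56000, 13950, 15500),
    (72000, 14200, 16120),
    (96000, 17000, 17000),
    (576001, 17000, 17000),
]


def select_gear(sampling_rate, channels):
    """Return (table row, code rate for the channel count).

    Assumed rule: use the row with the largest first-column value that does
    not exceed sampling_rate.
    """
    if sampling_rate < 0:
        raise ValueError("sampling_rate must be non-negative")
    if channels not in (1, 2):
        raise ValueError("only single-channel and two-channel audio are tabulated")
    keys = [row[0] for row in RATE_TABLE]
    row = RATE_TABLE[bisect_right(keys, sampling_rate) - 1]
    return row, row[channels]  # column 1 = single-channel, column 2 = two-channel
```

Under this assumed rule, `select_gear(40000, 2)` returns the gear {40000, 12060, 14260} used as the example in step (2), together with its two-channel code rate.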

The invention combines the code rate with the quality quantization control factor ACRF. Through a reasonable linear mapping between the two, it achieves the desired quality layering and the corresponding code rate layering, and it can adjust the code rate corresponding to a quality level while balancing quality. The quality curve of Fig. 2 and the code rate table of Fig. 3 were obtained through simulation experiments and directly reflect the code rate-quality layering and the mapping relation.
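As a small illustration of the mapping in step (6), the following Python sketch evaluates the expression exactly as it is printed in this document, including the factor (max_f - min_f) appearing twice. The function name `final_acrf` is hypothetical, the defaults 1 and 51 are the preferred values of min_acrf and max_acrf given in the description, and any min_f and max_f values passed in are illustrative assumptions because the text gives no numbers for them.

```python
def final_acrf(acrf_best, min_f, max_f, min_acrf=1, max_acrf=51):
    """Evaluate the step (6) expression as printed in the text.

    acrf = min_f + (max_f - min_f) * (max_f - min_f) * acrf_best
           / (max_acrf - min_acrf)
    """
    if max_acrf == min_acrf:
        raise ValueError("max_acrf and min_acrf must differ")
    span_f = max_f - min_f
    # The printed formula multiplies by (max_f - min_f) twice; kept unchanged.
    return min_f + span_f * span_f * acrf_best / (max_acrf - min_acrf)
```

For example, with the assumed values min_f = 1 and max_f = 3 and with acrf_best = 51, the expression gives 1 + 2 × 2 × 51 / 50 = 5.08.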

For different audio coding tasks, a quality gear can be selected according to the content, and the code rate is layered according to the quality gear and controlled accurately. Within each quality gear the code rate control interval is small and the corresponding quality fluctuation is small, so the code rate is saved as far as possible while quality stays stable. This is of great value to audio-related service providers and operators.
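The pairing of quality gears with code rate intervals, and the containment check from the example in step (5), can be sketched in Python as follows. The layer names and code rates are the preferred values from step (3). Three points are assumptions: each layer's interval is taken to run from its own code rate to the next layer's code rate (the text confirms this only for acrf_q0), the top layer is given no upper bound, and the comparison is inclusive. The function names are hypothetical.

```python
QUALITY_LAYERS = ["acrf_q0", "acrf_q1", "acrf_q2", "acrf_q3",
                  "acrf_q4", "acrf_q5", "acrf_q6"]
RATE_LAYERS_KBPS = [26, 32, 40, 50, 60, 80, 100]


def layer_bounds(layer):
    """Return (lower, upper) code rate bounds in kbps for a quality layer.

    Assumption: the upper bound is the next layer's code rate; the top layer
    has no upper bound (None).
    """
    i = QUALITY_LAYERS.index(layer)
    lower = RATE_LAYERS_KBPS[i]
    upper = RATE_LAYERS_KBPS[i + 1] if i + 1 < len(RATE_LAYERS_KBPS) else None
    return lower, upper


def precoded_range_fits(layer, precoded_kbps):
    """Check whether a pre-coded (low, high) range lies inside the layer bounds."""
    lower, upper = layer_bounds(layer)
    low, high = precoded_kbps
    return low >= lower and (upper is None or high <= upper)
```

With the figures from the description, `precoded_range_fits("acrf_q0", (27, 30))` is true because (27 kbps, 30 kbps) lies within (26 kbps, 32 kbps), which is the condition under which the candidate ACRF value is accepted for that layer.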

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for audio coding quantization control combining code rate layering and quality layering, characterized by comprising the steps of:

(1) establishing an initial code rate grading code table according to the sampling rate of the input audio, wherein the grading code table comprises the sampling rate, the single-channel code rate and the two-channel code rate;

(2) determining the code rate of which gear is adopted by the input audio according to the frame length and the sampling rate of the input audio and the code rate requirement of each channel;

(3) determining a target quality layer and a corresponding code rate layer;

(4) finding out the corresponding controlled code rate interval boundary according to different quality control factors ACRF and an initial code rate grading code table, and determining a boundary point of quality layering and code rate layering;

(5) pre-coding with the ACRF and the code rate determined in step (4) to obtain a pre-coded code rate range, namely the actual code rate range corresponding to the quality;

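(6) calculating the final coding quality quantization factor by linear mapping from the optimal quality quantization factor acrf_best obtained for each gear, the minimum quality quantization factor min_f and the maximum quality quantization factor max_f:

acrf = min_f + (max_f - min_f) × (max_f - min_f) × acrf_best / (max_acrf - min_acrf)

wherein the code rate and the quality are controlled through ACRF, the interval is (min_acrf, max_acrf), and acrf is the currently selected coding quality quantization factor;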

(7) coding at the code rate corresponding to the acrf obtained in step (6): carrying out overall code rate smoothing control at the frame coding level; generating code rate deviation corrections and collecting them in a trend table that serves as the reference for the next control period and for frame-level fine control; scoring the coded audio with PESQ; and stopping deviation correction when the PESQ quality deviation is within 0.15, which indicates that the BDrate is highest at that code rate and the relative quality is the best within the interval; the BDrate represents the relation between code rate consumption and quality improvement;