CN101894560B - Reference source-free MP3 audio frequency definition objective evaluation method - Google Patents


Info

Publication number
CN101894560B
Authority
CN
China
Prior art keywords
audio
high frequency
value
sigma
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010102156001A
Other languages
Chinese (zh)
Other versions
CN101894560A (en)
Inventor
余小清
张静
石成林
刘军伟
万旺根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN2010102156001A priority Critical patent/CN101894560B/en
Publication of CN101894560A publication Critical patent/CN101894560A/en
Application granted granted Critical
Publication of CN101894560B publication Critical patent/CN101894560B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an objective evaluation method for MP3 audio clarity without a reference source, which evaluates the clarity of MP3 audio directly and objectively. The method comprises the following steps. First, an MP3 stereo audio file is acquired and the mid-high frequency index (MHFI), which affects the clarity of the audio, is extracted from it; high clarity means that the audio contains more mid-high frequency components. Second, the possibilistic entropy of the mid-high frequency index of each frame is computed; the entropy reflects the richness of the audio information, and clear audio has a higher possibilistic entropy than unclear audio, so the possibilistic entropy can serve as a no-reference clarity evaluation function. Finally, the entropy is computed and mapped so that the clarity value lies between 0 and 5. Experiments show that the method effectively measures an objective clarity score for MP3 stereo audio without a reference source, and that the score is close to the subjective evaluation score and consistent with human auditory perception.

Description

An objective evaluation method for MP3 audio clarity without a reference source

Technical field

The invention relates to the objective evaluation of audio quality, and in particular to a method for objectively evaluating the clarity of MP3 (MPEG-1 Layer 3) audio without a reference source.

Background

The no-reference objective evaluation technique of the present invention scores the clarity of an audio system's output signal directly, on a scale from 0 to 5.

Structurally, objective audio quality evaluation divides into input-output-based evaluation and output-based evaluation. Most work has concentrated on input-output-based evaluation, which judges speech quality by the error between the input and output signals of the speech system and is therefore an error measure. Output-based evaluation judges quality from the output signal alone, with no reference source. In terms of content, evaluation divides into comprehensive evaluation of the audio as a whole and evaluation of individual sub-indices (such as clarity, fullness, brightness and softness). Input-output-based objective methods are by now relatively mature. Output-based evaluation started later and has essentially addressed only overall audio quality; the main approaches are perceptual linear prediction, evaluation based on fuzzy multi-class support vector machines, and methods that measure the density distribution of the spectrogram. No existing method evaluates the quality of individual sub-indices of the output signal; the present invention addresses the objective evaluation of the clarity index of MP3 audio signals without a reference source.

The proposed method avoids the time, labor and cost of subjective evaluation, and it also removes a drawback of the currently dominant input-output-based objective methods, which sometimes cannot be supplied with a reference signal. It can further serve as a reference for the objective evaluation of other indices such as audio fullness and brightness, and the individual indices can be used as high-level perceptual parameters for speech recognition and classified retrieval in the MP3 compressed domain.

Summary of the invention

The object of the invention is to provide an objective evaluation method for MP3 audio clarity without a reference source, and with it a criterion for judging audio clarity quality. A feature parameter that reflects audio clarity, the mid-high frequency index (MHFI), is extracted directly from the MP3 compressed data, and its possibilistic entropy is computed. Statistical mapping of the entropy confines the clarity score of the audio under test to the range 0 to 5, giving an objective evaluation of the clarity index without a reference source.

The technical solution adopted by the invention is: first extract the mid-high frequency index from the MP3 compressed audio data, then compute the possibilistic entropy of the mid-high frequency index, and finally obtain the clarity score of the audio under test by statistical mapping.

This solution can be refined further. First, a modified discrete cosine transform (MDCT) matrix is generated from the MP3 compressed audio data. The feature parameter, the mid-high frequency index, is then extracted from the matrix and its possibilistic entropy is computed. Statistical mapping of the entropy confines the clarity score of the audio under test to the range 0 to 5, giving an objective evaluation of the clarity index without a reference source. The method comprises the following steps:

1) Preprocessing of the MP3 compressed audio, in four parts: frame header decoding, side information reading, main data reading, and Huffman decoding with dequantization;

2) Generation of the MDCT matrix, in three parts: finding the MDCT coefficients of each subband, arranging the coefficients within each subband, and forming the matrix;

3) Extraction of the compressed-domain feature parameter: the mid-high frequency index MHFI (Medium-high frequency index);

4) Calculation of the possibilistic entropy (E) of the mid-high frequency index:

$$E = -\sum_{i=1}^{4N} p_i \ln p_i$$

where N is the total number of frames and the p_i are the mid-high frequency index values (four per frame);

5) Statistical mapping: the computed possibilistic entropy values of the mid-high frequency index are aggregated, mapped to the 0-5 interval and output.

The beneficial effects of the invention are as follows. Extracting the feature parameter directly from the MP3 compressed audio data is both algorithmically simpler and faster than decompressing the data first and extracting features afterwards. The method avoids the time, labor and cost of subjective evaluation, and it also removes a drawback of the currently dominant input-output-based objective methods, which sometimes cannot be supplied with a reference signal. It can further serve as a reference for the objective evaluation of other indices such as audio fullness and brightness, and the individual indices can be used as high-level perceptual parameters for speech recognition and retrieval in the MP3 compressed domain.

Brief description of the drawings

Fig. 1 is a flowchart of the objective evaluation method for MP3 audio clarity without a reference source according to the invention.

Fig. 2 compares the subjective and objective evaluation scores.

Detailed description

A preferred embodiment of the invention is described below with reference to Fig. 1. The objective evaluation method for MP3 audio clarity without a reference source has five steps:

Step 1: MP3 compressed-domain audio data processing

Compressed-domain audio data processing consists of frame header reading, side information reading, main data reading, and Huffman decoding with dequantization.

1) Frame header reading

a) Define a structure for storing the frame header information;

b) Read the synchronization information in the frame;

c) Synchronize the decoder with the data stream;

d) Determine the start position of the frame data and store the frame header information;

2) Side information reading

a) Define a structure for storing the side information;

b) Determine the start position of the side information from the end position of the frame header;

c) Store the side information;

3) Main data reading

a) Define a structure for storing the scale factors, and store the size of the main data;

b) Calculate the length of the main data;

c) Allocate memory for the main data according to its length;

d) Read the main data;

e) Read the scale factors;

4) Huffman decoding and dequantization

a) Define an array is[32][18] that holds the Huffman-decoded data of one granule;

b) Determine the start position of the Huffman data within the main data from the side information;

c) Decode the Huffman data and store the decoded data in is[32][18];

d) Dequantize the data in is[32][18] and store the result back in is[32][18].
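The frame header reading in item 1) of Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes an MPEG-1 Layer III stream, and the function name `find_frame_header` and the returned dictionary keys are hypothetical. The bit layout, lookup tables and the 17/32-byte side information sizes follow the standard MPEG-1 Layer III frame format.

```python
# MPEG-1 Layer III lookup tables (index 0 = "free" bitrate, index 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def find_frame_header(data, start=0):
    """Scan for the next MPEG-1 Layer III header at or after `start`.

    Hypothetical helper: returns a dict of header fields, or None if no
    valid header is found.
    """
    for pos in range(start, len(data) - 3):
        word = int.from_bytes(data[pos:pos + 4], "big")
        if (word >> 21) != 0x7FF:            # 11-bit sync word, all ones
            continue
        version = (word >> 19) & 0x3         # 3 means MPEG-1
        layer = (word >> 17) & 0x3           # 1 means Layer III
        if version != 3 or layer != 1:
            continue
        bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
        sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
        if bitrate is None or sample_rate is None:
            continue
        crc_protected = ((word >> 16) & 0x1) == 0   # protection bit 0 = CRC present
        padding = (word >> 9) & 0x1
        channel_mode = (word >> 6) & 0x3     # 3 means single channel (mono)
        channels = 1 if channel_mode == 3 else 2
        return {
            "offset": pos,
            "bitrate_kbps": bitrate,
            "sample_rate_hz": sample_rate,
            "channels": channels,
            # Frame length in bytes for MPEG-1 Layer III.
            "frame_length": 144 * bitrate * 1000 // sample_rate + padding,
            # Side information starts after the 4-byte header and optional 2-byte CRC.
            "side_info_offset": pos + 4 + (2 if crc_protected else 0),
            "side_info_length": 17 if channels == 1 else 32,
        }
    return None
```

The side information offset returned here corresponds to item 2) b), where the start of the side information is derived from the end of the frame header.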

Step 2: Generate the MDCT coefficient matrix

The data of each granule consist of 32 subbands, each containing 18 coefficients. With the frequencies ordered from low to high, each granule forms a 32×18 matrix. The procedure is as follows:

1. Find the coefficients of each subband

a) Find the coefficients S_i of the subbands in is[32][18], 32 subbands in total;

b) Denote the coefficients in subband S_i by S_i[j]; each subband has 18 coefficients.

2. Form the row vectors

a) Rearrange the coefficients in S_i in order of frequency, and store them back in S_i[j];

b) Treat the S_i[j] of each rearranged subband as a row vector of the matrix.

3. Form the matrix

a) Combine the row vectors S_i[j] in order of subband number to form the 32×18 matrix M[i][j];

b) Following the same procedure, the MDCT coefficient matrices of the two granules in one frame are denoted M_1[i][j] and M_2[i][j].
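The matrix construction of Step 2 can be sketched as below. This is an illustration under a stated assumption: the 576 dequantized coefficients of a granule are already in ascending frequency order, so that subband i occupies positions 18·i to 18·i+17. The function name `build_mdct_matrix` is hypothetical, and the per-subband rearrangement of item 2 a) is not modelled.

```python
SUBBANDS = 32
COEFFS_PER_SUBBAND = 18


def build_mdct_matrix(coeffs):
    """Arrange the 576 coefficients of one granule as a 32x18 matrix.

    Assumes `coeffs` is ordered from low to high frequency; row i then
    holds the 18 coefficients of subband i (0-based).
    """
    expected = SUBBANDS * COEFFS_PER_SUBBAND
    if len(coeffs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coeffs)}")
    return [
        list(coeffs[i * COEFFS_PER_SUBBAND:(i + 1) * COEFFS_PER_SUBBAND])
        for i in range(SUBBANDS)
    ]
```

Calling the function once per granule yields the two matrices M_1 and M_2 of a frame.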

Step 3: Extraction of the compressed-domain feature parameter

The extracted compressed-domain feature is the mid-high frequency index MHFI (Medium-high frequency index). It is calculated as follows:

a) Calculate the sum of squares of the modified discrete cosine transform coefficients of each granule of the MP3 audio:

$$\sum_{i=1}^{32} \sum_{j=1}^{18} M^2[i][j]$$

where i is the subband index, j is the index of the coefficient within the subband, and M[i][j] is the MDCT coefficient value.

b) Calculate the sum of squares of the MDCT coefficients in the mid-high frequency band of each granule:

$$\sum_{i=2}^{7} \sum_{j=1}^{18} M^2[i][j]$$

where the range of the subband index may be adjusted slightly according to the selected mid-high frequency band;

c) Define the mid-high frequency index MHFI (Medium-high frequency index) of each granule as:

$$\mathrm{MHFI} = \frac{\sum_{i=2}^{7} \sum_{j=1}^{18} M^2[i][j]}{\sum_{i=1}^{32} \sum_{j=1}^{18} M^2[i][j]}$$
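The MHFI ratio defined above can be sketched in a few lines. The function name `mhfi` and its parameters are hypothetical; the band limits default to subbands 2 to 7 (1-based, inclusive) as in the formula, and returning 0.0 for an all-zero (silent) granule is an assumption, since the text does not say how that case is handled.

```python
def mhfi(matrix, band_start=2, band_end=7):
    """Mid-high frequency index of one granule.

    `matrix` is the 32x18 MDCT coefficient matrix M[i][j]. `band_start`
    and `band_end` are 1-based, inclusive subband indices, matching the
    summation limits i = 2..7 in the formula.
    """
    total_energy = sum(c * c for row in matrix for c in row)
    if total_energy == 0:
        return 0.0  # assumption: a silent granule contributes an index of 0
    band_rows = matrix[band_start - 1:band_end]   # convert to 0-based slice
    band_energy = sum(c * c for row in band_rows for c in row)
    return band_energy / total_energy
```

For a granule whose 576 coefficients all have the same magnitude, the index equals the fraction of subbands inside the band, 6/32.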

Step 4: Calculate the possibilistic entropy E of the mid-high frequency index

$$E = -\sum_{i=1}^{4N} p_i \ln p_i$$

where N is the total number of frames of the test audio and p_i is a mid-high frequency index value. Because each frame contains two granules and the test audio is two-channel MP3 data, each frame yields four mid-high frequency index values;
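A minimal sketch of the entropy sum in Step 4 follows. The function name `possibilistic_entropy` is hypothetical, and skipping zero-valued terms (treating 0·ln 0 as 0) is an assumption made here so that silent granules do not raise a math error.

```python
import math


def possibilistic_entropy(values):
    """E = -sum(p_i * ln p_i) over all MHFI values p_i.

    Unlike Shannon entropy, the p_i are not required to sum to 1.
    Terms with p_i == 0 are skipped (0 * ln 0 is taken as 0).
    """
    return -sum(p * math.log(p) for p in values if p > 0)
```

With four values per frame, the list passed in has length 4N. The values need not be normalised: for example, four values of 0.5 are a valid input even though they sum to 2.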

Step 5: Statistical mapping

a) Find the mean EM of the possibilistic entropy of the mid-high frequency index over the whole audio segment:

$$EM = -\frac{1}{4N} \sum_{i=1}^{4N} p_i \ln p_i$$

where N is the total number of frames of the audio and p_i is a mid-high frequency index value. Because each frame contains two granules and the test audio is MP3 stereo data, each frame yields four mid-high frequency index values;

b) Map the mean EM of the possibilistic entropy of the mid-high frequency index to the 0-5 interval and output it;

First, the mean of the possibilistic entropy is moderately amplified to give SII. The SII value is then mapped to the 0-5 interval by a nonlinear mapping function, which gives the clarity index AI (Articulation Index). Other commonly used mapping functions include the square function, logarithmic function, truncation function, window function, threshold function and multi-valued quantization function;

Figure BSA00000187258600051

$$AI = \frac{10}{\pi} \arctan(SII)$$

Finally, the clarity score is output.
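Step 5 can be sketched as below. The mean EM and the arctangent mapping follow the formulas in the text, but the amplification that turns EM into SII is given only as an equation image, so the sketch substitutes a plain multiplication by a hypothetical `gain` parameter; both that parameter and its default value are assumptions, not the patent's definition.

```python
import math


def clarity_score(mhfi_values, gain=10.0):
    """Map MHFI values to a clarity score in the interval [0, 5).

    `gain` is a hypothetical stand-in for the amplification step that
    produces SII from EM; the source does not give its value.
    """
    if not mhfi_values:
        raise ValueError("at least one MHFI value is required")
    # EM: mean possibilistic entropy over all 4N values (0 * ln 0 taken as 0).
    em = -sum(p * math.log(p) for p in mhfi_values if p > 0) / len(mhfi_values)
    sii = gain * em
    # AI = 10/pi * arctan(SII); for SII >= 0 this lies in [0, 5).
    return 10.0 / math.pi * math.atan(sii)
```

Because arctan is bounded by π/2, the factor 10/π keeps the score below 5 however large SII becomes, and MHFI values in [0, 1] give a non-negative EM and therefore a non-negative score.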

Experimental results

The audio material used in the experiment is MP3 stereo data sampled at 44.1 kHz. The material is divided into three groups; each group consists of four segments with the same content but successively lower subjective clarity. Applying the objective evaluation method described above gives the corresponding objective clarity scores, which are compared with the subjective scores in the following table:

Table 1: Comparison of subjective and objective evaluation scores of MP3 stereo audio clarity quality

The comparison of the subjective and objective evaluation scores is shown in Fig. 2.

The comparison shows that the objective evaluation method for MP3 audio clarity without a reference source effectively computes a clarity score for the corresponding audio from the MP3 compressed data, and that this score is very close to the subjective evaluation score and consistent with human auditory perception.

Claims (8)

1. An objective evaluation method for MP3 audio clarity without a reference source, characterized in that: first, modified discrete cosine transform coefficients are obtained by partially decoding the MP3 compressed audio; second, the frequency-domain mid-high frequency index (MHFI) is calculated from these data, namely the ratio of the energy of the mid-high frequency components of each frame of the compressed-domain audio signal to the total frequency-domain energy of that frame; then the possibilistic entropy function is selected as the no-reference MP3 audio clarity evaluation function; and finally the possibilistic entropy of the mid-high frequency index is statistically mapped to obtain the objective clarity score.

2. The objective evaluation method for MP3 audio clarity without a reference source according to claim 1, characterized in that the specific operation steps are as follows:

a) Preprocessing of the MP3 compressed audio: frame header decoding, side information reading, main data reading, and Huffman decoding with dequantization;

b) Generation of the modified discrete cosine transform (MDCT) matrix: find the modified discrete cosine transform coefficients in each subband and arrange the coefficients within the subbands to form the matrix;

c) Extraction of the compressed-domain feature parameter: the mid-high frequency index MHFI, namely the ratio of the energy of the mid-high frequency components of each frame of the compressed-domain audio signal to the total frequency-domain energy of that frame:

$$\mathrm{MHFI} = \frac{\sum_{i=2}^{7} \sum_{j=1}^{18} M^2[i][j]}{\sum_{i=1}^{32} \sum_{j=1}^{18} M^2[i][j]}$$

where i and j denote the subband index and the index of the coefficient within the subband, the range of the subband index i in the numerator may be adjusted slightly according to the selected mid-high frequency band, and M[i][j] is the MDCT coefficient value;

d) Calculation of the possibilistic entropy E of the mid-high frequency index:

$$E = -\sum_{i=1}^{k} p_i \ln p_i$$

where k is the number of distinct components and p_i is the possibility of occurrence of the i-th component; the difference from Shannon entropy is that possibilistic entropy need not satisfy the constraint that the occurrence probabilities of the components sum to 1;

e) Statistical mapping: the calculated possibilistic entropy values of the mid-high frequency index are averaged and mapped to the 0-5 interval;

i. Find the mean EM of the possibilistic entropy of the mid-high frequency index over the whole audio segment:

$$EM = -\frac{1}{4N} \sum_{i=1}^{4N} p_i \ln p_i$$

where N is the total number of frames of the audio and p_i is a mid-high frequency index value; because each frame contains two granules and the test audio is MP3 stereo data, each frame yields four mid-high frequency index values;

ii. Map the mean EM of the possibilistic entropy of the mid-high frequency index to the 0-5 interval: first the mean of the possibilistic entropy is moderately amplified to give SII, and the SII value is then mapped to the 0-5 interval by a nonlinear mapping function, which gives the clarity index AI; other commonly used mapping functions include the square function, logarithmic function, truncation function, window function, threshold function and multi-valued quantization function;

Figure FSB00000727647400022

$$AI = \frac{10}{\pi} \arctan(SII)$$

f) Output of the clarity score: the clarity index AI value obtained by the statistical mapping.

3. The objective evaluation method for MP3 audio clarity without a reference source according to claim 2, characterized in that step a), preprocessing of the MP3 compressed audio, is implemented as follows:

a) frame header reading,

b) side information reading,

c) main data reading,

d) Huffman decoding and dequantization.

4. The objective evaluation method for MP3 audio clarity without a reference source according to claim 2, characterized in that step b), generation of the MDCT matrix, is implemented as follows:

1) Find the coefficients of each subband:

a) find the coefficients of the subbands in the Huffman-decoded data array of each granule, 32 subbands in total;

b) denote the coefficients in the i-th subband by S_i[j]; each subband has 18 coefficients;

2) Form the row vectors:

a) rearrange the coefficients in the i-th subband in order of frequency, and store them back in S_i[j];

b) treat the S_i[j] of each rearranged subband as a row vector of the matrix;

3) Form the matrix:

a) combine the row vectors S_i[j] in order of subband number to form the 32×18 matrix M[i][j];

b) following the same procedure, the MDCT coefficient matrices of the two granules in one frame are denoted M_1[i][j] and M_2[i][j].

5. The objective evaluation method for MP3 audio clarity without a reference source according to claim 2, characterized in that step c), extraction of the compressed-domain feature parameter, is implemented as follows:

1) The mid-high frequency index MHFI (Medium-high frequency index) parameter:

a) calculate the sum of squares of the modified discrete cosine transform coefficients of each granule of the MP3 audio:

$$\sum_{i=1}^{32} \sum_{j=1}^{18} M^2[i][j]$$

where i is the subband index, j is the index of the coefficient within the subband, and M[i][j] is the MDCT coefficient value;

b) calculate the sum of squares of the MDCT coefficients in the mid-high frequency band of each granule:

$$\sum_{i=2}^{7} \sum_{j=1}^{18} M^2[i][j]$$

where the range of the subband index may be adjusted slightly according to the selected mid-high frequency band;

c) define the mid-high frequency index MHFI of each granule as:

$$\mathrm{MHFI} = \frac{\sum_{i=2}^{7} \sum_{j=1}^{18} M^2[i][j]}{\sum_{i=1}^{32} \sum_{j=1}^{18} M^2[i][j]}$$

6. The objective evaluation method for MP3 audio clarity without a reference source according to claim 2, characterized in that step d), calculation of the possibilistic entropy of the mid-high frequency index, is performed as follows:

$$E = -\sum_{i=1}^{k} p_i \ln p_i$$

where N is the total number of frames of the test audio and p_i is a mid-high frequency index value; because each frame contains two granules and the test audio is MP3 stereo data, each frame yields four mid-high frequency index values.

7. The objective evaluation method for MP3 audio clarity without a reference source according to claim 2, characterized in that step e), statistical mapping, is implemented as follows:

a) find the mean EM of the possibilistic entropy of the mid-high frequency index over the whole audio segment:

$$EM = -\frac{1}{4N} \sum_{i=1}^{4N} p_i \ln p_i$$

where N is the total number of frames of the audio and p_i is a mid-high frequency index value; because each frame contains two granules and the test audio is MP3 stereo data, each frame yields four mid-high frequency index values;

b) map the mean EM of the possibilistic entropy of the mid-high frequency index to the 0-5 interval: first the mean of the possibilistic entropy is moderately amplified to give the SII value, and the SII value is then mapped to the 0-5 interval by a nonlinear mapping function, which gives the clarity index AI; other commonly used mapping functions include the square function, logarithmic function, truncation function, window function, threshold function and multi-valued quantization function;

$$AI = \frac{10}{\pi} \arctan(SII)$$

8. The objective evaluation method for MP3 audio clarity without a reference source according to claim 2, characterized in that the clarity score in step f) is the clarity index AI value obtained by the statistical mapping.
CN2010102156001A 2010-06-29 2010-06-29 Reference source-free MP3 audio frequency definition objective evaluation method Expired - Fee Related CN101894560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102156001A CN101894560B (en) 2010-06-29 2010-06-29 Reference source-free MP3 audio frequency definition objective evaluation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102156001A CN101894560B (en) 2010-06-29 2010-06-29 Reference source-free MP3 audio frequency definition objective evaluation method

Publications (2)

Publication Number Publication Date
CN101894560A CN101894560A (en) 2010-11-24
CN101894560B true CN101894560B (en) 2012-08-15

Family

ID=43103731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102156001A Expired - Fee Related CN101894560B (en) 2010-06-29 2010-06-29 Reference source-free MP3 audio frequency definition objective evaluation method

Country Status (1)

Country Link
CN (1) CN101894560B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102496369B (en) * 2011-12-23 2016-02-24 中国传媒大学 A kind of objective assessment method for audio quality of compressed domain based on distortion correction
CN104681038B (en) * 2013-11-29 2018-03-09 清华大学 Audio signal quality detection method and device
CN104103279A (en) * 2014-07-16 2014-10-15 腾讯科技(深圳)有限公司 True quality judging method and system for music
CN105869656B (en) * 2016-06-01 2019-12-31 南方科技大学 A method and device for determining the clarity of a speech signal
CN109979476B (en) * 2017-12-28 2021-05-14 电信科学技术研究院 Method and device for removing reverberation of voice
CN108682430B (en) * 2018-03-09 2020-06-19 华南理工大学 Method for objectively evaluating indoor language definition
CN110032585B (en) * 2019-04-02 2021-11-30 北京科技大学 Time sequence double-layer symbolization method and device
CN111008299B (en) * 2020-03-11 2020-06-19 北京海天瑞声科技股份有限公司 Quality assessment method, device and computer storage medium for speech database
CN114400022B (en) * 2022-03-25 2022-08-23 北京荣耀终端有限公司 Method, device and storage medium for comparing sound quality

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1645475A (en) * 2005-01-18 2005-07-27 中国电子科技集团公司第三十研究所 Establishment of statistics concerned model of acounstic quality normalization
CN101246685A (en) * 2008-03-17 2008-08-20 清华大学 Pronunciation Quality Evaluation Method in Computer Aided Language Learning System
CN101727903A (en) * 2008-10-29 2010-06-09 中国科学院自动化研究所 Pronunciation quality assessment and error detection method based on fusion of multiple characteristics and multiple systems
CN101727900A (en) * 2009-11-24 2010-06-09 北京中星微电子有限公司 Method and equipment for detecting user pronunciation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7308403B2 (en) * 2002-07-01 2007-12-11 Lucent Technologies Inc. Compensation for utterance dependent articulation for speech quality assessment

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Lei Wang, Hongbing Ji, Xinbo Gao. Clustering Based on Possibilistic Entropy. 《7th International Conference on Signal Processing Proceedings 2004》. 2004. *
常辽豫; 余小清; 万旺根; 李昌莲; 许雪琼. Research and implementation of speech segmentation in the MP3 compressed domain (MP3压缩域中语音分割的研究与实现). 《计算机应用》 (Journal of Computer Applications). 2009. *

Also Published As

Publication number Publication date
CN101894560A (en) 2010-11-24

Similar Documents

Publication Publication Date Title
CN101894560B (en) Reference source-free MP3 audio frequency definition objective evaluation method
CN101521014B (en) Audio bandwidth expansion coding and decoding devices
EP2786377B1 (en) Chroma extraction from an audio codec
CN102129456B (en) Method for monitoring and automatically classifying music factions based on decorrelation sparse mapping
CN108269584B (en) Companding apparatus and method for reducing quantization noise using advanced spectral continuation
US7478045B2 (en) Method and device for characterizing a signal and method and device for producing an indexed signal
EP2490215A2 (en) Method and apparatus to extract important spectral component from audio signal and low bit-rate audio signal coding and/or decoding method and apparatus using the same
JP2009515212A (en) Audio compression
WO2014056326A1 (en) Method and device for evaluating voice quality
Korycki Authenticity examination of compressed audio recordings using detection of multiple compression and encoders’ identification
CN111210832B (en) Bandwidth expansion audio coding and decoding method and device based on spectrum envelope template
CN107610710A (en) A kind of audio coding and coding/decoding method towards Multi-audio-frequency object
CN105336333A (en) Multichannel sound signal coding and decoding method and device
CN115715413A (en) Method, device and system for detecting and extracting spatial identifiable sub-band audio source
KR100745688B1 (en) Apparatus for encoding and decoding multichannel audio signal and method thereof
US8751219B2 (en) Method and related device for simplifying psychoacoustic analysis with spectral flatness characteristic values
CN108877816B (en) QMDCT coefficient-based AAC audio frequency recompression detection method
CN109785848A (en) AAC Double Compression Audio Detection Method Based on Scale Factor Coefficient Difference
Gunjal et al. Traditional psychoacoustic model and Daubechies wavelets for enhanced speech coder performance
Qiu-Yu et al. Perceptual hashing algorithm for speech content identification based on spectrum entropy in compressed domain
Umapathy et al. Audio Coding and Classification: Principles and Algorithms
Yu et al. Detecting fake-quality MP3 based on Huffman table index.
CN102760442B (en) 3D video azimuth parametric quantification method
Korycki Authenticity investigation of digital audio recorded as MP3 files
Marengo Rodriguez et al. Perceptual audio coding schemes based on adaptive signal processing tools

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120815

Termination date: 20150629

EXPY Termination of patent right or utility model