CN105989853B

CN105989853B - Audio quality evaluation method and system

Info

Publication number: CN105989853B
Application number: CN201510091491.XA
Authority: CN
Inventors: 杨将; 章继东; 吴维昊
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2015-02-28
Filing date: 2015-02-28
Publication date: 2020-08-18
Anticipated expiration: 2035-02-28
Also published as: CN105989853A

Abstract

The invention discloses an audio quality evaluation method and system, and belongs to the technical field of voice signal processing. The method comprises the following steps: receiving audio data input by a user; transcoding the audio data to obtain a plurality of audio sampling point data; calculating, for the audio sampling point data, the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the upper limit frequency of the spectrum; and calculating an audio quality score from these five parameters. Because the method combines several audio quality parameters to evaluate the audio quality, its evaluation results are broadly applicable and can meet the requirements of most application scenarios.

Description

Audio quality evaluation method and system

Technical Field

The invention relates to the technical field of voice signal processing, in particular to an audio quality evaluation method and system.

Background

With the continuous development of digital audio processing technology, the requirement for audio quality is higher and higher. How to determine a suitable evaluation standard to evaluate the audio quality so as to obtain an audio meeting the requirements becomes an important current subject.

In the early stage of audio technology development there was no objective evaluation standard, so a large number of listeners were usually organized to audition audio manually and score its quality. Because many people must take part, this approach is costly and slow. It is also subjective: each evaluator applies personal preferences and standards that are difficult to unify, so the objectivity and accuracy of the evaluation result cannot be guaranteed.

With the development of the technology, the indexes that strongly influence audio quality have been summarized into audio quality parameters that serve as objective evaluation standards. Typical audio quality parameters are clipping, loudness, signal-to-noise ratio and noise energy. In practice, one of these parameters is selected according to the use scenario, and the audio quality is evaluated for that scenario. This approach considers only a single index of audio quality, so its result is meaningful only for that specific scenario and cannot be applied to others; its universality is poor.

Disclosure of Invention

The embodiment of the invention provides an audio quality evaluation method and system, which are used for evaluating the audio quality by integrating a plurality of audio quality parameters, have strong evaluation result universality and can meet the requirements of most application occasions.

The technical scheme provided by the embodiment of the invention is as follows:

in one aspect, a method for evaluating audio quality is provided, including:

receiving audio data input by a user;

transcoding the audio data to obtain a plurality of audio sampling point data;

calculating the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the upper limit frequency of the spectrum of the audio sampling point data;

and calculating an audio quality score according to the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the upper limit frequency of the spectrum.

Preferably, calculating the upper limit frequency of the spectrum includes:

performing framing processing on the audio sampling point data;

calculating the upper limit frequency of the spectrum of each frame of audio data;

counting a frequency histogram of the upper limit frequencies over the full frequency band, and selecting a frequency band range of preset width starting from the histogram bin with the highest upper limit frequency;

determining the central frequency point of each histogram bin within the frequency band range of preset width, and counting the number of occurrences of each central frequency point;

taking the numbers of occurrences as weighting coefficients, and calculating a weighted average of the central frequency points;

and taking the weighted average as the upper limit frequency of the spectrum.

Preferably, the calculating the average loudness includes:

extracting a layer of envelope from the amplitude absolute value of the audio sampling point data to obtain a layer of envelope audio amplitude absolute value data;

extracting a layer of envelope from the one-layer envelope audio amplitude absolute value data to obtain a two-layer envelope audio amplitude absolute value data;

calculating average amplitude values of the one-layer envelope audio amplitude absolute value data and the two-layer envelope audio amplitude absolute value data;

and taking the average amplitude value as the average loudness.

Preferably, calculating an audio quality score according to the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency includes:

acquiring a preset initial audio quality score;

determining the cumulative deduction for abnormal conditions of the five audio quality parameters according to the calculated proportion of pop clipping points, average loudness, signal-to-noise ratio, noise energy and spectrum upper limit frequency;

calculating the difference value between the initial audio quality score and the accumulated deduction;

and taking the difference value as the audio quality score.

Preferably, the method further comprises:

determining an audio quality grade according to the audio quality score;

and taking the audio quality grade as an evaluation result.

In another aspect, an audio quality evaluation system is provided, including:

the receiving module is used for receiving audio data input by a user;

the transcoding module is used for transcoding the audio data to obtain a plurality of audio sampling point data;

the first calculation module is used for calculating the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the upper limit frequency of the spectrum of the audio sampling point data;

and the second calculation module is used for calculating the audio quality score according to the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency calculated by the first calculation module.

Preferably, the first calculation module comprises a first calculation submodule for calculating the upper limit frequency of the spectrum; the first computation submodule includes:

the framing unit is used for framing the audio sampling point data;

a first calculation unit for calculating an upper limit frequency of a spectrum of each frame of audio data;

the first statistic unit is used for counting a frequency histogram of the upper limit frequency of the frequency spectrum in a full frequency band;

the selection unit is used for selecting a frequency band range of preset width starting from the histogram bin with the highest upper limit frequency in the frequency histogram counted by the first statistic unit;

the determining unit is used for determining the central frequency point of each histogram bin within the frequency band range of preset width;

a second counting unit configured to count the number of occurrences of the center frequency point determined by the determining unit;

and the second calculating unit is used for calculating the weighted average value of the central frequency point by taking the frequency of occurrence of the central frequency point counted by the second counting unit as a weighting coefficient, and taking the weighted average value as the upper limit frequency of the frequency spectrum.

Preferably, the first calculation module further comprises a second calculation submodule for calculating the average loudness; the second computation submodule includes:

the first extraction unit is used for extracting a layer of envelope from the amplitude absolute value of the audio sampling point data to obtain a layer of envelope audio amplitude absolute value data;

the second extraction unit is used for extracting a layer of envelope from the one-layer envelope audio amplitude absolute value data to obtain two-layer envelope audio amplitude absolute value data;

a third calculating unit configured to calculate an average amplitude value of the one-layer envelope audio amplitude absolute value data and the two-layer envelope audio amplitude absolute value data, and take the average amplitude value as the average loudness.

Preferably, the second calculation module includes:

the acquisition unit is used for acquiring a preset initial audio quality score;

the fourth calculating unit is used for determining the cumulative deduction for abnormal conditions of the five audio quality parameters according to the calculated proportion of pop clipping points, average loudness, signal-to-noise ratio, noise energy and spectrum upper limit frequency;

and the fifth calculating unit is used for calculating the difference value between the initial audio quality score and the accumulated deduction score and taking the difference value as the audio quality score.

Preferably, the system further comprises: and the determining module is used for determining the audio quality grade according to the audio quality score and taking the audio quality grade as an evaluation result.

The audio quality evaluation method and system provided by the embodiment of the invention transcode the audio data input by a user to obtain a plurality of audio sampling point data, calculate five audio quality parameters of the audio sampling point data, namely the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency, and obtain the audio quality score from these parameters. Because several audio quality parameters are combined to evaluate the audio quality, the evaluation result is broadly applicable and can meet the requirements of most application scenarios.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.

Fig. 1 is a flowchart of an audio quality evaluation method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for calculating an upper limit frequency of a spectrum according to an embodiment of the present invention;

Fig. 3 is a flowchart of a method for calculating average loudness according to an embodiment of the present invention;

Fig. 4 is a flowchart of a method for calculating an audio quality score according to an embodiment of the present invention;

Fig. 5 is a flowchart of another audio quality evaluation method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an audio quality evaluation system according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a second audio quality evaluation system according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a third audio quality evaluation system according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a fourth audio quality evaluation system according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a fifth audio quality evaluation system according to an embodiment of the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the scheme of the embodiment of the invention, the embodiment is described in further detail below with reference to the drawings and the implementation mode.

An embodiment of the present invention provides an audio quality evaluation method, as shown in fig. 1, including:

Step 101: receiving audio data input by a user.

Step 102: transcoding the audio data to obtain a plurality of audio sampling point data.

Specifically, a transcoding tool can be selected according to the required format of the output audio data and the encoding information carried in the user's input audio data, and then used to transcode the audio data. For example, ffmpeg can be selected as the transcoding tool, with the transcoded audio in WAV format with a single channel, a sampling rate of 16 kHz and a sampling precision of 16 bits. Since ffmpeg can automatically analyze the format information of the original audio, input audio data of any format can be supported. In use, the output audio format can be determined in advance, and ffmpeg then converts the input audio data into that format.
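As an illustration of Step 102, the following is a minimal sketch rather than the patent's implementation. The helper names `build_ffmpeg_command` and `read_wav_samples` are hypothetical; the ffmpeg options used (`-ac`, `-ar`, `-acodec pcm_s16le`) are one common way to request mono, 16 kHz, 16-bit PCM output, and the samples are read back with Python's standard `wave` module.

```python
import struct
import wave


def build_ffmpeg_command(src_path, dst_path):
    """Build an ffmpeg argument list for mono, 16 kHz, 16-bit PCM WAV output.

    Hypothetical helper: the patent only names ffmpeg and the target format.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src_path,          # ffmpeg probes the input format itself
        "-ac", "1",              # single channel
        "-ar", "16000",          # 16 kHz sampling rate
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst_path,
    ]


def read_wav_samples(path):
    """Return the samples of a mono 16-bit WAV file as a list of ints."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        raw = wav.readframes(wav.getnframes())
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))
```

The command list could be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed; the resulting list of integers is the "audio sampling point data" used by the later steps.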

Step 103: calculating the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency of the audio sampling point data.

The proportion of pop clipping points is the ratio of the number of sampling points whose amplitude value exceeds the clipping threshold to the total number of sampling points in the audio sampling point data. The smaller this proportion, the higher the audio quality. A clipping threshold η is preset (for example, η = 3000). If sum1 is the number of sampling points obtained in step 102 whose amplitude values exceed η, and sum is the total number of sampling points, then the proportion of pop clipping points is α = sum1/sum.
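The ratio α = sum1/sum can be sketched as follows. The function name `clipping_ratio` is hypothetical, and comparing the absolute value of each sample against η is an assumption: the text only says that the "amplitude value" exceeds the threshold.

```python
def clipping_ratio(samples, threshold=3000):
    """Return alpha = sum1 / sum for a list of integer samples.

    Assumption: a sample counts as clipped when its absolute value
    exceeds the threshold (eta = 3000 in the patent's example).
    """
    if not samples:
        return 0.0  # no samples, so no clipped samples
    clipped = sum(1 for s in samples if abs(s) > threshold)
    return clipped / len(samples)
```

For instance, in `[0, 3000, 3001, -4000]` two of the four samples exceed the threshold in magnitude, so the ratio is 0.5.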

Loudness refers to how loud a sound is as perceived by the human ear. It depends mainly on the sound intensity and is also related to the frequency of the sound. In the embodiment of the present invention, the amplitude value is used to represent loudness; more specifically, the average amplitude value is used to represent the average loudness, which can be calculated by extracting envelopes from the absolute values of the amplitudes of the audio sampling point data.

As shown in fig. 3, a flowchart of a method for calculating average loudness according to an embodiment of the present invention includes the following steps:

Step 301: extracting a layer of envelope from the amplitude absolute values of the audio sampling point data to obtain one-layer envelope audio amplitude absolute value data.

The one-layer envelope audio amplitude absolute value data, denoted for example β1, is the collection of the maximum values of the sampling points. Where the required accuracy of the average loudness is low, it can be used directly as the average loudness.

Step 302: extracting a layer of envelope from the one-layer envelope audio amplitude absolute value data to obtain two-layer envelope audio amplitude absolute value data.

To improve the calculation accuracy of the average loudness, a further layer of envelope may be extracted from the one-layer envelope audio amplitude absolute value data to obtain the two-layer envelope audio amplitude absolute value data, denoted for example β2.

Step 303: calculating the average amplitude value of the one-layer envelope audio amplitude absolute value data and the two-layer envelope audio amplitude absolute value data, and taking the average amplitude value as the average loudness.

Calculating the average amplitude value β of the one-layer envelope data β1 and the two-layer envelope data β2, that is, β = (β1 + β2)/2, and taking β as the average loudness ensures the calculation accuracy of the average loudness.
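Steps 301 to 303 can be sketched as below. The patent does not specify how an envelope is extracted, so this sketch assumes an envelope is the sequence of local maxima, and that β1 and β2 are the mean values of the first and second envelope layers. The names `local_maxima` and `average_loudness` are hypothetical.

```python
def local_maxima(values):
    """Assumed envelope rule: keep interior points not smaller than both neighbours."""
    if len(values) < 3:
        return list(values)  # too short to have interior points
    return [
        values[i]
        for i in range(1, len(values) - 1)
        if values[i] >= values[i - 1] and values[i] >= values[i + 1]
    ]


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty sequence."""
    return sum(values) / len(values) if values else 0.0


def average_loudness(samples):
    """Return beta = (beta1 + beta2) / 2 from two envelope layers."""
    magnitudes = [abs(s) for s in samples]
    layer1 = local_maxima(magnitudes)   # one-layer envelope (Step 301)
    layer2 = local_maxima(layer1)       # two-layer envelope (Step 302)
    beta1 = mean(layer1)
    beta2 = mean(layer2)
    return (beta1 + beta2) / 2          # Step 303
```

Other envelope definitions (for example a peak-hold filter) would fit the text equally well; only the final averaging β = (β1 + β2)/2 is taken directly from the description.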

The signal-to-noise ratio (SNR), denoted γ, is a parameter describing the ratio of the effective component to the noise component in a signal; the larger the SNR, the smaller the noise. Since noise exists in the form of waves and carries energy, the noise energy, denoted E, can also be used as one of the parameters for measuring audio quality. A person skilled in the art can conveniently calculate the signal-to-noise ratio γ and the noise energy E using common calculation tools, such as speedx tools.

The frequency spectrum is the distribution curve of the signal over frequency. The spectrum of an audio signal covers a band of a certain width. The audio signal is not valid in all frequency bands: for a conventional unprocessed audio signal, the content in the low-frequency band, which generally spans a narrow range, is valid, whereas the content in the high-frequency band, which spans a wide range, is not. That is, there is a highest frequency at which the audio spectrum is actually valid, referred to here as the upper limit frequency of the spectrum. This frequency divides the audio spectrum into valid audio and invalid audio: the signal above the upper limit frequency is invalid and the signal below it is valid. On a spectrogram, the signal above the upper limit frequency appears dark and the signal below it appears bright. Dividing the signal in this way does not mean that the audio is absolutely valid or invalid. Rather, the influence of the invalid audio on audio quality is considered small enough to be ignored, while the influence of the valid audio is large, so the valid audio must be the focus of audio quality evaluation.

Before the concept of the upper limit frequency of the spectrum was introduced, audio quality evaluation required sampling and analyzing the audio signal over the full frequency band at the same sampling frequency. Because the invalid band is often wider than the valid band, a large part of the sampled data lay in the invalid band, so much of the data analyzed at considerable cost had essentially no bearing on the actual audio quality; that is, much useless work was done. In addition, because the full band is wide, the sampling precision is low for a given amount of data, which affects the overall calculation precision, and processing a large amount of invalid-band data is error-prone, which introduces errors into the overall evaluation result. Once the upper limit frequency of the spectrum is introduced, the full band is divided into valid audio and invalid audio, and only the valid audio is sampled and analyzed. Since the bandwidth covered by the invalid audio is far larger than that of the valid audio, the amount of data to process is greatly reduced and the errors caused by processing invalid audio are avoided. The smaller processing bandwidth also allows a much higher sampling precision, which makes the evaluation result more accurate.

For example, suppose the frequency band of the audio signal is 0-20 kHz and the upper limit frequency of the spectrum is 4 kHz, so that 0-4 kHz is the valid band and 4-20 kHz is the invalid band. After the upper limit frequency is introduced, only the 0-4 kHz band is taken for audio quality evaluation. Compared with evaluating the whole 0-20 kHz band, the amount of data to be processed is clearly reduced, so a higher sampling frequency can be used within 0-4 kHz and the processing result is more accurate.

As shown in fig. 2, a flowchart of a method for calculating an upper limit frequency of a spectrum according to an embodiment of the present invention includes:

Step 201: performing framing processing on the audio sampling point data.

Specifically, the framing processing may be performed in a conventional manner, for example, the framing processing function provided by Matlab is used to perform framing processing on the audio sampling point data, so as to obtain multi-frame audio data.

Step 202: calculating the upper limit frequency of the spectrum of each frame of audio data.

After the audio sampling point data is framed into multiple frames of audio data, the upper limit frequency f_i of the spectrum of each frame can be calculated. Assume the spectrum covers the range 0 to P Hz. Each frame then has a maximum frequency point T Hz such that every spectral value of the frame in the range T to P Hz is less than or equal to a threshold ξ, where ξ is a value close to 0.

According to this definition of the upper limit frequency, the frequency bands are scanned in descending order of center frequency, and the center frequency of the first band whose spectral amplitude exceeds the set threshold (for example, ξ = 0.3) is taken as the upper limit frequency f_i of that frame of audio data.
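Steps 201 and 202 can be sketched as follows. This is an illustration under assumptions: `split_frames` and `frame_upper_frequency` are hypothetical names, the frame length and hop are free parameters, and the per-band spectral amplitudes are assumed to have been computed already (for example with an FFT) and normalized so that a threshold such as ξ = 0.3 is meaningful.

```python
def split_frames(samples, frame_len, hop):
    """Split samples into full frames of frame_len, advancing by hop."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


def frame_upper_frequency(centre_freqs, magnitudes, xi=0.3):
    """Scan bands from high to low centre frequency and return the centre
    frequency of the first band whose amplitude exceeds xi.

    Returns None when no band exceeds the threshold (an assumption; the
    patent does not say how such frames are handled).
    """
    for freq, mag in sorted(zip(centre_freqs, magnitudes), reverse=True):
        if mag > xi:
            return freq
    return None
```

For a frame whose band amplitudes are 0.9, 0.5, 0.35 and 0.1 at 500, 1500, 2500 and 3500 Hz, the scan skips the 3500 Hz band and stops at 2500 Hz, which becomes f_i for that frame.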

Step 203: counting a frequency histogram of the upper limit frequencies over the full frequency band, and selecting a frequency band range of preset width starting from the histogram bin with the highest upper limit frequency.

Based on the calculated upper limit frequency f_i (i = 0, 1, …, M) of each frame of data, where M is the total number of frames, the number of times C_n (n = 0, 1, …, N) that the upper limit frequency f_i equals F_n is counted over N frequency bands, which gives a frequency histogram over the full frequency band. A frequency band range of preset width is then selected starting from the histogram bin with the highest upper limit frequency; for example, a band of width S Hz is selected from 8 kHz downward. Parameter tuning on an audio test set generally gives S = 1 kHz as the preferred band width.

Step 204: determining the central frequency point of each histogram bin within the frequency band range of preset width, and counting the number of occurrences of each central frequency point.

For each histogram bin the central frequency point F_n is easily determined, so the total number of occurrences T_0 of upper limit frequencies f_i = F_n within the preset-width band is easily counted.

Step 205: taking the numbers of occurrences as weighting coefficients, calculating a weighted average of the central frequency points, and taking the weighted average as the upper limit frequency of the spectrum.

The band of width S = 1 kHz is slid from the high-frequency part toward the low-frequency part with a smoothing length of 8k/N Hz, and the total number of occurrences T_1 of f_i = F_n in the new band is calculated; T_j (j = 0, 1, …, W) is calculated in turn. The bands run from 0-1 kHz to 7-8 kHz, each 1 kHz wide, with the start frequency increasing by the smoothing length 8k/N each time, so the start frequencies form an arithmetic progression with an initial value of 0, a final value of 7 kHz and a step of 8k/N.

T_max = max{T_j | j = 0, 1, …, W} is then found, and the weighted average of the center frequencies F_n of all the minimum bands of width 8k/N within the corresponding band is calculated with the weight coefficients C_n. The result is the upper limit frequency f of the spectrum of the final audio data.
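Steps 203 to 205 can be sketched as one function. This is a reading of the description rather than a definitive implementation: `spectrum_upper_frequency` is a hypothetical name, the number of bins N is a free parameter, bin centres are taken as the midpoints of the bins, and ties between windows are resolved in favour of the higher band because the window is scanned from high to low frequency.

```python
def spectrum_upper_frequency(frame_freqs, n_bins=80, f_max=8000.0,
                             window_hz=1000.0):
    """Histogram the per-frame upper frequencies f_i, find the window of
    width window_hz with the largest total count T_j, and return the
    count-weighted average of the bin centres F_n inside that window."""
    bin_width = f_max / n_bins               # smallest band, 8k/N Hz
    counts = [0] * n_bins                    # C_n
    for f in frame_freqs:
        idx = min(int(f / bin_width), n_bins - 1)
        counts[idx] += 1
    centres = [(n + 0.5) * bin_width for n in range(n_bins)]  # F_n

    bins_per_window = max(1, int(round(window_hz / bin_width)))
    best_start, best_total = 0, -1
    # Slide from the highest band down to 0 in steps of one bin.
    for start in range(n_bins - bins_per_window, -1, -1):
        total = sum(counts[start:start + bins_per_window])    # T_j
        if total > best_total:
            best_start, best_total = start, total
    if best_total <= 0:
        return None                          # no frames were counted

    window = range(best_start, best_start + bins_per_window)
    return sum(centres[n] * counts[n] for n in window) / best_total
```

With 16 bins of 500 Hz and per-frame values 3100, 3200, 3600 and 7000 Hz, the 3-3.5 kHz and 3.5-4 kHz bins hold counts 2 and 1, the 3-4 kHz window has the largest total, and the result is the weighted mean of the centres 3250 Hz and 3750 Hz.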

Step 104: calculating the audio quality score according to the proportion of pop clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency.

As shown in fig. 4, a flowchart of a method for calculating an audio quality score according to an embodiment of the present invention includes:

Step 401: acquiring a preset initial audio quality score.

The initial audio quality score may be set in advance according to a scoring rule, for example, in the case of adopting a percentile scoring rule, the initial audio quality score may be set to 100 points.

Step 402: determining the cumulative deduction for abnormal conditions of the five audio quality parameters according to the calculated proportion of pop clipping points, average loudness, signal-to-noise ratio, noise energy and spectrum upper limit frequency.

After the five audio quality parameters, namely the proportion of pop clipping points α, the average loudness β, the signal-to-noise ratio γ, the noise energy E and the spectrum upper limit frequency f, are calculated in step 103, the deduction for each abnormal parameter can be determined according to a preset deduction rule, and the cumulative deduction for the abnormal conditions of the five parameters is then calculated.

By collecting a large amount of audio data as samples for statistics and analysis, the scoring of audio quality in various use scenarios can be obtained, and from this the deduction for each abnormal audio quality parameter can be derived (the deduction for a normal parameter is 0), so that the scoring result suits the application requirements of most scenarios. To make the evaluation result more accurate, the deduction can take different values when a parameter falls in different numerical ranges.

For example: (1) when α ≤ 0.006, the deduction is set to 0; when 0.006 < α < 0.01, the deduction is set to A1; when α ≥ 0.01, the deduction is set to A2, where A1 < A2;

(2) when β > 1200, the deduction is set to 0; when 1000 ≤ β ≤ 1200, the deduction is set to B1; when β < 1000, the deduction is set to B2, where B1 < B2;

(3) when γ > 16.8, the deduction is set to 0; when 15.5 ≤ γ ≤ 16.8, the deduction is set to C1; when 13.5 ≤ γ < 15.5, the deduction is set to C2; when γ < 13.5, the deduction is set to C3, where C1 < C2 < C3;

(4) when E < 43.43, the deduction is set to 0; when E ≥ 43.43 and γ < 52.55, the deduction is set to D1; when E ≥ 43.43 and 52.55 ≤ γ < 54.55, the deduction is set to D2; when E ≥ 43.43 and γ ≥ 54.55, the deduction is set to D3, where D1 < D2 < D3;

(5) when f < 6000 or f ≥ 7000, the deduction is set to 0; when 6000 ≤ f < 7000 and γ ≥ 6000, the deduction is set to E1; when 6000 ≤ f < 7000 and 5000 ≤ γ < 6000, the deduction is set to E2; when 6000 ≤ f < 7000 and γ < 5000, the deduction is set to E3, where E1 < E2 < E3.

The scores A1, A2, B1, B2, C1, C2, C3, D1, D2, D3, E1, E2 and E3 are obtained according to statistical data, and are set in advance, and different scores are selected according to the numerical values of the audio quality parameters, wherein A2 + B2 + C3 + D3 + E3 is less than or equal to a preset initial quality score, so that the finally calculated audio quality score is more than or equal to 0.

According to the influence of each audio quality parameter on the audio quality, the signal-to-noise ratio γ is used in three ways: as an independent evaluation parameter, combined with the noise energy E, and combined with the spectrum upper limit frequency f. When the signal-to-noise ratio γ meets the conditions in (3), (4) and (5), each item is deducted separately and independently.

The deduction corresponding to each of the five audio quality parameters, namely the proportion of pop clipping points α, the average loudness β, the signal-to-noise ratio γ, the noise energy E and the spectrum upper limit frequency f, can be calculated separately, and summing these deductions gives the cumulative deduction for the abnormal conditions of the five parameters.

Step 403: and calculating the difference value of the initial audio quality score and the accumulated deduction score, and taking the difference value as the audio quality score.

In the embodiment of the invention, the difference value of the initial audio quality score and the accumulated deduction under the abnormal condition of each audio quality parameter is used as the audio quality score, and the influence of five audio quality parameters including the proportion alpha of the plosive amplitude-cutting point, the average loudness beta, the signal-to-noise ratio gamma, the noise energy E and the spectrum upper limit frequency f on the audio quality is fully considered, so that the evaluation result can be ensured to meet the application requirements of most occasions.
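The subtraction in step 403 can be illustrated with a minimal sketch. The initial score of 100 is an assumption suggested by the 0 to 100 grading example in this description, and the helper names are hypothetical. The first helper checks the stated constraint that the largest deductions together do not exceed the initial score.

```python
from typing import Iterable


def deduction_table_is_valid(max_deductions: Iterable[float],
                             initial_score: float = 100.0) -> bool:
    """Return True when the largest deduction of every rule, summed,
    cannot push the final score below 0."""
    return sum(max_deductions) <= initial_score


def audio_quality_score(deductions: Iterable[float],
                        initial_score: float = 100.0) -> float:
    """Initial score minus the accumulated deduction (assumed initial 100)."""
    return initial_score - sum(deductions)
```

For instance, with per-rule deductions of 0, 10, 20, 0 and 15 the sketch yields 100 - 45 = 55.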

As shown in fig. 5, the audio quality evaluation method may further include:

Step 105: determining an audio quality grade according to the audio quality score, and taking the audio quality grade as an evaluation result.

The rating standard can be preset. After the audio quality score is calculated, the audio quality level is determined according to the audio quality score and fed back to the user as the final evaluation result.

For example, the audio quality level is set to five levels: when 0 ≤ score ≤ 20, the corresponding audio quality level is level one; when 20 < score ≤ 40, level two; when 40 < score ≤ 60, level three; when 60 < score ≤ 80, level four; and when 80 < score ≤ 100, level five. A higher level represents a higher singing level of the user.
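The five-level example above can be written as a simple mapping. This is a sketch of that one example only; the function name is hypothetical, and a different preset rating standard would use different boundaries.

```python
def audio_quality_level(score: float) -> int:
    """Map a score in [0, 100] to levels 1..5 using the example boundaries."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in the range 0 to 100")
    if score <= 20:
        return 1
    if score <= 40:
        return 2
    if score <= 60:
        return 3
    if score <= 80:
        return 4
    return 5
```

Each boundary value belongs to the lower level, matching the "less than or equal to" upper bounds in the example.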

The audio quality evaluation method provided by the embodiment of the invention transcodes the audio data input by a user to obtain a plurality of audio sampling point data, calculates five audio quality parameters of the audio sampling point data, namely the plosive amplitude-cutting point proportion, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency, and calculates the audio quality score according to these audio quality parameters. Because a plurality of audio quality parameters are integrated to evaluate the audio quality, the evaluation result has strong universality and can meet the requirements of most application occasions.

Correspondingly, an embodiment of the present invention further provides an audio quality evaluation system, as shown in fig. 6, including:

a receiving module 501, configured to receive audio data input by a user;

a transcoding module 502, configured to transcode the audio data to obtain multiple audio sampling point data;

the first calculating module 503 is configured to calculate a plosive amplitude-cut point ratio, an average loudness, a signal-to-noise ratio, noise energy, and a spectrum upper limit frequency of the audio sampling point data, respectively;

the second calculating module 504 is configured to calculate an audio quality score according to the plosive amplitude-cut ratio, the average loudness, the signal-to-noise ratio, the noise energy, and the spectrum upper limit frequency calculated by the first calculating module 503.

As shown in fig. 7, the first calculating module 503 includes a first calculating submodule 601 for calculating the upper limit frequency of the spectrum; one specific structure of the first calculating submodule 601 includes:

a framing unit 701, configured to perform framing processing on the audio sample data;

a first calculation unit 702 for calculating an upper limit frequency of a spectrum of each frame of audio data;

a first statistical unit 703, configured to count a frequency histogram of an upper limit frequency of a spectrum in a full frequency band;

a selecting unit 704, configured to select, from the frequency histogram counted by the first statistical unit 703, a frequency band range of preset width in which the upper limit frequency of the spectrum occurs most often;

a determining unit 705, configured to determine the center frequency point of each histogram bin within the frequency band range of preset width;

a second counting unit 706 configured to count the number of occurrences of the center frequency point determined by the determining unit 705;

a second calculation unit 707 for calculating a weighted average of the center frequency points with the number of occurrences of the center frequency point counted by the second counting unit 706 as a weighting coefficient, and taking the weighted average as the upper limit frequency of the spectrum.
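The chain of units 703 to 707 can be sketched as follows, starting from per-frame upper limit frequencies that are assumed to have been computed already (the work of units 701 and 702 is not shown). The bin width, the full band of 8000 Hz, the band width of five bins, and the reading that the selected band is the one containing the most per-frame values are all assumptions made for illustration, as is the function name.

```python
import numpy as np


def spectrum_upper_frequency(frame_upper_freqs, full_band_hz=8000.0,
                             bin_hz=100.0, band_bins=5):
    """Weighted average of bin centers inside the most populated band.

    frame_upper_freqs: one upper limit frequency per frame (assumed input).
    """
    freqs = np.asarray(frame_upper_freqs, dtype=float)
    n_bins = int(round(full_band_hz / bin_hz))
    # Unit 703: histogram of per-frame upper frequencies over the full band.
    counts, edges = np.histogram(freqs, bins=n_bins, range=(0.0, full_band_hz))
    # Unit 704: total count in every window of band_bins consecutive bins,
    # then pick the first window with the largest total.
    totals = np.convolve(counts, np.ones(band_bins, dtype=int), mode="valid")
    start = int(np.argmax(totals))
    stop = start + band_bins
    # Units 705 and 706: bin centers and their occurrence counts.
    centers = (edges[start:stop] + edges[start + 1:stop + 1]) / 2.0
    weights = counts[start:stop]
    if weights.sum() == 0:
        raise ValueError("no frame falls inside the analysed band")
    # Unit 707: count-weighted average of the center frequency points.
    return float(np.average(centers, weights=weights))
```

Using the occurrence counts as weights means isolated frames with an unusual upper frequency barely move the result, while the dominant cluster of frames decides it.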

As shown in fig. 8, the first calculating module 503 further includes a second calculating submodule 602 for calculating the average loudness; one specific structure of the second calculating submodule 602 includes:

a first extracting unit 708, configured to extract one layer of envelope from the amplitude absolute values of the audio sampling point data to obtain first-layer envelope audio amplitude absolute value data;

a second extracting unit 709, configured to extract one layer of envelope from the first-layer envelope audio amplitude absolute value data to obtain second-layer envelope audio amplitude absolute value data;

a third calculating unit 710, configured to calculate an average amplitude value of the first-layer envelope audio amplitude absolute value data and the second-layer envelope audio amplitude absolute value data, and take the average amplitude value as the average loudness.
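A rough sketch of units 708 to 710 is given below. Two points are assumptions rather than statements of this description: the envelope is taken to be the sequence of local peaks, and the average amplitude value is read as the mean over the first-layer and second-layer envelope values pooled together. The function names are hypothetical.

```python
import numpy as np


def extract_envelope(values: np.ndarray) -> np.ndarray:
    """Keep interior points that are not smaller than both neighbours
    (assumed envelope definition: local peaks)."""
    if values.size < 3:
        return values.copy()
    interior = values[1:-1]
    is_peak = (interior >= values[:-2]) & (interior >= values[2:])
    return interior[is_peak]


def average_loudness(samples) -> float:
    """Mean of first-layer and second-layer envelope values, pooled."""
    magnitudes = np.abs(np.asarray(samples, dtype=float))
    first_layer = extract_envelope(magnitudes)    # unit 708
    second_layer = extract_envelope(first_layer)  # unit 709
    pooled = np.concatenate([first_layer, second_layer])
    if pooled.size == 0:
        return 0.0
    return float(pooled.mean())                   # unit 710
```

Working on envelope values rather than raw samples keeps zero crossings and quiet gaps between peaks from pulling the loudness estimate down.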

As shown in fig. 9, the second calculating module 504 includes:

an obtaining unit 801, configured to obtain a preset initial audio quality score;

a fourth calculating unit 802, configured to determine the cumulative deduction for the case where the five audio quality parameters are abnormal, according to the calculated plosive amplitude-cut point ratio, average loudness, signal-to-noise ratio, noise energy, and spectrum upper limit frequency;

a fifth calculating unit 803, configured to calculate a difference between the initial audio quality score and the cumulative deduction, and use the difference as the audio quality score.

As shown in fig. 10, the audio quality evaluation system further includes:

and the determining module 505 is configured to determine an audio quality grade according to the audio quality score, and use the audio quality grade as an evaluation result.

The audio quality evaluation system provided by the embodiment of the invention transcodes audio data input by a user to obtain a plurality of audio sampling point data, calculates five audio quality parameters of the audio sampling point data, namely the proportion of the plosive amplitude-intercept point, the average loudness, the signal-to-noise ratio, the noise energy and the spectrum upper limit frequency, and calculates the audio quality score according to these audio quality parameters. Because a plurality of audio quality parameters are integrated to evaluate the audio quality, the evaluation result has strong universality and can meet the requirements of most application occasions.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, they are described in a relatively simple manner, and reference may be made to some descriptions of method embodiments for relevant points. The above-described system embodiments are merely illustrative, wherein the modules or units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. An audio quality evaluation method is characterized by comprising the following steps:

receiving audio data input by a user;

transcoding the audio data to obtain a plurality of audio sampling point data;

respectively calculating the proportion of the plosive amplitude-clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the upper frequency limit of the frequency spectrum of the audio sampling point data, wherein the smaller the proportion of the plosive amplitude-clipping points is, the higher the audio quality is; and wherein the average loudness is calculated by extracting envelopes layer by layer from the amplitude absolute values of the audio sampling points, the layer-by-layer extraction comprising two layers;

2. The audio quality assessment method according to claim 1, wherein the calculating the upper frequency of the spectrum comprises:

performing framing processing on the audio sampling point data;

and taking the weighted average as the upper frequency of the frequency spectrum of the audio data.

3. The audio quality assessment method according to claim 1, wherein said calculating an average loudness comprises:

and taking the average amplitude value as the average loudness.

4. The audio quality evaluation method according to claim 1, wherein the calculating an audio quality score according to the proportion of the plosive amplitude-clipping points, the average loudness, the signal-to-noise ratio, the noise energy, and the upper frequency limit of the frequency spectrum comprises:

acquiring a preset initial audio quality score;

and taking the difference value as the audio quality score.

5. The audio quality assessment method according to any one of claims 1 to 4, further comprising:

determining an audio quality grade according to the audio quality score;

and taking the audio quality grade as an evaluation result.

6. An audio quality evaluation system, comprising:

the receiving module is used for receiving audio data input by a user;

the first calculation module is used for calculating the proportion of the plosive amplitude-clipping points, the average loudness, the signal-to-noise ratio, the noise energy and the upper limit frequency of the frequency spectrum of the audio sampling point data respectively, wherein the smaller the proportion of the plosive amplitude-clipping points is, the higher the audio quality is; and wherein the average loudness is calculated by extracting envelopes layer by layer from the amplitude absolute values of the audio sampling points, the layer-by-layer extraction comprising two layers;

7. The audio quality evaluation system according to claim 6, wherein the first calculation module comprises a first calculation submodule for calculating an upper frequency of a spectrum; the first computation submodule includes:

the framing unit is used for framing the audio sampling point data;

and the second calculating unit is used for calculating the weighted average value of the central frequency point by taking the frequency of the occurrence of the central frequency point counted by the second counting unit as a weighting coefficient, and taking the weighted average value as the upper limit frequency of the frequency spectrum of the audio data.

8. The audio quality assessment system according to claim 6, wherein said first calculation module further comprises a second calculation submodule for calculating said average loudness; the second computation submodule includes:

9. The audio quality assessment system according to claim 6, wherein said second calculation module comprises:

10. The audio quality assessment system according to any one of claims 6 to 9, further comprising: a determining module used for determining the audio quality grade according to the audio quality score and taking the audio quality grade as an evaluation result.