Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide the subjective and objective evaluation method of the sound quality under the variable-speed working condition of the transmission.
In order to achieve the purpose, the invention adopts the technical scheme that:
a sound quality subjective and objective evaluation method under the variable-speed working condition of a transmission comprises the following steps:
step 1: collecting the noise signals of the transmission under the variable-speed working condition in M gears by using a microphone, and collecting the rotating speed signals of the transmission by using a photoelectric sensor;
step 2: performing sliding window interception on the noise signals of the M gears of the transmission to generate a preliminary listening sample set for each of the M gears, wherein each preliminary listening sample set comprises E listening samples, giving M × E preliminary listening samples in total;
step 3: calculating the sound quality psychoacoustic indexes of the E listening samples contained in each preliminary listening sample set of step 2, and performing cluster analysis on the calculation results, finally forming M × c classes of preliminary listening samples;
the sound quality psychoacoustic indexes include, but are not limited to, loudness, sharpness, roughness and jitter degree (fluctuation strength); the cluster analysis uses the K-means method;
step 4: for the M × c classes of preliminary listening samples of step 3, retaining in each class the listening sample closest to the cluster center, to form a listening sample set with a sample capacity of M × c;
step 5: appending, after the 1st group and after the M-th group of the listening sample set of step 4, H samples that are identical between the two groups and different from one another within a group, to serve as evaluator reliability evaluation samples, and adding H samples manually selected by NVH experts to each of the 2nd to (M-1)-th groups so that every group contains the same number of listening samples; the listening samples are arranged in order of the corresponding actual rotating speed of the transmission to form a formal listening sample group with a sample capacity of M × (c + H);
step 6: grouping the evaluators and assigning a different group weight to each group;
step 7: having NVH experts select, from a transmission sound quality semantic word bank, the semantic antisense words that reflect the sound characteristics of the transmission to be evaluated;
step 8: subjectively evaluating the formal listening sample group of step 5, using the semantic antisense words of step 7 as the evaluation words, to obtain each evaluator's subjective evaluation result for every listening sample in the formal listening sample group;
step 9: analyzing the listening reliability of the evaluators and attaching a reliability weight to each evaluator's subjective evaluation result;
step 10: performing Spearman correlation coefficient and Euclidean distance statistical analysis on the subjective evaluation results of step 8, eliminating the evaluators whose scores are inaccurate, applying the group weight of step 6 and the reliability weight of step 9 to the scoring data of the remaining evaluators, and calculating the subjective evaluation labels;
step 11: establishing a transmission sound quality subjective and objective evaluation model;
calculating, frame by frame, the sound quality psychoacoustic indexes of the formal listening sample group of step 5, dividing the samples into a training set and a test set, and establishing the transmission sound quality subjective and objective evaluation model by support vector regression from the sound quality psychoacoustic indexes of the listening samples in the training set and the subjective evaluation labels of step 10;
step 12: verifying the transmission sound quality subjective and objective evaluation model:
verifying the model established in step 11 with the test set of step 11, taking the Pearson correlation coefficient and the mean absolute error as evaluation indexes; if the Pearson correlation coefficient is greater than 0.9 and the mean absolute error is less than 10% of the maximum value of the scoring interval, the model predicts well; if the Pearson correlation coefficient is not greater than 0.9, returning to step 6 and performing the subjective and objective sound quality evaluation again.
In step 1, a microphone collects the noise signals of the transmission under the variable-speed working condition in M gears. The collection environment is a semi-anechoic chamber, and the microphone is placed 1 m to the left of the transmission (viewed from the loading end) at the height of the transmission center. This position lies in the acoustic far field, where there is no interference between sound waves and the sound source can be approximated as a point source, which meets the requirements of practical noise evaluation. The sampling frequency is not lower than 40960 Hz, so that by the Nyquist theorem the analysis frequency is not lower than 20480 Hz, which is above the upper frequency limit of human hearing. In each gear, under maximum load, the transmission is decelerated uniformly from the highest rotating speed $rpm_{max}$ to the lowest rotating speed $rpm_{min}$ while the noise signal is collected; the collection time for each gear is S seconds.
In step 2, the noise signals of the M gears of the transmission are intercepted with a sliding window whose window function is a rectangular window. Let the length of the noise signal in each gear be $N$, the window length be $wlen$, and the displacement of each window relative to the previous one be $wst$, where $5\,\mathrm{s} \le wlen \le 10\,\mathrm{s}$ and $0\,\mathrm{s} < wst \le wlen$. The number $n$ of sample sections intercepted by the sliding window is calculated as:
$$n = \left\lfloor \frac{N - wlen}{wst} \right\rfloor + 1$$
The sound quality psychoacoustic indexes in step 3 are calculated as follows:
3.1) loudness calculation:
the specific loudness is calculated using the following formula:
where $N'_0$ is the reference specific loudness; $E_{TQ}$ is the excitation corresponding to the quiet state; $s_r$ is the ratio of the sound intensity of a just-audible test tone to that of the broadband noise in the same critical band; $E_0$ is the reference excitation corresponding to the sound intensity $I_0 = 10^{-12}\ \mathrm{W/m^2}$; $E_s$ is the excitation corresponding to the sound; when $N'_0 = 0.065$, $\theta = 0.25$ and $s_r = 0.25$; when $N'_0 = 0.08$, $\theta = 0.23$ and $s_r = 0.5$; the Bark band division follows the Zwicker model;
the total loudness is obtained by integrating the specific loudness over the 0-24 Bark scale:
$$N = \int_{0}^{24\,\mathrm{Bark}} N'(z)\,dz$$
3.2) sharpness calculation:
the Zwicker sharpness model is based on the loudness model, and the mathematical model is as follows:
where $K$ is a weighting coefficient, $K = 0.11$; $S_{sharpness}$ is the sharpness; $N'(z)$ is the specific loudness at Bark value $z$; and $g(z)$ is the weighting coefficient of the sound signal in the different Bark bands, expressed as:
3.3) roughness calculation:
the improved Zwicker roughness model is based on the loudness model, and the mathematical model is as follows:
where $Rou$ is the calculated roughness, $f_{mod}$ is the modulation frequency, and $\Delta L_E$ is the sound pressure variation amplitude in each critical band, defined as follows:
where $N'_{max}(z)$ and $N'_{min}(z)$ are the maximum and minimum values of the specific loudness in the Zwicker loudness model, respectively;
3.4) jitter degree calculation:
the Zwicker jitter model is based on the loudness model, and the mathematical model is as follows:
where $\Delta L_E$ is the sound pressure variation amplitude in each critical band; $f_{mod}$ is the modulation frequency; and $f_0$ is the modulation fundamental frequency, $f_0 = 4$ Hz.
the clustering analysis and calculation process in the step 3 is as follows:
(1) the noise signal in each gear is intercepted with the sliding window, giving $n$ sound sample sections per gear; let the input samples be $Q = \{x_1, x_2, \ldots, x_n\}$, and randomly select $c$ sound samples from $Q$ as the initial centroids $u_1, u_2, \ldots, u_c$; the number $c$ of cluster centroids is determined by the maximum value of the scoring interval: if scores range from 0 to 10, then $c = 10$;
(2) for each input sample $x_1, x_2, \ldots, x_n$, measuring its distance to every centroid and assigning it to the class of the nearest centroid;
(3) updating the center of each class to the mean of all samples belonging to the class;
(4) repeating steps (2) and (3) until fewer than 3 samples are reassigned to a different cluster;
(5) after clustering is finished, the noise signals intercepted by the sliding window in each gear are grouped into c classes of sound samples.
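The clustering steps (1) to (5), together with the selection in step 4 of the sample nearest each cluster center, can be sketched as follows. This is a minimal illustration rather than the patented implementation: each listening sample is assumed to be already reduced to a tuple of psychoacoustic index values, the function names `kmeans` and `representatives`, the fixed random seed and the `max_iter` safety cap are assumptions, and the indexes are not rescaled to comparable ranges (which a real analysis would probably need).

```python
import math
import random


def kmeans(samples, c, min_moves=3, seed=0, max_iter=100):
    """Cluster feature vectors into c classes following steps (1)-(5)."""
    rng = random.Random(seed)
    # (1) pick c samples at random as the initial centroids
    centroids = [list(s) for s in rng.sample(samples, c)]
    labels = [-1] * len(samples)
    for _ in range(max_iter):  # max_iter is an assumed safety cap
        # (2) assign every sample to the class of its nearest centroid
        new_labels = [
            min(range(c), key=lambda j: math.dist(x, centroids[j]))
            for x in samples
        ]
        moved = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        # (3) move each centroid to the mean of its members
        for j in range(c):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:  # an empty class keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        # (4) stop once fewer than min_moves samples changed class
        if moved < min_moves:
            break
    return labels, centroids


def representatives(samples, labels, centroids):
    """Step 4: index of the sample closest to each non-empty cluster center."""
    reps = {}
    for j, u in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if idx:
            reps[j] = min(idx, key=lambda i: math.dist(samples[i], u))
    return reps
```

Calling `kmeans(features, 10)` on the feature tuples of one gear and then `representatives(...)` yields at most one retained listening sample per class; a class left empty by an unlucky initialization simply produces no representative in this sketch.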
In step 5, the reliability evaluation samples are sound samples selected by experts through extensive listening, where the number of samples auditioned is not less than the number of listening samples in the listening sample set of step 4.
In step 5, the sound samples are arranged according to the corresponding actual rotating speed of the transmission, in the following order: the first sample of each group is the section corresponding to the highest rotating speed range, the second sample is the section corresponding to the lowest rotating speed range, and the remaining samples are arranged by rotating speed range either from high to low or from low to high; a 5 s blank interval is left between every two listening samples to give the evaluators time to score.
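The ordering rule can be illustrated with a short sketch. The function name `listening_order` and the representation of each section as a `(mean_rpm, sample_id)` tuple are assumptions made for the example; the 5 s blank interval between samples is a playback detail and is not modeled here.

```python
def listening_order(sections, descending=True):
    """Order one group's samples: highest speed, lowest speed, then the rest.

    sections: list of (mean_rpm, sample_id) tuples (assumed representation).
    descending: arrange the remaining samples from high to low speed if True,
                from low to high otherwise.
    """
    ranked = sorted(sections, key=lambda s: s[0], reverse=True)
    if len(ranked) < 3:
        return [sid for _, sid in ranked]
    highest, lowest, middle = ranked[0], ranked[-1], ranked[1:-1]
    if not descending:
        middle = middle[::-1]
    return [sid for _, sid in [highest, lowest] + middle]
```

Playing the two extremes first gives the evaluator the full range of the scale before the intermediate samples are scored.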
In step 6, the evaluators are grouped and each group is given a different group weight: evaluators with different listening levels are divided into an expert group, an experienced group and an ordinary group, which are given different group weights $W_G$.
In step 9, the listening reliability of the evaluators is analyzed and a reliability weight $W_T$ is attached to each evaluator's subjective evaluation result, with the following specific steps:
9.1) extracting an evaluator's scoring data for the 2 × H reliability evaluation samples (H samples, each scored twice); if all samples are scored consistently, the misjudgment rate is 0; if $\delta$ samples are scored inconsistently and the difference between the two scores does not exceed 10% of the maximum value of the scoring interval, the acceptable misjudgment rate is calculated as
$$P_y = \frac{\delta}{H}$$
if $b$ samples are scored inconsistently and the difference between the two scores exceeds 10% of the maximum value of the scoring interval, the unacceptable misjudgment rate is calculated as
$$P_n = \frac{b}{H}$$
9.2) forming the reliability weight $W_T$ from the acceptable misjudgment rate $P_y$ and the unacceptable misjudgment rate $P_n$, calculated as:
$$W_T = 1 - P_n - \lambda P_y$$
where $\lambda$ takes a value between 0 and 1: $\lambda = 1$ if acceptable misjudgments are not allowed, and $\lambda = 0$ if acceptable misjudgments are fully allowed.
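A small sketch of steps 9.1) and 9.2) follows. It assumes that both misjudgment rates are taken relative to the H repeated samples, which is consistent with the worked example in the detailed description (two mismatches out of four samples give a weight of 0.5 when $\lambda = 1$); the function name `reliability_weight` and its parameter names are illustrative only.

```python
def reliability_weight(first, second, l_max=10, lam=1.0, tol=0.10):
    """Reliability weight W_T = 1 - P_n - lam * P_y for one evaluator.

    first, second: the two scores given to each of the H repeated samples.
    l_max: maximum value of the scoring interval.
    lam:   1 if acceptable misjudgments are not allowed, 0 if fully allowed.
    tol:   fraction of l_max up to which a mismatch counts as acceptable.
    """
    if len(first) != len(second) or not first:
        raise ValueError("both passes must score the same H samples")
    h = len(first)
    limit = tol * l_max
    diffs = [abs(a - b) for a, b in zip(first, second)]
    delta = sum(1 for d in diffs if 0 < d <= limit)  # acceptable mismatches
    b = sum(1 for d in diffs if d > limit)           # unacceptable mismatches
    p_y = delta / h
    p_n = b / h
    return 1 - p_n - lam * p_y
```

With scores 9, 1, 7, 6 on the first pass and 9, 2, 6, 6 on the second, two samples differ by one point each, so the function returns 0.5 for `lam=1` and 1.0 for `lam=0`.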
The Spearman correlation coefficient and Euclidean distance statistical analysis in step 10 is calculated as follows:
10.1) calculating the Spearman correlation coefficient between every two evaluators:
$$R_{i,j} = 1 - \frac{6\sum d^{2}}{r\,(r^{2} - 1)}$$
where $d$ is the difference between the ranks of the two rows of subjective evaluation results; $r$ is the length of the two rows of subjective evaluation results; and $R_{i,j}$ is the Spearman correlation coefficient between the subjective evaluation results of the $i$-th evaluator and those of the $j$-th evaluator;
10.2) taking the average correlation coefficient:
$$\bar{R}_i = \frac{1}{k-1}\sum_{j=1,\ j\neq i}^{k} R_{i,j}$$
where $k$ is the number of evaluators; $R_{i,j}$ is the correlation coefficient between the $i$-th evaluator and the $j$-th evaluator; and $\bar{R}_i$ is the average correlation coefficient of the $i$-th evaluator relative to the other evaluators in the panel; the threshold for $\bar{R}_i$ is set to 0.75: if $\bar{R}_i$ is less than 0.75, the evaluator correlates poorly with the other evaluators and his or her subjective listening impression deviates strongly, so the scoring data of that evaluator are removed;
10.3) averaging the scoring data of the evaluators retained after the rejection, the average score of each sample being calculated as:
$$V_a = \frac{1}{k'}\sum_{i=1}^{k'} V_{i,a}$$
where $k'$ is the number of evaluators remaining after the rejection; $V_{i,a}$ is the subjective score given by the $i$-th evaluator to the $a$-th sample; and $V_a$ is the average score of all remaining evaluators for the $a$-th sample;
10.4) performing Euclidean distance statistical analysis between the scoring data $V_{i,a}$ of each remaining evaluator and $V_a$, calculated as:
$$D(V_{i,a}, V_a) = \sqrt{\sum_{a}\left(V_{i,a} - V_a\right)^{2}}$$
where $i$ is the index of the evaluator;
the scoring data of the remaining evaluators with a large $D(V_{i,a}, V_a)$ are eliminated; the number of evaluators rejected by the Spearman correlation coefficient and Euclidean distance statistical analysis must not exceed 20% of the total number of evaluators.
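Steps 10.1) to 10.4) can be sketched as below. The sketch assumes average ranks for tied scores (the simplified rank-difference formula is exact only when there are no ties), represents the score table as a dictionary from evaluator name to score list, and does not enforce the 20% cap on rejections; all function names are illustrative.

```python
import math


def ranks(scores):
    """Average ranks (1-based); tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        for i in order[pos:end + 1]:
            out[i] = (pos + end) / 2 + 1
        pos = end + 1
    return out


def spearman(a, b):
    """R = 1 - 6 * sum(d^2) / (r * (r^2 - 1)) with d the rank differences."""
    r = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (r * (r * r - 1))


def screen_by_correlation(scores, threshold=0.75):
    """Keep evaluators whose mean correlation with all others is >= threshold."""
    names = list(scores)
    kept = []
    for i in names:
        others = [spearman(scores[i], scores[j]) for j in names if j != i]
        if sum(others) / len(others) >= threshold:
            kept.append(i)
    return kept


def distance_to_mean(scores, kept):
    """Euclidean distance between each kept evaluator's scores and the mean scores."""
    n = len(scores[kept[0]])
    mean = [sum(scores[i][a] for i in kept) / len(kept) for a in range(n)]
    return {i: math.dist(scores[i], mean) for i in kept}
```

Evaluators with the largest values returned by `distance_to_mean` are the candidates for the second round of rejection; how large is "large" is left to the analyst, as in the text above.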
The step 11 of establishing the transmission sound quality subjective and objective evaluation model comprises the following specific steps:
11.1) taking 70% of the listening samples of the formal listening sample group as the training set for training the transmission sound quality subjective and objective evaluation model, and the remaining 30% as the test set for testing the model;
11.2) setting a frame length and a frame shift for all listening samples, wherein the frame length lies in the interval (0, 1] second and the frame shift lies in the interval (0, 1] second, each listening sample being divided into f frames;
11.3) calculating the psychoacoustic index of sound quality of each frame of the listening sample;
11.4) using the acoustic quality psychoacoustic indexes calculated by the training set in frames and the subjective evaluation labels in the step 10 as parameters for training an acoustic quality subjective and objective evaluation model of the transmission;
11.5) training a transmission sound quality subjective and objective evaluation model, and establishing the transmission sound quality subjective and objective evaluation model;
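The data preparation in steps 11.1) and 11.2) might look like the sketch below. The text does not say how the 70/30 split is drawn or how f follows from the frame length and frame shift, so the seeded random shuffle and the rule that only frames lying fully inside the sample are counted are both assumptions, as are the function names.

```python
import math
import random


def split_train_test(sample_ids, train_ratio=0.7, seed=0):
    """Shuffle the sample ids and cut them into a training and a test set."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # assumed: random assignment
    cut = round(train_ratio * len(ids))
    return ids[:cut], ids[cut:]


def frame_bounds(duration, frame_len, frame_shift):
    """Start and end times (s) of the f frames that fit inside one sample."""
    if not (0 < frame_len <= 1 and 0 < frame_shift <= 1):
        raise ValueError("frame length and frame shift must lie in (0, 1] s")
    f = math.floor((duration - frame_len) / frame_shift) + 1
    return [(k * frame_shift, k * frame_shift + frame_len) for k in range(f)]
```

For a 5 s listening sample with a 1 s frame and a 0.5 s shift, `frame_bounds(5, 1, 0.5)` returns 9 frames, the last one covering 4.0 s to 5.0 s.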
In step 11, support vector regression is used to fit the transmission sound quality subjective and objective evaluation model. An RBF kernel function is selected and the insensitive loss function takes $\varepsilon = 0.01$. The optimal penalty parameter $e$ and kernel function parameter $g$ are selected by K-fold cross validation (K-CV), taking the lowest mean square error $mse$ in the cross validation process as the optimization objective, with the cross validation parameter $v = 3$. The $mse$ is calculated as:
$$mse = \frac{1}{v}\sum_{i=1}^{v}\frac{1}{nl}\sum_{j=1}^{nl}\left(y_{ij} - \hat{y}_{ij}\right)^{2}$$
where $v$ is the number of cross validation groups, $nl$ is the number of samples in each cross validation group, $y_{ij}$ is the true label of the sample, and $\hat{y}_{ij}$ is the label predicted by the transmission sound quality subjective and objective evaluation model;
a grid search is used: a coarse selection is performed first, taking the base-2 logarithms $\log_2 e$ and $\log_2 g$ over the ranges [-8, 8] and [-8, 8] respectively, with a step size of 1 for both the penalty parameter $e$ and the kernel function parameter $g$; a fine selection is then performed based on the coarse selection result, again taking $\log_2 e$ and $\log_2 g$, over the ranges [-4, 4] and [-4, 4] respectively, with a step size of 0.1 for both parameters.
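The parameter search can be outlined without committing to a particular SVR library. In the sketch below the regressor is a stand-in: `make_model(e, g)` is a hypothetical factory that returns a `fit_predict(train_x, train_y, test_x)` callable, which in practice would wrap an RBF-kernel support vector regressor with $\varepsilon = 0.01$. The interleaved assignment of samples to the v folds is also an assumption.

```python
def log2_grid(lo, hi, step):
    """Exponents lo, lo + step, ..., hi used for log2(e) or log2(g)."""
    count = round((hi - lo) / step) + 1
    return [round(lo + k * step, 10) for k in range(count)]


def kfold_mse(xs, ys, v, fit_predict):
    """Mean over v folds of the mean squared error on the held-out fold."""
    folds = [list(range(i, len(xs), v)) for i in range(v)]  # assumed fold layout
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        preds = fit_predict(
            [xs[i] for i in train],
            [ys[i] for i in train],
            [xs[i] for i in held_out],
        )
        total += sum((ys[i] - p) ** 2 for i, p in zip(held_out, preds)) / len(held_out)
    return total / v


def grid_search(xs, ys, make_model, exps_e, exps_g, v=3):
    """Return (mse, log2 e, log2 g) of the first grid point with the lowest mse."""
    best = None
    for le in exps_e:
        for lg in exps_g:
            score = kfold_mse(xs, ys, v, make_model(2.0 ** le, 2.0 ** lg))
            if best is None or score < best[0]:
                best = (score, le, lg)
    return best
```

The coarse pass would call `grid_search` with `log2_grid(-8, 8, 1)` for both exponents (17 × 17 candidates), and the fine pass with `log2_grid(-4, 4, 0.1)` (81 × 81 candidates).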
The Pearson correlation coefficient $\rho$ in step 12 is calculated as:
$$\rho = \frac{\mathrm{cov}(y, \hat{y})}{\sigma_y\,\sigma_{\hat{y}}} = \frac{E\left[(y - \mu_y)(\hat{y} - \mu_{\hat{y}})\right]}{\sigma_y\,\sigma_{\hat{y}}}$$
where $y$ is the subjective evaluation label of the test set, $\hat{y}$ is the subjective evaluation label value predicted for the test set by the transmission sound quality subjective and objective evaluation model, $\mathrm{cov}(y, \hat{y})$ is the covariance between $y$ and $\hat{y}$, $\sigma_y$ is the standard deviation of $y$, $\mu_y$ is the mean of $y$, and $E$ denotes mathematical expectation;
the mean absolute error $MAE$ must be less than 10% of the maximum value $L_{max}$ of the scoring interval; $MAE$ is calculated as:
$$MAE = \frac{1}{n_{samples}}\sum_{i=1}^{n_{samples}}\left|\hat{y}_i - y_i\right|$$
where $\hat{y}_i$ is the predicted value for the $i$-th listening sample, $y_i$ is the subjective evaluation label, and $n_{samples}$ is the total number of listening samples;
the error percentage $W_{mae}$ is calculated as:
$$W_{mae} = \frac{MAE}{L_{max}} \times 100\%$$
where $MAE$ is the mean absolute error between the predicted subjective evaluation labels and the subjective evaluation labels of the test set, and $L_{max}$ is the maximum value of the scoring interval.
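The acceptance check of step 12 can be sketched as follows, using population moments for the covariance and standard deviations (the choice does not affect $\rho$ as long as it is applied consistently). The function names and the returned tuple layout are illustrative assumptions.

```python
import math


def pearson(y, y_hat):
    """Pearson correlation: cov(y, y_hat) / (sigma_y * sigma_y_hat)."""
    n = len(y)
    mu_y, mu_p = sum(y) / n, sum(y_hat) / n
    cov = sum((a - mu_y) * (b - mu_p) for a, b in zip(y, y_hat)) / n
    sd_y = math.sqrt(sum((a - mu_y) ** 2 for a in y) / n)
    sd_p = math.sqrt(sum((b - mu_p) ** 2 for b in y_hat) / n)
    return cov / (sd_y * sd_p)


def mean_absolute_error(y, y_hat):
    """MAE = mean of |y_hat_i - y_i| over all listening samples."""
    return sum(abs(p - t) for t, p in zip(y, y_hat)) / len(y)


def check_model(y, y_hat, l_max=10):
    """Return (accepted, rho, mae, w_mae) for the test-set predictions."""
    rho = pearson(y, y_hat)
    mae = mean_absolute_error(y, y_hat)
    w_mae = mae / l_max * 100  # error percentage W_mae
    accepted = rho > 0.9 and mae < 0.1 * l_max
    return accepted, rho, mae, w_mae
```

For labels 2, 4, 6, 8 and predictions 2.5, 4.5, 6.5, 8.5 on a 0-10 scale, the correlation is essentially 1, the MAE is 0.5 and the error percentage is 5%, so the model would be accepted.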
Compared with the prior art, the invention has the following beneficial effects:
1. The invention intercepts the sound quality evaluation samples under the variable working condition of the transmission with a sliding window that covers every time period of the recording, so that the sound samples under every working condition of the transmission are selected without omission.
2. When selecting the subjective sound quality evaluation samples, the invention picks, from the large number of samples generated by the sliding window interception, the samples whose sound quality psychoacoustic indexes differ most as the subjective evaluation samples, so that the number of listening samples is minimized while the differences in sound quality attributes are reflected as fully as possible.
3. The invention provides a subjective appraiser weight assignment method, which comprises appraiser reliability weight and group weight, so that the scoring data of professional appraisers with high appraisal reliability account for a larger proportion.
4. The invention provides a subjective evaluation result statistical method, which enables subjective evaluation results to better accord with statistical rules.
5. The invention divides the listening samples into frames, calculates the sound quality psychoacoustic indexes of each frame, and fits the subjective evaluation labels to the calculation results with a support vector regression model; the support vector regression model generalizes well and can capture the nonlinear relationship between frames, so that the objective evaluation reflects the sound quality characteristics more accurately.
6. The invention provides a transmission sound quality subjective and objective evaluation model inspection method, which enables the transmission sound quality subjective and objective evaluation model to have scientific judgment basis for the fitting result precision of subjective and objective evaluation.
Detailed Description
The present invention will be described in further detail with reference to the following drawings and examples.
Referring to fig. 1, the subjective and objective evaluation method for sound quality under the variable-speed working condition of the transmission comprises the following steps:
step 1: the variable-speed noise signals of the transmission in 6 gears are collected with a microphone. The collection environment is a semi-anechoic chamber, and the microphone is placed 1 m to the left of the transmission (viewed from the loading end) at the height of the transmission center. This position lies in the acoustic far field, where there is no interference between sound waves and the sound source can be approximated as a point source, which meets the requirements of practical noise evaluation. The sampling frequency is not lower than 40960 Hz, so that by the Nyquist theorem the analysis frequency is not lower than 20480 Hz, which is above the upper frequency limit of human hearing. The transmission has 6 gears in total; in each gear, under maximum load, it is decelerated uniformly from the highest rotating speed $rpm_{max}$ = 2600 rpm to the lowest rotating speed $rpm_{min}$ = 500 rpm, and the collection time for each gear is 257 s;
step 2: performing sliding window interception on the noise signals of the 6 gears of the transmission, as shown in fig. 2; the window function of the sliding window is a rectangular window, the noise signal collected in each gear is 257 s long, the window length is 5 s, and the displacement of each window relative to the previous one is 2.5 s; a preliminary listening sample set is generated for each gear, and the number $n$ of listening sample sections intercepted by the sliding window is calculated as:
$$n = \left\lfloor \frac{257 - 5}{2.5} \right\rfloor + 1 = 101$$
giving 6 preliminary listening sample sets in total, each containing 101 listening samples, for a total of 6 × 101 listening samples;
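The window count, and the rotating speed range each window nominally covers, can be checked with a short sketch. It idealizes the uniform deceleration as a linear ramp from 2600 rpm to 500 rpm over 257 s; in the method itself the speed is measured with a photoelectric sensor, so the linear ramp and the function names here are assumptions.

```python
import math


def count_windows(n_total, wlen, wst):
    """Number of full windows of length wlen, shifted by wst, in n_total seconds."""
    return math.floor((n_total - wlen) / wst) + 1


def window_rpm_ranges(duration, wlen, wst, rpm_max, rpm_min):
    """(start_s, end_s, rpm_at_start, rpm_at_end) for every sliding window."""
    def rpm_at(t):
        # assumed: speed falls linearly from rpm_max to rpm_min over the run
        return rpm_max - (rpm_max - rpm_min) * t / duration

    out = []
    for k in range(count_windows(duration, wlen, wst)):
        start = k * wst
        end = start + wlen
        out.append((start, end, rpm_at(start), rpm_at(end)))
    return out
```

`count_windows(257, 5, 2.5)` gives 101. The last window spans 250 s to 255 s, so the final 2 s of each recording are not covered by a full window.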
step 3: calculating the sound quality psychoacoustic indexes (loudness, sharpness, roughness, jitter degree, etc.) of each of the 101 listening samples contained in each preliminary listening sample set of step 2, and performing cluster analysis on the calculated indexes;
in each preliminary listening sample set the 101 listening samples are divided into 10 classes; with 6 preliminary listening sample sets in total, 6 × 10 classes of preliminary listening samples are finally formed;
the sound quality psychoacoustic indexes are calculated as follows:
3.1) loudness calculation:
the specific loudness is calculated using the following formula:
where $N'_0$ is the reference specific loudness; $E_{TQ}$ is the excitation corresponding to the quiet state; $s_r$ is the ratio of the sound intensity of a just-audible test tone to that of the broadband noise in the same critical band; $E_0$ is the reference excitation corresponding to the sound intensity $I_0 = 10^{-12}\ \mathrm{W/m^2}$; $E_s$ is the excitation corresponding to the sound; when $N'_0 = 0.065$, $\theta = 0.25$ and $s_r = 0.25$; when $N'_0 = 0.08$, $\theta = 0.23$ and $s_r = 0.5$; the Bark band division follows the Zwicker model;
the total loudness is obtained by integrating the specific loudness over the 0-24 Bark scale:
$$N = \int_{0}^{24\,\mathrm{Bark}} N'(z)\,dz$$
3.2) sharpness calculation:
the Zwicker sharpness model is based on the loudness model, and the mathematical model is as follows:
where $K$ is a weighting coefficient, $K = 0.11$; $S_{sharpness}$ is the sharpness; $N'(z)$ is the specific loudness at Bark value $z$; and $g(z)$ is the weighting coefficient of the sound signal in the different Bark bands, expressed as:
3.3) roughness calculation:
the improved Zwicker roughness model is based on the loudness model, and the mathematical model is as follows:
where $Rou$ is the calculated roughness, $f_{mod}$ is the modulation frequency, and $\Delta L_E$ is the sound pressure variation amplitude in each critical band, defined as follows:
where $N'_{max}(z)$ and $N'_{min}(z)$ are the maximum and minimum values of the specific loudness in the Zwicker loudness model, respectively;
3.4) jitter degree calculation:
the Zwicker jitter model is based on the loudness model, and the mathematical model is as follows:
where $\Delta L_E$ is the sound pressure variation amplitude in each critical band; $f_{mod}$ is the modulation frequency; and $f_0$ is the modulation fundamental frequency, $f_0 = 4$ Hz;
The clustering analysis process is as follows:
(1) the noise signal in each gear is intercepted with the sliding window, giving 101 sound sample sections per gear; let the input samples be $Q = \{x_1, x_2, \ldots, x_{101}\}$, and randomly select 10 sound samples from $Q$ as the initial centroids $u_1, u_2, \ldots, u_{10}$, the number of cluster centroids being 10;
(2) for each input sample $x_1, x_2, \ldots, x_{101}$, measuring its distance to every centroid and assigning it to the class of the nearest centroid;
(3) updating the center of each class to the mean of all samples belonging to the class;
(4) repeating steps (2) and (3) until fewer than 3 samples are reassigned to a different cluster;
(5) after clustering is finished, the noise signals intercepted by the sliding window in each gear are grouped into 10 classes of sound samples;
step 4: for the 6 × 10 classes of preliminary listening samples of step 3, retaining in each class the listening sample closest to the cluster center, to generate a listening sample set with a sample capacity of 6 × 10;
step 5: appending, after the 1st group and after the 6th group of the listening sample set of step 4, 4 listening samples that are identical between the two groups and different from one another within a group, to serve as evaluator reliability evaluation samples; the reliability evaluation samples are representative sound samples selected by experts through extensive listening, where the number of samples auditioned is not less than the number of listening samples in the listening sample set of step 4;
adding 4 samples manually selected by NVH experts to each of the 2nd to 5th groups so that every group contains the same number of listening samples; so that the evaluators have a sense of the overall range when scoring, the sound samples of the lowest and highest rotating speed sections are placed as samples No. 1 and No. 2 at the start of listening, and the remaining listening samples are arranged from high to low according to the rotating speed signals collected in step 1 (a 5 s blank interval is left between two listening samples to give the evaluators time to score), forming a formal listening sample group with a sample capacity of 6 × 14, as shown in fig. 3;
step 6: the evaluators of the listening experiment in this embodiment comprise 8 NVH experts, 6 drivers with more than 7 years of driving experience and 10 ordinary evaluators, who are given group weights $W_G$ of 0.5, 0.3 and 0.2 respectively; the evaluator information and the specific distribution are shown in table 1;
TABLE 1 Group weights
step 7: selecting 14 pairs of semantic antisense words describing the sound quality characteristics of transmission noise as the transmission sound quality semantic word bank, and having the NVH experts screen the 14 pairs; each pair in table 2 is scored on a 0-10 scale for how important it is in reflecting the sound quality characteristics of the transmission (0 means extremely unimportant and 10 means extremely important), so as to determine the semantic antisense words finally used for evaluating the transmission sound quality; the semantic antisense words describing the transmission sound quality characteristics are shown in table 2;
TABLE 2 semantic antisense words describing transmission acoustic quality characteristics
Numbering | Semantic antisense words | Numbering | Semantic antisense words
1 | Gentle of wave motion | 8 | Sharp-smooth
2 | Harsh-pleasing | 9 | Acutely-bass
3 | Vibratile-stable | 10 | Smooth and abrupt
4 | Vibrating the ear-gentle | 11 | Comfort-discomfort
5 | Of impact-moderation | 12 | Quiet-noisy
6 | Humming-calming | 13 | Undulating-smooth
7 | Rolling-quiet | 14 | Acutely-dull
The NVH experts' scores for the importance of the 14 pairs of semantic antisense words are shown in table 3;
TABLE 3 expert word-selecting scoring results
Word order number | Expert 1 | Expert 2 | Expert 3 | …… | …… | Expert 8
1 | 8 | 8 | 9 | …… | …… | 9
2 | 6 | 7 | 6 | …… | …… | 5
…… | …… | …… | …… | …… | …… | ……
14 | 8 | 9 | 7 | …… | …… | 7
The statistics of the 8 NVH experts' scores for the 14 candidate pairs of semantic antisense words are shown in table 4;
TABLE 4 expert word-selecting scoring statistics
Serial number | Word and phrase | Mean value | Standard deviation
1 | Comfort-discomfort | 9.125 | 0.7806
2 | Acutely-dull | 8.75 | 0.9682
3 | Vibrating the ear-gentle | 7.5 | 1
4 | Acutely-bass | 7.25 | 0.8291
5 | Harsh-pleasing | 7.25 | 1.3919
6 | Rolling-quiet | 7.125 | 0.7806
7 | Humming-calming | 7 | 1.8027
8 | Undulating-smooth | 6.875 | 1.3635
9 | Smooth and abrupt | 6.875 | 1.3635
10 | Quiet-noisy | 6.875 | 2.5217
11 | Sharp-smooth | 6.875 | 1.6153
12 | Vibratile-stable | 6.625 | 1.4086
13 | Of impact-moderation | 6.625 | 1.4947
14 | Gentle of wave motion | 6.5 | 1.4142
The first A' semantic antisense words with the highest scores are selected according to the evaluation requirements; in this evaluation only the highest-scoring pair, comfort-discomfort, is selected as the semantic antisense words for the transmission sound quality evaluation;
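The word selection amounts to ranking the candidate pairs by their mean expert score. The sketch below uses hypothetical score lists, because the complete expert scores of table 3 are elided; the first list was chosen only so that its mean and population standard deviation reproduce the 9.125 and 0.7806 reported for comfort-discomfort, which suggests (but does not establish) that the table uses the population standard deviation. The function name `rank_word_pairs` is an assumption.

```python
from statistics import mean, pstdev


def rank_word_pairs(scores_by_pair, top=1):
    """Return the top pairs as (pair, mean score, population std), best first."""
    stats = [(pair, mean(s), pstdev(s)) for pair, s in scores_by_pair.items()]
    stats.sort(key=lambda t: t[1], reverse=True)
    return stats[:top]


# Hypothetical expert scores (8 experts, 0-10 scale), for illustration only.
example_scores = {
    "comfort-discomfort": [8, 8, 9, 9, 9, 10, 10, 10],
    "quiet-noisy": [3, 5, 7, 7, 8, 8, 8, 9],
}
best_pair, best_mean, best_std = rank_word_pairs(example_scores, top=1)[0]
```

Setting `top` to A' returns the A' highest-scoring pairs; pairs with equal means keep their original order.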
step 8: the evaluators carry out formal listening and subjectively evaluate the sound samples in the formal listening sample group according to the subjective evaluation scoring table shown in table 5; the evaluation words are comfort-discomfort and the score range is 0-10, with the correspondence between scores and subjective feeling given in the grade score table of table 5;
TABLE 5 subjective evaluation scoring table
The evaluation environment is a well-ventilated room with ambient noise below 30 dB. High-fidelity, noise-isolating, leak-proof professional studio monitoring headphones (Sony MDR-Z1000/Q) are used, the audio software bundled with Windows is used throughout, and the volume is set uniformly (speaker volume 15, audio software volume at maximum); the same model of headphones, the same model of computer and the onboard sound card are used, and listening takes place in the same time period (morning); before listening, the evaluators must be in good mental condition and free of illness, and each evaluator completes the listening process independently;
the listening experiment takes place at 9 a.m. each day; 3 groups of samples are listened to in 20-30 minutes, with a 2-minute rest after each group of samples;
before formal listening, the evaluators listen to the complete audio signal of the tested transmission to gain an overall impression of its sound, and pre-score sound samples on a training sheet (see the training sheet in table 5); pre-scoring serves only to familiarize the evaluators with the scoring procedure and the sound characteristics, and the pre-scoring data are not included in the statistical analysis;
before the listening evaluation, the scene and characteristics of the acoustic event are explained, so that the evaluators can form an impression of the acoustic event in advance and imagine the scene of the noise being evaluated;
the evaluators then listen formally and score using the formal scoring table shown in table 5; a total of 28 evaluators each evaluate 6 × 14 sound samples, and the scores are combined into a score matrix $V_{28\times 84}$, as shown in table 6;
TABLE 6 evaluation personnel rating results
Step 9: Confidence analysis is performed on the confidence samples No. 11-14 and No. 81-84 for each row of data in Table 6. The calculation formula is as follows:
$$W_T = 1 - P_n - \lambda P_y$$
If $\lambda = 1$ and no acceptable misjudgment is allowed, then
$$W_T = 1 - \frac{N_w}{H}$$
where $N_w$ is the number of confidence samples whose scores are inconsistent between the two groups No. 11-14 and No. 81-84, and $H = 4$ is the number of confidence samples in each group. For example, if evaluator A1B1 scores No. 11-14 as 9, 1, 7, 6 and No. 81-84 as 9, 2, 6, 6, then two confidence samples are scored inconsistently, and the confidence weight of evaluator A1B1 is $W_{T(\mathrm{A1B1})} = 0.5$;
The confidence weights calculated for all evaluators are shown in Table 7;
TABLE 7 Evaluator confidence weights
| Evaluator | Confidence weight | Evaluator | Confidence weight |
| A1 | 1 | O1 | 1 |
| B1 | 0.75 | P1 | 0.5 |
| …… | …… | …… | …… |
| N1 | 1 | A1B1 | 0.5 |
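The worked example for evaluator A1B1 can be checked with a short sketch. It assumes that, with no acceptable misjudgment allowed, the confidence weight equals one minus the fraction of confidence samples scored inconsistently between the two groups; this assumption reproduces the 0.5 stated for A1B1. The function name `confidence_weight` is hypothetical.

```python
def confidence_weight(first_scores, repeat_scores):
    """Return 1 - N_w / H for two score lists of the same H confidence samples.

    Assumption: N_w counts positions where the two scores differ, and every
    difference is treated as an unacceptable misjudgment.
    """
    if not first_scores or len(first_scores) != len(repeat_scores):
        raise ValueError("both groups must contain the same, non-zero number of scores")
    h = len(first_scores)
    n_w = sum(1 for a, b in zip(first_scores, repeat_scores) if a != b)
    return 1 - n_w / h


# Scores of evaluator A1B1 for samples No. 11-14 and No. 81-84 (from the text).
weight_a1b1 = confidence_weight([9, 1, 7, 6], [9, 2, 6, 6])
print(weight_a1b1)  # 0.5
```

With four confidence samples per group, the only possible weights are 1, 0.75, 0.5, 0.25 and 0, which matches the values that appear in Table 7.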
Step 10: the results of the subjective evaluations of 28 evaluators in table 6 were subjected to Spearman correlation coefficient analysis:
10.1) Calculate the Spearman correlation coefficient between every pair of evaluators:
$$r_{i,j} = 1 - \frac{6\sum d^2}{R\left(R^2 - 1\right)}$$
where $d$ is the difference between the ranks of corresponding subjective evaluation results in the two rows of the table; $R$ is the length of the two rows of subjective evaluation results; $r_{i,j}$ is the Spearman correlation coefficient between the subjective evaluation results of the $i$-th evaluator and those of the $j$-th evaluator. The calculation results are shown in Table 8;
TABLE 8 Spearman correlation coefficients between evaluators
| Correlation coefficient | A1 | B1 | C1 | …… | A1B1 |
| A1 | 1.00 | 0.81 | 0.81 | …… | 0.91 |
| B1 | 0.81 | 1.00 | 0.86 | …… | 0.78 |
| C1 | 0.81 | 0.86 | 1.00 | …… | 0.67 |
| …… | …… | …… | …… | …… | …… |
| A1B1 | 0.91 | 0.78 | 0.67 | …… | 1.00 |
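A pairwise matrix like Table 8 could be computed along the following lines. This is a sketch only: it assumes the rank-difference form of the Spearman coefficient, $1 - 6\sum d^2 / (R(R^2-1))$, which is exact when there are no tied scores, and it assigns average ranks to ties (an assumption, since the text does not say how ties are handled). The score rows below are hypothetical, not data from Table 6.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman(row_i, row_j):
    """Rank-difference Spearman coefficient between two score rows."""
    r = len(row_i)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(average_ranks(row_i), average_ranks(row_j)))
    return 1 - 6 * d_squared / (r * (r ** 2 - 1))


def spearman_matrix(scores):
    """Symmetric matrix of coefficients, one row of scores per evaluator."""
    k = len(scores)
    return [[spearman(scores[i], scores[j]) for j in range(k)] for i in range(k)]


# Hypothetical scores: 3 evaluators, 5 samples each.
toy_scores = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [5, 4, 3, 2, 1]]
toy_matrix = spearman_matrix(toy_scores)
```

The diagonal of the result is always 1, which is why the self-correlation is excluded before averaging.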
10.2) In each row, exclude the correlation coefficient whose value is 1 (the evaluator's correlation with himself or herself) and average the rest to obtain the average correlation coefficient:
$$R_i = \frac{1}{K-1}\sum_{j=1,\ j\neq i}^{K} r_{i,j}$$
where $K$ is the number of evaluators; $r_{i,j}$ is the correlation coefficient between the $i$-th evaluator and the $j$-th evaluator; $R_i$ is the average correlation coefficient of the $i$-th evaluator relative to the other evaluators;
the results of the average correlation coefficient calculation by the evaluators are shown in table 9;
TABLE 9 average correlation coefficient of evaluators
Of the 28 evaluators in total, the scoring data of the evaluators whose average correlation coefficient is greater than 0.75 are kept, and the scoring data of the evaluators numbered C1, E1 and O1 in Table 6 are removed;
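The screening rule in 10.2) can be sketched as follows, assuming a symmetric correlation matrix with ones on the diagonal. The evaluator labels X, Y, Z, W and their coefficients are hypothetical, chosen only so that one evaluator falls below the 0.75 threshold.

```python
def average_correlations(corr):
    """Average each row of a correlation matrix, skipping the diagonal entry."""
    k = len(corr)
    return [sum(corr[i][j] for j in range(k) if j != i) / (k - 1)
            for i in range(k)]


def keep_consistent(names, corr, threshold=0.75):
    """Return the evaluators whose average correlation exceeds the threshold."""
    averages = average_correlations(corr)
    return [name for name, avg in zip(names, averages) if avg > threshold]


# Hypothetical 4-evaluator example.
toy_names = ["X", "Y", "Z", "W"]
toy_corr = [
    [1.0, 0.9, 0.9, 0.6],
    [0.9, 1.0, 0.9, 0.5],
    [0.9, 0.9, 1.0, 0.6],
    [0.6, 0.5, 0.6, 1.0],
]
kept = keep_consistent(toy_names, toy_corr)
print(kept)  # ['X', 'Y', 'Z']
```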
10.3) For the scoring data $V_{25\times 84}$ of the remaining 25 evaluators in Table 6, average each column to obtain the average score of each sample:
$$V_a = \frac{1}{K'}\sum_{i=1}^{K'} V_{i,a}$$
where $K'$ is the number of evaluators remaining after rejection; $V_{i,a}$ is the subjective score given by the $i$-th evaluator to the $a$-th sample; $V_a$ is the average score of the $a$-th sample over all remaining evaluators;
After the three evaluators C1, E1 and O1 are removed, the average subjective evaluation scores of the remaining 25 evaluators for the 84 listening samples are shown in Table 10;
TABLE 10 Sample subjective evaluation mean scores
| Sample number | 1 | 2 | 3 | …… | 84 |
| Mean score $V_a$ | 2.38 | 9.42 | 2.00 | …… | 3.05 |
10.4) With the three evaluators C1, E1 and O1 removed from Table 6, calculate the Euclidean distance between each row $V_{i,1},\ldots,V_{i,84}$ of the remaining scoring data $V_{25\times 84}$ and the average scores $V_a$:
$$D_i = \sqrt{\sum_{a=1}^{84}\left(V_{i,a} - V_a\right)^2}$$
where $i = 1, 2, \ldots, 25$;
The calculation results are shown in Table 11;
TABLE 11 Euclidean distance statistical analysis
| Evaluator number | A1 | B1 | D1 | F1 | …… | A1B1 |
| Euclidean distance D | 7.48 | 12.26 | 13.72 | 9.27 | …… | 21.25 |
In Table 11, the first row gives the evaluator number and the second row gives the Euclidean distance D between each evaluator's scoring data for the 84 listening samples and the average scores $V_a$. The scoring data of the two evaluators with the largest Euclidean distance, numbered R1 and A1B1 in Table 11, are rejected. Over the statistical analysis of the whole subjective evaluation experiment, the number of rejected evaluators must not exceed 20% of the total number of evaluators;
After the Spearman correlation coefficient and Euclidean distance statistical analyses, the scoring data of the evaluators numbered C1, E1, O1, R1 and A1B1 in Table 6 are rejected, and the scoring data of the remaining evaluators form the matrix $V_{23\times 84}$. The confidence weight and the group weight of the corresponding evaluator are attached to each row of $V_{23\times 84}$, and the subjective evaluation labels of the 84 listening samples are obtained according to the following formula:
Finally, the valid scoring data $V_{23\times 84}$ of the 23 retained evaluators are combined with the confidence weights and group weights of the corresponding evaluators, and the final subjective evaluation labels of the 84 listening samples are calculated as shown in Table 12;
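Steps 10.3) and 10.4) amount to a column mean followed by a row-wise distance to that mean. The sketch below illustrates this under the assumption that the distance is the ordinary Euclidean distance over all samples; the evaluator labels and scores are hypothetical, and the helper names are not taken from the source.

```python
import math


def column_means(scores):
    """Average score of each sample over all evaluators (one row per evaluator)."""
    k = len(scores)
    return [sum(column) / k for column in zip(*scores)]


def euclidean_distances(scores):
    """Distance between each evaluator's row and the vector of column means."""
    means = column_means(scores)
    return [math.sqrt(sum((v - m) ** 2 for v, m in zip(row, means)))
            for row in scores]


def farthest_evaluators(names, scores, n_reject=2):
    """Names of the n_reject evaluators with the largest distance."""
    ranked = sorted(zip(names, euclidean_distances(scores)),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n_reject]]


# Hypothetical data: 4 evaluators, 3 samples.
toy_names = ["X", "Y", "Z", "W"]
toy_scores = [[2, 8, 5], [3, 8, 5], [2, 7, 5], [9, 1, 5]]
outliers = farthest_evaluators(toy_names, toy_scores, n_reject=1)
print(outliers)  # ['W']

# The 20% cap from the text: 5 rejected out of 28 evaluators stays within it.
within_cap = 5 <= 0.2 * 28
```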
TABLE 12 Scoring effectiveness data of listening samples and subjective evaluation labels
Step 11: establishing a transmission sound quality subjective and objective evaluation model, which comprises the following specific steps:
11.1) taking 70% of 84 listening sample data as a training set for training a transmission sound quality subjective and objective evaluation model, and taking 30% of the 84 listening sample data as a test set for testing the transmission sound quality subjective and objective evaluation model;
11.2) All listening samples are divided into frames with a frame length of 1 s and a frame shift of 1 s, so that each listening sample is split into 5 frames;
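The framing in 11.2) can be sketched as below. With a frame shift equal to the frame length the frames do not overlap, so a 5 s sample yields 5 frames. The sample rate of 8 Hz is a hypothetical value chosen only to keep the example small; the actual recording rate is not stated here.

```python
def split_frames(signal, sample_rate, frame_length_s=1.0, frame_shift_s=1.0):
    """Split a sampled signal into frames; a trailing partial frame is dropped."""
    frame_len = int(round(frame_length_s * sample_rate))
    shift = int(round(frame_shift_s * sample_rate))
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += shift
    return frames


sample_rate = 8                           # hypothetical, for illustration only
signal = list(range(5 * sample_rate))     # stands in for a 5 s listening sample
frames = split_frames(signal, sample_rate)
print(len(frames))  # 5
```

Each frame would then be passed to the psychoacoustic index calculation of 11.3).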
11.3) calculating the psychoacoustic index of sound quality of each frame of each listening sample, wherein the calculation process is shown in FIG. 4, and the calculation result is shown in Table 13;
TABLE 13 calculation of psychoacoustic index of sound quality by listening to samples in frames
11.4) The transmission sound quality subjective and objective evaluation model is fitted by support vector regression and modeled in MATLAB with the libsvm package. The sound quality psychoacoustic indexes calculated frame by frame for the training set, together with the subjective evaluation labels, are used as input data for training the model. An RBF kernel function is selected, and the parameter of the ε-insensitive loss function is set to ε = 0.01. The optimal penalty parameter e and kernel function parameter g are selected by K-fold cross validation (K-CV), with the lowest mean square error mse in the cross validation process as the optimization objective and the cross validation parameter v set to 3. The mse is calculated as follows:
$$\mathrm{mse} = \frac{1}{vn}\sum_{i=1}^{v}\sum_{j=1}^{n}\left(y_{ij} - \hat{y}_{ij}\right)^2$$
where $v$ is the number of cross validation groups and $n$ is the number of samples in each cross validation group; $y_{ij}$ is the true label of a sample, and $\hat{y}_{ij}$ is the label predicted by the transmission sound quality subjective and objective evaluation model;
A grid search is used. A coarse selection is performed first over the base-2 logarithms $\log_2 e$ and $\log_2 g$, each in the range [-8, 8] with a step of 1. A fine selection is then performed according to the coarse selection result, again over $\log_2 e$ and $\log_2 g$, each in the range [-4, 4] with a step of 0.1. The optimal parameters finally selected are e = 2.7763 and g = 0.1147. The transmission sound quality subjective and objective evaluation model is trained with the 59 groups of listening sample data in the training set, and the trained model is used to predict the subjective evaluation labels of the 25 groups of listening samples in the test set, giving the model's predictions of the test set subjective evaluation labels, namely the test set predicted values;
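The coarse-then-fine search can be sketched independently of the regression library. In the code below, `toy_cv_mse` is a hypothetical stand-in for the 3-fold cross validation mse that libsvm would return for a given pair (e, g); it is not the real objective. The two calls use the ranges and steps stated in the text, applied to the exponents $\log_2 e$ and $\log_2 g$.

```python
import math


def log2_grid(low, high, step):
    """Evenly spaced exponents from low to high inclusive."""
    count = int(round((high - low) / step)) + 1
    return [low + k * step for k in range(count)]


def grid_search(cv_mse, low, high, step):
    """Return (mse, e, g) with the lowest cross validation mse on the grid."""
    best = None
    for log2_e in log2_grid(low, high, step):
        for log2_g in log2_grid(low, high, step):
            e, g = 2.0 ** log2_e, 2.0 ** log2_g
            mse = cv_mse(e, g)
            if best is None or mse < best[0]:
                best = (mse, e, g)
    return best


def toy_cv_mse(e, g):
    """Hypothetical smooth objective with its minimum at log2 e = 1.5, log2 g = -3.1."""
    return (math.log2(e) - 1.5) ** 2 + (math.log2(g) + 3.1) ** 2


coarse = grid_search(toy_cv_mse, -8, 8, 1)    # step 1 in the exponent
fine = grid_search(toy_cv_mse, -4, 4, 0.1)    # step 0.1 in the exponent
```

In a real run, `cv_mse` would train an ε-SVR with an RBF kernel on the training folds and return the mean square error over the held-out folds; the fine grid simply resolves the optimum more closely than the coarse one.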
11.5) establishing a transmission sound quality subjective and objective evaluation model;
Step 12: Verify the transmission sound quality subjective and objective evaluation model established in step 11, taking the Pearson correlation coefficient and the average absolute error as evaluation indexes. If the Pearson correlation coefficient is greater than 0.9 and the average absolute error is less than 10% of the maximum value of the scoring interval, the model predicts well; if the Pearson correlation coefficient is not greater than 0.9, return to step 6 and perform the sound quality subjective and objective evaluation again;
The Pearson correlation coefficient between the test set predicted values of the transmission sound quality subjective and objective evaluation model and the test set subjective evaluation labels is:
$$\rho_{y,\hat{y}} = \frac{\mathrm{cov}(y,\hat{y})}{\sigma_y\,\sigma_{\hat{y}}} = \frac{E\left[(y-\mu_y)(\hat{y}-\mu_{\hat{y}})\right]}{\sigma_y\,\sigma_{\hat{y}}}$$
where $y$ is the test set subjective evaluation label; $\hat{y}$ is the model's predicted value of the test set subjective evaluation label; $\mathrm{cov}(y,\hat{y})$ is the covariance between $y$ and $\hat{y}$; $\sigma_y$ is the standard deviation of $y$; $\mu_y$ is the mean value of $y$; $E$ represents the mathematical expectation;
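The average absolute error MAE is calculated as follows:
$$\mathrm{MAE} = \frac{1}{n_{\mathrm{samples}}}\sum_{i=1}^{n_{\mathrm{samples}}}\left|\hat{y}_i - y_i\right|$$
where $\hat{y}_i$ is the predicted value of the $i$-th listening sample, $y_i$ is its subjective evaluation label, and $n_{\mathrm{samples}}$ is the total number of listening samples;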
The error percentage $W_{mae}$ is calculated as follows:
$$W_{mae} = \frac{\mathrm{MAE}}{L_{\max}} \times 100\%$$
where $\mathrm{MAE}$ is the average absolute error between the predicted values of the subjective evaluation labels and the test set subjective evaluation labels, and $L_{\max}$ is the maximum value of the scoring interval.
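The acceptance check of step 12 can be sketched as follows, assuming population (divide-by-n) covariance and standard deviation in the Pearson coefficient; the choice of divisor cancels in the ratio. The labels, predictions and the scoring maximum of 10 are hypothetical values for illustration, not results from the test set.

```python
import math


def pearson(y, y_hat):
    """Pearson correlation coefficient between labels and predictions."""
    n = len(y)
    mu_y, mu_h = sum(y) / n, sum(y_hat) / n
    cov = sum((a - mu_y) * (b - mu_h) for a, b in zip(y, y_hat)) / n
    sd_y = math.sqrt(sum((a - mu_y) ** 2 for a in y) / n)
    sd_h = math.sqrt(sum((b - mu_h) ** 2 for b in y_hat) / n)
    return cov / (sd_y * sd_h)


def mean_absolute_error(y, y_hat):
    """Average of the absolute prediction errors."""
    return sum(abs(b - a) for a, b in zip(y, y_hat)) / len(y)


def model_is_acceptable(y, y_hat, l_max, min_r=0.9, max_fraction=0.10):
    """Apply both criteria: correlation above 0.9 and MAE below 10% of l_max."""
    return (pearson(y, y_hat) > min_r
            and mean_absolute_error(y, y_hat) / l_max < max_fraction)


# Hypothetical labels and predictions on an assumed 0-10 scoring interval.
labels = [2.0, 9.0, 3.0, 6.0, 7.5]
predictions = [2.5, 8.5, 3.5, 6.0, 7.0]
mae = mean_absolute_error(labels, predictions)
w_mae = mae / 10 * 100          # error percentage, about 4 for these numbers
accepted = model_is_acceptable(labels, predictions, l_max=10)
```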
The prediction accuracy of the transmission sound quality subjective and objective evaluation model therefore meets the requirement. The comparison of the model predicted values with the test set subjective evaluation labels is shown in Fig. 5, from which it can be seen that the error between the predicted value and the subjective evaluation label is greater than 1 for only a few samples and less than 0.5 for most samples, and that the average error over all samples is 0.557. The error between the test set predicted values and the test set subjective evaluation labels is shown in Fig. 6, from which it can be seen that the test set subjective evaluation labels and the model predicted values follow essentially the same trend and are highly correlated, so the prediction accuracy of the transmission sound quality subjective and objective evaluation model is good.