CN115547299A

CN115547299A - Quantitative evaluation and classification method and device for controlled voice quality division

Info

Publication number: CN115547299A
Application number: CN202211469949.7A
Authority: CN
Inventors: 潘卫军; 张坚; 蒋培元; 蒋倩兰; 王泆棣; 张玉梅
Original assignee: Civil Aviation Flight University of China
Current assignee: Civil Aviation Flight University of China
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2022-12-30
Anticipated expiration: 2042-11-22
Also published as: CN115547299B

Abstract

The invention discloses a quantitative evaluation and classification method and a device for controlled voice quality division, wherein the method comprises the following steps: s1, inputting voice data of a standard control voice database marked with correct meanings; s2, considering the characteristics of civil aviation land-air communication, constructing an evaluation index system for controlling voice quality division; s3, qualitatively analyzing each evaluation index, including a technical analysis method and an index grading quantization unit; s4, grouping data obtained by analyzing the single evaluation index by adopting a clustering method, and specifying the range value of each voice grade under the single evaluation index; s5, weighting each index by adopting a weighted fusion algorithm to combine a plurality of control voice data sets with the grade quality; the apparatus includes at least one processor and at least one memory. The problem that the voice quality cannot be objectively and quantitatively analyzed and the corresponding relation between the voice quality and each evaluation index cannot be clearly controlled is solved.

Description

Quantitative evaluation and classification method and device for controlled voice quality division

Technical Field

The invention relates to the field of quality measurement of controlled voice data, in particular to a quantitative evaluation and classification method and device for controlled voice quality division.

Background

At present, the main Speech Quality evaluation methods mainly surround Speech Quality evaluation models such as MOS (Mean Opinion Score), PESQ (objective Speech Quality evaluation), PSQM (Perceptual Speech Quality evaluation), and the like, but the evaluation classification method is a very fuzzy evaluation method, which obtains an evaluation Score by mapping a machine learning algorithm and a neural network model according to a level standard determined by people in advance, and has large subjective factors and insufficient objectivity of an evaluation result. In addition, existing objective speech quality assessment methods focus on: the speech quality is represented without reference based on certain specific parameters or is represented by reference comparison based on signals, but the objective evaluation methods can only obtain comprehensive evaluation results, and similar to the black box test method, a set of relatively perfect evaluation index system is not formed in the objective speech evaluation process. The inability to perform objective quantitative analysis on speech quality or find out the measurement units of objective quantitative analysis is a great difficulty in later stage research on the performance of speech recognition software, because the recognition performances corresponding to different speech qualities are different under the same speech recognition software.

According to the above contents, the existing method for classifying the controlled speech quality is not objective enough, cannot objectively and quantitatively analyze the speech quality or find out a measure unit of the objective quantitative analysis, cannot clearly control the corresponding relationship between the speech quality and each evaluation index, and does not form a sound controlled speech evaluation index system.

Disclosure of Invention

The invention aims to solve the problems that the voice quality cannot be objectively and quantitatively analyzed and the corresponding relation between the voice quality and each evaluation index cannot be clearly controlled, and provides a quantitative evaluation and classification method and a device for controlling voice quality classification, which are designed and generated by performing voice data quality classification on a standard control voice database (containing audio and labeled text) established by a project and designing and generating test sets with different control types and different difficulty levels.

In order to achieve the above object, the present invention provides the following technical solutions:

a quantitative evaluation and classification method for controlling voice quality division comprises the following steps:

s1, inputting voice data of a standard control voice database marked with correct meanings;

s2, considering the characteristics of civil aviation land-air communication, constructing an evaluation index system for controlling voice quality division;

s3, qualitatively analyzing each evaluation index, including a technical analysis method and an index grading quantization unit;

s4, grouping data obtained by analyzing the single evaluation index by adopting a clustering method, and specifying the range value of each voice grade under the single evaluation index;

s5, weighting each evaluation index by adopting a weighted fusion algorithm to combine into a control voice data set with multiple levels of quality.

Preferably, in step S2, the evaluation index system for controlling the speech quality division includes an extremely large index, an intermediate index, an extremely small index, and a specific index, where the larger the value of the extremely large index is, the better the speech recognition effect is, including accents, the closer the value of the intermediate index is to a certain intermediate value, the better the speech recognition effect is, including speech speed, tone (pitch), and sound intensity, the smaller the value of the extremely small index is, the better the speech recognition effect is, including continuity, interference degree, professional term proportion, gray vocabulary content, and pitch change, and the when the value of the specific index is a certain value, the better the speech recognition effect is, including language category.

Preferably, the grouping the data analyzed by the single evaluation index by using the clustering method in step S4 to specify the range value of each voice level under the single evaluation index includes the following steps:

step S4-1: inputting the number of grades to be divided and a data set obtained by each single evaluation index analysis method;

step S4-2: and outputting the clustering result and each grade range.

Preferably, the method for outputting the clustering result in step S4-2 is implemented, and comprises the following steps:

step S4-2-1: determining the optimal category number by adopting an elbow method or a contour coefficient method;

step S4-2-2: initializing a class center value, calculating Euclidean distance from each sample point to each class center, assigning each sample to the class closest to the sample point to form a clustering result, wherein the Euclidean distance refers to the real distance between two points in an m-dimensional space or the natural length of a vector, and the formula is as follows:

wherein the content of the first and second substances,

representative sample points

To particle point

The distance of (a) to (b),

representing the kth property of the ith sample,

representing the kth attribute of the jth sample, and sharing m-dimensional attributes;

step S4-2-3: calculating the mean value of all samples in each category of the clustering result as a new clustering center;

step S4-2-4: taking the sum of the distances from the sample to the class center as a target function, and outputting if the iteration is converged or the stopping condition is met; otherwise, the number of the categories is +1, and the step S4-2-2 is returned to for repeated calculation;

step S4-2-5: the algorithm uses iterative computation, the global optimal solution is difficult to achieve, a heuristic strategy is adopted for the solution, and Nash equilibrium is utilized to achieve the optimal solution of the problem.

Preferably, the regulated speech quality is classified into 1-5 classes, with higher classes giving better speech quality.

Preferably, the weighting fusion algorithm adopted in step S5 includes a subjective weighting method and an objective weighting method, and the implementation steps are as follows:

step S5-1: the subjective weighting method is to use expert experience to adjust and optimize the weighting value objectively given by the evaluation index, and is to use a 1-9 scale method to carry out quantitative comparison on the importance degree of each index belonging to one level relative to the same index of the previous level to form a judgment matrix X, use a maximum eigenvector method to calculate the eigenvector of the corresponding characteristic root of the judgment matrix, and use the eigenvector as the weight of each index when the judgment matrix is checked to meet the consistency;

step S5-2: the objective weighting method comprises the following steps:

step S5-2-1: the indexes are converted into the positive indexes, namely the extremely small indexes and the intermediate indexes are converted into the extremely large indexes:

very small- > very large:

intermediate- > very large:

wherein the content of the first and second substances,

in order to identify the optimal value of the effect, the voice centralized value obtained by the evaluation index method is taken as the optimal value,

is a forward numerical value;

step S5-2-2: data normalization, dimensional error between balance indices:

wherein the content of the first and second substances,

the numerical value of the ith voice under the jth evaluation index is obtained;

step S5-2-3: data normalization, unifying to interval 0-1:

wherein n is the number of evaluation objects;

s5-2-4, calculating information entropy of each evaluation index

Information entropy of each evaluation index:

wherein n is the number of the evaluation objects, m is the number of the evaluation indexes, and the value of j is taken from 1 to m;

step S5-2-5: calculating the weight:

wherein the value of j is taken from 1 to m;

step S5-3: fusion of subjective and objective weights:

wherein n is the number of evaluation objects,

in order to be the subjective weight, the user can select,

is an objective weight;

step S5-4: each voice is given a composite score:

wherein the content of the first and second substances,

taking the value of i from 1 to n for the normalization of the value of the ith evaluation object under the jth evaluation index;

step S5-5: each controlled speech quality level score range:

the comprehensive score is calculated according to the evaluation method in the whole standard control voice database, the comprehensive score sequence of the whole database is divided according to 5 grades, the interval range of each grade is the score range of each quality grade, the interval range is 0-1, all evaluation indexes are subjected to forward processing, and therefore the quality is better when the comprehensive score value is larger, and the 5-grade quality is optimal.

A device for quantitative evaluation and classification for managing voice quality division comprises at least one processor and a memory which is in communication connection with the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the steps of the classification method.

Compared with the prior art, the invention has the following beneficial effects:

1. establishing a control voice quality evaluation index system, carrying out quantitative analysis on each evaluation index in the evaluation index system, realizing objective quantitative research on control voice quality, defining a metering unit for objective quantitative analysis on control voice quality, and acquiring a corresponding relation between control voice quality and each evaluation index;

2. according to quantitative analysis of evaluation indexes and a main empowerment fusion algorithm, an objective control voice quality division method is established, third-party control voice recognition software is tested according to the divided voice data sets with different quality grades, the aviation unit can conveniently select the control voice recognition software, and the air traffic control efficiency, safety, reliability and effectiveness are improved.

Drawings

FIG. 1 is a diagram of a partitioning technique for controlling speech quality;

FIG. 2 is a block diagram of a quantitative analysis structure for controlling speech quality classification;

FIG. 3 is a diagram of index classification;

FIG. 4 is a diagram illustrating the index values and the recognition effect trend;

FIG. 5 is a diagram of the effect of the first part of the index quantization grading;

FIG. 6 is a diagram of the effect of the second part of the index quantization step;

FIG. 7 is a roadmap for the assessment technique for governing speech quality ratings.

Detailed Description

The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.

Examples

The embodiment of the application collects control voice linguistic data from an actual environment of civil aviation operation by depending on a project background to establish a standard control voice database, wherein the linguistic content comprises multiple scenes, different controller pronunciations, different control instruction voices, different flight stages, an ultra-large amount of Liu Kongmo radio communication phrase vocabularies, single or mixed language pronunciations and other air management characteristics, a corresponding control instruction text is marked for each control voice audio in the database, and quantitative evaluation and classification are carried out on data in the database.

The implementation process and steps of the embodiment of the application are as follows, the flow block diagram is shown in fig. 1, and the quantitative analysis structure block diagram for controlling the voice quality division is shown in fig. 2:

s2, considering the characteristics of the land-air communication, constructing an evaluation index system for controlling voice quality division;

s5, weighting each evaluation index by adopting a weighted fusion algorithm to combine into a control voice data set with 5 levels of quality.

In step S2, the index is classified as shown in fig. 3:

the evaluation index system for controlling the voice quality division comprises an extremely large index, an intermediate index, an extremely small index and a specified index, wherein the larger the value of the extremely large index is, the better the voice recognition effect is, including accent, the closer the value of the intermediate index is to a certain intermediate value, the better the voice recognition effect is, including speech speed, tone (tone) and sound intensity, the smaller the value of the extremely small index is, the better the voice recognition effect is, including continuity, interference degree, professional term proportion, gray vocabulary content and sound variation, and when the value of the specified index is a certain value, the voice recognition effect is good, including language category, and the recognition effect of single-language voice is better than that of mixed-language voice.

The step of qualitatively analyzing each evaluation index system described in step S3 is as follows, the index quantitative grading effect graph is shown in fig. 5 and 6, and fig. 5 and 6 are two graphs divided from one whole graph:

step S3-1: the unit of speech rate quantization is word/second (Chinese), syllable/second (English), the speech rate analysis method includes the following steps:

step S3-1-1: performing framing, windowing and preprocessing on an input voice signal;

step S3-1-2: detecting an audio segment of valid speech; calculating the frame number of the effective audio frequency to obtain the effective pronunciation time;

step S3-1-3: processing a text corresponding to the audio to obtain the effective character number or vocabulary number of the audio text;

step S3-1-4: calculating the speech rate, speech rate = number of valid audio frames/number of syllables (or number of characters);

step S3-2: the tone (tone) quantization unit is the pitch change frequency, and the tone (tone) analysis method comprises the following steps:

s3-2-1, performing framing, windowing and preprocessing on the input voice signal, and filtering out other interference factors;

s3-2-2, performing Fourier transform on the preprocessed framing signals, and extracting time domain and frequency domain characteristic information of the voice waveform;

s3-2-3, directly estimating the waveform variation trend by a time domain and frequency domain estimation method of a voice waveform;

step S3-3: amplitude (dB) is taken as a sound intensity quantization unit, and the sound intensity analysis method comprises the following steps:

step S3-3-1: performing framing, windowing and preprocessing on an input voice signal;

step S3-3-2: obtaining each frequency and amplitude value through short-time Fourier transform and splitting an original signal;

step S3-3-3: performing normal distribution description on each amplitude value in the voice, and taking an expected value of the normal distribution as a sound intensity measurement value of the voice;

step S3-4: the accent quantization unit is similarity, and the accent analysis method comprises the following steps:

step S3-4-1: establishing a standard mandarin chinese phoneme library, and mapping different sound characteristics into corresponding phonemes;

step S3-4-2: extracting phonemes of the input speech by using a phoneme extraction algorithm;

step S3-4-3: comparing the difference of the standard pronunciation and the pronunciation of the accent of the input system, decoding the input voice by the acoustic model to obtain a voice characteristic sequence, comparing the voice characteristic sequence with a characteristic sequence of standard Mandarin, expressing the characteristic sequence by using a characteristic vector, and calculating the similarity between the two characteristic vectors;

step S3-5: the continuity quantification unit is the number of continuity abnormal segments in a piece of voice, and the continuity analysis method comprises the following steps:

step S3-5-1: preprocessing input voice;

step S3-5-2: removing the mute sections at the head end and the tail end of each voice by using an energy-based voice endpoint detection method, and marking out a continuity abnormal section in the effective voice;

step S3-5-3: the voice endpoint detection method based on the voice marks the part without the pronunciation of the speaker in the continuity abnormal section in the effective voice section in the step S3-5-2;

step S3-5-4: based on a context judgment algorithm, whether the phonetic segments marked in the step S3-5-3 belong to normal punctuation or the same phonetic segment is marked, and if the phonetic segments belong to the same phonetic segment, the duration of the phonetic segment is counted;

step S3-6: the interference degree quantization unit is a noise energy value, and the interference degree analysis method comprises the following steps:

step S3-6-1: carrying out short-time Fourier transform on input voice to respectively smooth a time domain and a frequency domain to obtain a short-time local energy spectrum value of the voice with noise;

step S3-6-2: the ratio of the energy spectrum value to the local minimum value is used as a threshold to remove the noise energy in the voice with noise;

step S3-6-3: continuously updating noise energy according to a threshold judgment result in a judgment process until an optimal noise reduction effect is obtained, wherein an energy value when the optimal noise reduction effect is obtained is used as an interference degree;

step S3-7: the unit of the language category quantization is the language category (Chinese-0, english-1, chinese-English hybrid-2).

The index value and the recognition effect are shown in fig. 4:

the higher the accent similarity of the voice data is, the better the voice recognition effect is when the speech rate of the voice data, the gene frequency of the tone (tone), and the amplitude of the sound intensity are, and at a certain intermediate value, the voice recognition effect is, the lower the number of abnormal segments, the noise energy, the occupation ratio of the professional terms, the number of gray vocabulary contents, and the number of pitch changes of the voice data are, the better the voice recognition effect is, and when the language category of the voice data is 1, the best the voice recognition effect is.

The language category analysis method comprises the following steps:

step S3-7-1: building Chinese and English speech recognizers, wherein each speech recognizer pertinently contains speech characteristics of respective language;

step S3-7-2: extracting the characteristics of the input voice, matching the characteristics with the voice characteristics of various languages, and determining the category of the voice language;

step S3-8: the quantitative unit of the professional term is the proportion of civil aviation professional terms in a voice text, and the analysis method of the proportion of the professional terms comprises the following steps:

step S3-8-1: acquiring correct texts corresponding to each text in a controlled voice database (based on manual labeling/semi-automation);

step S3-8-2: performing text sentence breaking, word segmentation, character discrimination and other processing by using a text analysis algorithm;

step S3-8-3: establishing a control instruction professional term dictionary by referring to the air traffic radio communication term, matching the vocabulary extracted in the step S3-8-2 with the dictionary through a matching algorithm, and counting the number of matched words as the voice professional term content;

step S3-9: the grey vocabulary content quantization unit is grey vocabulary content, and the grey vocabulary content analysis method comprises the following steps:

step S3-9-1: training an acoustic model by adopting an induced word bank;

step S3-9-2: performing framing, windowing and preprocessing on input voice, and extracting voice characteristics;

step S3-9-3: the acoustic model in the step S3-9-1 receives the voice characteristics in the step S3-9-2, detects an audio segment of input voice containing a sensed word, establishes a gating mechanism by combining a context discrimination algorithm, discriminates the sensed word and determines whether the audio segment is reserved or not;

step S3-9-4: labeling the voice frequency bands of the sensed voice words detected as meaningless, and counting the number of the meaningless voice frequency bands in the whole voice;

step S3-10: the sound change quantization unit is the number of sound changes, and the sound change analysis method comprises the following steps:

step S3-10-1: constructing a complete polyphone dictionary and a merged vocabulary library which is easy to change;

step S3-10-2: acquiring correct texts corresponding to each text in a controlled voice database (based on manual labeling/semi-automation);

step S3-10-3: performing word segmentation, part of speech tagging, character discrimination and other processing by using a text analysis algorithm;

step S3-10-4: matching the controlled voice text with the polyphone dictionary and the merged vocabulary in the step S3-10-1 by adopting a matching algorithm, and counting the polyphone and vocabulary number contained in the text.

The implementation of the content in step S4 includes the following steps:

step S4-1: inputting the number of grades to be divided and a data set ({ x) obtained by each single evaluation index analysis method ₁ x ₂ x ₃ ,…,x _n N is the number of data in the data set);

step S4-2: outputting clustering results and all grade ranges;

the method for grouping the data obtained by analyzing the single evaluation index by adopting the clustering method in the step S4 and specifying the range value of each voice grade under the single evaluation index comprises the following steps:

step S4-2-1: since the voice data is classified into classes which are not previously specified (the classes refer to the classes classified based on a certain voice evaluation index and the concept that the quality classes mentioned in the patent are different), the optimal class number k (i.e. the number of the points to be gathered and the set of the centroids { c) is determined by the elbow method or the contour coefficient method for the acquired data set ₁ c ₂ c ₃ ,…c _k |c _i May or may not be intra-dataset values });

step S4-2-2: initialized particle (class center) value x _j∈ {c ₁ c ₂ c ₃ ,…c _k Calculating Euclidean distance from each sample point to the center of each class, assigning each sample to the class closest to the sample point to form a clustering result, wherein the Euclidean distance refers to the real distance between two points in the m-dimensional spaceThe distance, or the natural length of the vector, is given by the formula:

wherein the content of the first and second substances,

representative sample points

To particle point

The distance of (a) to (b),

representing the kth property of the ith sample,

the kth attribute representing the jth sample has m-dimensional attributes, and in the invention, each evaluation index is analyzed to obtain a one-dimensional data value, so that m =1, and the calculation formula of the Euclidean distance in the invention is as follows:

step S4-2-3: updating a clustering central point: calculating the mean value of all samples in the cluster as a new cluster center for the cluster result;

step S4-2-4: updating a clustering central point: taking the sum of the distances from the sample to the class center as a target function, and outputting if iteration converges or meets a stopping condition; otherwise, the number of the categories is +1, and the step S4-2-2 is returned to for repeated calculation;

step S4-2-5: iterative computation is used in the algorithm, the global optimal solution is difficult to achieve, and a heuristic strategy can be adopted to search for Nash equilibrium and the problem optimal solution.

The step of controlling the quality rating of the voice is as follows, and a route diagram of the controlling quality rating technology is shown in fig. 7:

according to the qualification approval rules of civil aircraft drivers, flight instructors and ground instructors (CCAR-61 division), < MH/T4014-2003 radio communication words for air traffic and guidance of people in the civil aviation field, 1-5 grades of controlled voice quality are innovatively proposed, the higher the grade is, the better the voice quality is, the 5 grades are the highest quality, and the judgment standards of all grades are as follows:

1) Level 1: the occupation ratio of the control professional vocabularies is small; too fast or too slow a speech rate; chinese-English mixed pronunciation voice; under the influence of own native language or region, mandarin chinese has a few accents; vocabulary audio (polyphones, homophones) containing misleading semantic understanding; the interference of transmission channels, ambient noise and the like is large;

2) And 2, stage: the occupation rate of the control professional vocabularies is low; too fast or too slow a speech rate; a small amount of gray words and Chinese-English mixed pronunciation exist; affected by own native language or region, mandarin with slight accent; lexical audio with individual misleading semantic understandings;

3) And 3, level: the occupation rate of the professional vocabularies of control is common; the speed of speech is normal; pronunciation of words without gray color; single language voice; speech signals occasionally pause; the audio frequency is interfered to a lesser extent; no accent;

4) 4, level: the control professional vocabulary accounts for a large rate; the speech definition is good; the speed of speech is normal; the voice is fluent; there are individual misleading semantic understanding vocabulary audios;

5) And 5, stage: the ratio of the control professional vocabularies is high; the interference degree is small; the voice is fluent; standard of mandarin pronunciation; lexical audio without misleading semantic understanding.

Step S5-1: the subjective weighting method is to use expert experience to adjust and optimize the weighting value objectively assigned by evaluation indexes, so that the weighting is more scientific and reasonable, thereby realizing the quantitative and visual display of the quality condition of the controlled voice, and the method is to use a 1-9 scale method to carry out quantitative comparison on every two of the importance degrees of each index belonging to the same level relative to the same index of the same level on the same level to form a judgment matrix X, use a maximum eigenvector method to calculate the eigenvector of the corresponding characteristic root of the judgment matrix, and use the eigenvector as the weight of each index when the judgment matrix is checked to meet the consistency;

step S5-2: the objective weighting method comprises the following steps:

very small- > very large:

intermediate- > very large:

wherein the content of the first and second substances,

is a forward numerical value;

step S5-2-2: data normalization, dimensional error between balance indices:

wherein the content of the first and second substances,

step S5-2-3: data normalization, unifying to interval 0-1:

wherein n is the number of evaluation objects;

s5-2-4, calculating information entropy of each evaluation index

Information entropy of each evaluation index:

wherein n is the number of evaluation objects, m is the number of evaluation indexes, and the value of j is taken from 1 to m;

step S5-2-5: calculating the weight:

wherein the value of j is taken from 1 to m;

step S5-3: fusion of subjective and objective weights:

wherein n is the number of evaluation objects,

in order to be the subjective weight, the user can select,

is an objective weight;

step S5-4: the comprehensive score of each voice is as follows:

wherein the content of the first and second substances,

step S5-5: each controlled speech quality level score range:

the comprehensive score is calculated by the whole standard control voice database according to the evaluation method, the comprehensive score sequence of the whole database is divided according to 5 grades, the interval range of each grade is the score range of each quality grade, the interval range is 0 to 1, all evaluation indexes are subjected to forward processing, and therefore the quality is better when the comprehensive score value is larger, and the quality of 5 grades is optimal.

A quantitative evaluation and classification device for voice quality control division adopts a Core i7-12700 processor, a memory adopts a Samsung 980 PRO 1T solid state disk and 4 NVIDIA P40 GPUs to accelerate the processing speed of relevant steps.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A quantitative evaluation and classification method for controlling voice quality division is characterized by comprising the following steps:

and S5, weighting each evaluation index by adopting a weighted fusion algorithm to combine a plurality of control voice data sets with the grade quality.

2. The quantitative evaluation and classification method for controlling speech quality division according to claim 1, wherein in step S2, the evaluation index system for controlling speech quality division includes an ultra-large index, an intermediate index, an ultra-small index, and a specified index, the ultra-large index has a larger value, and a better speech recognition effect including accents, the intermediate index has a value closer to a certain intermediate value, and the speech recognition effect including speech speed, pitch, and pitch is better, the ultra-small index has a smaller value, and the speech recognition effect including continuity, interference degree, professional term proportion, gray vocabulary content, and pitch change is better, and the specified index has a value, and the speech recognition effect includes a language category.

3. The method for quantitatively evaluating and classifying controlled speech quality division according to claim 1, wherein the step S4 of grouping the data analyzed by the single evaluation index by using the clustering method to specify the range value of each speech level under the single evaluation index comprises the following steps:

step S4-2: and outputting the clustering result and each grade range.

4. The quantitative evaluation and classification method for controlling speech quality classification according to claim 3, wherein the method for outputting the clustering result in step S4-2 comprises the following steps:

wherein, the first and the second end of the pipe are connected with each other,

representative sample points

To particle point

The distance of (a) to (b),

representing the kth property of the ith sample,

step S4-2-4: taking the sum of the distances from the sample to the class center as a target function, and outputting if iteration converges or meets a stopping condition; otherwise, the number of the categories is +1, and the step S4-2-2 is returned to for repeated calculation;

5. The method as claimed in claim 1, wherein the classification of the controlled speech quality is classified into 1-5 grades, and the higher the grade is, the better the speech quality is.

6. The method as claimed in claim 2, wherein the weighting fusion algorithm used in step S5 includes a subjective weighting method and an objective weighting method, and the method includes the following steps:

step S5-2: the objective weighting method comprises the following steps:

very small- > very large:

intermediate- > very large:

wherein the content of the first and second substances,

is a forward numerical value;

step S5-2-2: data normalization, dimensional error between balance indices:

wherein the content of the first and second substances,

the value of the ith voice under the jth evaluation index is obtained;

step S5-2-3: data normalization, unifying to interval 0-1:

wherein n is the number of evaluation objects;

s5-2-4, calculating information entropy of each evaluation index

Information entropy of each evaluation index:

step S5-2-5: calculating the weight:

wherein the value of j is taken from 1 to m;

step S5-3: fusion of subjective and objective weights:

wherein n is the number of evaluation objects,

in order to be the subjective weight, the user can select,

is an objective weight;

step S5-4: each voice is given a composite score:

wherein the content of the first and second substances,

step S5-5: each controlled speech quality level score range:

7. The device for quantitatively evaluating and classifying the quality division of the control voice is characterized by comprising at least one processor and a memory which is in communication connection with the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the classification method of claim 1.