CN115547299A - Quantitative evaluation and classification method and device for controlled voice quality division - Google Patents

Quantitative evaluation and classification method and device for controlled voice quality division

Info

Publication number
CN115547299A
CN115547299A (application CN202211469949.7A)
Authority
CN
China
Prior art keywords
index
voice
value
evaluation
evaluation index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211469949.7A
Other languages
Chinese (zh)
Other versions
CN115547299B (en)
Inventor
潘卫军
张坚
蒋培元
蒋倩兰
王泆棣
张玉梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation Flight University of China
Original Assignee
Civil Aviation Flight University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation Flight University of China filed Critical Civil Aviation Flight University of China
Priority to CN202211469949.7A priority Critical patent/CN115547299B/en
Publication of CN115547299A publication Critical patent/CN115547299A/en
Application granted granted Critical
Publication of CN115547299B publication Critical patent/CN115547299B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/01 - Assessment or evaluation of speech recognition systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/60 - Speech or voice analysis techniques specially adapted for measuring the quality of voice signals
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a quantitative evaluation and classification method and device for controlled voice quality division. The method comprises the following steps: S1, inputting voice data from a standard control voice database annotated with correct meanings; S2, constructing an evaluation index system for control voice quality division that takes the characteristics of civil aviation land-air communication into account; S3, qualitatively analyzing each evaluation index, including its technical analysis method and its grading quantization unit; S4, grouping the data produced by each single evaluation index with a clustering method, and specifying the value range of each voice grade under that index; S5, weighting the evaluation indices with a weighted fusion algorithm and combining them into a control voice data set with multiple quality grades. The device comprises at least one processor and at least one memory. The invention solves the problem that control voice quality could not be analyzed objectively and quantitatively and that the correspondence between voice quality and each evaluation index was unclear.

Description

Quantitative evaluation and classification method and device for controlled voice quality division
Technical Field
The invention relates to the field of quality measurement of controlled voice data, in particular to a quantitative evaluation and classification method and device for controlled voice quality division.
Background
At present, mainstream speech quality evaluation revolves around models such as MOS (Mean Opinion Score), PESQ (Perceptual Evaluation of Speech Quality), and PSQM (Perceptual Speech Quality Measure). These are rather fuzzy evaluation methods: they map speech to an evaluation score with machine learning algorithms and neural network models according to grade standards determined by people in advance, so they carry large subjective factors and their results are insufficiently objective. In addition, existing objective speech quality assessment methods either characterize quality without a reference, based on certain specific parameters, or characterize it by comparison against a reference signal. Such objective methods yield only a comprehensive overall result, much like black-box testing, and no relatively complete evaluation index system has been formed for objective speech evaluation. The inability to analyze speech quality objectively and quantitatively, or to identify measurement units for such analysis, is a major obstacle to later research on speech recognition software performance, because the same recognition software performs differently on speech of different quality.
In summary, the existing methods for classifying control speech quality are not objective enough: they cannot analyze speech quality objectively and quantitatively or identify measurement units for such analysis, they cannot clarify the correspondence between control speech quality and each evaluation index, and no sound control speech evaluation index system has been formed.
Disclosure of Invention
The invention aims to solve the problems that voice quality cannot be objectively and quantitatively analyzed and that the correspondence between control voice quality and each evaluation index is unclear. It provides a quantitative evaluation and classification method and device for control voice quality division, which classify the voice data quality of a standard control voice database (containing audio and annotated text) established by the project, and design and generate test sets of different control types and different difficulty grades.
In order to achieve the above object, the present invention provides the following technical solutions:
a quantitative evaluation and classification method for controlling voice quality division comprises the following steps:
s1, inputting voice data of a standard control voice database marked with correct meanings;
s2, considering the characteristics of civil aviation land-air communication, constructing an evaluation index system for controlling voice quality division;
s3, qualitatively analyzing each evaluation index, including a technical analysis method and an index grading quantization unit;
s4, grouping data obtained by analyzing the single evaluation index by adopting a clustering method, and specifying the range value of each voice grade under the single evaluation index;
s5, weighting each evaluation index by adopting a weighted fusion algorithm to combine into a control voice data set with multiple levels of quality.
Preferably, in step S2, the evaluation index system for control voice quality division includes extremely-large indices, intermediate indices, extremely-small indices, and specified-value indices. For an extremely-large index, a larger value means a better speech recognition effect; this class includes accent. For an intermediate index, a value closer to a certain intermediate value means a better recognition effect; this class includes speech rate, tone (pitch), and sound intensity. For an extremely-small index, a smaller value means a better recognition effect; this class includes continuity, interference degree, professional term proportion, gray vocabulary content, and sound change. For a specified-value index, the recognition effect is best when the value equals a particular value; this class includes language category.
Preferably, in step S4, grouping the data obtained from each single-evaluation-index analysis with the clustering method and specifying the value range of each voice grade under that index comprises the following steps:
step S4-1: inputting the number of grades to be divided and a data set obtained by each single evaluation index analysis method;
step S4-2: and outputting the clustering result and each grade range.
Preferably, the method for outputting the clustering result in step S4-2 comprises the following steps:
step S4-2-1: determining the optimal category number by adopting an elbow method or a contour coefficient method;
step S4-2-2: initializing the class-center values, calculating the Euclidean distance from each sample point to each class center, and assigning each sample to the closest class to form a clustering result, wherein the Euclidean distance is the true distance between two points in m-dimensional space, or the natural length of a vector, given by the formula:

$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - c_{jk})^2}$$

wherein $d(x_i, c_j)$ is the distance from sample point $x_i$ to centroid $c_j$, $x_{ik}$ is the kth attribute of the ith sample, $c_{jk}$ is the kth attribute of the jth centroid, and there are m attribute dimensions;
step S4-2-3: calculating the mean value of all samples in each category of the clustering result as a new clustering center;
step S4-2-4: taking the sum of the distances from the samples to their class centers as the objective function, and outputting the result if the iteration converges or a stopping condition is met; otherwise incrementing the number of categories by 1 and returning to step S4-2-2 for another round of calculation;
step S4-2-5: since the algorithm is iterative and a global optimum is difficult to reach, a heuristic strategy is adopted, using Nash equilibrium to approach the optimal solution of the problem.
Preferably, the control voice quality is classified into grades 1-5, with higher grades indicating better voice quality.
Preferably, the weighting fusion algorithm adopted in step S5 includes a subjective weighting method and an objective weighting method, and the implementation steps are as follows:
step S5-1: the subjective weighting method uses expert experience to adjust and optimize the objectively assigned index weights: using a 1-9 scale, the importance of each pair of indices at one level is compared quantitatively with respect to the corresponding index of the level above to form a judgment matrix X; the eigenvector corresponding to the largest eigenvalue of the judgment matrix is calculated by the maximum-eigenvector method, and once the judgment matrix passes the consistency check, this eigenvector is taken as the weight vector of the indices;
step S5-2: the objective weighting method comprises the following steps:
step S5-2-1: converting the indices to positive form, i.e., converting the extremely-small and intermediate indices into extremely-large indices:

extremely small -> extremely large:

$$\tilde{x}_i = \max_i(x_i) - x_i$$

intermediate -> extremely large:

$$\tilde{x}_i = 1 - \frac{\lvert x_i - x_{\mathrm{best}} \rvert}{\max_i \lvert x_i - x_{\mathrm{best}} \rvert}$$

wherein $x_{\mathrm{best}}$ is the value optimal for the recognition effect, taken as the central value of the speech set obtained by the evaluation index method, and $\tilde{x}_i$ is the positivized value;
step S5-2-2: standardizing the data to balance dimensional differences between indices:

$$\tilde{x}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$

wherein $x_{ij}$ is the value of the ith voice under the jth evaluation index;
step S5-2-3: normalizing the data to the interval 0-1:

$$p_{ij} = \frac{\tilde{x}_{ij}}{\sum_{i=1}^{n} \tilde{x}_{ij}}$$

wherein n is the number of evaluation objects;
s5-2-4, calculating information entropy of each evaluation index
$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$

wherein n is the number of evaluation objects, m is the number of evaluation indices, and j runs from 1 to m;
step S5-2-5: calculating the objective weight of each index:

$$\beta_j = \frac{1 - e_j}{\sum_{j=1}^{m} (1 - e_j)}$$

wherein j runs from 1 to m;
step S5-3: fusing the subjective and objective weights:

$$w_j = \frac{\alpha_j \beta_j}{\sum_{j=1}^{m} \alpha_j \beta_j}$$

wherein $\alpha_j$ is the subjective weight and $\beta_j$ is the objective weight of the jth evaluation index;
step S5-4: computing the composite score of each voice:

$$S_i = \sum_{j=1}^{m} w_j\, p_{ij}$$

wherein $p_{ij}$ is the normalized value of the ith evaluation object under the jth evaluation index, and i runs from 1 to n;
step S5-5: score range of each control voice quality grade:
the composite score is calculated over the entire standard control voice database according to the above evaluation method, and the sorted composite scores of the whole database are divided into 5 grades; the interval of each grade is the score range of that quality grade, and the scores lie within 0-1. Because all evaluation indices have been positivized, a larger composite score means better quality, and grade 5 is the best.
A device for quantitative evaluation and classification for managing voice quality division comprises at least one processor and a memory which is in communication connection with the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the steps of the classification method.
Compared with the prior art, the invention has the following beneficial effects:
1. A control voice quality evaluation index system is established, each evaluation index in the system is analyzed quantitatively, objective quantitative research on control voice quality is realized, the measurement units for objective quantitative analysis of control voice quality are defined, and the correspondence between control voice quality and each evaluation index is obtained;
2. Based on the quantitative analysis of the evaluation indices and the subjective-objective weighting fusion algorithm, an objective control voice quality division method is established. Third-party control voice recognition software can be tested with the divided voice data sets of different quality grades, which helps aviation units select control voice recognition software and improves the efficiency, safety, reliability and effectiveness of air traffic control.
Drawings
FIG. 1 is a diagram of a partitioning technique for controlling speech quality;
FIG. 2 is a block diagram of a quantitative analysis structure for controlling speech quality classification;
FIG. 3 is a diagram of index classification;
FIG. 4 is a diagram illustrating the index values and the recognition effect trend;
FIG. 5 is a diagram of the effect of the first part of the index quantization grading;
FIG. 6 is a diagram of the effect of the second part of the index quantization step;
FIG. 7 is a roadmap for the assessment technique for governing speech quality ratings.
Detailed Description
The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
Examples
Relying on the project background, the embodiment of the application collects control voice corpora from the actual environment of civil aviation operation to establish a standard control voice database. The corpus covers air-traffic-control characteristics such as multiple scenes, different controller pronunciations, different control instruction voices, different flight phases, a very large vocabulary of land-air radiotelephony communication phrases, and single-language or mixed-language pronunciation. Each control voice audio file in the database is annotated with its corresponding control instruction text, and the data in the database are quantitatively evaluated and classified.
The implementation process and steps of the embodiment of the application are as follows, the flow block diagram is shown in fig. 1, and the quantitative analysis structure block diagram for controlling the voice quality division is shown in fig. 2:
s1, inputting voice data of a standard control voice database marked with correct meanings;
s2, considering the characteristics of the land-air communication, constructing an evaluation index system for controlling voice quality division;
s3, qualitatively analyzing each evaluation index, including a technical analysis method and an index grading quantization unit;
s4, grouping data obtained by analyzing the single evaluation index by adopting a clustering method, and specifying the range value of each voice grade under the single evaluation index;
s5, weighting each evaluation index by adopting a weighted fusion algorithm to combine into a control voice data set with 5 levels of quality.
In step S2, the index is classified as shown in fig. 3:
the evaluation index system for control voice quality division includes extremely-large indices, intermediate indices, extremely-small indices, and specified-value indices. For an extremely-large index, a larger value means a better speech recognition effect; this class includes accent. For an intermediate index, a value closer to a certain intermediate value means a better recognition effect; this class includes speech rate, tone (pitch), and sound intensity. For an extremely-small index, a smaller value means a better recognition effect; this class includes continuity, interference degree, professional term proportion, gray vocabulary content, and sound change. For a specified-value index, the recognition effect is best when the value equals a particular value; this class includes language category, where single-language speech is recognized better than mixed-language speech.
The qualitative analysis of each evaluation index described in step S3 proceeds as follows. The index quantization and grading effect is shown in figs. 5 and 6, which are two halves of one whole diagram:
step S3-1: the speech rate quantization unit is characters/second (Chinese) or syllables/second (English); the speech rate analysis method comprises the following steps:
step S3-1-1: performing framing, windowing and preprocessing on an input voice signal;
step S3-1-2: detecting an audio segment of valid speech; calculating the frame number of the effective audio frequency to obtain the effective pronunciation time;
step S3-1-3: processing a text corresponding to the audio to obtain the effective character number or vocabulary number of the audio text;
step S3-1-4: calculating the speech rate: speech rate = number of characters (or syllables) / effective pronunciation time, where the effective pronunciation time is derived from the number of effective audio frames;
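As an illustration of steps S3-1-1 to S3-1-4, the following Python sketch computes a speech rate from an audio file and its annotated transcript. It is a minimal sketch under stated assumptions rather than the patent's implementation: the energy-based voice activity threshold, the frame sizes, and the use of librosa for loading are all assumed.

```python
import numpy as np
import librosa  # assumed to be available for audio loading

def speech_rate(wav_path: str, transcript: str, target_sr: int = 16000) -> float:
    """Estimate speech rate in characters (or syllables) per second of effective speech."""
    y, sr = librosa.load(wav_path, sr=target_sr)              # S3-1-1: load the signal
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)         # 25 ms frames, 10 ms hop
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)
    energy = (frames ** 2).sum(axis=0)                        # short-time energy per frame
    voiced = energy > 0.1 * energy.mean()                     # S3-1-2: effective frames (assumed threshold)
    effective_seconds = voiced.sum() * hop / sr               # effective pronunciation time
    n_units = len(transcript.replace(" ", ""))                # S3-1-3: characters in the labeled text
    return n_units / max(effective_seconds, 1e-6)             # S3-1-4: units per second
```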
step S3-2: the tone (pitch) quantization unit is the pitch change frequency; the tone analysis method comprises the following steps:
s3-2-1, performing framing, windowing and preprocessing on the input voice signal, and filtering out other interference factors;
s3-2-2, performing Fourier transform on the preprocessed framing signals, and extracting time domain and frequency domain characteristic information of the voice waveform;
s3-2-3, directly estimating the waveform variation trend by a time domain and frequency domain estimation method of a voice waveform;
step S3-3: amplitude (dB) is taken as a sound intensity quantization unit, and the sound intensity analysis method comprises the following steps:
step S3-3-1: performing framing, windowing and preprocessing on an input voice signal;
step S3-3-2: obtaining each frequency and amplitude value through short-time Fourier transform and splitting an original signal;
step S3-3-3: performing normal distribution description on each amplitude value in the voice, and taking an expected value of the normal distribution as a sound intensity measurement value of the voice;
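A minimal sketch of the intensity measure in steps S3-3-1 to S3-3-3: pool the short-time spectral amplitudes and report their mean, which is the expectation of a fitted normal distribution, in decibels. The STFT window length and the dB reference are assumptions.

```python
import numpy as np
from scipy.signal import stft

def intensity_db(y: np.ndarray, sr: int) -> float:
    _, _, Z = stft(y, fs=sr, nperseg=512)      # S3-3-2: split signal into frequencies/amplitudes
    amps = np.abs(Z).ravel()
    mu = amps.mean()                           # S3-3-3: expected value of the amplitude distribution
    return 20.0 * np.log10(mu + 1e-12)         # intensity in dB (reference value assumed to be 1)
```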
step S3-4: the accent quantization unit is similarity, and the accent analysis method comprises the following steps:
step S3-4-1: establishing a standard Mandarin Chinese phoneme library, and mapping different sound characteristics to corresponding phonemes;
step S3-4-2: extracting phonemes of the input speech by using a phoneme extraction algorithm;
step S3-4-3: comparing the difference between the standard pronunciation and the accented pronunciation input to the system: the acoustic model decodes the input voice into a voice feature sequence, which is compared with the feature sequence of standard Mandarin; both sequences are expressed as feature vectors, and the similarity between the two feature vectors is calculated;
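The accent-similarity comparison of step S3-4-3 can be sketched as a cosine similarity between averaged spectral feature vectors of the input voice and a standard-Mandarin reference. Using mean MFCC vectors instead of the patent's decoded phoneme feature sequences is an assumed simplification.

```python
import numpy as np
import librosa  # assumed available for MFCC extraction

def accent_similarity(y_input: np.ndarray, y_reference: np.ndarray, sr: int) -> float:
    """Cosine similarity in [-1, 1]; higher means closer to the standard pronunciation."""
    f_in = librosa.feature.mfcc(y=y_input, sr=sr, n_mfcc=13).mean(axis=1)
    f_ref = librosa.feature.mfcc(y=y_reference, sr=sr, n_mfcc=13).mean(axis=1)
    return float(np.dot(f_in, f_ref) /
                 (np.linalg.norm(f_in) * np.linalg.norm(f_ref) + 1e-12))
```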
step S3-5: the continuity quantification unit is the number of continuity abnormal segments in a piece of voice, and the continuity analysis method comprises the following steps:
step S3-5-1: preprocessing input voice;
step S3-5-2: removing the mute sections at the head end and the tail end of each voice by using an energy-based voice endpoint detection method, and marking out a continuity abnormal section in the effective voice;
step S3-5-3: using a speech-based voice endpoint detection method, marking the portions within the continuity-abnormal segments of the effective speech from step S3-5-2 where the speaker is not vocalizing;
step S3-5-4: based on a context judgment algorithm, determining whether the segments marked in step S3-5-3 are normal pauses (punctuation) or belong to the same speech segment, and if they belong to the same speech segment, accumulating the segment duration;
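The continuity count of steps S3-5-1 to S3-5-4 can be sketched as follows, assuming a per-frame voice-activity sequence is already available (for example from an energy-based detector as above). The 0.5 s minimum gap that qualifies as a continuity-abnormal segment is an assumed parameter, and the context judgment of step S3-5-4 is not modeled here.

```python
import numpy as np

def abnormal_pauses(voiced: np.ndarray, hop_s: float, min_gap_s: float = 0.5) -> int:
    """voiced: boolean per-frame VAD decisions; hop_s: frame hop in seconds."""
    idx = np.flatnonzero(voiced)
    if idx.size == 0:
        return 0
    core = voiced[idx[0]:idx[-1] + 1]          # S3-5-2: drop leading/trailing silence
    count, gap = 0, 0
    for v in core:
        if v:
            if gap * hop_s >= min_gap_s:       # S3-5-4: a long internal gap counts once
                count += 1
            gap = 0
        else:
            gap += 1                           # S3-5-3: frames with no speaker vocalization
    return count
```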
step S3-6: the interference degree quantization unit is a noise energy value, and the interference degree analysis method comprises the following steps:
step S3-6-1: carrying out short-time Fourier transform on input voice to respectively smooth a time domain and a frequency domain to obtain a short-time local energy spectrum value of the voice with noise;
step S3-6-2: the ratio of the energy spectrum value to the local minimum value is used as a threshold to remove the noise energy in the voice with noise;
step S3-6-3: continuously updating the noise energy according to the threshold judgment results during the decision process until the optimal noise reduction effect is obtained; the energy value at the optimal noise reduction effect is taken as the interference degree;
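A rough sketch of the interference-degree estimate in steps S3-6-1 to S3-6-3: smooth the noisy-speech power spectrum along time, approximate the noise floor with a local-minimum trace, and pool the bins below a threshold ratio of that floor into a noise energy value. The smoothing window sizes and the threshold ratio are assumptions, and the iterative threshold update of step S3-6-3 is reduced to a single pass.

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import uniform_filter1d, minimum_filter1d

def noise_energy(y: np.ndarray, sr: int, ratio: float = 2.0) -> float:
    _, _, Z = stft(y, fs=sr, nperseg=512)                 # S3-6-1: short-time energy spectrum
    power = np.abs(Z) ** 2
    smooth = uniform_filter1d(power, size=9, axis=1)      # smooth along the time axis
    floor = minimum_filter1d(smooth, size=31, axis=1)     # local-minimum noise floor
    noise_mask = smooth < ratio * floor                   # S3-6-2: bins judged to be noise
    if not noise_mask.any():
        return float(power.min())
    return float(power[noise_mask].mean())                # noise energy as the interference degree
```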
step S3-7: the language category quantization unit is the language category (Chinese = 0, English = 1, Chinese-English mixed = 2).
The relationship between index values and recognition effect trends is shown in fig. 4:
the higher the accent similarity of the voice data, the better the speech recognition effect; the recognition effect is best when the speech rate, the pitch (fundamental) frequency of the tone, and the amplitude of the sound intensity are at certain intermediate values; the smaller the number of continuity-abnormal segments, the noise energy, the professional term proportion, the amount of gray vocabulary, and the number of sound changes, the better the recognition effect; and the recognition effect is best when the language category of the voice data is a single language rather than mixed.
The language category analysis method comprises the following steps:
step S3-7-1: building Chinese and English speech recognizers, each containing speech characteristics specific to its own language;
step S3-7-2: extracting the characteristics of the input voice, matching the characteristics with the voice characteristics of various languages, and determining the category of the voice language;
step S3-8: the quantitative unit of the professional term is the proportion of civil aviation professional terms in a voice text, and the analysis method of the proportion of the professional terms comprises the following steps:
step S3-8-1: acquiring correct texts corresponding to each text in a controlled voice database (based on manual labeling/semi-automation);
step S3-8-2: performing text sentence breaking, word segmentation, character discrimination and other processing by using a text analysis algorithm;
step S3-8-3: establishing a control instruction professional term dictionary with reference to air traffic radiotelephony communication phraseology, matching the vocabulary extracted in step S3-8-2 against the dictionary with a matching algorithm, and counting the number of matched words as the professional term content of the voice;
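Steps S3-8-1 to S3-8-3 reduce to dictionary matching over a segmented transcript, as sketched below. The jieba segmenter and the two placeholder dictionary entries are assumptions standing in for the project's radiotelephony term dictionary.

```python
import jieba  # assumed Chinese word segmenter

ATC_TERMS = {"跑道", "高度保持"}  # placeholder entries for the control-term dictionary (S3-8-3)

def term_ratio(transcript: str) -> float:
    words = [w for w in jieba.cut(transcript) if w.strip()]  # S3-8-2: sentence breaking / segmentation
    hits = sum(1 for w in words if w in ATC_TERMS)           # S3-8-3: match against the dictionary
    return hits / max(len(words), 1)                         # professional-term proportion
```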
step S3-9: the gray vocabulary quantization unit is the gray vocabulary content, and the gray vocabulary analysis method comprises the following steps:
step S3-9-1: training an acoustic model with a misleading-word (gray vocabulary) lexicon;
step S3-9-2: performing framing, windowing and preprocessing on the input voice, and extracting voice features;
step S3-9-3: the acoustic model from step S3-9-1 receives the voice features from step S3-9-2, detects the audio segments of the input voice that contain misleading words, and, combined with a context discrimination algorithm, establishes a gating mechanism that judges each misleading word and decides whether the segment is retained;
step S3-9-4: labeling the audio segments whose detected words are judged meaningless, and counting the number of such meaningless segments in the whole voice;
step S3-10: the sound change quantization unit is the number of sound changes, and the sound change analysis method comprises the following steps:
step S3-10-1: constructing a complete polyphone dictionary and a lexicon of merged words prone to sound change;
step S3-10-2: acquiring correct texts corresponding to each text in a controlled voice database (based on manual labeling/semi-automation);
step S3-10-3: performing word segmentation, part of speech tagging, character discrimination and other processing by using a text analysis algorithm;
step S3-10-4: matching the controlled voice text with the polyphone dictionary and the merged vocabulary in the step S3-10-1 by adopting a matching algorithm, and counting the polyphone and vocabulary number contained in the text.
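Steps S3-10-1 to S3-10-4 are again dictionary matching; below is a sketch under the same assumptions as the term-ratio example (jieba segmentation, placeholder dictionary entries).

```python
import jieba  # assumed Chinese word segmenter

POLYPHONES = {"行", "重"}        # placeholder polyphone entries (S3-10-1)
MERGED_WORDS = {"不用"}          # placeholder merged words prone to sound change

def sound_change_count(transcript: str) -> int:
    words = list(jieba.cut(transcript))                       # S3-10-3: segmentation
    char_hits = sum(1 for ch in transcript if ch in POLYPHONES)
    word_hits = sum(1 for w in words if w in MERGED_WORDS)    # S3-10-4: dictionary matching
    return char_hits + word_hits
```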
The implementation of the content in step S4 includes the following steps:
step S4-1: inputting the number of grades to be divided and the data set $\{x_1, x_2, x_3, \ldots, x_n\}$ obtained by each single-evaluation-index analysis method, where n is the number of data points in the set;
step S4-2: outputting clustering results and all grade ranges;
the method for grouping the data obtained by analyzing the single evaluation index by adopting the clustering method in the step S4 and specifying the range value of each voice grade under the single evaluation index comprises the following steps:
step S4-2-1: since the voice data are divided into classes that are not specified in advance (here a "class" is a grouping under a single voice evaluation index, a different concept from the quality grades referred to in this patent), the optimal class number k is determined for the collected data set by the elbow method or the silhouette coefficient method, i.e., the number of clusters together with the set of centroids $\{c_1, c_2, c_3, \ldots, c_k\}$, where each $c_i$ may or may not be a value from the data set;
step S4-2-2: initializing the centroid (class center) values $c_j \in \{c_1, c_2, c_3, \ldots, c_k\}$, calculating the Euclidean distance from each sample point to each class center, and assigning each sample to the closest class to form a clustering result, wherein the Euclidean distance is the true distance between two points in m-dimensional space, or the natural length of a vector, given by the formula:
$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - c_{jk})^2}$$

wherein $d(x_i, c_j)$ is the distance from sample point $x_i$ to centroid $c_j$, $x_{ik}$ is the kth attribute of the ith sample, and $c_{jk}$ is the kth attribute of the jth centroid, with m attribute dimensions; in this invention each evaluation index is analyzed into a one-dimensional data value, so m = 1 and the Euclidean distance reduces to:

$$d(x_i, c_j) = \lvert x_i - c_j \rvert$$
step S4-2-3: updating the cluster centers: for the clustering result, calculating the mean of all samples in each cluster as the new cluster center;
step S4-2-4: checking convergence: taking the sum of the distances from the samples to their class centers as the objective function, and outputting the result if the iteration converges or a stopping condition is met; otherwise incrementing the number of categories by 1 and returning to step S4-2-2 for another round of calculation;
step S4-2-5: the algorithm is iterative and a global optimum is difficult to reach, so a heuristic strategy can be adopted, searching for a Nash equilibrium and the optimal solution of the problem.
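The grouping of steps S4-1 to S4-2-5 can be sketched with scikit-learn: cluster the one-dimensional values produced by a single evaluation index, choose k by the silhouette coefficient (step S4-2-1), and report each cluster's value range as a grade range. The candidate range for k is illustrative; standard K-means iteration stands in for steps S4-2-2 to S4-2-4, and the Nash-equilibrium heuristic of step S4-2-5 is not modeled.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def grade_ranges(values: np.ndarray, k_candidates=range(2, 8)):
    """Cluster one index's values and return each grade's [min, max] range."""
    X = values.reshape(-1, 1)                               # m = 1: one value per sample
    best_k = max(k_candidates,                              # S4-2-1: silhouette-based choice of k
                 key=lambda k: silhouette_score(
                     X, KMeans(n_clusters=k, n_init=10).fit_predict(X)))
    labels = KMeans(n_clusters=best_k, n_init=10).fit_predict(X)   # S4-2-2..S4-2-4
    return [(float(X[labels == c].min()), float(X[labels == c].max()))
            for c in range(best_k)]

# Usage: ranges = grade_ranges(np.array(speech_rate_values))
```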
The step of controlling the quality rating of the voice is as follows, and a route diagram of the controlling quality rating technology is shown in fig. 7:
With reference to the Qualification Approval Rules for Civil Aircraft Pilots, Flight Instructors and Ground Instructors (CCAR-61), MH/T 4014-2003 Radiotelephony Communications for Air Traffic Services, and guidance from experts in the civil aviation field, control voice quality grades 1-5 are innovatively proposed; the higher the grade, the better the voice quality, with grade 5 the highest. The judgment standards of the grades are as follows:
1) Level 1: the proportion of control professional vocabulary is small; the speech rate is too fast or too slow; Chinese-English mixed pronunciation; Mandarin with a noticeable accent, influenced by the speaker's native language or region; contains word audio that misleads semantic understanding (polyphones, homophones); strong interference from transmission channels, ambient noise, and the like;
2) Level 2: the proportion of control professional vocabulary is low; the speech rate is too fast or too slow; a small amount of gray vocabulary and Chinese-English mixed pronunciation; Mandarin with a slight accent, influenced by the speaker's native language or region; occasional word audio that misleads semantic understanding;
3) Level 3: the proportion of control professional vocabulary is average; the speech rate is normal; no gray vocabulary pronunciation; single-language speech; the speech signal occasionally pauses; the audio suffers little interference; no accent;
4) Level 4: the proportion of control professional vocabulary is large; speech clarity is good; the speech rate is normal; the speech is fluent; occasional word audio that misleads semantic understanding;
5) Level 5: the proportion of control professional vocabulary is high; the interference degree is small; the speech is fluent; standard Mandarin pronunciation; no word audio that misleads semantic understanding.
Step S5-1: the subjective weighting method is to use expert experience to adjust and optimize the weighting value objectively assigned by evaluation indexes, so that the weighting is more scientific and reasonable, thereby realizing the quantitative and visual display of the quality condition of the controlled voice, and the method is to use a 1-9 scale method to carry out quantitative comparison on every two of the importance degrees of each index belonging to the same level relative to the same index of the same level on the same level to form a judgment matrix X, use a maximum eigenvector method to calculate the eigenvector of the corresponding characteristic root of the judgment matrix, and use the eigenvector as the weight of each index when the judgment matrix is checked to meet the consistency;
step S5-2: the objective weighting method comprises the following steps:
step S5-2-1: converting the indices to positive form, i.e., converting the extremely-small and intermediate indices into extremely-large indices:

extremely small -> extremely large:

$$\tilde{x}_i = \max_i(x_i) - x_i$$

intermediate -> extremely large:

$$\tilde{x}_i = 1 - \frac{\lvert x_i - x_{\mathrm{best}} \rvert}{\max_i \lvert x_i - x_{\mathrm{best}} \rvert}$$

wherein $x_{\mathrm{best}}$ is the value optimal for the recognition effect, taken as the central value of the speech set obtained by the evaluation index method, and $\tilde{x}_i$ is the positivized value;
step S5-2-2: standardizing the data to balance dimensional differences between indices:

$$\tilde{x}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$

wherein $x_{ij}$ is the value of the ith voice under the jth evaluation index;
step S5-2-3: normalizing the data to the interval 0-1:

$$p_{ij} = \frac{\tilde{x}_{ij}}{\sum_{i=1}^{n} \tilde{x}_{ij}}$$

wherein n is the number of evaluation objects;
s5-2-4, calculating information entropy of each evaluation index
$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$

wherein n is the number of evaluation objects, m is the number of evaluation indices, and j runs from 1 to m;
step S5-2-5: calculating the objective weight of each index:

$$\beta_j = \frac{1 - e_j}{\sum_{j=1}^{m} (1 - e_j)}$$

wherein j runs from 1 to m;
step S5-3: fusing the subjective and objective weights:

$$w_j = \frac{\alpha_j \beta_j}{\sum_{j=1}^{m} \alpha_j \beta_j}$$

wherein $\alpha_j$ is the subjective weight and $\beta_j$ is the objective weight of the jth evaluation index;
step S5-4: computing the composite score of each voice:

$$S_i = \sum_{j=1}^{m} w_j\, p_{ij}$$

wherein $p_{ij}$ is the normalized value of the ith evaluation object under the jth evaluation index, and i runs from 1 to n;
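Steps S5-2 to S5-4 on an already-positivized index matrix condense into the following sketch: min-max standardization, column normalization, entropy weights, fusion with subjective (AHP) weights, and composite scores. The uniform subjective weight vector in the usage line is an illustrative placeholder for the expert judgment matrix of step S5-1.

```python
import numpy as np

def composite_scores(X: np.ndarray, subj_w: np.ndarray) -> np.ndarray:
    """X: n objects x m positivized indices; subj_w: m subjective weights summing to 1."""
    Xs = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)  # S5-2-2: standardize each index
    P = Xs / (Xs.sum(0) + 1e-12)                         # S5-2-3: normalize columns to [0, 1]
    n = X.shape[0]
    e = -(P * np.log(P + 1e-12)).sum(0) / np.log(n)      # S5-2-4: information entropy e_j
    obj_w = (1.0 - e) / (1.0 - e).sum()                  # S5-2-5: objective weights
    w = subj_w * obj_w / (subj_w * obj_w).sum()          # S5-3: fused weights
    return P @ w                                         # S5-4: composite score of each voice

# Usage: scores = composite_scores(M, np.ones(M.shape[1]) / M.shape[1])
```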
step S5-5: score range of each control voice quality grade:
the composite score is calculated over the entire standard control voice database according to the above evaluation method, and the sorted composite scores of the whole database are divided into 5 grades; the interval of each grade is the score range of that quality grade, and the scores lie within 0-1. Because all evaluation indices have been positivized, a larger composite score means better quality, and grade 5 is the best.
The quantitative evaluation and classification device for control voice quality division uses an Intel Core i7-12700 processor, a Samsung 980 PRO 1 TB solid-state drive for storage, and four NVIDIA P40 GPUs to accelerate the processing of the relevant steps.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (7)

1. A quantitative evaluation and classification method for controlling voice quality division is characterized by comprising the following steps:
s1, inputting voice data of a standard control voice database marked with correct meanings;
s2, considering the characteristics of civil aviation land-air communication, constructing an evaluation index system for controlling voice quality division;
s3, qualitatively analyzing each evaluation index, including a technical analysis method and an index grading quantization unit;
s4, grouping data obtained by analyzing the single evaluation index by adopting a clustering method, and specifying the range value of each voice grade under the single evaluation index;
and S5, weighting each evaluation index with a weighted fusion algorithm and combining the indices into a control voice data set with multiple quality grades.
2. The quantitative evaluation and classification method for control voice quality division according to claim 1, wherein in step S2 the evaluation index system for control voice quality division includes extremely-large indices, intermediate indices, extremely-small indices, and specified-value indices; for an extremely-large index, a larger value means a better speech recognition effect, this class including accent; for an intermediate index, a value closer to a certain intermediate value means a better recognition effect, this class including speech rate, tone (pitch), and sound intensity; for an extremely-small index, a smaller value means a better recognition effect, this class including continuity, interference degree, professional term proportion, gray vocabulary content, and sound change; and for a specified-value index, the recognition effect is best when the value equals a particular value, this class including language category.
3. The method for quantitatively evaluating and classifying controlled speech quality division according to claim 1, wherein the step S4 of grouping the data analyzed by the single evaluation index by using the clustering method to specify the range value of each speech level under the single evaluation index comprises the following steps:
step S4-1: inputting the number of grades to be divided and a data set obtained by each single evaluation index analysis method;
step S4-2: and outputting the clustering result and each grade range.
4. The quantitative evaluation and classification method for controlling speech quality classification according to claim 3, wherein the method for outputting the clustering result in step S4-2 comprises the following steps:
step S4-2-1: determining the optimal category number by adopting an elbow method or a contour coefficient method;
step S4-2-2: initializing the class-center values, calculating the Euclidean distance from each sample point to each class center, and assigning each sample to the closest class to form a clustering result, wherein the Euclidean distance is the true distance between two points in m-dimensional space, or the natural length of a vector, given by the formula:

$$d(x_i, c_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - c_{jk})^2}$$

wherein $d(x_i, c_j)$ is the distance from sample point $x_i$ to centroid $c_j$, $x_{ik}$ is the kth attribute of the ith sample, $c_{jk}$ is the kth attribute of the jth centroid, and there are m attribute dimensions;
step S4-2-3: calculating the mean value of all samples in each category of the clustering result as a new clustering center;
step S4-2-4: taking the sum of the distances from the samples to their class centers as the objective function, and outputting the result if the iteration converges or a stopping condition is met; otherwise incrementing the number of categories by 1 and returning to step S4-2-2 for another round of calculation;
step S4-2-5: since the algorithm is iterative and a global optimum is difficult to reach, a heuristic strategy is adopted, using Nash equilibrium to approach the optimal solution of the problem.
5. The method as claimed in claim 1, wherein the control voice quality is classified into grades 1-5, and a higher grade indicates better voice quality.
6. The method as claimed in claim 2, wherein the weighting fusion algorithm used in step S5 includes a subjective weighting method and an objective weighting method, and the method includes the following steps:
step S5-1: the subjective weighting method uses expert experience to adjust and optimize the objectively assigned index weights: using a 1-9 scale, the importance of each pair of indices at one level is compared quantitatively with respect to the corresponding index of the level above to form a judgment matrix X; the eigenvector corresponding to the largest eigenvalue of the judgment matrix is calculated by the maximum-eigenvector method, and once the judgment matrix passes the consistency check, this eigenvector is taken as the weight vector of the indices;
step S5-2: the objective weighting method comprises the following steps:
step S5-2-1: converting the indices to positive form, i.e., converting the extremely-small and intermediate indices into extremely-large indices:

extremely small -> extremely large:

$$\tilde{x}_i = \max_i(x_i) - x_i$$

intermediate -> extremely large:

$$\tilde{x}_i = 1 - \frac{\lvert x_i - x_{\mathrm{best}} \rvert}{\max_i \lvert x_i - x_{\mathrm{best}} \rvert}$$

wherein $x_{\mathrm{best}}$ is the value optimal for the recognition effect, taken as the central value of the speech set obtained by the evaluation index method, and $\tilde{x}_i$ is the positivized value;
step S5-2-2: standardizing the data to balance dimensional differences between indices:

$$\tilde{x}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$

wherein $x_{ij}$ is the value of the ith voice under the jth evaluation index;
step S5-2-3: normalizing the data to the interval 0-1:

$$p_{ij} = \frac{\tilde{x}_{ij}}{\sum_{i=1}^{n} \tilde{x}_{ij}}$$

wherein n is the number of evaluation objects;
s5-2-4, calculating information entropy of each evaluation index
$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$

wherein n is the number of evaluation objects, m is the number of evaluation indices, and j runs from 1 to m;
step S5-2-5: calculating the objective weight of each index:

$$\beta_j = \frac{1 - e_j}{\sum_{j=1}^{m} (1 - e_j)}$$

wherein j runs from 1 to m;
step S5-3: fusing the subjective and objective weights:

$$w_j = \frac{\alpha_j \beta_j}{\sum_{j=1}^{m} \alpha_j \beta_j}$$

wherein $\alpha_j$ is the subjective weight and $\beta_j$ is the objective weight of the jth evaluation index;
step S5-4: computing the composite score of each voice:

$$S_i = \sum_{j=1}^{m} w_j\, p_{ij}$$

wherein $p_{ij}$ is the normalized value of the ith evaluation object under the jth evaluation index, and i runs from 1 to n;
step S5-5: score range of each control voice quality grade:
the composite score is calculated over the entire standard control voice database according to the above evaluation method, and the sorted composite scores of the whole database are divided into 5 grades; the interval of each grade is the score range of that quality grade, and the scores lie within 0-1. Because all evaluation indices have been positivized, a larger composite score means better quality, and grade 5 is the best.
7. The device for quantitatively evaluating and classifying the quality division of the control voice is characterized by comprising at least one processor and a memory which is in communication connection with the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the classification method of claim 1.
CN202211469949.7A 2022-11-22 2022-11-22 Quantitative evaluation and classification method and device for quality division of control voice Active CN115547299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211469949.7A CN115547299B (en) 2022-11-22 2022-11-22 Quantitative evaluation and classification method and device for quality division of control voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211469949.7A CN115547299B (en) 2022-11-22 2022-11-22 Quantitative evaluation and classification method and device for quality division of control voice

Publications (2)

Publication Number Publication Date
CN115547299A true CN115547299A (en) 2022-12-30
CN115547299B CN115547299B (en) 2023-08-01

Family

ID=84721576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211469949.7A Active CN115547299B (en) 2022-11-22 2022-11-22 Quantitative evaluation and classification method and device for quality division of control voice

Country Status (1)

Country Link
CN (1) CN115547299B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092482A (en) * 2023-04-12 2023-05-09 中国民用航空飞行学院 Real-time control voice quality metering method and system based on self-attention


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005164870A (en) * 2003-12-02 2005-06-23 Nippon Telegr & Teleph Corp <Ntt> Objective evaluation apparatus for speech quality taking band limitation into consideration
CN105679309A (en) * 2014-11-21 2016-06-15 科大讯飞股份有限公司 Method and device for optimizing speech recognition system
CN104835354A (en) * 2015-05-20 2015-08-12 青岛民航空管实业发展有限公司 Control load management system and controller workload evaluation method
CN107564534A (en) * 2017-08-21 2018-01-09 腾讯音乐娱乐(深圳)有限公司 Audio quality authentication method and device
CN108877839A (en) * 2018-08-02 2018-11-23 南京华苏科技有限公司 The method and system of perceptual evaluation of speech quality based on voice semantics recognition technology
CN110490428A (en) * 2019-07-26 2019-11-22 合肥讯飞数码科技有限公司 Job of air traffic control method for evaluating quality and relevant apparatus
CN112466335A (en) * 2020-11-04 2021-03-09 吉林体育学院 English pronunciation quality evaluation method based on accent prominence
CN112435512A (en) * 2020-11-12 2021-03-02 郑州大学 Voice behavior assessment and evaluation method for rail transit simulation training
CN112967711A (en) * 2021-02-02 2021-06-15 早道(大连)教育科技有限公司 Spoken language pronunciation evaluation method, spoken language pronunciation evaluation system and storage medium for small languages
EP4086903A1 (en) * 2021-05-04 2022-11-09 GN Audio A/S System with post-conversation evaluation, electronic device, and related methods
CN113792982A (en) * 2021-08-19 2021-12-14 北京邮电大学 Scientific and technological service quality assessment method and device based on combined weighting and fuzzy gray clustering
CN113779798A (en) * 2021-09-14 2021-12-10 国网江苏省电力有限公司电力科学研究院 Electric energy quality data processing method and device based on intuitive fuzzy combination empowerment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU ZHEN: "Quantitative evaluation model of fatigue risk based on controllers' voice response time", Science and Technology & Innovation (《科技与创新》), No. 08, 23 March 2018 (2018-03-23) *


Also Published As

Publication number Publication date
CN115547299B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
JP3162994B2 (en) Method for recognizing speech words and system for recognizing speech words
US20190266998A1 (en) Speech recognition method and device, computer device and storage medium
TWI395201B (en) Method and system for identifying emotional voices
CN111640418A (en) Prosodic phrase identification method and device and electronic equipment
Swain et al. Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition
JP2007171944A (en) Method and apparatus for automatic text-independent grading of pronunciation for language instruction
CN113539240A (en) Animation generation method and device, electronic equipment and storage medium
CN112349289A (en) Voice recognition method, device, equipment and storage medium
CN112885336A (en) Training and recognition method and device of voice recognition system, and electronic equipment
CN112397054A (en) Power dispatching voice recognition method
CN115547299B (en) Quantitative evaluation and classification method and device for quality division of control voice
CN117711444B (en) Interaction method, device, equipment and storage medium based on talent expression
Gupta et al. A study on speech recognition system: a literature review
Wang Detecting pronunciation errors in spoken English tests based on multifeature fusion algorithm
Mathur et al. A study of machine learning algorithms in speech recognition and language identification system
Barczewska et al. Detection of disfluencies in speech signal
Rao et al. Language identification—a brief review
CN116564281B (en) Emotion recognition method and device based on AI
Hoseini Persian speech emotion recognition approach based on multilayer perceptron
Hlaing et al. Word Representations for Neural Network Based Myanmar Text-to-Speech S.
JP5066668B2 (en) Speech recognition apparatus and program
Marie-Sainte et al. A new system for Arabic recitation using speech recognition and Jaro Winkler algorithm
Mengistu et al. Text independent amharic language dialect recognition using neuro-fuzzy gaussian membership function
Mary et al. Modeling and fusion of prosody for speaker, language, emotion, and speech recognition
Wang et al. Automatic tonal and non-tonal language classification and language identification using prosodic information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant