CN111326169A - Voice quality evaluation method and device - Google Patents

Voice quality evaluation method and device

Info

Publication number
CN111326169A
CN111326169A (application CN201811544623.XA)
Authority
CN
China
Prior art keywords
voice
evaluated
signal
speech
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811544623.XA
Other languages
Chinese (zh)
Other versions
CN111326169B (en)
Inventor
梁立涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Beijing Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201811544623.XA priority Critical patent/CN111326169B/en
Publication of CN111326169A publication Critical patent/CN111326169A/en
Application granted granted Critical
Publication of CN111326169B publication Critical patent/CN111326169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/69 Speech or voice analysis techniques specially adapted for particular use for evaluating synthetic or decoded voice signals

Abstract

The invention discloses a voice quality evaluation method and device. A voice signal to be evaluated is acquired and compared with a stored voice signal; when the voice signal to be evaluated differs significantly from the stored voice signal, a built-in voice quality evaluation model is updated to obtain a new voice quality evaluation model, and the voice signal to be evaluated is evaluated with the new voice quality evaluation model. By continuously learning from voice signals, the voice quality evaluation model is continuously updated, thereby improving the accuracy of voice evaluation.

Description

Voice quality evaluation method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for evaluating voice quality.
Background
Internet-based voice services have become one of the important services of the network and a field on which service providers focus, and voice quality is an important factor in evaluating the quality of a communication network.
At present, speech evaluation methods generally use a fixed speech quality evaluation model to evaluate speech quality. The specific method is as follows: characteristic parameters of the voice signal are extracted, a voice quality evaluation model is obtained through training based on the extracted characteristic parameters, and the voice signal is evaluated with the voice quality evaluation model.
Disclosure of Invention
The invention aims to provide a method and a device for evaluating voice quality so as to improve the accuracy of voice evaluation.
The object of the invention is achieved by the following technical solutions:
in a first aspect, the present invention provides a method for evaluating speech quality, including:
acquiring a voice signal to be evaluated, and determining identification information of the voice signal to be evaluated;
if the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, taking the voice signal to be evaluated as a new voice signal, and updating the first voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value to obtain a second voice quality evaluation model;
wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated;
and evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
Optionally, updating the first speech quality evaluation model to obtain a second speech quality evaluation model, including:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Optionally, after acquiring the voice signal to be evaluated, the method further includes:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Optionally, after acquiring the voice signal to be evaluated, the method further includes:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
Optionally, the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, and the method includes:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Optionally, the step of determining that the speech quality of the speech signal to be evaluated is different from the speech quality of the stored speech signal includes:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
In a second aspect, the present invention provides an apparatus for evaluating speech quality, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a voice signal to be evaluated;
the determining unit is used for determining the identification information of the voice signal to be evaluated, and taking the voice signal to be evaluated as a new voice signal when the voice quality of the voice signal to be evaluated is determined to be different from the voice quality of the stored voice signal;
the updating unit is used for updating the first voice quality evaluation model to obtain a second voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value;
wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated;
and the evaluation unit is used for evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
Optionally, the updating unit is specifically configured to update the first speech quality evaluation model to obtain a second speech quality evaluation model as follows:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Optionally, the apparatus further comprises a processing unit configured to:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Optionally, the evaluation unit is further configured to:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
the processing unit is further to:
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
Optionally, the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, and the method includes:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Optionally, the step of determining that the speech quality of the speech signal to be evaluated is different from the speech quality of the stored speech signal includes:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
In a third aspect, the present invention further provides an apparatus for evaluating speech quality, including:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method of the first aspect according to the obtained program.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
The invention provides a voice quality evaluation method and device. A voice signal to be evaluated is acquired and compared with a stored voice signal; when the voice signal to be evaluated differs significantly from the stored voice signal, a built-in voice quality evaluation model is updated to obtain a new voice quality evaluation model, and the voice signal to be evaluated is evaluated with the new voice quality evaluation model. By continuously learning from voice signals, the voice quality evaluation model is continuously updated, thereby improving the voice evaluation accuracy.
Drawings
Fig. 1 is a flowchart of a method for evaluating speech quality according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a decision tree training classification according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another decision tree training scheme provided in an embodiment of the present application;
fig. 4 is a flowchart of a method for updating an evaluation model of speech quality according to an embodiment of the present application;
fig. 5 is a flowchart of another speech quality evaluation method provided in the embodiment of the present application;
fig. 6 is a block diagram of a speech quality evaluation apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of another speech quality evaluation apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, a commonly used speech quality evaluation method is as follows: extract characteristic parameters of the voice signal, or acquire other characteristic parameters related to voice quality such as network delay, packet loss and jitter, and then perform modeling analysis on these characteristic parameters to obtain an objective voice quality evaluation.
Generally, a fixed algorithm is used to model a fixed evaluation scenario, for example the Perceptual Evaluation of Speech Quality (PESQ) algorithm for narrowband speech signals or the Perceptual Objective Listening Quality Assessment (POLQA) algorithm for super-wideband speech evaluation. The speech quality evaluation models established with such algorithms are trained linear regression models with a specific mapping method; finally, the objective speech quality assessment is mapped to the actual perceptual quality of listeners to obtain a speech quality score.
The existing method is suitable for scenarios in which the voice environment changes little. Because the parameters used in model training are limited, in scenarios where the voice environment changes greatly, such as on a train, more parameters may be related to voice quality than those used to train the fixed voice quality evaluation model, so using a fixed voice quality evaluation model may yield lower voice quality evaluation accuracy.
In view of this, embodiments of the present application provide a voice quality evaluation method and apparatus in which voice signals are continuously acquired, a built-in evaluation model is continuously updated based on the input voice signals, and the input voice signals are evaluated to output a voice quality score, thereby improving the accuracy of voice quality evaluation.
It is to be understood that the terms first, second, etc. used herein are for descriptive purposes only and do not indicate or imply relative importance or order.
The embodiments of the present application are not restricted by environmental factors and can be applied to various evaluation environments, including rapidly changing environments and stable environments.
Second, the application scenarios of the embodiments of the present application include, but are not limited to, conventional 2nd-Generation (2G) and 3rd-Generation (3G) calls, 4th-Generation (4G) calls, 2G/3G/4G hybrid scenarios, and the like.
Fig. 1 is a flowchart of a method for evaluating speech quality according to an embodiment of the present application, and referring to fig. 1, the method includes:
s101: and acquiring the voice signal to be evaluated, and determining the identification information of the voice signal to be evaluated.
S102: and if the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, taking the voice signal to be evaluated as a new voice signal.
It is understood that a "new speech signal" in the embodiments of the present application means a speech signal to be evaluated that is different from the speech signals received before it; such a speech signal to be evaluated can be marked as a new speech signal.
Specifically, the identification information of the voice signal to be evaluated being different from that of the stored voice signal means that the voice quality of the voice signal to be evaluated is different from the voice quality of the stored voice signal.
Wherein the speech signal already stored in the built-in speech quality evaluation model is a speech signal acquired before the speech signal to be evaluated.
In the embodiment of the present application, surrounding voice signals are continuously acquired, and therefore, for a voice signal to be evaluated, a voice signal acquired before the voice signal to be evaluated can be used as a reference signal of the voice signal to be evaluated.
It should be noted that the voice quality may include, but is not limited to, for example, a voice quality evaluation score and a voice quality level of the voice signal.
S103: and when the number of the new voice signals is larger than a first preset threshold value, updating the first voice quality evaluation model to obtain a second voice quality evaluation model.
For convenience of description, in the embodiment of the present application, the "built-in speech quality evaluation model" may be referred to as a "first speech quality evaluation model", and the "speech quality evaluation model after updating the built-in speech quality evaluation model" may be referred to as a "second speech quality evaluation model".
Specifically, in the embodiment of the present application, when the speech signal to be evaluated differs greatly from the old speech signals, it is used as a new sample, and when the number of new samples reaches a preset threshold, for example the first preset threshold, the built-in speech quality evaluation model is updated to obtain the second speech quality evaluation model.
It should be noted that, in the present application, the terms "new speech signal" and "new sample", and "old speech signal" and "stored speech signal", are sometimes used interchangeably; those skilled in the art will understand that their meanings are consistent.
S104: and evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
Specifically, when the speech signal to be evaluated is a new sample, the speech signal to be evaluated can be evaluated by using the updated second speech quality evaluation model, so that an accurate speech quality evaluation result is obtained.
In the embodiments of the present application, voice data is continuously acquired as the external voice environment changes, and the continuously updated data set keeps the voice evaluation model accurate and improves the precision of the model.
In a possible implementation manner, updating the first speech quality evaluation model to obtain the second speech quality evaluation model may include:
and acquiring the characteristic parameters of the new voice signal, training the characteristic parameters by using a decision tree algorithm, updating the first voice quality evaluation model, and acquiring a second voice quality evaluation model.
Specifically, in the embodiment of the present application, the characteristic parameters of a certain number of new speech signals (a number greater than the first preset threshold) may be extracted, or other characteristic parameters related to speech quality may be obtained, and a new speech quality evaluation model may then be obtained through training on these characteristic parameters.
It is understood that other characteristic parameters related to voice quality include, but are not limited to, network latency, packet loss, jitter, etc.
The above method for obtaining the model by using the feature parameter training is similar to the existing scheme, and will not be described in detail herein.
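For concreteness, the following is a minimal sketch of this update step, assuming scikit-learn's GradientBoostingRegressor stands in for the decision-tree/boosting algorithm; the function name retrain_quality_model, the optional fusion of old samples, and the hyperparameter values are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def retrain_quality_model(new_features, new_scores, old_features=None, old_scores=None):
    """Train the second (updated) speech-quality model on the collected new samples,
    optionally fused with the old samples as in the alternative implementation described below."""
    X, y = np.asarray(new_features), np.asarray(new_scores)
    if old_features is not None:
        # Fuse old and new samples before retraining.
        X = np.vstack([old_features, X])
        y = np.concatenate([old_scores, y])
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3)
    model.fit(X, y)
    return model
```

A boosted tree ensemble is chosen here only because the description maps quality scores with a combined decision tree; any comparable tree-based learner could be substituted.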
In another possible implementation of the embodiment of the present application, new speech signals and old speech signals may be fused, the characteristic parameters of the new and old speech signals extracted (or other characteristic parameters related to speech quality obtained), and a new speech quality evaluation model finally obtained from these characteristic parameters.
It should be noted that the old speech signal is the speech signal received before the speech signal to be evaluated.
Specifically, the characteristic parameters in the embodiment of the present application include at least one of the following parameters: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Because there are many speech quality evaluation parameters, parameters with higher weight values are usually selected as the characteristic parameters during training.
In this embodiment of the present application, the speech quality evaluation parameter may further include: average speech signal interference value, global background noise, speech interruption time, level dip, silence length, pitch period, mechanization, correlation between rear cavity and middle cavity, correlation of continuous frames, average power of continuous frames, energy sum of repeated frames, number of frames of unnatural beep, sample average energy of unnatural beep, sample proportion of unnatural beep, cepstrum standard deviation absolute value, cepstrum kurtosis coefficient, kurtosis coefficient of linear prediction coefficient, absolute value of skewness coefficient of linear prediction coefficient, fixed noise weighting, spectral clarity, average energy level of samples of background noise, average energy of samples of background noise, signal-to-noise ratio of multiplicative noise, total energy of unnatural silence frames, and the like.
Further, after acquiring the voice signal to be evaluated, the method further includes:
the speech signal to be evaluated is preprocessed by at least one of the following steps: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Specifically, the original speech signal contains a large amount of incomplete, inconsistent and abnormal data, which seriously affects the execution efficiency of later modeling and may even bias the model results. In addition, the scale of the data itself also affects the model results, so the original speech signal can first be cleaned. Missing data, anomalies, redundancy and scaling usually need to be handled.
The data processing methods mainly include data validity detection, data normalization, default value interpolation fitting and the like, but are not limited to these methods.
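By way of illustration, the following sketch performs the three preprocessing steps named above on a table of characteristic parameters; it assumes pandas is available, and the column names ("snr_db", "speech_level_db") and validity ranges are hypothetical placeholders, not values from the patent.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Data validity detection: drop rows whose values fall outside plausible ranges.
    df = df[df["snr_db"].between(-20, 60) & (df["speech_level_db"] > -80)]
    # Default (missing) value interpolation fitting: fill gaps from neighbouring samples.
    df = df.interpolate(limit_direction="both")
    # Normalization: scale every feature to zero mean and unit variance.
    return (df - df.mean()) / df.std()
```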
Further, after acquiring the speech signal to be evaluated, the method further includes:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated; and classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades.
And the voice quality of different interval grades is used for representing different classes of voice quality.
There are many options for the classification algorithm in the speech evaluation model; for example, the Gradient Boosting Decision Tree (GBDT) algorithm can be used.
Specifically, in the embodiment of the present application, a decision tree algorithm may be used to classify the quality of the speech signal, as shown in fig. 2.
In fig. 2, the feature labels (1), (2), etc. represent the identification information of the characteristic parameters of the speech signal. The decision tree algorithm can be regarded as a prediction model and can also be understood as a classification tree; in the present application, decision trees can be used to map the classification of speech quality.
The classification of speech quality is mapped by the decision tree, and the decision tree can be iterated several times to form a progressively improved combined tree that optimizes the mapping performance, as shown for example in fig. 3; in fig. 3, each learner scores a prediction of the speech signal to obtain the predicted speech quality.
The parameters in fig. 3 are as follows: θ represents the weight, and φ represents the mapping function of the different learners.
It should be noted that fig. 2 and fig. 3 are only exemplary illustrations, and their specific form and content are not limited to what is shown in the drawings. For example, the set of scores for speech quality is not limited to a 0-5 classification.
It is to be understood that the decision tree may be obtained by a method such as machine learning, and the embodiment of the present application is not limited thereto.
As can be seen from the boosting algorithm in FIG. 3, the final prediction scoring result of the speech signal is a combination of the b learner speech quality results:
F_b(x) = Σ_{i=1}^{b} θ_i · φ_i(x)
It will be understood that φ_i in the above formula corresponds to φ in the figure.
The formula is optimized in a function space to obtain:
F_b(x) = F_{b−1}(x) + ρ · θ_b · φ_b(x)
where ρ represents a learning rate.
The training value for one speech sample at a time can be obtained according to the formula as follows:
y_i^(b) = −[∂L(y_i, F(x_i)) / ∂F(x_i)], evaluated at F = F_{b−1}
from the above formula, it can be seen that: the speech quality scores may correspond to different speech quality score intervals, e.g., [0, 1], [1, 2], etc., and the different speech quality score intervals may correspond to different speech classes.
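To make the score-to-interval mapping concrete, here is a small sketch assuming a weighted combination of b learners and a 0-5 score scale divided into unit intervals; the helper names and bin edges are assumptions for illustration only.

```python
import numpy as np

def ensemble_score(x, learners, weights):
    """F_b(x): weighted combination of the b learners' predictions."""
    return sum(theta * phi(x) for theta, phi in zip(weights, learners))

def interval_grade(score, edges=(0, 1, 2, 3, 4, 5)):
    """Index of the score interval [k, k+1) that the predicted quality falls into."""
    return int(np.clip(np.digitize(score, edges) - 1, 0, len(edges) - 2))
```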
Preferably, the identification information of the voice signal to be evaluated is different from the identification information of the voice signal already stored, and may include:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Specifically, the voice quality of the voice signal to be evaluated is different from the voice quality of the stored voice signal, and the method may include:
the voice quality of the voice signal to be evaluated and the voice quality of the voice signal already stored are the voice signal of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the voice signal already stored is smaller than a set threshold (for example, may be a second preset threshold), or the voice quality of the voice signal to be evaluated and the voice quality of the voice signal already stored are the voice signals of different interval grades.
Optionally, in this embodiment of the application, a built-in speech quality evaluation model (a first speech quality evaluation model) may be used to evaluate a speech signal to be evaluated first, so as to determine whether the speech signal to be evaluated is a new speech signal.
Specifically, in the embodiments of the present application, speech data is obtained from the outside, the speech quality of the newly obtained speech data is classified and scored with the built-in evaluation model, and it is then determined whether the newly obtained speech signal is a new sample. If voice data of different classifications does not differ greatly, or the scores of voice data of the same classification differ too much from the scores of the old voice data, the voice data can be used as a new sample.
Specifically, determining from the characteristic parameters that the identification information of the speech signal to be evaluated is different from the identification information of the stored speech signal may include, but is not limited to, the following methods:
(1) Detecting based on univariate normal distribution:
The original data set is x_{i,1}, x_{i,2}, x_{i,3}, …, x_{i,n}, i ∈ (1, …, m), containing m samples with n-dimensional features. The mean and variance of each feature dimension can be calculated as:
μ_j = (1/m) · Σ_{i=1}^{m} x_{i,j}
σ_j² = (1/m) · Σ_{i=1}^{m} (x_{i,j} − μ_j)²
For new data x, the probability can be calculated as:
p(x) = Π_{j=1}^{n} p(x_j; μ_j, σ_j²) = Π_{j=1}^{n} (1 / (√(2π) · σ_j)) · exp(−(x_j − μ_j)² / (2σ_j²))
the difference of the feature distribution of the new data and the old data can be judged according to the probability.
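A minimal sketch of this univariate check, assuming the stored (old) samples are rows of a NumPy array and the new sample is a feature vector; the probability threshold eps is an illustrative assumption.

```python
import numpy as np

def univariate_gaussian_prob(old_X, new_x):
    mu = old_X.mean(axis=0)           # per-feature mean
    var = old_X.var(axis=0) + 1e-12   # per-feature variance (guarded against zero)
    p = np.exp(-(new_x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return float(p.prod())            # product over the n feature dimensions

def distribution_differs_univariate(old_X, new_x, eps=1e-6):
    return univariate_gaussian_prob(old_X, new_x) < eps
```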
(2) Detecting based on multivariate Gaussian distribution:
The raw data set is {x^(1), x^(2), …, x^(m)}, where each sample is an n-dimensional feature vector. An n×n covariance matrix and an n-dimensional feature mean vector can be calculated:
μ = (1/m) · Σ_{i=1}^{m} x^(i)
Σ=[Cov(xi,xj)],i,j∈(1,…,n)
For new data x, the probability can be calculated as:
p(x) = (1 / ((2π)^{n/2} · |Σ|^{1/2})) · exp(−(1/2) · (x − μ)^T · Σ^{−1} · (x − μ))
The difference between the feature distributions of the new data and the old data can be judged from this probability, where T in the formula denotes the matrix transpose.
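The same check with the multivariate density can be sketched with SciPy, which evaluates the formula above directly; the threshold eps is again an assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def multivariate_gaussian_prob(old_X, new_x):
    mu = old_X.mean(axis=0)                 # n-dimensional mean vector
    sigma = np.cov(old_X, rowvar=False)     # n x n covariance matrix
    # allow_singular guards against a degenerate covariance estimate.
    return multivariate_normal.pdf(new_x, mean=mu, cov=sigma, allow_singular=True)

def distribution_differs_multivariate(old_X, new_x, eps=1e-9):
    return multivariate_gaussian_prob(old_X, new_x) < eps
```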
(3) Detecting based on the Mahalanobis distance:
For a multidimensional data set with mean vector ā, the Mahalanobis distance from a new data point a to ā is:
D_M(a) = √((a − ā)^T · S^{−1} · (a − ā))
where T denotes the matrix transpose and S is the covariance matrix; if the distance is too large, the feature distribution is considered to be different.
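A corresponding sketch of the Mahalanobis-distance check, assuming NumPy; the distance threshold (3.0) is an illustrative assumption rather than a value from the patent.

```python
import numpy as np

def mahalanobis_distance(old_X, new_x):
    a_bar = old_X.mean(axis=0)                            # mean vector
    S_inv = np.linalg.pinv(np.cov(old_X, rowvar=False))   # (pseudo-)inverse covariance S^-1
    d = np.asarray(new_x) - a_bar
    return float(np.sqrt(d @ S_inv @ d))

def distribution_differs_mahalanobis(old_X, new_x, threshold=3.0):
    return mahalanobis_distance(old_X, new_x) > threshold
```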
(4) Detecting based on feature importance:
Using a tree-based model such as GBDT, a ranking of feature importance can be derived.
The global importance of feature j is measured by the average of its importance over the individual trees:
J_j = (1/M) · Σ_{m=1}^{M} J_j(T_m)
where M is the number of trees.
The importance of feature j in a single tree is as follows:
J_j(T) = Σ_{t=1}^{L−1} î_t² · 1(v_t = j)
where L is the number of leaf nodes of the tree and L − 1 is the number of non-leaf (internal) nodes, v_t is the feature used to split node t, î_t² is the reduction in squared loss after node t is split, J represents the feature set, and T represents the set of trees.
For the top-k features obtained when training on the new samples, if these features differ from the features of the original data set, the distribution is considered different from that of the original data set.
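A sketch of this feature-importance comparison, again using scikit-learn's gradient-boosted trees as the tree-based model; the value of k and the model settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def top_k_features(X, y, k=5):
    """Indices of the k most important features according to a GBDT model."""
    model = GradientBoostingRegressor(n_estimators=100).fit(X, y)
    return set(np.argsort(model.feature_importances_)[::-1][:k])

def importance_differs(old_X, old_y, new_X, new_y, k=5):
    return top_k_features(old_X, old_y, k) != top_k_features(new_X, new_y, k)
```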
In a possible implementation of this embodiment, the speech data may be learned incrementally through the method flow shown in fig. 4, so as to update the built-in speech quality evaluation model; see fig. 4.
It is to be understood that the normal score in fig. 4 is a score by a built-in speech quality evaluation model.
The whole method flow of the embodiment of the present application can refer to the flow shown in fig. 5. In this method, an external voice signal is obtained and preprocessed; the voice signal quality is then classified using a decision tree algorithm to obtain a quality score for the voice signal; it is then determined whether the voice sample data meets the new-sample characteristics. When the voice signal is a new sample, after a certain number of new samples have been collected, the built-in voice quality evaluation model is updated and the standard score is produced with the updated voice quality evaluation model.
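The following condensed sketch ties the flow of fig. 5 together, reusing the helpers sketched earlier in this description (preprocessing, the distribution checks, retrain_quality_model and interval_grade); the buffer size standing in for the first preset threshold and the use of the built-in model's score as the training target are illustrative assumptions.

```python
def evaluate_stream(feature_rows, model, old_X, old_y, first_threshold=100):
    """Score each incoming speech sample and update the built-in model once enough
    new samples have accumulated, loosely following fig. 5 (illustrative sketch)."""
    new_X, new_y = [], []
    for x in feature_rows:                                  # preprocessed characteristic parameters
        score = float(model.predict([x])[0])                # score with the built-in (first) model
        if distribution_differs_mahalanobis(old_X, x):      # new-sample check (any of methods 1-4)
            new_X.append(x)
            new_y.append(score)
        if len(new_X) > first_threshold:                    # first preset threshold reached
            model = retrain_quality_model(new_X, new_y, old_X, old_y)  # second model
            new_X, new_y = [], []
        yield score, interval_grade(score)
```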
Based on the same concept as the above method embodiments, an embodiment of the present invention further provides a speech quality evaluation apparatus, shown in the block diagram of fig. 6. The apparatus includes: an acquisition unit 101, a determination unit 102, an updating unit 103, and an evaluation unit 104.
The obtaining unit 101 is configured to obtain a speech signal to be evaluated.
A determining unit 102, configured to determine the identification information of the voice signal to be evaluated acquired by the acquiring unit 101, and when it is determined that the voice quality of the voice signal to be evaluated is different from the voice quality of the stored voice signal, take the voice signal to be evaluated as a new voice signal.
And the updating unit 103 is configured to update the first speech quality evaluation model to obtain a second speech quality evaluation model when the number of the new speech signals determined by the determining unit 102 is greater than a first preset threshold.
Wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated.
And the evaluation unit 104 is configured to evaluate the speech signal to be evaluated by using the second speech quality evaluation model obtained by the updating unit 103.
Specifically, the updating unit 103 is specifically configured to update the first speech quality evaluation model to obtain a second speech quality evaluation model as follows:
acquiring the characteristic parameters of a new voice signal; and training the characteristic parameters by using a decision tree algorithm, updating the first voice quality evaluation model, and obtaining a second voice quality evaluation model.
Wherein the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Correspondingly, the device further comprises: the processing unit 105 is configured to:
the speech signal to be evaluated is subjected to at least one of the following preprocessing operations: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Still further, the evaluation unit 104 is further configured to:
and evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated.
The processing unit 105 is further configured to:
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of different interval classes is used to characterize the speech quality of different classes.
Optionally, the identification information of the speech signal to be evaluated is different from the identification information of the stored speech signal, and includes:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Further, the voice quality of the voice signal to be evaluated is different from the voice quality of the voice signal already stored, including:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or the voice quality of the voice signal to be evaluated and the voice quality of the voice signal stored in the memory are voice signals with different interval grades.
It should be noted that, for the implementation of the functions of each unit in the above-mentioned speech quality evaluation apparatus according to the embodiment of the present invention, reference may be further made to the description of the related method embodiment, which is not described herein again.
An embodiment of the present application further provides another apparatus for evaluating voice quality, as shown in fig. 7, the apparatus includes:
a memory 202 for storing program instructions.
The transceiver 201 is used for receiving and transmitting the evaluation instruction of the voice quality.
And the processor 200 is configured to call the program instructions stored in the memory and, according to the instructions received by the transceiver 201, execute according to the obtained program the method performed by the processing unit (102), the determining unit (103), the updating unit (104) and the evaluating unit (105) shown in fig. 6.
In fig. 7, the bus architecture may include any number of interconnected buses and bridges that link together various circuits, including one or more processors represented by the processor 200 and memory represented by the memory 202. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. The bus interface provides an interface.
The transceiver 201 may be a plurality of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium.
The processor 200 is responsible for managing the bus architecture and general processing, and the memory 202 may store data used by the processor 200 in performing operations.
The processor 200 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a Complex Programmable Logic Device (CPLD).
Embodiments of the present application also provide a computer storage medium for storing computer program instructions for any apparatus described in the embodiments of the present application, which includes a program for executing any method provided in the embodiments of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A method for evaluating voice quality, comprising:
acquiring a voice signal to be evaluated, and determining identification information of the voice signal to be evaluated;
if the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, taking the voice signal to be evaluated as a new voice signal, and updating the first voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value to obtain a second voice quality evaluation model;
wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated;
and evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
2. The method of claim 1, wherein updating the first speech quality assessment model to obtain a second speech quality assessment model comprises:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
3. The method of claim 1, wherein after acquiring the speech signal to be evaluated, the method further comprises:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
4. The method of claim 1, wherein after acquiring the speech signal to be evaluated, the method further comprises:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
5. The method of claim 1, wherein the identification information of the speech signal to be evaluated is different from the identification information of the speech signal already stored, comprising:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
6. The method of claim 5, wherein the speech quality of the speech signal to be evaluated is different from the speech quality of the speech signal already stored, comprising:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
7. An apparatus for evaluating speech quality, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a voice signal to be evaluated;
the determining unit is used for determining the identification information of the voice signal to be evaluated, and taking the voice signal to be evaluated as a new voice signal when the voice quality of the voice signal to be evaluated is determined to be different from the voice quality of the stored voice signal;
the updating unit is used for updating the first voice quality evaluation model to obtain a second voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value;
the stored voice signal is a voice signal which is evaluated by the first voice quality evaluation model before the voice signal to be evaluated;
and the evaluation unit is used for evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
8. The apparatus according to claim 7, wherein the updating unit is specifically configured to update the first speech quality assessment model to obtain a second speech quality assessment model as follows:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
9. The apparatus of claim 7, further comprising a processing unit to:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
10. The apparatus of claim 7, wherein the evaluation unit is further to:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
the processing unit is further to:
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
11. The apparatus of claim 7, wherein the identification information of the speech signal to be evaluated is different from the identification information of the speech signal already stored, comprising:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
12. The apparatus of claim 11, wherein the speech quality of the speech signal to be evaluated is different from the speech quality of the speech signal already stored, comprising:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
13. An apparatus for evaluating speech quality, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method of any one of claims 1 to 6 according to the obtained program.
14. A computer readable storage medium having stored thereon computer instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-6.
CN201811544623.XA 2018-12-17 2018-12-17 Voice quality evaluation method and device Active CN111326169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811544623.XA CN111326169B (en) 2018-12-17 2018-12-17 Voice quality evaluation method and device

Publications (2)

Publication Number Publication Date
CN111326169A true CN111326169A (en) 2020-06-23
CN111326169B CN111326169B (en) 2023-11-10

Family

ID=71172436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811544623.XA Active CN111326169B (en) 2018-12-17 2018-12-17 Voice quality evaluation method and device

Country Status (1)

Country Link
CN (1) CN111326169B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
US20120059650A1 (en) * 2009-04-17 2012-03-08 France Telecom Method and device for the objective evaluation of the voice quality of a speech signal taking into account the classification of the background noise contained in the signal
US20140032212A1 (en) * 2011-04-11 2014-01-30 Orange Evaluation of the voice quality of a coded speech signal
WO2017041553A1 (en) * 2015-09-07 2017-03-16 中兴通讯股份有限公司 Method and apparatus for determining voice quality
CN106558308A (en) * 2016-12-02 2017-04-05 深圳撒哈拉数据科技有限公司 A kind of internet audio quality of data auto-scoring system and method
CN108346434A (en) * 2017-01-24 2018-07-31 中国移动通信集团安徽有限公司 A kind of method and apparatus of speech quality evaluation
CN107895582A (en) * 2017-10-16 2018-04-10 中国电子科技集团公司第二十八研究所 Towards the speaker adaptation speech-emotion recognition method in multi-source information field

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111816207A (en) * 2020-08-31 2020-10-23 广州汽车集团股份有限公司 Sound analysis method, sound analysis system, automobile and storage medium
CN112632841A (en) * 2020-12-22 2021-04-09 交通运输部科学研究院 Road surface long-term performance prediction method and device
CN112634946A (en) * 2020-12-25 2021-04-09 深圳市博瑞得科技有限公司 Voice quality classification prediction method, computer equipment and storage medium
CN112634946B (en) * 2020-12-25 2022-04-12 博瑞得科技有限公司 Voice quality classification prediction method, computer equipment and storage medium
CN112885377A (en) * 2021-02-26 2021-06-01 平安普惠企业管理有限公司 Voice quality evaluation method and device, computer equipment and storage medium
CN113393863A (en) * 2021-06-10 2021-09-14 北京字跳网络技术有限公司 Voice evaluation method, device and equipment
CN113393863B (en) * 2021-06-10 2023-11-03 北京字跳网络技术有限公司 Voice evaluation method, device and equipment
CN113838168A (en) * 2021-10-13 2021-12-24 亿览在线网络技术(北京)有限公司 Method for generating particle special effect animation
CN113838168B (en) * 2021-10-13 2023-10-03 亿览在线网络技术(北京)有限公司 Particle special effect animation generation method

Also Published As

Publication number Publication date
CN111326169B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111326169B (en) Voice quality evaluation method and device
EP3528250B1 (en) Voice quality evaluation method and apparatus
CN110223673B (en) Voice processing method and device, storage medium and electronic equipment
CN109034046B (en) Method for automatically identifying foreign matters in electric energy meter based on acoustic detection
CN102982804A (en) Method and system of voice frequency classification
CN104903954A (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN111294812B (en) Resource capacity-expansion planning method and system
CN102915728B (en) Sound segmentation device and method and speaker recognition system
CN111160959B (en) User click conversion prediction method and device
Karbasi et al. Twin-HMM-based non-intrusive speech intelligibility prediction
US20070225972A1 (en) Speech signal classification system and method
CN110428845A (en) Composite tone detection method, system, mobile terminal and storage medium
CN110349597A (en) A kind of speech detection method and device
BR112013026333B1 (en) frame-based audio signal classification method, audio classifier, audio communication device, and audio codec layout
CN112508580A (en) Model construction method and device based on rejection inference method and electronic equipment
Gold et al. Issues and opportunities: The application of the numerical likelihood ratio framework to forensic speaker comparison
CN108831506A (en) Digital audio based on GMM-BIC distorts point detecting method and system
CN115062678A (en) Training method of equipment fault detection model, fault detection method and device
Naik et al. Filter selection for speaker diarization using homomorphism: speaker diarization
CN112801231B (en) Decision model training method and device for business object classification
Mossavat et al. A Bayesian hierarchical mixture of experts approach to estimate speech quality
CN113919432A (en) Classification model construction method, data classification method and device
JP3920749B2 (en) Acoustic model creation method for speech recognition, apparatus thereof, program thereof and recording medium thereof, speech recognition apparatus using acoustic model
CN111833842A (en) Synthetic sound template discovery method, device and equipment
CN111523604A (en) User classification method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant