CN111326169A - Voice quality evaluation method and device - Google Patents

Voice quality evaluation method and device

Info

Publication number
CN111326169A
CN111326169A (application CN201811544623.XA)
Authority
CN
China
Prior art keywords
voice
evaluated
signal
speech
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811544623.XA
Other languages
Chinese (zh)
Other versions
CN111326169B (en)
Inventor
梁立涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Beijing Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201811544623.XA priority Critical patent/CN111326169B/en
Publication of CN111326169A publication Critical patent/CN111326169A/en
Application granted granted Critical
Publication of CN111326169B publication Critical patent/CN111326169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/69 Speech or voice analysis techniques specially adapted for particular use for evaluating synthetic or decoded voice signals

Abstract

The invention discloses a voice quality evaluation method and device. A voice signal to be evaluated is acquired and compared with a stored voice signal; when the voice signal to be evaluated differs significantly from the stored voice signal, a built-in voice quality evaluation model is updated to obtain a new voice quality evaluation model, and the voice signal to be evaluated is evaluated with the new voice quality evaluation model. By continuously learning from voice signals, the voice quality evaluation model is continuously updated, thereby improving the accuracy of voice evaluation.

Description

Voice quality evaluation method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for evaluating voice quality.
Background
Internet-based voice services have become one of the important services of the network and a field on which service providers focus, and voice quality is an important factor in evaluating the quality of a communication network.
At present, speech evaluation methods generally use a fixed speech quality evaluation model to evaluate speech quality. The specific method is as follows: characteristic parameters of the voice signal are extracted, a voice quality evaluation model is obtained through training based on the extracted characteristic parameters, and the voice signal is evaluated with the voice quality evaluation model.
Disclosure of Invention
The invention aims to provide a method and a device for evaluating voice quality so as to improve the accuracy of voice evaluation.
The object of the invention is achieved by the following technical solutions:
in a first aspect, the present invention provides a method for evaluating speech quality, including:
acquiring a voice signal to be evaluated, and determining identification information of the voice signal to be evaluated;
if the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, taking the voice signal to be evaluated as a new voice signal, and updating the first voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value to obtain a second voice quality evaluation model;
wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated;
and evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
Optionally, updating the first speech quality evaluation model to obtain a second speech quality evaluation model, including:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Optionally, after acquiring the voice signal to be evaluated, the method further includes:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Optionally, after acquiring the voice signal to be evaluated, the method further includes:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
Optionally, the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, and the method includes:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Optionally, the step of determining that the speech quality of the speech signal to be evaluated is different from the speech quality of the stored speech signal includes:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
In a second aspect, the present invention provides an apparatus for evaluating speech quality, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a voice signal to be evaluated;
the determining unit is used for determining the identification information of the voice signal to be evaluated, and taking the voice signal to be evaluated as a new voice signal when the voice quality of the voice signal to be evaluated is determined to be different from the voice quality of the stored voice signal;
the updating unit is used for updating the first voice quality evaluation model to obtain a second voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value;
wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated;
and the evaluation unit is used for evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
Optionally, the updating unit is specifically configured to update the first speech quality evaluation model to obtain a second speech quality evaluation model as follows:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Optionally, the apparatus further comprises a processing unit configured to:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Optionally, the evaluation unit is further configured to:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
the processing unit is further to:
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
Optionally, the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, and the method includes:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Optionally, the step of determining that the speech quality of the speech signal to be evaluated is different from the speech quality of the stored speech signal includes:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
In a third aspect, the present invention further provides an apparatus for evaluating speech quality, including:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method of the first aspect according to the obtained program.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
The invention provides a voice quality evaluation method and device. A voice signal to be evaluated is acquired and compared with a stored voice signal; when the voice signal to be evaluated differs significantly from the stored voice signal, a built-in voice quality evaluation model is updated to obtain a new voice quality evaluation model, and the voice signal to be evaluated is evaluated with the new voice quality evaluation model. By continuously learning from voice signals, the voice quality evaluation model is continuously updated, thereby improving the voice evaluation accuracy.
Drawings
Fig. 1 is a flowchart of a method for evaluating speech quality according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a decision tree training classification according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another decision tree training scheme provided in an embodiment of the present application;
fig. 4 is a flowchart of a method for updating an evaluation model of speech quality according to an embodiment of the present application;
fig. 5 is a flowchart of another speech quality evaluation method provided in the embodiment of the present application;
fig. 6 is a block diagram of a speech quality evaluation apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of another speech quality evaluation apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, a commonly used speech quality evaluation method is as follows: extract characteristic parameters of the voice signal, or acquire other characteristic parameters related to voice quality such as network delay, packet loss and jitter, and then perform modeling analysis on these characteristic parameters to obtain an objective voice quality evaluation.
Generally, a fixed algorithm is used to model a fixed evaluation scenario, for example the Perceptual Evaluation of Speech Quality (PESQ) algorithm for narrowband speech signals or the Perceptual Objective Listening Quality Assessment (POLQA) algorithm for super-wideband speech evaluation. The speech quality evaluation models established with such algorithms are trained linear regression models with a specific mapping method; finally, the objective speech quality assessment is mapped to the actual perceptual quality of listeners to obtain a speech quality score.
The existing method is suitable for scenarios in which the voice environment changes little. Because the parameters used in model training are limited, in scenarios where the voice environment changes greatly, such as on a train, more parameters may be related to voice quality than those used to train the fixed voice quality evaluation model, so using a fixed voice quality evaluation model may yield lower voice quality evaluation accuracy.
In view of this, embodiments of the present application provide a voice quality evaluation method and apparatus in which voice signals are continuously acquired, a built-in evaluation model is continuously updated based on the input voice signals, and the input voice signals are evaluated to output a voice quality score, thereby improving the accuracy of voice quality evaluation.
It is to be understood that the terms first, second, etc. used herein are for descriptive purposes only and do not indicate or imply relative importance or order.
The embodiments of the present application are not restricted by environmental factors and can be applied to various evaluation environments, including rapidly changing environments and stable environments.
Second, the application scenarios of the embodiments of the present application include, but are not limited to, conventional 2nd-Generation (2G) and 3rd-Generation (3G) calls, 4th-Generation (4G) calls, 2G/3G/4G hybrid scenarios, and the like.
Fig. 1 is a flowchart of a method for evaluating speech quality according to an embodiment of the present application, and referring to fig. 1, the method includes:
s101: and acquiring the voice signal to be evaluated, and determining the identification information of the voice signal to be evaluated.
S102: and if the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, taking the voice signal to be evaluated as a new voice signal.
It is understood that a "new speech signal" in the embodiments of the present application means a speech signal to be evaluated that is different from the speech signals received before it; such a speech signal to be evaluated can be marked as a new speech signal.
Specifically, the identification information of the voice signal to be evaluated being different from that of the stored voice signal means that the voice quality of the voice signal to be evaluated is different from the voice quality of the stored voice signal.
Wherein the speech signal already stored in the built-in speech quality evaluation model is a speech signal acquired before the speech signal to be evaluated.
In the embodiment of the present application, surrounding voice signals are continuously acquired, and therefore, for a voice signal to be evaluated, a voice signal acquired before the voice signal to be evaluated can be used as a reference signal of the voice signal to be evaluated.
It should be noted that the voice quality may include, but is not limited to, for example, a voice quality evaluation score and a voice quality level of the voice signal.
S103: and when the number of the new voice signals is larger than a first preset threshold value, updating the first voice quality evaluation model to obtain a second voice quality evaluation model.
For convenience of description, in the embodiment of the present application, the "built-in speech quality evaluation model" may be referred to as a "first speech quality evaluation model", and the "speech quality evaluation model after updating the built-in speech quality evaluation model" may be referred to as a "second speech quality evaluation model".
Specifically, in the embodiment of the present application, when the speech signal to be evaluated differs greatly from the old speech signals, it is used as a new sample, and when the number of new samples reaches a preset threshold, for example the first preset threshold, the built-in speech quality evaluation model is updated to obtain the second speech quality evaluation model.
It should be noted that, in the present application, the terms "new speech signal" and "new sample", and "old speech signal" and "stored speech signal", are sometimes used interchangeably; those skilled in the art will understand that their meanings are consistent.
S104: and evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
Specifically, when the speech signal to be evaluated is a new sample, the speech signal to be evaluated can be evaluated by using the updated second speech quality evaluation model, so that an accurate speech quality evaluation result is obtained.
In the embodiments of the present application, voice data is continuously acquired as the external voice environment changes, and the continuously updated data set keeps the voice evaluation model accurate and improves the precision of the model.
In a possible implementation manner, updating the first speech quality evaluation model to obtain the second speech quality evaluation model may include:
and acquiring the characteristic parameters of the new voice signal, training the characteristic parameters by using a decision tree algorithm, updating the first voice quality evaluation model, and acquiring a second voice quality evaluation model.
Specifically, in the embodiment of the present application, the characteristic parameters of a certain number of new speech signals (a number greater than the first preset threshold) may be extracted, or other characteristic parameters related to speech quality may be obtained, and a new speech quality evaluation model may then be obtained through training on these characteristic parameters.
It is understood that other characteristic parameters related to voice quality include, but are not limited to, network latency, packet loss, jitter, etc.
The above method for obtaining the model by using the feature parameter training is similar to the existing scheme, and will not be described in detail herein.
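For concreteness, the following is a minimal sketch of this update step, assuming scikit-learn's GradientBoostingRegressor stands in for the decision-tree/boosting algorithm; the function name retrain_quality_model, the optional fusion of old samples, and the hyperparameter values are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def retrain_quality_model(new_features, new_scores, old_features=None, old_scores=None):
    """Train the second (updated) speech-quality model on the collected new samples,
    optionally fused with the old samples as in the alternative implementation described below."""
    X, y = np.asarray(new_features), np.asarray(new_scores)
    if old_features is not None:
        # Fuse old and new samples before retraining.
        X = np.vstack([old_features, X])
        y = np.concatenate([old_scores, y])
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=3)
    model.fit(X, y)
    return model
```

A boosted tree ensemble is chosen here only because the description maps quality scores with a combined decision tree; any comparable tree-based learner could be substituted.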
In another possible implementation of the embodiment of the present application, new speech signals and old speech signals may be fused, the characteristic parameters of the new and old speech signals extracted (or other characteristic parameters related to speech quality obtained), and a new speech quality evaluation model finally obtained from these characteristic parameters.
It should be noted that the old speech signal is the speech signal received before the speech signal to be evaluated.
Specifically, the characteristic parameters in the embodiment of the present application include at least one of the following parameters: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Because there are many speech quality evaluation parameters, parameters with higher weight values are usually selected as the characteristic parameters during training.
In this embodiment of the present application, the speech quality evaluation parameter may further include: average speech signal interference value, global background noise, speech interruption time, level dip, silence length, pitch period, mechanization, correlation between rear cavity and middle cavity, correlation of continuous frames, average power of continuous frames, energy sum of repeated frames, number of frames of unnatural beep, sample average energy of unnatural beep, sample proportion of unnatural beep, cepstrum standard deviation absolute value, cepstrum kurtosis coefficient, kurtosis coefficient of linear prediction coefficient, absolute value of skewness coefficient of linear prediction coefficient, fixed noise weighting, spectral clarity, average energy level of samples of background noise, average energy of samples of background noise, signal-to-noise ratio of multiplicative noise, total energy of unnatural silence frames, and the like.
Further, after acquiring the voice signal to be evaluated, the method further includes:
the speech signal to be evaluated is preprocessed by at least one of the following steps: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Specifically, the original speech signal contains a large amount of incomplete, inconsistent and abnormal data, which seriously affects the execution efficiency of later modeling and may even bias the model results. In addition, the scale of the data itself also affects the model results, so the original speech signal can first be cleaned. Missing data, anomalies, redundancy and scaling usually need to be handled.
The data processing methods mainly include data validity detection, data normalization, default value interpolation fitting and the like, but are not limited to these methods.
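By way of illustration, the following sketch performs the three preprocessing steps named above on a table of characteristic parameters; it assumes pandas is available, and the column names ("snr_db", "speech_level_db") and validity ranges are hypothetical placeholders, not values from the patent.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Data validity detection: drop rows whose values fall outside plausible ranges.
    df = df[df["snr_db"].between(-20, 60) & (df["speech_level_db"] > -80)]
    # Default (missing) value interpolation fitting: fill gaps from neighbouring samples.
    df = df.interpolate(limit_direction="both")
    # Normalization: scale every feature to zero mean and unit variance.
    return (df - df.mean()) / df.std()
```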
Further, after acquiring the speech signal to be evaluated, the method further includes:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated; and classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades.
And the voice quality of different interval grades is used for representing different classes of voice quality.
There are many options for the classification algorithm in the speech evaluation model; for example, the Gradient Boosting Decision Tree (GBDT) algorithm can be used.
Specifically, in the embodiment of the present application, a decision tree algorithm may be used to classify the quality of the speech signal, as shown in fig. 2.
In fig. 2, the feature labels (1), (2), etc. represent the identification information of the characteristic parameters of the speech signal. The decision tree algorithm can be regarded as a prediction model and can also be understood as a classification tree; in the present application, decision trees can be used to map the classification of speech quality.
The classification of speech quality is mapped by the decision tree, and the decision tree can be iterated several times to form a progressively improved combined tree that optimizes the mapping performance, as shown for example in fig. 3; in fig. 3, each learner scores a prediction of the speech signal to obtain the predicted speech quality.
The parameters in fig. 3 are as follows: θ represents the weight, and φ represents the mapping function of the different learners.
It should be noted that fig. 2 and fig. 3 are only exemplary illustrations, and their specific form and content are not limited to what is shown in the drawings. For example, the set of scores for speech quality is not limited to a 0-5 classification.
It is to be understood that the decision tree may be obtained by a method such as machine learning, and the embodiment of the present application is not limited thereto.
As can be seen from the boosting algorithm in FIG. 3, the final prediction scoring result of the speech signal is a combination of the b learner speech quality results:
F_b(x) = Σ_{i=1}^{b} θ_i · φ_i(x)
It will be understood that φ_i in the above formula corresponds to φ in the figure.
The formula is optimized in a function space to obtain:
F_b(x) = F_{b−1}(x) + ρ · θ_b · φ_b(x)
where ρ represents a learning rate.
The training value for one speech sample at a time can be obtained according to the formula as follows:
y_i^(b) = −[∂L(y_i, F(x_i)) / ∂F(x_i)], evaluated at F = F_{b−1}
from the above formula, it can be seen that: the speech quality scores may correspond to different speech quality score intervals, e.g., [0, 1], [1, 2], etc., and the different speech quality score intervals may correspond to different speech classes.
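To make the score-to-interval mapping concrete, here is a small sketch assuming a weighted combination of b learners and a 0-5 score scale divided into unit intervals; the helper names and bin edges are assumptions for illustration only.

```python
import numpy as np

def ensemble_score(x, learners, weights):
    """F_b(x): weighted combination of the b learners' predictions."""
    return sum(theta * phi(x) for theta, phi in zip(weights, learners))

def interval_grade(score, edges=(0, 1, 2, 3, 4, 5)):
    """Index of the score interval [k, k+1) that the predicted quality falls into."""
    return int(np.clip(np.digitize(score, edges) - 1, 0, len(edges) - 2))
```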
Preferably, the identification information of the voice signal to be evaluated is different from the identification information of the voice signal already stored, and may include:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Specifically, the voice quality of the voice signal to be evaluated is different from the voice quality of the stored voice signal, and the method may include:
the voice quality of the voice signal to be evaluated and the voice quality of the voice signal already stored are the voice signal of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the voice signal already stored is smaller than a set threshold (for example, may be a second preset threshold), or the voice quality of the voice signal to be evaluated and the voice quality of the voice signal already stored are the voice signals of different interval grades.
Optionally, in this embodiment of the application, a built-in speech quality evaluation model (a first speech quality evaluation model) may be used to evaluate a speech signal to be evaluated first, so as to determine whether the speech signal to be evaluated is a new speech signal.
Specifically, in the embodiments of the present application, speech data is obtained from the outside, the speech quality of the newly obtained speech data is classified and scored with the built-in evaluation model, and it is then determined whether the newly obtained speech signal is a new sample. If voice data of different classifications does not differ greatly, or the scores of voice data of the same classification differ too much from the scores of the old voice data, the voice data can be used as a new sample.
Specifically, determining from the characteristic parameters that the identification information of the speech signal to be evaluated is different from the identification information of the stored speech signal may include, but is not limited to, the following methods:
(1) Detecting based on univariate normal distribution:
The original data set is x_{i,1}, x_{i,2}, x_{i,3}, …, x_{i,n}, i ∈ (1, …, m), containing m samples with n-dimensional features. The mean and variance of each feature dimension can be calculated as:
μ_j = (1/m) · Σ_{i=1}^{m} x_{i,j}
σ_j² = (1/m) · Σ_{i=1}^{m} (x_{i,j} − μ_j)²
For new data x, the probability can be calculated as:
p(x) = Π_{j=1}^{n} p(x_j; μ_j, σ_j²) = Π_{j=1}^{n} (1 / (√(2π) · σ_j)) · exp(−(x_j − μ_j)² / (2σ_j²))
the difference of the feature distribution of the new data and the old data can be judged according to the probability.
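A minimal sketch of this univariate check, assuming the stored (old) samples are rows of a NumPy array and the new sample is a feature vector; the probability threshold eps is an illustrative assumption.

```python
import numpy as np

def univariate_gaussian_prob(old_X, new_x):
    mu = old_X.mean(axis=0)           # per-feature mean
    var = old_X.var(axis=0) + 1e-12   # per-feature variance (guarded against zero)
    p = np.exp(-(new_x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return float(p.prod())            # product over the n feature dimensions

def distribution_differs_univariate(old_X, new_x, eps=1e-6):
    return univariate_gaussian_prob(old_X, new_x) < eps
```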
(2) Detecting based on multivariate Gaussian distribution:
The raw data set is {x^(1), x^(2), …, x^(m)}, where each sample is an n-dimensional feature vector. An n×n covariance matrix and an n-dimensional feature mean vector can be calculated:
μ = (1/m) · Σ_{i=1}^{m} x^(i)
Σ=[Cov(xi,xj)],i,j∈(1,…,n)
For new data x, the probability can be calculated as:
p(x) = (1 / ((2π)^{n/2} · |Σ|^{1/2})) · exp(−(1/2) · (x − μ)^T · Σ^{−1} · (x − μ))
The difference between the feature distributions of the new data and the old data can be judged from this probability, where T in the formula denotes the matrix transpose.
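The same check with the multivariate density can be sketched with SciPy, which evaluates the formula above directly; the threshold eps is again an assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def multivariate_gaussian_prob(old_X, new_x):
    mu = old_X.mean(axis=0)                 # n-dimensional mean vector
    sigma = np.cov(old_X, rowvar=False)     # n x n covariance matrix
    # allow_singular guards against a degenerate covariance estimate.
    return multivariate_normal.pdf(new_x, mean=mu, cov=sigma, allow_singular=True)

def distribution_differs_multivariate(old_X, new_x, eps=1e-9):
    return multivariate_gaussian_prob(old_X, new_x) < eps
```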
(3) Detecting based on the Mahalanobis distance:
For a multidimensional data set with mean vector ā, the Mahalanobis distance from a new data point a to ā is:
D_M(a) = √((a − ā)^T · S^{−1} · (a − ā))
where T denotes the matrix transpose and S is the covariance matrix; if the distance is too large, the feature distribution is considered to be different.
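A corresponding sketch of the Mahalanobis-distance check, assuming NumPy; the distance threshold (3.0) is an illustrative assumption rather than a value from the patent.

```python
import numpy as np

def mahalanobis_distance(old_X, new_x):
    a_bar = old_X.mean(axis=0)                            # mean vector
    S_inv = np.linalg.pinv(np.cov(old_X, rowvar=False))   # (pseudo-)inverse covariance S^-1
    d = np.asarray(new_x) - a_bar
    return float(np.sqrt(d @ S_inv @ d))

def distribution_differs_mahalanobis(old_X, new_x, threshold=3.0):
    return mahalanobis_distance(old_X, new_x) > threshold
```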
(4) Detecting based on feature importance:
Using a tree-based model such as GBDT, a ranking of feature importance can be derived.
The global importance of feature j is measured by the average of its importance over the individual trees:
J_j = (1/M) · Σ_{m=1}^{M} J_j(T_m)
where M is the number of trees.
The importance of feature j in a single tree is as follows:
J_j(T) = Σ_{t=1}^{L−1} î_t² · 1(v_t = j)
where L is the number of leaf nodes of the tree and L − 1 is the number of non-leaf (internal) nodes, v_t is the feature used to split node t, î_t² is the reduction in squared loss after node t is split, J represents the feature set, and T represents the set of trees.
For the top-k features obtained when training on the new samples, if these features differ from the features of the original data set, the distribution is considered different from that of the original data set.
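A sketch of this feature-importance comparison, again using scikit-learn's gradient-boosted trees as the tree-based model; the value of k and the model settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def top_k_features(X, y, k=5):
    """Indices of the k most important features according to a GBDT model."""
    model = GradientBoostingRegressor(n_estimators=100).fit(X, y)
    return set(np.argsort(model.feature_importances_)[::-1][:k])

def importance_differs(old_X, old_y, new_X, new_y, k=5):
    return top_k_features(old_X, old_y, k) != top_k_features(new_X, new_y, k)
```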
In a possible implementation of this embodiment, the speech data may be learned incrementally through the method flow shown in fig. 4, so as to update the built-in speech quality evaluation model; see fig. 4.
It is to be understood that the normal score in fig. 4 is a score by a built-in speech quality evaluation model.
The whole method flow of the embodiment of the present application can refer to the flow shown in fig. 5. In this method, an external voice signal is obtained and preprocessed; the voice signal quality is then classified using a decision tree algorithm to obtain a quality score for the voice signal; it is then determined whether the voice sample data meets the new-sample characteristics. When the voice signal is a new sample, after a certain number of new samples have been collected, the built-in voice quality evaluation model is updated and the standard score is produced with the updated voice quality evaluation model.
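The following condensed sketch ties the flow of fig. 5 together, reusing the helpers sketched earlier in this description (preprocessing, the distribution checks, retrain_quality_model and interval_grade); the buffer size standing in for the first preset threshold and the use of the built-in model's score as the training target are illustrative assumptions.

```python
def evaluate_stream(feature_rows, model, old_X, old_y, first_threshold=100):
    """Score each incoming speech sample and update the built-in model once enough
    new samples have accumulated, loosely following fig. 5 (illustrative sketch)."""
    new_X, new_y = [], []
    for x in feature_rows:                                  # preprocessed characteristic parameters
        score = float(model.predict([x])[0])                # score with the built-in (first) model
        if distribution_differs_mahalanobis(old_X, x):      # new-sample check (any of methods 1-4)
            new_X.append(x)
            new_y.append(score)
        if len(new_X) > first_threshold:                    # first preset threshold reached
            model = retrain_quality_model(new_X, new_y, old_X, old_y)  # second model
            new_X, new_y = [], []
        yield score, interval_grade(score)
```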
Based on the same concept as the above method embodiments, an embodiment of the present invention further provides a speech quality evaluation apparatus, shown in the block diagram of fig. 6. The apparatus includes: an acquisition unit 101, a determination unit 102, an updating unit 103, and an evaluation unit 104.
The obtaining unit 101 is configured to obtain a speech signal to be evaluated.
A determining unit 102, configured to determine the identification information of the voice signal to be evaluated acquired by the acquiring unit 101, and when it is determined that the voice quality of the voice signal to be evaluated is different from the voice quality of the stored voice signal, take the voice signal to be evaluated as a new voice signal.
And the updating unit 103 is configured to update the first speech quality evaluation model to obtain a second speech quality evaluation model when the number of the new speech signals determined by the determining unit 102 is greater than a first preset threshold.
Wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated.
And the evaluation unit 104 is configured to evaluate the speech signal to be evaluated by using the second speech quality evaluation model obtained by the updating unit 103.
Specifically, the updating unit 103 is specifically configured to update the first speech quality evaluation model to obtain a second speech quality evaluation model as follows:
acquiring the characteristic parameters of a new voice signal; and training the characteristic parameters by using a decision tree algorithm, updating the first voice quality evaluation model, and obtaining a second voice quality evaluation model.
Wherein the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
Correspondingly, the device further comprises: the processing unit 105 is configured to:
the speech signal to be evaluated is subjected to at least one of the following preprocessing operations: voice data validity detection, voice data normalization processing and default value interpolation fitting.
Still further, the evaluation unit 104 is further configured to:
and evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated.
The processing unit 105 is further configured to:
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of different interval classes is used to characterize the speech quality of different classes.
Optionally, the identification information of the speech signal to be evaluated is different from the identification information of the stored speech signal, and includes:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
Further, the voice quality of the voice signal to be evaluated is different from the voice quality of the voice signal already stored, including:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or the voice quality of the voice signal to be evaluated and the voice quality of the voice signal stored in the memory are voice signals with different interval grades.
It should be noted that, for the implementation of the functions of each unit in the above-mentioned speech quality evaluation apparatus according to the embodiment of the present invention, reference may be further made to the description of the related method embodiment, which is not described herein again.
An embodiment of the present application further provides another apparatus for evaluating voice quality, as shown in fig. 7, the apparatus includes:
a memory 202 for storing program instructions.
The transceiver 201 is used for receiving and transmitting the evaluation instruction of the voice quality.
And the processor 200 is configured to call the program instructions stored in the memory and, according to the instructions received by the transceiver 201, execute according to the obtained program the method performed by the processing unit (102), the determining unit (103), the updating unit (104) and the evaluating unit (105) shown in fig. 6.
In fig. 7, the bus architecture may include any number of interconnected buses and bridges that link together various circuits, including one or more processors represented by the processor 200 and memory represented by the memory 202. The bus architecture may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. The bus interface provides an interface.
The transceiver 201 may be a plurality of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium.
The processor 200 is responsible for managing the bus architecture and general processing, and the memory 202 may store data used by the processor 200 in performing operations.
The processor 200 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or a Complex Programmable Logic Device (CPLD).
Embodiments of the present application also provide a computer storage medium for storing computer program instructions for any apparatus described in the embodiments of the present application, which includes a program for executing any method provided in the embodiments of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (14)

1. A method for evaluating voice quality, comprising:
acquiring a voice signal to be evaluated, and determining identification information of the voice signal to be evaluated;
if the identification information of the voice signal to be evaluated is different from the identification information of the stored voice signal, taking the voice signal to be evaluated as a new voice signal, and updating the first voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value to obtain a second voice quality evaluation model;
wherein the stored voice signal is a voice signal acquired before the voice signal to be evaluated;
and evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
2. The method of claim 1, wherein updating the first speech quality assessment model to obtain a second speech quality assessment model comprises:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
3. The method of claim 1, wherein after acquiring the speech signal to be evaluated, the method further comprises:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
4. The method of claim 1, wherein after acquiring the speech signal to be evaluated, the method further comprises:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
5. The method of claim 1, wherein the identification information of the speech signal to be evaluated is different from the identification information of the speech signal already stored, comprising:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
6. The method of claim 5, wherein the speech quality of the speech signal to be evaluated is different from the speech quality of the speech signal already stored, comprising:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
7. An apparatus for evaluating speech quality, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a voice signal to be evaluated;
the determining unit is used for determining the identification information of the voice signal to be evaluated, and taking the voice signal to be evaluated as a new voice signal when the voice quality of the voice signal to be evaluated is determined to be different from the voice quality of the stored voice signal;
the updating unit is used for updating the first voice quality evaluation model to obtain a second voice quality evaluation model when the number of the new voice signals is larger than a first preset threshold value;
the stored voice signal is a voice signal which is evaluated by the first voice quality evaluation model before the voice signal to be evaluated;
and the evaluation unit is used for evaluating the voice signal to be evaluated by utilizing the second voice quality evaluation model.
8. The apparatus according to claim 7, wherein the updating unit is specifically configured to update the first speech quality assessment model to obtain a second speech quality assessment model as follows:
acquiring the characteristic parameters of the new voice signal;
training the characteristic parameters by using a decision tree algorithm, and updating the first voice quality evaluation model to obtain a second voice quality evaluation model;
the characteristic parameters include at least one of: signal-to-noise ratio, background noise, noise level, asymmetric interference value of average speech signal spectrum, high-frequency flatness analysis, spectrum level range, spectrum level standard deviation, relative noise floor, skewness coefficient of linear prediction coefficient, cepstrum skewness coefficient, voiced sound, average cross section of back cavity, vocal tract amplitude variation and speech level.
9. The apparatus of claim 7, further comprising a processing unit to:
performing at least one of the following preprocessing on the voice signal to be evaluated: voice data validity detection, voice data normalization processing and default value interpolation fitting.
10. The apparatus of claim 7, wherein the evaluation unit is further to:
evaluating the voice signal to be evaluated according to the first voice quality evaluation model to obtain the voice quality of the voice signal to be evaluated;
the processing unit is further to:
classifying the voice quality of the voice signal to be evaluated to obtain the voice quality of different interval grades; the speech quality of the different interval levels is used for characterizing the speech quality of different classes.
11. The apparatus of claim 7, wherein the identification information of the speech signal to be evaluated is different from the identification information of the speech signal already stored, comprising:
the speech quality of the speech signal to be evaluated differs from the speech quality of the already stored speech signal and/or the characteristic parameters of the speech signal to be evaluated differ from the characteristic parameters of the already stored speech signal.
12. The apparatus of claim 11, wherein the speech quality of the speech signal to be evaluated is different from the speech quality of the speech signal already stored, comprising:
the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals of the same interval grade, and the difference value between the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal is smaller than a second preset threshold value; or
And the voice quality of the voice signal to be evaluated and the voice quality of the stored voice signal are voice signals with different interval grades.
13. An apparatus for evaluating speech quality, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing the method of any one of claims 1 to 6 according to the obtained program.
14. A computer readable storage medium having stored thereon computer instructions which, when run on a computer, cause the computer to perform the method of any of claims 1-6.
CN201811544623.XA 2018-12-17 2018-12-17 Voice quality evaluation method and device Active CN111326169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811544623.XA CN111326169B (en) 2018-12-17 2018-12-17 Voice quality evaluation method and device

Publications (2)

Publication Number Publication Date
CN111326169A true CN111326169A (en) 2020-06-23
CN111326169B CN111326169B (en) 2023-11-10

Family

ID=71172436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811544623.XA Active CN111326169B (en) 2018-12-17 2018-12-17 Voice quality evaluation method and device

Country Status (1)

Country Link
CN (1) CN111326169B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
US20120059650A1 (en) * 2009-04-17 2012-03-08 France Telecom Method and device for the objective evaluation of the voice quality of a speech signal taking into account the classification of the background noise contained in the signal
US20140032212A1 (en) * 2011-04-11 2014-01-30 Orange Evaluation of the voice quality of a coded speech signal
WO2017041553A1 (en) * 2015-09-07 2017-03-16 中兴通讯股份有限公司 Method and apparatus for determining voice quality
CN106558308A (en) * 2016-12-02 2017-04-05 深圳撒哈拉数据科技有限公司 A kind of internet audio quality of data auto-scoring system and method
CN108346434A (en) * 2017-01-24 2018-07-31 中国移动通信集团安徽有限公司 A kind of method and apparatus of speech quality evaluation
CN107895582A (en) * 2017-10-16 2018-04-10 中国电子科技集团公司第二十八研究所 Towards the speaker adaptation speech-emotion recognition method in multi-source information field

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111816207A (en) * 2020-08-31 2020-10-23 广州汽车集团股份有限公司 Sound analysis method, sound analysis system, automobile and storage medium
CN112632841A (en) * 2020-12-22 2021-04-09 交通运输部科学研究院 Road surface long-term performance prediction method and device
CN112634946A (en) * 2020-12-25 2021-04-09 深圳市博瑞得科技有限公司 Voice quality classification prediction method, computer equipment and storage medium
CN112634946B (en) * 2020-12-25 2022-04-12 博瑞得科技有限公司 Voice quality classification prediction method, computer equipment and storage medium
CN112885377A (en) * 2021-02-26 2021-06-01 平安普惠企业管理有限公司 Voice quality evaluation method and device, computer equipment and storage medium
CN113393863A (en) * 2021-06-10 2021-09-14 北京字跳网络技术有限公司 Voice evaluation method, device and equipment
CN113393863B (en) * 2021-06-10 2023-11-03 北京字跳网络技术有限公司 Voice evaluation method, device and equipment
CN113838168A (en) * 2021-10-13 2021-12-24 亿览在线网络技术(北京)有限公司 Method for generating particle special effect animation
CN113838168B (en) * 2021-10-13 2023-10-03 亿览在线网络技术(北京)有限公司 Particle special effect animation generation method

Also Published As

Publication number Publication date
CN111326169B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
CN111326169B (en) Voice quality evaluation method and device
EP3528250B1 (en) Voice quality evaluation method and apparatus
CN110223673B (en) Voice processing method and device, storage medium and electronic equipment
CN109034046B (en) Method for automatically identifying foreign matters in electric energy meter based on acoustic detection
CN102982804A (en) Method and system of voice frequency classification
CN104903954A (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN111294812B (en) Resource capacity-expansion planning method and system
CN102915728B (en) Sound segmentation device and method and speaker recognition system
CN111160959B (en) User click conversion prediction method and device
Karbasi et al. Twin-HMM-based non-intrusive speech intelligibility prediction
US20070225972A1 (en) Speech signal classification system and method
CN110428845A (en) Composite tone detection method, system, mobile terminal and storage medium
CN110349597A (en) A kind of speech detection method and device
BR112013026333B1 (en) frame-based audio signal classification method, audio classifier, audio communication device, and audio codec layout
CN112508580A (en) Model construction method and device based on rejection inference method and electronic equipment
Gold et al. Issues and opportunities: The application of the numerical likelihood ratio framework to forensic speaker comparison
CN108831506A (en) Digital audio based on GMM-BIC distorts point detecting method and system
CN115062678A (en) Training method of equipment fault detection model, fault detection method and device
Naik et al. Filter selection for speaker diarization using homomorphism: speaker diarization
CN112801231B (en) Decision model training method and device for business object classification
Mossavat et al. A Bayesian hierarchical mixture of experts approach to estimate speech quality
CN113919432A (en) Classification model construction method, data classification method and device
JP3920749B2 (en) Acoustic model creation method for speech recognition, apparatus thereof, program thereof and recording medium thereof, speech recognition apparatus using acoustic model
CN111833842A (en) Synthetic sound template discovery method, device and equipment
CN111523604A (en) User classification method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant