WO2018068396A1 - Voice quality evaluation method and apparatus - Google Patents

Voice quality evaluation method and apparatus

Info

Publication number
WO2018068396A1
Authority
WO
WIPO (PCT)
Prior art keywords
degraded
data
voice data
speech
processed
Prior art date
Application number
PCT/CN2016/111050
Other languages
English (en)
French (fr)
Inventor
殷兵
魏思
胡国平
程甦
Original Assignee
科大讯飞股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 科大讯飞股份有限公司
Priority to EP16918904.0A priority Critical patent/EP3528250B1/en
Priority to JP2019500365A priority patent/JP2019531494A/ja
Priority to KR1020197009232A priority patent/KR102262686B1/ko
Publication of WO2018068396A1 publication Critical patent/WO2018068396A1/zh
Priority to US16/280,705 priority patent/US10964337B2/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/01Assessment or evaluation of speech recognition systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/69Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a voice quality evaluation method and apparatus.
  • the speech quality evaluation algorithm in the communication network includes a Perceptual Evaluation of Speech Quality (PESQ) algorithm and a Perceptual Objective Listening Quality Analysis (POLQA) algorithm.
  • These algorithms need to obtain input voice data and output voice data when implemented.
  • the input voice data is generally clean voice data, and the output voice data is generally degraded voice data after passing through the communication network.
  • the input voice data and the output voice data are analyzed together, and the quality of the output voice data is evaluated.
  • the input voice data is generally collected by the operator's drive-test vehicle. However, in indoor environments such as residential buildings or shopping malls, collection by a drive-test vehicle is impossible, so the input voice data cannot be obtained and speech quality evaluation based on input voice data cannot be performed.
  • as a result, the above algorithms, which evaluate the output speech data based on both the input speech data and the output speech data, have limited applicability.
  • the present application aims to solve at least one of the technical problems in the related art to some extent.
  • an object of the present application is to provide a voice quality evaluation method that can perform voice quality evaluation on the voice data to be evaluated without requiring the corresponding input voice data, so that voice quality evaluation relies only on single-ended voice data and the scope of application is expanded.
  • Another object of the present application is to propose a voice quality evaluation apparatus.
  • the voice quality evaluation method according to the first aspect of the present application includes: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
  • the voice quality evaluation method proposed by the embodiment of the first aspect of the present application performs quality evaluation on the voice data to be evaluated by using the voice quality evaluation model, so only single-ended voice data is needed. This avoids the limited applicability caused by relying on double-ended voice data and thus extends the scope of application.
  • the voice quality evaluation apparatus according to the second aspect of the present application includes: a receiving module, configured to receive voice data to be evaluated; an extracting module, configured to extract an evaluation feature of the voice data to be evaluated; and an evaluation module, configured to perform quality evaluation on the to-be-evaluated speech data according to the evaluation feature of the to-be-evaluated speech data and the constructed speech quality evaluation model, wherein the speech quality evaluation model is used to indicate the relationship between the evaluation feature of single-ended speech data and the quality information of the single-ended speech data.
  • the voice quality evaluation apparatus proposed by the second aspect of the present application performs quality evaluation on the voice data to be evaluated by using the voice quality evaluation model, so only single-ended voice data is needed. This avoids the limited applicability caused by relying on double-ended voice data and thus extends the scope of application.
  • Embodiments of the present application also provide an apparatus comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the method of any of the first-aspect embodiments of the present application.
  • Embodiments of the present application also provide a non-transitory computer readable storage medium; when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the method of any of the first-aspect embodiments of the present application.
  • Embodiments of the present application also provide a computer program product that, when executed by one or more processors in a device, causes the one or more processors to perform the method of any of the first-aspect embodiments of the present application.
  • FIG. 1 is a schematic flow chart of a voice quality evaluation method according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a voice quality evaluation method according to another embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a voice quality evaluation apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a voice quality evaluation apparatus according to another embodiment of the present application.
  • To solve the problems of the PESQ algorithm and better meet the voice quality evaluation requirements of the 4G/LTE era, ITU-T began developing the POLQA algorithm in 2006 and officially released it in early 2011 as the ITU-T P.863 standard. Its main features are that it covers the latest voice coding and network transmission technologies, has higher accuracy for 3G, 4G/LTE, and VoIP networks, and supports ultra-wideband (50 Hz to 14 kHz) and high-quality voice transmission. Therefore, the POLQA algorithm is currently the generally chosen algorithm for evaluating the speech quality of a communication network.
  • Deep learning stems from the study of artificial neural networks.
  • a multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representation attribute categories or features to discover distributed feature representations of data.
  • the application fields of deep learning mainly include: computer vision, acoustic model training of speech recognition, machine translation and semantic mining.
  • the inventors of the present application, as technicians in the field of communication, previously used the POLQA algorithm when speech quality evaluation was required. However, the inventors found that the POLQA algorithm requires double-ended speech data: when evaluating the speech quality of the output speech data, it needs not only the output voice data but also the input voice data. Since the input voice data is difficult to obtain in some cases, the application of the POLQA algorithm is limited. To avoid this limitation, a new solution is needed. Through further analysis, the inventors found that models constructed by deep learning have excellent performance, so deep learning can be introduced into the speech quality evaluation algorithm.
  • the main idea of the present application is to introduce deep learning into speech quality evaluation, especially speech quality evaluation in the communication field. This provides a new solution for the communication field that relies only on single-ended voice data, and constructing the model by deep learning ensures excellent model performance, so that the technical problem is solved with fewer limitations and better voice quality evaluation performance. Further, it should be noted that although the main idea of the present application has been described above, the specific technical solution is not limited to this main idea and may be combined with other features; combinations of these different technical features still fall within the scope of protection of this application.
  • FIG. 1 is a schematic flow chart of a voice quality evaluation method according to an embodiment of the present application.
  • the method in this embodiment includes:
  • S11 Receive voice data to be evaluated.
  • the voice data to be evaluated may specifically refer to the output voice data of the communication network, that is, the degraded voice data after the input voice data passes through the communication network.
  • the input voice data generally refers to clean voice data, also called original voice data, while the degraded voice data generally refers to voice data whose quality is degraded relative to the original voice data in one or more respects, such as loss of brightness, delay, or noise.
  • S12 Extract an evaluation feature of the voice data to be evaluated.
  • the evaluation feature is the same as the evaluation feature extracted from the degraded speech data when the voice quality evaluation model is constructed, and may be determined according to application requirements.
  • the evaluation feature refers to describing the characteristics of the voice data from the perspective of the human ear's auditory perception. For details, refer to the subsequent description.
  • S13 Perform quality evaluation on the to-be-evaluated speech data according to the evaluation feature of the to-be-evaluated speech data and the constructed speech quality evaluation model, wherein the speech quality evaluation model is used to indicate the relationship between the evaluation feature of single-ended speech data and the quality information of the single-ended speech data.
  • the voice quality evaluation model may be built before voice quality evaluation is required. For example, the model is first constructed offline, and when voice quality evaluation is required, the pre-built model is used directly.
  • alternatively, the voice quality evaluation model may be built online, for example at the time voice quality evaluation is required.
  • for the specific construction process, refer to the subsequent description.
  • the input and output of the speech quality evaluation model are the evaluation feature and the quality information of single-ended speech data, respectively. Therefore, after the evaluation feature of the speech data to be evaluated is extracted, it is used as the input of the speech quality evaluation model, and the output obtained is the quality information of the voice data to be evaluated, which realizes the voice quality evaluation.
  • the speech quality evaluation model may be described by a regression model or a classification model, and the specific content of the above quality information differs accordingly.
  • if the speech quality evaluation model is described by a regression model, the obtained quality information is a specific evaluation score, such as a score from 1 to 5; if the speech quality evaluation model is described by a classification model, the obtained quality information is an evaluation category, such as one of very poor, poor, average, good, and very good.
  • the quality evaluation result obtained in S13 may also be adjusted.
  • the evaluation score obtained in S13 can be used directly as the final evaluation score, or it can be adjusted in combination with relevant parameters of the communication network, such as packet loss, jitter, and delay, to obtain the final evaluation score.
  • the specific algorithm for combining the network parameters may be set as needed and is not described in detail here.
  • for example, the evaluation score obtained in S13 may be multiplied by a coefficient to give the final evaluation score, where the coefficient is related to the above parameters of the communication network.
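  • The coefficient-based adjustment described above can be sketched as follows. The application does not specify how the coefficient is derived, so the penalty formula, the scaling constants, and the names `network_coefficient` and `final_score` are purely illustrative assumptions.

```python
def network_coefficient(packet_loss: float, jitter_ms: float, delay_ms: float) -> float:
    """Hypothetical coefficient in [0, 1]; 1.0 corresponds to an ideal network.

    The penalty terms and constants are assumptions made for illustration only;
    they are not taken from the application.
    """
    loss_penalty = min(max(packet_loss, 0.0), 1.0)           # fraction of packets lost
    jitter_penalty = min(max(jitter_ms, 0.0) / 1000.0, 1.0)  # saturates at 1 s of jitter
    delay_penalty = min(max(delay_ms, 0.0) / 2000.0, 1.0)    # saturates at 2 s of delay
    return (1.0 - loss_penalty) * (1.0 - 0.5 * jitter_penalty) * (1.0 - 0.5 * delay_penalty)


def final_score(model_score: float, packet_loss: float, jitter_ms: float,
                delay_ms: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Multiply the model score by the coefficient and clamp to the score range."""
    adjusted = model_score * network_coefficient(packet_loss, jitter_ms, delay_ms)
    return min(max(adjusted, lo), hi)
```

  • With an ideal network the coefficient is 1.0 and the model score passes through unchanged; any packet loss, jitter, or delay lowers the final score, which is clamped to the 1 to 5 range.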
  • FIG. 2 is a schematic flow chart of a voice quality evaluation method according to another embodiment of the present application.
  • in this embodiment, the voice data to be evaluated is taken to be degraded voice data that has passed through a communication network.
  • the method of this embodiment includes:
  • S21 Acquire voice data, the voice data including clean voice data and degraded voice data.
  • the voice data can be obtained by collecting and/or directly acquiring from existing data. In order to improve the accuracy of the constructed speech quality evaluation model, as much speech data as possible should be obtained here.
  • the real network voice data can be directly collected, and the corresponding clean voice data and the degraded voice data are respectively obtained.
  • the specific acquisition manner is not limited in this application.
  • the clean voice data and the degraded voice data can be collected separately, so that each can be obtained directly.
  • alternatively, the clean voice data and the degraded voice data may be collected together and marked to distinguish them, for example using 1 to indicate clean voice data and 0 to indicate degraded voice data; the clean voice data and the degraded voice data can then be acquired according to the markers.
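  • A minimal sketch of separating jointly collected data by marker, assuming each item is a (name, marker) pair; the item layout and the function name `split_by_marker` are hypothetical.

```python
from typing import List, Tuple

CLEAN, DEGRADED = 1, 0  # marker values from the example in the text


def split_by_marker(items: List[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Return (clean, degraded) lists of item names according to their markers."""
    clean = [name for name, marker in items if marker == CLEAN]
    degraded = [name for name, marker in items if marker == DEGRADED]
    return clean, degraded
```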
  • S22 Acquire clean voice data to be processed according to the clean voice data, and acquire the degraded voice data to be processed according to the degraded voice data.
  • this step may include:
  • the valid speech segments of the acquired degraded speech data are extracted and clustered, and the valid speech segments of the degraded speech data corresponding to the cluster centers are taken as the degraded speech data to be processed.
  • specifically, the obtained clean voice data and degraded voice data may be used directly as the clean voice data to be processed and the degraded voice data to be processed.
  • alternatively, valid voice segments are extracted from the clean voice data and the degraded voice data separately; the valid voice segments of the clean voice data are used as the clean voice data to be processed, and the valid voice segments of the degraded voice data are used as the degraded voice data to be processed.
  • the specific method for extracting valid voice segments is not limited, for example, a voice activity detection (VAD) method is adopted. By processing only valid speech segments, the amount of computation and complexity can be reduced.
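  • The application leaves the VAD method open. As one possible illustration, a very simple energy-threshold detector is sketched below; the frame length, the threshold, and the function names are assumptions, and practical VAD methods are considerably more robust.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def valid_segments(samples: Sequence[float], frame_len: int = 160,
                   threshold: float = 1e-3) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the energy threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i * frame_len                     # a voiced run begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))   # the voiced run ends
            start = None
    if start is not None:                             # signal ended while still voiced
        segments.append((start, len(energies) * frame_len))
    return segments
```

  • Only the samples inside the returned segments would then be passed to later steps, which is how processing valid segments reduces the amount of computation.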
  • all of the degraded speech data included in the voice data, or the valid speech segments of all of the degraded speech data, may be taken as the degraded speech data to be processed; alternatively, part of the degraded speech data, or its valid speech segments, may be selected as the degraded speech data to be processed.
  • when selecting, clustering may be used: all of the degraded speech data or its valid speech segments are clustered, and the degraded speech data or valid speech segments corresponding to the cluster centers are taken as the degraded speech data to be processed.
  • for example, the ivector features of the valid speech segments of the degraded speech data are extracted and clustered using the k-means method to obtain k cluster centers, and the degraded speech data or valid speech segment corresponding to each cluster center is used as the degraded speech data to be processed.
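  • The cluster-based selection can be sketched with a plain k-means implementation. The ivector extraction itself is not shown; the input vectors below stand in for per-segment ivector features, and choosing the segment nearest to each center as its representative is an assumption about how "corresponding to the cluster center" is realized.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Lloyd's k-means with centers initialized from k randomly sampled points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


def representatives(points: List[Vector], k: int, seed: int = 0) -> List[int]:
    """Indices of the points closest to each of the k cluster centers."""
    centers = kmeans(points, k, seed=seed)
    return sorted({min(range(len(points)), key=lambda i: sq_dist(points[i], c))
                   for c in centers})
```

  • The returned indices identify which degraded segments would be kept as the degraded speech data to be processed.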
  • S23 Calculate an evaluation score of the degraded voice data to be processed according to the clean voice data to be processed and the degraded voice data to be processed.
  • for each valid speech segment of the degraded speech data, the corresponding valid speech segment of the clean speech data can be analyzed frame by frame, and the evaluation score of the valid speech segment of the degraded speech data is calculated.
  • the calculation method is not limited.
  • for example, the evaluation score is the Mean Opinion Score (MOS) of the voice data, and the specific calculation method may be the same as in the prior art, such as the POLQA algorithm or the PESQ algorithm, and is not described in detail here.
  • S24 Extract an evaluation feature of the degraded speech data to be processed.
  • the evaluation feature describes the speech data from the perspective of human auditory perception.
  • specifically, the time-domain features of the degraded speech data to be processed are extracted first, such as the short-term average energy of the speech data and the segmental noise of the speech.
  • the frequency-domain features of the degraded speech data to be processed are then extracted, for example FilterBank features and linear predictive coding (LPC) features.
  • when the frequency-domain features are extracted, a filter that models the shape of the human cochlea is used, so that the extracted frequency-domain features describe the voice data from the perspective of human auditory perception; to better describe the degraded speech data, the mean, variance, maximum, minimum, and difference features (such as first-order and second-order difference values) of each frequency-domain feature may also be extracted.
  • the evaluation feature may be determined according to the application requirements and the degradation of the voice data, which is not limited in this application.
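  • Part of this feature extraction can be sketched as follows: short-term average energy as a time-domain feature, plus the summary statistics and difference features that the text describes. FilterBank and LPC extraction are omitted, so the statistics are shown on a generic per-frame feature track; the frame length, hop size, and function names are assumptions.

```python
from typing import Dict, List, Sequence


def short_term_energy(samples: Sequence[float], frame_len: int = 160,
                      hop: int = 80) -> List[float]:
    """Average energy of each full-length frame, advancing by `hop` samples."""
    return [
        sum(x * x for x in samples[s:s + frame_len]) / frame_len
        for s in range(0, len(samples) - frame_len + 1, hop)
    ]


def diff(values: Sequence[float]) -> List[float]:
    """First-order difference; apply twice for the second-order difference."""
    return [b - a for a, b in zip(values, values[1:])]


def summarize(track: Sequence[float]) -> Dict[str, float]:
    """Mean, variance, maximum, and minimum of a non-empty per-frame feature track."""
    n = len(track)
    mean = sum(track) / n
    return {
        "mean": mean,
        "var": sum((v - mean) ** 2 for v in track) / n,
        "max": max(track),
        "min": min(track),
    }
```

  • An evaluation feature vector for one utterance could then be assembled by concatenating the summaries of each feature track and of its first-order and second-order differences.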
  • S25 Perform training according to the evaluation feature of extracting the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed, and construct a speech quality evaluation model.
  • the parameters of the speech quality evaluation model can be trained using a deep learning method to construct the speech quality evaluation model.
  • the network topology used in deep learning can be one or a combination of networks such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) neural networks, which is not limited in this application; the specific network is selected according to the application requirements.
  • the speech quality evaluation model can be described by different types of models, such as regression models or classification models. For different types, the inputs and outputs of the model are adjusted accordingly.
  • if the speech quality evaluation model is described by a regression model, the evaluation features of the degraded speech data to be processed and the evaluation scores of the degraded speech data to be processed are used directly as the model input and the model output, respectively.
  • if the speech quality evaluation model is described by a classification model, the evaluation features of the degraded speech data to be processed are used directly as the model input, and the model output is the evaluation category obtained by quantizing the evaluation score of the degraded speech data to be processed.
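  • For the regression case, training amounts to fitting a network that maps evaluation features to evaluation scores. The sketch below uses a tiny one-hidden-layer network trained by stochastic gradient descent on squared error; it is a stand-in for the DNN/CNN/RNN/LSTM topologies named in the text, and the class name `TinyMLP`, the layer size, and the hyperparameters are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (evaluation features, evaluation score)


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by per-sample SGD."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: Sequence[float]) -> float:
        h = self._hidden(x)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x: Sequence[float], y: float, lr: float) -> None:
        """One gradient step on 0.5 * (prediction - y) ** 2."""
        h = self._hidden(x)
        err = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - y
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the output weight before its update.
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err


def mse(model: TinyMLP, data: List[Sample]) -> float:
    """Mean squared error of the model over the data set."""
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def fit(model: TinyMLP, data: List[Sample], epochs: int = 300, lr: float = 0.02) -> None:
    """Run SGD over the data set for a fixed number of epochs."""
    for _ in range(epochs):
        for x, y in data:
            model.train_step(x, y, lr)
```

  • In the classification case the same features would be used as input, with the quantized evaluation category replacing the score as the training target.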
  • the evaluation score of the degraded speech data can be quantized with a fixed step size or a non-fixed step size. With a fixed step size, for example 0.2, the evaluation scores of all degraded speech data are quantized to obtain the category of each piece of degraded speech data; taking the MOS score as an example, quantizing the range 1 to 5 with a fixed step size of 0.2 yields 20 evaluation categories;
  • with a non-fixed step size, the quantization step size for each range of evaluation scores can be determined according to application requirements. For example, a large step size can be used in the lower range of evaluation scores and a small step size in the higher range; taking the MOS score as an example, scores from 1 to 3 form the lower range and can be quantized with a large step size such as 0.5, while scores from 3 to 5 form the higher range and can be quantized with a small step size such as 0.2, which yields 14 evaluation categories in total;
  • of course, the evaluation score may also be quantized by other methods that divide it into a plurality of evaluation categories, for example the categories very poor, poor, average, good, and very good, which is not limited in the present application.
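  • The two quantization schemes can be checked with a short sketch. The step sizes and score ranges are those given in the text; the representation of a scheme as a list of (low, high, step) ranges, the clamping of out-of-range scores, and the function names are assumptions.

```python
from typing import List, Tuple

Range = Tuple[float, float, float]  # (low, high, step)

FIXED: List[Range] = [(1.0, 5.0, 0.2)]
NON_FIXED: List[Range] = [(1.0, 3.0, 0.5), (3.0, 5.0, 0.2)]


def num_categories(ranges: List[Range]) -> int:
    """Total number of evaluation categories produced by a quantization scheme."""
    return sum(round((hi - lo) / step) for lo, hi, step in ranges)


def _bin_index(score: float, lo: float, hi: float, step: float) -> int:
    """Bin index within one range; the epsilon guards against float error at bin edges."""
    bins = round((hi - lo) / step)
    return min(int((score - lo) / step + 1e-9), bins - 1)


def quantize(score: float, ranges: List[Range]) -> int:
    """Map a score to a zero-based category index, clamping scores outside the ranges."""
    score = min(max(score, ranges[0][0]), ranges[-1][1])
    offset = 0
    for lo, hi, step in ranges[:-1]:
        if score < hi:
            return offset + _bin_index(score, lo, hi, step)
        offset += round((hi - lo) / step)
    lo, hi, step = ranges[-1]  # the last range also includes its upper bound
    return offset + _bin_index(score, lo, hi, step)
```

  • Here `num_categories(FIXED)` gives 20 and `num_categories(NON_FIXED)` gives 4 + 10 = 14, matching the counts stated above.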
  • S26 Receive degraded voice data after passing through the communication network.
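  • S27 Extract an evaluation feature of the degraded voice data after passing through the communication network.
  • the extraction method of the evaluation feature is the same as the extraction method in the training process, and is not described in detail here.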
  • S28 Perform quality evaluation on the degraded voice data according to the evaluation feature and the constructed voice quality evaluation model.
  • the evaluation feature of the current degraded speech data is used as the input of the speech quality evaluation model, and the output of the model is taken as the quality evaluation result of the current degraded speech data.
  • if the speech quality evaluation model is described by a regression model, the quality evaluation result is an evaluation score; if the speech quality evaluation model is described by a classification model, the quality evaluation result is an evaluation category.
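  • The single-ended evaluation flow (receive, extract, evaluate) can be sketched as a small pipeline. The extractor and the two models below are toy stand-ins, not trained models, and all names are hypothetical; the point is only that the same flow returns a score for a regression model and a category for a classification model.

```python
from typing import Callable, List, Sequence, Union

Features = List[float]
Quality = Union[float, str]

LABELS = ["very poor", "poor", "average", "good", "very good"]


def evaluate_quality(samples: Sequence[float],
                     extract: Callable[[Sequence[float]], Features],
                     model: Callable[[Features], Quality]) -> Quality:
    """Extract the evaluation features of the received data and feed them to the model."""
    return model(extract(samples))


def toy_extract(samples: Sequence[float]) -> Features:
    """Toy feature: average energy of the whole (non-empty) signal."""
    return [sum(x * x for x in samples) / len(samples)]


def toy_regressor(features: Features) -> float:
    """Toy regression model: maps energy in [0, 1] linearly onto a 1 to 5 score."""
    return 1.0 + 4.0 * min(max(features[0], 0.0), 1.0)


def toy_classifier(features: Features) -> str:
    """Toy classification model: maps energy in [0, 1] onto one of five categories."""
    return LABELS[min(int(min(max(features[0], 0.0), 1.0) * 5), 4)]
```

  • No clean reference signal appears anywhere in this flow, which is what distinguishes it from double-ended algorithms such as PESQ and POLQA.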
  • in this embodiment, quality evaluation is performed on the voice data to be evaluated by using the voice quality evaluation model, so only single-ended voice data is needed, which avoids the limited applicability caused by relying on double-ended voice data and extends the scope of application.
  • further, training by deep learning exploits the excellent performance of deep learning methods, making the speech quality evaluation model, and therefore the speech quality evaluation result, more accurate.
  • further, by performing quality evaluation on voice data in the communication field, deep learning is combined with voice quality evaluation in that field, providing a new solution for voice quality evaluation in the communication field.
  • FIG. 3 is a schematic structural diagram of a voice quality evaluation apparatus according to an embodiment of the present application.
  • the apparatus 30 of this embodiment includes: a receiving module 31, an extracting module 32, and an evaluation module 33.
  • the receiving module 31 is configured to receive voice data to be evaluated
  • the extracting module 32 is configured to extract the evaluation feature of the voice data to be evaluated
  • the evaluation module 33 is configured to perform quality evaluation on the to-be-evaluated speech data according to the evaluation feature of the to-be-evaluated speech data and the constructed speech quality evaluation model, where the speech quality evaluation model is used to indicate the relationship between the evaluation feature of single-ended speech data and the quality information of the single-ended speech data.
  • the voice data to be evaluated includes: degraded voice data after passing through the communication network.
  • the apparatus 30 of the present embodiment further includes: a building module 34 for constructing a voice quality evaluation model, the building module 34 comprising:
  • a first obtaining submodule 341, configured to acquire voice data, where the voice data includes clean voice data and degraded voice data;
  • the second obtaining sub-module 342 is configured to acquire clean voice data to be processed according to the clean voice data, and acquire the degraded voice data to be processed according to the degraded voice data;
  • a calculation sub-module 343, configured to calculate an evaluation score of the degraded voice data to be processed according to the clean voice data to be processed and the degraded voice data to be processed;
  • An extraction submodule 344 configured to extract an evaluation feature of the degraded speech data to be processed
  • the training sub-module 345 is configured to perform training according to the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed, and construct a speech quality evaluation model.
  • the voice quality evaluation model is constructed by training in a deep learning manner.
  • the training sub-module 345 is specifically configured to:
  • if the speech quality evaluation model is described by a regression model, use the evaluation features of the degraded speech data to be processed and the evaluation scores of the degraded speech data to be processed as the model input and the model output, respectively, train the model parameters, and construct the speech quality evaluation model; or,
  • if the speech quality evaluation model is described by a classification model, use the evaluation features of the degraded speech data to be processed as the model input, quantize the evaluation scores of the degraded speech data to be processed to obtain evaluation categories, use the evaluation categories as the model output, train the model parameters, and construct the speech quality evaluation model.
  • the second obtaining submodule 342 is configured to obtain the clean voice data to be processed according to the clean voice data, including:
  • the obtained clean voice data is directly used as clean voice data to be processed; or
  • An effective voice segment of the obtained clean voice data is extracted, and an effective voice segment of the clean voice data is used as clean voice data to be processed.
  • the second obtaining submodule 342 is configured to obtain the degraded voice data to be processed according to the degraded voice data, including:
  • the valid speech segments of the acquired degraded speech data are extracted, and the effective speech segments of the degraded speech data are clustered, and the effective speech segments of the degraded speech data corresponding to the cluster center are taken as the degraded speech data to be processed.
  • Embodiments of the present application also provide an apparatus comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the following method: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
  • Embodiments of the present application also provide a non-transitory computer readable storage medium; when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the following method: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
  • Embodiments of the present application also provide a computer program product that, when executed by one or more processors in a device, causes the one or more processors to perform the following method: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
  • portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • in the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and so on.
  • a person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by a program instructing related hardware; the program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • the above mentioned storage medium may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

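The present application provides a speech quality evaluation method and apparatus. The speech quality evaluation method includes: receiving speech data to be evaluated; extracting an evaluation feature of the speech data to be evaluated; and performing quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data. The method can extend the range of application of speech quality evaluation.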

Description

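Speech quality evaluation method and apparatus
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201610892176.1, entitled "Speech quality evaluation method and apparatus", filed by iFLYTEK Co., Ltd. (科大讯飞股份有限公司) on October 12, 2016.
Technical field
The present application relates to the field of communication technologies, and in particular to a speech quality evaluation method and apparatus.
Background
With the continuous development of technology, communication plays an increasingly important role in people's lives, for example in the transmission of speech data over communication networks. Speech quality is an important factor in evaluating the quality of a communication network. To evaluate speech quality, an effective speech quality evaluation algorithm is needed.
In the related art, speech quality evaluation algorithms for communication networks include the Perceptual Evaluation of Speech Quality (PESQ) algorithm and the Perceptual Objective Listening Quality Analysis (POLQA) algorithm. These algorithms require both input speech data and output speech data. The input speech data is generally clean speech data, and the output speech data is generally degraded speech data that has passed through the communication network; the quality of the output speech data is evaluated by analyzing the input and output speech data together. The input speech data is generally collected by an operator's drive-test vehicle. However, in indoor conditions such as the floors of residential buildings or shopping malls, a drive-test vehicle cannot be used, so the input speech data cannot be obtained and speech quality evaluation that relies on it cannot be performed. Algorithms that evaluate output speech data on the basis of both input and output speech data therefore have limited applicability.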
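Summary
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one objective of the present application is to provide a speech quality evaluation method that does not require the corresponding input speech data when evaluating the speech data to be evaluated, so that speech quality evaluation relies only on single-ended speech data and the range of application is extended.
Another objective of the present application is to provide a speech quality evaluation apparatus.
To achieve the above objectives, the speech quality evaluation method provided by embodiments of the first aspect of the present application includes: receiving speech data to be evaluated; extracting an evaluation feature of the speech data to be evaluated; and performing quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
The speech quality evaluation method provided by embodiments of the first aspect evaluates the speech data to be evaluated using a speech quality evaluation model, so that only single-ended speech data is needed for speech quality evaluation. This avoids the limited applicability caused by relying on double-ended speech data and thereby extends the range of application.
To achieve the above objectives, the speech quality evaluation apparatus provided by embodiments of the second aspect of the present application includes: a receiving module, configured to receive speech data to be evaluated; an extraction module, configured to extract an evaluation feature of the speech data to be evaluated; and an evaluation module, configured to perform quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
The speech quality evaluation apparatus provided by embodiments of the second aspect evaluates the speech data to be evaluated using a speech quality evaluation model, so that only single-ended speech data is needed for speech quality evaluation. This avoids the limited applicability caused by relying on double-ended speech data and thereby extends the range of application.
Embodiments of the present application also provide a device, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the method according to any embodiment of the first aspect of the present application.
Embodiments of the present application also provide a non-transitory computer readable storage medium; when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the method according to any embodiment of the first aspect of the present application.
Embodiments of the present application also provide a computer program product; when the computer program product is executed by one or more processors in a device, the one or more processors are caused to perform the method according to any embodiment of the first aspect of the present application.
Additional aspects and advantages of the present application will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the present application.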
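Brief description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a speech quality evaluation method according to one embodiment of the present application;
FIG. 2 is a schematic flowchart of a speech quality evaluation method according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a speech quality evaluation apparatus according to one embodiment of the present application;
FIG. 4 is a schematic structural diagram of a speech quality evaluation apparatus according to another embodiment of the present application.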
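Detailed description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar modules or modules having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and should not be construed as limiting it. On the contrary, the embodiments of the present application include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
To address the problems of the PESQ algorithm and better meet the speech quality evaluation needs of the 4G/LTE era, ITU-T began developing the POLQA algorithm in 2006, and it was officially released as the ITU-T P.863 standard in early 2011. Its main features are that it covers the latest speech coding and network transmission technologies, offers higher accuracy when used on 3G, 4G/LTE, and VoIP networks, and supports super-wideband (50 Hz to 14 kHz) and high-quality speech transmission. POLQA is therefore the algorithm commonly chosen at present for evaluating the speech quality of communication networks.
The concept of deep learning originates from research on artificial neural networks. A multilayer perceptron with multiple hidden layers is one deep learning structure. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, so as to discover distributed feature representations of data. Current application areas of deep learning mainly include computer vision, acoustic model training for speech recognition, and natural language processing fields such as machine translation and semantic mining.
Because deep learning is a newly emerged technology that is still developing, its relatively successful applications are so far limited to the areas listed above. To the inventors' knowledge, it has not been applied in the communication field, and in particular not to speech quality evaluation in the communication field.
As persons skilled in the communication field, the inventors of the present application previously also usually used the POLQA algorithm when speech quality evaluation was required. However, the inventors found that POLQA requires double-ended speech data: evaluating the speech quality of output speech data requires not only the output speech data but also the input speech data. Because input speech data is difficult to obtain in some situations, the application of POLQA is limited, and a new solution is needed to avoid this limitation. Through further analysis, the inventors found that models built with deep learning perform well, so deep learning can be introduced into speech quality evaluation algorithms. Further, to avoid the limited applicability of double-ended speech data, only single-ended speech data may be used as training samples when building the model with deep learning, so that only single-ended speech data, namely the speech data to be evaluated, is needed when the constructed model is used for speech quality evaluation.
Therefore, the main idea of the present application is to introduce deep learning into speech quality evaluation, in particular speech quality evaluation in the communication field. This provides a new solution for speech quality evaluation in the communication field that relies only on single-ended speech data; and because the model is built with deep learning while relying only on single-ended speech data, good model performance can be ensured, thereby addressing the technical problem of achieving speech quality evaluation with fewer restrictions and better performance. Further, it should be noted that although the main idea of the present application has been described above, the specific technical solutions are not limited to that main idea and may be combined with other features; such combinations of different technical features still fall within the scope of protection of the present application.
Further, it should be noted that although the main technical problem to be solved has been given above, the present application is not limited to solving only that technical problem; other technical problems that can be solved by applying the present application still fall within its scope of protection.
Further, it should be noted that although the main idea of the present application has been given above and the subsequent embodiments will explain certain particular points, the innovations of the present application are not limited to the content of that main idea and those particular points; content not specifically called out in the present application may still contain innovations of the present application.
It can be understood that, although some explanations have been given above, other possible solutions are not excluded; technical solutions that are the same as, similar to, or equivalent to the embodiments given later in the present application therefore still fall within the scope of protection of the present application.
The technical solution of the present application is described below with reference to specific embodiments.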
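FIG. 1 is a schematic flowchart of a speech quality evaluation method according to one embodiment of the present application.
As shown in FIG. 1, the method of this embodiment includes:
S11: Receive speech data to be evaluated.
Taking the communication field as an example, the speech data to be evaluated may specifically be the output speech data of a communication network, that is, the degraded speech data obtained after input speech data passes through the communication network. Input speech data generally refers to clean speech data, also called original speech data, while degraded speech data generally refers to speech data whose quality is degraded relative to the original speech data in one or more respects, such as reduced clarity, delay, or noise.
S12: Extract an evaluation feature of the speech data to be evaluated.
The evaluation feature is the same as the evaluation feature extracted from the degraded speech data when the speech quality evaluation model was constructed, and can be determined according to application requirements.
Generally, an evaluation feature is a feature that describes speech data from the perspective of human auditory perception; see the later description for details.
S13: Perform quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
The speech quality evaluation model may be constructed in advance, before speech quality evaluation is needed; for example, the model is first constructed offline and then used directly when speech quality evaluation is needed. Of course, constructing the model online is not excluded, for example constructing it at the time speech quality evaluation is needed. See the later description for details of the construction.
The input and output of the speech quality evaluation model are, respectively, the evaluation feature and the quality information of single-ended speech data. Therefore, after the evaluation feature of the speech data to be evaluated is extracted, it can be used as the input of the speech quality evaluation model, and the resulting output is the quality information of the speech data to be evaluated, which accomplishes the speech quality evaluation.
Further, the speech quality evaluation model may be described by a regression model or a classification model, and the specific content of the quality information differs accordingly. For example, if the speech quality evaluation model is described by a regression model, the resulting quality information is a specific evaluation score, such as a score from 1 to 5; if it is described by a classification model, the resulting quality information is an evaluation category, such as one of poor, fairly poor, average, good, and fairly good.
Further, in some embodiments, to improve the accuracy of the speech quality evaluation, the quality evaluation result obtained in S13 may also be adjusted. Taking an evaluation score as the quality evaluation result, the evaluation score obtained in S13 may be used directly as the final evaluation score, or it may be adjusted in combination with related parameters of the communication network, such as packet loss, jitter, and delay, to obtain the final evaluation score. The specific algorithm for adjusting the score with network parameters is configurable and is not detailed here; for example, the evaluation score obtained in S13 may be multiplied by a coefficient related to the above parameters of the communication network to give the final evaluation score.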
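The source leaves the adjustment algorithm open, saying only that the S13 score may be multiplied by a coefficient related to packet loss, jitter, and delay. A minimal Python sketch of that idea follows; the linear penalty, the function names, and every weight are hypothetical placeholders chosen for illustration, not values from the application.

```python
def network_coefficient(loss_rate, jitter_ms, delay_ms,
                        w_loss=2.0, w_jitter=0.002, w_delay=0.0005):
    """Map network parameters to a multiplier in [0, 1].

    The linear penalty and its weights are illustrative assumptions only.
    """
    penalty = w_loss * loss_rate + w_jitter * jitter_ms + w_delay * delay_ms
    return max(0.0, 1.0 - penalty)


def adjust_score(score, coefficient, low=1.0, high=5.0):
    """Multiply the model score by the coefficient and keep it on the 1-5 scale."""
    return min(high, max(low, score * coefficient))
```

With no impairment the coefficient is 1.0 and the score passes through unchanged; clamping keeps a heavily penalized score on the 1 to 5 scale mentioned in the text.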
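In this embodiment, the speech data to be evaluated is evaluated using a speech quality evaluation model, so that only single-ended speech data is needed for speech quality evaluation. This avoids the limited applicability caused by relying on double-ended speech data and thereby extends the range of application.
FIG. 2 is a schematic flowchart of a speech quality evaluation method according to another embodiment of the present application.
This embodiment takes as an example the case where the speech data to be evaluated is degraded speech data that has passed through a communication network, and where the speech quality evaluation model is constructed by deep learning.
Referring to FIG. 2, the method of this embodiment includes:
S21: Obtain speech data, the speech data including clean speech data and degraded speech data.
The speech data may be obtained by collection and/or directly from existing data. To improve the accuracy of the constructed speech quality evaluation model, as much speech data as possible should be obtained here.
Taking collection as an example, speech data may be collected by simulating communication, capturing both the clean speech data of a call and the degraded speech data after it passes through the communication network. Specifically, a large amount of clean speech data, for example 2000 hours, is first recorded in a high-fidelity recording studio. Multiple mobile phones are then used to simulate calls: one phone places a call and plays the clean speech data, and another phone answers and receives it. The transmitted data packets are restored at different interfaces of the communication network to obtain the degraded speech data after it has passed through the communication network.
Of course, real network call speech data may also be collected directly, with the corresponding clean speech data and degraded speech data obtained respectively; the specific manner of obtaining them is not limited in the present application.
Further, when collecting speech data, the clean speech data and the degraded speech data may be collected separately, so that each can be obtained directly. Alternatively, they may be collected together, in which case the clean speech data and degraded speech data can be labeled to distinguish them, for example using 1 to denote clean speech data and 0 to denote degraded speech data, and then obtained separately according to the labels.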
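When clean and degraded recordings are collected together, the 1/0 labeling described above can be used to separate them again. The following is a small Python sketch; the `(label, audio)` pair layout and the function name are assumptions made for illustration.

```python
def split_by_label(samples):
    """Split (label, audio) pairs into clean and degraded lists.

    Label 1 marks clean speech data and label 0 marks degraded speech data,
    following the convention given in the text.
    """
    clean = [audio for label, audio in samples if label == 1]
    degraded = [audio for label, audio in samples if label == 0]
    return clean, degraded
```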
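S22: Obtain clean speech data to be processed according to the clean speech data, and obtain degraded speech data to be processed according to the degraded speech data.
This may include:
using the obtained degraded speech data directly as the degraded speech data to be processed; or
extracting effective speech segments from the obtained degraded speech data and using the effective speech segments of the degraded speech data as the degraded speech data to be processed; or
clustering the obtained degraded speech data and using the degraded speech data corresponding to the cluster centers as the degraded speech data to be processed; or
extracting effective speech segments from the obtained degraded speech data, clustering the effective speech segments of the degraded speech data, and using the effective speech segments corresponding to the cluster centers as the degraded speech data to be processed.
Specifically, after the clean speech data and degraded speech data are obtained, they may be used directly as the clean speech data to be processed and the degraded speech data to be processed, respectively. Alternatively, effective speech segments may be extracted from each, with the effective speech segments of the clean speech data used as the clean speech data to be processed and the effective speech segments of the degraded speech data used as the degraded speech data to be processed. The manner of extracting effective speech segments is not limited; for example, voice activity detection (VAD) may be used. Processing only effective speech segments reduces the amount and complexity of computation.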
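The text does not fix a VAD method. As one possible illustration, the sketch below uses a simple frame-energy threshold; this energy rule, the frame length, and the threshold value are assumptions and are not the detector used in the application.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def effective_segments(samples, frame_len=160, threshold=1e-3):
    """Return (start, end) sample indices of runs of frames above the threshold."""
    energies = frame_energy(samples, frame_len)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx * frame_len            # a speech run begins here
        elif energy <= threshold and start is not None:
            segments.append((start, idx * frame_len))
            start = None                       # the run ended at this frame
    if start is not None:                      # speech lasted until the final frame
        segments.append((start, len(energies) * frame_len))
    return segments
```

A production detector would normally add smoothing (hangover frames) and an adaptive threshold; the sketch only shows how silent frames can be dropped before further processing.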
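Further, when obtaining the degraded speech data to be processed, all the degraded speech data included in the speech data, or the effective speech segments of all the degraded speech data, may be used as the degraded speech data to be processed; alternatively, part of the degraded speech data or its effective speech segments may be selected. The selection may be made by clustering: all the degraded speech data or its effective speech segments are clustered, and the degraded speech data or effective speech segments corresponding to the cluster centers are used as the degraded speech data to be processed.
For example, when clustering, i-vector features of the effective speech segments of the degraded speech data are extracted and clustered using the k-means method to obtain k cluster centers, and the degraded speech data or effective speech segment corresponding to each cluster center is used as the degraded speech data to be processed. Clustering and processing only the degraded speech data corresponding to the cluster centers reduces the amount of data and improves computational efficiency.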
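The selection by cluster center can be sketched in plain Python as follows. The feature vectors stand in for i-vectors, whose extraction is outside this sketch; taking "the data corresponding to a cluster center" to mean the sample nearest to that center is an assumption, as are the function names and the seeded random initialization.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns the k cluster centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centers[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster received no points
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def select_representatives(vectors, k):
    """Indices of the samples closest to each of the k cluster centers."""
    centers = kmeans(vectors, k)
    chosen = {
        min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], center))
        for center in centers
    }
    return sorted(chosen)
```

Only the returned indices would then go on to scoring and feature extraction, which is where the reduction in data volume comes from.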
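S23: Compute an evaluation score for the degraded speech data to be processed according to the clean speech data to be processed and the degraded speech data to be processed.
Taking effective speech segments as the data to be processed, once the effective speech segments of the clean speech data and of the degraded speech data are obtained, each effective speech segment of the degraded speech data can be analyzed frame by frame against the effective speech segments of the clean speech data to compute its evaluation score. The calculation method is not limited; for example, the evaluation score is the Mean Opinion Score (MOS) of the speech data, which may be computed as in the prior art, for example using the POLQA or PESQ algorithm, and is not detailed here.
S24: Extract an evaluation feature of the degraded speech data to be processed.
The evaluation feature describes speech data from the perspective of human auditory perception. In the extraction, time-domain features of the degraded speech data to be processed are extracted first, such as the short-time average energy of the speech data, the segmental noise floor of the speech, short-time waveform impulses or oscillations of the speech, the fundamental frequency feature, and difference features of the fundamental frequency, such as its first-order and second-order difference values. Frequency-domain features of the degraded speech data to be processed are then extracted, such as FilterBank features and linear predictive coding (LPC) features. When extracting the frequency-domain features, filters that model the shape of the human cochlea are used, so that the extracted frequency-domain features describe speech data from the perspective of human auditory perception. To better describe the degraded speech data, the mean, variance, maximum, minimum, and difference features (such as first-order and second-order difference values) of each frequency-domain feature may also be extracted. Which evaluation features are extracted can be determined according to application requirements and the degradation of the speech data, and is not limited in the present application.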
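The difference and summary statistics named above can be illustrated on any per-frame feature track, such as a fundamental-frequency contour or one FilterBank channel. The sketch below assumes the track has already been computed by some front end; the function names are illustrative.

```python
def delta(values):
    """First-order difference between consecutive frames."""
    return [b - a for a, b in zip(values, values[1:])]


def summarize(values):
    """Mean, (population) variance, maximum and minimum of a feature track."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "variance": variance,
            "max": max(values), "min": min(values)}


def track_statistics(track):
    """Summaries of a track and of its first- and second-order differences."""
    first = delta(track)
    second = delta(first)
    return {
        "static": summarize(track),
        "delta": summarize(first),
        "delta2": summarize(second),
    }
```

Applying `delta` twice gives the second-order difference, so a track needs at least three frames for all three summaries to be defined.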
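S25: Train on the extracted evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed to construct the speech quality evaluation model.
During training, deep learning may specifically be used to train the parameters of the speech quality evaluation model and thus construct the model.
The network topology used for deep learning may be one or a combination of networks such as deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) neural networks, and is not limited in the present application; the specific network is chosen according to application requirements. Once the input and output of the model are determined, the parameter training process is the same as in the prior art and is not detailed here.
Further, the speech quality evaluation model may be described by different types of model, for example a regression model or a classification model. The input and output of the model are adjusted according to the type.
Specifically, when a regression model is used to describe the speech quality evaluation model, the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed, obtained as above, are used directly as the model input and output, respectively.
When a classification model is used to describe the speech quality evaluation model, the evaluation feature of the degraded speech data to be processed, obtained as above, is used directly as the model input, and the model output is the evaluation category obtained by quantizing the evaluation score of the degraded speech data to be processed.
For quantization, a fixed or non-fixed step size may be used to quantize the evaluation scores of the degraded speech data. With a fixed step size, for example 0.2, the evaluation scores of all degraded speech data are quantized to obtain the categories of the degraded speech data; taking the MOS score as an example, quantizing the range from 1 to 5 with a fixed step size of 0.2 yields 20 evaluation categories. With a non-fixed step size, the quantization step for each range of evaluation scores can be determined according to application requirements; for example, a large step may be used in the lower score range and a small step in the higher score range. Taking the MOS score as an example, 1 to 3 is the lower score range and may be quantized with a large step such as 0.5, and 3 to 5 is the higher score range and may be quantized with a small step such as 0.2, giving 14 evaluation categories in total.
Of course, other methods may also be used to quantize the evaluation scores and divide them into multiple evaluation categories, for example quantized evaluation categories of poor, fairly poor, average, good, and fairly good; this is not limited in the present application.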
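The two quantization schemes can be checked with a short Python sketch. Bin edges are kept as integers in hundredths of a MOS point to avoid floating-point edge effects; that representation, the half-open bin convention, and the function names are implementation assumptions rather than part of the application.

```python
from bisect import bisect_right


def make_edges(ranges):
    """Build bin edges from (low, high, step) triples given in hundredths."""
    edges = []
    for low, high, step in ranges:
        edges.extend(range(low, high, step))
    edges.append(ranges[-1][1])  # closing edge of the last range
    return edges


def quantize(score, edges):
    """Return the zero-based category index of a score on the 1-5 scale."""
    units = round(score * 100)
    units = min(max(units, edges[0]), edges[-1])  # clamp to the covered range
    index = bisect_right(edges, units) - 1
    return min(index, len(edges) - 2)             # the top edge joins the last bin


# Fixed step of 0.2 over 1-5.
FIXED = make_edges([(100, 500, 20)])
# Step 0.5 over 1-3, then step 0.2 over 3-5.
VARIABLE = make_edges([(100, 300, 50), (300, 500, 20)])
```

Here `len(FIXED) - 1` is 20 and `len(VARIABLE) - 1` is 14, matching the category counts stated in the text (4 bins from 1 to 3 plus 10 bins from 3 to 5).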
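S26: Receive degraded speech data that has passed through the communication network.
S27: Extract an evaluation feature of the degraded speech data.
The evaluation feature is extracted in the same way as during training, which is not repeated here.
S28: Perform quality evaluation on the degraded speech data according to the evaluation feature and the constructed speech quality evaluation model.
Specifically, the evaluation feature of the current degraded speech data is used as the input of the speech quality evaluation model, and the output of the model is taken as the quality evaluation result for the current degraded speech data. If the speech quality evaluation model is described by a regression model, the quality evaluation result is an evaluation score; if it is described by a classification model, the quality evaluation result is an evaluation category.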
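The evaluation step can be sketched as a thin wrapper around a trained model. In the Python sketch below the model is any callable, which is an assumption made so the example stays self-contained; the classifier is assumed to return one score per category, and the category names follow the example list given in the text.

```python
CATEGORIES = ["poor", "fairly poor", "average", "good", "fairly good"]


def evaluate(features, model, kind):
    """Run a trained model on single-ended features and interpret its output.

    kind is "regression" (model returns a score) or "classification"
    (model returns one score per entry of CATEGORIES).
    """
    output = model(features)
    if kind == "regression":
        return float(output)
    if kind == "classification":
        best = max(range(len(output)), key=lambda i: output[i])
        return CATEGORIES[best]
    raise ValueError("kind must be 'regression' or 'classification'")
```

Note that only the features of the degraded signal are passed in; no clean reference is needed at evaluation time, which is the single-ended property the application relies on.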
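In this embodiment, the speech data to be evaluated is evaluated using a speech quality evaluation model, so that only single-ended speech data is needed for speech quality evaluation. This avoids the limited applicability caused by relying on double-ended speech data and thereby extends the range of application. Further, training with deep learning exploits its good performance, making the speech quality evaluation model, and in turn the speech quality evaluation result, more accurate. Further, evaluating the quality of speech data in the communication field combines deep learning with speech quality evaluation in that field and offers a new approach to it.
FIG. 3 is a schematic structural diagram of a speech quality evaluation apparatus according to one embodiment of the present application.
As shown in FIG. 3, the apparatus 30 of this embodiment includes a receiving module 31, an extraction module 32, and an evaluation module 33.
The receiving module 31 is configured to receive speech data to be evaluated;
the extraction module 32 is configured to extract an evaluation feature of the speech data to be evaluated;
the evaluation module 33 is configured to perform quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
In some embodiments, the speech data to be evaluated includes degraded speech data that has passed through a communication network.
In some embodiments, referring to FIG. 4, the apparatus 30 of this embodiment further includes a construction module 34 for constructing the speech quality evaluation model, the construction module 34 including:
a first obtaining submodule 341, configured to obtain speech data, the speech data including clean speech data and degraded speech data;
a second obtaining submodule 342, configured to obtain clean speech data to be processed according to the clean speech data, and to obtain degraded speech data to be processed according to the degraded speech data;
a calculation submodule 343, configured to compute an evaluation score for the degraded speech data to be processed according to the clean speech data to be processed and the degraded speech data to be processed;
an extraction submodule 344, configured to extract an evaluation feature of the degraded speech data to be processed;
a training submodule 345, configured to train on the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed to construct the speech quality evaluation model.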
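In some embodiments, the speech quality evaluation model is constructed by training with deep learning.
In some embodiments, the training submodule 345 is specifically configured to:
if a regression model is used to describe the speech quality evaluation model, use the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed as the model input and the model output, respectively, train the model parameters, and construct the speech quality evaluation model; or
if a classification model is used to describe the speech quality evaluation model, use the evaluation feature of the degraded speech data to be processed as the model input, quantize the evaluation score of the degraded speech data to be processed to obtain an evaluation category, use the evaluation category as the model output, train the model parameters, and construct the speech quality evaluation model.
In some embodiments, the second obtaining submodule 342 being configured to obtain the clean speech data to be processed according to the clean speech data includes:
using the obtained clean speech data directly as the clean speech data to be processed; or
extracting effective speech segments from the obtained clean speech data and using the effective speech segments of the clean speech data as the clean speech data to be processed.
In some embodiments, the second obtaining submodule 342 being configured to obtain the degraded speech data to be processed according to the degraded speech data includes:
using the obtained degraded speech data directly as the degraded speech data to be processed; or
extracting effective speech segments from the obtained degraded speech data and using the effective speech segments of the degraded speech data as the degraded speech data to be processed; or
clustering the obtained degraded speech data and using the degraded speech data corresponding to the cluster centers as the degraded speech data to be processed; or
extracting effective speech segments from the obtained degraded speech data, clustering the effective speech segments of the degraded speech data, and using the effective speech segments corresponding to the cluster centers as the degraded speech data to be processed.
It can be understood that the apparatus of this embodiment corresponds to the above method embodiments; for details, see the related description of the method embodiments, which is not repeated here.
In this embodiment, the speech data to be evaluated is evaluated using a speech quality evaluation model, so that only single-ended speech data is needed for speech quality evaluation. This avoids the limited applicability caused by relying on double-ended speech data and thereby extends the range of application.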
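Embodiments of the present application also provide a device, including: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the following method: receiving speech data to be evaluated; extracting an evaluation feature of the speech data to be evaluated; and performing quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
Embodiments of the present application also provide a non-transitory computer readable storage medium; when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the following method: receiving speech data to be evaluated; extracting an evaluation feature of the speech data to be evaluated; and performing quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
Embodiments of the present application also provide a computer program product; when the computer program product is executed by one or more processors in a device, the one or more processors are caused to perform the following method: receiving speech data to be evaluated; extracting an evaluation feature of the speech data to be evaluated; and performing quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
It can be understood that the same or similar parts of the above embodiments may refer to one another, and content not described in detail in some embodiments may be found in the same or similar content of other embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance. In addition, in the description of the present application, unless otherwise stated, "multiple" means at least two.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that the parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by a program instructing related hardware; the program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present application; a person of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.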

Claims (17)

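  1. A speech quality evaluation method, characterized by comprising:
    receiving speech data to be evaluated;
    extracting an evaluation feature of the speech data to be evaluated;
    performing quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
  2. The method according to claim 1, characterized in that the speech data to be evaluated comprises: degraded speech data that has passed through a communication network.
  3. The method according to claim 2, characterized by further comprising: constructing the speech quality evaluation model, wherein constructing the speech quality evaluation model comprises:
    obtaining speech data, the speech data comprising clean speech data and degraded speech data;
    obtaining clean speech data to be processed according to the clean speech data, and obtaining degraded speech data to be processed according to the degraded speech data;
    computing an evaluation score for the degraded speech data to be processed according to the clean speech data to be processed and the degraded speech data to be processed;
    extracting an evaluation feature of the degraded speech data to be processed;
    training on the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed to construct the speech quality evaluation model.
  4. The method according to any one of claims 1-3, characterized in that the speech quality evaluation model is constructed by training with deep learning.
  5. The method according to claim 4, characterized in that training on the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed to construct the speech quality evaluation model comprises:
    if a regression model is used to describe the speech quality evaluation model, using the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed as the model input and the model output, respectively, training the model parameters, and constructing the speech quality evaluation model; or
    if a classification model is used to describe the speech quality evaluation model, using the evaluation feature of the degraded speech data to be processed as the model input, quantizing the evaluation score of the degraded speech data to be processed to obtain an evaluation category, using the evaluation category as the model output, training the model parameters, and constructing the speech quality evaluation model.
  6. The method according to claim 3, characterized in that obtaining the clean speech data to be processed according to the clean speech data comprises:
    using the obtained clean speech data directly as the clean speech data to be processed; or
    extracting effective speech segments from the obtained clean speech data and using the effective speech segments of the clean speech data as the clean speech data to be processed.
  7. The method according to claim 3, characterized in that obtaining the degraded speech data to be processed according to the degraded speech data comprises:
    using the obtained degraded speech data directly as the degraded speech data to be processed; or
    extracting effective speech segments from the obtained degraded speech data and using the effective speech segments of the degraded speech data as the degraded speech data to be processed; or
    clustering the obtained degraded speech data and using the degraded speech data corresponding to the cluster centers as the degraded speech data to be processed; or
    extracting effective speech segments from the obtained degraded speech data, clustering the effective speech segments of the degraded speech data, and using the effective speech segments corresponding to the cluster centers as the degraded speech data to be processed.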
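  8. A speech quality evaluation apparatus, characterized by comprising:
    a receiving module, configured to receive speech data to be evaluated;
    an extraction module, configured to extract an evaluation feature of the speech data to be evaluated;
    an evaluation module, configured to perform quality evaluation on the speech data to be evaluated according to the evaluation feature of the speech data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model indicates a relationship between the evaluation feature of single-ended speech data and quality information of the single-ended speech data.
  9. The apparatus according to claim 8, characterized in that the speech data to be evaluated comprises: degraded speech data that has passed through a communication network.
  10. The apparatus according to claim 9, characterized by further comprising: a construction module for constructing the speech quality evaluation model, the construction module comprising:
    a first obtaining submodule, configured to obtain speech data, the speech data comprising clean speech data and degraded speech data;
    a second obtaining submodule, configured to obtain clean speech data to be processed according to the clean speech data, and to obtain degraded speech data to be processed according to the degraded speech data;
    a calculation submodule, configured to compute an evaluation score for the degraded speech data to be processed according to the clean speech data to be processed and the degraded speech data to be processed;
    an extraction submodule, configured to extract an evaluation feature of the degraded speech data to be processed;
    a training submodule, configured to train on the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed to construct the speech quality evaluation model.
  11. The apparatus according to any one of claims 8-10, characterized in that the speech quality evaluation model is constructed by training with deep learning.
  12. The apparatus according to claim 11, characterized in that the training submodule is specifically configured to:
    if a regression model is used to describe the speech quality evaluation model, use the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed as the model input and the model output, respectively, train the model parameters, and construct the speech quality evaluation model; or
    if a classification model is used to describe the speech quality evaluation model, use the evaluation feature of the degraded speech data to be processed as the model input, quantize the evaluation score of the degraded speech data to be processed to obtain an evaluation category, use the evaluation category as the model output, train the model parameters, and construct the speech quality evaluation model.
  13. The apparatus according to claim 10, characterized in that the second obtaining submodule being configured to obtain the clean speech data to be processed according to the clean speech data comprises:
    using the obtained clean speech data directly as the clean speech data to be processed; or
    extracting effective speech segments from the obtained clean speech data and using the effective speech segments of the clean speech data as the clean speech data to be processed.
  14. The apparatus according to claim 10, characterized in that the second obtaining submodule being configured to obtain the degraded speech data to be processed according to the degraded speech data comprises:
    using the obtained degraded speech data directly as the degraded speech data to be processed; or
    extracting effective speech segments from the obtained degraded speech data and using the effective speech segments of the degraded speech data as the degraded speech data to be processed; or
    clustering the obtained degraded speech data and using the degraded speech data corresponding to the cluster centers as the degraded speech data to be processed; or
    extracting effective speech segments from the obtained degraded speech data, clustering the effective speech segments of the degraded speech data, and using the effective speech segments corresponding to the cluster centers as the degraded speech data to be processed.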
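  15. A device, characterized by comprising:
    one or more processors;
    a memory for storing one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the method according to any one of claims 1-7.
  16. A non-transitory computer readable storage medium, characterized in that, when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the method according to any one of claims 1-7.
  17. A computer program product, characterized in that, when the computer program product is executed by one or more processors in a device, the one or more processors are caused to perform the method according to any one of claims 1-7.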
PCT/CN2016/111050 2016-10-12 2016-12-20 语音质量评价方法和装置 WO2018068396A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP16918904.0A EP3528250B1 (en) 2016-10-12 2016-12-20 Voice quality evaluation method and apparatus
JP2019500365A JP2019531494A (ja) 2016-10-12 2016-12-20 音声品質評価方法及び装置
KR1020197009232A KR102262686B1 (ko) 2016-10-12 2016-12-20 음성 품질 평가 방법 및 음성 품질 평가 장치
US16/280,705 US10964337B2 (en) 2016-10-12 2019-02-20 Method, device, and storage medium for evaluating speech quality

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610892176.1 2016-10-12
CN201610892176.1A CN106531190B (zh) 2016-10-12 2016-10-12 语音质量评价方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/280,705 Continuation US10964337B2 (en) 2016-10-12 2019-02-20 Method, device, and storage medium for evaluating speech quality

Publications (1)

Publication Number Publication Date
WO2018068396A1 true WO2018068396A1 (zh) 2018-04-19

Family

ID=58331645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111050 WO2018068396A1 (zh) 2016-10-12 2016-12-20 语音质量评价方法和装置

Country Status (6)

Country Link
US (1) US10964337B2 (zh)
EP (1) EP3528250B1 (zh)
JP (1) JP2019531494A (zh)
KR (1) KR102262686B1 (zh)
CN (1) CN106531190B (zh)
WO (1) WO2018068396A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10964337B2 (en) 2016-10-12 2021-03-30 Iflytek Co., Ltd. Method, device, and storage medium for evaluating speech quality
JP2022514878A (ja) * 2018-12-21 2022-02-16 フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン 音質の推定および制御を使用した音源分離のための装置および方法

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108346434B (zh) * 2017-01-24 2020-12-22 中国移动通信集团安徽有限公司 一种语音质量评估的方法和装置
CN107358966B (zh) * 2017-06-27 2020-05-12 北京理工大学 基于深度学习语音增强的无参考语音质量客观评估方法
CN109979486B (zh) * 2017-12-28 2021-07-09 中国移动通信集团北京有限公司 一种语音质量评估方法及装置
CN108322346B (zh) * 2018-02-09 2021-02-02 山西大学 一种基于机器学习的语音质量评价方法
US10777217B2 (en) * 2018-02-27 2020-09-15 At&T Intellectual Property I, L.P. Performance sensitive audio signal selection
CN108304890B (zh) * 2018-03-16 2021-06-08 科大讯飞股份有限公司 一种分类模型的生成方法及装置
CN109308913A (zh) * 2018-08-02 2019-02-05 平安科技(深圳)有限公司 音乐质量评价方法、装置、计算机设备及存储介质
CN109065072B (zh) * 2018-09-30 2019-12-17 中国科学院声学研究所 一种基于深度神经网络的语音质量客观评价方法
CN109658920B (zh) * 2018-12-18 2020-10-09 百度在线网络技术(北京)有限公司 用于生成模型的方法和装置
CN111383657A (zh) * 2018-12-27 2020-07-07 中国移动通信集团辽宁有限公司 语音质量评估方法、装置、设备及介质
CN109830247A (zh) * 2019-03-22 2019-05-31 北京百度网讯科技有限公司 用于测试通话质量的方法和装置
CN110189771A (zh) 2019-05-31 2019-08-30 腾讯音乐娱乐科技(深圳)有限公司 同源音频的音质检测方法、装置及存储介质
US11322173B2 (en) * 2019-06-21 2022-05-03 Rohde & Schwarz Gmbh & Co. Kg Evaluation of speech quality in audio or video signals
CN110164443B (zh) * 2019-06-28 2021-09-14 联想(北京)有限公司 用于电子设备的语音处理方法、装置以及电子设备
CN110334240B (zh) * 2019-07-08 2021-10-22 联想(北京)有限公司 信息处理方法、系统及第一设备、第二设备
CN110503981A (zh) * 2019-08-26 2019-11-26 苏州科达科技股份有限公司 无参考音频客观质量评价方法、装置及存储介质
EP3866165B1 (en) * 2020-02-14 2022-08-17 System One Noc & Development Solutions, S.A. Method for enhancing telephone speech signals based on convolutional neural networks
US20210407493A1 (en) * 2020-06-30 2021-12-30 Plantronics, Inc. Audio Anomaly Detection in a Speech Signal
CN111537056A (zh) * 2020-07-08 2020-08-14 浙江浙能天然气运行有限公司 基于svm与时频域特征的管道沿线第三方施工动态预警方法
CN114187921A (zh) * 2020-09-15 2022-03-15 华为技术有限公司 语音质量评价方法和装置
CN116997962A (zh) * 2020-11-30 2023-11-03 杜比国际公司 基于卷积神经网络的鲁棒侵入式感知音频质量评估
CN113409820B (zh) * 2021-06-09 2022-03-15 合肥群音信息服务有限公司 一种基于语音数据的质量评价方法
CN113411456B (zh) * 2021-06-29 2023-05-02 中国人民解放军63892部队 一种基于语音识别的话音质量评估方法及装置
CN114358089A (zh) * 2022-01-24 2022-04-15 北京蕴岚科技有限公司 基于脑电的语音评估模型的训练方法、装置及电子设备
CN115175233B (zh) * 2022-07-06 2024-09-10 中国联合网络通信集团有限公司 语音质量评估方法、装置、电子设备及存储介质
CN116092482B (zh) 2023-04-12 2023-06-20 中国民用航空飞行学院 一套基于自注意力的实时管制语音质量计量方法及系统
CN117612566B (zh) * 2023-11-16 2024-05-28 书行科技(北京)有限公司 音频质量评估方法及相关产品

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361894A (zh) * 2014-11-27 2015-02-18 湖南省计量检测研究院 一种基于输出的客观语音质量评估的方法
US20150348571A1 (en) * 2014-05-29 2015-12-03 Nec Corporation Speech data processing device, speech data processing method, and speech data processing program
CN105282347A (zh) * 2014-07-22 2016-01-27 中国移动通信集团公司 语音质量的评估方法及装置

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715372A (en) * 1995-01-10 1998-02-03 Lucent Technologies Inc. Method and apparatus for characterizing an input signal
WO1997005730A1 (en) * 1995-07-27 1997-02-13 British Telecommunications Public Limited Company Assessment of signal quality
US6446038B1 (en) * 1996-04-01 2002-09-03 Qwest Communications International, Inc. Method and system for objectively evaluating speech
JPH1195795A (ja) * 1997-09-16 1999-04-09 Nippon Telegr & Teleph Corp <Ntt> 音声品質評価方法および記録媒体
EP1187100A1 (en) * 2000-09-06 2002-03-13 Koninklijke KPN N.V. A method and a device for objective speech quality assessment without reference signal
EP1206104B1 (en) * 2000-11-09 2006-07-19 Koninklijke KPN N.V. Measuring a talking quality of a telephone link in a telecommunications network
US7327985B2 (en) * 2003-01-21 2008-02-05 Telefonaktiebolaget Lm Ericsson (Publ) Mapping objective voice quality metrics to a MOS domain for field measurements
US8305913B2 (en) * 2005-06-15 2012-11-06 Nortel Networks Limited Method and apparatus for non-intrusive single-ended voice quality assessment in VoIP
US7856355B2 (en) * 2005-07-05 2010-12-21 Alcatel-Lucent Usa Inc. Speech quality assessment method and system
US8560312B2 (en) * 2009-12-17 2013-10-15 Alcatel Lucent Method and apparatus for the detection of impulsive noise in transmitted speech signals for use in speech quality assessment
FR2973923A1 (fr) * 2011-04-11 2012-10-12 France Telecom Evaluation de la qualite vocale d'un signal de parole code
US9396738B2 (en) * 2013-05-31 2016-07-19 Sonus Networks, Inc. Methods and apparatus for signal quality analysis
CN104517613A (zh) * 2013-09-30 2015-04-15 华为技术有限公司 语音质量评估方法及装置
CN105224558A (zh) * 2014-06-16 2016-01-06 华为技术有限公司 语音业务的评价处理方法及装置
CN105702250B (zh) * 2016-01-06 2020-05-19 福建天晴数码有限公司 语音识别方法和装置
WO2018028767A1 (en) * 2016-08-09 2018-02-15 Huawei Technologies Co., Ltd. Devices and methods for evaluating speech quality
CN106531190B (zh) 2016-10-12 2020-05-05 科大讯飞股份有限公司 语音质量评价方法和装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150348571A1 (en) * 2014-05-29 2015-12-03 Nec Corporation Speech data processing device, speech data processing method, and speech data processing program
CN105282347A (zh) * 2014-07-22 2016-01-27 中国移动通信集团公司 语音质量的评估方法及装置
CN104361894A (zh) * 2014-11-27 2015-02-18 湖南省计量检测研究院 一种基于输出的客观语音质量评估的方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3528250A4 *
YIN, WEI: "Chapter VII Non-intrusive speech quality assessment method based on NLPC coefficient and GMM-HMM model", RESEARCH ON SPEECH ENHANCEMENT BASED ON SPEECH MODELING AND SPEECH QUALITY ASSESSMENT DISSERTATION SUBMITTED TO WUHAN UNIVERSITY FOR THE DOCTORAL DEGREE, 30 September 2009 (2009-09-30), pages 94 - 104, XP009513293 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10964337B2 (en) 2016-10-12 2021-03-30 Iflytek Co., Ltd. Method, device, and storage medium for evaluating speech quality
JP2022514878A (ja) * 2018-12-21 2022-02-16 フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン 音質の推定および制御を使用した音源分離のための装置および方法
JP7314279B2 (ja) 2018-12-21 2023-07-25 フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン 音質の推定および制御を使用した音源分離のための装置および方法

Also Published As

Publication number Publication date
US20190180771A1 (en) 2019-06-13
KR20190045278A (ko) 2019-05-02
JP2019531494A (ja) 2019-10-31
CN106531190B (zh) 2020-05-05
EP3528250A1 (en) 2019-08-21
EP3528250B1 (en) 2022-05-25
KR102262686B1 (ko) 2021-06-09
US10964337B2 (en) 2021-03-30
CN106531190A (zh) 2017-03-22
EP3528250A4 (en) 2020-05-13

Similar Documents

Publication Publication Date Title
WO2018068396A1 (zh) 语音质量评价方法和装置
CN110600017B (zh) 语音处理模型的训练方法、语音识别方法、系统及装置
US20220230651A1 (en) Voice signal dereverberation processing method and apparatus, computer device and storage medium
CN107358966B (zh) 基于深度学习语音增强的无参考语音质量客观评估方法
JP6339187B2 (ja) 音声信号品質を測定するためのシステムおよび方法
US11190898B2 (en) Rendering scene-aware audio using neural network-based acoustic analysis
CN108346434B (zh) 一种语音质量评估的方法和装置
US20160189730A1 (en) Speech separation method and system
US8655656B2 (en) Method and system for assessing intelligibility of speech represented by a speech signal
US9997168B2 (en) Method and apparatus for signal extraction of audio signal
CN111868823B (zh) 一种声源分离方法、装置及设备
CN108039168B (zh) 声学模型优化方法及装置
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN107895571A (zh) 无损音频文件识别方法及装置
CN111862951A (zh) 语音端点检测方法及装置、存储介质、电子设备
CN112151055B (zh) 音频处理方法及装置
KR20190129805A (ko) 잡음 환경 분류 및 제거 기능을 갖는 보청기 및 그 방법
US11551707B2 (en) Speech processing method, information device, and computer program product
CN112967735A (zh) 语音质量检测模型的训练方法及语音质量的检测方法
WO2024055751A1 (zh) 音频数据处理方法、装置、设备、存储介质及程序产品
CN116403594B (zh) 基于噪声更新因子的语音增强方法和装置
CN113689886B (zh) 语音数据情感检测方法、装置、电子设备和存储介质
CN116705013B (zh) 语音唤醒词的检测方法、装置、存储介质和电子设备
US20240005908A1 (en) Acoustic environment profile estimation
CN113436644A (zh) 音质评估方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16918904

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019500365

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20197009232

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2016918904

Country of ref document: EP

Effective date: 20190513