WO2018068396A1 - Voice quality evaluation method and apparatus - Google Patents
Voice quality evaluation method and apparatus
- Publication number
- WO2018068396A1 (PCT/CN2016/111050)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- degraded
- data
- voice data
- speech
- processed
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/06—Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/69—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
Definitions
- the present application relates to the field of communications technologies, and in particular, to a voice quality evaluation method and apparatus.
- The speech quality evaluation algorithms used in communication networks include the Perceptual Evaluation of Speech Quality (PESQ) algorithm and the Perceptual Objective Listening Quality Analysis (POLQA) algorithm.
- These algorithms require both input voice data and output voice data when implemented.
- the input voice data is generally clean voice data, and the output voice data is generally degraded voice data after passing through the communication network.
- The input voice data and the output voice data are analyzed together, and the quality of the output voice data is evaluated.
- The input voice data is generally collected by the operator's road test vehicles. However, in indoor scenarios such as residential buildings or shopping malls, collection by road test vehicle is impossible, so the input voice data cannot be obtained and voice quality evaluation based on it cannot be performed.
- As a result, the above algorithms, which evaluate the output speech data based on both the input speech data and the output speech data, have limited applicability.
- the present application aims to solve at least one of the technical problems in the related art to some extent.
- An object of the present application is to provide a voice quality evaluation method that can perform voice quality evaluation on the voice data to be evaluated without requiring the corresponding input voice data, realizing voice quality evaluation that relies only on single-ended voice data and thereby expanding the scope of application.
- Another object of the present application is to propose a voice quality evaluation apparatus.
- The voice quality evaluation method includes: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
- The voice quality evaluation method proposed by the embodiment of the first aspect of the present application performs quality evaluation on the voice data to be evaluated by using the voice quality evaluation model and needs only single-ended voice data, avoiding the application restrictions caused by relying on double-ended voice data and thus extending the scope of application.
- The voice quality evaluation apparatus includes: a receiving module, configured to receive voice data to be evaluated; an extracting module, configured to extract an evaluation feature of the voice data to be evaluated; and an evaluation module, configured to perform quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed speech quality evaluation model, wherein the speech quality evaluation model is used to indicate the relationship between the evaluation feature of single-ended speech data and the quality information of the single-ended speech data.
- The voice quality evaluation apparatus proposed by the second aspect of the present application can perform quality evaluation on the voice data to be evaluated by using the voice quality evaluation model and needs only single-ended voice data, avoiding the application restrictions caused by relying on double-ended voice data and thus extending the scope of application.
- Embodiments of the present application also provide an apparatus comprising: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the method of any of the embodiments of the first aspect of the present application.
- Embodiments of the present application also provide a non-transitory computer readable storage medium; when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the method of any of the embodiments of the first aspect of the present application.
- Embodiments of the present application also provide a computer program product that, when executed by one or more processors in a device, causes the one or more processors to perform the method of any of the embodiments of the first aspect of the present application.
- FIG. 1 is a schematic flow chart of a voice quality evaluation method according to an embodiment of the present application.
- FIG. 2 is a schematic flowchart of a voice quality evaluation method according to another embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a voice quality evaluation apparatus according to an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of a voice quality evaluation apparatus according to another embodiment of the present application.
- In order to overcome the limitations of the PESQ algorithm and better meet the voice quality evaluation requirements of the 4G/LTE era, ITU-T started the development of the POLQA algorithm in 2006 and officially released it in early 2011 as the ITU-T P.863 standard. It covers the latest voice coding and network transmission technologies and provides higher accuracy for 3G, 4G/LTE, and VoIP networks, supporting super-wideband (50 Hz to 14 kHz) high-quality voice transmission. Therefore, the POLQA algorithm is currently the generally chosen algorithm for evaluating the speech quality of a communication network.
- Deep learning stems from the study of artificial neural networks.
- A multilayer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of data.
- the application fields of deep learning mainly include: computer vision, acoustic model training of speech recognition, machine translation and semantic mining.
- The inventors of the present application, as technicians in the field of communications, previously adopted the POLQA algorithm to perform speech quality evaluation, but found that the POLQA algorithm requires double-ended speech data: when evaluating the quality of the output speech data, not only the output voice data but also the input voice data is needed. Since it is difficult to obtain input voice data in some scenarios, the application of the POLQA algorithm is limited, and a new solution is needed to avoid this limitation. Through further analysis, the inventors found that models constructed by deep learning show excellent performance, so deep learning can be introduced into the speech quality evaluation algorithm.
- The main idea of the present application is to introduce deep learning into speech quality evaluation, especially in the communication field. This provides a new solution that relies only on single-ended voice data for voice quality evaluation, and constructing the model with deep learning ensures the excellent performance of the model, so that the technical problems of limited applicability and insufficient accuracy are both addressed. Further, it should be noted that although the main idea of the present application has been described above, the specific technical solution is not limited to this main idea and may be combined with other features; the combination of these different technical features still falls within the scope of protection of this application.
- FIG. 1 is a schematic flow chart of a voice quality evaluation method according to an embodiment of the present application.
- the method in this embodiment includes:
- S11 Receive voice data to be evaluated.
- the voice data to be evaluated may specifically refer to the output voice data of the communication network, that is, the degraded voice data after the input voice data passes through the communication network.
- The input voice data generally refers to clean voice data, also called original voice data;
- the degraded voice data generally refers to voice data whose quality has degraded in one or more respects relative to the original voice data, such as deterioration in brightness, delay, noise, and the like.
- S12 Extract an evaluation feature of the voice data to be evaluated.
- The evaluation features extracted here are the same as those extracted from the degraded speech data when constructing the model, and may be determined according to application requirements.
- An evaluation feature describes the characteristics of the voice data from the perspective of human auditory perception; for details, refer to the subsequent description.
- S13 Perform quality evaluation on the to-be-evaluated speech data according to the evaluation feature of the to-be-evaluated speech data and the constructed speech quality evaluation model, wherein the speech quality evaluation model is used to indicate the evaluation feature of the single-ended speech data. A relationship with quality information of the single-ended voice data.
- the voice quality evaluation model may be pre-built before the voice quality evaluation is required.
- For example, the voice quality evaluation model is first constructed offline, and when voice quality evaluation is required, the pre-built voice quality evaluation model can be used directly.
- Alternatively, the voice quality evaluation model may be built online, that is, constructed at the time voice quality evaluation is required.
- the specific construction content can be referred to the subsequent description.
- The input and output of the speech quality evaluation model are, respectively, the evaluation features and the quality information of single-ended speech data. Therefore, after the evaluation features of the speech data to be evaluated are extracted, they can be used as the input of the speech quality evaluation model, and the resulting output is the quality information of the speech data to be evaluated, thus realizing voice quality evaluation.
- The speech quality evaluation model may be described by a regression model or a classification model, and the specific content of the above quality information differs accordingly.
- If the speech quality evaluation model is described by a regression model, the obtained quality information is a specific evaluation score, such as a value from 1 to 5;
- if the speech quality evaluation model is described by a classification model, the obtained quality information is an evaluation category, such as one of: bad, poor, fair, good, and excellent.
- the quality evaluation result obtained by S13 may also be normalized.
- Specifically, the evaluation score obtained in S13 may be used directly as the final evaluation score, or it may be regularized by combining it with relevant communication network parameters, such as packet loss, jitter, and delay, to obtain the final evaluation score.
- The specific algorithm for synthesizing the network parameters may be set as needed and will not be described in detail here.
- For example, the evaluation score obtained in S13 may be multiplied by a coefficient to yield the final evaluation score, where the coefficient is related to the above communication network parameters.
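- As a minimal sketch of this coefficient-based synthesis (the weights, impairment thresholds, and function name below are illustrative assumptions; the patent leaves the specific algorithm open):

```python
def final_score(model_score, packet_loss, jitter_ms, delay_ms):
    """Combine the model's evaluation score (S13) with network KPIs.

    All weights and thresholds are illustrative assumptions, not values
    from the patent.
    """
    # Map each impairment to a penalty in [0, 1].
    penalty = (0.5 * min(packet_loss / 0.05, 1.0)   # 5% loss -> full penalty
               + 0.3 * min(jitter_ms / 60.0, 1.0)   # 60 ms jitter -> full penalty
               + 0.2 * min(delay_ms / 400.0, 1.0))  # 400 ms delay -> full penalty
    coefficient = 1.0 - 0.4 * penalty               # never scale below 0.6
    return model_score * coefficient

# e.g. a 4.2 model score under mild impairments stays close to 4.2
print(final_score(4.2, packet_loss=0.01, jitter_ms=20, delay_ms=120))
```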
- FIG. 2 is a schematic flow chart of a voice quality evaluation method according to another embodiment of the present application.
- In this embodiment, the case where the voice data to be evaluated is degraded voice data after passing through the communication network is taken as an example.
- the method of this embodiment includes:
- S21 Acquire voice data, the voice data including clean voice data and degraded voice data.
- the voice data can be obtained by collecting and/or directly acquiring from existing data. In order to improve the accuracy of the constructed speech quality evaluation model, as much speech data as possible should be obtained here.
- the real network voice data can be directly collected, and the corresponding clean voice data and the degraded voice data are respectively obtained.
- the specific acquisition manner is not limited in this application.
- the clean voice data and the degraded voice data can be separately collected, so that the clean voice data and the degraded voice data can be directly obtained separately.
- Alternatively, the clean voice data and the degraded voice data may be collected together and marked separately to distinguish them, for example using 1 to indicate clean voice data and 0 to indicate degraded voice data; the clean voice data and the degraded voice data can then be obtained separately according to the markers.
- S22 Acquire clean voice data to be processed according to the clean voice data, and acquire the degraded voice data to be processed according to the degraded voice data.
- Acquiring the voice data to be processed can include any of the following:
- The obtained clean voice data and degraded voice data may be used directly as the clean voice data to be processed and the degraded voice data to be processed.
- Alternatively, valid voice segments are extracted separately: the valid voice segments of the clean voice data are used as the clean voice data to be processed, and the valid voice segments of the degraded voice data are used as the degraded voice data to be processed.
- Alternatively, the valid speech segments of the acquired degraded speech data are extracted and clustered, and the valid speech segments of the degraded speech data corresponding to the cluster centers are used as the degraded speech data to be processed.
- The specific method for extracting valid voice segments is not limited; for example, a voice activity detection (VAD) method may be adopted. By processing only the valid speech segments, the amount of computation and the complexity can be reduced.
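- A minimal energy-based VAD sketch follows (the frame length, threshold, and function name are illustrative assumptions; the patent does not mandate a particular VAD method):

```python
import numpy as np

def extract_valid_segments(signal, sample_rate, frame_ms=20, threshold_db=-35.0):
    """Keep frames whose energy is within `threshold_db` of the utterance
    peak; a simple stand-in for a proper VAD, assumed here for illustration."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = energy_db > (energy_db.max() + threshold_db)
    return frames[active].reshape(-1)  # concatenated valid speech samples
```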
- All the degraded speech data included in the speech data, or the valid speech segments of all the degraded speech data, may be regarded as the degraded speech data to be processed; alternatively, a subset of the degraded speech data, or its valid speech segments, may be selected as the degraded speech data to be processed.
- For example, a clustering method may be used to cluster all the degraded speech data or its valid speech segments, and the degraded speech data corresponding to the cluster centers, or its valid speech segments, are used as the degraded speech data to be processed.
- Specifically, the i-vector features of the valid speech segments of the degraded speech data are extracted and clustered using the k-means method to obtain k cluster centers, and the degraded speech data corresponding to each cluster center, or its valid speech segment, is used as the degraded speech data to be processed.
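- The following sketch illustrates this selection step with scikit-learn's k-means (the cluster count `k` and the function name are assumptions; i-vector extraction itself is assumed to be done by an external tool such as Kaldi):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_representative_utterances(ivectors, k=50):
    """Cluster the (N, D) i-vector matrix and return, per cluster, the
    index of the utterance closest to the cluster center."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(ivectors)
    selected = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(ivectors[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))
    return selected  # indices of the degraded utterances kept for training
```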
- S23 Calculate an evaluation score of the degraded voice data to be processed according to the clean voice data to be processed and the degraded voice data to be processed.
- For each valid speech segment of the degraded speech data, the corresponding valid speech segment of the clean speech data can be analyzed against it frame by frame to calculate the evaluation score of the valid speech segment of the degraded speech data.
- the calculation method is not limited.
- For example, the evaluation score is a Mean Opinion Score (MOS) of the voice data, and the specific calculation method may be the same as in the prior art, such as using the POLQA algorithm or the PESQ algorithm, which is not described in detail here.
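- As an illustration of this labelling step, the open-source `pesq` package (an ITU-T P.862 implementation) can score a degraded segment against its clean reference; POLQA implementations are proprietary, so PESQ is substituted here, and the file paths and 16 kHz sampling rate are assumptions:

```python
import soundfile as sf
from pesq import pesq  # pip install pesq

def label_with_mos(clean_path, degraded_path):
    """Return a MOS-style score (MOS-LQO) for the degraded segment,
    used as the training label for the quality evaluation model."""
    ref, fs = sf.read(clean_path)
    deg, fs_deg = sf.read(degraded_path)
    assert fs == fs_deg == 16000, "wideband PESQ expects 16 kHz audio"
    return pesq(fs, ref, deg, 'wb')  # roughly 1.0 .. 4.6
```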
- S24 Extract evaluation features of the degraded voice data to be processed.
- The evaluation features describe the speech data from the perspective of human auditory perception.
- Specifically, the time-domain features of the degraded speech data to be processed may first be extracted, such as the short-term average energy of the speech data and the segmental signal-to-noise ratio of the speech; frequency-domain features may then be extracted, for example FilterBank features, linear predictive coding (LPC) features, and the like.
- When extracting frequency-domain features, a filter bank that models the shape of the human cochlea may be used, so that the extracted frequency-domain features describe the voice data from the perspective of human auditory perception. In order to better describe the degraded speech data, the mean, variance, maximum, minimum, and difference features (such as first-order and second-order differences) of each frequency-domain feature may also be extracted.
- In practice, the evaluation features may be determined according to the application requirements and the degradation characteristics of the voice data, which is not limited in this application.
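- A minimal feature-extraction sketch along these lines uses librosa's Mel filter bank as the cochlea-shaped filter (the frame sizes, the 40-band Mel configuration, and the choice of statistics are illustrative assumptions):

```python
import numpy as np
import librosa

def evaluation_features(samples, sr=16000):
    """Log-Mel FilterBank features plus first/second-order differences,
    summarized by utterance-level mean/variance/max/min statistics."""
    mel = librosa.feature.melspectrogram(y=samples, sr=sr, n_fft=512,
                                         hop_length=160, n_mels=40)
    logmel = np.log(mel + 1e-8)                  # (40, T)
    d1 = librosa.feature.delta(logmel, order=1)  # first-order difference
    d2 = librosa.feature.delta(logmel, order=2)  # second-order difference
    feats = np.vstack([logmel, d1, d2])          # (120, T)
    stats = [feats.mean(axis=1), feats.var(axis=1),
             feats.max(axis=1), feats.min(axis=1)]
    return np.concatenate(stats)                 # fixed-length 480-dim vector
```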
- S25 Perform training according to the evaluation feature of extracting the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed, and construct a speech quality evaluation model.
- Specifically, the parameters of the speech quality evaluation model can be trained by using a deep learning method to construct the speech quality evaluation model.
- The network topology used in deep learning may be one or a combination of Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and the like, which is not limited in this application; the specific network is selected according to application requirements.
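- A minimal PyTorch sketch of such a model and its training loop follows (the layer sizes, the 480-dimensional feature input matching the sketch above, and the hyperparameters are illustrative assumptions; any of the topologies named above could be substituted for the plain DNN):

```python
import torch
import torch.nn as nn

class QualityModel(nn.Module):
    """A small DNN; n_classes=None trains a regressor (MOS score),
    an integer trains a classifier over quantized score categories."""
    def __init__(self, feat_dim=480, n_classes=None):
        super().__init__()
        out_dim = n_classes if n_classes else 1
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10, classify=False):
    loss_fn = nn.CrossEntropyLoss() if classify else nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for feats, target in loader:  # target: MOS score or class index
            opt.zero_grad()
            out = model(feats)
            loss = loss_fn(out if classify else out.squeeze(1), target)
            loss.backward()
            opt.step()
```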
- the speech quality evaluation model can be described by different types of models, such as regression models or classification models. Under different types, the corresponding inputs and outputs of the model can be adjusted accordingly.
- If a regression model is used, the evaluation features of the degraded speech data to be processed and the evaluation scores of the degraded speech data to be processed are used directly as the model input and the model output.
- If a classification model is used, the evaluation features of the degraded speech data to be processed are used directly as the model input, and the model output is the evaluation category obtained by quantizing the evaluation scores of the degraded speech data to be processed.
- Specifically, the evaluation score of the degraded speech data can be quantized with a fixed step size or a non-fixed step size. If a fixed step size is used, for example 0.2, the evaluation scores of all degraded speech data are quantized to obtain the evaluation categories; taking the MOS score as an example, quantizing the 1-to-5 range with a fixed step size of 0.2 yields 20 evaluation categories.
- If a non-fixed step size is used, the quantization step size in each score range can be determined according to application requirements; for example, a large step size can be used in the lower score range and a small step size in the higher score range. Taking the MOS score as an example, scores from 1 to 3 may be treated as the lower range and quantized with a large step size such as 0.5, and scores from 3 to 5 as the higher range quantized with a small step size such as 0.2; after quantization, a total of 14 evaluation categories are obtained.
- Of course, the evaluation score may also be quantized in other ways to divide it into a plurality of evaluation categories.
- For example, the quantized evaluation categories may be: bad, poor, fair, good, and excellent; the present application does not limit this.
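- The two quantization schemes above can be sketched as follows (clamping at the range edges is an implementation assumption):

```python
def quantize_mos(score, fixed_step=None):
    """Map a MOS score in [1, 5] to a class index. fixed_step=0.2 yields
    the 20 categories mentioned above; the non-fixed branch uses step
    0.5 on [1, 3) and 0.2 on [3, 5], giving 14 categories in total."""
    if fixed_step is not None:
        n_classes = int(round(4.0 / fixed_step))
        return min(int((score - 1.0) / fixed_step), n_classes - 1)
    if score < 3.0:
        return min(int((score - 1.0) / 0.5), 3)   # classes 0..3
    return 4 + min(int((score - 3.0) / 0.2), 9)   # classes 4..13

assert quantize_mos(5.0, fixed_step=0.2) == 19
assert quantize_mos(2.9) == 3 and quantize_mos(3.0) == 4
```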
- S26 Receive degraded voice data after passing through the communication network.
- S27 Extract an evaluation feature of the degraded voice data.
- The extraction method of the evaluation feature is the same as the extraction method in the training process and will not be described in detail here.
- S28 Perform quality evaluation on the degraded voice data according to the evaluation feature and the constructed voice quality evaluation model.
- Specifically, the evaluation feature of the current degraded speech data is used as the input of the speech quality evaluation model, and the output of the model is used as the quality evaluation result of the current degraded speech data.
- If the speech quality evaluation model is described by a regression model, the quality evaluation result is an evaluation score;
- if the speech quality evaluation model is described by a classification model, the quality evaluation result is an evaluation category.
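- Tying the illustrative sketches above together, the evaluation phase for one received utterance might look as follows (all names refer to the hypothetical functions defined earlier; `received_audio` is assumed to be a NumPy array of 16 kHz samples and `model` a trained regression QualityModel):

```python
import torch

samples = extract_valid_segments(received_audio, sample_rate=16000)
feats = torch.tensor(evaluation_features(samples), dtype=torch.float32)
model.eval()
with torch.no_grad():
    mos_estimate = model(feats.unsqueeze(0)).item()  # regression: MOS score
print(f"estimated MOS: {mos_estimate:.2f}")
```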
- In this embodiment, by performing quality evaluation on the voice data with the voice quality evaluation model, only single-ended voice data is needed, and the application restrictions caused by relying on double-ended voice data are avoided, thereby extending the scope of application.
- In addition, the excellent performance of deep learning can be exploited, making the speech quality evaluation model, and hence the evaluation results, more accurate.
- Moreover, deep learning is combined with voice quality evaluation in the communication field, providing a new solution for voice quality evaluation in that field.
- FIG. 3 is a schematic structural diagram of a voice quality evaluation apparatus according to an embodiment of the present application.
- the apparatus 30 of this embodiment includes: a receiving module 31, an extracting module 32, and an evaluation module 33.
- the receiving module 31 is configured to receive voice data to be evaluated
- the extracting module 32 is configured to extract the evaluation feature of the voice data to be evaluated
- the evaluation module 33 is configured to perform quality evaluation on the to-be-evaluated speech data according to the evaluation feature of the to-be-evaluated speech data and the constructed speech quality evaluation model, where the speech quality evaluation model is used to indicate single-ended speech The relationship between the evaluation characteristics of the data and the quality information of the single-ended voice data.
- the voice data to be evaluated includes: degraded voice data after passing through the communication network.
- the apparatus 30 of the present embodiment further includes: a building module 34 for constructing a voice quality evaluation model, the building module 34 comprising:
- a first obtaining submodule 341, configured to acquire voice data, where the voice data includes clean voice data and degraded voice data;
- the second obtaining sub-module 342 is configured to acquire clean voice data to be processed according to the clean voice data, and acquire the degraded voice data to be processed according to the degraded voice data;
- a calculation sub-module 343, configured to calculate an evaluation score of the degraded voice data to be processed according to the clean voice data to be processed and the degraded voice data to be processed;
- An extraction submodule 344 configured to extract an evaluation feature of the degraded speech data to be processed
- the training sub-module 345 is configured to perform training according to the evaluation feature of the degraded speech data to be processed and the evaluation score of the degraded speech data to be processed, and construct a speech quality evaluation model.
- the voice quality evaluation model is constructed by training in a deep learning manner.
- the training sub-module 345 is specifically configured to:
- if a regression model is used to describe the speech quality evaluation model, use the evaluation features of the degraded speech data to be processed and the evaluation scores of the degraded speech data to be processed as the model input and the model output respectively, train the model parameters, and build the speech quality evaluation model; or,
- if a classification model is used to describe the speech quality evaluation model, use the evaluation feature of the degraded speech data to be processed as the model input, quantize the evaluation score of the degraded speech data to be processed to obtain an evaluation category, use the evaluation category as the model output, train the model parameters, and construct the speech quality evaluation model.
- The second obtaining submodule 342 is configured to obtain the clean voice data to be processed according to the clean voice data, including:
- the obtained clean voice data is directly used as clean voice data to be processed; or
- An effective voice segment of the obtained clean voice data is extracted, and an effective voice segment of the clean voice data is used as clean voice data to be processed.
- the second obtaining submodule 342 is configured to obtain the degraded voice data to be processed according to the degraded voice data, including:
- using the acquired degraded voice data directly as the degraded voice data to be processed; or extracting valid voice segments of the acquired degraded voice data and using them as the degraded voice data to be processed; or clustering the acquired degraded voice data and using the degraded voice data corresponding to the cluster centers as the degraded voice data to be processed; or extracting valid voice segments of the acquired degraded voice data, clustering them, and using the valid voice segments of the degraded voice data corresponding to the cluster centers as the degraded voice data to be processed.
- Embodiments of the present application also provide an apparatus comprising: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the following method: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
- Embodiments of the present application also provide a non-transitory computer readable storage medium; when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the following method: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
- Embodiments of the present application also provide a computer program product that, when executed by one or more processors in a device, causes the one or more processors to perform the following method: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between the evaluation feature of single-ended voice data and the quality information of the single-ended voice data.
- portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
- For example, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
- If implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
- All or part of the steps of the above method embodiments may be performed by a program instructing related hardware; the program may be stored in a computer readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
- each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
- the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
- the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
- the above mentioned storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Computation (AREA)
- Telephonic Communication Services (AREA)
Claims (17)
- A voice quality evaluation method, characterized by comprising: receiving voice data to be evaluated; extracting an evaluation feature of the voice data to be evaluated; and performing quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between an evaluation feature of single-ended voice data and quality information of the single-ended voice data.
- The method according to claim 1, characterized in that the voice data to be evaluated comprises: degraded voice data after passing through a communication network.
- The method according to claim 2, characterized by further comprising: constructing the voice quality evaluation model, wherein constructing the voice quality evaluation model comprises: acquiring voice data, the voice data comprising clean voice data and degraded voice data; acquiring clean voice data to be processed according to the clean voice data, and acquiring degraded voice data to be processed according to the degraded voice data; calculating an evaluation score of the degraded voice data to be processed according to the clean voice data to be processed and the degraded voice data to be processed; extracting an evaluation feature of the degraded voice data to be processed; and performing training according to the evaluation feature of the degraded voice data to be processed and the evaluation score of the degraded voice data to be processed to construct the voice quality evaluation model.
- The method according to any one of claims 1 to 3, characterized in that the voice quality evaluation model is constructed by training in a deep learning manner.
- The method according to claim 4, characterized in that performing training according to the evaluation feature of the degraded voice data to be processed and the evaluation score of the degraded voice data to be processed to construct the voice quality evaluation model comprises: if a regression model is used to describe the voice quality evaluation model, using the evaluation feature of the degraded voice data to be processed and the evaluation score of the degraded voice data to be processed as a model input and a model output respectively, training model parameters, and constructing the voice quality evaluation model; or, if a classification model is used to describe the voice quality evaluation model, using the evaluation feature of the degraded voice data to be processed as a model input, quantizing the evaluation score of the degraded voice data to be processed to obtain an evaluation category, using the evaluation category as a model output, training model parameters, and constructing the voice quality evaluation model.
- The method according to claim 3, characterized in that acquiring the clean voice data to be processed according to the clean voice data comprises: using the acquired clean voice data directly as the clean voice data to be processed; or extracting valid voice segments of the acquired clean voice data and using the valid voice segments of the clean voice data as the clean voice data to be processed.
- The method according to claim 3, characterized in that acquiring the degraded voice data to be processed according to the degraded voice data comprises: using the acquired degraded voice data directly as the degraded voice data to be processed; or extracting valid voice segments of the acquired degraded voice data and using the valid voice segments of the degraded voice data as the degraded voice data to be processed; or clustering the acquired degraded voice data and using the degraded voice data corresponding to cluster centers as the degraded voice data to be processed; or extracting valid voice segments of the acquired degraded voice data, clustering the valid voice segments of the degraded voice data, and using the valid voice segments of the degraded voice data corresponding to cluster centers as the degraded voice data to be processed.
- A voice quality evaluation apparatus, characterized by comprising: a receiving module, configured to receive voice data to be evaluated; an extracting module, configured to extract an evaluation feature of the voice data to be evaluated; and an evaluation module, configured to perform quality evaluation on the voice data to be evaluated according to the evaluation feature of the voice data to be evaluated and a constructed voice quality evaluation model, wherein the voice quality evaluation model is used to indicate a relationship between an evaluation feature of single-ended voice data and quality information of the single-ended voice data.
- The apparatus according to claim 8, characterized in that the voice data to be evaluated comprises: degraded voice data after passing through a communication network.
- The apparatus according to claim 9, characterized by further comprising: a building module for constructing the voice quality evaluation model, the building module comprising: a first obtaining submodule, configured to acquire voice data, the voice data comprising clean voice data and degraded voice data; a second obtaining submodule, configured to acquire clean voice data to be processed according to the clean voice data, and to acquire degraded voice data to be processed according to the degraded voice data; a calculation submodule, configured to calculate an evaluation score of the degraded voice data to be processed according to the clean voice data to be processed and the degraded voice data to be processed; an extraction submodule, configured to extract an evaluation feature of the degraded voice data to be processed; and a training submodule, configured to perform training according to the evaluation feature of the degraded voice data to be processed and the evaluation score of the degraded voice data to be processed to construct the voice quality evaluation model.
- The apparatus according to any one of claims 8 to 10, characterized in that the voice quality evaluation model is constructed by training in a deep learning manner.
- The apparatus according to claim 11, characterized in that the training submodule is specifically configured to: if a regression model is used to describe the voice quality evaluation model, use the evaluation feature of the degraded voice data to be processed and the evaluation score of the degraded voice data to be processed as a model input and a model output respectively, train model parameters, and construct the voice quality evaluation model; or, if a classification model is used to describe the voice quality evaluation model, use the evaluation feature of the degraded voice data to be processed as a model input, quantize the evaluation score of the degraded voice data to be processed to obtain an evaluation category, use the evaluation category as a model output, train model parameters, and construct the voice quality evaluation model.
- The apparatus according to claim 10, characterized in that the second obtaining submodule is configured to acquire the clean voice data to be processed according to the clean voice data by: using the acquired clean voice data directly as the clean voice data to be processed; or extracting valid voice segments of the acquired clean voice data and using the valid voice segments of the clean voice data as the clean voice data to be processed.
- The apparatus according to claim 10, characterized in that the second obtaining submodule is configured to acquire the degraded voice data to be processed according to the degraded voice data by: using the acquired degraded voice data directly as the degraded voice data to be processed; or extracting valid voice segments of the acquired degraded voice data and using the valid voice segments of the degraded voice data as the degraded voice data to be processed; or clustering the acquired degraded voice data and using the degraded voice data corresponding to cluster centers as the degraded voice data to be processed; or extracting valid voice segments of the acquired degraded voice data, clustering the valid voice segments of the degraded voice data, and using the valid voice segments of the degraded voice data corresponding to cluster centers as the degraded voice data to be processed.
- A device, characterized by comprising: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the method according to any one of claims 1 to 7.
- A non-transitory computer readable storage medium, characterized in that, when one or more programs in the storage medium are executed by one or more processors of a device, the one or more processors are caused to perform the method according to any one of claims 1 to 7.
- A computer program product, characterized in that, when the computer program product is executed by one or more processors in a device, the one or more processors are caused to perform the method according to any one of claims 1 to 7.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16918904.0A EP3528250B1 (en) | 2016-10-12 | 2016-12-20 | Voice quality evaluation method and apparatus |
JP2019500365A JP2019531494A (ja) | 2016-10-12 | 2016-12-20 | 音声品質評価方法及び装置 |
KR1020197009232A KR102262686B1 (ko) | 2016-10-12 | 2016-12-20 | 음성 품질 평가 방법 및 음성 품질 평가 장치 |
US16/280,705 US10964337B2 (en) | 2016-10-12 | 2019-02-20 | Method, device, and storage medium for evaluating speech quality |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610892176.1 | 2016-10-12 | ||
CN201610892176.1A CN106531190B (zh) | 2016-10-12 | 2016-10-12 | 语音质量评价方法和装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/280,705 Continuation US10964337B2 (en) | 2016-10-12 | 2019-02-20 | Method, device, and storage medium for evaluating speech quality |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018068396A1 true WO2018068396A1 (zh) | 2018-04-19 |
Family
ID=58331645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/111050 WO2018068396A1 (zh) | 2016-10-12 | 2016-12-20 | 语音质量评价方法和装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US10964337B2 (zh) |
EP (1) | EP3528250B1 (zh) |
JP (1) | JP2019531494A (zh) |
KR (1) | KR102262686B1 (zh) |
CN (1) | CN106531190B (zh) |
WO (1) | WO2018068396A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10964337B2 (en) | 2016-10-12 | 2021-03-30 | Iflytek Co., Ltd. | Method, device, and storage medium for evaluating speech quality |
JP2022514878A (ja) * | 2018-12-21 | 2022-02-16 | フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 音質の推定および制御を使用した音源分離のための装置および方法 |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108346434B (zh) * | 2017-01-24 | 2020-12-22 | 中国移动通信集团安徽有限公司 | 一种语音质量评估的方法和装置 |
CN107358966B (zh) * | 2017-06-27 | 2020-05-12 | 北京理工大学 | 基于深度学习语音增强的无参考语音质量客观评估方法 |
CN109979486B (zh) * | 2017-12-28 | 2021-07-09 | 中国移动通信集团北京有限公司 | 一种语音质量评估方法及装置 |
CN108322346B (zh) * | 2018-02-09 | 2021-02-02 | 山西大学 | 一种基于机器学习的语音质量评价方法 |
US10777217B2 (en) * | 2018-02-27 | 2020-09-15 | At&T Intellectual Property I, L.P. | Performance sensitive audio signal selection |
CN108304890B (zh) * | 2018-03-16 | 2021-06-08 | 科大讯飞股份有限公司 | 一种分类模型的生成方法及装置 |
CN109308913A (zh) * | 2018-08-02 | 2019-02-05 | 平安科技(深圳)有限公司 | 音乐质量评价方法、装置、计算机设备及存储介质 |
CN109065072B (zh) * | 2018-09-30 | 2019-12-17 | 中国科学院声学研究所 | 一种基于深度神经网络的语音质量客观评价方法 |
CN109658920B (zh) * | 2018-12-18 | 2020-10-09 | 百度在线网络技术(北京)有限公司 | 用于生成模型的方法和装置 |
CN111383657A (zh) * | 2018-12-27 | 2020-07-07 | 中国移动通信集团辽宁有限公司 | 语音质量评估方法、装置、设备及介质 |
CN109830247A (zh) * | 2019-03-22 | 2019-05-31 | 北京百度网讯科技有限公司 | 用于测试通话质量的方法和装置 |
CN110189771A (zh) | 2019-05-31 | 2019-08-30 | 腾讯音乐娱乐科技(深圳)有限公司 | 同源音频的音质检测方法、装置及存储介质 |
US11322173B2 (en) * | 2019-06-21 | 2022-05-03 | Rohde & Schwarz Gmbh & Co. Kg | Evaluation of speech quality in audio or video signals |
CN110164443B (zh) * | 2019-06-28 | 2021-09-14 | 联想(北京)有限公司 | 用于电子设备的语音处理方法、装置以及电子设备 |
CN110334240B (zh) * | 2019-07-08 | 2021-10-22 | 联想(北京)有限公司 | 信息处理方法、系统及第一设备、第二设备 |
CN110503981A (zh) * | 2019-08-26 | 2019-11-26 | 苏州科达科技股份有限公司 | 无参考音频客观质量评价方法、装置及存储介质 |
EP3866165B1 (en) * | 2020-02-14 | 2022-08-17 | System One Noc & Development Solutions, S.A. | Method for enhancing telephone speech signals based on convolutional neural networks |
US20210407493A1 (en) * | 2020-06-30 | 2021-12-30 | Plantronics, Inc. | Audio Anomaly Detection in a Speech Signal |
CN111537056A (zh) * | 2020-07-08 | 2020-08-14 | 浙江浙能天然气运行有限公司 | 基于svm与时频域特征的管道沿线第三方施工动态预警方法 |
CN114187921A (zh) * | 2020-09-15 | 2022-03-15 | 华为技术有限公司 | 语音质量评价方法和装置 |
CN116997962A (zh) * | 2020-11-30 | 2023-11-03 | 杜比国际公司 | 基于卷积神经网络的鲁棒侵入式感知音频质量评估 |
CN113409820B (zh) * | 2021-06-09 | 2022-03-15 | 合肥群音信息服务有限公司 | 一种基于语音数据的质量评价方法 |
CN113411456B (zh) * | 2021-06-29 | 2023-05-02 | 中国人民解放军63892部队 | 一种基于语音识别的话音质量评估方法及装置 |
CN114358089A (zh) * | 2022-01-24 | 2022-04-15 | 北京蕴岚科技有限公司 | 基于脑电的语音评估模型的训练方法、装置及电子设备 |
CN115175233B (zh) * | 2022-07-06 | 2024-09-10 | 中国联合网络通信集团有限公司 | 语音质量评估方法、装置、电子设备及存储介质 |
CN116092482B (zh) | 2023-04-12 | 2023-06-20 | 中国民用航空飞行学院 | 一套基于自注意力的实时管制语音质量计量方法及系统 |
CN117612566B (zh) * | 2023-11-16 | 2024-05-28 | 书行科技(北京)有限公司 | 音频质量评估方法及相关产品 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361894A (zh) * | 2014-11-27 | 2015-02-18 | 湖南省计量检测研究院 | 一种基于输出的客观语音质量评估的方法 |
US20150348571A1 (en) * | 2014-05-29 | 2015-12-03 | Nec Corporation | Speech data processing device, speech data processing method, and speech data processing program |
CN105282347A (zh) * | 2014-07-22 | 2016-01-27 | 中国移动通信集团公司 | 语音质量的评估方法及装置 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5715372A (en) * | 1995-01-10 | 1998-02-03 | Lucent Technologies Inc. | Method and apparatus for characterizing an input signal |
WO1997005730A1 (en) * | 1995-07-27 | 1997-02-13 | British Telecommunications Public Limited Company | Assessment of signal quality |
US6446038B1 (en) * | 1996-04-01 | 2002-09-03 | Qwest Communications International, Inc. | Method and system for objectively evaluating speech |
JPH1195795A (ja) * | 1997-09-16 | 1999-04-09 | Nippon Telegr & Teleph Corp <Ntt> | 音声品質評価方法および記録媒体 |
EP1187100A1 (en) * | 2000-09-06 | 2002-03-13 | Koninklijke KPN N.V. | A method and a device for objective speech quality assessment without reference signal |
EP1206104B1 (en) * | 2000-11-09 | 2006-07-19 | Koninklijke KPN N.V. | Measuring a talking quality of a telephone link in a telecommunications network |
US7327985B2 (en) * | 2003-01-21 | 2008-02-05 | Telefonaktiebolaget Lm Ericsson (Publ) | Mapping objective voice quality metrics to a MOS domain for field measurements |
US8305913B2 (en) * | 2005-06-15 | 2012-11-06 | Nortel Networks Limited | Method and apparatus for non-intrusive single-ended voice quality assessment in VoIP |
US7856355B2 (en) * | 2005-07-05 | 2010-12-21 | Alcatel-Lucent Usa Inc. | Speech quality assessment method and system |
US8560312B2 (en) * | 2009-12-17 | 2013-10-15 | Alcatel Lucent | Method and apparatus for the detection of impulsive noise in transmitted speech signals for use in speech quality assessment |
FR2973923A1 (fr) * | 2011-04-11 | 2012-10-12 | France Telecom | Evaluation de la qualite vocale d'un signal de parole code |
US9396738B2 (en) * | 2013-05-31 | 2016-07-19 | Sonus Networks, Inc. | Methods and apparatus for signal quality analysis |
CN104517613A (zh) * | 2013-09-30 | 2015-04-15 | 华为技术有限公司 | 语音质量评估方法及装置 |
CN105224558A (zh) * | 2014-06-16 | 2016-01-06 | 华为技术有限公司 | 语音业务的评价处理方法及装置 |
CN105702250B (zh) * | 2016-01-06 | 2020-05-19 | 福建天晴数码有限公司 | 语音识别方法和装置 |
WO2018028767A1 (en) * | 2016-08-09 | 2018-02-15 | Huawei Technologies Co., Ltd. | Devices and methods for evaluating speech quality |
CN106531190B (zh) | 2016-10-12 | 2020-05-05 | 科大讯飞股份有限公司 | 语音质量评价方法和装置 |
- 2016
- 2016-10-12 CN CN201610892176.1A patent/CN106531190B/zh active Active
- 2016-12-20 WO PCT/CN2016/111050 patent/WO2018068396A1/zh unknown
- 2016-12-20 KR KR1020197009232A patent/KR102262686B1/ko active IP Right Grant
- 2016-12-20 JP JP2019500365A patent/JP2019531494A/ja active Pending
- 2016-12-20 EP EP16918904.0A patent/EP3528250B1/en active Active
- 2019
- 2019-02-20 US US16/280,705 patent/US10964337B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150348571A1 (en) * | 2014-05-29 | 2015-12-03 | Nec Corporation | Speech data processing device, speech data processing method, and speech data processing program |
CN105282347A (zh) * | 2014-07-22 | 2016-01-27 | 中国移动通信集团公司 | 语音质量的评估方法及装置 |
CN104361894A (zh) * | 2014-11-27 | 2015-02-18 | 湖南省计量检测研究院 | 一种基于输出的客观语音质量评估的方法 |
Non-Patent Citations (2)
Title |
---|
See also references of EP3528250A4 * |
YIN, WEI: "Chapter VII Non-intrusive speech quality assessment method based on NLPC coefficient and GMM-HMM model", RESEARCH ON SPEECH ENHANCEMENT BASED ON SPEECH MODELING AND SPEECH QUALITY ASSESSMENT DISSERTATION SUBMITTED TO WUHAN UNIVERSITY FOR THE DOCTORAL DEGREE, 30 September 2009 (2009-09-30), pages 94 - 104, XP009513293 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10964337B2 (en) | 2016-10-12 | 2021-03-30 | Iflytek Co., Ltd. | Method, device, and storage medium for evaluating speech quality |
JP2022514878A (ja) * | 2018-12-21 | 2022-02-16 | フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 音質の推定および制御を使用した音源分離のための装置および方法 |
JP7314279B2 (ja) | 2018-12-21 | 2023-07-25 | フラウンホーファー-ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | 音質の推定および制御を使用した音源分離のための装置および方法 |
Also Published As
Publication number | Publication date |
---|---|
US20190180771A1 (en) | 2019-06-13 |
KR20190045278A (ko) | 2019-05-02 |
JP2019531494A (ja) | 2019-10-31 |
CN106531190B (zh) | 2020-05-05 |
EP3528250A1 (en) | 2019-08-21 |
EP3528250B1 (en) | 2022-05-25 |
KR102262686B1 (ko) | 2021-06-09 |
US10964337B2 (en) | 2021-03-30 |
CN106531190A (zh) | 2017-03-22 |
EP3528250A4 (en) | 2020-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018068396A1 (zh) | 语音质量评价方法和装置 | |
CN110600017B (zh) | 语音处理模型的训练方法、语音识别方法、系统及装置 | |
US20220230651A1 (en) | Voice signal dereverberation processing method and apparatus, computer device and storage medium | |
CN107358966B (zh) | 基于深度学习语音增强的无参考语音质量客观评估方法 | |
JP6339187B2 (ja) | 音声信号品質を測定するためのシステムおよび方法 | |
US11190898B2 (en) | Rendering scene-aware audio using neural network-based acoustic analysis | |
CN108346434B (zh) | 一种语音质量评估的方法和装置 | |
US20160189730A1 (en) | Speech separation method and system | |
US8655656B2 (en) | Method and system for assessing intelligibility of speech represented by a speech signal | |
US9997168B2 (en) | Method and apparatus for signal extraction of audio signal | |
CN111868823B (zh) | 一种声源分离方法、装置及设备 | |
CN108039168B (zh) | 声学模型优化方法及装置 | |
US20230186943A1 (en) | Voice activity detection method and apparatus, and storage medium | |
CN107895571A (zh) | 无损音频文件识别方法及装置 | |
CN111862951A (zh) | 语音端点检测方法及装置、存储介质、电子设备 | |
CN112151055B (zh) | 音频处理方法及装置 | |
KR20190129805A (ko) | 잡음 환경 분류 및 제거 기능을 갖는 보청기 및 그 방법 | |
US11551707B2 (en) | Speech processing method, information device, and computer program product | |
CN112967735A (zh) | 语音质量检测模型的训练方法及语音质量的检测方法 | |
WO2024055751A1 (zh) | 音频数据处理方法、装置、设备、存储介质及程序产品 | |
CN116403594B (zh) | 基于噪声更新因子的语音增强方法和装置 | |
CN113689886B (zh) | 语音数据情感检测方法、装置、电子设备和存储介质 | |
CN116705013B (zh) | 语音唤醒词的检测方法、装置、存储介质和电子设备 | |
US20240005908A1 (en) | Acoustic environment profile estimation | |
CN113436644A (zh) | 音质评估方法、装置、电子设备及存储介质 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16918904; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 2019500365; Country of ref document: JP; Kind code of ref document: A
| ENP | Entry into the national phase | Ref document number: 20197009232; Country of ref document: KR; Kind code of ref document: A
| NENP | Non-entry into the national phase | Ref country code: DE
| ENP | Entry into the national phase | Ref document number: 2016918904; Country of ref document: EP; Effective date: 20190513