CN109767760A - Far-field speech recognition method based on multi-target learning of amplitude and phase information - Google Patents

Far-field speech recognition method based on multi-target learning of amplitude and phase information

Info

Publication number: CN109767760A
Application number: CN201910134661.6A
Authority: CN (China)
Prior art keywords: phase, information, amplitude, feature, multi-target
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 党建武, 崔凌赫, 王龙标, 李东播
Current Assignee: Tianjin University
Original Assignee: Tianjin University
Application filed by Tianjin University
Priority to CN201910134661.6A; publication of CN109767760A

Landscapes

  • Electrically Operated Instructional Devices (AREA)
Abstract

The invention discloses a far-field speech recognition method based on multi-target learning of amplitude and phase information, comprising the following steps: step 1, preparing input data; step 2, extracting an amplitude feature and multiple phase features; step 3, constructing a multi-task deep neural network, inputting the extracted amplitude and phase features into the network for training, and outputting the enhanced speech and enhanced features. The enhanced speech is used for SRMR evaluation, and the enhanced features are used for speech recognition. The invention uses multi-target learning to enhance speech and features simultaneously. Compared with existing methods, it takes into account that the modified group delay cepstral coefficient (MGDCC) feature performs poorly on reverberant speech, and adds a second phase feature, the vocal tract information (PBSFVT) from phase-based source-filter separation, to compensate for the weakness of MGDCC, thereby improving speech recognition accuracy.

Description

Far-field speech recognition method based on multi-target learning of amplitude and phase information
Technical field
The invention belongs to the field of far-field speech recognition, and specifically relates to a far-field speech recognition method based on multi-target learning of amplitude and phase information.
Background art
Voice interaction is the most direct and natural mode of communication in human society. Speech recognition, one of its key technologies, converts speech signals into text. Speech recognition is an interdisciplinary field touching a wide range of areas; its ultimate goal is to enable computers to interact with humans by voice.
After years of research, near-field speech recognition has achieved important breakthroughs and greatly improved performance, but far-field speech recognition still faces many problems. In far-field conditions the target speech is often corrupted by ambient noise and reverberation, which reduces recognition accuracy and causes a sharp drop in performance. The signals captured by the microphone therefore need speech enhancement processing to remove interfering factors such as noise and reverberation.
Summary of the invention
Because phase information is severely disturbed in reverberant speech and suffers from the inherent phase-wrapping problem, the present invention uses the group delay method to avoid phase wrapping, and meanwhile tries different phase representations: modified group delay cepstral coefficients (MGDCC) and vocal tract information (PBSFVT) from phase-based source-filter separation. The complementarity of the different phase representations is exploited by using them as important auxiliary features for speech enhancement.
To solve the above problems, the present invention uses different phase representations as important auxiliary features for speech enhancement, and proposes a far-field speech recognition method based on multi-target learning of amplitude and phase information. The technical scheme is as follows: a far-field speech recognition method based on multi-target learning of amplitude and phase information, comprising the following steps:
1) Input data preparation: prepare the data of the training, development, and evaluation sets respectively;
2) Feature extraction:
(1) Amplitude-based feature extraction: frame and window the signal, and for each short-time analysis window, transform the signal from the time domain to the frequency domain by fast Fourier transform to obtain the corresponding spectrum; then filter it with a Mel filterbank to simulate the human auditory perception system;
(2) Phase-based feature extraction: extract the phase information of each speech frame, comprising two phase features, the modified group delay cepstral coefficients MGDCC and the vocal tract information PBSFVT from phase-based source-filter separation;
3) Model training: the extracted features are input into a multi-target DNN, which can learn two different targets simultaneously, modeling the commonality and difference between the targets.
In the phase-based feature extraction of step 2)-(2), the MGDCC phase feature is extracted as follows: during speech signal processing, the phase of the speech signal must be unwrapped and its negative derivative computed; the negative derivative is known as the group delay function (GDF);
The group delay function is essentially the negative derivative of the continuous (unwrapped) phase spectrum;
The continuous phase spectrum, i.e., the unwrapped phase spectrum θ(ω), gives the group delay:
τ(ω) = -dθ(ω)/dω
The group delay function can likewise be computed as:
τ(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / |X(ω)|²
where the subscripts R and I denote the real and imaginary parts, and X(ω) and Y(ω) are the Fourier transforms of x(n) and n·x(n) respectively;
The adjusted group delay coefficient can be computed as:
τ_p(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / S(ω)^{2γ}
where S(ω) denotes the smoothed version of |X(ω)|;
To reduce the spiky behavior of the spectrum, two new variables α and γ are introduced:
τ_m(ω) = (τ_p(ω)/|τ_p(ω)|) · |τ_p(ω)|^α
where both α and γ take values between 0 and 1.
In the phase-based feature extraction of step 2)-(2), the vocal tract information PBSFVT from phase-based source-filter separation is extracted as follows:
The short-time Fourier transform X(ω) can be decomposed into two parts, an all-pass phase part and a minimum-phase part:
X(ω) = |X(ω)| e^{j arg{X(ω)}} = X_MinPh(ω) X_Allp(ω)
where X_MinPh(ω) and X_Allp(ω) denote the minimum-phase part and the all-pass phase part of X(ω) after the Fourier transform, and the minimum phase relates to the original speech signal by:
|X(ω)| = |X_MinPh(ω)|
On the other hand, the relationship between the minimum phase and the all-pass phase is:
arg{X(ω)} = arg{X_MinPh(ω)} + arg{X_Allp(ω)}
The speech signal is transformed from the amplitude domain into the phase domain by the Hilbert transform, yielding the minimum-phase feature:
arg{X_MinPh(ω)} = -H{log |X(ω)|}
where H{·} denotes the Hilbert transform;
After the Fourier transform, the convolution relation becomes a multiplication, giving:
X_MinPh(ω) = E_MinPh(ω) V_MinPh(ω)
where E_MinPh(ω) and V_MinPh(ω) denote the minimum-phase source (excitation) and vocal tract components;
The minimum-phase feature is combined with vocal tract information processing: a source-filter operation in the minimum-phase domain separates the information, decomposing the minimum-phase speech signal into source information and vocal tract information, thereby obtaining separate models of the two.
Step 3) specifically comprises: constructing a multi-task deep neural network, inputting the extracted amplitude and phase features into the network for training, and outputting the enhanced speech and enhanced features.
The method further includes SRMR evaluation and speech recognition: specifically, the enhanced features output by the DNN are used for speech recognition to obtain the word error rate (WER), and the enhanced speech output is evaluated with SRMR.
Beneficial effects
The invention uses multi-target learning to enhance the speech signal and its features simultaneously. Compared with existing methods, it takes into account that the modified group delay cepstral coefficient (MGDCC) feature performs poorly on reverberant speech, and adds a second phase feature, the vocal tract information (PBSFVT) from phase-based source-filter separation, to compensate for the weakness of MGDCC, thereby improving speech recognition accuracy.
Brief description of the drawings
Fig. 1 is the basic block diagram of the multi-target learning framework proposed by the present invention.
Fig. 2 shows the minimum-phase-domain vocal tract information extraction process based on source-filter separation.
Fig. 3 is the flow chart of the method of the present invention.
Specific embodiments
The present invention is further illustrated below through specific embodiments and the accompanying drawings. The embodiments are intended to help those skilled in the art better understand the invention and do not limit it in any way.
As shown in Fig. 3, a far-field speech recognition method based on multi-target learning of amplitude and phase information comprises the following steps:
Step 1, input data preparation: the dataset is the data provided by the REVERB Challenge 2014; data preparation is performed on the training, development, and evaluation sets respectively.
Step 2, feature extraction:
1) Amplitude-based feature extraction: frame and window the signal, and for each short-time analysis window, transform the signal from the time domain to the frequency domain by fast Fourier transform to obtain the corresponding spectrum; then filter it with a Mel filterbank to simulate the human auditory perception system.
2) Phase-based feature extraction: extract the phase information of each speech frame, comprising two phase features, the modified group delay cepstral coefficients (MGDCC) and the vocal tract information (PBSFVT) from phase-based source-filter separation.
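The amplitude-feature pipeline just described (framing, windowing, FFT, Mel filterbank) can be sketched as follows; the 25 ms frame length, 10 ms shift, and 23 filters follow the settings given later in this embodiment, while the function names and the Hamming window choice are illustrative assumptions, not specified by the patent.

```python
import numpy as np

def mel_filterbank(n_filters=23, n_fft=512, sr=16000):
    """Triangular Mel filterbank matrix of shape (n_filters, n_fft//2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                       # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                       # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfb_features(signal, sr=16000, frame_ms=25, shift_ms=10, n_fft=512, n_filters=23):
    """Frame, window, FFT, and Mel-filter a signal -> log Mel filterbank features."""
    frame_len = int(sr * frame_ms / 1000)           # 400 samples at 16 kHz
    shift = int(sr * shift_ms / 1000)               # 160 samples
    n_frames = 1 + max(0, (len(signal) - frame_len) // shift)
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = np.empty((n_frames, n_filters))
    for t in range(n_frames):
        frame = signal[t * shift : t * shift + frame_len] * window
        spec = np.abs(np.fft.rfft(frame, n_fft))    # magnitude spectrum
        feats[t] = np.log(fb @ spec + 1e-10)        # log Mel energies
    return feats

# Example: 1 s of noise -> (98, 23) feature matrix
x = np.random.default_rng(0).standard_normal(16000)
print(mfb_features(x).shape)
```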
The phase-based feature extraction of step 2 comprises the two phase features, modified group delay cepstral coefficients (MGDCC) and vocal tract information (PBSFVT) from phase-based source-filter separation; the specific extraction processes are as follows:
1) MGDCC extraction:
During speech signal processing, the phase of the speech signal must be unwrapped and its negative derivative computed; this negative derivative is known as the group delay function (GDF) and can be used effectively to extract various speech signal parameters. The group delay function is currently the main representation of the phase spectrum and is essentially the negative derivative of the continuous (unwrapped) phase spectrum. The continuous phase spectrum, i.e., the unwrapped phase spectrum θ(ω), therefore gives:
τ(ω) = -dθ(ω)/dω
where θ(ω) is the unwrapped phase function. The group delay function can likewise be computed directly from the signal as:
τ(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / |X(ω)|²
where the subscripts R and I denote the real and imaginary parts, and X(ω) and Y(ω) are the Fourier transforms of x(n) and n·x(n) respectively. It can also be seen from this formula that the denominator vanishes when a zero of the signal lies close to the unit circle, so the function must be adjusted for the case where the denominator becomes zero. Replacing the denominator with a cepstrally smoothed spectrum solves the vanishing-denominator problem and overcomes the spiky character of the group delay spectrum. The adjusted group delay coefficient can be computed as:
τ_p(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / S(ω)^{2γ}
where S(ω) denotes the smoothed version of |X(ω)|. The original group delay function still retains the sharp formant peaks of the spectrum, which degrades speech recognition performance. To reduce this spiky behavior, two new variables α and γ, both taking values between 0 and 1, are introduced:
τ_m(ω) = (τ_p(ω)/|τ_p(ω)|) · |τ_p(ω)|^α
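A minimal numerical sketch of the modified group delay computation under the formulas above might look like the following; the cepstral smoothing used for S(ω) is simplified here to a low-quefrency cepstral lifter, and the parameter values (α, γ, lifter order) are illustrative, not the patent's.

```python
import numpy as np

def modified_group_delay(frame, n_fft=512, alpha=0.4, gamma=0.9, n_lifter=8):
    """Modified group delay spectrum of one windowed frame.

    tau_p = (X_R*Y_R + X_I*Y_I) / S^(2*gamma), where S is a cepstrally
    smoothed magnitude spectrum; tau_m = sign(tau_p) * |tau_p|^alpha.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)                 # FT of x(n)
    Y = np.fft.rfft(n * frame, n_fft)             # FT of n*x(n)
    mag = np.abs(X) + 1e-10
    # Cepstrally smoothed spectrum S(w): keep only the low-quefrency cepstrum
    cep = np.fft.irfft(np.log(mag))
    cep[n_lifter:-n_lifter] = 0.0
    S = np.exp(np.fft.rfft(cep).real) + 1e-10
    tau_p = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    return np.sign(tau_p) * np.abs(tau_p) ** alpha

frame = np.hamming(400) * np.random.default_rng(1).standard_normal(400)
tau = modified_group_delay(frame)
print(tau.shape)   # one group delay value per FFT bin
```

In practice the MGDCC features would then be obtained by applying a DCT to τ_m(ω) and keeping the leading coefficients, as with cepstra.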
2) PBSFVT extraction:
A speech signal is a mixed-phase signal that contains both minimum-phase information and all-pass phase information. Its short-time Fourier transform X(ω) can therefore be decomposed into two parts, an all-pass phase part and a minimum-phase part:
X(ω) = |X(ω)| e^{j arg{X(ω)}} = X_MinPh(ω) X_Allp(ω)
where X_MinPh(ω) and X_Allp(ω) denote the minimum-phase part and the all-pass phase part of X(ω) after the Fourier transform, and the minimum phase relates to the original speech signal by:
|X(ω)| = |X_MinPh(ω)|
On the other hand, the relationship between the minimum phase and the all-pass phase is:
arg{X(ω)} = arg{X_MinPh(ω)} + arg{X_Allp(ω)}
For the minimum-phase part, the Hilbert transform provides the mapping between phase and amplitude, so the speech signal can be transformed from the amplitude domain into the phase domain by the Hilbert transform, which yields the minimum-phase feature:
arg{X_MinPh(ω)} = -H{log |X(ω)|}
where H{·} denotes the Hilbert transform. After the Fourier transform, the convolution of excitation and vocal tract becomes a multiplication, giving:
X_MinPh(ω) = E_MinPh(ω) V_MinPh(ω)
where E_MinPh(ω) and V_MinPh(ω) denote the minimum-phase source (excitation) and vocal tract components. The minimum-phase feature can then be combined with vocal tract information processing: a source-filter operation in the minimum-phase domain separates the information, so that the minimum-phase speech signal is decomposed into source information and vocal tract information, yielding separate models of the two.
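The Hilbert-transform relation above (minimum phase obtained from the log magnitude) is commonly implemented with the homomorphic (cepstral) method; the following sketch, an illustration rather than the patent's code, recovers the minimum-phase spectrum from the magnitude and checks that |X_MinPh(ω)| = |X(ω)| as stated above.

```python
import numpy as np

def minimum_phase_spectrum(magnitude, n_fft):
    """Minimum-phase spectrum X_MinPh(w) with the given magnitude.

    Homomorphic construction: fold the real cepstrum of log|X| to make it
    causal, which is equivalent to arg{X_MinPh} = -Hilbert{log|X|}.
    """
    log_mag = np.log(magnitude + 1e-12)
    cep = np.fft.irfft(log_mag, n_fft)        # real (symmetric) cepstrum
    fold = np.zeros(n_fft)
    fold[0] = cep[0]                          # keep quefrency 0
    fold[1 : n_fft // 2] = 2.0 * cep[1 : n_fft // 2]   # double positive side
    fold[n_fft // 2] = cep[n_fft // 2]        # keep Nyquist quefrency
    return np.exp(np.fft.rfft(fold))          # X_MinPh(w)

n_fft = 512
x = np.random.default_rng(2).standard_normal(400)
mag = np.abs(np.fft.rfft(x, n_fft))
X_mp = minimum_phase_spectrum(mag, n_fft)
# Magnitude is preserved; only the phase differs from the original signal's.
print(np.allclose(np.abs(X_mp), mag, rtol=1e-6, atol=1e-8))
```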
Step 3, model training: the extracted features are input into a multi-target DNN, which can learn two different targets simultaneously, modeling the commonality and difference between the targets.
Step 4, output: the enhanced features output by the DNN are used for speech recognition to obtain the WER (word error rate), and the enhanced speech output is evaluated with SRMR.
Fig. 1 is the basic block diagram of the proposed multi-target learning framework: the speech recognition task based on MFCC basic features is the main task, and the speech enhancement task based on spectrogram features is the auxiliary task. This front-end feature-processing model combines the speech recognition task and the speech enhancement task, using the nonlinear mapping capability of the neural network to perform dereverberation. In the front-end dereverberation regression model, the loss function is optimized with minimum mean squared error (MSE) as the objective. The multi-target neural network learns to estimate the targets of the two different tasks simultaneously, learning the commonality and difference between the two tasks. Compared with independently trained models, this approach can improve the learning efficiency and prediction accuracy of the main-task model. The DNN has 3 hidden layers of 3072 nodes each, the loss function is MSE, and the optimizer is stochastic gradient descent. The speech recognition task uses 23-dimensional MFB (Mel filterbank) features as basic features, the auxiliary speech enhancement task uses 256-dimensional spectrogram features, and the phase features MGDCC and PBSFVT are 13- and 23-dimensional, respectively.
In this multi-target learning framework, the main task is feature enhancement for the back-end speech recognition system, and the auxiliary task is speech enhancement, which improves the generalization of the main task. The correlation between the two tasks can be learned through the shared-layer representation.
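A minimal forward-pass sketch of this shared-layer multi-target network follows: 3 shared sigmoid hidden layers of 3072 units (per Table 1 below) with one output head per task. Weights here are random, the input dimensionality assumes the 23-dimensional MFB features spliced with the context of 15 frames, and the head sizes follow the text (23-dim recognition features, 256-dim spectrogram); this is an architectural illustration, not the patent's training code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MultiTargetDNN:
    """Shared-hidden-layer DNN with two output heads (multi-target learning).

    One head regresses enhanced recognition features (main task), the
    other enhanced spectrogram frames (auxiliary task); in the patent
    both heads are trained jointly against an MSE loss with SGD.
    """

    def __init__(self, in_dim, hidden=3072, n_hidden=3,
                 main_dim=23, aux_dim=256, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + [hidden] * n_hidden
        self.shared = [((rng.standard_normal((a, b)) * 0.01).astype(np.float32),
                        np.zeros(b, dtype=np.float32))
                       for a, b in zip(dims[:-1], dims[1:])]
        self.head_main = (rng.standard_normal((hidden, main_dim)) * 0.01).astype(np.float32)
        self.head_aux = (rng.standard_normal((hidden, aux_dim)) * 0.01).astype(np.float32)

    def forward(self, x):
        h = x
        for W, b in self.shared:       # shared representation for both tasks
            h = sigmoid(h @ W + b)
        return h @ self.head_main, h @ self.head_aux

# 23-dim MFB frames spliced with a context of 15 frames -> 345-dim input
net = MultiTargetDNN(in_dim=23 * 15)
feat, spec = net.forward(np.zeros((4, 345), dtype=np.float32))
print(feat.shape, spec.shape)   # one output per task for a batch of 4 frames
```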
Fig. 2 shows the minimum-phase-domain vocal tract information extraction process based on source-filter separation. Since most of the information in the speech signal is concentrated in the low and middle frequencies, a Mel filterbank is applied, giving higher resolution at low frequencies while compressing the resolution at high frequencies.
The final step is decorrelation, mainly so that the data better match the diagonal covariance matrices used in the GMM-HMM system and more accurate alignments can be obtained from the acoustic model. Finally, the feature vectors receive post-processing such as cepstral mean normalization (CMN), and dynamic features are computed. In this task, the basic feature parameter settings are: frame length 25 ms, frame shift 10 ms, and 23 filters; a DCT is then used for decorrelation and dimensionality reduction, yielding 13-dimensional features.
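The DCT decorrelation and mean-normalization step can be sketched as follows; the orthonormal type-II DCT (written out in NumPy here to stay self-contained) is one common convention, an implementation choice rather than something the patent specifies.

```python
import numpy as np

def dct2(x, n_out):
    """Orthonormal DCT-II along the last axis, truncated to n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)          # orthonormal scaling of the DC row
    return x @ basis.T

def decorrelate(log_mel, n_ceps=13):
    """23-dim log-Mel frames -> 13-dim cepstra (DCT) with cepstral mean normalization."""
    ceps = dct2(log_mel, n_ceps)      # decorrelate and reduce dimension
    return ceps - ceps.mean(axis=0)   # CMN: zero mean per coefficient

log_mel = np.random.default_rng(3).standard_normal((98, 23))
feats = decorrelate(log_mel)
print(feats.shape)   # (frames, 13)
```

Delta and delta-delta dynamic features would then be appended to these 13-dimensional cepstra.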
Table 1. Structure and parameter settings of the neural network

Parameter                  Value
Shared hidden layers       3
Hidden nodes               3072 per layer
Hidden node type           Sigmoid
Loss function              MSE (mean squared error)
Optimization algorithm     Stochastic gradient descent
Context size               15
Number of iterations       30
Table 2 compares the results (WER%) of different inputs within the multi-task learning framework.
Table 1 lists the structure and specific parameter settings of the neural network; Table 2 compares experimental results on the REVERB Challenge 2014 dataset, with WER (word error rate) as the evaluation metric, demonstrating the importance of phase information in multi-target learning. In this experiment, phase information serves as important auxiliary information supplementing the amplitude features. In the present invention, the information processed in the frequency domain is transformed into the minimum-phase domain through the Hilbert feature-space transform to estimate the phase information, which yields relatively accurate phase-estimation features. Comparing the MFB+spectrogram method against the MFB+spectrogram+MGDCC+PBSFVT method, the performance of automatic speech recognition improves: the WER drops from 26.57% to 23.68%, a relative error-rate reduction of 10.88%.
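The relative error-rate reduction quoted here follows directly from the two WER figures:

```python
baseline, proposed = 26.57, 23.68            # WER% for the two systems above
relative = (baseline - proposed) / baseline  # relative error-rate reduction
print(f"{relative * 100:.2f}%")              # 10.88%
```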
Although the invention has been described above with reference to the drawings, the invention is not limited to the specific embodiments described, which are only illustrative rather than restrictive. Under the teaching of the present invention, those skilled in the art can make many variations without departing from the spirit of the invention, and these all fall within the protection of the invention.

Claims (5)

1. A far-field speech recognition method based on multi-target learning of amplitude and phase information, characterized by comprising the following steps:
1) input data preparation: preparing the data of the training, development, and evaluation sets respectively;
2) feature extraction:
(1) amplitude-based feature extraction: framing and windowing the signal, and for each short-time analysis window, transforming the signal from the time domain to the frequency domain by fast Fourier transform to obtain the corresponding spectrum, then filtering the spectrum with a Mel filterbank to simulate the human auditory perception system;
(2) phase-based feature extraction: extracting the phase information of each speech frame, comprising two phase features, the modified group delay cepstral coefficients MGDCC and the vocal tract information PBSFVT from phase-based source-filter separation;
3) model training: inputting the extracted features into a multi-target DNN, which can learn two different targets simultaneously, modeling the commonality and difference between the targets.
2. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized in that the phase-based feature extraction of step 2)-(2) includes the MGDCC phase feature, whose extraction process is as follows: during speech signal processing, the phase of the speech signal is unwrapped and its negative derivative computed; the negative derivative is known as the group delay function (GDF);
The group delay function is essentially the negative derivative of the continuous (unwrapped) phase spectrum;
The continuous phase spectrum, i.e., the unwrapped phase spectrum θ(ω), gives the group delay:
τ(ω) = -dθ(ω)/dω
The group delay function can likewise be computed as:
τ(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / |X(ω)|²
wherein the subscripts R and I denote the real and imaginary parts, and X(ω) and Y(ω) are the Fourier transforms of x(n) and n·x(n) respectively;
The adjusted group delay coefficient can be computed as:
τ_p(ω) = (X_R(ω)Y_R(ω) + X_I(ω)Y_I(ω)) / S(ω)^{2γ}
wherein S(ω) denotes the smoothed version of |X(ω)|;
To reduce the spiky behavior of the spectrum, two new variables α and γ are introduced:
τ_m(ω) = (τ_p(ω)/|τ_p(ω)|) · |τ_p(ω)|^α
wherein both α and γ take values between 0 and 1.
3. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized in that the phase-based feature extraction of step 2)-(2) includes the vocal tract information PBSFVT from phase-based source-filter separation, whose extraction process is as follows:
The short-time Fourier transform X(ω) can be decomposed into two parts, an all-pass phase part and a minimum-phase part:
X(ω) = |X(ω)| e^{j arg{X(ω)}} = X_MinPh(ω) X_Allp(ω)
wherein X_MinPh(ω) and X_Allp(ω) denote the minimum-phase part and the all-pass phase part of X(ω) after the Fourier transform, and the minimum phase relates to the original speech signal by:
|X(ω)| = |X_MinPh(ω)|
On the other hand, the relationship between the minimum phase and the all-pass phase is:
arg{X(ω)} = arg{X_MinPh(ω)} + arg{X_Allp(ω)}
The speech signal is transformed from the amplitude domain into the phase domain by the Hilbert transform, yielding the minimum-phase feature:
arg{X_MinPh(ω)} = -H{log |X(ω)|}
wherein H{·} denotes the Hilbert transform;
After the Fourier transform, the convolution relation becomes a multiplication, giving:
X_MinPh(ω) = E_MinPh(ω) V_MinPh(ω)
wherein E_MinPh(ω) and V_MinPh(ω) denote the minimum-phase source (excitation) and vocal tract components;
The minimum-phase feature is combined with vocal tract information processing: a source-filter operation in the minimum-phase domain separates the information, decomposing the minimum-phase speech signal into source information and vocal tract information, thereby obtaining separate models of the two.
4. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized in that step 3) specifically comprises: constructing a multi-task deep neural network, inputting the extracted amplitude and phase features into the network for training, and outputting the enhanced speech and enhanced features.
5. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized by further comprising SRMR evaluation and speech recognition: specifically, the enhanced features output by the DNN are used for speech recognition to obtain the word error rate (WER), and the enhanced speech output is evaluated with SRMR.
CN201910134661.6A 2019-02-23 2019-02-23 Far-field speech recognition method based on multi-target learning of amplitude and phase information Pending CN109767760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910134661.6A CN109767760A (en) 2019-02-23 2019-02-23 Far-field speech recognition method based on multi-target learning of amplitude and phase information


Publications (1)

Publication Number Publication Date
CN109767760A (en) 2019-05-17

Family

ID=66457198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910134661.6A Pending CN109767760A (en) 2019-02-23 2019-02-23 Far-field speech recognition method based on multi-target learning of amplitude and phase information

Country Status (1)

Country Link
CN (1) CN109767760A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07225596A (en) * 1994-02-15 1995-08-22 Sony Corp Device for analyzing/synthesizing acoustic signal
KR20090016343A (en) * 2007-08-10 2009-02-13 한국전자통신연구원 Method and apparatus for encoding/decoding signal having strong non-stationary properties using hilbert-huang transform
CN102855882A (en) * 2011-06-29 2013-01-02 自然低音技术有限公司 Perception enhancement for low-frequency sound components
CN103250208A (en) * 2010-11-24 2013-08-14 日本电气株式会社 Signal processing device, signal processing method and signal processing program
CN104823237A (en) * 2012-11-26 2015-08-05 哈曼国际工业有限公司 System, computer-readable storage medium and method for repair of compressed audio signals
CN108962277A (en) * 2018-07-20 2018-12-07 广州酷狗计算机科技有限公司 Speech signal separation method, apparatus, computer equipment and storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONGBO LI et al.: "Multiple Phase Information Combination for Replay Attacks Detection", Interspeech 2018 *
LONGBIAO WANG et al.: "Phase Aware Deep Neural Network for Noise Robust Voice Activity", Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017 *
ZEYAN OO et al.: "Phase and reverberation aware DNN for distant-talking", Multimedia Tools and Applications *
徐勇: "Research on Speech Enhancement Methods Based on Deep Neural Networks", China Doctoral Dissertations Full-text Database *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190517