CN109767760A - Far-field speech recognition method based on multi-target learning of amplitude and phase information - Google Patents
Publication number: CN109767760A
Application number: CN201910134661.6A
Authority: CN (China)
Prior art keywords: phase, information, amplitude, feature, multi-target
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a far-field speech recognition method based on multi-target learning of amplitude and phase information, comprising the following steps: step 1, input data preparation; step 2, extraction of an amplitude feature and multiple phase features; step 3, construction of a multi-task deep neural network, into which the extracted amplitude and phase features are input for training, outputting enhanced speech and enhanced features. SRMR evaluation is performed on the enhanced speech, and speech recognition is performed on the enhanced features. The invention uses multi-target learning to enhance the speech and the features simultaneously. Compared with existing methods, and considering that the modified group delay cepstral coefficient (MGDCC) feature performs poorly on reverberant speech, a second phase feature, the vocal tract information (PBSFVT) of a phase-domain source separation method, is added to make up for the deficiency of MGDCC, thereby improving speech recognition accuracy.
Description
Technical field
The invention belongs to the technical field of far-field speech recognition, and specifically relates to a far-field speech recognition method based on multi-target learning of amplitude and phase information.
Background technique
Voice interaction is the most direct and natural mode of communication in human society. Speech recognition, one of its key technologies, converts speech signals into text. Speech recognition is an interdisciplinary field touching a wide range of areas, whose ultimate purpose is to enable computers to interact with people by voice, as people do with each other.
After years of research, near-field speech recognition has achieved important breakthroughs and greatly improved performance, but far-field speech recognition still faces many problems: the target speech is often disturbed by ambient noise and reverberation, which reduces recognition accuracy and causes a sharp decline in performance. It is therefore necessary to apply speech enhancement to the signal collected by the microphone to remove interfering factors such as noise and reverberation.
Summary of the invention
Because phase information is heavily disturbed in reverberant speech, and phase information itself suffers from phase wrapping, the present invention uses the group delay method to avoid the wrapping problem, and at the same time tries different phase representations: the modified group delay cepstral coefficient (MGDCC) and the vocal tract information (PBSFVT) of a phase-domain source separation method. The complementarity of the different phase representations is exploited by using them as important auxiliary features for speech enhancement.
To solve the above problems, the present invention uses different phase representations as important auxiliary features for speech enhancement, and proposes a far-field speech recognition method based on multi-target learning of amplitude and phase information. The technical scheme is as follows: a far-field speech recognition method based on multi-target learning of amplitude and phase information, comprising the following steps:
1) Input data preparation: data preparation is performed on the training, development, and evaluation sets respectively;
2) Feature extraction:
(1) Feature extraction based on amplitude information: the signal is framed and windowed, and each short-time analysis window is transformed from the time domain to the frequency domain by the fast Fourier transform to obtain the corresponding spectrum, which is then filtered by a Mel filter bank to simulate the human auditory system;
(2) Feature extraction based on phase information: the phase information of each speech frame is extracted, yielding two phase features, the modified group delay cepstral coefficient (MGDCC) and the vocal tract information (PBSFVT) of the phase-domain source separation method;
3) Model training: the extracted features are input into a multi-target DNN, which learns two different targets simultaneously, so as to model the commonality and difference between the targets.
The feature extraction based on phase information in step 2)-(2) includes the MGDCC phase feature, whose specific extraction process is as follows: during speech signal processing, the phase of the speech signal must be unwrapped and its negative derivative taken; this negative derivative is called the group delay function (GDF);
the group delay function is essentially the negative derivative of the continuous (unwrapped) phase spectrum; in terms of the continuous, i.e. unwrapped, phase spectrum it can be expressed as:
τ(ω) = -d(arg{X(ω)})/dω
Equivalently, the group delay function can be computed in the following form:
τ(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |X(ω)|²
where: the subscripts R and I denote the real and imaginary parts, and X(ω) and Y(ω) denote the Fourier transforms of x(n) and n·x(n) respectively;
the adjusted group delay function can be computed as:
τ_m(ω) = sign(τ_c(ω)) · |τ_c(ω)|^α, where τ_c(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |S(ω)|^(2γ)
where: S(ω) denotes the smoothed version of X(ω); to reduce the spiky behavior of the spectrum, the two new variables α and γ are introduced, both taking values between 0 and 1.
The feature extraction based on phase information in step 2)-(2) also includes the vocal tract information PBSFVT of the phase-domain source separation method, whose specific extraction process is as follows:
the short-time Fourier transform X(ω) can be decomposed into two parts, an all-pass phase part and a minimum-phase part:
X(ω) = |X(ω)|·e^{j·arg{X(ω)}} = X_MinPh(ω)·X_Allp(ω)
where: X_MinPh(ω) and X_Allp(ω) denote the minimum-phase part and the all-pass phase part of X(ω), and the minimum-phase part is related to the original speech signal by:
|X(ω)| = |X_MinPh(ω)|
On the other hand, the relation between the minimum phase and the all-pass phase is:
arg{X(ω)} = arg{X_MinPh(ω)} + arg{X_Allp(ω)}
The speech signal is transformed from the amplitude domain into the phase domain by the Hilbert transform, giving the minimum-phase feature:
arg{X_MinPh(ω)} = -H{ln |X(ω)|}
where H{·} denotes the Hilbert transform. After the Fourier transform, the source-filter convolution becomes a multiplication:
X_MinPh(ω) = E(ω)·V(ω)
where E(ω) denotes the sound source (excitation) and V(ω) the vocal tract response. Combining the minimum-phase feature with vocal tract information processing, a source-filter operation in the minimum-phase domain separates the information: the minimum-phase speech signal is decomposed into sound source information and vocal tract information, and separate models of the two are obtained.
Step 3) specifically: a multi-task deep neural network is constructed, the extracted amplitude and phase features are input into the neural network for training, and the network outputs enhanced speech and enhanced features.
The method further includes SRMR assessment and speech recognition: specifically, speech recognition is performed on the enhanced features output by the DNN to obtain the word error rate (WER), and SRMR evaluation is performed on the enhanced speech output.
Beneficial effects
The invention uses multi-target learning to enhance the features and the speech signal simultaneously. Compared with existing methods, and considering that the MGDCC feature performs poorly on reverberant speech, a second phase feature, the vocal tract information (PBSFVT) of the phase-domain source separation method, is added to make up for the deficiency of MGDCC, thereby improving speech recognition accuracy.
Detailed description of the invention
Fig. 1 is the basic block diagram of the multi-target learning framework proposed by the present invention.
Fig. 2 is the minimum-phase-domain vocal tract information extraction process based on the source separation method.
Fig. 3 is the flow chart of the method of the present invention.
Specific embodiment
The present invention is further illustrated below through specific embodiments and the accompanying drawings. The embodiments are intended to help those skilled in the art better understand the present invention, and do not limit the present invention in any way.
As shown in Fig. 3, a far-field speech recognition method based on multi-target learning of amplitude and phase information includes the following steps:
Step 1, input data preparation: the data set is the data provided by the REVERB Challenge 2014, and data preparation is performed on the training, development, and evaluation sets respectively.
Step 2, feature extraction:
1) Feature extraction based on amplitude information: the signal is framed and windowed, and each short-time analysis window is transformed from the time domain to the frequency domain by the fast Fourier transform to obtain the corresponding spectrum, which is then filtered by a Mel filter bank to simulate the human auditory system.
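As an illustrative sketch (my own, not the patent's implementation; the 25 ms frame, 10 ms shift, and 23 filters follow the parameter settings described later in this embodiment), the framing, windowing, FFT, and Mel-filtering pipeline can be written as:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filters spanning 0 .. sr/2."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def log_mel_features(x, sr=16000, frame_ms=25, hop_ms=10, n_filters=23):
    """Frame, window (Hamming), FFT, Mel-filter, log."""
    flen = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_fft = 512
    win = np.hamming(flen)
    n_frames = 1 + max(0, (len(x) - flen) // hop)
    fb = mel_filterbank(n_filters, n_fft, sr)
    feats = np.zeros((n_frames, n_filters))
    for t in range(n_frames):
        frame = x[t * hop: t * hop + flen] * win
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2   # power spectrum
        feats[t] = np.log(fb @ spec + 1e-10)            # log Mel energies
    return feats

# Example: one second of noise -> (frames, 23) feature matrix
x = np.random.default_rng(0).standard_normal(16000)
print(log_mel_features(x).shape)   # (98, 23)
```

The FFT length of 512 is an assumed value; the patent does not state it.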
2) Feature extraction based on phase information: the phase information of each speech frame is extracted, yielding two phase features, the modified group delay cepstral coefficient (MGDCC) and the vocal tract information (PBSFVT) of the phase-domain source separation method. The specific extraction processes are as follows:
1) MGDCC extraction:
During speech signal processing, the phase of the speech signal must be unwrapped and its negative derivative taken; this negative derivative is called the group delay function (GDF), and it can be used efficiently to extract various speech signal parameters. The group delay function is the main representation of the phase spectrum at present; it is essentially the negative derivative of the continuous (unwrapped) phase spectrum. In terms of the continuous, i.e. unwrapped, phase spectrum it can be expressed as:
τ(ω) = -d(arg{X(ω)})/dω
where arg{X(ω)} is the unwrapped phase function. Equivalently, the group delay function can be computed in the following form:
τ(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |X(ω)|²
where the subscripts R and I denote the real and imaginary parts, and X(ω) and Y(ω) denote the Fourier transforms of x(n) and n·x(n) respectively. As can be seen from this formula, the denominator vanishes at zeros close to the unit circle, so the function must be adjusted for the case where the denominator becomes zero. Replacing the denominator with a smoothed spectrum solves the vanishing-denominator problem and overcomes the spiky character of the group delay spectrum. The adjusted group delay function can be computed as:
τ_c(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |S(ω)|²
where S(ω) denotes the smoothed version of X(ω). However, the group delay function still retains sharp formant peaks, which degrades recognition performance. To reduce this spiky behavior, two new variables α and γ, both with values between 0 and 1, are introduced:
τ_m(ω) = sign(τ_c(ω)) · |τ_c(ω)|^α, where τ_c(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |S(ω)|^(2γ)
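The modified group delay computation above can be sketched numerically as follows. This is my illustration of the standard MGD formula, not the patent's code; the cepstral smoothing behind S(ω) is simplified here to a moving average of the magnitude spectrum, and the α, γ values are assumed:

```python
import numpy as np

def modified_group_delay(frame, n_fft=512, alpha=0.4, gamma=0.9, smooth_len=5):
    """Modified group delay spectrum of one windowed frame.

    tau_c(w) = (X_R*Y_R + X_I*Y_I) / |S(w)|^(2*gamma)
    tau_m(w) = sign(tau_c) * |tau_c|^alpha
    where Y is the FFT of n*x(n) and S is a smoothed |X|.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)       # FFT of n*x(n)
    num = X.real * Y.real + X.imag * Y.imag
    # crude moving-average smoothing standing in for cepstral smoothing
    mag = np.abs(X)
    S = np.convolve(mag, np.ones(smooth_len) / smooth_len, mode="same")
    tau_c = num / (S ** (2 * gamma) + 1e-10)
    return np.sign(tau_c) * np.abs(tau_c) ** alpha

frame = np.hamming(400) * np.sin(2 * np.pi * 1000 / 16000 * np.arange(400))
mgd = modified_group_delay(frame)
print(mgd.shape)   # (257,)
```

The MGDCC feature itself would then be obtained by applying a DCT to this spectrum and keeping the leading coefficients, analogous to the cepstral step for MFCCs.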
2) PBSFVT extraction:
A speech signal is a mixed-phase signal, containing both minimum-phase information and all-pass phase information. Its short-time Fourier transform X(ω) can therefore be decomposed into two parts, an all-pass phase part and a minimum-phase part:
X(ω) = |X(ω)|·e^{j·arg{X(ω)}} = X_MinPh(ω)·X_Allp(ω)
where X_MinPh(ω) and X_Allp(ω) denote the minimum-phase part and the all-pass phase part of X(ω), and the minimum-phase part is related to the original speech signal by:
|X(ω)| = |X_MinPh(ω)|
On the other hand, the relation between the minimum phase and the all-pass phase is:
arg{X(ω)} = arg{X_MinPh(ω)} + arg{X_Allp(ω)}
For the minimum-phase part, the Hilbert transform provides the mapping between phase and amplitude, so the speech signal can be transformed from the amplitude domain into the phase domain by the Hilbert transform, giving the minimum-phase feature:
arg{X_MinPh(ω)} = -H{ln |X(ω)|}
where H{·} denotes the Hilbert transform. After the Fourier transform, the source-filter convolution becomes a multiplication:
X_MinPh(ω) = E(ω)·V(ω)
where E(ω) denotes the sound source (excitation) and V(ω) the vocal tract response. Combining the minimum-phase feature with vocal tract information processing, a source-filter operation in the minimum-phase domain separates the information: the minimum-phase speech signal is decomposed into sound source information and vocal tract information, and separate models of the two are obtained.
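The minimum-phase decomposition above can be computed with the standard real-cepstrum construction, in which the minimum-phase spectrum's phase is minus the Hilbert transform of the log magnitude. A sketch under that assumption (my illustration, not necessarily the patent's exact procedure):

```python
import numpy as np

def minimum_phase(frame, n_fft=512):
    """Minimum-phase spectrum X_MinPh(w) with the same magnitude as the input.

    Folding the real cepstrum onto n >= 0 yields a log spectrum whose
    imaginary part (the phase) is minus the Hilbert transform of log|X|.
    """
    X = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(X) + 1e-10)
    cep = np.fft.ifft(log_mag).real          # real cepstrum
    fold = np.zeros(n_fft)
    fold[0] = cep[0]
    fold[1:n_fft // 2] = 2 * cep[1:n_fft // 2]
    fold[n_fft // 2] = cep[n_fft // 2]
    return np.exp(np.fft.fft(fold))          # X_MinPh(w)

rng = np.random.default_rng(1)
frame = rng.standard_normal(512)
Xmin = minimum_phase(frame)
X = np.fft.fft(frame, 512)
# |X(w)| = |X_MinPh(w)| holds, as in the decomposition above
print(np.allclose(np.abs(Xmin), np.abs(X), rtol=1e-6))   # True
```

The all-pass part would then follow as X(ω)/X_MinPh(ω), and the source-filter separation would operate on the minimum-phase component.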
Step 3, model training: the extracted features are input into a multi-target DNN, which learns two different targets simultaneously, so as to model the commonality and difference between the targets.
Step 4, output: speech recognition is performed on the enhanced features output by the DNN to obtain the word error rate (WER), and SRMR evaluation is performed on the enhanced speech output.
Fig. 1 shows the basic block diagram of the proposed multi-target learning framework. The speech recognition task based on the MFCC basic feature is the main task, and the speech enhancement task based on the spectrogram feature is the auxiliary task. This front-end feature-processing model combines the speech recognition task and the speech enhancement task, using the nonlinear mapping capability of the neural network to perform dereverberation. In the regression model of the front-end dereverberation processing, a mean squared error (MSE) loss function is minimized. The multi-target neural network learns the targets of the two different tasks simultaneously, learning the commonality and difference between them. Compared with independently trained models, this method can improve the learning efficiency and prediction accuracy of the main-task model. The DNN has 3 hidden layers of 3072 nodes each, the loss function is MSE, and the optimization algorithm is stochastic gradient descent. The speech recognition task uses a 23-dimensional MFB (Mel filter bank) feature as the basic feature, the auxiliary speech enhancement task uses a 256-dimensional spectrogram feature, and the phase features MGDCC and PBSFVT are 13- and 23-dimensional respectively.
In this multi-target learning framework, the main task serves the back-end speech recognition system as a feature enhancement task, and the auxiliary task is a speech enhancement task used to improve the generalization of the main task. The shared hidden layers learn the correlation between the two tasks.
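A toy numpy sketch of the shared-hidden-layer idea: one shared layer feeds two output heads, and plain SGD minimizes the summed MSE of both tasks. The dimensions are shrunk drastically from the 3 x 3072 network of Table 1, and the data and targets are synthetic, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy dimensions: input, shared hidden layer, main head (enhanced features),
# auxiliary head (spectrogram); far smaller than the network in Table 1
d_in, d_h, d_main, d_aux = 36, 64, 23, 32
W1 = rng.standard_normal((d_in, d_h)) * 0.1     # shared hidden layer
Wm = rng.standard_normal((d_h, d_main)) * 0.1   # main-task head
Wa = rng.standard_normal((d_h, d_aux)) * 0.1    # auxiliary-task head

X = rng.standard_normal((256, d_in))            # stand-in noisy features
Tm = 0.5 * X[:, :d_main]                        # synthetic main-task targets
Ta = np.tanh(X[:, :d_aux])                      # synthetic aux-task targets

def forward(X):
    H = np.tanh(X @ W1)                         # shared representation
    return H, H @ Wm, H @ Wa

def joint_mse():
    _, Ym, Ya = forward(X)
    return ((Ym - Tm) ** 2).mean() + ((Ya - Ta) ** 2).mean()

loss0 = joint_mse()
lr = 0.05
for _ in range(200):                            # full-batch SGD on summed MSE
    H, Ym, Ya = forward(X)
    Gm = 2 * (Ym - Tm) / Ym.size                # dLoss/dYm
    Ga = 2 * (Ya - Ta) / Ya.size                # dLoss/dYa
    dH = Gm @ Wm.T + Ga @ Wa.T                  # both tasks update the shared layer
    Wm -= lr * H.T @ Gm
    Wa -= lr * H.T @ Ga
    W1 -= lr * X.T @ (dH * (1 - H ** 2))        # backprop through tanh

print(joint_mse() < loss0)   # True: joint training reduces the summed loss
```

The key point the sketch mirrors is that the gradient into the shared layer (`dH`) accumulates contributions from both heads, which is how the shared representation learns the correlation between the two tasks.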
Fig. 2 shows the minimum-phase-domain vocal tract information extraction process based on the source separation method. Since most of the information in a speech signal is concentrated in the low and middle frequencies, a Mel filter bank is used, giving higher resolution at low frequencies while suppressing resolution at high frequencies.
The last step is decorrelation, mainly so that the data better match the diagonal covariance matrices used in the GMM-HMM system and more accurate alignment information can be obtained in the acoustic model. Finally, the feature vectors receive final processing, such as cepstral mean normalization (CMN), and dynamic features are computed. In this task, the basic feature parameter settings are: frame length 25 ms, frame shift 10 ms, and 23 filters. A DCT is then applied for decorrelation and dimensionality reduction, giving 13-dimensional features.
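The post-processing described above (DCT decorrelation from 23 to 13 dimensions, then cepstral mean normalization) can be sketched as follows, with an orthonormal DCT-II written directly in numpy; an illustration rather than any toolkit's exact recipe:

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; rows are output coefficients."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    D = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    D[0] *= 1 / np.sqrt(n_in)
    D[1:] *= np.sqrt(2 / n_in)
    return D

def postprocess(log_mel, n_cep=13):
    """DCT decorrelation 23 -> 13 dims, then cepstral mean normalization."""
    D = dct_matrix(n_cep, log_mel.shape[1])
    cep = log_mel @ D.T                      # (frames, 13) cepstra
    return cep - cep.mean(axis=0)            # CMN: zero mean per coefficient

feats = np.random.default_rng(0).standard_normal((98, 23))
cep = postprocess(feats)
print(cep.shape)                             # (98, 13)
print(np.allclose(cep.mean(axis=0), 0))      # True
```

Subtracting the per-utterance mean is the simplest form of CMN; dynamic (delta) features would be appended after this step.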
Table 1 shows the structure and parameter settings of the neural network:
Parameter | Value
---|---
Shared hidden layers | 3
Hidden nodes | 3072 per layer
Hidden node type | Sigmoid
Loss function | MSE (mean squared error)
Optimization algorithm | Stochastic gradient descent
Context size | 15
Iterations | 30
Table 2 compares different inputs in the multi-task learning framework (WER%):
Input features | WER (%)
---|---
MFB + spectrogram | 26.57
MFB + spectrogram + MGDCC + PBSFVT | 23.68
Table 1 lists the structure and specific parameter settings of the neural network. Table 2 compares experimental results on the REVERB Challenge 2014 data set; the evaluation index is WER (word error rate), which demonstrates the importance of phase information in multi-target learning. In this experiment, phase information is used as important auxiliary information supplementing the amplitude feature. In the present invention, the information processed in the frequency domain is transformed into the minimum-phase domain by the Hilbert feature-space transform to estimate phase information, yielding a comparatively accurate phase estimate. Comparing the MFB+spectrogram method with the MFB+spectrogram+MGDCC+PBSFVT method, the performance of automatic speech recognition improves: the WER drops from 26.57% to 23.68%, a relative error-rate reduction of 10.88%.
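The relative error-rate reduction quoted above follows directly from the two WER values; a quick arithmetic check:

```python
baseline, proposed = 26.57, 23.68           # WER (%) for the two systems
absolute_drop = baseline - proposed
relative_drop = 100 * absolute_drop / baseline
print(round(absolute_drop, 2))              # 2.89
print(round(relative_drop, 2))              # 10.88
```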
Although the present invention has been described above with reference to the drawings, the invention is not limited to the above specific embodiments. The embodiments described are merely illustrative rather than restrictive, and those skilled in the art, inspired by the present invention, can make many variations without departing from its spirit; all of these fall within the protection of the present invention.
Claims (5)
1. A far-field speech recognition method based on multi-target learning of amplitude and phase information, characterized by comprising the following steps:
1) input data preparation: data preparation is performed on the training, development, and evaluation sets respectively;
2) feature extraction:
(1) feature extraction based on amplitude information: the signal is framed and windowed, and each short-time analysis window is transformed from the time domain to the frequency domain by the fast Fourier transform to obtain the corresponding spectrum, which is then filtered by a Mel filter bank to simulate the human auditory system;
(2) feature extraction based on phase information: the phase information of each speech frame is extracted, yielding two phase features, the modified group delay cepstral coefficient MGDCC and the vocal tract information PBSFVT of the phase-domain source separation method;
3) model training: the extracted features are input into a multi-target DNN, which learns two different targets simultaneously, so as to model the commonality and difference between the targets.
2. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized in that the feature extraction based on phase information in step 2)-(2) includes the MGDCC phase feature, whose specific extraction process is as follows: during speech signal processing, the phase of the speech signal is unwrapped and its negative derivative taken; this negative derivative is called the group delay function (GDF);
the group delay function is essentially the negative derivative of the continuous (unwrapped) phase spectrum:
τ(ω) = -d(arg{X(ω)})/dω
equivalently, the group delay function can be computed in the following form:
τ(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |X(ω)|²
where: the subscripts R and I denote the real and imaginary parts, and X(ω) and Y(ω) denote the Fourier transforms of x(n) and n·x(n) respectively;
the adjusted group delay function can be computed as:
τ_m(ω) = sign(τ_c(ω)) · |τ_c(ω)|^α, where τ_c(ω) = (X_R(ω)·Y_R(ω) + X_I(ω)·Y_I(ω)) / |S(ω)|^(2γ)
where: S(ω) denotes the smoothed version of X(ω); to reduce the spiky behavior of the spectrum, the two new variables α and γ are introduced, both taking values between 0 and 1.
3. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized in that the feature extraction based on phase information in step 2)-(2) includes the vocal tract information PBSFVT of the phase-domain source separation method, whose specific extraction process is as follows:
the short-time Fourier transform X(ω) is decomposed into two parts, an all-pass phase part and a minimum-phase part:
X(ω) = |X(ω)|·e^{j·arg{X(ω)}} = X_MinPh(ω)·X_Allp(ω)
where: X_MinPh(ω) and X_Allp(ω) denote the minimum-phase part and the all-pass phase part of X(ω), and the minimum-phase part is related to the original speech signal by:
|X(ω)| = |X_MinPh(ω)|
on the other hand, the relation between the minimum phase and the all-pass phase is:
arg{X(ω)} = arg{X_MinPh(ω)} + arg{X_Allp(ω)}
the speech signal is transformed from the amplitude domain into the phase domain by the Hilbert transform, giving the minimum-phase feature:
arg{X_MinPh(ω)} = -H{ln |X(ω)|}
after the Fourier transform, the source-filter convolution becomes a multiplication:
X_MinPh(ω) = E(ω)·V(ω)
combining the minimum-phase feature with vocal tract information processing, a source-filter operation in the minimum-phase domain separates the information: the minimum-phase speech signal is decomposed into sound source information and vocal tract information, and separate models of the two are obtained.
4. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized in that step 3) specifically comprises: constructing a multi-task deep neural network, inputting the extracted amplitude and phase features into the neural network for training, and outputting enhanced speech and enhanced features.
5. The far-field speech recognition method based on multi-target learning of amplitude and phase information according to claim 1, characterized by further comprising SRMR assessment and speech recognition: specifically, speech recognition is performed on the enhanced features output by the DNN to obtain the word error rate (WER), and SRMR evaluation is performed on the enhanced speech output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910134661.6A CN109767760A (en) | 2019-02-23 | 2019-02-23 | Far field audio recognition method based on the study of the multiple target of amplitude and phase information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109767760A true CN109767760A (en) | 2019-05-17 |
Family
ID=66457198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910134661.6A Pending CN109767760A (en) | 2019-02-23 | 2019-02-23 | Far field audio recognition method based on the study of the multiple target of amplitude and phase information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109767760A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07225596A (en) * | 1994-02-15 | 1995-08-22 | Sony Corp | Device for analyzing/synthesizing acoustic signal |
KR20090016343A (en) * | 2007-08-10 | 2009-02-13 | 한국전자통신연구원 | Method and apparatus for encoding/decoding signal having strong non-stationary properties using hilbert-huang transform |
CN103250208A (en) * | 2010-11-24 | 2013-08-14 | 日本电气株式会社 | Signal processing device, signal processing method and signal processing program |
CN102855882A (en) * | 2011-06-29 | 2013-01-02 | 自然低音技术有限公司 | Perception enhancement for low-frequency sound components |
CN104823237A (en) * | 2012-11-26 | 2015-08-05 | 哈曼国际工业有限公司 | System, computer-readable storage medium and method for repair of compressed audio signals |
CN108962277A (en) * | 2018-07-20 | 2018-12-07 | 广州酷狗计算机科技有限公司 | Speech signal separation method, apparatus, computer equipment and storage medium |
Non-Patent Citations (4)
Title |
---|
Dongbo Li et al.: "Multiple Phase Information Combination for Replay Attacks Detection", Interspeech 2018 |
Longbiao Wang et al.: "Phase Aware Deep Neural Network for Noise Robust Voice Activity", Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017 |
Zeyan Oo et al.: "Phase and reverberation aware DNN for distant-talking", Multimedia Tools and Applications |
Xu Yong: "Research on speech enhancement methods based on deep neural networks", China Doctoral Dissertations Full-text Database |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110231410A (en) * | 2019-06-12 | 2019-09-13 | 武汉市工程科学技术研究院 | Anchor pole detection without damage data intelligence means of interpretation |
CN110231410B (en) * | 2019-06-12 | 2022-01-28 | 武汉市工程科学技术研究院 | Intelligent interpretation method for nondestructive testing data of anchor rod |
CN110324702A (en) * | 2019-07-04 | 2019-10-11 | 三星电子(中国)研发中心 | Information-pushing method and device in video display process |
CN110324702B (en) * | 2019-07-04 | 2022-06-07 | 三星电子(中国)研发中心 | Information pushing method and device in video playing process |
CN112349277A (en) * | 2020-09-28 | 2021-02-09 | 紫光展锐(重庆)科技有限公司 | Feature domain voice enhancement method combined with AI model and related product |
CN112565977A (en) * | 2020-11-27 | 2021-03-26 | 大象声科(深圳)科技有限公司 | Training method of high-frequency signal reconstruction model and high-frequency signal reconstruction method and device |
CN112565977B (en) * | 2020-11-27 | 2023-03-07 | 大象声科(深圳)科技有限公司 | Training method of high-frequency signal reconstruction model and high-frequency signal reconstruction method and device |
CN113269305A (en) * | 2021-05-20 | 2021-08-17 | 郑州铁路职业技术学院 | Feedback voice strengthening method for strengthening memory |
CN113269305B (en) * | 2021-05-20 | 2024-05-03 | 郑州铁路职业技术学院 | Feedback voice strengthening method for strengthening memory |
CN113903334A (en) * | 2021-09-13 | 2022-01-07 | 北京百度网讯科技有限公司 | Method and device for training sound source positioning model and sound source positioning |
CN113903334B (en) * | 2021-09-13 | 2022-09-23 | 北京百度网讯科技有限公司 | Method and device for training sound source positioning model and sound source positioning |
CN114141239A (en) * | 2021-11-29 | 2022-03-04 | 江南大学 | Voice short instruction identification method and system based on lightweight deep learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190517 |