CN107068167A - Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures - Google Patents
Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures
- Publication number
- CN107068167A (application CN201710146957.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- speaker
- neural network
- layer
- identification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The present invention relates to a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures, comprising the following steps: S1. build and train end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network; S2. build and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network; S3. build and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network; S4. build and train end-to-end neural network D, whose input is the speech MFCC/CQCC features and whose recognition network is an LSTM network; S5. fuse the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
Description
Technical field
The present invention relates to the field of voiceprint recognition, and more particularly to a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures.
Background art
Speaker recognition, also known as voiceprint recognition, is the technology of automatically identifying a speaker by means of pattern recognition. Current speaker recognition technology performs well under laboratory conditions, but in practice the speech to be recognized is affected by environmental noise and by the speaker's state of health, which reduces the robustness of existing speaker recognition systems. Existing speaker recognition methods are mainly used to determine a speaker's identity; so far there is no recognition method dedicated to a speaker's cold symptoms.
In speech research, the goal has always been to find features that represent the target class, that is, characteristics that clearly distinguish the target speech from normal speech. Speech feature extraction captures the speaker's phonetic and vocal-tract characteristics. The mainstream characteristic parameters, such as MFCC, LPCC and CQCC, are all single features; they do not carry enough information to characterize a speaker's cold symptoms, which limits recognition accuracy. Recognition also requires a large amount of knowledge for distinguishing the target speech class. Among recognition algorithms, early methods based on vocal-tract and speech-production models did not achieve good practical results because of their complexity, whereas model-matching techniques such as dynamic time warping, hidden Markov models and vector quantization later delivered good recognition performance. Studying feature extraction and pattern classification separately is the common approach in recognition research, but this classical framework suffers from mismatch between features and models, difficult training, and features that are hard to design.
Recently, with the development of deep learning, deep neural networks have shown enormous power in image and speech recognition, and a series of neural network structures have been proposed, such as auto-encoder networks, convolutional neural networks and recurrent neural networks. Many researchers have found that learning from speech with neural networks yields hidden structural features that describe the speech better. End-to-end recognition methods use as little prior knowledge as possible and handle feature learning and feature recognition jointly, achieving good recognition performance.
Summary of the invention
To solve the problems of prior-art recognition techniques, in which separating feature extraction from pattern classification leads to feature-model mismatch, difficult training and features that are hard to design, the present invention provides a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures. By unifying feature learning and pattern classification, the method makes the whole speaker cold-symptom recognition process simpler and faster, and it has broad application prospects.
To achieve the above object of the invention, the technical solution adopted is:
A speaker cold-symptom recognition method fusing multiple end-to-end neural network structures, comprising the following steps:
S1. Build and train end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. Build and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. Build and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. Build and train end-to-end neural network D, whose input is the speech MFCC/CQCC features and whose recognition network is an LSTM network;
S5. Fuse the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
Preferably, the convolutional neural network of end-to-end neural network A comprises 8 modules, each consisting of a one-dimensional convolutional layer, a ReLU activation layer and a one-dimensional max-pooling layer, where the convolution kernel size of the one-dimensional convolutional layer is 32 and the one-dimensional max-pooling layer uses a pooling kernel of size 2 with stride 2.
Preferably, the convolutional neural network of end-to-end neural network B comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 convolution kernel, the second layer a 5*5 convolution kernel, and the remaining four layers 3*3 convolution kernels; all max-pooling layers use a 3*3 pooling kernel with stride 2.
Preferably, the convolutional neural network of end-to-end neural network C comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 convolution kernel, the second layer a 5*5 convolution kernel, and the remaining four layers 3*3 convolution kernels; all max-pooling layers use a 3*3 pooling kernel with stride 2.
Compared with the prior art, the beneficial effects of the invention are as follows:
Existing recognition techniques study features and pattern classification separately, which leads to feature-model mismatch, difficult training, and features that are hard to design. The method provided by the present invention fuses four different end-to-end neural networks and unifies feature learning with pattern classification, so that the whole speaker cold-symptom recognition process is simpler and faster and has broad application prospects.
Brief description of the drawings
Fig. 1 is a schematic diagram of a specific implementation of the method.
Fig. 2 is the flow chart of extracting mel-frequency cepstral coefficients (MFCC) from speech.
Fig. 3 is the flow chart of extracting constant-Q cepstral coefficients (CQCC) from speech.
Fig. 4 is a schematic diagram of end-to-end neural network A.
Fig. 5 is a schematic diagram of end-to-end neural network B.
Fig. 6 is a schematic diagram of end-to-end neural network C.
Fig. 7 is a schematic diagram of end-to-end neural network D.
Detailed description of the embodiments
The accompanying drawings are for illustration only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
Fig. 1 shows the specific implementation flow of the method provided by the present invention. As shown in Fig. 1, the speaker cold-symptom recognition method fusing multiple end-to-end neural network structures provided by the present invention comprises the following steps:
S1. Build and train end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. Build and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. Build and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. Build and train end-to-end neural network D, whose input is the speech MFCC/CQCC features and whose recognition network is an LSTM network, as shown in Fig. 7;
S5. Fuse the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
As shown in Figs. 2 and 3, the MFCC features in step S4 are obtained from the speech by pre-emphasis, windowed framing, fast Fourier transform, computation of the energy spectral density, mel-scale triangular filter-bank filtering, taking the logarithm, and finally a discrete cosine transform; the CQCC features are obtained by applying a constant-Q transform to the speech, computing the energy spectral density, taking the logarithm, and applying a discrete cosine transform.
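The patent names only the processing steps above and does not fix frame lengths, filter counts or cepstral orders. The following Python sketch, using librosa and scipy with illustrative hyper-parameters, shows one way these two front ends could be realized; note that the canonical CQCC pipeline also resamples the log constant-Q spectrum uniformly before the DCT, a step the description above omits and the sketch therefore leaves out.

```python
# Illustrative sketch of the MFCC and CQCC front ends described above.
# Frame size, hop length, filter count and cepstral order are assumptions,
# not values taken from the patent.
import numpy as np
import librosa
from scipy.fftpack import dct

def extract_mfcc(wav_path, n_mfcc=20):
    y, sr = librosa.load(wav_path, sr=None)
    y = librosa.effects.preemphasis(y)              # pre-emphasis
    # framing/windowing, FFT, power spectrum, mel triangular filter bank,
    # logarithm and DCT are handled inside librosa.feature.mfcc
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def extract_cqcc(wav_path, n_coeffs=20):
    y, sr = librosa.load(wav_path, sr=None)
    power = np.abs(librosa.cqt(y, sr=sr)) ** 2      # constant-Q transform -> energy spectrum
    log_power = np.log(power + 1e-10)               # take the logarithm
    return dct(log_power, axis=0, norm='ortho')[:n_coeffs]  # DCT over the frequency axis
```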
In a specific implementation, as shown in Fig. 4, the convolutional neural network of end-to-end neural network A comprises 8 modules, each consisting of a one-dimensional convolutional layer, a ReLU activation layer and a one-dimensional max-pooling layer, where the convolution kernel size of the one-dimensional convolutional layer is 32 and the one-dimensional max-pooling layer uses a pooling kernel of size 2 with stride 2.
In a specific implementation, as shown in Fig. 5, the convolutional neural network of end-to-end neural network B comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 convolution kernel, the second layer a 5*5 convolution kernel, and the remaining four layers 3*3 convolution kernels; all max-pooling layers use a 3*3 pooling kernel with stride 2.
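A corresponding sketch of network B is given below; how the CNN output is handed to the LSTM (here, averaging over the frequency axis), as well as the channel and hidden sizes, are assumptions not stated in the patent.

```python
# Minimal sketch of end-to-end network B: 6 x (2-D conv, ReLU, 3x3 max pool s=2) + LSTM,
# with kernels 7x7, 5x5 and four 3x3. Channel width, hidden size and the frequency-axis
# averaging before the LSTM are illustrative assumptions.
import torch
import torch.nn as nn

class NetworkB(nn.Module):
    def __init__(self, n_classes=2, channels=32, lstm_hidden=128):
        super().__init__()
        kernels = [7, 5, 3, 3, 3, 3]
        blocks, in_ch = [], 1                     # spectrogram, single input channel
        for k in kernels:
            blocks += [nn.Conv2d(in_ch, channels, kernel_size=k, padding=k // 2),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1)]
            in_ch = channels
        self.cnn = nn.Sequential(*blocks)
        self.lstm = nn.LSTM(input_size=channels, hidden_size=lstm_hidden, batch_first=True)
        self.fc = nn.Linear(lstm_hidden, n_classes)

    def forward(self, spec):                      # spec: (batch, 1, freq, time)
        feat = self.cnn(spec)                     # (batch, channels, freq', time')
        feat = feat.mean(dim=2)                   # collapse the frequency axis (assumption)
        feat = feat.transpose(1, 2)               # (batch, time', channels)
        out, _ = self.lstm(feat)
        return self.fc(out[:, -1, :])
```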
In a specific implementation, as shown in Fig. 6, the convolutional neural network of end-to-end neural network C comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 convolution kernel, the second layer a 5*5 convolution kernel, and the remaining four layers 3*3 convolution kernels; all max-pooling layers use a 3*3 pooling kernel with stride 2.
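Networks C and D and the fusion of step S5 can be sketched in the same spirit. The patent does not state how the outputs of the four networks are combined, so the averaging of class posteriors below is only one plausible fusion rule; the global pooling in network C and all layer sizes are likewise assumptions.

```python
# Sketch of network C (same 2-D CNN as network B, followed by a fully connected classifier),
# network D (LSTM over MFCC/CQCC frames), and a simple score-level fusion of networks A-D.
# Hidden sizes, the global average pooling in C, and posterior averaging are assumptions.
import torch
import torch.nn as nn

class NetworkC(nn.Module):
    def __init__(self, n_classes=2, channels=32, hidden=256):
        super().__init__()
        kernels = [7, 5, 3, 3, 3, 3]
        blocks, in_ch = [], 1
        for k in kernels:
            blocks += [nn.Conv2d(in_ch, channels, kernel_size=k, padding=k // 2),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1)]
            in_ch = channels
        self.cnn = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)            # global pooling so the FC input size is fixed
        self.fcn = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_classes))

    def forward(self, spec):                           # spec: (batch, 1, freq, time)
        feat = self.pool(self.cnn(spec)).flatten(1)    # (batch, channels)
        return self.fcn(feat)

class NetworkD(nn.Module):
    def __init__(self, n_feats=20, n_classes=2, lstm_hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_feats, hidden_size=lstm_hidden, batch_first=True)
        self.fc = nn.Linear(lstm_hidden, n_classes)

    def forward(self, cepstra):                        # cepstra: (batch, frames, n_feats), MFCC or CQCC
        out, _ = self.lstm(cepstra)
        return self.fc(out[:, -1, :])

def fuse_predict(nets, inputs):
    """Average the class posteriors of the four trained networks (one plausible fusion rule)."""
    probs = [torch.softmax(net(x), dim=-1) for net, x in zip(nets, inputs)]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)   # assumed labels: 1 = cold symptoms, 0 = healthy
```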
Obviously, the above embodiment is merely an example given to clearly illustrate the present invention and is not a limitation on its embodiments. For those of ordinary skill in the art, other changes in different forms may be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (4)
1. A speaker cold-symptom recognition method fusing multiple end-to-end neural network structures, characterized by comprising the following steps:
S1. building and training end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. building and training end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. building and training end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. building and training end-to-end neural network D, whose input is the speech MFCC/CQCC features and whose recognition network is an LSTM network;
S5. fusing the four trained end-to-end neural networks to recognize the speaker's cold symptoms.
2. The speaker cold-symptom recognition method fusing multiple end-to-end neural network structures according to claim 1, characterized in that the convolutional neural network of end-to-end neural network A comprises 8 modules, each consisting of a one-dimensional convolutional layer, a ReLU activation layer and a one-dimensional max-pooling layer, wherein the convolution kernel size of the one-dimensional convolutional layer is 32 and the one-dimensional max-pooling layer uses a pooling kernel of size 2 with stride 2.
3. The speaker cold-symptom recognition method fusing multiple end-to-end neural network structures according to claim 1, characterized in that the convolutional neural network of end-to-end neural network B comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 convolution kernel, the second layer a 5*5 convolution kernel, and the remaining four layers 3*3 convolution kernels; all max-pooling layers use a 3*3 pooling kernel with stride 2.
4. The speaker cold-symptom recognition method fusing multiple end-to-end neural network structures according to claim 1, characterized in that the convolutional neural network of end-to-end neural network C comprises 6 modules, each consisting of a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 convolution kernel, the second layer a 5*5 convolution kernel, and the remaining four layers 3*3 convolution kernels; all max-pooling layers use a 3*3 pooling kernel with stride 2.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710146957.0A CN107068167A (en) | 2017-03-13 | 2017-03-13 | Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures |
PCT/CN2018/076272 WO2018166316A1 (en) | 2017-03-13 | 2018-02-11 | Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710146957.0A CN107068167A (en) | 2017-03-13 | 2017-03-13 | Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107068167A true CN107068167A (en) | 2017-08-18 |
Family
ID=59621946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710146957.0A Pending CN107068167A (en) | 2017-03-13 | 2017-03-13 | Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107068167A (en) |
WO (1) | WO2018166316A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA3054063A1 (en) | 2017-03-03 | 2018-09-07 | Pindrop Security, Inc. | Method and apparatus for detecting spoofing conditions |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107068167A (en) * | 2017-03-13 | 2017-08-18 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures |
- 2017-03-13: CN application CN201710146957.0A filed; patent CN107068167A (en), status: active, Pending
- 2018-02-11: WO application PCT/CN2018/076272 filed; publication WO2018166316A1 (en), status: active, Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5214743A (en) * | 1989-10-25 | 1993-05-25 | Hitachi, Ltd. | Information processing apparatus |
CN105139864A (en) * | 2015-08-17 | 2015-12-09 | 北京天诚盛业科技有限公司 | Voice recognition method and voice recognition device |
CN106328122A (en) * | 2016-08-19 | 2017-01-11 | 深圳市唯特视科技有限公司 | Voice identification method using long-short term memory model recurrent neural network |
Non-Patent Citations (2)
Title |
---|
Tara N. Sainath et al.: "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks", Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on * |
杜朦旭: "Feature extraction and recognition research on the voices of cold patients" (感冒病人嗓音的特征提取与识别研究), China Master's Theses Full-text Database, Information Science and Technology series * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018166316A1 (en) * | 2017-03-13 | 2018-09-20 | 佛山市顺德区中山大学研究院 | Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures |
CN108053841A (en) * | 2017-10-23 | 2018-05-18 | 平安科技(深圳)有限公司 | The method and application server of disease forecasting are carried out using voice |
CN109960910A (en) * | 2017-12-14 | 2019-07-02 | 广东欧珀移动通信有限公司 | Method of speech processing, device, storage medium and terminal device |
CN109960910B (en) * | 2017-12-14 | 2021-06-08 | Oppo广东移动通信有限公司 | Voice processing method, device, storage medium and terminal equipment |
CN109086892B (en) * | 2018-06-15 | 2022-02-18 | 中山大学 | General dependency tree-based visual problem reasoning model and system |
CN109086892A (en) * | 2018-06-15 | 2018-12-25 | 中山大学 | It is a kind of based on the visual problem inference pattern and system that typically rely on tree |
CN108899051B (en) * | 2018-06-26 | 2020-06-16 | 北京大学深圳研究生院 | Speech emotion recognition model and recognition method based on joint feature representation |
CN109192226A (en) * | 2018-06-26 | 2019-01-11 | 深圳大学 | A kind of signal processing method and device |
CN108899051A (en) * | 2018-06-26 | 2018-11-27 | 北京大学深圳研究生院 | A kind of speech emotion recognition model and recognition methods based on union feature expression |
CN109256118A (en) * | 2018-10-22 | 2019-01-22 | 江苏师范大学 | End-to-end Chinese dialects identifying system and method based on production auditory model |
CN109256118B (en) * | 2018-10-22 | 2021-06-25 | 江苏师范大学 | End-to-end Chinese dialect identification system and method based on generative auditory model |
CN109282837A (en) * | 2018-10-24 | 2019-01-29 | 福州大学 | Bragg grating based on LSTM network interlocks the demodulation method of spectrum |
CN111028859A (en) * | 2019-12-15 | 2020-04-17 | 中北大学 | Hybrid neural network vehicle type identification method based on audio feature fusion |
CN116110437A (en) * | 2023-04-14 | 2023-05-12 | 天津大学 | Pathological voice quality evaluation method based on fusion of voice characteristics and speaker characteristics |
Also Published As
Publication number | Publication date |
---|---|
WO2018166316A1 (en) | 2018-09-20 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170818 |