CN107068167A - Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures - Google Patents

Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures

Info

Publication number
CN107068167A
CN107068167A (application CN201710146957.0A)
Authority
CN
China
Prior art keywords
network
speaker
neural network
layer
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710146957.0A
Other languages
Chinese (zh)
Inventor
李明 (Li Ming)
倪志东 (Ni Zhidong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SYSU CMU Shunde International Joint Research Institute
National Sun Yat Sen University
Original Assignee
SYSU CMU Shunde International Joint Research Institute
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SYSU CMU Shunde International Joint Research Institute and National Sun Yat Sen University
Priority to CN201710146957.0A (filed 2017-03-13)
Publication of CN107068167A (2017-08-18)
Priority to PCT/CN2018/076272 (filed 2018-02-11)
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures, comprising the following steps: S1. build and train end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network; S2. build and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network; S3. build and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network; S4. build and train end-to-end neural network D, whose input is the speech MFCC or CQCC features and whose recognition network is an LSTM network; S5. fuse the four trained end-to-end neural networks to perform speaker cold-symptom recognition.

Description

Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures
Technical field
The present invention relates to the field of voiceprint recognition, and more particularly to a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures.
Background art
Speaker recognition, also known as voiceprint recognition, is the technology of automatically identifying a speaker using pattern recognition techniques. Current speaker recognition technology achieves good performance under laboratory conditions, but in practice the speech to be recognized is affected by environmental noise and by the speaker's state of health, which reduces the robustness of existing speaker recognition technology. Existing speaker recognition methods are mainly used for determining a speaker's identity; so far there is no recognition method dedicated to speaker cold symptoms.
In speech technology research, researchers have long sought features that characterize the target type, that is, characteristics that clearly distinguish the target speech from normal speech. Speech feature extraction extracts the speaker's phonetic and vocal-tract features. At present, the mainstream characteristic parameters, including MFCC, LPCC and CQCC, are all based on a single feature and do not carry enough information to characterize a speaker's cold symptoms, which limits recognition accuracy. Recognition also requires extensive knowledge for distinguishing the target class of speech. Among recognition algorithms, early methods based on vocal-tract and speech-production models did not achieve good practical results because of model complexity, whereas model-matching techniques such as dynamic time warping, hidden Markov models and vector quantization began to deliver good recognition performance. Studying feature extraction and pattern classification separately is the common approach in recognition research, but this classical framework suffers from mismatches between features and models, difficult training, and features that are hard to find.
Recently, with the development of deep learning, deep neural networks have shown great power in image and speech recognition, and a series of neural network structures have been proposed, such as autoencoder networks, convolutional neural networks and recurrent neural networks. Many researchers have found that learning from speech with neural networks yields hidden structural features that describe the speech better. End-to-end recognition methods use as little prior knowledge as possible while handling feature learning and feature recognition jointly, and they achieve good recognition results.
Summary of the invention
To solve the problems of prior-art recognition techniques, in which separating feature extraction from pattern classification leads to mismatched features and models, difficult training, and features that are hard to find, the present invention provides a speaker cold-symptom recognition method that fuses multiple end-to-end neural network structures. By unifying feature learning and pattern classification, the method makes the whole speaker cold-symptom recognition process simpler and faster, and it has broad application prospects.
To achieve the above objective, the adopted technical scheme is as follows:
A speaker cold-symptom recognition method fusing multiple end-to-end neural network structures comprises the following steps:
S1. Build and train end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. Build and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. Build and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. Build and train end-to-end neural network D, whose input is the speech MFCC or CQCC features and whose recognition network is an LSTM network;
S5. Fuse the four trained end-to-end neural networks to perform speaker cold-symptom recognition.
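The fusion rule of step S5 is not detailed above; one common realization is score-level fusion, sketched below in Python. The `fuse_scores` helper, the equal default weights and the 0.5 decision threshold are illustrative assumptions, not values taken from the invention.

```python
# Hypothetical sketch of step S5: score-level fusion of the posteriors
# produced by the four end-to-end networks A, B, C and D for one utterance.
# The fusion rule (a weighted average) is an assumption; the patent does
# not specify how the four networks are combined.
import numpy as np

def fuse_scores(scores, weights=None):
    """Weighted average of per-network cold-symptom posteriors."""
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        # Equal weights when no per-network reliability is known.
        weights = np.full(len(scores), 1.0 / len(scores))
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, scores))

# Posteriors from networks A, B, C and D (illustrative values only).
fused = fuse_scores([0.8, 0.6, 0.7, 0.9])
has_cold_symptoms = fused >= 0.5
```

Weighted averaging also allows a validation set to assign more weight to whichever of the four networks proves more reliable.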
Preferably, the convolutional neural network of end-to-end neural network A comprises 8 modules, each containing a one-dimensional convolutional layer, a ReLU activation layer and a one-dimensional max-pooling layer; the kernel size of each one-dimensional convolutional layer is 32, and each one-dimensional max-pooling layer uses a pooling kernel of size 2 with a pooling stride of 2.
Preferably, the convolutional neural network of end-to-end neural network B comprises 6 modules, each containing a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second layer a 5*5 kernel, and the remaining 4 layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a pooling stride of 2.
Preferably, the convolutional neural network of end-to-end neural network C comprises 6 modules, each containing a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second layer a 5*5 kernel, and the remaining 4 layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a pooling stride of 2.
Compared with the prior art, the beneficial effects of the invention are as follows:
Existing recognition techniques all study features and pattern classification separately, which leads to mismatched features and models, difficult training, and features that are hard to find. The method provided by the present invention fuses four different end-to-end neural networks to unify feature learning and pattern classification, so that the whole speaker cold-symptom recognition process is simpler and faster, and it has broad application prospects.
Brief description of the drawings
Fig. 1 is a schematic diagram of the specific implementation of the method.
Fig. 2 is a flow chart of extracting mel-frequency cepstral coefficients (MFCC) from speech.
Fig. 3 is a flow chart of extracting constant-Q cepstral coefficients (CQCC) from speech.
Fig. 4 is a schematic diagram of end-to-end neural network A.
Fig. 5 is a schematic diagram of end-to-end neural network B.
Fig. 6 is a schematic diagram of end-to-end neural network C.
Fig. 7 is a schematic diagram of end-to-end neural network D.
Detailed description of the embodiments
The accompanying drawings are for illustrative purposes only and shall not be construed as limiting this patent.
The present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
Fig. 1 shows the implementation flow of the method provided by the present invention. As shown in Fig. 1, the speaker cold-symptom recognition method fusing multiple end-to-end neural network structures provided by the present invention comprises the following steps:
S1. Build and train end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. Build and train end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. Build and train end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. Build and train end-to-end neural network D, whose input is the speech MFCC or CQCC features and whose recognition network is an LSTM network, as shown in Fig. 7;
S5. Fuse the four trained end-to-end neural networks to perform speaker cold-symptom recognition.
As shown in Figs. 2 and 3, the MFCC features in step S4 are obtained by applying pre-emphasis, windowed framing, a fast Fourier transform, power spectral density computation, mel-scale triangular filter-bank filtering, a logarithm operation and a discrete cosine transform to the speech, while the CQCC features are obtained by applying a constant-Q transform, power spectral density computation, a logarithm operation and a discrete cosine transform to the speech.
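The MFCC chain of Fig. 2 (pre-emphasis, windowed framing, FFT, power spectrum, mel filter bank, logarithm, DCT) can be sketched in NumPy as follows. All numeric parameters (sample rate, frame length, hop, filter and coefficient counts) are illustrative defaults rather than values from the patent, and an unnormalized DCT-II is written out directly to keep the sketch self-contained.

```python
# Sketch of the MFCC extraction chain of Fig. 2; parameter values are
# illustrative assumptions, not specified by the patent.
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13, preemph=0.97):
    # 1. Pre-emphasis filter y[t] = x[t] - preemph * x[t-1].
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # 2. Framing with a Hamming window (400-sample frame, 160-sample hop).
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # 3. FFT and power spectral density.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel-scale triangular filter bank.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, cen, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:cen] = (np.arange(lo, cen) - lo) / max(cen - lo, 1)
        fbank[m - 1, cen:hi] = (hi - np.arange(cen, hi)) / max(hi - cen, 1)
    # 5. Logarithm of the filter-bank energies.
    logfb = np.log(power @ fbank.T + 1e-10)
    # 6. DCT-II across filters, keeping the first n_ceps coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None] / n_mels)
    return logfb @ basis.T
```

The CQCC chain of Fig. 3 is analogous, with the FFT-based power spectrum replaced by a constant-Q transform and the mel filter bank omitted.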
In a specific implementation, as shown in Fig. 4, the convolutional neural network of end-to-end neural network A comprises 8 modules, each containing a one-dimensional convolutional layer, a ReLU activation layer and a one-dimensional max-pooling layer; the kernel size of each one-dimensional convolutional layer is 32, and each one-dimensional max-pooling layer uses a pooling kernel of size 2 with a pooling stride of 2.
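To make the effect of these eight modules concrete, the sketch below traces how the temporal dimension shrinks through network A's convolutional front end. Unpadded ("valid") convolution is an assumption, since the patent does not state the padding scheme, so the exact lengths are illustrative.

```python
# Sketch: temporal length after network A's 8 modules of
# 1-D convolution (kernel size 32) + max pooling (kernel 2, stride 2).
# 'Valid' (unpadded) convolution is an assumption; the patent does not
# specify padding.
def net_a_output_length(n_samples, n_modules=8, kernel=32, pool=2):
    length = n_samples
    for _ in range(n_modules):
        length -= kernel - 1   # valid 1-D convolution
        length //= pool        # non-overlapping max pooling, stride 2
    return length

# For a 1-second utterance at 16 kHz, the LSTM that follows sees a far
# shorter sequence of convolutional feature vectors than raw samples.
seq_len = net_a_output_length(16000)
```

Under these assumptions the pooling halves the sequence at every module, which is what lets the LSTM operate on a manageable number of time steps.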
In a specific implementation, as shown in Fig. 5, the convolutional neural network of end-to-end neural network B comprises 6 modules, each containing a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second layer a 5*5 kernel, and the remaining 4 layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a pooling stride of 2.
In a specific implementation, as shown in Fig. 6, the convolutional neural network of end-to-end neural network C comprises 6 modules, each containing a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second layer a 5*5 kernel, and the remaining 4 layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a pooling stride of 2.
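The spatial shrinkage through networks B and C can be traced the same way. The sketch below assumes "same"-padded convolutions (so only the 3*3/stride-2 pooling reduces the feature map) and an illustrative 257*300 spectrogram input; neither assumption comes from the patent.

```python
# Sketch: feature-map size after the 6 modules of networks B/C:
# 2-D convolution (7*7, 5*5, then four 3*3 kernels) + 3*3 max pooling
# with stride 2. 'Same'-padded convolution is an assumption, so only
# the pooling layers change the spatial size.
def net_bc_output_shape(height, width, n_modules=6, pool=3, stride=2):
    for _ in range(n_modules):
        height = (height - pool) // stride + 1
        width = (width - pool) // stride + 1
    return height, width

# E.g. a 257*300 spectrogram (frequency bins * frames) collapses to a
# small map that network B feeds to an LSTM and network C flattens into
# a fully connected layer.
shape = net_bc_output_shape(257, 300)
```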
Obviously, the above embodiments are merely examples given to clearly illustrate the present invention and are not intended to limit its embodiments. On the basis of the above description, those of ordinary skill in the art may make other changes in different forms; it is neither necessary nor possible to exhaust all embodiments here. Any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (4)

1. A speaker cold-symptom recognition method fusing multiple end-to-end neural network structures, characterized by comprising the following steps:
S1. building and training end-to-end neural network A, whose input is the raw speech waveform and whose recognition network combines a convolutional neural network with a long short-term memory (LSTM) network;
S2. building and training end-to-end neural network B, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with an LSTM network;
S3. building and training end-to-end neural network C, whose input is the speech spectrogram and whose recognition network combines a convolutional neural network with a fully connected network;
S4. building and training end-to-end neural network D, whose input is the speech MFCC or CQCC features and whose recognition network is an LSTM network;
S5. fusing the four trained end-to-end neural networks to perform speaker cold-symptom recognition.
2. The speaker cold-symptom recognition method fusing multiple end-to-end neural network structures according to claim 1, characterized in that the convolutional neural network of end-to-end neural network A comprises 8 modules, each containing a one-dimensional convolutional layer, a ReLU activation layer and a one-dimensional max-pooling layer; the kernel size of each one-dimensional convolutional layer is 32, and each one-dimensional max-pooling layer uses a pooling kernel of size 2 with a pooling stride of 2.
3. The speaker cold-symptom recognition method fusing multiple end-to-end neural network structures according to claim 1, characterized in that the convolutional neural network of end-to-end neural network B comprises 6 modules, each containing a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second layer a 5*5 kernel, and the remaining 4 layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a pooling stride of 2.
4. The speaker cold-symptom recognition method fusing multiple end-to-end neural network structures according to claim 1, characterized in that the convolutional neural network of end-to-end neural network C comprises 6 modules, each containing a two-dimensional convolutional layer, a ReLU activation layer and a two-dimensional max-pooling layer; the first convolutional layer uses a 7*7 kernel, the second layer a 5*5 kernel, and the remaining 4 layers 3*3 kernels; all max-pooling layers use a 3*3 pooling kernel with a pooling stride of 2.
CN201710146957.0A 2017-03-13 2017-03-13 Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures Pending CN107068167A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710146957.0A CN107068167A (en) 2017-03-13 2017-03-13 Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures
PCT/CN2018/076272 WO2018166316A1 (en) 2017-03-13 2018-02-11 Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710146957.0A CN107068167A (en) 2017-03-13 2017-03-13 Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures

Publications (1)

Publication Number Publication Date
CN107068167A 2017-08-18

Family

ID=59621946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710146957.0A Pending CN107068167A (en) 2017-03-13 2017-03-13 Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures

Country Status (2)

Country Link
CN (1) CN107068167A (en)
WO (1) WO2018166316A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053841A * 2017-10-23 2018-05-18 平安科技(深圳)有限公司 Method and application server for disease prediction using voice
WO2018166316A1 * 2017-03-13 2018-09-20 佛山市顺德区中山大学研究院 Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures
CN108899051A * 2018-06-26 2018-11-27 北京大学深圳研究生院 A speech emotion recognition model and recognition method based on joint feature representation
CN109086892A * 2018-06-15 2018-12-25 中山大学 A visual question reasoning model and system based on general dependency trees
CN109192226A * 2018-06-26 2019-01-11 深圳大学 A signal processing method and device
CN109256118A * 2018-10-22 2019-01-22 江苏师范大学 End-to-end Chinese dialect identification system and method based on a generative auditory model
CN109282837A * 2018-10-24 2019-01-29 福州大学 Demodulation method for interleaved Bragg grating spectra based on an LSTM network
CN109960910A * 2017-12-14 2019-07-02 广东欧珀移动通信有限公司 Speech processing method, device, storage medium and terminal device
CN111028859A * 2019-12-15 2020-04-17 中北大学 Hybrid neural network vehicle type identification method based on audio feature fusion
CN116110437A * 2023-04-14 2023-05-12 天津大学 Pathological voice quality evaluation method based on fusion of voice features and speaker features

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2018226844B2 (en) 2017-03-03 2021-11-18 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5214743A (en) * 1989-10-25 1993-05-25 Hitachi, Ltd. Information processing apparatus
CN105139864A (en) * 2015-08-17 2015-12-09 北京天诚盛业科技有限公司 Voice recognition method and voice recognition device
CN106328122A (en) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 Voice identification method using long-short term memory model recurrent neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107068167A (en) * 2017-03-13 2017-08-18 广东顺德中山大学卡内基梅隆大学国际联合研究院 Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5214743A (en) * 1989-10-25 1993-05-25 Hitachi, Ltd. Information processing apparatus
CN105139864A (en) * 2015-08-17 2015-12-09 北京天诚盛业科技有限公司 Voice recognition method and voice recognition device
CN106328122A (en) * 2016-08-19 2017-01-11 深圳市唯特视科技有限公司 Voice identification method using long-short term memory model recurrent neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TARA N. SAINATH et al.: "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks", 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
杜朦旭 (Du Mengxu): "Research on Feature Extraction and Recognition of the Voice of Cold Patients" (感冒病人嗓音的特征提取与识别研究), China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018166316A1 (en) * 2017-03-13 2018-09-20 佛山市顺德区中山大学研究院 Speaker's flu symptoms recognition method fused with multiple end-to-end neural network structures
CN108053841A (en) * 2017-10-23 2018-05-18 平安科技(深圳)有限公司 The method and application server of disease forecasting are carried out using voice
CN109960910A (en) * 2017-12-14 2019-07-02 广东欧珀移动通信有限公司 Method of speech processing, device, storage medium and terminal device
CN109960910B (en) * 2017-12-14 2021-06-08 Oppo广东移动通信有限公司 Voice processing method, device, storage medium and terminal equipment
CN109086892B (en) * 2018-06-15 2022-02-18 中山大学 General dependency tree-based visual problem reasoning model and system
CN109086892A (en) * 2018-06-15 2018-12-25 中山大学 It is a kind of based on the visual problem inference pattern and system that typically rely on tree
CN108899051B (en) * 2018-06-26 2020-06-16 北京大学深圳研究生院 Speech emotion recognition model and recognition method based on joint feature representation
CN109192226A (en) * 2018-06-26 2019-01-11 深圳大学 A kind of signal processing method and device
CN108899051A (en) * 2018-06-26 2018-11-27 北京大学深圳研究生院 A kind of speech emotion recognition model and recognition methods based on union feature expression
CN109256118A (en) * 2018-10-22 2019-01-22 江苏师范大学 End-to-end Chinese dialects identifying system and method based on production auditory model
CN109256118B (en) * 2018-10-22 2021-06-25 江苏师范大学 End-to-end Chinese dialect identification system and method based on generative auditory model
CN109282837A (en) * 2018-10-24 2019-01-29 福州大学 Bragg grating based on LSTM network interlocks the demodulation method of spectrum
CN111028859A (en) * 2019-12-15 2020-04-17 中北大学 Hybrid neural network vehicle type identification method based on audio feature fusion
CN116110437A (en) * 2023-04-14 2023-05-12 天津大学 Pathological voice quality evaluation method based on fusion of voice characteristics and speaker characteristics

Also Published As

Publication number Publication date
WO2018166316A1 (en) 2018-09-20

Similar Documents

Publication Publication Date Title
CN107068167A Speaker cold-symptom recognition method fusing multiple end-to-end neural network structures
Qian et al. Very deep convolutional neural networks for noise robust speech recognition
CN109272988B (en) Voice recognition method based on multi-path convolution neural network
CN104732978B Text-dependent speaker recognition method based on joint deep learning
CN106847309A A speech emotion recognition method
CN105321525B A system and method for reducing VoIP communication resource overhead
CN106952649A Speaker recognition method based on convolutional neural networks and spectrograms
CN107146601A A back-end i-vector enhancement method for speaker recognition systems
CN108766419A An abnormal speech detection method based on deep learning
CN106952643A A recording device clustering method based on Gaussian mean supervectors and spectral clustering
CN109243467A Voiceprint model construction method, voiceprint recognition method and system
CN108847244A Voiceprint recognition method and system based on MFCC and an improved BP neural network
CN111048097B Twin-network voiceprint recognition method based on 3D convolution
CN109036460A Speech processing method and device based on a multi-model neural network
CN109785852A A method and system for enhancing a speaker's voice
CN109346084A Speaker recognition method based on deep stacked autoencoder networks
CN112017682A Single-channel speech system for simultaneous denoising and dereverberation
CN107039036A A high-quality speaker recognition method based on autoencoding deep belief networks
CN109559755A A speech enhancement method based on DNN noise classification
CN110544482A Single-channel speech separation system
CN106898355A A speaker recognition method based on two-stage modeling
Sukhwal et al. Comparative study of different classifiers based speaker recognition system using modified MFCC for noisy environment
CN113763965A Speaker identification method fusing multiple attention features
CN110189766A A neural-network-based voice style transfer method
Zheng et al. MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20170818