CN108962278A - A kind of hearing aid sound scene classification method - Google Patents
A kind of hearing aid sound scene classification method
- Publication number
- CN108962278A (application CN201810667959.9A)
- Authority
- CN
- China
- Prior art keywords
- layer
- network
- self
- sound scene
- sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
- H04R25/50—Customised settings for obtaining desired overall acoustical characteristics
- H04R25/505—Customised settings for obtaining desired overall acoustical characteristics using digital signal processing
- H04R25/507—Customised settings for obtaining desired overall acoustical characteristics using digital signal processing implemented by neural network or fuzzy logic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The invention discloses a hearing aid sound scene classification method comprising the steps of: 1) extracting the prosodic, voice-quality and spectral features of the input speech; 2) constructing a multilayer stacked autoencoder neural network; 3) sound scene recognition: feeding the extracted acoustic features into the constructed multilayer stacked autoencoder network to recognize the sound scene. The invention uses an improved stacked autoencoder structure: the first layer is a denoising autoencoder that learns a hidden feature larger than the input dimension, so that what the denoising autoencoder learns is not constrained by the input dimensionality; the second layer is a sparse autoencoder that learns sparse features from a large number of hidden-layer neurons. The method first pre-trains the network layer by layer on the acoustic features, which initializes the network parameters layer by layer, and then fine-tunes the whole network with the back-propagation algorithm. Experimental results show that the method can effectively recognize hearing aid sound scenes.
Description
Technical field
The present invention relates to audio signal processing methods, and more particularly to a hearing aid sound scene classification method.
Background technique
In a digital hearing aid system, different signal processing strategies and parameters are usually needed in different scenes such as speech, noise and music, so the system's ability to classify sound scenes automatically constrains its overall performance. A high-performance digital hearing aid can switch programs and adjust parameters automatically according to the current sound scene, process the sound accordingly, improve the signal-to-noise ratio and enhance the user experience.

In recent years, many scholars have studied sound scene classification algorithms for digital hearing aid applications. These methods each have their own characteristics, and the databases used in their experiments also differ. Much of this research concerns the selection of acoustic feature parameter sets and the construction of classification models. Reasonably selecting features suited to distinguishing sound scenes can improve the performance of the whole classification system and reduce the computational load of the model. In these studies, short-time energy, linear regression coefficients, zero-crossing rate, fundamental frequency, formants, entropy information and cepstral information are all commonly used features. Many scholars have also proposed various sound scene classification algorithms, such as artificial neural networks, support vector machines, hidden Markov models and Gaussian mixture models. However, methods based on feature extraction and pattern recognition increase the computational load of a digital hearing aid and degrade its real-time performance, and in practical systems they are often unusable because of excessive power consumption. In recent years, the great success of deep learning in image and speech processing has attracted wide attention in academia and industry, and some scholars have begun to apply deep neural networks to environmental sound recognition, with some success. However, this research is still at an early stage, and much work remains to be improved.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a hearing aid sound scene classification method, solving the technical problems of low efficiency and low reliability of sound scene classification in the prior art.

To solve the above technical problems, the technical solution adopted by the invention is a hearing aid sound scene classification method comprising the following steps:
1) extracting the prosodic, voice-quality and spectral features of the input speech;
2) constructing a multilayer stacked autoencoder neural network;
3) sound scene recognition: feeding the extracted acoustic features into the constructed multilayer stacked autoencoder network to recognize the sound scene.
Further, the multilayer stacked autoencoder network is constructed with the following special structure:
1) The first layer is a denoising autoencoder layer, constructed as follows: certain positions of the input signal x are first randomly set to 0, giving the corrupted input x̃; the encoder function f then maps x̃ to the hidden representation h; the decoder g then produces the reconstruction signal z of x. The objective function for training the autoencoder is the reconstruction error L(z, x) = ||z − x||². After training, the code h is used as the feature for sound scene classification.
2) The second layer is a sparse autoencoder layer, constructed as follows: the structural parameters are set first; the activation of hidden-layer neuron j is a_j(x_i) = s(W_{i,j} x_i + b), and the average activation of hidden neuron j is ρ̂_j = (1/N) Σ_{i=1}^{N} a_j(x_i). A sparsity parameter ρ is then set, and a penalty factor is introduced to penalize those hidden units whose ρ̂_j deviates strongly from ρ, so that the average activation of the hidden neurons is kept at a low level. The divergence between the two distributions is measured by KL(ρ‖ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)), summed over the M neurons of the hidden layer.
3) The overall cost function of the network is J_sparse(W, b) = J(W, b) + β Σ_{j=1}^{M} KL(ρ‖ρ̂_j), where β controls the weight of the sparsity penalty factor in the cost function, W is the network weight and b the network bias; suitable parameters (W, b) are obtained by training.
Compared with the prior art, the beneficial effects of the present invention are: 1) a deep neural network based on the improved stacked autoencoder structure is constructed, which can extract sparser and more robust secondary features, retaining the feature information to the greatest extent while improving the efficiency of the algorithm; 2) by starting from low-dimensional feature learning with denoising autoencoder features and then appending a sparse autoencoder, sparser features with little information loss can be obtained without reducing the dimensionality excessively, so that this structure combines the advantages of the denoising autoencoder with those of the sparse autoencoder.

In summary, the sound scene classification method of the present invention, based on an improved stacked autoencoder artificial neural network, achieves high recognition accuracy, and the system scales well: only the output dimension of the network needs to be modified to realize related recognition tasks such as speech emotion recognition. It offers the many advantages and practical value described above; no similar design has been published or used among comparable methods, so it constitutes a genuine innovation with substantial technical improvement and wide industrial utility value, and is a new, progressive and practical design.
Detailed description of the invention
Fig. 1 is a schematic diagram of the autoencoder of the present invention.
Fig. 2 shows the improved stacked autoencoder structure of the present invention.
Fig. 3 is a schematic diagram of the improved stacked autoencoding of the present invention.
Specific embodiment
The invention will be further described below with reference to the accompanying drawings. The following embodiments are only used to illustrate the technical solution of the present invention clearly, and are not intended to limit its scope of protection.
As shown in Fig. 1, an autoencoder is an unsupervised learning algorithm that tries to learn an identity function. An autoencoder learns an encoder and a decoder, denoted f and g respectively. Given input data x, passing x through the encoder f yields the code h = f(x); the decoder g then produces the reconstruction signal z of x, i.e. z = g(h) = g(f(x)). In training the autoencoder, the objective function is the reconstruction error L; here the mean squared error L(z, x) = ||z − x||² is used, but for binary input vectors, or for x ∈ [0, 1]^N, the cross entropy [6] can also be used as the reconstruction error. The autoencoder is trained by minimizing this objective function. Once training is complete, the code h can serve as a feature for classification; such features are often more robust than the original data features.
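As a concrete illustration of the encoder/decoder pair f, g and the squared reconstruction error described above, the following is a minimal NumPy sketch. It is not the patent's implementation: the layer sizes, the sigmoid nonlinearity, the learning rate and the toy input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class AutoEncoder:
    """Minimal autoencoder: encoder f, decoder g, trained on L(z, x) = ||z - x||^2."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b2 = np.zeros(n_in)

    def encode(self, x):          # h = f(x)
        return sigmoid(self.W1 @ x + self.b1)

    def decode(self, h):          # z = g(h)
        return sigmoid(self.W2 @ h + self.b2)

    def loss(self, x):            # reconstruction error ||z - x||^2
        z = self.decode(self.encode(x))
        return float(np.sum((z - x) ** 2))

    def train_step(self, x, lr=0.5):
        # one back-propagation step for the squared-error objective
        h = self.encode(x)
        z = self.decode(h)
        dz = 2.0 * (z - x) * z * (1.0 - z)     # gradient at the decoder pre-activation
        dh = (self.W2.T @ dz) * h * (1.0 - h)  # gradient at the encoder pre-activation
        self.W2 -= lr * np.outer(dz, h); self.b2 -= lr * dz
        self.W1 -= lr * np.outer(dh, x); self.b1 -= lr * dh

x = np.array([0.9, 0.1, 0.8, 0.2])
ae = AutoEncoder(4, 6)
before = ae.loss(x)
for _ in range(200):
    ae.train_step(x)
after = ae.loss(x)               # reconstruction error decreases with training
```

After training, `ae.encode(x)` plays the role of the code h that is reused as a classification feature.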
As shown in Fig. 2, the improved stacked autoencoder structure consists of an input layer, a denoising autoencoder layer, a sparse autoencoder layer and an output layer. The principle of the improved stacked autoencoder is shown in Fig. 3.

The input layer takes the basic feature information of the input speech, comprising prosodic features, voice-quality features and spectral features. The prosodic features include pitch period, amplitude and phone duration; the voice-quality features include formants, energy and zero-crossing rate; the spectral features include linear prediction cepstral coefficients (LPCC), MFCC and delta MFCC.
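Two of the listed features, short-time energy and zero-crossing rate, are simple enough to sketch directly. The NumPy fragment below is an illustration only: the frame length, hop size and test tone are arbitrary choices, and real MFCC/LPCC extraction would need a dedicated DSP library.

```python
import numpy as np

def short_time_energy(frame):
    """Short-time energy of one frame: sum of squared samples."""
    return float(np.sum(frame.astype(float) ** 2))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1          # treat exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 100 * t)             # 1 s of a 100 Hz tone
frames = frame_signal(tone, 256, 128)
feats = np.array([[short_time_energy(f), zero_crossing_rate(f)] for f in frames])
```

Each row of `feats` would be part of one frame's feature vector fed to the input layer.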
The second layer of the structure is the denoising autoencoder: random noise is added to the input x, or the data in certain dimensions of the input are erased with a certain probability, and x is then reconstructed from the corrupted input x̃. The denoising autoencoder thus tends to learn the distribution of the data, rather than relying on the dimensionality of h to constrain the hidden representation. Certain positions of the input signal x are randomly set to 0, giving x̃, which the function f maps to the hidden representation h. It is worth noting that the reconstruction z obtained from the hidden representation h is forced to approximate not x̃ but x, which is also reflected in the objective function L(x, z).
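The masking corruption step described above (randomly zeroing positions of x to obtain x̃) can be sketched as follows; the corruption probability is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def corrupt(x, p):
    """Masking noise: set each component of x to 0 with probability p."""
    mask = rng.random(x.shape) >= p    # a component survives with probability 1 - p
    return x * mask

x = np.linspace(0.1, 1.0, 10)          # toy clean input
x_tilde = corrupt(x, p=0.3)            # corrupted input fed to the encoder

# The denoising objective compares the reconstruction of the corrupted input
# against the CLEAN signal: L = ||g(f(x_tilde)) - x||^2, not ||g(f(x_tilde)) - x_tilde||^2.
```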
The third layer of the structure is the sparse autoencoder, used to learn a sparse representation of the features: adding a certain sparsity constraint to the autoencoder yields better performance in network learning. Let a_j denote the activation of hidden-layer neuron j [3]; a_j(x_i) = s(W_{i,j} x_i + b) is the activation of hidden neuron j given the input x_i. To describe this characteristic, the algorithm introduces the average activation of hidden neuron j, ρ̂_j = (1/N) Σ_{i=1}^{N} a_j(x_i), where N is the dimension of the input x. By modifying the objective function to add a penalty factor, the sparsity constraint can be realized: a sparsity parameter ρ is set, and the algorithm learns so that ρ̂_j ≈ ρ, imposing the sparsity constraint on hidden neuron j.

The penalty factor penalizes those hidden units whose ρ̂_j deviates strongly from ρ, so that the average activation of the hidden neurons is kept at a low level. The divergence between the two distributions is measured by KL(ρ‖ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)), where the penalty sums this term over the M neurons of the hidden layer.
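The KL-divergence sparsity penalty above can be computed directly; the target sparsity ρ and the example activations below are illustrative values, not the patent's settings.

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    """Sum of KL(rho || rho_hat_j) over the M hidden neurons."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # numerical safety near 0 and 1
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))))

rho = 0.05                                        # target sparsity
rho_hat = np.array([0.05, 0.05, 0.30, 0.80])      # average activations of M = 4 units
penalty = kl_sparsity(rho, rho_hat)               # grows as units deviate from rho
```

The penalty is zero when every ρ̂_j equals ρ and grows the further an average activation drifts from the target, which is exactly the behavior the text describes.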
So far, the overall cost function can be expressed as J_sparse(W, b) = J(W, b) + β Σ_{j=1}^{M} KL(ρ‖ρ̂_j), where β controls the weight of the sparsity penalty, W is the network weight and b the network bias; suitable parameters (W, b) are obtained by training.
The fourth layer of the structure is the output layer, which computes the posterior probability distribution of the sound scene sample to be recognized; the calculation is p(y = j | x^(i)) = exp(θ_j^T x^(i)) / Σ_{l=1}^{k} exp(θ_l^T x^(i)), where x^(i) is the output vector obtained from the sparse autoencoder layer for the i-th sample to be recognized, θ_1, …, θ_k are the output-layer parameters, and k is the number of ambient sound categories.
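Reading the output layer as a softmax over the k categories, as the parameters θ_1, …, θ_k suggest, the posterior computation can be sketched as follows (the dimensions and random parameters are illustrative assumptions):

```python
import numpy as np

def softmax_posterior(x, theta):
    """p(y = j | x) for k classes, with one parameter row theta_j per class."""
    scores = theta @ x                 # one linear score per class
    scores -= scores.max()             # subtract the max to stabilize the exponentials
    e = np.exp(scores)
    return e / e.sum()

k, d = 3, 5                            # e.g. 3 sound scenes, 5-dim sparse code
rng = np.random.default_rng(2)
theta = rng.normal(size=(k, d))        # output-layer parameters theta_1..theta_k
x = rng.normal(size=d)                 # output vector of the sparse layer
p = softmax_posterior(x, theta)        # posterior distribution over the k scenes
```

The predicted scene is simply `np.argmax(p)`.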
The effect of the invention can be further illustrated by experiment. Sound scene classification experiments were carried out on an audio database recorded in advance by the system. The database contains three types of audio files: clean speech, noisy speech and pure noise; the pure noise uses the white noise, tank noise, restaurant noise and high-frequency channel noise from the NOISEX-92 noise library. At a signal-to-noise ratio of 0 dB, the classification accuracy exceeds 95% for clean speech, pure noise and noisy speech scenes alike.
In conclusion a kind of hearing aid sound scene classification method of the present invention uses a kind of improved stack from coding structure:
First layer, from the coding study one hiding feature bigger than input dimension, ties up noise reduction by input from coding study using noise reduction
Several;The second layer, from encoding, learns sparsity feature using sparse from a large amount of hidden layer neurons.Method uses acoustics first
Feature carries out layer-by-layer pre-training to network, achievees the purpose that layer-by-layer network parameter initialization, then passes through back-propagation algorithm pair
Whole network is finely adjusted.Experimental result shows that this method can effectively identify hearing aid sound field scape.
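The layer-by-layer pretraining schedule summarized above can be sketched as follows. This is a bare skeleton under simplifying assumptions: the corruption noise of the first layer and the sparsity penalty of the second are omitted to keep it short, and the data are random toy features rather than real acoustic features.

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def pretrain_layer(X, n_hidden, steps=100, lr=0.1):
    """Greedy pretraining of one autoencoder layer; returns (W, b) and the codes."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_hidden, n_in)); b = np.zeros(n_hidden)   # encoder
    V = rng.normal(0, 0.1, (n_in, n_hidden)); c = np.zeros(n_in)       # decoder
    for _ in range(steps):
        for x in X:                                   # plain SGD on ||z - x||^2
            h = sigmoid(W @ x + b)
            z = sigmoid(V @ h + c)
            dz = 2 * (z - x) * z * (1 - z)
            dh = (V.T @ dz) * h * (1 - h)
            V -= lr * np.outer(dz, h); c -= lr * dz
            W -= lr * np.outer(dh, x); b -= lr * dh
    H = sigmoid(X @ W.T + b)                          # codes for the next layer
    return (W, b), H

# layer-by-layer pretraining: acoustic features -> layer-1 codes -> layer-2 codes
X = rng.random((20, 8))                     # toy "acoustic feature" matrix
(W1, b1), H1 = pretrain_layer(X, 12)        # over-complete first layer
(W2, b2), H2 = pretrain_layer(H1, 6)        # narrower second layer
# Fine-tuning would now run back-propagation through W1, W2 and a softmax
# output layer jointly, using the pretrained weights as the initialization.
```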
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can also make several improvements and variations without departing from the technical principles of the invention, and these improvements and variations should also be regarded as falling within the protection scope of the present invention.
Claims (2)
1. A hearing aid sound scene classification method, comprising the following steps:
1) extracting the prosodic, voice-quality and spectral features of the input speech;
2) constructing a multilayer stacked autoencoder neural network;
3) sound scene recognition: feeding the extracted acoustic features into the constructed multilayer stacked autoencoder network to recognize the sound scene.
2. a kind of hearing aid sound scene classification method according to claim 1, which is characterized in that the self-editing code mind of multilayer stack
Building through network uses such as flowering structure:
1) first layer is noise reduction from coding layer, construction method are as follows: certain positions in input signal x are set 0 by random first
And it is expressed asRecycle encoder function f (x) willBeing mapped as hidden layer indicates h;Then, by decoder g, the weight of x is obtained
Build signal z;Training self-encoding encoder objective function with reconstruction error L (z, x)=| | z-x | |2It indicates;Coding h after training makees
It is characterized for sound scene classification;
2) second layer is sparse from coding layer, construction method are as follows: setting structure parameter first, the activity of hidden layer neuron j
For aj(xi)=s (Wi,jxi+ b), the average active degree of hidden neuron jForJoin secondly by setting sparsity
Number ρ, and penalty factor is set and punishes thoseBiggish hidden unit is differed with ρ, so that the average work of hidden neuron
Jerk is maintained at smaller level, and the method for measuring difference between two distributions is
Wherein M is the number of neuron in hidden layer;
3) the overall cost function of network isWherein β control sparsity punishment because
Weight of the son in cost function, W are network weight, and b is network offset amount, and suitable parameter (W, b) can be obtained by training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810667959.9A CN108962278A (en) | 2018-06-26 | 2018-06-26 | A kind of hearing aid sound scene classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810667959.9A CN108962278A (en) | 2018-06-26 | 2018-06-26 | A kind of hearing aid sound scene classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108962278A true CN108962278A (en) | 2018-12-07 |
Family
ID=64486540
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810667959.9A Withdrawn CN108962278A (en) | 2018-06-26 | 2018-06-26 | A kind of hearing aid sound scene classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108962278A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109859771A (en) * | 2019-01-15 | 2019-06-07 | 华南理工大学 | A kind of sound field scape clustering method of combined optimization deep layer transform characteristics and cluster process |
CN109859771B (en) * | 2019-01-15 | 2021-03-30 | 华南理工大学 | Sound scene clustering method for jointly optimizing deep layer transformation characteristics and clustering process |
CN109859767A (en) * | 2019-03-06 | 2019-06-07 | 哈尔滨工业大学(深圳) | A kind of environment self-adaption neural network noise-reduction method, system and storage medium for digital deaf-aid |
WO2020177371A1 (en) * | 2019-03-06 | 2020-09-10 | 哈尔滨工业大学(深圳) | Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium |
CN109859767B (en) * | 2019-03-06 | 2020-10-13 | 哈尔滨工业大学(深圳) | Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid |
CN110782917A (en) * | 2019-11-01 | 2020-02-11 | 广州美读信息技术有限公司 | Poetry reciting style classification method and system |
CN110782917B (en) * | 2019-11-01 | 2022-07-12 | 广州美读信息技术有限公司 | Poetry reciting style classification method and system |
CN111144482A (en) * | 2019-12-26 | 2020-05-12 | 惠州市锦好医疗科技股份有限公司 | Scene matching method and device for digital hearing aid and computer equipment |
CN111144482B (en) * | 2019-12-26 | 2023-10-27 | 惠州市锦好医疗科技股份有限公司 | Scene matching method and device for digital hearing aid and computer equipment |
CN111491245A (en) * | 2020-03-13 | 2020-08-04 | 天津大学 | Digital hearing aid sound field identification algorithm based on cyclic neural network and hardware implementation method |
CN111491245B (en) * | 2020-03-13 | 2022-03-04 | 天津大学 | Digital hearing aid sound field identification algorithm based on cyclic neural network and implementation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108962278A (en) | A kind of hearing aid sound scene classification method | |
Melin et al. | Voice Recognition with Neural Networks, Type-2 Fuzzy Logic and Genetic Algorithms. | |
CN108831443B (en) | Mobile recording equipment source identification method based on stacked self-coding network | |
CN104318927A (en) | Anti-noise low-bitrate speech coding method and decoding method | |
CN110111797A (en) | Method for distinguishing speek person based on Gauss super vector and deep neural network | |
CN109887489A (en) | Speech dereverberation method based on the depth characteristic for generating confrontation network | |
Wang et al. | Speaker recognition based on MFCC and BP neural networks | |
CN108806694A (en) | A kind of teaching Work attendance method based on voice recognition | |
CN112735435A (en) | Voiceprint open set identification method with unknown class internal division capability | |
Ohnaka et al. | Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images | |
Watrous | Speaker normalization and adaptation using second-order connectionist networks | |
Palo et al. | Comparison of neural network models for speech emotion recognition | |
Abumallouh et al. | Deep neural network combined posteriors for speakers' age and gender classification | |
Bie et al. | DNN-based voice activity detection for speaker recognition | |
Arun Sankar et al. | Speech sound classification and estimation of optimal order of LPC using neural network | |
Bennani | Text-independent talker identification system combining connectionist and conventional models | |
CN110060692A (en) | A kind of Voiceprint Recognition System and its recognition methods | |
Yang et al. | The research of voiceprint recognition based on genetic optimized RBF neural networks | |
Gowda et al. | Continuous kannada speech segmentation and speech recognition based on threshold using MFCC and VQ | |
Choi | Discrimination algorithm using voiced detection method and time–delay neural network system by 3 FFT sub–bands | |
Islam et al. | Hybrid feature and decision fusion based audio-visual speaker identification in challenging environment | |
Bapineedu | Analysis of Lombard effect speech and its application in speaker verification for imposter detection | |
CN117854509B (en) | Training method and device for whisper speaker recognition model | |
Tran et al. | A fuzzy approach to speaker verification | |
Patterson et al. | Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20181207 |