CN108962278A - Hearing aid sound scene classification method - Google Patents

Hearing aid sound scene classification method

Info

Publication number
CN108962278A
Authority
CN
China
Prior art keywords
layer
network
self
sound scene
sound
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810667959.9A
Other languages
Chinese (zh)
Inventor
奚吉
庞学东
王文琴
李亦飞
史书明
秦福高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Institute of Technology
Original Assignee
Changzhou Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Institute of Technology
Priority to CN201810667959.9A
Publication of CN108962278A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00 - Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/50 - Customised settings for obtaining desired overall acoustical characteristics
    • H04R25/505 - Customised settings for obtaining desired overall acoustical characteristics using digital signal processing
    • H04R25/507 - Customised settings for obtaining desired overall acoustical characteristics using digital signal processing implemented by neural network or fuzzy logic
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Neurosurgery (AREA)
  • Otolaryngology (AREA)
  • Fuzzy Systems (AREA)
  • Automation & Control Theory (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a sound scene classification method for hearing aids. The steps are: 1) extract the prosodic, voice-quality and spectral features of the input speech; 2) construct a multilayer stacked autoencoder neural network; 3) sound scene recognition: feed the extracted acoustic features into the constructed multilayer stacked autoencoder neural network to identify the sound scene. The invention uses an improved stacked autoencoder structure: the first layer is a denoising autoencoder that learns a hidden feature larger than the input dimension, so that what the denoising autoencoder learns is not constrained by the input dimensionality; the second layer is a sparse autoencoder that learns sparse features from a large number of hidden-layer neurons. The method first pre-trains the network layer by layer on the acoustic features, achieving layer-wise initialisation of the network parameters, and then fine-tunes the whole network with the back-propagation algorithm. Experimental results show that the method can effectively identify hearing-aid sound scenes.

Description

Hearing aid sound scene classification method
Technical field
The present invention relates to an audio signal processing method, and more particularly to a sound scene classification method for hearing aids.
Background technique
In a digital hearing aid system, different signal processing strategies and parameters are usually needed for different scenes such as speech, noise and music, and the system's ability to classify sound scenes automatically therefore constrains overall performance. A high-performance digital hearing aid can automatically switch programs and adjust parameters according to the current sound scene, processing the sound, improving the signal-to-noise ratio and improving the user experience.
In recent years, many scholars have studied sound scene classification algorithms for digital hearing aid applications. These methods each have their own characteristics, and the databases used in their experiments also differ. Much of this work concerns the selection of acoustic feature parameter sets and the construction of classification models: a well-chosen set of features for distinguishing sound scenes can improve the performance of the whole classification system and reduce the computational load of the model. In these studies, short-time energy, linear regression coefficients, zero-crossing rate, fundamental frequency, formants, entropy information and cepstral information are the main features used. Many scholars have also proposed various classification algorithms for sound scenes, such as artificial neural networks, support vector machines, hidden Markov models and Gaussian mixture models. However, methods based on feature extraction and pattern recognition increase the computational load of a digital hearing aid and degrade its real-time performance, and in practical systems they are often unusable because of excessive power consumption. In recent years, the great success of deep learning in image and speech processing has attracted wide attention from academia and industry. Some scholars have begun to apply deep neural networks to environmental sound recognition and have achieved certain results, but this research is still at an early stage and much work remains to be done.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a hearing aid sound scene classification method, solving the technical problems of low efficiency and low reliability of sound scene classification in the prior art.
To solve the above technical problems, the technical scheme adopted by the invention is a hearing aid sound scene classification method comprising the following steps:
1) extract the prosodic, voice-quality and spectral features of the input speech;
2) construct a multilayer stacked autoencoder neural network;
3) sound scene recognition: feed the extracted acoustic features into the constructed multilayer stacked autoencoder neural network to identify the sound scene.
Further, the multilayer stacked autoencoder neural network is built with the following special structure (a code sketch of this structure is given after the list):
1) The first layer is a denoising autoencoder layer, constructed as follows: first, randomly set some components of the input signal x to 0 and denote the corrupted input as $\tilde{x}$; then use the encoder function f to map $\tilde{x}$ to the hidden representation $h = f(\tilde{x})$; finally, obtain the reconstruction z of x through the decoder g. The objective function for training the autoencoder is the reconstruction error $L(z, x) = \|z - x\|^2$. After training, the code h serves as the feature used for sound scene classification;
2) The second layer is a sparse autoencoder layer, constructed as follows: first set the structure parameters; the activation of hidden-layer neuron j is $a_j(x_i) = s(W_{i,j} x_i + b)$, and the average activation of hidden neuron j is $\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} a_j(x_i)$, where N is the number of input samples. Then set a sparsity parameter ρ and a penalty factor that penalises hidden units whose $\hat{\rho}_j$ deviates greatly from ρ, so that the average activation of the hidden neurons stays at a low level; the difference between the two distributions is measured by $\sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{M} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right]$, where M is the number of neurons in the hidden layer;
3) The overall cost function of the network is $J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$, where β controls the weight of the sparsity penalty factor in the cost function, W is the network weight matrix and b is the network bias; suitable parameters (W, b) are obtained by training.
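For illustration, a minimal Python/numpy sketch of this two-layer structure follows. The layer sizes, the sigmoid nonlinearity and the class count are assumptions chosen for illustration, not values specified by the invention:

    import numpy as np

    rng = np.random.default_rng(0)

    n_input   = 39   # assumed size of the acoustic feature vector
    n_hidden1 = 64   # denoising layer: wider than the input dimension
    n_hidden2 = 64   # sparse layer: many units, kept sparse by a KL penalty
    n_classes = 3    # e.g. clean speech / noisy speech / pure noise

    # Weights and biases of the two stacked autoencoder layers and the
    # output layer, randomly initialised before layer-wise pre-training.
    W1 = rng.normal(0, 0.01, (n_hidden1, n_input));   b1 = np.zeros(n_hidden1)
    W2 = rng.normal(0, 0.01, (n_hidden2, n_hidden1)); b2 = np.zeros(n_hidden2)
    Wo = rng.normal(0, 0.01, (n_classes, n_hidden2)); bo = np.zeros(n_classes)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x):
        """Stacked forward pass: denoising layer -> sparse layer -> softmax."""
        h1 = sigmoid(W1 @ x + b1)          # first-layer code
        h2 = sigmoid(W2 @ h1 + b2)         # second-layer (sparse) code
        logits = Wo @ h2 + bo
        p = np.exp(logits - logits.max())  # softmax over the k scene classes
        return p / p.sum()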
Compared with the prior art, the beneficial effects of the present invention are: 1) a deep neural network based on an improved stacked autoencoder structure is built, which can extract sparser and more robust secondary features, preserving the feature information to the greatest extent while improving the efficiency of the algorithm; 2) through low-dimensional feature learning, a denoising autoencoder feature layer followed by a sparse autoencoder yields sparser features with little information loss and without excessively reducing the dimensionality, so the structure combines the advantages of denoising autoencoders with those of sparse autoencoders.
In conclusion it is of the invention based on improve stack from coding artificial neural sound scene classification method have compared with High accuracy of identification, and the scalability of system is strong.Only need to modify the evaluation and test output dimension of convolutional neural networks, so that it may real The identification work such as existing voice mood.It has the advantages that above-mentioned many and practical value, and there are no in congenic method similar Design publish or use and really belong to innovation, have biggish improvement, technically have large improvement, there is the wide of industry General utility value is really a new and innovative, progressive, practical new design.
Detailed description of the invention
Fig. 1 is a schematic diagram of the autoencoder of the present invention.
Fig. 2 shows the improved stacked autoencoder structure of the present invention.
Fig. 3 is a schematic diagram of the principle of the improved stacked autoencoder of the present invention.
Specific embodiment
The invention is further described below in conjunction with the accompanying drawings. The following embodiments are only intended to illustrate the technical solution of the present invention more clearly and are not intended to limit its protection scope.
As shown in Figure 1, an autoencoder is an unsupervised learning algorithm that tries to learn an identity function. An autoencoder learns an encoder and a decoder, denoted f and g respectively. Given input data x, the encoder f yields the code h = f(x), and the decoder g then yields the reconstruction z of x, i.e. z = g(h) = g(f(x)). In training the autoencoder, the objective function is the reconstruction error L; the mean squared error $L(z, x) = \|z - x\|^2$ is used here, but for binary input vectors $x \in [0,1]^N$ the cross-entropy[6] can also be used as the reconstruction error. The autoencoder is trained by minimising this objective function. Once training is complete, the code h can serve as a feature for classification; such features are often more robust than the original data features.
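A minimal numpy sketch of the encoder/decoder pair and the mean-squared reconstruction error just described; the dimensions and the use of a sigmoid for f and g are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 39, 64   # assumed input and hidden dimensions

    W_enc = rng.normal(0, 0.01, (n_hid, n_in)); b_enc = np.zeros(n_hid)
    W_dec = rng.normal(0, 0.01, (n_in, n_hid)); b_dec = np.zeros(n_in)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def encode(x):   # h = f(x)
        return sigmoid(W_enc @ x + b_enc)

    def decode(h):   # z = g(h)
        return sigmoid(W_dec @ h + b_dec)

    def reconstruction_error(x):
        """L(z, x) = ||z - x||^2 with z = g(f(x))."""
        z = decode(encode(x))
        return float(np.sum((z - x) ** 2))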
As shown in Fig. 2, the improved stacked autoencoder structure comprises an input layer, a denoising autoencoder layer, a sparse autoencoder layer and an output layer. The principle of the improved stacked autoencoder is shown in Fig. 3.
The input layer receives the basic feature information of the input speech, comprising prosodic features, voice-quality features and spectral features. The prosodic features include pitch period, amplitude and phone duration; the voice-quality features include formants, energy and zero-crossing rate; the spectral features include linear prediction cepstral coefficients (LPCC), MFCC and delta-MFCC.
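As an illustration of how such an input vector might be assembled, the sketch below uses librosa, an assumed tool (the invention does not name a library); formants and LPCC are omitted for brevity:

    import numpy as np
    import librosa  # assumed tooling, not specified by the invention

    def extract_features(path):
        """Prosodic (pitch, energy), voice-quality (zero-crossing rate) and
        spectral (MFCC, delta-MFCC) features, averaged over the utterance."""
        y, sr = librosa.load(path, sr=None)
        f0    = librosa.yin(y, fmin=50, fmax=500, sr=sr)     # pitch contour
        rms   = librosa.feature.rms(y=y)[0]                  # short-time energy
        zcr   = librosa.feature.zero_crossing_rate(y)[0]     # zero-crossing rate
        mfcc  = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral envelope
        dmfcc = librosa.feature.delta(mfcc)                  # delta-MFCC
        return np.hstack([f0.mean(), rms.mean(), zcr.mean(),
                          mfcc.mean(axis=1), dmfcc.mean(axis=1)])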
The second layer of the structure is a denoising autoencoder: random noise is added to the input x, or the data in certain dimensions of the input are erased with a certain probability, and x is then reconstructed from the corrupted input $\tilde{x}$. A denoising autoencoder is more inclined to learn the distribution of the data rather than having its hidden representation limited by the dimension of h. Concretely, some components of the input signal x are randomly set to 0, and the result is denoted $\tilde{x}$; the function f maps $\tilde{x}$ to the hidden representation $h = f(\tilde{x})$. Notably, the reconstruction z obtained from the hidden representation h is forced to approximate not $\tilde{x}$ but x, which is also reflected in the objective function L(x, z).
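A sketch of this random zeroing; the corruption probability p is an assumed value, and the training pair is (corrupted input, clean target):

    import numpy as np

    rng = np.random.default_rng(0)

    def corrupt(x, p=0.3):
        """Zero each component of x with probability p, giving the corrupted
        input x_tilde; the decoder is still trained to reconstruct the clean
        x, i.e. to minimise ||g(f(x_tilde)) - x||^2."""
        mask = rng.random(x.shape) >= p   # keep a component with prob 1 - p
        return x * mask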
The third layer of the structure is a sparse autoencoder, used to learn a sparse representation of the features; adding a certain sparsity constraint to the autoencoder leads to better performance in network learning. Let $a_j$ denote the activation of hidden-layer neuron j[3], so that $a_j(x_i) = s(W_{i,j} x_i + b)$ is the activation of hidden neuron j for a given input $x_i$. To describe this property, the algorithm introduces $\hat{\rho}_j$, the average activation of hidden neuron j over the N input samples: $\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} a_j(x_i)$.
Modifying the objective function by adding a penalty factor realises the sparsity constraint. A sparsity parameter ρ is set, and learning drives $\hat{\rho}_j$ towards ρ, thereby imposing the sparsity constraint on hidden neuron j.
Hidden units whose $\hat{\rho}_j$ deviates greatly from ρ are penalised through the penalty factor, so that the average activation of the hidden neurons stays at a low level. The difference between the two distributions is measured by $\sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{M} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right]$, where M is the number of neurons in the hidden layer.
So far, the overall cost function can be expressed as $J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$, where W is the network weight matrix, b is the network bias, and β controls the weight of the sparsity penalty in the cost function; suitable parameters (W, b) are obtained by training.
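A numpy sketch of this sparsity penalty and overall cost; the values of ρ and β are assumed for illustration, not given by the invention:

    import numpy as np

    def kl_penalty(rho, rho_hat):
        """Sparsity penalty: sum over the M hidden units of the Bernoulli
        KL divergence KL(rho || rho_hat_j) described above."""
        rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)  # numerical safety
        return np.sum(rho * np.log(rho / rho_hat)
                      + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

    def sparse_cost(J_recon, rho_hat, rho=0.05, beta=3.0):
        """J_sparse(W, b) = J(W, b) + beta * sum_j KL(rho || rho_hat_j)."""
        return J_recon + beta * kl_penalty(rho, rho_hat)

    # rho_hat is the per-unit average activation over the N samples, e.g.
    # rho_hat = activations.mean(axis=0) for an (N, M) activation matrix.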
The fourth layer of the structure is the output layer, which computes the posterior Bayesian probability distribution of the sound scene sample to be identified: $p\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{e^{\theta_j^{\mathrm{T}} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathrm{T}} x^{(i)}}}$, where $x^{(i)}$ is the output vector obtained from the sparse autoencoder layer for the i-th sample to be identified, $\theta_1, \ldots, \theta_k$ are the output-layer parameters, and k is the number of environmental sound categories.
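A sketch of this output-layer computation, a standard softmax, assuming the parameters $\theta_1, \ldots, \theta_k$ are stacked into a (k × M) matrix:

    import numpy as np

    def softmax_posterior(x, theta):
        """Posterior p(y = j | x) over the k sound-scene classes for a
        sparse-layer output vector x; theta has one row per class."""
        logits = theta @ x
        e = np.exp(logits - logits.max())  # subtract max for stability
        return e / e.sum()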
The effect of the present invention can be further illustrated by experiments.
Sound scene classification experiments were carried out using an audio database recorded in advance by the system. The database contains three types of audio files: clean speech, noisy speech and pure noise; the pure noise comprises the white noise, tank noise, dining-room noise and high-frequency channel noise from the NOISEX-92 noise library. At a signal-to-noise ratio of 0 dB, the classification accuracy exceeded 95% for clean speech, pure noise and noisy speech scenes alike.
In conclusion a kind of hearing aid sound scene classification method of the present invention uses a kind of improved stack from coding structure: First layer, from the coding study one hiding feature bigger than input dimension, ties up noise reduction by input from coding study using noise reduction Several;The second layer, from encoding, learns sparsity feature using sparse from a large amount of hidden layer neurons.Method uses acoustics first Feature carries out layer-by-layer pre-training to network, achievees the purpose that layer-by-layer network parameter initialization, then passes through back-propagation algorithm pair Whole network is finely adjusted.Experimental result shows that this method can effectively identify hearing aid sound field scape.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several further improvements and variations without departing from the technical principles of the present invention, and such improvements and variations should also be regarded as falling within the protection scope of the present invention.

Claims (2)

1. A hearing aid sound scene classification method, comprising the following steps:
1) extracting the prosodic, voice-quality and spectral features of the input speech;
2) constructing a multilayer stacked autoencoder neural network;
3) sound scene recognition: feeding the extracted acoustic features into the constructed multilayer stacked autoencoder neural network to identify the sound scene.
2. The hearing aid sound scene classification method according to claim 1, characterised in that the multilayer stacked autoencoder neural network is built with the following structure:
1) The first layer is a denoising autoencoder layer, constructed as follows: first, randomly set some components of the input signal x to 0 and denote the corrupted input as $\tilde{x}$; then use the encoder function f to map $\tilde{x}$ to the hidden representation $h = f(\tilde{x})$; then obtain the reconstruction z of x through the decoder g; the objective function for training the autoencoder is the reconstruction error $L(z, x) = \|z - x\|^2$; after training, the code h serves as the feature for sound scene classification;
2) The second layer is a sparse autoencoder layer, constructed as follows: first set the structure parameters, with the activation of hidden-layer neuron j being $a_j(x_i) = s(W_{i,j} x_i + b)$ and the average activation of hidden neuron j being $\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} a_j(x_i)$; then set a sparsity parameter ρ and a penalty factor that penalises hidden units whose $\hat{\rho}_j$ deviates greatly from ρ, so that the average activation of the hidden neurons stays at a low level, the difference between the two distributions being measured by $\sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \sum_{j=1}^{M} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right]$, where M is the number of neurons in the hidden layer;
3) The overall cost function of the network is $J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{M} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$, where β controls the weight of the sparsity penalty factor in the cost function, W is the network weight matrix and b is the network bias; suitable parameters (W, b) are obtained by training.
CN201810667959.9A 2018-06-26 2018-06-26 Hearing aid sound scene classification method Withdrawn CN108962278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810667959.9A CN108962278A (en) 2018-06-26 2018-06-26 Hearing aid sound scene classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810667959.9A CN108962278A (en) 2018-06-26 2018-06-26 Hearing aid sound scene classification method

Publications (1)

Publication Number Publication Date
CN108962278A true CN108962278A (en) 2018-12-07

Family

ID=64486540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810667959.9A Withdrawn CN108962278A (en) 2018-06-26 2018-06-26 A kind of hearing aid sound scene classification method

Country Status (1)

Country Link
CN (1) CN108962278A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859771A (en) * 2019-01-15 2019-06-07 华南理工大学 A kind of sound field scape clustering method of combined optimization deep layer transform characteristics and cluster process
CN109859771B (en) * 2019-01-15 2021-03-30 华南理工大学 Sound scene clustering method for jointly optimizing deep layer transformation characteristics and clustering process
CN109859767A (en) * 2019-03-06 2019-06-07 哈尔滨工业大学(深圳) A kind of environment self-adaption neural network noise-reduction method, system and storage medium for digital deaf-aid
WO2020177371A1 (en) * 2019-03-06 2020-09-10 哈尔滨工业大学(深圳) Environment adaptive neural network noise reduction method and system for digital hearing aids, and storage medium
CN109859767B (en) * 2019-03-06 2020-10-13 哈尔滨工业大学(深圳) Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid
CN110782917A (en) * 2019-11-01 2020-02-11 广州美读信息技术有限公司 Poetry reciting style classification method and system
CN110782917B (en) * 2019-11-01 2022-07-12 广州美读信息技术有限公司 Poetry reciting style classification method and system
CN111144482A (en) * 2019-12-26 2020-05-12 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN111144482B (en) * 2019-12-26 2023-10-27 惠州市锦好医疗科技股份有限公司 Scene matching method and device for digital hearing aid and computer equipment
CN111491245A (en) * 2020-03-13 2020-08-04 天津大学 Digital hearing aid sound field identification algorithm based on cyclic neural network and hardware implementation method
CN111491245B (en) * 2020-03-13 2022-03-04 天津大学 Digital hearing aid sound field identification algorithm based on cyclic neural network and implementation method

Similar Documents

Publication Publication Date Title
CN108962278A (en) Hearing aid sound scene classification method
Melin et al. Voice Recognition with Neural Networks, Type-2 Fuzzy Logic and Genetic Algorithms.
CN108831443B (en) Mobile recording equipment source identification method based on stacked self-coding network
CN104318927A (en) Anti-noise low-bitrate speech coding method and decoding method
CN110111797A Speaker recognition method based on Gaussian supervectors and deep neural networks
CN109887489A Speech dereverberation method based on deep features of generative adversarial networks
Wang et al. Speaker recognition based on MFCC and BP neural networks
CN108806694A Teaching attendance method based on voice recognition
CN112735435A (en) Voiceprint open set identification method with unknown class internal division capability
Ohnaka et al. Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Watrous Speaker normalization and adaptation using second-order connectionist networks
Palo et al. Comparison of neural network models for speech emotion recognition
Abumallouh et al. Deep neural network combined posteriors for speakers' age and gender classification
Bie et al. DNN-based voice activity detection for speaker recognition
Arun Sankar et al. Speech sound classification and estimation of optimal order of LPC using neural network
Bennani Text-independent talker identification system combining connectionist and conventional models
CN110060692A Voiceprint recognition system and recognition method
Yang et al. The research of voiceprint recognition based on genetic optimized RBF neural networks
Gowda et al. Continuous kannada speech segmentation and speech recognition based on threshold using MFCC and VQ
Choi Discrimination algorithm using voiced detection method and time–delay neural network system by 3 FFT sub–bands
Islam et al. Hybrid feature and decision fusion based audio-visual speaker identification in challenging environment
Bapineedu Analysis of Lombard effect speech and its application in speaker verification for imposter detection
CN117854509B (en) Training method and device for whisper speaker recognition model
Tran et al. A fuzzy approach to speaker verification
Patterson et al. Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20181207)