CN109448755A - Cochlear implant auditory scene recognition method - Google Patents
Cochlear implant auditory scene recognition method
- Publication number: CN109448755A
- Application number: CN201811276573.1A
- Authority
- CN
- China
- Prior art keywords
- scene
- ubm
- parameter
- recognition method
- auditory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques characterised by the analysis technique
- G10L25/45—Speech or voice analysis techniques characterised by the type of analysis window
- G10L25/48—Speech or voice analysis techniques specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a cochlear implant auditory scene recognition method comprising the following steps: (A) build a standard scene-trained UBM; (B) divide the sound signal into frames and apply a window; (C) classify the pre-processed signal frame by frame; (D) extract feature vectors from the frames the VAD labels as scene noise; (E) process the extracted features in a GMM-UBM system to obtain likelihood scores and finally identify the scene type. By building a series of models, the cochlear implant auditory scene recognition method can recognize different auditory scenes and provide guidance to downstream signal-processing modules of the speech processor, such as speech enhancement and the coding strategy, so that the signal processing of the speech processor better matches the auditory scene. This improves the clarity and intelligibility of speech for the patient in noisy environments, improves listening quality in music scenes, and further improves the quality of life of cochlear implant recipients.
Description
Technical field
The present invention relates to an auditory scene recognition method, and more particularly to a cochlear implant auditory scene recognition method.
Background technique
The cochlear implant is internationally recognized as the only effective method and device for restoring hearing to patients with bilateral severe or profound sensorineural hearing loss. An existing cochlear implant works as follows: sound is captured by a microphone and converted into an electrical signal; after dedicated digital processing, the signal is encoded according to a given strategy and transmitted into the body through a transmitting coil worn behind the ear; after the implant's receiving coil senses the signal, a decoding chip decodes it and drives the implant's stimulating electrodes to deliver current, which stimulates the auditory nerve and produces hearing. Because of the variety of listening environments, ambient noise is inevitably mixed into the sound, so the sound signal requires algorithmic optimization. Given the diversity of environments, however, a single optimization algorithm sometimes produces output that deviates from the actual conditions and cannot reach the best listening result. An auditory scene recognition method is therefore needed, so that different scenes can use different optimization algorithms and the best listening result is achieved.
Summary of the invention
In view of the above drawbacks of the prior art, the technical problem to be solved by the invention is to provide a cochlear implant auditory scene recognition method that can recognize different auditory scenes.
To achieve the above object, the present invention provides a cochlear implant auditory scene recognition method comprising the following steps: (A) a model-training module collects training signals from various scenes and forms a standard scene-trained UBM with the EM algorithm; (B) a pre-processing module divides the sound signal into frames and applies a window; (C) a VAD module classifies the pre-processed signal frame by frame, labelling each frame as scene noise or speech; (D) a feature-extraction module extracts feature vectors from the scene-noise frames output by the VAD; (E) a scene-recognition module feeds one part of the extracted features into the UBM for the related computation and another part into the GMM computation, combines the resulting UBM statistics with the GMM data to form a new GMM, then compares the data in the UBM with the data in the new GMM to obtain likelihood scores and finally identifies the scene type.
In step (B), the windowing uses a Hamming window or a Hanning window.
Further, the Hamming window is w(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1, with window length N = 256 and a frame shift of 128.
In step (C), the frame classification uses a VAD detection method based on short-time energy and short-time zero-crossing rate.
In step (D), the feature vectors are MFCC or FBank features.
Further, the MFCC parameters of one frame of scene noise are computed as follows: compute the discrete spectrum {S(ω)} of the signal with the discrete Fourier transform; divide the frequency axis into D = 30 equal parts on the Bark scale and compute the centre and edge frequency of each band, where the Bark scale Ω relates to frequency f by Ω = 13 arctan(0.00076 f) + 3.5 arctan((f / 7500)²); convolve the discrete spectrum {S(ω)} with each of D triangular band-pass filters, whose centre and edge frequencies are aligned with the corresponding Bark bands, to obtain the log-energy output E(d) (d = 1, 2, …, D) of each band; apply the discrete cosine transform to the log band energies, C(k) = Σ_{d=1}^{D} E(d) cos(πk(d − 0.5)/D), and take the first 16 dimensions as the feature parameters.
In step (E), in the GMM-UBM system, the scene-noise model is obtained by modifying certain parameters of the UBM through Bayesian adaptation. The adaptation algorithm has two steps. The first step is the expectation step: compute the statistics of the scene training data under each single Gaussian of the UBM. The second step weights the new statistics against the UBM parameters to obtain the scene-noise model parameters. The weighting is chosen so that, in the final scene-noise model, the Gaussians with more scene training data adapt their parameters towards those of the test scene noise itself, while the Gaussians with less data keep their parameters close to the UBM.
Further, given the UBM and a training vector sequence X = {x₁, x₂, …, x_T}, first compute the probability that each feature vector belongs to each Gaussian in the UBM; for the i-th Gaussian, Pr(i | x_t) = ω_i p_i(x_t) / Σ_{j=1}^{M} ω_j p_j(x_t). Then compute from Pr(i | x_t) and x_t the statistics used to update the weight, mean and variance: n_i = Σ_{t=1}^{T} Pr(i | x_t), E_i(x) = (1/n_i) Σ_t Pr(i | x_t) x_t, E_i(x²) = (1/n_i) Σ_t Pr(i | x_t) x_t². Finally, these new statistics obtained from the scene training data update the UBM model parameters: ω̂_i = [a_i n_i / T + (1 − a_i) ω_i] γ, μ̂_i = a_i E_i(x) + (1 − a_i) μ_i, σ̂_i² = a_i E_i(x²) + (1 − a_i)(σ_i² + μ_i²) − μ̂_i². The adaptation coefficient a_i controls the balance between new and old parameters, and the scale factor γ adjusts the weights so that the adapted weights sum to 1. For the i-th Gaussian, the adaptation coefficient is defined as a_i = n_i / (n_i + r), where r is a fixed value that controls the weight of the UBM parameters in the adaptation; it is set to r = 16.
By building a series of models, the cochlear implant auditory scene recognition method of the present invention can recognize different auditory scenes and provide guidance to downstream signal-processing modules of the speech processor, such as speech enhancement and the coding strategy, so that the signal processing of the speech processor better matches the auditory scene and outputs stimulus signals more consistent with the actual auditory scene. This improves the clarity and intelligibility of speech for the patient in noisy environments, improves listening quality in music scenes, and further improves the quality of life of cochlear implant recipients.
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawing, so that the objects, features and effects of the invention can be fully understood.
Detailed description of the invention
Fig. 1 is a flow diagram of the cochlear implant auditory scene recognition method of the present invention.
Specific embodiment
The present invention provides a cochlear implant auditory scene recognition method for recognizing different auditory scenes, such as a classroom, street, concert hall, shopping mall, railway station or food market.
The cochlear implant auditory scene recognition method comprises five steps: model training, pre-processing, VAD (Voice Activity Detection) processing, feature extraction and scene recognition.
Model training: the model-training module collects training signals from various scenes (i.e. scene sound signals) to build a scene library, and forms a standard scene-trained UBM (Universal Background Model) with the EM (Expectation Maximization) algorithm.
EM algorithm:
Feature vector set O = (o₁, o₂, …, o_T);
Model λ = {ω_m, μ_m, σ_m²}, m = 1, …, M;
The GMM (Gaussian Mixture Model) parameters maximize the likelihood function p(O | λ) = Π_{t=1}^{T} Σ_{m=1}^{M} ω_m p_m(o_t). Each EM iteration computes the posterior Pr(m | o_t) = ω_m p_m(o_t) / Σ_k ω_k p_k(o_t) and re-estimates:
weight of the m-th Gaussian: ω_m = (1/T) Σ_{t=1}^{T} Pr(m | o_t);
mean of the m-th Gaussian: μ_m = Σ_t Pr(m | o_t) o_t / Σ_t Pr(m | o_t);
variance of the m-th Gaussian: σ_m² = Σ_t Pr(m | o_t) o_t² / Σ_t Pr(m | o_t) − μ_m².
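The re-estimation above can be sketched as a single EM iteration for a diagonal-covariance GMM. This is an illustrative numpy sketch, not the patent's implementation; the function name and the diagonal-covariance simplification are assumptions.

```python
import numpy as np

def em_step(X, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM (illustrative sketch).

    X: (T, dim) frames; weights: (M,); means, variances: (M, dim).
    """
    T, _ = X.shape
    M = weights.shape[0]
    # E-step: responsibility Pr(m | o_t) of each Gaussian m for each frame t
    log_prob = np.empty((T, M))
    for m in range(M):
        diff2 = (X - means[m]) ** 2 / variances[m]
        log_prob[:, m] = (np.log(weights[m])
                          - 0.5 * np.sum(np.log(2 * np.pi * variances[m]))
                          - 0.5 * diff2.sum(axis=1))
    log_prob -= log_prob.max(axis=1, keepdims=True)   # numerical stability
    resp = np.exp(log_prob)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, variances from soft counts
    n = resp.sum(axis=0)
    new_weights = n / T
    new_means = (resp.T @ X) / np.maximum(n[:, None], 1e-12)
    new_vars = (resp.T @ (X ** 2)) / np.maximum(n[:, None], 1e-12) - new_means ** 2
    return new_weights, new_means, np.maximum(new_vars, 1e-6)
```

Iterating `em_step` from a reasonable initialization (e.g. k-means centroids) until the likelihood stops improving yields the scene-trained UBM.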
Pre-processing: the pre-processing module divides the sound signal into frames and applies a window.
Take a system sampling frequency of 16 kHz as an example.
The windowing uses a Hamming window with window length N = 256 and a frame shift of half the window length, i.e. 128.
Hamming window: w(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1.
Other window functions such as the Hanning window may also be used, and the frame length and frame shift may be changed as the system requires.
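The framing and windowing described above can be sketched as follows; `frame_and_window` is a hypothetical helper, and its defaults match the N = 256 window and 128-sample frame shift stated above.

```python
import numpy as np

def frame_and_window(signal, frame_len=256, hop=128):
    """Split a signal into overlapping frames and apply a Hamming window.

    With frame_len=256 and hop=128 this gives the 50% overlap used above.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Hamming window: w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1))
    window = 0.54 - 0.46 * np.cos(
        2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * window
```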
VAD processing: the VAD module classifies the pre-processed signal frame by frame, labelling each frame as scene noise or speech; the classification uses a VAD detection method based on short-time energy and short-time zero-crossing rate.
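A minimal sketch of a frame-level decision from short-time energy and zero-crossing rate follows. The threshold values are illustrative assumptions; the patent does not state concrete values.

```python
import numpy as np

def vad_frame(frame, energy_thresh=0.01, zcr_thresh=0.3):
    """Label one frame as speech (True) or scene noise (False).

    Uses short-time energy and short-time zero-crossing rate (ZCR);
    thresholds are illustrative, not from the patent.
    """
    energy = np.mean(frame ** 2)                       # short-time energy
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2 # zero crossings per sample
    # High energy with a moderate ZCR suggests voiced speech
    return bool(energy > energy_thresh and zcr < zcr_thresh)
```

In practice a voiced speech frame has high energy and a low zero-crossing rate, while many noises show the opposite pattern, which is why practical VADs combine both measures, often with adaptive thresholds.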
Feature extraction: the feature-extraction module extracts feature vectors from the scene-noise frames output by the VAD, where the feature vectors are MFCC (Mel-Frequency Cepstral Coefficients) or FBank (Mel-scale Filter Bank) features.
The MFCC parameters of one frame of scene noise are computed as follows:
(1) compute the discrete spectrum {S(ω) | ω = 1, 2, …, N} of the signal with the discrete Fourier transform;
(2) divide the frequency axis into D = 30 equal parts on the Bark scale and compute the centre and edge frequency of each band, where the Bark scale Ω relates to frequency f by Ω = 13 arctan(0.00076 f) + 3.5 arctan((f / 7500)²);
(3) convolve the discrete spectrum {S(ω)} with each of D triangular band-pass filters to obtain the log-energy output E(d) (d = 1, 2, …, D) of each band, the centre and edge frequencies of the triangular filters being aligned with the corresponding Bark bands;
(4) apply the discrete cosine transform to the log band energies, C(k) = Σ_{d=1}^{D} E(d) cos(πk(d − 0.5)/D), and take the first 16 dimensions as the feature parameters.
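Steps (1)-(4) can be sketched as follows, assuming the standard Zwicker approximation of the Bark scale; the numerical band-edge inversion and the power-spectrum weighting are illustrative choices, not the patent's exact implementation.

```python
import numpy as np

def bark(f):
    """Bark scale (Zwicker approximation), the assumed form of the mapping above."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def mfcc_like(frame, fs=16000, D=30, n_ceps=16):
    """Cepstral features over D Bark-spaced triangular bands (sketch)."""
    N = len(frame)
    spec = np.abs(np.fft.rfft(frame))          # discrete spectrum |S(w)|
    freqs = np.fft.rfftfreq(N, 1.0 / fs)
    # D + 2 band edges equally spaced on the Bark axis (step 2)
    edges_bark = np.linspace(bark(0.0), bark(fs / 2), D + 2)
    grid = np.linspace(0, fs / 2, 4000)        # numeric inversion Bark -> Hz
    edges_hz = np.interp(edges_bark, bark(grid), grid)
    # Triangular filterbank log energies (step 3)
    E = np.empty(D)
    for d in range(D):
        lo, mid, hi = edges_hz[d], edges_hz[d + 1], edges_hz[d + 2]
        up = (freqs - lo) / (mid - lo + 1e-12)
        down = (hi - freqs) / (hi - mid + 1e-12)
        tri = np.maximum(np.minimum(up, down), 0.0)
        E[d] = np.log(np.sum(tri * spec ** 2) + 1e-12)
    # DCT of the log band energies; keep the first n_ceps coefficients (step 4)
    d_idx = np.arange(1, D + 1)
    return np.array([np.sum(E * np.cos(np.pi * k * (d_idx - 0.5) / D))
                     for k in range(1, n_ceps + 1)])
```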
Scene recognition: the scene-recognition module feeds one part of the extracted features into the UBM for the related computation and another part into the GMM computation, combines the resulting UBM statistics with the GMM data to form a new GMM, then compares the data in the UBM with the data in the new GMM to obtain likelihood scores and finally identifies the scene type. In the GMM-UBM system, the scene-noise model is obtained by modifying certain parameters of the UBM through Bayesian adaptation. The adaptation algorithm has two steps. The first step is the expectation step: compute the statistics of the scene training data under each single Gaussian of the UBM. The second step weights the new statistics against the UBM parameters to obtain the scene-noise model parameters. The weighting is chosen so that, in the final scene-noise model, the Gaussians with more scene training data adapt their parameters towards those of the test scene noise itself, while the Gaussians with less data keep their parameters close to the UBM.
The adaptation method is as follows. Given the UBM and a training vector sequence X = {x₁, x₂, …, x_T}, first compute the probability that each feature vector belongs to each Gaussian in the UBM. For the i-th Gaussian, Pr(i | x_t) = ω_i p_i(x_t) / Σ_{j=1}^{M} ω_j p_j(x_t).
Then compute from Pr(i | x_t) and x_t the statistics used to update the weight, mean and variance: n_i = Σ_{t=1}^{T} Pr(i | x_t), E_i(x) = (1/n_i) Σ_t Pr(i | x_t) x_t, E_i(x²) = (1/n_i) Σ_t Pr(i | x_t) x_t².
Finally, these new statistics obtained from the scene training data update the UBM model parameters: ω̂_i = [a_i n_i / T + (1 − a_i) ω_i] γ, μ̂_i = a_i E_i(x) + (1 − a_i) μ_i, σ̂_i² = a_i E_i(x²) + (1 − a_i)(σ_i² + μ_i²) − μ̂_i². The adaptation coefficient a_i controls the balance between new and old parameters, and the scale factor γ adjusts the weights so that the adapted weights sum to 1.
For the i-th Gaussian, the adaptation coefficient is defined as a_i = n_i / (n_i + r), where r is a fixed value that controls the weight of the UBM parameters in the adaptation; it is set to r = 16. Because the coefficient depends on the data, the adaptation is specific to each Gaussian: if the soft count n_i of a distribution is small, then a_i → 0 and the adapted scene-noise parameters stay close to the UBM; if n_i is large, then a_i → 1 and the scene-noise model parameters are determined mainly by the scene training data.
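The two-step adaptation above can be sketched as a mean-only MAP update, a common simplification of the full weight/mean/variance update described above; the function name and the diagonal-covariance assumption are illustrative.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, r=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM-UBM.

    Step 1 (expectation): soft counts n_i and first-order statistics E_i(x).
    Step 2 (update): mu_hat_i = a_i * E_i(x) + (1 - a_i) * mu_i,
    with a_i = n_i / (n_i + r) and relevance factor r = 16 as above.
    """
    T, _ = X.shape
    M = weights.shape[0]
    log_prob = np.empty((T, M))
    for m in range(M):
        diff2 = (X - means[m]) ** 2 / variances[m]
        log_prob[:, m] = (np.log(weights[m])
                          - 0.5 * np.sum(np.log(2 * np.pi * variances[m]))
                          - 0.5 * diff2.sum(axis=1))
    log_prob -= log_prob.max(axis=1, keepdims=True)
    resp = np.exp(log_prob)
    resp /= resp.sum(axis=1, keepdims=True)    # Pr(i | x_t)
    n = resp.sum(axis=0)                       # n_i, soft count per Gaussian
    Ex = (resp.T @ X) / np.maximum(n[:, None], 1e-12)  # E_i(x)
    a = (n / (n + r))[:, None]                 # a_i = n_i / (n_i + r)
    return a * Ex + (1 - a) * means            # adapted means
```

At recognition time, the average log-likelihood of the test frames under each adapted scene model minus their log-likelihood under the UBM gives a score per scene, and the scene with the largest score is selected.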
The preferred embodiment of the present invention has been described in detail above. It should be appreciated that those skilled in the art can make many modifications and variations according to the concept of the present invention without creative effort. Accordingly, any technical solution that those skilled in the art can obtain through logical analysis, reasoning or limited experiment on the basis of the prior art under the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (8)
1. A cochlear implant auditory scene recognition method, comprising the following steps: (A) a model-training module collects training signals from various scenes and forms a standard scene-trained UBM with the EM algorithm; (B) a pre-processing module divides the sound signal into frames and applies a window; (C) a VAD module classifies the pre-processed signal frame by frame, labelling each frame as scene noise or speech; (D) a feature-extraction module extracts feature vectors from the scene-noise frames output by the VAD; (E) a scene-recognition module feeds one part of the extracted features into the UBM for the related computation and another part into the GMM computation, combines the resulting UBM statistics with the GMM data to form a new GMM, then compares the data in the UBM with the data in the new GMM to obtain likelihood scores and finally identifies the scene type.
2. The cochlear implant auditory scene recognition method of claim 1, wherein in step (B) the windowing uses a Hamming window or a Hanning window.
3. The cochlear implant auditory scene recognition method of claim 2, wherein the Hamming window is w(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1, with window length N = 256 and a frame shift of 128.
4. The cochlear implant auditory scene recognition method of claim 1, wherein in step (C) the classification uses a VAD detection method based on short-time energy and short-time zero-crossing rate.
5. The cochlear implant auditory scene recognition method of claim 1, wherein in step (D) the feature vectors are MFCC or FBank features.
6. The cochlear implant auditory scene recognition method of claim 5, wherein the MFCC parameters of one frame of scene noise are computed by: computing the discrete spectrum {S(ω)} of the signal with the discrete Fourier transform; dividing the frequency axis into D = 30 equal parts on the Bark scale and computing the centre and edge frequency of each band, the Bark scale Ω relating to frequency f by Ω = 13 arctan(0.00076 f) + 3.5 arctan((f / 7500)²); convolving the discrete spectrum {S(ω)} with each of D triangular band-pass filters, whose centre and edge frequencies are aligned with the corresponding Bark bands, to obtain the log-energy output E(d) (d = 1, 2, …, D) of each band; and applying the discrete cosine transform to the log band energies, C(k) = Σ_{d=1}^{D} E(d) cos(πk(d − 0.5)/D), taking the first 16 dimensions as the feature parameters.
7. The cochlear implant auditory scene recognition method of claim 1, wherein in step (E), in the GMM-UBM system, the scene-noise model is obtained by modifying certain parameters of the UBM through Bayesian adaptation; the adaptation algorithm has two steps: the first, an expectation step, computes the statistics of the scene training data under each single Gaussian of the UBM; the second weights the new statistics against the UBM parameters to obtain the scene-noise model parameters, the weighting being such that, in the final scene-noise model, the Gaussians with more scene training data adapt their parameters towards those of the test scene noise itself, while the Gaussians with less data keep their parameters close to the UBM.
8. The cochlear implant auditory scene recognition method of claim 7, wherein, given the UBM and a training vector sequence X = {x₁, x₂, …, x_T}, the probability that each feature vector belongs to each Gaussian of the UBM is first computed, for the i-th Gaussian Pr(i | x_t) = ω_i p_i(x_t) / Σ_j ω_j p_j(x_t); the statistics used to update the weight, mean and variance are then computed from Pr(i | x_t) and x_t: n_i = Σ_t Pr(i | x_t), E_i(x) = (1/n_i) Σ_t Pr(i | x_t) x_t, E_i(x²) = (1/n_i) Σ_t Pr(i | x_t) x_t²; finally these new statistics obtained from the scene training data update the UBM model parameters: ω̂_i = [a_i n_i / T + (1 − a_i) ω_i] γ, μ̂_i = a_i E_i(x) + (1 − a_i) μ_i, σ̂_i² = a_i E_i(x²) + (1 − a_i)(σ_i² + μ_i²) − μ̂_i², where the adaptation coefficient a_i = n_i / (n_i + r) controls the balance between new and old parameters, the scale factor γ adjusts the weights so that the adapted weights sum to 1, and r is a fixed value controlling the weight of the UBM parameters in the adaptation, set to r = 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276573.1A CN109448755A (en) | 2018-10-30 | 2018-10-30 | Artificial cochlea's auditory scene recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109448755A true CN109448755A (en) | 2019-03-08 |
Family
ID=65548788
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811276573.1A Pending CN109448755A (en) | 2018-10-30 | 2018-10-30 | Artificial cochlea's auditory scene recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109448755A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109859768A (en) * | 2019-03-12 | 2019-06-07 | 上海力声特医学科技有限公司 | Artificial cochlea's sound enhancement method |
CN109893340A (en) * | 2019-03-25 | 2019-06-18 | 深圳信息职业技术学院 | A kind of processing method and processing device of the voice signal of cochlear implant |
CN109979477A (en) * | 2019-03-12 | 2019-07-05 | 上海力声特医学科技有限公司 | The sound processing method of artificial cochlea |
CN112820318A (en) * | 2020-12-31 | 2021-05-18 | 西安合谱声学科技有限公司 | Impact sound model establishment and impact sound detection method and system based on GMM-UBM |
CN113038344A (en) * | 2019-12-09 | 2021-06-25 | 三星电子株式会社 | Electronic device and control method thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101241699A (en) * | 2008-03-14 | 2008-08-13 | 北京交通大学 | A speaker identification system for remote Chinese teaching |
CN106251861A (en) * | 2016-08-05 | 2016-12-21 | 重庆大学 | A kind of abnormal sound in public places detection method based on scene modeling |
CN106941005A (en) * | 2017-02-24 | 2017-07-11 | 华南理工大学 | A kind of vocal cords method for detecting abnormality based on speech acoustics feature |
CN106952643A (en) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A kind of sound pick-up outfit clustering method based on Gaussian mean super vector and spectral clustering |
CN107103901A (en) * | 2017-04-03 | 2017-08-29 | 浙江诺尔康神经电子科技股份有限公司 | Artificial cochlea's sound scenery identifying system and method |
DE102016214745A1 (en) * | 2016-08-09 | 2018-02-15 | Carl Von Ossietzky Universität Oldenburg | Method for stimulating an implanted electrode arrangement of a hearing prosthesis |
CN108231067A (en) * | 2018-01-13 | 2018-06-29 | 福州大学 | Sound scenery recognition methods based on convolutional neural networks and random forest classification |
CN108305616A (en) * | 2018-01-16 | 2018-07-20 | 国家计算机网络与信息安全管理中心 | A kind of audio scene recognition method and device based on long feature extraction in short-term |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109448755A (en) | Artificial cochlea's auditory scene recognition methods | |
CN112509564B (en) | End-to-end voice recognition method based on connection time sequence classification and self-attention mechanism | |
WO2019232829A1 (en) | Voiceprint recognition method and apparatus, computer device and storage medium | |
CN108447495B (en) | Deep learning voice enhancement method based on comprehensive feature set | |
US8842853B2 (en) | Pitch perception in an auditory prosthesis | |
Stern et al. | Hearing is believing: Biologically inspired methods for robust automatic speech recognition | |
CN106782565A (en) | A kind of vocal print feature recognition methods and system | |
CN110428842A (en) | Speech model training method, device, equipment and computer readable storage medium | |
JP2022529641A (en) | Speech processing methods, devices, electronic devices and computer programs | |
CN109328380B (en) | Recursive noise power estimation with noise model adaptation | |
CN105513605A (en) | Voice enhancement system and method for cellphone microphone | |
WO2020087716A1 (en) | Auditory scene recognition method for artificial cochlea | |
CN109121057A (en) | A kind of method and its system of intelligence hearing aid | |
CN102509547A (en) | Method and system for voiceprint recognition based on vector quantization | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
CN110111769B (en) | Electronic cochlea control method and device, readable storage medium and electronic cochlea | |
CN109859768A (en) | Artificial cochlea's sound enhancement method | |
CN104778948B (en) | A kind of anti-noise audio recognition method based on bending cepstrum feature | |
CN112151056A (en) | Intelligent cochlear sound processing system and method with customization | |
CN109243466A (en) | A kind of vocal print authentication training method and system | |
ES2849124A1 (en) | Artificial cochlea ambient sound sensing method and system | |
CN112017658A (en) | Operation control system based on intelligent human-computer interaction | |
CN111489763A (en) | Adaptive method for speaker recognition in complex environment based on GMM model | |
Gandhiraj et al. | Auditory-based wavelet packet filterbank for speech recognition using neural network | |
Zezario et al. | Speech enhancement with zero-shot model selection |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190308 |