CN108711419B - Environmental sound sensing method and system for cochlear implant - Google Patents
- Publication number
- CN108711419B (application CN201810856692.8A)
- Authority
- CN
- China
- Prior art keywords
- module
- sound
- neural network
- characteristic values
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
- H04R25/50—Customised settings for obtaining desired overall acoustical characteristics
- H04R25/505—Customised settings for obtaining desired overall acoustical characteristics using digital signal processing
- H04R25/507—Customised settings for obtaining desired overall acoustical characteristics using digital signal processing implemented by neural network or fuzzy logic
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
Abstract
The invention discloses an environmental sound sensing method and system for a cochlear implant. In the method, a sound collection module collects environmental sound in real time with a microphone and outputs a segment of discrete sound samples to the sound feature extraction module; the sound feature extraction module processes the sound signal sent by the sound collection module, extracts a group of feature values representing the characteristics of the signal, and outputs them to the neural network classification module; the neural network classification module classifies the group of feature values through a trained neural network and outputs the classification result to the comprehensive decision module; the comprehensive decision module analyzes the classification result, judges the current scene, and outputs the judgment result to the speech processing selection module; the speech processing selection module then selects the optimal speech processing program and its parameter configuration according to the judged scene.
Description
Technical Field
The invention belongs to the field of signal processing and relates to an environmental sound sensing method and system for a cochlear implant.
Background
The cochlear implant is currently the only medical device on the market that can effectively restore hearing to patients with severe or profound deafness. Its general working principle is that a sound signal collected by a microphone is converted by a signal processing unit into a stimulation code and sent to the implant, which stimulates the auditory nerve through microelectrodes according to that code, allowing the implantee to recover hearing. Like other hearing aids, such systems lack an important capability of the normal auditory system: distinguishing and extracting a target signal in a complex sound scene, for example following the words of a conversation partner in a crowd or in a relatively noisy environment. The common remedy is to reduce the influence of noise on hearing through a denoising algorithm; however, the appropriate denoising algorithm and its parameter configuration differ across environments (such as clean speech, speech requiring denoising, or pure noise).
To address this, an ambient sound perception algorithm is introduced: based on its judgment, the system can selectively enable the noise reduction algorithm and configure the related parameters. Early cochlear implant and hearing aid systems used hidden Markov models as the classifier for ambient sound perception. The model is relatively simple, its theory matured early, it does not demand much training data, and it maintains a reasonable recognition rate; its low computational cost also suits devices with limited processing power such as cochlear implants. With continuous progress in pattern recognition, machine learning, and computing power in recent years, other classification algorithms (support vector machines, neural networks, and so on) have become more prominent in environmental sound perception and achieve higher classification accuracy. Moreover, compared with the hidden Markov model, classifiers such as support vector machines and neural networks put the emphasis on distinguishing categories without requiring prior probabilities of category transitions; only data for the different ambient sounds need be analyzed, with no need to estimate the probability of one ambient sound turning into another. Such transition probabilities are very difficult to obtain and hard to estimate accurately from data. Neural networks, however, vary greatly: the network structure can be combined in many ways according to the number of input feature values, the number of hidden layers, and the number of nodes per layer, and classification accuracy generally grows with network scale, so the required computation can be large.
Disclosure of Invention
To solve these problems and overcome the shortcomings of existing sound perception processing, the invention provides an environmental sound sensing method for a cochlear implant: a neural network classifies the environmental sound, and the network's input feature values and structure are optimized for the cochlear implant system so that the amount of computation is minimized while a given classification accuracy is met.
In order to achieve the above object, the technical solution of the present invention is an ambient sound sensing method for a cochlear implant, including the steps of:
the sound collection module collects environmental sound in real time by adopting a microphone and then outputs a section of collected discrete sound signal to the sound feature extraction module;
the sound characteristic extraction module processes the sound signals sent by the sound acquisition module, extracts a group of characteristic values representing the characteristics of the sound signals and outputs the characteristic values to the neural network classification module;
after receiving a group of characteristic values extracted by the sound characteristic extraction module, the neural network classification module classifies the group of characteristic values through the trained neural network and then outputs the classification result to the comprehensive decision-making module;
after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes and gives judgment of the current scene, and outputs the judgment result to the voice processing selection module;
and the voice processing selection module selects an optimal voice processing program and parameter configuration thereof according to the judgment result of the comprehensive decision module on the current scene.
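The five modules above form a pipeline. A minimal sketch of that pipeline follows; the module internals here (synthetic signal, RMS-based features, the placeholder classifier, and the program/parameter table) are illustrative assumptions, not the patent's actual implementations.

```python
import numpy as np

def collect_sound(n_samples=16000):
    # Stand-in for microphone acquisition: one second of synthetic
    # samples at the 16 kHz rate used in the text.
    rng = np.random.default_rng(0)
    return rng.standard_normal(n_samples)

def extract_features(signal, n_features=8):
    # Placeholder extractor producing 8 scalar descriptors per segment
    # (the patent screens its 8 features from 60 candidates).
    frames = np.array_split(signal, n_features)
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def classify(features, classes=("speech", "noisy speech", "noise", "music")):
    # Placeholder standing in for the trained neural network classifier.
    return classes[int(np.argmax(features)) % len(classes)]

def decide(history):
    # Comprehensive decision: majority vote over recent classifications.
    return max(set(history), key=history.count)

def select_program(scene):
    # Map the decided scene to a speech-processing program and
    # hypothetical parameter configuration.
    table = {"speech": ("clean", {}),
             "noisy speech": ("denoise", {"gain": 0.5}),
             "noise": ("suppress", {"gain": 0.2}),
             "music": ("music", {})}
    return table[scene]
```

Chained end to end: `select_program(decide([classify(extract_features(collect_sound())) for _ in range(3)]))` returns a program name and its parameters for the current scene.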
Preferably, the microphone used to collect ambient sound in real time is an omni-directional microphone or a microphone array.
Preferably, the sampling rate of the sound collection module is 16 kHz.
Preferably, the group of feature values representing the characteristics of the sound signal contains 8 feature values.
Preferably, the neural network classification module employs a deep neural network or a time-delay neural network containing two hidden layers with 15 neurons per layer.
Preferably, the 8 feature values are screened from 60 candidate feature values.
Preferably, the feature value screening combines analysis of feature statistics with a Gaussian mixture model, the mean impact value algorithm, sequential forward selection, and evaluation of classifier training results.
Preferably, the computation for the feature values and for the neural network together does not exceed 20% of the processing capacity of the cochlear implant speech processor.
Based on the above purpose, the invention also provides an environmental sound perception system of the cochlear implant, which comprises a sound collection module, a sound feature extraction module, a neural network classification module, a comprehensive decision module and a voice processing selection module which are connected in sequence, wherein,
the sound collection module is used for collecting environmental sound in real time by adopting a microphone and then outputting a section of collected discrete sound signal to the sound feature extraction module;
the sound feature extraction module is used for processing the sound signals sent by the sound acquisition module, extracting a group of feature values representing the characteristics of the sound signals and outputting the feature values to the neural network classification module;
the neural network classification module is used for classifying a group of characteristic values extracted by the sound characteristic extraction module through the trained neural network after receiving the group of characteristic values, and then outputting the classification result to the comprehensive decision module;
the comprehensive decision-making module is used for comprehensively analyzing and giving judgment of the current scene after receiving the classification result of the neural network classification module and outputting the judgment result to the voice processing selection module;
and the voice processing selection module is used for selecting an optimal voice processing program and parameter configuration thereof according to the judgment result of the comprehensive decision module on the current scene.
Drawings
Fig. 1 is a flowchart illustrating steps of a cochlear implant ambient sound sensing method according to an embodiment of the present invention;
fig. 2 is a block diagram of an ambient sound sensing system of a cochlear implant according to an embodiment of the present invention;
fig. 3 is a detailed schematic diagram of a neural network classification module of the cochlear implant environmental sound sensing method and system according to the embodiment of the present invention;
fig. 4 is a comparison graph of the operation amount and the accuracy of the network with different hidden layers and different neuron numbers according to the method for sensing the environmental sound of the cochlear implant of the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description, certain specific details are set forth in order to provide a better understanding of the invention. It will be apparent to one skilled in the art that the invention may be practiced without these specific details.
Referring to fig. 1, a flow chart of steps of the method for sensing environmental sound of cochlear implant according to the technical solution of the embodiment of the present invention includes the following steps:
s10, the sound collection module collects the environment sound in real time by a microphone and then outputs a section of collected discrete sound signal to the sound feature extraction module;
s20, the sound feature extraction module processes the sound signal sent by the sound collection module, extracts a group of feature values representing the characteristics of the sound signal and outputs the feature values to the neural network classification module;
s30, after receiving a group of characteristic values extracted by the sound characteristic extraction module, the neural network classification module classifies the group of characteristic values through the trained neural network, and then outputs the classification result to the comprehensive decision module;
s40, after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes and gives the judgment of the current scene, and outputs the judgment result to the voice processing selection module;
and S50, selecting the optimal voice processing program and the parameter configuration thereof by the voice processing selection module according to the judgment result of the comprehensive decision module on the current scene.
Referring to fig. 2, the system of the present invention includes a sound collection module 10, a sound feature extraction module 20, a neural network classification module 30, a comprehensive decision module 40, and a speech processing selection module 50, which are connected in sequence, wherein,
the sound collection module 10 is configured to collect environmental sound in real time by using a microphone, and then output a section of collected discrete sound signals to the sound feature extraction module 20;
the sound feature extraction module 20 is configured to process the sound signal sent by the sound acquisition module, extract a group of feature values representing characteristics of the sound signal, and output the feature values to the neural network classification module 30;
the neural network classification module 30 is configured to, after receiving a set of feature values extracted by the sound feature extraction module, classify the set of feature values through the trained neural network, and then output a classification result to the comprehensive decision module 40;
the comprehensive decision module 40 is configured to, after receiving the classification result of the neural network classification module, comprehensively analyze the classification result to give a judgment of the current scene, and output the judgment result to the voice processing selection module 50;
and the voice processing selection module 50 is used for selecting an optimal voice processing program and parameter configuration thereof according to the judgment result of the comprehensive decision module on the current scene.
In a specific embodiment, in S10 the ambient sound is collected in real time with an omnidirectional microphone or a microphone array, and the sampling rate of the sound collection module 10 is 16 kHz.
In S20, a group of 8 feature values representing the characteristics of the sound signal is extracted, the 8 being screened from 60 candidate feature values. Before feature extraction, each feature is normalized as follows:

x_norm = (x − X_min) / (X_max − X_min)

where x_norm is the normalized result, X_max is the maximum of the training samples for the feature in question, and X_min is the minimum of those training samples.
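The min-max normalization described above, stated from the variable definitions in the text (X_max and X_min are the training-set extrema of the same feature), can be written directly:

```python
def minmax_normalize(x, x_min, x_max):
    # x_norm = (x - X_min) / (X_max - X_min); x_min and x_max are the
    # extrema of the training samples for the same feature, so training
    # values map into [0, 1].
    return (x - x_min) / (x_max - x_min)
```

Values outside the training range map outside [0, 1]; whether the patent clips such values is not stated.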
The neural network classification module in S30 adopts a deep neural network or a time-delay neural network containing two hidden layers with 15 neurons per layer. The network is obtained by training on a large number of data samples; taking 4 types of environmental sound (clean speech, noisy speech, noise, music and silence) as an example, its model is shown in fig. 3. Feature values numbered 1, 2, 3, 4, 5 and 6 are selected and combined into a group. The training samples are extracted from a large number of collected audio files and contain 144000 groups of sample feature values in total, 36000 groups per type of environmental sound. To find a balance between computation and accuracy, networks with 1 and 2 hidden layers and various neuron counts were tried, see fig. 4. As the figure shows, the accuracy of the two-hidden-layer networks is clearly higher than that of the single-hidden-layer networks, and the optimal number of neurons per layer is 15.
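The computation side of the trade-off studied in fig. 4 can be approximated by counting multiply-accumulate operations per forward pass. The sketch below uses the 8 inputs and 15-neuron hidden layers from the text; the class count of 4 and the decision to ignore bias additions and activation costs are simplifying assumptions.

```python
def multiply_adds(sizes):
    # Multiply-accumulate count for one forward pass of a fully
    # connected network: sum of fan_in * fan_out over adjacent layers
    # (biases and activation functions excluded).
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

# 8 inputs, two hidden layers of 15 neurons, 4 output classes:
two_hidden = multiply_adds((8, 15, 15, 4))   # 8*15 + 15*15 + 15*4 = 405
one_hidden = multiply_adds((8, 15, 4))       # 8*15 + 15*4       = 180
```

Comparing such counts against measured accuracy for each candidate structure is one way to pick the smallest network that still meets the target accuracy.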
The neural network decision in S40 is computed as:

Y_out = activeFcn(W3 · activeFcn(W2 · activeFcn(W1 · X_input + B1) + B2) + B3)

where X_input is the input feature value matrix, W1, W2, W3 are the weight matrices of each layer of the trained neural network, B1, B2, B3 are the bias matrices of each layer, activeFcn is an activation function, and Y_out is the network's computed result.
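The nested decision expression is the standard forward pass of a two-hidden-layer fully connected network, which can be sketched as follows. The sigmoid used here is a placeholder: the patent's actual activeFcn_H and activeFcn_O definitions are not reproduced in this text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x_input, weights, biases, hidden_act=sigmoid, out_act=sigmoid):
    # Y_out = activeFcn(W3 . activeFcn(W2 . activeFcn(W1 . X + B1) + B2) + B3)
    a = x_input
    for w, b in zip(weights[:-1], biases[:-1]):
        a = hidden_act(w @ a + b)          # hidden layers
    return out_act(weights[-1] @ a + biases[-1])  # output layer
```

For the network in the text, `weights` would hold matrices of shapes (15, 8), (15, 15) and (4, 15), with matching bias vectors.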
To reduce the amount of computation, separate activation functions are used: activeFcn_H for the hidden layers and activeFcn_O for the output layer, where x denotes the input to the activation function and i the ambient sound category.
After receiving the classification result of the neural network classification module, the comprehensive decision module analyzes a series of factors, chiefly the neural network's recognition results and the sound energy over a short period of time, to judge the current scene, and outputs the judgment result to the speech processing selection module.
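Combining the classifier output with short-term energy might look like the sketch below. The silence threshold, the use of a majority vote, and the label strings are illustrative assumptions; the patent does not specify the exact combination rule.

```python
from collections import Counter

def decide_scene(recent_labels, energy_db, silence_threshold_db=-50.0):
    # Report silence when the short-term energy is very low, regardless
    # of the classifier; otherwise take the majority label over the
    # recent classification window.
    if energy_db < silence_threshold_db:
        return "silence"
    return Counter(recent_labels).most_common(1)[0][0]
```

Smoothing over a window of recent labels prevents the speech processing program from switching on a single misclassified frame.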
And the voice processing selection module selects an optimal voice processing program and parameter configuration thereof according to the judgment result of the comprehensive decision module on the current scene.
The feature value screening combines analysis of feature statistics with a Gaussian mixture model, the mean impact value algorithm, sequential forward selection, and evaluation of classifier training results.
The computation for the feature values and for the neural network together does not exceed 20% of the processing capacity of the cochlear implant speech processor.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (6)
1. An ambient sound sensing method of a cochlear implant, comprising the steps of:
the sound collection module collects environmental sound in real time by adopting a microphone and then outputs a section of collected discrete sound signal to the sound feature extraction module;
the sound characteristic extraction module processes the sound signals sent by the sound acquisition module, extracts a group of characteristic values representing the characteristics of the sound signals and outputs the characteristic values to the neural network classification module;
after receiving a group of characteristic values extracted by the sound characteristic extraction module, the neural network classification module classifies the group of characteristic values through the trained neural network and then outputs the classification result to the comprehensive decision-making module;
after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes and gives judgment of the current scene, and outputs the judgment result to the voice processing selection module;
the voice processing selection module selects an optimal voice processing program and parameter configuration thereof according to the judgment result of the comprehensive decision module on the current scene;
the sound feature extraction module processes the sound signals sent by the sound collection module, extracts a group of feature values representing the characteristics of the sound signals and outputs them to the neural network classification module; the group contains 8 feature values, screened from 60 candidate feature values, and before feature extraction each feature is normalized as follows:

x_norm = (x − X_min) / (X_max − X_min)

wherein x_norm is the normalized result, X_max is the maximum of the training samples for the feature in question, and X_min is the minimum of those training samples;
the neural network classification module, after receiving the group of feature values extracted by the sound feature extraction module, classifies them through the trained neural network and outputs the classification result to the comprehensive decision module; the neural network classification module adopts a deep neural network or a time-delay neural network containing two hidden layers with 15 neurons per layer; the network is obtained through training on a large number of data samples to distinguish 4 types of environmental sound including clean speech, noisy speech, noise, music and silence; feature values numbered 1, 2, 3, 4, 5 and 6 are selected to form a group; the training samples are extracted from a large number of collected audio files and contain 144000 groups of sample feature values, 36000 groups per type of environmental sound;
after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes it, judges the current scene, and outputs the judgment result to the speech processing selection module, the neural network decision being computed as:

Y_out = activeFcn(W3 · activeFcn(W2 · activeFcn(W1 · X_input + B1) + B2) + B3)

wherein X_input is the input feature value matrix, W1, W2, W3 are the weight matrices of each layer of the trained neural network, B1, B2, B3 are the bias matrices of each layer, activeFcn is an activation function, and Y_out is the network's computed result.
2. The method of claim 1, wherein the microphone used to collect ambient sound in real time is an omni-directional microphone or a microphone array.
3. The method of claim 1, wherein the sampling rate of the sound collection module is 16 kHz.
4. The method of claim 1, wherein the feature value screening combines analysis of feature statistics with a Gaussian mixture model, the mean impact value algorithm, sequential forward selection, and evaluation of classifier training results.
5. The method of claim 1, wherein the computation for the feature values and for the neural network does not exceed 20% of the processing capacity of the cochlear implant speech processor.
6. A system implementing the method of any one of claims 1 to 5, comprising a sound collection module, a sound feature extraction module, a neural network classification module, a comprehensive decision module and a speech processing selection module which are connected in sequence, wherein,
the sound collection module is used for collecting environmental sound in real time by adopting a microphone and then outputting a section of collected discrete sound signal to the sound feature extraction module;
the sound feature extraction module is used for processing the sound signals sent by the sound acquisition module, extracting a group of feature values representing the characteristics of the sound signals and outputting the feature values to the neural network classification module;
the neural network classification module is used for classifying a group of characteristic values extracted by the sound characteristic extraction module through the trained neural network after receiving the group of characteristic values, and then outputting the classification result to the comprehensive decision module;
the comprehensive decision-making module is used for comprehensively analyzing and giving judgment of the current scene after receiving the classification result of the neural network classification module and outputting the judgment result to the voice processing selection module;
and the voice processing selection module is used for selecting an optimal voice processing program and parameter configuration thereof according to the judgment result of the comprehensive decision module on the current scene.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810856692.8A CN108711419B (en) | 2018-07-31 | 2018-07-31 | Environmental sound sensing method and system for cochlear implant |
ES202190003A ES2849124B2 (en) | 2018-07-31 | 2019-07-19 | ENVIRONMENTAL SOUND DETECTION METHOD AND SYSTEM FOR A COCHLEAR IMPLANT |
PCT/CN2019/096648 WO2020024807A1 (en) | 2018-07-31 | 2019-07-19 | Artificial cochlea ambient sound sensing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810856692.8A CN108711419B (en) | 2018-07-31 | 2018-07-31 | Environmental sound sensing method and system for cochlear implant |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108711419A CN108711419A (en) | 2018-10-26 |
CN108711419B true CN108711419B (en) | 2020-07-31 |
Family
ID=63874461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810856692.8A Active CN108711419B (en) | 2018-07-31 | 2018-07-31 | Environmental sound sensing method and system for cochlear implant |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN108711419B (en) |
ES (1) | ES2849124B2 (en) |
WO (1) | WO2020024807A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108711419B (en) * | 2018-07-31 | 2020-07-31 | 浙江诺尔康神经电子科技股份有限公司 | Environmental sound sensing method and system for cochlear implant |
CN109448703B (en) * | 2018-11-14 | 2021-05-11 | 山东师范大学 | Audio scene recognition method and system combining deep neural network and topic model |
CN111491245B (en) * | 2020-03-13 | 2022-03-04 | 天津大学 | Digital hearing aid sound field identification algorithm based on cyclic neural network and implementation method |
CN112151056B (en) * | 2020-09-27 | 2023-08-04 | 浙江诺尔康神经电子科技股份有限公司 | Intelligent cochlea sound processing system and method with customization function |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
CN105845127A (en) * | 2015-01-13 | 2016-08-10 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN107103901A (en) * | 2017-04-03 | 2017-08-29 | 浙江诺尔康神经电子科技股份有限公司 | Artificial cochlea sound scene recognition system and method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1879449B (en) * | 2003-11-24 | 2011-09-28 | 唯听助听器公司 | Hearing aid and a method of noise reduction |
CN101529929B (en) * | 2006-09-05 | 2012-11-07 | Gn瑞声达A/S | A hearing aid with histogram based sound environment classification |
CN105611477B (en) * | 2015-12-27 | 2018-06-01 | 北京工业大学 | Speech enhancement algorithm combining deep and broad neural networks in digital hearing aids |
CN108172238B (en) * | 2018-01-06 | 2021-08-13 | 广州音书科技有限公司 | Speech enhancement algorithm based on multiple convolutional neural networks in speech recognition system |
CN108231067A (en) * | 2018-01-13 | 2018-06-29 | 福州大学 | Sound scene recognition method based on convolutional neural network and random forest classification |
CN108711419B (en) * | 2018-07-31 | 2020-07-31 | 浙江诺尔康神经电子科技股份有限公司 | Environmental sound sensing method and system for cochlear implant |
- 2018-07-31 CN CN201810856692.8A patent/CN108711419B/en active Active
- 2019-07-19 WO PCT/CN2019/096648 patent/WO2020024807A1/en active Application Filing
- 2019-07-19 ES ES202190003A patent/ES2849124B2/en active Active
Non-Patent Citations (2)
Title |
---|
Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users; Tobias Goehring et al.; Hearing Research; 2016-11-30 (Vol. 344); pp. 183-194 * |
Natural and urban sound features and automatic recognition; 杨铭; 《声景观 新建筑》; 2014-12-31 (No. 5); pp. 32-35 * |
Also Published As
Publication number | Publication date |
---|---|
ES2849124A1 (en) | 2021-08-13 |
WO2020024807A1 (en) | 2020-02-06 |
CN108711419A (en) | 2018-10-26 |
ES2849124B2 (en) | 2022-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108711419B (en) | Environmental sound sensing method and system for cochlear implant | |
Agrawal et al. | Novel TEO-based Gammatone features for environmental sound classification | |
CN102270451B (en) | Method and system for identifying speaker | |
CN108766459B (en) | Target speaker estimation method and system in multi-user voice mixing | |
CN108305615A (en) | Object recognition method and device, storage medium, and terminal |
CN107103901B (en) | Artificial cochlea sound scene recognition system and method | |
CN110197665B (en) | Voice separation and tracking method for public security criminal investigation monitoring | |
CN112151056B (en) | Intelligent cochlea sound processing system and method with customization function | |
CN110428843A (en) | Deep learning method for voice gender identification |
CN111951824A (en) | Detection method for distinguishing depression based on sound | |
WO2020249532A1 (en) | A neural network model for cochlear mechanics and processing | |
Hüwel et al. | Hearing aid research data set for acoustic environment recognition | |
CN113850013B (en) | Ship radiation noise classification method | |
CN112466284B (en) | Mask voice identification method | |
Peng et al. | An acoustic signal processing system for identification of queen-less beehives | |
CN107221338A (en) | Sound wave extraction device and extraction method |
CN116092512A (en) | Small sample voice separation method based on data generation | |
CN115862639A (en) | Artificial intelligence voice analysis method based on K-means clustering analysis | |
Kothapally et al. | Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments. | |
CN114938487A (en) | Hearing aid self-fitting method based on sound scene discrimination | |
CN113887339A (en) | Silent voice recognition system and method fusing surface electromyogram signal and lip image | |
CN111144482B (en) | Scene matching method and device for digital hearing aid and computer equipment | |
CN113488071A (en) | Pig cough recognition method, device, equipment and readable storage medium | |
CN113870901B (en) | SVM-KNN-based voice emotion recognition method | |
CN114155878B (en) | Artificial intelligence detection system, method and computer program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||