WO2020024807A1 - Artificial cochlea ambient sound sensing method and system - Google Patents


Info

Publication number
WO2020024807A1
WO2020024807A1 · PCT/CN2019/096648 · CN2019096648W
Authority
WO
WIPO (PCT)
Prior art keywords
module
sound
neural network
feature values
classification
Prior art date
Application number
PCT/CN2019/096648
Other languages
French (fr)
Chinese (zh)
Inventor
张晓薇
韩彦
孙晓安
黄穗
Original Assignee
浙江诺尔康神经电子科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江诺尔康神经电子科技股份有限公司 filed Critical 浙江诺尔康神经电子科技股份有限公司
Priority to ES202190003A priority Critical patent/ES2849124B2/en
Publication of WO2020024807A1 publication Critical patent/WO2020024807A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/50Customised settings for obtaining desired overall acoustical characteristics
    • H04R25/505Customised settings for obtaining desired overall acoustical characteristics using digital signal processing
    • H04R25/507Customised settings for obtaining desired overall acoustical characteristics using digital signal processing implemented by neural network or fuzzy logic
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception

Definitions

  • the invention belongs to the field of signal processing, and relates to a method and system for sensing ambient sound of an artificial cochlea.
  • Cochlear implants are currently the only medical device on the market that can effectively restore hearing in patients with severe or profound deafness.
  • the general working principle of a cochlear implant is that the sound signal collected by the microphone is converted by the signal processing unit into a stimulus code and sent to the implant; the implant then stimulates the auditory nerve through microelectrodes according to the stimulus code, restoring the recipient's hearing.
  • like other assistive hearing devices such as hearing aids, this type of system lacks an important function of the normal human auditory system: the ability to distinguish the target signal in a complex sound scene and extract it, for example, following what a conversation partner is saying in a crowd or in a relatively noisy environment. The usual solution is to reduce the effect of noise on listening through a denoising algorithm.
  • however, the appropriate denoising algorithm and its parameter configuration differ across environments (such as pure speech, noisy speech, or pure noise).
  • to solve this kind of problem, an ambient sound perception algorithm is introduced: based on its judgment of the environment, the system can enable the noise reduction algorithm and configure its parameters in a targeted manner.
  • in early cochlear implant or hearing aid systems, the classifier of the ambient sound perception algorithm was a hidden Markov model. The model is relatively simple, its theory matured early, it does not demand much training data, and it maintains a reasonable recognition rate. Its computational cost is also low, so it suits a device with limited computing power such as a cochlear implant. With continuous innovation in pattern recognition and machine learning in recent years, and continuous improvement in computing power and algorithms, other classification algorithms (support vector machines, neural networks, etc.) have become more prominent in ambient sound perception, with higher classification accuracy.
  • moreover, compared with hidden Markov models, classifiers such as support vector machines and neural networks focus on distinguishing categories and do not require prior probabilities of class transitions. In other words, only data from the different ambient sounds need to be analyzed; the probability of one ambient sound changing into another need not be considered. Such transition probabilities are very difficult to obtain, and estimating them from data is not accurate enough.
  • however, neural networks admit many variations: depending on the number of input feature values, the number of hidden layers, and the number of nodes per layer, the network structure can take many forms. Moreover, the classification accuracy of a neural network is usually proportional to its scale, so the required computation is also relatively large.
  • to solve the above problems, and in view of the shortcomings of existing sound perception processing, the present invention proposes an ambient sound sensing method for a cochlear implant that uses a neural network to classify ambient sound; the network's input feature values and structure are optimized for the cochlear implant system, i.e., the amount of computation is minimized while a given classification accuracy is met.
  • the technical solution of the present invention is a method for sensing the ambient sound of a cochlear implant, comprising the following steps:
  • the sound collection module uses a microphone to collect the ambient sound in real time, and then outputs the collected discrete sound signal to the sound feature extraction module;
  • the sound feature extraction module processes the sound signals sent by the sound acquisition module, extracts a set of feature values representing the characteristics of the sound signals, and outputs them to the neural network classification module;
  • after receiving a set of feature values extracted by the sound feature extraction module, the neural network classification module classifies the set of feature values through a trained neural network, and then outputs the classification result to the comprehensive decision module;
  • after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes it, gives a judgment of the current scene, and outputs the judgment result to the speech processing selection module;
  • the speech processing selection module selects the optimal speech processing program and its parameter configuration according to the judgment result of the current scene by the comprehensive decision module.
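The five steps above form a fixed pipeline. The following is a minimal, hypothetical Python sketch of that pipeline; the specific features, class labels, energy threshold, and program table are illustrative assumptions, not the patent's actual configuration:

```python
import numpy as np

# Assumed scene labels for the trained classifier (S30).
CLASSES = ["pure_speech", "noisy_speech", "noise", "music"]

def extract_features(frame):
    """S20: extract a few frame-level feature values (stand-ins for the
    patent's eight screened features)."""
    energy = float(np.mean(frame ** 2))
    zero_cross = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    spectrum = np.abs(np.fft.rfft(frame))
    centroid = float((spectrum * np.arange(spectrum.size)).sum()
                     / (spectrum.sum() + 1e-12))
    return np.array([energy, zero_cross, centroid])

def decide(class_probs, short_term_energy, quiet_threshold=1e-6):
    """S40: combine the classifier output with short-term sound energy
    (a very low-energy segment is judged 'quiet' regardless)."""
    if short_term_energy < quiet_threshold:
        return "quiet"
    return CLASSES[int(np.argmax(class_probs))]

# S50: illustrative speech-processing program table (hypothetical values).
PROGRAMS = {
    "pure_speech": {"noise_reduction": False},
    "noisy_speech": {"noise_reduction": True, "strength": 0.5},
    "noise": {"noise_reduction": True, "strength": 1.0},
    "music": {"noise_reduction": False},
    "quiet": {"noise_reduction": False},
}

def process_segment(frame, classifier):
    feats = extract_features(frame)      # S20
    probs = classifier(feats)            # S30: trained neural network
    scene = decide(probs, feats[0])      # S40: comprehensive decision
    return scene, PROGRAMS[scene]        # S50: program selection
```

Here `classifier` stands in for the trained neural network of S30; any callable mapping the feature vector to per-class scores fits the sketch.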
  • the microphone collects ambient sounds in real time using an omnidirectional microphone or a microphone array.
  • the sampling rate of the sound acquisition module is 16 kHz.
  • eight feature values are extracted as the group of feature values representing the characteristics of the sound signal.
  • the neural network classification module uses a deep neural network or a time-delay neural network with two hidden layers and 15 neurons per layer.
  • the eight feature values are screened from 60 candidate feature values.
  • the feature value screening comprehensively combines statistical analysis of the feature values with a Gaussian mixture model, the mean impact value algorithm, the sequential forward selection algorithm, and evaluation of classifier training results.
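Of the screening methods just listed, sequential forward selection is the simplest to sketch. The greedy loop below is a generic illustration; the scoring function is a hypothetical placeholder for the classifier-based evaluation the patent describes:

```python
def sequential_forward_selection(candidates, score, k):
    """Greedily grow a feature subset: at each step, add the candidate
    feature that maximizes the score of the enlarged subset."""
    selected = []
    remaining = list(candidates)
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 60 candidate features, keep 8; the toy score simply
# prefers lower-indexed features.
chosen = sequential_forward_selection(range(60), lambda s: -sum(s), 8)
print(chosen)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

In practice the score would be, e.g., validation accuracy of a small classifier trained on the candidate subset, which is what makes the method expensive but effective.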
  • the computation for the feature values plus that of the neural network does not exceed 20% of the computing capacity of the cochlear implant's speech processor.
  • the present invention also provides an ambient sound perception system for a cochlear implant, comprising a sound collection module, a sound feature extraction module, a neural network classification module, a comprehensive decision module, and a speech processing selection module, connected in this order.
  • the sound collection module is configured to use a microphone to collect ambient sounds in real time, and then output the collected discrete sound signal to a sound feature extraction module;
  • the sound feature extraction module is configured to process the sound signals sent by the sound acquisition module, extract a set of feature values representing the characteristics of the sound signal, and output them to the neural network classification module;
  • the neural network classification module is configured to, after receiving a set of feature values extracted by the sound feature extraction module, classify the set of feature values through a trained neural network, and then output the classification result to a comprehensive decision module;
  • the comprehensive decision module is configured to comprehensively analyze and give a judgment of the current scene after receiving the classification result of the neural network classification module, and output the judgment result to the voice processing selection module;
  • the speech processing selection module is configured to select an optimal speech processing program and its parameter configuration according to the determination result of the current scenario by the comprehensive decision module.
  • FIG. 1 is a flowchart of steps in a method for sensing an environmental sound of a cochlear implant according to an embodiment of the present invention
  • FIG. 2 is a structural block diagram of an environmental sound sensing system of a cochlear implant according to an embodiment of the present invention
  • FIG. 3 is a specific schematic diagram of a neural network classification module of a method and a system for sensing an environmental sound of a cochlear implant according to an embodiment of the present invention
  • FIG. 4 is a comparison diagram of a calculation amount and an accuracy rate of a method of sensing an ambient sound of a cochlear implant on networks of different hidden layers and different numbers of neurons according to an embodiment of the present invention.
  • referring to FIG. 1, a flowchart of the steps of a method for sensing the ambient sound of a cochlear implant according to an embodiment of the present invention, the method includes the following steps:
  • the sound collection module uses a microphone to collect ambient sounds in real time, and then outputs the collected discrete sound signal to the sound feature extraction module;
  • the sound feature extraction module processes the sound signals sent by the sound acquisition module, extracts a set of feature values representing the characteristics of the sound signal, and outputs them to the neural network classification module;
  • the neural network classification module classifies the set of feature values through a trained neural network, and then outputs the classification result to the comprehensive decision module;
  • after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes it, gives a judgment of the current scene, and outputs the judgment result to the voice processing selection module;
  • the voice processing selection module selects an optimal voice processing program and its parameter configuration according to the judgment result of the current scenario by the comprehensive decision module.
  • as shown in FIG. 2, an embodiment of the system of the present invention includes a sound collection module 10, a sound feature extraction module 20, a neural network classification module 30, a comprehensive decision module 40, and a voice processing selection module 50, connected in this order.
  • a sound collection module 10 configured to use a microphone to collect ambient sounds in real time, and then output the collected discrete sound signal to the sound feature extraction module 20;
  • the sound feature extraction module 20 is configured to process the sound signal sent by the sound acquisition module, extract a set of feature values representing the characteristics of the sound signal, and output it to the neural network classification module 30;
  • a neural network classification module 30 is configured to classify the set of feature values through a trained neural network after receiving a set of feature values extracted by the sound feature extraction module, and then output the classification result to the comprehensive decision module 40;
  • the comprehensive decision module 40 is configured to, after receiving the classification result of the neural network classification module, comprehensively give a determination of the current scene, and output the determination result to the speech processing selection module 50;
  • the voice processing selection module 50 is configured to select an optimal voice processing program and its parameter configuration according to the judgment result of the current scene by the comprehensive decision module.
  • an omnidirectional microphone or a microphone array is used to collect ambient sounds in real time in S10, and the sampling rate of the sound collection module 10 is 16 kHz.
  • a set of feature values representing the characteristics of the sound signal is extracted in S20.
  • eight feature values are extracted, and these eight are screened from 60 candidate feature values.
  • the feature values are normalized before use. The formula is as follows:
  • x_norm = (x − X_min) / (X_max − X_min)
  • where x_norm is the normalized result, x is the raw feature value, X_max is the maximum value of that feature over the training samples, and X_min is the minimum value of that feature over the training samples.
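This min-max normalization maps each feature into [0, 1] using extrema taken from the training set. A direct one-function sketch:

```python
def minmax_normalize(x, x_min, x_max):
    """x_norm = (x - X_min) / (X_max - X_min), where X_min and X_max
    are the minimum and maximum of this feature over the training
    samples."""
    return (x - x_min) / (x_max - x_min)

print(minmax_normalize(5.0, 0.0, 10.0))  # → 0.5
```

Note that at run time a feature value outside the training range falls outside [0, 1]; an implementation may clip such values before feeding them to the network.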
  • the neural network classification module in S30 uses a deep neural network or a time-delay neural network with two hidden layers and 15 neurons in each layer.
  • the neural network module is obtained by training on a large number of data samples. Taking the discrimination of 4 types of ambient sound (pure speech, noisy speech, noise, and music), plus quiet, as an example, see FIG. 3 for the neural network model.
  • the feature values numbered 1, 2, 3, 4, 5, and 6 are selected, six in total forming one group.
  • the training samples are extracted from a large number of collected audio files; they contain 144,000 sets of sample feature values in total, with 36,000 sets per type of ambient sound.
  • networks with different numbers of hidden layers and different numbers of neurons per layer were compared. As can be seen from the figure, the accuracy of the neural network with two hidden layers is significantly higher than that of the network with a single hidden layer, and the optimal number of neurons per layer is 15.
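The computation side of the FIG. 4 trade-off can be made concrete by counting multiply-accumulate operations per classification for a fully connected network. The layer widths below (8 inputs, 4 output classes) are assumptions for illustration:

```python
def dense_macs(layer_sizes):
    """Multiply-accumulate count for one forward pass through a fully
    connected network with the given layer widths."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Assumed: 8 input features, 4 output classes.
print(dense_macs([8, 15, 4]))       # single hidden layer of 15 → 180
print(dense_macs([8, 15, 15, 4]))   # two hidden layers of 15 → 405
```

The second hidden layer roughly doubles the per-frame cost here, which is why the patent balances accuracy against a computation budget rather than simply growing the network.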
  • the neural network decision formula in S40 is as follows:
  • Y_out = activeFcn(W3 · activeFcn(W2 · activeFcn(W1 · X_input + B1) + B2) + B3)
  • where X_input is the input feature-value matrix; W1, W2, and W3 are the weight matrices of each layer of the trained neural network; B1, B2, and B3 are the bias matrices of each layer; activeFcn is the activation function; and Y_out is the network calculation result. At the output layer the activation normalizes over the categories:
  • activeFcn(x)_i = e^(x_i) / Σ_j e^(x_j)
  • where x is the input to the activation function and i is the ambient sound category.
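The nested decision formula can be written out directly as a forward pass. In this sketch the hidden-layer activation (tanh) and the layer shapes (8 inputs, two hidden layers of 15, 4 output categories) are assumptions; the softmax output matches the per-category activation described above:

```python
import numpy as np

def softmax(x):
    # activeFcn(x)_i = e^(x_i) / sum_j e^(x_j), shifted for stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward(x_input, weights, biases):
    """Y_out = activeFcn(W3·activeFcn(W2·activeFcn(W1·X + B1) + B2) + B3)."""
    a = x_input
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(w @ a + b)          # hidden layers (activation assumed)
    return softmax(weights[-1] @ a + biases[-1])

# Random weights only to exercise the shapes; a real system would load
# the trained W1..W3 and B1..B3.
rng = np.random.default_rng(0)
ws = [rng.standard_normal(s) for s in [(15, 8), (15, 15), (4, 15)]]
bs = [np.zeros(15), np.zeros(15), np.zeros(4)]
y = forward(rng.standard_normal(8), ws, bs)
```

`y` is a probability vector over the ambient sound categories; the decision module can take its argmax or combine it with other factors such as short-term energy.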
  • after receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes a series of factors, mainly the neural network's recognition results and the short-term sound energy, gives a judgment of the current scene, and outputs the judgment result to the speech processing selection module.
  • the speech processing selection module selects the optimal speech processing program and its parameter configuration according to the judgment result of the current scene by the comprehensive decision module.
  • the feature value screening comprehensively combines statistical analysis of the feature values with a Gaussian mixture model, the mean impact value algorithm, the sequential forward selection algorithm, and evaluation of classifier training results.
  • the computation for the feature values plus that of the neural network does not exceed 20% of the computing capacity of the cochlear implant's speech processor.


Abstract

An artificial cochlea ambient sound sensing method and system. The method comprises the following steps: a sound acquisition module acquires ambient sound in real time using a microphone and outputs the acquired discrete sound signal to a sound feature extraction module (S10); the sound feature extraction module processes the sound signals transmitted by the sound acquisition module, extracts one group of feature values representing the sound signal's characteristics, and outputs the feature values to a neural network classification module (S20); after receiving the group of feature values extracted by the sound feature extraction module, the neural network classification module classifies them by means of a trained neural network and outputs the classification result to an integrated decision module (S30); after receiving the classification result, the integrated decision module comprehensively analyzes it, determines the current scene, and outputs the determination result to a voice processing selection module (S40); the voice processing selection module selects the optimal voice processing program and its parameter configuration according to the integrated decision module's determination of the current scene (S50).

Description

Ambient sound sensing method and system of a cochlear implant

Technical Field
The invention belongs to the field of signal processing and relates to a method and system for sensing the ambient sound of a cochlear implant.

Background Art
Cochlear implants are currently the only medical device on the market that can effectively restore hearing in patients with severe or profound deafness. The general working principle of a cochlear implant is that the sound signal collected by the microphone is converted by the signal processing unit into a stimulus code and sent to the implant; the implant then stimulates the auditory nerve through microelectrodes according to the stimulus code, restoring the recipient's hearing. Like other assistive hearing devices such as hearing aids, this type of system lacks an important function of the normal human auditory system: the ability to distinguish the target signal in a complex sound scene and extract it, for example, following what a conversation partner is saying in a crowd or in a relatively noisy environment. The usual solution is to reduce the effect of noise on listening through a denoising algorithm. However, the appropriate denoising algorithm and its parameter configuration differ across environments (such as pure speech, noisy speech, or pure noise).
To solve this kind of problem, an ambient sound perception algorithm is introduced: based on its judgment of the environment, the system can enable the noise reduction algorithm and configure its parameters in a targeted manner. In early cochlear implant or hearing aid systems, the classifier of the ambient sound perception algorithm was a hidden Markov model. The model is relatively simple, its theory matured early, it does not demand much training data, and it maintains a reasonable recognition rate. Its computational cost is also low, so it suits a device with limited computing power such as a cochlear implant. With continuous innovation in pattern recognition and machine learning in recent years, and continuous improvement in computing power and algorithms, other classification algorithms (support vector machines, neural networks, etc.) perform better in ambient sound perception, with higher classification accuracy. Moreover, compared with hidden Markov models, classifiers such as support vector machines and neural networks focus on distinguishing categories and do not require prior probabilities of class transitions; that is, only data from the different ambient sounds need to be analyzed, and the probability of one ambient sound changing into another need not be considered. Such transition probabilities are very difficult to obtain, and estimating them from data is not accurate enough. However, neural networks admit many variations: depending on the number of input feature values, the number of hidden layers, and the number of nodes per layer, the network structure can take many forms. Moreover, the classification accuracy of a neural network is usually proportional to its scale, so the required computation is also relatively large.
Summary of the Invention
To solve the above problems, and in view of the shortcomings of existing sound perception processing, the present invention proposes an ambient sound sensing method for a cochlear implant that uses a neural network to classify ambient sound. The network's input feature values and structure are optimized for the cochlear implant system, i.e., the amount of computation is minimized while a given classification accuracy is met.
To achieve the above object, the technical solution of the present invention is a method for sensing the ambient sound of a cochlear implant, comprising the following steps:
The sound collection module uses a microphone to collect the ambient sound in real time, then outputs the collected segment of discrete sound signal to the sound feature extraction module;
The sound feature extraction module processes the sound signal sent by the sound collection module, extracts a group of feature values representing the sound signal's characteristics, and outputs them to the neural network classification module;
After receiving the group of feature values extracted by the sound feature extraction module, the neural network classification module classifies them through a trained neural network, and then outputs the classification result to the comprehensive decision module;
After receiving the classification result of the neural network classification module, the comprehensive decision module comprehensively analyzes it, gives a judgment of the current scene, and outputs the judgment result to the speech processing selection module;
The speech processing selection module selects the optimal speech processing program and its parameter configuration according to the comprehensive decision module's judgment of the current scene.
Preferably, the microphone used to collect ambient sound in real time is an omnidirectional microphone or a microphone array.
Preferably, the sampling rate of the sound collection module is 16 kHz.
Preferably, eight feature values are extracted as the group of feature values representing the sound signal's characteristics.
Preferably, the neural network classification module uses a deep neural network or a time-delay neural network with two hidden layers and 15 neurons per layer.
Preferably, the eight feature values are screened from 60 candidate feature values.
Preferably, the feature value screening comprehensively combines statistical analysis of the feature values with a Gaussian mixture model, the mean impact value algorithm, the sequential forward selection algorithm, and evaluation of classifier training results.
Preferably, the computation for the feature values plus that of the neural network does not exceed 20% of the computing capacity of the cochlear implant's speech processor.
Based on the above objectives, the present invention also provides an ambient sound perception system for a cochlear implant, comprising a sound collection module, a sound feature extraction module, a neural network classification module, a comprehensive decision module, and a speech processing selection module, connected in this order, wherein:
The sound collection module is configured to use a microphone to collect ambient sound in real time and then output the collected segment of discrete sound signal to the sound feature extraction module;
The sound feature extraction module is configured to process the sound signal sent by the sound collection module, extract a group of feature values representing the sound signal's characteristics, and output them to the neural network classification module;
The neural network classification module is configured to, after receiving the group of feature values extracted by the sound feature extraction module, classify them through a trained neural network and then output the classification result to the comprehensive decision module;
The comprehensive decision module is configured to, after receiving the classification result of the neural network classification module, comprehensively analyze it, give a judgment of the current scene, and output the judgment result to the speech processing selection module;
The speech processing selection module is configured to select the optimal speech processing program and its parameter configuration according to the comprehensive decision module's judgment of the current scene.
Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of the cochlear implant ambient sound sensing method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of the cochlear implant ambient sound sensing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the neural network classification module of the cochlear implant ambient sound sensing method and system according to an embodiment of the present invention;
FIG. 4 compares the computation and accuracy of networks with different numbers of hidden layers and different numbers of neurons, for the cochlear implant ambient sound sensing method according to an embodiment of the present invention.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention, not to limit it.
On the contrary, the invention covers any alternatives, modifications, equivalent methods, and schemes within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the present invention, some specific details are described at length in the detailed description below. Those skilled in the art can fully understand the present invention even without these details.
Referring to FIG. 1, a flowchart of the steps of an ambient sound sensing method for a cochlear implant according to an embodiment of the present invention, the method includes the following steps:
S10: The sound collection module collects ambient sound in real time with a microphone and then outputs a collected segment of the discrete sound signal to the sound feature extraction module;
S20: The sound feature extraction module processes the sound signal sent by the sound collection module, extracts a group of feature values representing the characteristics of the sound signal, and outputs them to the neural network classification module;
S30: After receiving a group of feature values extracted by the sound feature extraction module, the neural network classification module classifies the group of feature values using a trained neural network and then outputs the classification result to the comprehensive decision module;
S40: After receiving the classification result from the neural network classification module, the comprehensive decision module analyzes it to determine the current scene and outputs the determination to the speech processing selection module;
S50: The speech processing selection module selects the optimal speech processing program and its parameter configuration according to the comprehensive decision module's determination of the current scene.
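The S10-S50 flow above can be sketched as a small pipeline. This is a hypothetical illustration only: the function names, the two stand-in features, the classification rule, the energy threshold, and the program table are all assumptions for demonstration, not the patent's actual implementation (which uses a trained neural network and a richer decision stage).

```python
# Hypothetical end-to-end sketch of steps S10-S50. All names, the two
# stand-in features, the 1e-4 threshold, and the program table below are
# illustrative assumptions, not the patent's actual implementation.

def extract_features(frame):
    # S20: a stand-in feature extractor (short-time energy + zero-crossing rate)
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [energy, zcr / len(frame)]

def classify(features):
    # S30: placeholder rule standing in for the trained neural network
    return "quiet" if features[0] < 1e-4 else "speech"

def decide(recent_labels):
    # S40: majority vote over the recent per-frame labels
    return max(set(recent_labels), key=recent_labels.count)

# S50: hypothetical mapping from detected scene to a processing program
PROGRAMS = {"quiet": "low-power", "speech": "speech-enhancement"}

def process(frames):
    # S10 is assumed to supply `frames` (lists of samples) from the microphone
    labels = [classify(extract_features(f)) for f in frames]
    return PROGRAMS.get(decide(labels), "default")
```

In the real system the per-frame classifier would be the trained network of S30 and the decision stage would also weigh sound energy over a short time window, as described below.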
An embodiment of the system of the present invention, shown in FIG. 2, includes a sound collection module 10, a sound feature extraction module 20, a neural network classification module 30, a comprehensive decision module 40, and a speech processing selection module 50, connected in that order, where:
The sound collection module 10 is configured to collect ambient sound in real time with a microphone and then output a collected segment of the discrete sound signal to the sound feature extraction module 20;
The sound feature extraction module 20 is configured to process the sound signal sent by the sound collection module, extract a group of feature values representing the characteristics of the sound signal, and output them to the neural network classification module 30;
The neural network classification module 30 is configured to, after receiving a group of feature values extracted by the sound feature extraction module, classify the group of feature values using a trained neural network and then output the classification result to the comprehensive decision module 40;
The comprehensive decision module 40 is configured to, after receiving the classification results from the neural network classification module, analyze them to determine the current scene and output the determination to the speech processing selection module 50;
The speech processing selection module 50 is configured to select the optimal speech processing program and its parameter configuration according to the comprehensive decision module's determination of the current scene.
In a specific embodiment, the microphone that collects ambient sound in real time in S10 is an omnidirectional microphone or a microphone array, and the sampling rate of the sound collection module 10 is 16 kHz.
In S20, eight feature values representing the characteristics of the sound signal are extracted; the eight feature values are screened from 60 candidate feature values. The feature values are normalized before use, as follows:

x_norm = (x - X_min) / (X_max - X_min)

where x_norm is the normalized result, x is the feature value, X_max is the maximum of that feature value over the training samples, and X_min is the minimum of that feature value over the training samples.
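The min-max normalization above is a direct transcription of the formula, with X_min and X_max taken per feature from the training samples; the guard for a constant feature is an added assumption, since the formula is undefined when X_max equals X_min.

```python
def min_max_normalize(x, x_min, x_max):
    """Min-max normalization: x_norm = (x - X_min) / (X_max - X_min).

    x_min and x_max are the minimum and maximum of this feature over
    the training samples.
    """
    # Guard against a constant feature (x_max == x_min); this case is
    # not covered by the formula in the text and is an assumption here.
    if x_max == x_min:
        return 0.0
    return (x - x_min) / (x_max - x_min)
```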
The neural network classification module in S30 uses a deep neural network or a time-delay neural network with two hidden layers of 15 neurons each. The network is trained on a large number of data samples. Taking the discrimination of 4 classes of ambient sound (pure speech, noisy speech, noise, music, and quiet) as an example, the neural network model is shown in FIG. 3. Feature values 1, 2, 3, 4, 5, and 6 are selected, six in total forming one group. The training samples, extracted from a large collection of audio files, comprise 144,000 groups of sample feature values in total, with 36,000 groups per class of ambient sound. To find the balance between computational load and accuracy (see FIG. 4), networks with one hidden layer and with two hidden layers were tried, with varying numbers of neurons per layer. As the figure shows, the accuracy of the two-hidden-layer network is markedly higher than that of the single-hidden-layer network, and the optimal number of neurons is 15.
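The computational-load side of the FIG. 4 trade-off can be approximated by counting multiply-accumulate operations per inference for a fully connected network. The layer sizes below (6 input features, 15 neurons per hidden layer, 4 output classes) are taken from the example in the text; the helper itself is an illustrative sketch, not part of the patent.

```python
def mac_count(layer_sizes):
    """Multiply-accumulate operations for one forward pass of a fully
    connected network with the given layer sizes (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# 6 input features, 15 neurons per hidden layer, 4 output classes
one_hidden = mac_count([6, 15, 4])      # 6*15 + 15*4 = 150
two_hidden = mac_count([6, 15, 15, 4])  # adds the 15*15 hidden-to-hidden layer
```

The second hidden layer roughly doubles the per-inference cost, which is the kind of trade-off FIG. 4 weighs against the accuracy gain.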
The neural network decision formula in S40 is as follows:

Y_out = activeFcn(W3 · activeFcn(W2 · activeFcn(W1 · X_input + B1) + B2) + B3)

where X_input is the input feature-value matrix; W1, W2, and W3 are the weight matrices of each layer of the trained neural network; B1, B2, and B3 are the bias matrices of each layer of the trained neural network; activeFcn is the activation function; and Y_out is the network output.
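The formula above is the standard forward pass of a network with two hidden layers plus an output layer. A minimal sketch, assuming NumPy arrays for the weight and bias matrices; the activation functions are passed in as parameters because the patent's specific activeFcn_H and activeFcn_O definitions are not reproduced here:

```python
import numpy as np

def forward(x, weights, biases, act_hidden, act_out):
    """Forward pass matching the formula above: two hidden layers,
    then an output layer. weights = [W1, W2, W3], biases = [B1, B2, B3]."""
    (w1, b1), (w2, b2), (w3, b3) = zip(weights, biases)
    h1 = act_hidden(w1 @ x + b1)   # first hidden layer
    h2 = act_hidden(w2 @ h1 + b2)  # second hidden layer
    return act_out(w3 @ h2 + b3)   # output layer: one score per class
```

With the trained W1, W2, W3 and B1, B2, B3 loaded, the predicted class is the index of the largest entry of the returned vector.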
To reduce the computational load, the hidden-layer activation function activeFcn_H and the output-layer activation function activeFcn_O are defined separately as:

Figure PCTCN2019096648-appb-000003

where x is the input to the activation function and i is the index of the ambient sound class.
After receiving the classification results from the neural network classification module, the comprehensive decision module analyzes a series of factors, chiefly the network's recognition results over a short period of time and the sound energy level, to determine the current scene, and outputs the determination to the speech processing selection module.
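One simple way to combine the two factors named above (recent recognition results and sound energy) is an energy-gated majority vote. This is an illustrative sketch under stated assumptions: the window length, the energy floor, and the exact rule are not specified by the patent.

```python
from collections import Counter, deque

class SceneDecider:
    """Illustrative decision smoothing: the scene is the majority label
    over the last `window` classifier outputs, and frames whose energy
    falls below `energy_floor` are relabeled as 'quiet'. The thresholds
    and the exact rule are assumptions; the text only names the factors."""

    def __init__(self, window=10, energy_floor=1e-4):
        self.labels = deque(maxlen=window)  # sliding window of recent labels
        self.energy_floor = energy_floor

    def update(self, label, energy):
        # Gate on energy first, then take the majority over the window.
        self.labels.append("quiet" if energy < self.energy_floor else label)
        return Counter(self.labels).most_common(1)[0][0]
```

Smoothing over a short window keeps the selected program from flapping when the per-frame classifier output is momentarily wrong.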
The speech processing selection module selects the optimal speech processing program and its parameter configuration according to the comprehensive decision module's determination of the current scene.
The feature-value screening combines statistical analysis of the feature values with a Gaussian mixture model, the mean impact value algorithm, the sequential forward selection algorithm, and evaluation of classifier training results.
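Sequential forward selection, one of the screening methods named above, can be sketched as a greedy loop: repeatedly add the candidate feature that most improves a scoring function until the target count (here, 8 of 60) is reached. The generic `score_fn` stands in for whatever classifier-based evaluation the patent uses; this sketch is an assumption, not the patent's procedure.

```python
def sequential_forward_select(features, score_fn, k):
    """Greedy sequential forward selection: grow the selected set one
    feature at a time, always adding the feature that maximizes
    score_fn(selected + [candidate]), until k features are chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice `score_fn` would train or evaluate a small classifier on the candidate subset, which is what makes forward selection expensive but effective for trimming 60 candidates down to 8.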
The computation required for the feature values and for the neural network does not exceed 20% of the computing capacity of the cochlear implant speech processor.
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

  1. An ambient sound sensing method for a cochlear implant, characterized in that it comprises the following steps:
    the sound collection module collects ambient sound in real time with a microphone and then outputs a collected segment of the discrete sound signal to the sound feature extraction module;
    the sound feature extraction module processes the sound signal sent by the sound collection module, extracts a group of feature values representing the characteristics of the sound signal, and outputs them to the neural network classification module;
    after receiving a group of feature values extracted by the sound feature extraction module, the neural network classification module classifies the group of feature values using a trained neural network and then outputs the classification result to the comprehensive decision module;
    after receiving the classification result from the neural network classification module, the comprehensive decision module analyzes it to determine the current scene and outputs the determination to the speech processing selection module;
    the speech processing selection module selects the optimal speech processing program and its parameter configuration according to the comprehensive decision module's determination of the current scene.
  2. The method according to claim 1, characterized in that the microphone collecting ambient sound in real time is an omnidirectional microphone or a microphone array.
  3. The method according to claim 1, characterized in that the sampling rate of the sound collection module is 16 kHz.
  4. The method according to claim 1, characterized in that the group of feature values representing the characteristics of the sound signal consists of eight feature values.
  5. The method according to claim 1, characterized in that the neural network classification module uses a deep neural network or a time-delay neural network comprising two hidden layers with 15 neurons per layer.
  6. The method according to claim 4, characterized in that the eight feature values are screened from 60 candidate feature values.
  7. The method according to claim 6, characterized in that the feature-value screening combines statistical analysis of the feature values with a Gaussian mixture model, the mean impact value algorithm, the sequential forward selection algorithm, and evaluation of classifier training results.
  8. The method according to claim 1, characterized in that the computation required for the feature values and for the neural network does not exceed 20% of the computing capacity of the cochlear implant speech processor.
  9. A system employing the method according to any one of claims 1 to 8, characterized by comprising a sound collection module, a sound feature extraction module, a neural network classification module, a comprehensive decision module, and a speech processing selection module connected in sequence, wherein:
    the sound collection module is configured to collect ambient sound in real time with a microphone and then output a collected segment of the discrete sound signal to the sound feature extraction module;
    the sound feature extraction module is configured to process the sound signal sent by the sound collection module, extract a group of feature values representing the characteristics of the sound signal, and output them to the neural network classification module;
    the neural network classification module is configured to, after receiving a group of feature values extracted by the sound feature extraction module, classify the group of feature values using a trained neural network and then output the classification result to the comprehensive decision module;
    the comprehensive decision module is configured to, after receiving the classification results from the neural network classification module, analyze them to determine the current scene and output the determination to the speech processing selection module;
    the speech processing selection module is configured to select the optimal speech processing program and its parameter configuration according to the comprehensive decision module's determination of the current scene.
PCT/CN2019/096648 2018-07-31 2019-07-19 Artificial cochlea ambient sound sensing method and system WO2020024807A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
ES202190003A ES2849124B2 (en) 2018-07-31 2019-07-19 ENVIRONMENTAL SOUND DETECTION METHOD AND SYSTEM FOR A COCHLEAR IMPLANT

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810856692.8A CN108711419B (en) 2018-07-31 2018-07-31 Environmental sound sensing method and system for cochlear implant
CN201810856692.8 2018-07-31

Publications (1)

Publication Number Publication Date
WO2020024807A1 true WO2020024807A1 (en) 2020-02-06

Family

ID=63874461

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096648 WO2020024807A1 (en) 2018-07-31 2019-07-19 Artificial cochlea ambient sound sensing method and system

Country Status (3)

Country Link
CN (1) CN108711419B (en)
ES (1) ES2849124B2 (en)
WO (1) WO2020024807A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108711419B (en) * 2018-07-31 2020-07-31 浙江诺尔康神经电子科技股份有限公司 Environmental sound sensing method and system for cochlear implant
CN109448703B (en) * 2018-11-14 2021-05-11 山东师范大学 Audio scene recognition method and system combining deep neural network and topic model
CN111491245B (en) * 2020-03-13 2022-03-04 天津大学 Digital hearing aid sound field identification algorithm based on cyclic neural network and implementation method
CN112151056B (en) * 2020-09-27 2023-08-04 浙江诺尔康神经电子科技股份有限公司 Intelligent cochlea sound processing system and method with customization function

Citations (7)

Publication number Priority date Publication date Assignee Title
CN1879449A (en) * 2003-11-24 2006-12-13 唯听助听器公司 Hearing aid and a method of noise reduction
CN101529929A (en) * 2006-09-05 2009-09-09 Gn瑞声达A/S A hearing aid with histogram based sound environment classification
CN105611477A (en) * 2015-12-27 2016-05-25 北京工业大学 Depth and breadth neural network combined speech enhancement algorithm of digital hearing aid
CN107103901A (en) * 2017-04-03 2017-08-29 浙江诺尔康神经电子科技股份有限公司 Artificial cochlea's sound scenery identifying system and method
CN108172238A (en) * 2018-01-06 2018-06-15 广州音书科技有限公司 A kind of voice enhancement algorithm based on multiple convolutional neural networks in speech recognition system
CN108231067A (en) * 2018-01-13 2018-06-29 福州大学 Sound scenery recognition methods based on convolutional neural networks and random forest classification
CN108711419A * 2018-07-31 2018-10-26 浙江诺尔康神经电子科技股份有限公司 Ambient sound sensing method and system for a cochlear implant

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN103456301B (en) * 2012-05-28 2019-02-12 中兴通讯股份有限公司 A kind of scene recognition method and device and mobile terminal based on ambient sound
CN105845127B (en) * 2015-01-13 2019-10-01 阿里巴巴集团控股有限公司 Audio recognition method and its system


Non-Patent Citations (2)

Title
WANG, NAIFENG: "Research on Audio Feature Extraction and Context Recognition Based on Deep Neural Networks", CHINA MASTER'S THESES FULL-TEXT DATABASE, no. 2, 15 February 2016 (2016-02-15), ISSN: 1674-0246 *
ZHANG, CHI: "Application of Deep Neural Network in Mobile Environment Sensing System", CHINA MASTER'S THESES FULL-TEXT DATABASE, no. 2, 15 February 2018 (2018-02-15), ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN108711419B (en) 2020-07-31
ES2849124B2 (en) 2022-11-16
CN108711419A (en) 2018-10-26
ES2849124A1 (en) 2021-08-13

Similar Documents

Publication Publication Date Title
WO2020024807A1 (en) Artificial cochlea ambient sound sensing method and system
CN106658590B (en) Design and implementation of multi-person indoor environment state monitoring system based on WiFi channel state information
CN102270451B (en) Method and system for identifying speaker
CN110197665B (en) Voice separation and tracking method for public security criminal investigation monitoring
CN108766459B (en) Target speaker estimation method and system in multi-user voice mixing
CN110353702A Emotion recognition method and system based on shallow convolutional neural networks
CN107103901B (en) Artificial cochlea sound scene recognition system and method
CN112151056B (en) Intelligent cochlea sound processing system and method with customization function
CN105930663B (en) Hand tremor signal and audio signal classification method based on evolution fuzzy rule
US11800301B2 (en) Neural network model for cochlear mechanics and processing
CN110428843A Deep learning method for voice gender recognition
CN111951824A (en) Detection method for distinguishing depression based on sound
WO2020087716A1 (en) Auditory scene recognition method for artificial cochlea
Hüwel et al. Hearing aid research data set for acoustic environment recognition
CN116092512A (en) Small sample voice separation method based on data generation
CN112466284B (en) Mask voice identification method
CN113723206A (en) Brain wave identification method based on quantum neural network algorithm
CN107221338A (en) Sound wave extraction element and extracting method
CN115862639A (en) Artificial intelligence voice analysis method based on K-means clustering analysis
CN113887339A (en) Silent voice recognition system and method fusing surface electromyogram signal and lip image
CN111144482B (en) Scene matching method and device for digital hearing aid and computer equipment
CN114694245A (en) Real-time behavior recognition and sign state monitoring method and system based on capsules and GRUs
Shahidi et al. Application of a graphical model to investigate the utility of cross-channel information for mitigating reverberation in cochlear implants
Shen et al. Home activity monitoring based on gated convolutional neural networks and system fusion
CN109389034B (en) Electret film penetration recognition method based on machine learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19844209

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19844209

Country of ref document: EP

Kind code of ref document: A1