CN110681051A - Artificial cochlea signal processing method and device and computer readable storage medium - Google Patents


Info

Publication number
CN110681051A
CN110681051A (application CN201910999264.5A)
Authority
CN
China
Prior art keywords
deep neural
training
neural network
envelope extraction
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910999264.5A
Other languages
Chinese (zh)
Other versions
CN110681051B (en)
Inventor
郑能恒
史裕鹏
康迂勇
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201910999264.5A priority Critical patent/CN110681051B/en
Publication of CN110681051A publication Critical patent/CN110681051A/en
Application granted granted Critical
Publication of CN110681051B publication Critical patent/CN110681051B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A61N 1/36038 — Cochlear stimulation (under A61N 1/36036, stimulation of the outer, middle or inner ear; A61N 1/36, applying electric currents by contact electrodes for stimulation)
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks (G06N 3/04, neural network architecture)
    • G06N 3/045 — Combinations of networks
    • G06N 3/084 — Backpropagation, e.g. using gradient descent (G06N 3/08, learning methods)
    • G10L 21/0208 — Noise filtering (G10L 21/02, speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 25/03 — Speech or voice analysis characterised by the type of extracted parameters
    • G10L 25/30 — Speech or voice analysis using neural networks
    • G10L 25/48 — Speech or voice analysis specially adapted for particular use
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Abstract

According to the cochlear implant signal processing method, apparatus and computer-readable storage medium disclosed by the embodiments of the invention, a training speech signal is first acquired, preprocessed and input to an envelope extraction network for training, where the envelope extraction network comprises three sequentially connected deep neural networks. Acquired real-time speech signals are then preprocessed and input to the trained envelope extraction network, which extracts a number of channel envelopes corresponding to the number of electrodes implanted in the body. Finally, the extracted channel envelopes undergo nonlinear compression, channel selection, electrode mapping and pulse modulation in sequence, and a target number of electrode stimulation signals are output to the corresponding implanted electrodes. The lightweight, low-complexity envelope extraction network proposed by the invention effectively reduces power consumption, improves processing efficiency and noise reduction performance, and ensures seamless integration of CI signal processing and noise reduction.

Description

Artificial cochlea signal processing method and device and computer readable storage medium
Technical Field
The present invention relates to the field of signal processing technologies, and in particular to a cochlear implant signal processing method and apparatus, and a computer-readable storage medium.
Background
A Cochlear Implant (CI) is an auditory prosthesis mainly used to restore speech perception to patients with severe peripheral auditory damage (such as inner-ear hair cell necrosis). The most advanced CI devices currently enable recipients to achieve speech perception in quiet acoustic environments comparable to that of normal-hearing listeners. However, real-life background noise (such as ambient noise or multi-speaker conversation) can severely degrade the speech perception of CI recipients.
In recent years, academia and industry have proposed many signal processing systems that combine noise reduction algorithms with traditional CI signal processing strategies to improve CI speech perception. However, current noise reduction algorithms have large model sizes and high computational complexity, leading to low processing efficiency and high power consumption in practical applications. They also cannot reliably extract the temporal fine structure of sound, which limits the noise reduction effect. In addition, when speech processed by such an algorithm is passed to the CI signal processing unit, optimal speech perception of the final output cannot be guaranteed, so the noise reduction algorithm and the CI processing strategy are poorly matched.
Disclosure of Invention
The embodiments of the present invention mainly aim to provide a cochlear implant signal processing method, apparatus and computer-readable storage medium, which can at least solve the problems of the noise reduction algorithms used in the related art: low processing efficiency, high power consumption, limited noise reduction effect and poor adaptation to CI processing strategies.
In order to achieve the above object, a first aspect of embodiments of the present invention provides a cochlear implant signal processing method based on deep learning, which is applied to a cochlear implant device, and includes:
acquiring a training speech signal, preprocessing it, inputting it to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first, a second and a third deep neural network connected in sequence, wherein the first deep neural network extracts high-dimensional features from the input features, the second deep neural network estimates the features of the enhanced training speech signal, and the third deep neural network extracts, from the features estimated by the second deep neural network, a number of channel envelopes corresponding to the number of electrodes implanted in the body;
preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body;
and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of the target number to the implanted electrodes of the corresponding number.
In order to achieve the above object, a second aspect of the embodiments of the present invention provides a cochlear implant signal processing apparatus based on deep learning, applied to a cochlear implant apparatus, the apparatus including:
the training module is used for acquiring a training speech signal, preprocessing it and inputting it to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first, a second and a third deep neural network connected in sequence, wherein the first deep neural network extracts high-dimensional features from the input features, the second deep neural network estimates the features of the enhanced training speech signal, and the third deep neural network extracts, from the features estimated by the second deep neural network, a number of channel envelopes corresponding to the number of electrodes implanted in the body;
the extraction module is used for preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals into a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of the electrodes implanted in the body;
and the processing module is used for sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number.
To achieve the above object, a third aspect of embodiments of the present invention provides a cochlear implant device including: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is used for executing one or more programs stored in the memory to realize the steps of any cochlear implant signal processing method.
In order to achieve the above object, a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of any of the above cochlear implant signal processing methods.
According to the cochlear implant signal processing method, the cochlear implant signal processing device and the computer-readable storage medium provided by the embodiment of the invention, a training voice signal is obtained, the training voice signal is input to an envelope extraction network after being preprocessed, and the envelope extraction network is trained, wherein the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network which are sequentially connected; preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body; and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number. The lightweight envelope extraction network with lower computation complexity provided by the invention effectively reduces power consumption, improves processing efficiency and noise reduction processing effect, and ensures seamless fusion of CI signal processing and noise reduction.
Other features and corresponding effects of the present invention are set forth in the following portions of the specification, and it should be understood that at least some of the effects are apparent from the description of the present invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic basic flow chart of a cochlear implant signal processing method according to a first embodiment of the present invention;
fig. 2 is a schematic flow chart of a network training method according to a first embodiment of the present invention;
fig. 3 is a schematic diagram illustrating training of an envelope extraction network according to a first embodiment of the present invention;
fig. 4 is a schematic structural diagram of a cochlear implant signal processing apparatus according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a cochlear implant device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment:
in order to solve the technical problems of the noise reduction algorithms used in the related art, namely low processing efficiency, high power consumption, limited noise reduction effect and poor adaptation to CI processing strategies, this embodiment provides a cochlear implant signal processing method applied to a cochlear implant device. Fig. 1 is a basic flow diagram of the method, which comprises the following steps:
step 101, acquiring a training voice signal, preprocessing the training voice signal, inputting the preprocessed training voice signal to an envelope extraction network, and training the envelope extraction network.
Specifically, the envelope extraction network in this embodiment includes a first deep neural network (DNN1), a second deep neural network (DNN2) and a third deep neural network (DNN3) connected in sequence. The first deep neural network, preferably a Long Short-Term Memory network (LSTM), extracts high-dimensional features from the input features; the second deep neural network estimates the features of the enhanced training speech signal; and the third deep neural network extracts from those features a number of channel envelopes corresponding to the number of electrodes implanted in the body. It should be understood that in practical applications the features may be frequency-domain features (such as the logarithmic magnitude spectrum) or time-domain features.
CI devices currently on the market comprise an implanted (in-body) unit and an external processor; the CI signal processing system of this embodiment is preferably placed in the external processor, and the number of implanted electrodes of the CI product is preferably 22. The envelope extraction network extracts sub-band envelopes for as many channels as the actual CI product has implanted electrodes, so the envelopes retain richer detail of the original sound.
It should be further noted that, in practical applications, the training speech signal may be an existing training speech sample, for example, obtained directly from a preset sample database, or obtained by self-recording, and this embodiment is not limited herein.
In an optional implementation manner of this embodiment, the acquiring the training speech signal includes: randomly selecting a target number of clean voice samples from a preset voice database, and selecting a preset type of noise sample from a preset noise set; and generating a training voice signal under the preset signal-to-noise ratio based on the clean voice sample and the noise sample.
Specifically, this embodiment constructs training speech samples by selecting suitable speech and noise databases. The clean speech sample set for the envelope extraction network can be formed by randomly selecting 2500 speech utterances from the training set of the Tsinghua Chinese speech database, and two noise types, white noise and babble, can be selected from the NOISEX-92 noise set as the noise sample set. The 2500 utterances and the noise are then randomly combined under four conditions, namely signal-to-noise ratios of -5 dB, 0 dB and 5 dB plus a noise-free condition, to generate the noisy training speech signals used to train the envelope extraction network.
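The SNR mixing described above can be sketched as follows; the signals here are synthetic stand-ins for the database utterances and NOISEX-92 noise, and the function name is illustrative:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix a clean speech signal with noise at a target SNR (in dB).

    The noise is tiled/truncated to the speech length and scaled so that
    10*log10(P_speech / P_noise) equals snr_db.
    """
    # Repeat the noise if it is shorter than the speech sample.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale factor so the resulting SNR matches the target.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Example: one of the SNR conditions (-5, 0, 5 dB) from the setup above.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in speech
noise = rng.standard_normal(8000)                           # stand-in noise
noisy = mix_at_snr(clean, noise, 0)
```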
As shown in fig. 2, which is a schematic flow chart of a network training method provided in this embodiment, in an optional implementation manner of this embodiment, a training speech signal is input to an envelope extraction network after being preprocessed, and training the envelope extraction network specifically includes the following steps:
step 1011, preprocessing the training voice signal to obtain the characteristics of continuous preset frame number;
step 1012, inputting the features of the continuous preset frame number into a first deep neural network comprising 128 neurons for high-dimensional feature extraction;
step 1013, the output of the first deep neural network passes through a second deep neural network composed of two fully-connected layers each including 512 neurons and a linear layer including 65 neurons, and the features of the enhanced training speech signal are estimated;
step 1014, extracting, from the output of the second deep neural network, a number of channel envelopes corresponding to the number of implanted electrodes, through a third deep neural network consisting of a fully-connected layer of 256 neurons and a linear layer of 22 neurons;
and step 1015, performing parameter optimization on the envelope extraction network by adopting a back propagation algorithm, and performing iterative training until the envelope extraction network converges to obtain the trained envelope extraction network.
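The forward pass of steps 1012–1014 can be sketched with the stated layer sizes. This is a minimal NumPy sketch with randomly initialised placeholder weights (a real model learns them by back-propagation as in step 1015), and the LSTM of DNN1 is simplified to a plain recurrent layer for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, d_out):
    """Fully-connected layer with random placeholder weights."""
    w = rng.standard_normal((x.shape[-1], d_out)) * 0.01
    return x @ w

def relu(x):
    return np.maximum(x, 0.0)

def envelope_extraction_net(lps):          # lps: (25, 65) noisy LPS block
    # DNN1: 128-unit recurrent feature extractor (an LSTM in the patent,
    # reduced here to a simple tanh recurrence to keep the sketch short).
    w_x = rng.standard_normal((65, 128)) * 0.01
    w_h = rng.standard_normal((128, 128)) * 0.01
    h = np.zeros(128)
    hidden = []
    for frame in lps:
        h = np.tanh(frame @ w_x + h @ w_h)
        hidden.append(h)
    h1 = np.stack(hidden)                  # (25, 128)

    # DNN2: two 512-neuron fully-connected layers + a 65-neuron linear
    # layer, estimating the enhanced LPS features.
    h2 = relu(dense(relu(dense(h1, 512)), 512))
    lps_est = dense(h2, 65)                # (25, 65)

    # DNN3: a 256-neuron fully-connected layer + a 22-neuron linear layer,
    # one output per implanted electrode channel.
    envelopes = dense(relu(dense(lps_est, 256)), 22)
    return lps_est, envelopes              # (25, 65), (25, 22)

lps_est, env = envelope_extraction_net(rng.standard_normal((25, 65)))
```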
Specifically, in this embodiment the noisy speech and the corresponding clean speech sample may each be preprocessed into short-time Fourier log-power spectrum features (LPS; 8 ms per frame with a 1 ms frame shift). To exploit the correlation between adjacent speech frames, the input to the envelope extraction network may be a block of 25 consecutive frames. The LPS features of 25 consecutive frames of noisy speech (dimension 25 x 65) are input to a single unidirectional DNN1 layer (e.g. an LSTM); the output of DNN1 passes through DNN2, which outputs the estimated LPS (dimension 25 x 65); the output of DNN2 is then fed to DNN3, which outputs the estimated 22 channel envelopes (dimension 25 x 22). Finally, the network parameters are optimized by back-propagation to obtain the final network model.
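The LPS preprocessing above can be sketched as follows; the 16 kHz sampling rate and the floor constant `eps` are assumptions, chosen so that an 8 ms frame is 128 samples and yields 128/2 + 1 = 65 frequency bins, matching the 65-dimensional features in the description:

```python
import numpy as np

def lps_features(signal, fs=16000, frame_ms=8, shift_ms=1, eps=1e-10):
    """Short-time Fourier log-power spectrum (LPS) features.

    With 8 ms frames at 16 kHz the frame length is 128 samples, giving
    65-dimensional features per frame; the 1 ms shift is 16 samples.
    """
    frame_len = int(fs * frame_ms / 1000)   # 128 samples per frame
    shift = int(fs * shift_ms / 1000)       # 16-sample (1 ms) frame shift
    window = np.hanning(frame_len)

    n_frames = 1 + (len(signal) - frame_len) // shift
    feats = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = signal[i * shift : i * shift + frame_len] * window
        spec = np.fft.rfft(frame)
        feats[i] = np.log(np.abs(spec) ** 2 + eps)  # log-power spectrum
    return feats

feats = lps_features(np.random.default_rng(1).standard_normal(16000))
# 25 consecutive rows of `feats` form one (25, 65) input block.
```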
It should be noted that during back-propagation the network parameters are adjusted according to a loss function, which measures how closely the predictions of the trained model approximate the true values; the smaller the loss, the stronger the envelope extraction and processing capability of the model. In this embodiment the network parameters are updated from the loss function and training iterates until the network converges, i.e. until the loss essentially stops decreasing, at which point the envelope extraction network model of this embodiment is fully trained.
Further, in an optional implementation manner of this embodiment, the loss function of the envelope extraction network is represented as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform
where loss_stft is the error between the features output by the second deep neural network and the features of the clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelopes extracted by the third deep neural network and the channel envelopes extracted from the clean speech sample by a conventional CI processing strategy; loss_waveform is the error between the clean speech sample and the simulated speech signal obtained (via electrode mapping and related steps) from the channel envelopes extracted by the third deep neural network; and w_stft, w_env and w_waveform are the weights of the respective errors. It should be understood that each of these errors is preferably an L1-norm error.
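A minimal numeric sketch of this weighted loss with L1 errors, assuming unit weights for illustration (`env_ref` stands for the reference envelopes a conventional CI strategy extracts from the clean sample, and `wav_sim` for the simulated waveform reconstructed from the estimated envelopes):

```python
import numpy as np

def l1(a, b):
    """Mean absolute (L1-norm) error between two arrays."""
    return np.mean(np.abs(a - b))

def total_loss(lps_est, lps_clean, env_est, env_ref, wav_sim, wav_clean,
               w_stft=1.0, w_env=1.0, w_waveform=1.0):
    """Weighted sum of the three error terms from the description.

    The unit default weights are illustrative; the patent treats them as
    adjustable weighting factors.
    """
    loss_stft = l1(lps_est, lps_clean)       # DNN2 output vs clean LPS
    loss_env = l1(env_est, env_ref)          # DNN3 output vs reference envelopes
    loss_waveform = l1(wav_sim, wav_clean)   # simulated vs clean waveform
    return w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform
```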
Fig. 3 is a training diagram of the envelope extraction network of this embodiment, where A denotes the input training speech signal, B denotes the clean speech samples that are combined with noise to generate the training signal, and C denotes the envelope extraction network. Using the preferred network sizes above, the features of the noisy speech (dimension 25 x 65) are input to DNN1 for high-dimensional feature extraction, and the output of DNN1 passes through DNN2 to produce the estimated LPS features (dimension 25 x 65); loss_stft can then be computed between the 65-dimensional LPS features output by DNN2 and those of the corresponding clean speech sample. The computation of loss_stft can borrow the perceptual weighting commonly used in audio coding, so as to guide the model to be less sensitive to noise near formants and more sensitive to noise near non-formant spectral valleys. In addition, loss_env is computed between the 22-dimensional channel envelopes output by DNN3 and the 22-dimensional channel envelopes extracted by an existing conventional CI processing strategy such as ACE (Advanced Combination Encoders). Furthermore, a simulated speech signal is constructed from the 22-dimensional channel envelopes output by DNN3, and its error against the clean speech waveform gives loss_waveform, forcing the envelope extraction network of this embodiment to learn the detail information of clean speech and effectively remedying the inability of traditional CI strategies to extract temporal detail.
Finally, the three errors are weighted by three adjustable factors and summed to form the objective function for optimizing the envelope extraction network. In a preferred implementation, the whole envelope extraction network is trained for 60 epochs with the Adam optimizer, and the model with the minimum validation loss is saved as the final trained envelope extraction network model.
It should be noted that the loss function of this embodiment guides the model, to a certain extent, to learn how a conventional CI processing strategy extracts envelope energy from the Fourier power spectrum, and also forces the network, in two domains, to approach the data distribution of clean speech. The envelope signals output by the network thereby carry more detail information, largely overcoming the inability of conventional CI processing strategies to extract temporal detail from the speech signal.
And 102, preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals into a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of implanted electrodes in a human body.
Specifically, when the speech acquisition unit of the CI device (such as a microphone) receives an external speech signal, the signal is preprocessed and fed to the trained envelope extraction network; with the preferred 22 implanted electrodes, the network outputs 22 channel envelope signals. It should be noted that the envelope extraction network model is far more compact than existing algorithm models: its size is only about 1.9 MB, it has 0.46 M parameters, system complexity is significantly reduced, and the average decoding time per 8 ms frame is about 0.1-0.2 ms. The greatly reduced parameter count and computational complexity correspondingly lower power consumption (small memory and CPU demands), ensuring the feasibility of applying the envelope extraction network model in an actual CI product.
And 103, sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of the target number to the implanted electrodes of the corresponding number.
Specifically, the preprocessing, nonlinear compression, channel selection, electrode mapping, pulse modulation and generation of the simulated speech signal in this embodiment all follow the conventional CI processing strategy and are not described again here. It should be noted that during channel selection this embodiment may select the N envelope signals with the largest energy and/or highest signal-to-noise ratio for electrode mapping; for example, with 22 implanted electrodes in total, 8 channels may be selected, and the pulse-modulated electrical stimulation signals are output to the corresponding 8 implanted electrodes.
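The N-of-M selection step can be sketched as follows for a single frame of 22 channel envelope values; energy-only selection is assumed here (a deployed strategy may also weigh per-channel SNR, as noted above), and the envelope values are illustrative:

```python
import numpy as np

def select_channels(envelopes, n=8):
    """N-of-M channel selection: keep the n highest-energy envelopes out
    of the 22 channels and zero the rest before electrode mapping."""
    keep = np.argsort(envelopes)[-n:]          # indices of the n largest
    mask = np.zeros_like(envelopes, dtype=bool)
    mask[keep] = True
    return np.where(mask, envelopes, 0.0), np.sort(keep)

# One frame of 22 channel envelope values (illustrative).
env = np.array([0.10, 0.90, 0.30, 0.80, 0.05, 0.70, 0.20, 0.60,
                0.15, 0.50, 0.25, 0.40, 0.12, 0.35, 0.22, 0.45,
                0.18, 0.55, 0.28, 0.65, 0.32, 0.75])
selected, kept = select_channels(env, n=8)     # 8 of 22 channels survive
```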
By exploiting the strong learning capability of the envelope extraction network of this embodiment and constructing suitable noisy speech data for training, a good noise reduction effect can be achieved, attaining a noise robustness beyond that of traditional CI processing strategies without adding any front-end noise reduction module.
In addition, through the second deep neural network the envelope extraction network of this embodiment can learn adjustable parameters analogous to the triangular filter bank parameters of existing CI processing strategies, and it is optimized by back-propagating the error between the simulated and real speech, so the extracted envelopes carry more detail information. Its speech processing in quiet environments is therefore superior to traditional CI processing strategies, and its noise reduction performance in noisy environments is clearly superior to traditional strategies that use Wiener filtering or lightweight DNNs as a front-end noise reduction module.
According to the cochlear implant signal processing method provided by the embodiment of the invention, a training voice signal is obtained, the training voice signal is input to an envelope extraction network after being preprocessed, and the envelope extraction network is trained, wherein the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network which are sequentially connected; preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body; and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number. The lightweight envelope extraction network with lower computation complexity provided by the invention effectively reduces power consumption, improves processing efficiency and noise reduction processing effect, and ensures seamless fusion of CI signal processing and noise reduction.
Second embodiment:
To solve the technical problems of the related art, namely low processing efficiency, high power consumption, limited noise-reduction performance, and poor adaptation between the noise-reduction algorithm and the CI processing strategy, this embodiment provides a cochlear implant signal processing apparatus applied to a cochlear implant device. Specifically, referring to fig. 4, the cochlear implant signal processing apparatus of this embodiment includes:
the training module 401, configured to acquire a training speech signal, preprocess it, input it to an envelope extraction network, and train the envelope extraction network; the envelope extraction network comprises a first deep neural network, a second deep neural network, and a third deep neural network connected in sequence, wherein the first deep neural network is used for extracting high-dimensional features from the input features, the second deep neural network is used for estimating the features of the enhanced training speech signal, and the third deep neural network is used for extracting, from the features estimated by the second deep neural network, channel envelopes whose number corresponds to the number of electrodes implanted in the body;
an extraction module 402, configured to input the acquired real-time speech signal after preprocessing to a trained envelope extraction network, and extract channel envelopes whose number corresponds to the number of electrodes implanted in a body;
the processing module 403 is configured to sequentially perform nonlinear compression, channel selection, electrode mapping, and pulse modulation on the channel envelope extracted from the real-time speech signal, and then output a target number of electrode stimulation signals to a corresponding number of implanted electrodes in the body.
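The post-envelope stages applied by the processing module (nonlinear compression followed by channel selection) can be sketched as below. The logarithmic compression law and the n-of-m selection rule are assumptions modeled on common CI strategies; the patent names the stages but does not fix their parameters, so the constant `alpha` and `n_select=8` are illustrative only.

```python
import numpy as np

def compress_and_select(envelopes, n_select=8, alpha=340.83):
    """Log-compress each channel envelope, then keep the n_select largest
    channels per frame (n-of-m selection), zeroing the rest."""
    env = np.asarray(envelopes, dtype=float)
    # Logarithmic compression; maps env in [0, 1] into [0, 1].
    compressed = np.log1p(alpha * env) / np.log1p(alpha)
    out = np.zeros_like(compressed)
    for t in range(compressed.shape[0]):
        keep = np.argsort(compressed[t])[-n_select:]  # n largest channels
        out[t, keep] = compressed[t, keep]
    return out

stim = compress_and_select(np.random.rand(5, 22))  # 5 frames, 22 channels
```

The selected, compressed values would then be mapped to each electrode's dynamic range and used to modulate stimulation pulses.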
In an optional implementation of this embodiment, when the training module 401 preprocesses the training speech signal, inputs it to the envelope extraction network, and trains the network, it is specifically configured to: preprocess the training speech signal to obtain features of a preset number of consecutive frames; input those features into a first deep neural network containing 128 neurons for high-dimensional feature extraction; estimate the features of the enhanced training speech signal from the output of the first deep neural network through a second deep neural network consisting of two fully-connected layers of 512 neurons each and a linear layer of 65 neurons; extract, from the output of the second deep neural network, channel envelopes whose number corresponds to the number of implanted electrodes, through a third deep neural network consisting of a fully-connected layer of 256 neurons and a linear layer of 22 neurons; and optimize the parameters of the envelope extraction network with a back-propagation algorithm, iterating the training until the network converges to obtain the trained envelope extraction network.
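A shape-level NumPy sketch of the three networks as described (128 neurons; 512, 512, and a linear 65; 256 and a linear 22) follows. The patent specifies the layer widths but not the activations or the input context size, so the ReLU activations, the 5-frame context of 65-bin features, and the random-weight initialization here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
dense = lambda d_in, d_out: (rng.standard_normal((d_in, d_out)) * 0.01,
                             np.zeros(d_out))

n_frames, n_bins = 5, 65              # assumed context window and feature size
x = rng.random((1, n_frames * n_bins))

# DNN1: high-dimensional feature extraction (128 neurons).
W1, b1 = dense(n_frames * n_bins, 128)
h1 = relu(x @ W1 + b1)

# DNN2: enhanced-feature estimation (512 -> 512 -> linear 65).
W2a, b2a = dense(128, 512); W2b, b2b = dense(512, 512); W2c, b2c = dense(512, 65)
enhanced = relu(relu(h1 @ W2a + b2a) @ W2b + b2b) @ W2c + b2c

# DNN3: channel envelopes, one per implanted electrode (256 -> linear 22).
W3a, b3a = dense(65, 256); W3b, b3b = dense(256, 22)
channel_envelopes = relu(enhanced @ W3a + b3a) @ W3b + b3b
```

The 65-dimensional intermediate output matches the enhanced-feature target of the second network, and the 22-dimensional output matches one envelope per implanted electrode.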
In an optional implementation of this embodiment, when the training module 401 acquires the training speech signal, it is specifically configured to: randomly select a target number of clean speech samples from a preset speech database and select noise samples of a preset type from a preset noise set; and generate a training speech signal at a preset signal-to-noise ratio based on the clean speech samples and the noise samples.
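Generating a training signal at a preset signal-to-noise ratio from a clean sample and a noise sample reduces to scaling the noise so the power ratio hits the target. A minimal sketch; the random stand-in signals and the 5 dB target are illustrative, not values from the patent.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that mixing with `clean` yields the target SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR = 10*log10(p_clean / p_noise_scaled)  =>  solve for the noise scale.
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)   # stand-in for a clean speech sample
noise = rng.standard_normal(16000)   # stand-in for a noise sample
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

Repeating this over randomly chosen clean samples, noise types, and SNRs yields the training set described above.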
Further, in an optional implementation manner of this embodiment, the loss function of the envelope extraction network is represented as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform

where loss_stft is the error between the features output by the second deep neural network and the features of the clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelope features extracted by the third deep neural network and the channel envelope features extracted from the clean speech sample by the conventional CI processing strategy; loss_waveform is the error between the simulated speech signal reconstructed from the channel envelopes extracted by the third deep neural network and the clean speech sample; and w_stft, w_env, and w_waveform are the weights of the corresponding errors.
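The three-term loss can be sketched as a weighted sum of per-term errors. The patent defines the three terms and their weights but not the error metric, so the mean-squared-error choice and the default weights here are assumptions.

```python
import numpy as np

def envelope_net_loss(stft_pred, stft_clean, env_pred, env_clean,
                      wave_pred, wave_clean,
                      w_stft=1.0, w_env=1.0, w_wave=1.0):
    """Weighted sum of the three training errors (MSE assumed per term)."""
    mse = lambda a, b: np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return (w_stft * mse(stft_pred, stft_clean)    # DNN2 features vs. clean features
            + w_env * mse(env_pred, env_clean)     # DNN3 envelopes vs. conventional-CI envelopes
            + w_wave * mse(wave_pred, wave_clean)) # simulated speech vs. clean waveform
```

During training, the scalar loss would be back-propagated through all three networks jointly, which is what couples envelope extraction with noise reduction.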
It should be noted that the cochlear implant signal processing methods in the foregoing embodiments can all be implemented based on the cochlear implant signal processing apparatus provided in this embodiment. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the apparatus described in this embodiment may refer to the corresponding process in the foregoing method embodiments, and details are not repeated here.
With the cochlear implant signal processing apparatus provided by this embodiment, a training speech signal is acquired, preprocessed, and input to an envelope extraction network for training, where the envelope extraction network comprises a first deep neural network, a second deep neural network, and a third deep neural network connected in sequence; the acquired real-time speech signal is preprocessed and input to the trained envelope extraction network, which extracts channel envelopes whose number corresponds to the number of electrodes implanted in the body; and the channel envelopes extracted from the real-time speech signal undergo nonlinear compression, channel selection, electrode mapping, and pulse modulation in sequence, after which a target number of electrode stimulation signals are output to the corresponding number of implanted electrodes. The lightweight, low-complexity envelope extraction network provided by the invention effectively reduces power consumption, improves processing efficiency and noise-reduction performance, and ensures seamless integration of CI signal processing and noise reduction.
Third embodiment:
the present embodiment provides a cochlear implant device, as shown in fig. 5, which includes a processor 501, a memory 502 and a communication bus 503, wherein: the communication bus 503 is used for realizing connection communication between the processor 501 and the memory 502; the processor 501 is configured to execute one or more computer programs stored in the memory 502 to implement at least one step of the cochlear implant signal processing method in the first embodiment.
The present embodiment also provides a computer-readable storage medium, which includes volatile or non-volatile, removable or non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, computer program modules, or other data. Computer-readable storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other memory technology, CD-ROM (Compact Disc Read-Only Memory), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
The computer-readable storage medium in this embodiment may be used for storing one or more computer programs, and the stored one or more computer programs may be executed by a processor to implement at least one step of the method in the first embodiment.
The present embodiment also provides a computer program, which can be distributed on a computer readable medium and executed by a computing device to implement at least one step of the method in the first embodiment; and in some cases at least one of the steps shown or described may be performed in an order different than that described in the embodiments above.
The present embodiments also provide a computer program product comprising a computer readable means on which a computer program as shown above is stored. The computer readable means in this embodiment may include a computer readable storage medium as shown above.
It will be apparent to those skilled in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software (which may be implemented in computer program code executable by a computing device), firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit.
In addition, communication media typically embodies computer readable instructions, data structures, computer program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to one of ordinary skill in the art. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is a more detailed description of embodiments of the present invention, and the present invention is not to be considered limited to such descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A cochlear implant signal processing method is applied to a cochlear implant device, and is characterized by comprising the following steps:
acquiring a training voice signal, preprocessing the training voice signal, inputting the training voice signal to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network connected in sequence, wherein the first deep neural network is used for extracting high-dimensional features from input features, the second deep neural network is used for estimating the features of the enhanced training voice signal, and the third deep neural network is used for extracting, from the features estimated by the second deep neural network, channel envelopes whose number corresponds to the number of in-vivo implanted electrodes;
preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body;
and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of the target number to the implanted electrodes of the corresponding number.
2. The cochlear implant signal processing method of claim 1, wherein the training speech signal is input to an envelope extraction network after being preprocessed, and the training of the envelope extraction network comprises:
preprocessing the training voice signal to obtain features of a preset number of consecutive frames;
inputting the features of the continuous preset frame number into a first deep neural network comprising 128 neurons for high-dimensional feature extraction;
estimating the characteristics of the enhanced training speech signal by the output of the first deep neural network through a second deep neural network consisting of two fully-connected layers each containing 512 neurons and a linear layer containing 65 neurons;
extracting channel envelopes of which the number corresponds to the number of in-vivo implanted electrodes from the output of the second deep neural network through a third deep neural network consisting of a full-connection layer containing 256 neurons and a linear layer containing 22 neurons;
and performing parameter optimization on the envelope extraction network by adopting a back propagation algorithm, and performing iterative training until the envelope extraction network converges to obtain the trained envelope extraction network.
3. The cochlear implant signal processing method of claim 1, wherein said acquiring a training speech signal comprises:
randomly selecting a target number of clean voice samples from a preset voice database, and selecting a preset type of noise sample from a preset noise set;
and generating a training voice signal under a preset signal-to-noise ratio based on the clean voice sample and the noise sample.
4. A cochlear implant signal processing method as claimed in any one of claims 1 to 3, wherein the loss function of the envelope extraction network is expressed as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform

where loss_stft is the error between the features output by the second deep neural network and the features of a clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelope features extracted by the third deep neural network and the channel envelope features extracted from the clean speech sample by the conventional CI processing strategy; loss_waveform is the error between a simulated speech signal derived from the channel envelopes extracted by the third deep neural network and the clean speech sample; and w_stft, w_env, and w_waveform are the weights of the corresponding errors.
5. A cochlear implant signal processing device applied to a cochlear implant device, comprising:
the training module is used for acquiring a training voice signal, preprocessing the training voice signal, inputting it to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network connected in sequence, wherein the first deep neural network is used for extracting high-dimensional features from input features, the second deep neural network is used for estimating the features of the enhanced training voice signal, and the third deep neural network is used for extracting, from the features estimated by the second deep neural network, channel envelopes whose number corresponds to the number of in-vivo implanted electrodes;
the extraction module is used for preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals into a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of the electrodes implanted in the body;
and the processing module is used for sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number.
6. The cochlear implant signal processing apparatus of claim 5, wherein the training module, after preprocessing the training speech signal, inputs the preprocessed training speech signal to an envelope extraction network, and when training the envelope extraction network, is specifically configured to:
preprocessing the training voice signal to obtain features of a preset number of consecutive frames;
inputting the features of the continuous preset frame number into a first deep neural network comprising 128 neurons for high-dimensional feature extraction;
estimating the characteristics of the enhanced training speech signal by the output of the first deep neural network through a second deep neural network consisting of two fully-connected layers each containing 512 neurons and a linear layer containing 65 neurons;
extracting channel envelopes of which the number corresponds to the number of in-vivo implanted electrodes from the output of the second deep neural network through a third deep neural network consisting of a full-connection layer containing 256 neurons and a linear layer containing 22 neurons;
and performing parameter optimization on the envelope extraction network by adopting a back propagation algorithm, and performing iterative training until the envelope extraction network converges to obtain the trained envelope extraction network.
7. The cochlear implant signal processing apparatus of claim 5, wherein the training module, when acquiring the training speech signal, is specifically configured to:
randomly selecting a target number of clean voice samples from a preset voice database, and selecting a preset type of noise sample from a preset noise set;
and generating a training voice signal at a preset signal-to-noise ratio by combining the clean voice sample and the noise sample.
8. A cochlear implant signal processing apparatus as claimed in any one of claims 5 to 7, wherein the loss function of the envelope extraction network is expressed as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform

where loss_stft is the error between the features output by the second deep neural network and the features of the clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelope features extracted by the third deep neural network and the channel envelope features extracted from the clean speech sample by the conventional CI processing strategy; loss_waveform is the error between a simulated speech signal derived from the channel envelopes extracted by the third deep neural network and the clean speech sample; and w_stft, w_env, and w_waveform are the weights of the corresponding errors.
9. A cochlear implant device, comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the cochlear implant signal processing method according to any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs executable by one or more processors to implement the steps of the cochlear implant signal processing method according to any one of claims 1 to 4.
CN201910999264.5A 2019-10-21 2019-10-21 Method and device for processing cochlear implant signals and computer readable storage medium Active CN110681051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910999264.5A CN110681051B (en) 2019-10-21 2019-10-21 Method and device for processing cochlear implant signals and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110681051A true CN110681051A (en) 2020-01-14
CN110681051B CN110681051B (en) 2023-06-13

Family

ID=69113683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910999264.5A Active CN110681051B (en) 2019-10-21 2019-10-21 Method and device for processing cochlear implant signals and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110681051B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090312820A1 (en) * 2008-06-02 2009-12-17 University Of Washington Enhanced signal processing for cochlear implants
CN101645267A (en) * 2009-04-03 2010-02-10 中国科学院声学研究所 Voice processing method applied in electronic ear
CN101642399A (en) * 2008-12-16 2010-02-10 中国科学院声学研究所 Artificial cochlea speech processing method based on frequency modulation information and artificial cochlea speech processor
CN102314880A (en) * 2010-06-30 2012-01-11 上海视加信息科技有限公司 Coding and synthesizing method for voice elements
CN107767859A (en) * 2017-11-10 2018-03-06 吉林大学 The speaker's property understood detection method of artificial cochlea's signal under noise circumstance
CN109003601A (en) * 2018-08-31 2018-12-14 北京工商大学 A kind of across language end-to-end speech recognition methods for low-resource Tujia language
CN109841220A (en) * 2017-11-24 2019-06-04 深圳市腾讯计算机系统有限公司 Speech processing model training method, device, electronic equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xiaowei et al.: "Research on Environmental Sound Recognition for Cochlear Implants" (《人工耳蜗的环境声识别研究》), Acta Acustica (《声学学报》) *

Also Published As

Publication number Publication date
CN110681051B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
US11961533B2 (en) Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments
EP3469586B1 (en) Recursive noise power estimation with noise model adaptation
CN110111769B (en) Electronic cochlea control method and device, readable storage medium and electronic cochlea
CN105814911B (en) The feedback of energy signal for nerve stimulation gates
CN109414581B (en) Bionic rapid fitting of cochlear implant
US20230226352A1 (en) Neural Network Audio Scene Classifier for Hearing Implants
WO2020087716A1 (en) Auditory scene recognition method for artificial cochlea
CN113924786B (en) Neural network model for cochlear mechanics and processing
CN110681051A (en) Artificial cochlea signal processing method and device and computer readable storage medium
Zheng et al. A noise-robust signal processing strategy for cochlear implants using neural networks
WO2021077247A1 (en) Cochlear implant signal processing method and apparatus, and computer-readable storage medium
Legrand et al. Interactive evolution for cochlear implants fitting
Li Speech perception in a sparse domain
Beeston Perceptual compensation for reverberation in human listeners and machines
Wei et al. Improvement of Cochlear Implant Coding Strategy Based on Chinese Speech Boundary Information
Shahidi et al. Application of a graphical model to investigate the utility of cross-channel information for mitigating reverberation in cochlear implants
Kates Extending the Hearing-Aid Speech Perception Index (HASPI): Keywords, sentences, and context
Tu Data-driven speech intelligibility enhancement and prediction for hearing aids
Gallardo A Framework for the Development and Validation of Phenomenologically Derived Cochlear Implant Stimulation Strategies
Parameswaran Objective assessment of machine learning algorithms for speech enhancement in hearing aids
Soleymani Multi-talker Babble Noise Reduction in Cochlear Implant Devices
Khaleelur Rahiman et al. Design of Low Power Speech Processor-Based Cochlear Implants Using Modified FIR Filter with Variable Frequency Mapping
CN116582807A (en) Hearing compensation method for frequency selective damage of auditory system
Van Zyl Objective determination of vowel intelligibility of a cochlear implant model
Dachasilaruk Wavelet filter banks for cochlear implants

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant