CN110681051A - Artificial cochlea signal processing method and device and computer readable storage medium - Google Patents


Info

Publication number
CN110681051A
CN110681051A (application CN201910999264.5A)
Authority
CN
China
Prior art keywords
deep neural
training
neural network
envelope extraction
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910999264.5A
Other languages
Chinese (zh)
Other versions
CN110681051B (en)
Inventor
郑能恒
史裕鹏
康迂勇
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201910999264.5A priority Critical patent/CN110681051B/en
Publication of CN110681051A publication Critical patent/CN110681051A/en
Application granted granted Critical
Publication of CN110681051B publication Critical patent/CN110681051B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A61N 1/36038 — Cochlear stimulation (under A61N 1/36036, stimulation of the outer, middle or inner ear; A61N 1/36, applying electric currents by contact electrodes for stimulation)
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks (G06N 3/04, neural network architecture)
    • G06N 3/045 — Combinations of networks
    • G06N 3/084 — Backpropagation, e.g. using gradient descent (G06N 3/08, learning methods)
    • G10L 21/0208 — Noise filtering (G10L 21/02, speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 25/03 — Speech or voice analysis characterised by the type of extracted parameters
    • G10L 25/30 — Speech or voice analysis using neural networks
    • G10L 25/48 — Speech or voice analysis specially adapted for particular use
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Abstract

According to the cochlear implant signal processing method, apparatus and computer-readable storage medium disclosed by the embodiments of the invention, a training speech signal is first acquired, preprocessed and input to an envelope extraction network for training, where the envelope extraction network comprises three sequentially connected deep neural networks. Acquired real-time speech signals are then preprocessed and input to the trained envelope extraction network, which extracts a number of channel envelopes corresponding to the number of electrodes implanted in the body. Finally, the extracted channel envelopes undergo nonlinear compression, channel selection, electrode mapping and pulse modulation in sequence, and a target number of electrode stimulation signals are output to the corresponding implanted electrodes. The lightweight, low-complexity envelope extraction network proposed by the invention effectively reduces power consumption, improves processing efficiency and noise reduction performance, and ensures seamless integration of CI signal processing and noise reduction.

Description

Artificial cochlea signal processing method and device and computer readable storage medium
Technical Field
The present invention relates to the field of signal processing technologies, and in particular to a cochlear implant signal processing method and apparatus, and a computer-readable storage medium.
Background
A Cochlear Implant (CI) is an auditory prosthesis mainly used to restore speech perception to patients with severe peripheral auditory damage (such as inner-ear hair cell necrosis). The most advanced CI devices currently enable recipients to achieve speech perception in quiet acoustic environments comparable to that of normal-hearing listeners. However, real-life background noise (such as ambient noise or multi-speaker conversation) can severely degrade the speech perception of CI recipients.
In recent years, academia and industry have proposed many signal processing systems that combine noise reduction algorithms with traditional CI signal processing strategies to improve CI speech perception. However, current noise reduction algorithms have large model sizes and high computational complexity, leading to low processing efficiency and high power consumption in practical applications. They also cannot reliably extract the temporal fine structure of sound, which limits the noise reduction effect. In addition, when speech processed by such an algorithm is passed to the CI signal processing unit, optimal speech perception of the final output cannot be guaranteed, so the noise reduction algorithm and the CI processing strategy are poorly matched.
Disclosure of Invention
The embodiments of the present invention mainly aim to provide a cochlear implant signal processing method, apparatus and computer-readable storage medium, which can at least solve the problems of the noise reduction algorithms used in the related art: low processing efficiency, high power consumption, limited noise reduction effect and poor adaptation to CI processing strategies.
In order to achieve the above object, a first aspect of embodiments of the present invention provides a cochlear implant signal processing method based on deep learning, which is applied to a cochlear implant device, and includes:
acquiring a training speech signal, preprocessing it, inputting it to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first, a second and a third deep neural network connected in sequence, wherein the first deep neural network extracts high-dimensional features from the input features, the second deep neural network estimates the features of the enhanced training speech signal, and the third deep neural network extracts, from the features estimated by the second deep neural network, a number of channel envelopes corresponding to the number of electrodes implanted in the body;
preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body;
and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of the target number to the implanted electrodes of the corresponding number.
In order to achieve the above object, a second aspect of the embodiments of the present invention provides a cochlear implant signal processing apparatus based on deep learning, applied to a cochlear implant apparatus, the apparatus including:
the training module is used for acquiring a training speech signal, preprocessing it and inputting it to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first, a second and a third deep neural network connected in sequence, wherein the first deep neural network extracts high-dimensional features from the input features, the second deep neural network estimates the features of the enhanced training speech signal, and the third deep neural network extracts, from the features estimated by the second deep neural network, a number of channel envelopes corresponding to the number of electrodes implanted in the body;
the extraction module is used for preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals into a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of the electrodes implanted in the body;
and the processing module is used for sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number.
To achieve the above object, a third aspect of embodiments of the present invention provides a cochlear implant device including: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is used for executing one or more programs stored in the memory to realize the steps of any cochlear implant signal processing method.
In order to achieve the above object, a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of any of the above cochlear implant signal processing methods.
According to the cochlear implant signal processing method, the cochlear implant signal processing device and the computer-readable storage medium provided by the embodiment of the invention, a training voice signal is obtained, the training voice signal is input to an envelope extraction network after being preprocessed, and the envelope extraction network is trained, wherein the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network which are sequentially connected; preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body; and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number. The lightweight envelope extraction network with lower computation complexity provided by the invention effectively reduces power consumption, improves processing efficiency and noise reduction processing effect, and ensures seamless fusion of CI signal processing and noise reduction.
Other features and corresponding effects of the present invention are set forth in the following portions of the specification, and it should be understood that at least some of the effects are apparent from the description of the present invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic basic flow chart of a cochlear implant signal processing method according to a first embodiment of the present invention;
fig. 2 is a schematic flow chart of a network training method according to a first embodiment of the present invention;
fig. 3 is a schematic diagram illustrating training of an envelope extraction network according to a first embodiment of the present invention;
fig. 4 is a schematic structural diagram of a cochlear implant signal processing apparatus according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of a cochlear implant device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment:
in order to solve the technical problems of the noise reduction algorithms used in the related art, namely low processing efficiency, high power consumption, limited noise reduction effect and poor adaptation to CI processing strategies, this embodiment provides a cochlear implant signal processing method applied to a cochlear implant device. Fig. 1 is a basic flow diagram of the method, which comprises the following steps:
step 101, acquiring a training voice signal, preprocessing the training voice signal, inputting the preprocessed training voice signal to an envelope extraction network, and training the envelope extraction network.
Specifically, the envelope extraction network in this embodiment includes a first deep neural network (DNN1), a second deep neural network (DNN2) and a third deep neural network (DNN3) connected in sequence. The first deep neural network, preferably a Long Short-Term Memory network (LSTM), extracts high-dimensional features from the input features; the second deep neural network estimates the features of the enhanced training speech signal; and the third deep neural network extracts from those features a number of channel envelopes corresponding to the number of electrodes implanted in the body. It should be understood that in practical applications the features may be frequency-domain features (such as the logarithmic magnitude spectrum) or time-domain features.
CI devices currently on the market comprise an implanted (in-body) unit and an external processor; the CI signal processing system of this embodiment is preferably placed in the external processor, and the number of implanted electrodes of the CI product is preferably 22. The envelope extraction network extracts sub-band envelopes for as many channels as the actual CI product has implanted electrodes, so the envelopes retain richer detail of the original sound.
It should be further noted that, in practical applications, the training speech signal may be an existing training speech sample, for example, obtained directly from a preset sample database, or obtained by self-recording, and this embodiment is not limited herein.
In an optional implementation manner of this embodiment, the acquiring the training speech signal includes: randomly selecting a target number of clean voice samples from a preset voice database, and selecting a preset type of noise sample from a preset noise set; and generating a training voice signal under the preset signal-to-noise ratio based on the clean voice sample and the noise sample.
Specifically, this embodiment constructs training speech samples by selecting suitable speech and noise databases. The clean speech sample set for the envelope extraction network can be formed by randomly selecting 2500 speech utterances from the training set of the Tsinghua Chinese speech database, and two noise types, white noise and babble, can be selected from the NOISEX-92 noise set as the noise sample set. The 2500 utterances and the noise are then randomly combined under four conditions, namely signal-to-noise ratios of -5 dB, 0 dB and 5 dB plus a noise-free condition, to generate the noisy training speech signals used to train the envelope extraction network.
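The SNR mixing described above can be sketched as follows; the signals here are synthetic stand-ins for the database utterances and NOISEX-92 noise, and the function name is illustrative:

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Mix a clean speech signal with noise at a target SNR (in dB).

    The noise is tiled/truncated to the speech length and scaled so that
    10*log10(P_speech / P_noise) equals snr_db.
    """
    # Repeat the noise if it is shorter than the speech sample.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[:len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale factor so the resulting SNR matches the target.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# Example: one of the SNR conditions (-5, 0, 5 dB) from the setup above.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in speech
noise = rng.standard_normal(8000)                           # stand-in noise
noisy = mix_at_snr(clean, noise, 0)
```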
As shown in fig. 2, which is a schematic flow chart of a network training method provided in this embodiment, in an optional implementation manner of this embodiment, a training speech signal is input to an envelope extraction network after being preprocessed, and training the envelope extraction network specifically includes the following steps:
step 1011, preprocessing the training voice signal to obtain the characteristics of continuous preset frame number;
step 1012, inputting the features of the continuous preset frame number into a first deep neural network comprising 128 neurons for high-dimensional feature extraction;
step 1013, the output of the first deep neural network passes through a second deep neural network composed of two fully-connected layers each including 512 neurons and a linear layer including 65 neurons, and the features of the enhanced training speech signal are estimated;
step 1014, extracting, from the output of the second deep neural network, a number of channel envelopes corresponding to the number of implanted electrodes, through a third deep neural network consisting of a fully-connected layer of 256 neurons and a linear layer of 22 neurons;
and step 1015, performing parameter optimization on the envelope extraction network by adopting a back propagation algorithm, and performing iterative training until the envelope extraction network converges to obtain the trained envelope extraction network.
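The forward pass of steps 1012–1014 can be sketched with the stated layer sizes. This is a minimal NumPy sketch with randomly initialised placeholder weights (a real model learns them by back-propagation as in step 1015), and the LSTM of DNN1 is simplified to a plain recurrent layer for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, d_out):
    """Fully-connected layer with random placeholder weights."""
    w = rng.standard_normal((x.shape[-1], d_out)) * 0.01
    return x @ w

def relu(x):
    return np.maximum(x, 0.0)

def envelope_extraction_net(lps):          # lps: (25, 65) noisy LPS block
    # DNN1: 128-unit recurrent feature extractor (an LSTM in the patent,
    # reduced here to a simple tanh recurrence to keep the sketch short).
    w_x = rng.standard_normal((65, 128)) * 0.01
    w_h = rng.standard_normal((128, 128)) * 0.01
    h = np.zeros(128)
    hidden = []
    for frame in lps:
        h = np.tanh(frame @ w_x + h @ w_h)
        hidden.append(h)
    h1 = np.stack(hidden)                  # (25, 128)

    # DNN2: two 512-neuron fully-connected layers + a 65-neuron linear
    # layer, estimating the enhanced LPS features.
    h2 = relu(dense(relu(dense(h1, 512)), 512))
    lps_est = dense(h2, 65)                # (25, 65)

    # DNN3: a 256-neuron fully-connected layer + a 22-neuron linear layer,
    # one output per implanted electrode channel.
    envelopes = dense(relu(dense(lps_est, 256)), 22)
    return lps_est, envelopes              # (25, 65), (25, 22)

lps_est, env = envelope_extraction_net(rng.standard_normal((25, 65)))
```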
Specifically, in this embodiment the noisy speech and the corresponding clean speech sample may each be preprocessed into short-time Fourier log-power spectrum features (LPS; 8 ms per frame with a 1 ms frame shift). To exploit the correlation between adjacent speech frames, the input to the envelope extraction network may be a block of 25 consecutive frames. The LPS features of 25 consecutive frames of noisy speech (dimension 25 x 65) are input to a single unidirectional DNN1 layer (e.g. an LSTM); the output of DNN1 passes through DNN2, which outputs the estimated LPS (dimension 25 x 65); the output of DNN2 is then fed to DNN3, which outputs the estimated 22 channel envelopes (dimension 25 x 22). Finally, the network parameters are optimized by back-propagation to obtain the final network model.
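The LPS preprocessing above can be sketched as follows; the 16 kHz sampling rate and the floor constant `eps` are assumptions, chosen so that an 8 ms frame is 128 samples and yields 128/2 + 1 = 65 frequency bins, matching the 65-dimensional features in the description:

```python
import numpy as np

def lps_features(signal, fs=16000, frame_ms=8, shift_ms=1, eps=1e-10):
    """Short-time Fourier log-power spectrum (LPS) features.

    With 8 ms frames at 16 kHz the frame length is 128 samples, giving
    65-dimensional features per frame; the 1 ms shift is 16 samples.
    """
    frame_len = int(fs * frame_ms / 1000)   # 128 samples per frame
    shift = int(fs * shift_ms / 1000)       # 16-sample (1 ms) frame shift
    window = np.hanning(frame_len)

    n_frames = 1 + (len(signal) - frame_len) // shift
    feats = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = signal[i * shift : i * shift + frame_len] * window
        spec = np.fft.rfft(frame)
        feats[i] = np.log(np.abs(spec) ** 2 + eps)  # log-power spectrum
    return feats

feats = lps_features(np.random.default_rng(1).standard_normal(16000))
# 25 consecutive rows of `feats` form one (25, 65) input block.
```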
It should be noted that during back-propagation the network parameters are adjusted according to a loss function, which measures how closely the predictions of the trained model approximate the true values; the smaller the loss, the stronger the envelope extraction and processing capability of the model. In this embodiment the network parameters are updated from the loss function and training iterates until the network converges, i.e. until the loss essentially stops decreasing, at which point the envelope extraction network model of this embodiment is fully trained.
Further, in an optional implementation manner of this embodiment, the loss function of the envelope extraction network is represented as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform
where loss_stft is the error between the features output by the second deep neural network and the features of the clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelopes extracted by the third deep neural network and the channel envelopes extracted from the clean speech sample by a conventional CI processing strategy; loss_waveform is the error between the clean speech sample and the simulated speech signal obtained (via electrode mapping and related steps) from the channel envelopes extracted by the third deep neural network; and w_stft, w_env and w_waveform are the weights of the respective errors. It should be understood that each of these errors is preferably an L1-norm error.
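A minimal numeric sketch of this weighted loss with L1 errors, assuming unit weights for illustration (`env_ref` stands for the reference envelopes a conventional CI strategy extracts from the clean sample, and `wav_sim` for the simulated waveform reconstructed from the estimated envelopes):

```python
import numpy as np

def l1(a, b):
    """Mean absolute (L1-norm) error between two arrays."""
    return np.mean(np.abs(a - b))

def total_loss(lps_est, lps_clean, env_est, env_ref, wav_sim, wav_clean,
               w_stft=1.0, w_env=1.0, w_waveform=1.0):
    """Weighted sum of the three error terms from the description.

    The unit default weights are illustrative; the patent treats them as
    adjustable weighting factors.
    """
    loss_stft = l1(lps_est, lps_clean)       # DNN2 output vs clean LPS
    loss_env = l1(env_est, env_ref)          # DNN3 output vs reference envelopes
    loss_waveform = l1(wav_sim, wav_clean)   # simulated vs clean waveform
    return w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform
```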
Fig. 3 is a training diagram of the envelope extraction network of this embodiment, where A denotes the input training speech signal, B denotes the clean speech samples that are combined with noise to generate the training signal, and C denotes the envelope extraction network. Using the preferred network sizes above, the features of the noisy speech (dimension 25 x 65) are input to DNN1 for high-dimensional feature extraction, and the output of DNN1 passes through DNN2 to produce the estimated LPS features (dimension 25 x 65); loss_stft can then be computed between the 65-dimensional LPS features output by DNN2 and those of the corresponding clean speech sample. The computation of loss_stft can borrow the perceptual weighting commonly used in audio coding, so as to guide the model to be less sensitive to noise near formants and more sensitive to noise near non-formant spectral valleys. In addition, loss_env is computed between the 22-dimensional channel envelopes output by DNN3 and the 22-dimensional channel envelopes extracted by an existing conventional CI processing strategy such as ACE (Advanced Combination Encoders). Furthermore, a simulated speech signal is constructed from the 22-dimensional channel envelopes output by DNN3, and its error against the clean speech waveform gives loss_waveform, forcing the envelope extraction network of this embodiment to learn the detail information of clean speech and effectively remedying the inability of traditional CI strategies to extract temporal detail.
Finally, the three errors are weighted by three adjustable factors and summed to form the objective function for optimizing the envelope extraction network. In a preferred implementation, the whole envelope extraction network is trained for 60 epochs with the Adam optimizer, and the model with the minimum validation loss is saved as the final trained envelope extraction network model.
It should be noted that the loss function of this embodiment guides the model, to a certain extent, to learn how a conventional CI processing strategy extracts envelope energy from the Fourier power spectrum, and also forces the network, in two domains, to approach the data distribution of clean speech. The envelope signals output by the network thereby carry more detail information, largely overcoming the inability of conventional CI processing strategies to extract temporal detail from the speech signal.
And 102, preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals into a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of implanted electrodes in a human body.
Specifically, when the speech acquisition unit of the CI device (such as a microphone) receives an external speech signal, the signal is preprocessed and fed to the trained envelope extraction network; with the preferred 22 implanted electrodes, the network outputs 22 channel envelope signals. It should be noted that the envelope extraction network model is far more compact than existing algorithm models: its size is only about 1.9 MB, it has 0.46 M parameters, system complexity is significantly reduced, and the average decoding time per 8 ms frame is about 0.1-0.2 ms. The greatly reduced parameter count and computational complexity correspondingly lower power consumption (small memory and CPU demands), ensuring the feasibility of applying the envelope extraction network model in an actual CI product.
And 103, sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of the target number to the implanted electrodes of the corresponding number.
Specifically, the preprocessing, nonlinear compression, channel selection, electrode mapping, pulse modulation and generation of the simulated speech signal in this embodiment all follow the conventional CI processing strategy and are not described again here. It should be noted that during channel selection this embodiment may select the N envelope signals with the largest energy and/or highest signal-to-noise ratio for electrode mapping; for example, with 22 implanted electrodes in total, 8 channels may be selected, and the pulse-modulated electrical stimulation signals are output to the corresponding 8 implanted electrodes.
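The N-of-M selection step can be sketched as follows for a single frame of 22 channel envelope values; energy-only selection is assumed here (a deployed strategy may also weigh per-channel SNR, as noted above), and the envelope values are illustrative:

```python
import numpy as np

def select_channels(envelopes, n=8):
    """N-of-M channel selection: keep the n highest-energy envelopes out
    of the 22 channels and zero the rest before electrode mapping."""
    keep = np.argsort(envelopes)[-n:]          # indices of the n largest
    mask = np.zeros_like(envelopes, dtype=bool)
    mask[keep] = True
    return np.where(mask, envelopes, 0.0), np.sort(keep)

# One frame of 22 channel envelope values (illustrative).
env = np.array([0.10, 0.90, 0.30, 0.80, 0.05, 0.70, 0.20, 0.60,
                0.15, 0.50, 0.25, 0.40, 0.12, 0.35, 0.22, 0.45,
                0.18, 0.55, 0.28, 0.65, 0.32, 0.75])
selected, kept = select_channels(env, n=8)     # 8 of 22 channels survive
```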
By exploiting the strong learning capability of the envelope extraction network of this embodiment and constructing suitable noisy speech data for training, a good noise reduction effect can be achieved, attaining a noise robustness beyond that of traditional CI processing strategies without adding any front-end noise reduction module.
In addition, through the second deep neural network the envelope extraction network of this embodiment can learn adjustable parameters analogous to the triangular filter bank parameters of existing CI processing strategies, and it is optimized by back-propagating the error between the simulated and real speech, so the extracted envelopes carry more detail information. Its speech processing in quiet environments is therefore superior to traditional CI processing strategies, and its noise reduction performance in noisy environments is clearly superior to traditional strategies that use Wiener filtering or lightweight DNNs as a front-end noise reduction module.
According to the cochlear implant signal processing method provided by the embodiment of the invention, a training voice signal is obtained, the training voice signal is input to an envelope extraction network after being preprocessed, and the envelope extraction network is trained, wherein the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network which are sequentially connected; preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body; and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number. The lightweight envelope extraction network with lower computation complexity provided by the invention effectively reduces power consumption, improves processing efficiency and noise reduction processing effect, and ensures seamless fusion of CI signal processing and noise reduction.
Second embodiment:
To solve the technical problems of the related art, namely low processing efficiency, high power consumption, limited noise-reduction performance, and poor adaptation between the noise-reduction algorithm and the CI processing strategy, this embodiment provides a cochlear implant signal processing apparatus applied to a cochlear implant device. Specifically, referring to fig. 4, the cochlear implant signal processing apparatus of this embodiment includes:
the training module 401, configured to acquire a training speech signal, preprocess it, input it to an envelope extraction network, and train the envelope extraction network; the envelope extraction network comprises a first deep neural network, a second deep neural network, and a third deep neural network connected in sequence, wherein the first deep neural network is used for extracting high-dimensional features from the input features, the second deep neural network is used for estimating the features of the enhanced training speech signal, and the third deep neural network is used for extracting, from the features estimated by the second deep neural network, channel envelopes whose number corresponds to the number of electrodes implanted in the body;
an extraction module 402, configured to input the acquired real-time speech signal after preprocessing to a trained envelope extraction network, and extract channel envelopes whose number corresponds to the number of electrodes implanted in a body;
the processing module 403 is configured to sequentially perform nonlinear compression, channel selection, electrode mapping, and pulse modulation on the channel envelope extracted from the real-time speech signal, and then output a target number of electrode stimulation signals to a corresponding number of implanted electrodes in the body.
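The post-envelope stages applied by the processing module (nonlinear compression followed by channel selection) can be sketched as below. The logarithmic compression law and the n-of-m selection rule are assumptions modeled on common CI strategies; the patent names the stages but does not fix their parameters, so the constant `alpha` and `n_select=8` are illustrative only.

```python
import numpy as np

def compress_and_select(envelopes, n_select=8, alpha=340.83):
    """Log-compress each channel envelope, then keep the n_select largest
    channels per frame (n-of-m selection), zeroing the rest."""
    env = np.asarray(envelopes, dtype=float)
    # Logarithmic compression; maps env in [0, 1] into [0, 1].
    compressed = np.log1p(alpha * env) / np.log1p(alpha)
    out = np.zeros_like(compressed)
    for t in range(compressed.shape[0]):
        keep = np.argsort(compressed[t])[-n_select:]  # n largest channels
        out[t, keep] = compressed[t, keep]
    return out

stim = compress_and_select(np.random.rand(5, 22))  # 5 frames, 22 channels
```

The selected, compressed values would then be mapped to each electrode's dynamic range and used to modulate stimulation pulses.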
In an optional implementation of this embodiment, when the training module 401 preprocesses the training speech signal, inputs it to the envelope extraction network, and trains the network, it is specifically configured to: preprocess the training speech signal to obtain features of a preset number of consecutive frames; input those features into a first deep neural network containing 128 neurons for high-dimensional feature extraction; estimate the features of the enhanced training speech signal from the output of the first deep neural network through a second deep neural network consisting of two fully-connected layers of 512 neurons each and a linear layer of 65 neurons; extract, from the output of the second deep neural network, channel envelopes whose number corresponds to the number of implanted electrodes, through a third deep neural network consisting of a fully-connected layer of 256 neurons and a linear layer of 22 neurons; and optimize the parameters of the envelope extraction network with a back-propagation algorithm, iterating the training until the network converges to obtain the trained envelope extraction network.
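A shape-level NumPy sketch of the three networks as described (128 neurons; 512, 512, and a linear 65; 256 and a linear 22) follows. The patent specifies the layer widths but not the activations or the input context size, so the ReLU activations, the 5-frame context of 65-bin features, and the random-weight initialization here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
dense = lambda d_in, d_out: (rng.standard_normal((d_in, d_out)) * 0.01,
                             np.zeros(d_out))

n_frames, n_bins = 5, 65              # assumed context window and feature size
x = rng.random((1, n_frames * n_bins))

# DNN1: high-dimensional feature extraction (128 neurons).
W1, b1 = dense(n_frames * n_bins, 128)
h1 = relu(x @ W1 + b1)

# DNN2: enhanced-feature estimation (512 -> 512 -> linear 65).
W2a, b2a = dense(128, 512); W2b, b2b = dense(512, 512); W2c, b2c = dense(512, 65)
enhanced = relu(relu(h1 @ W2a + b2a) @ W2b + b2b) @ W2c + b2c

# DNN3: channel envelopes, one per implanted electrode (256 -> linear 22).
W3a, b3a = dense(65, 256); W3b, b3b = dense(256, 22)
channel_envelopes = relu(enhanced @ W3a + b3a) @ W3b + b3b
```

The 65-dimensional intermediate output matches the enhanced-feature target of the second network, and the 22-dimensional output matches one envelope per implanted electrode.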
In an optional implementation of this embodiment, when the training module 401 acquires the training speech signal, it is specifically configured to: randomly select a target number of clean speech samples from a preset speech database and select noise samples of a preset type from a preset noise set; and generate a training speech signal at a preset signal-to-noise ratio based on the clean speech samples and the noise samples.
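Generating a training signal at a preset signal-to-noise ratio from a clean sample and a noise sample reduces to scaling the noise so the power ratio hits the target. A minimal sketch; the random stand-in signals and the 5 dB target are illustrative, not values from the patent.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that mixing with `clean` yields the target SNR in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR = 10*log10(p_clean / p_noise_scaled)  =>  solve for the noise scale.
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)   # stand-in for a clean speech sample
noise = rng.standard_normal(16000)   # stand-in for a noise sample
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

Repeating this over randomly chosen clean samples, noise types, and SNRs yields the training set described above.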
Further, in an optional implementation manner of this embodiment, the loss function of the envelope extraction network is represented as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform

where loss_stft is the error between the features output by the second deep neural network and the features of the clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelope features extracted by the third deep neural network and the channel envelope features extracted from the clean speech sample by the conventional CI processing strategy; loss_waveform is the error between the simulated speech signal reconstructed from the channel envelopes extracted by the third deep neural network and the clean speech sample; and w_stft, w_env, and w_waveform are the weights of the corresponding errors.
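The three-term loss can be sketched as a weighted sum of per-term errors. The patent defines the three terms and their weights but not the error metric, so the mean-squared-error choice and the default weights here are assumptions.

```python
import numpy as np

def envelope_net_loss(stft_pred, stft_clean, env_pred, env_clean,
                      wave_pred, wave_clean,
                      w_stft=1.0, w_env=1.0, w_wave=1.0):
    """Weighted sum of the three training errors (MSE assumed per term)."""
    mse = lambda a, b: np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
    return (w_stft * mse(stft_pred, stft_clean)    # DNN2 features vs. clean features
            + w_env * mse(env_pred, env_clean)     # DNN3 envelopes vs. conventional-CI envelopes
            + w_wave * mse(wave_pred, wave_clean)) # simulated speech vs. clean waveform
```

During training, the scalar loss would be back-propagated through all three networks jointly, which is what couples envelope extraction with noise reduction.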
It should be noted that the cochlear implant signal processing methods in the foregoing embodiments can all be implemented based on the cochlear implant signal processing apparatus provided in this embodiment. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the apparatus described in this embodiment may refer to the corresponding process in the foregoing method embodiments, and details are not repeated here.
With the cochlear implant signal processing apparatus provided by this embodiment, a training speech signal is acquired, preprocessed, and input to an envelope extraction network for training, where the envelope extraction network comprises a first deep neural network, a second deep neural network, and a third deep neural network connected in sequence; the acquired real-time speech signal is preprocessed and input to the trained envelope extraction network, which extracts channel envelopes whose number corresponds to the number of electrodes implanted in the body; and the channel envelopes extracted from the real-time speech signal undergo nonlinear compression, channel selection, electrode mapping, and pulse modulation in sequence, after which a target number of electrode stimulation signals are output to the corresponding number of implanted electrodes. The lightweight, low-complexity envelope extraction network provided by the invention effectively reduces power consumption, improves processing efficiency and noise-reduction performance, and ensures seamless integration of CI signal processing and noise reduction.
Third embodiment:
the present embodiment provides a cochlear implant device, as shown in fig. 5, which includes a processor 501, a memory 502 and a communication bus 503, wherein: the communication bus 503 is used for realizing connection communication between the processor 501 and the memory 502; the processor 501 is configured to execute one or more computer programs stored in the memory 502 to implement at least one step of the cochlear implant signal processing method in the first embodiment.
The present embodiment also provides a computer-readable storage medium, which includes volatile or non-volatile, removable or non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, computer program modules, or other data. Computer-readable storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other memory technology, CD-ROM (Compact Disc Read-Only Memory), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
The computer-readable storage medium in this embodiment may be used for storing one or more computer programs, and the stored one or more computer programs may be executed by a processor to implement at least one step of the method in the first embodiment.
The present embodiment also provides a computer program, which can be distributed on a computer readable medium and executed by a computing device to implement at least one step of the method in the first embodiment; and in some cases at least one of the steps shown or described may be performed in an order different than that described in the embodiments above.
The present embodiments also provide a computer program product comprising a computer readable means on which a computer program as shown above is stored. The computer readable means in this embodiment may include a computer readable storage medium as shown above.
It will be apparent to those skilled in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software (which may be implemented in computer program code executable by a computing device), firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit.
In addition, communication media typically embodies computer readable instructions, data structures, computer program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to one of ordinary skill in the art. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is a more detailed description of embodiments of the present invention, and the present invention is not to be considered limited to such descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A cochlear implant signal processing method is applied to a cochlear implant device, and is characterized by comprising the following steps:
acquiring a training voice signal, preprocessing the training voice signal, inputting the training voice signal to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network connected in sequence, wherein the first deep neural network is used for extracting high-dimensional features from input features, the second deep neural network is used for estimating the features of the enhanced training voice signal, and the third deep neural network is used for extracting, from the features estimated by the second deep neural network, channel envelopes whose number corresponds to the number of in-vivo implanted electrodes;
preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals to a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of electrodes implanted in a human body;
and sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals, and then outputting the electrode stimulation signals of the target number to the implanted electrodes of the corresponding number.
2. The cochlear implant signal processing method of claim 1, wherein the training speech signal is input to an envelope extraction network after being preprocessed, and the training of the envelope extraction network comprises:
preprocessing the training voice signal to obtain features of a preset number of consecutive frames;
inputting the features of the continuous preset frame number into a first deep neural network comprising 128 neurons for high-dimensional feature extraction;
estimating the characteristics of the enhanced training speech signal by the output of the first deep neural network through a second deep neural network consisting of two fully-connected layers each containing 512 neurons and a linear layer containing 65 neurons;
extracting channel envelopes of which the number corresponds to the number of in-vivo implanted electrodes from the output of the second deep neural network through a third deep neural network consisting of a full-connection layer containing 256 neurons and a linear layer containing 22 neurons;
and performing parameter optimization on the envelope extraction network by adopting a back propagation algorithm, and performing iterative training until the envelope extraction network converges to obtain the trained envelope extraction network.
3. The cochlear implant signal processing method of claim 1, wherein said acquiring a training speech signal comprises:
randomly selecting a target number of clean voice samples from a preset voice database, and selecting a preset type of noise sample from a preset noise set;
and generating a training voice signal under a preset signal-to-noise ratio based on the clean voice sample and the noise sample.
4. A cochlear implant signal processing method as claimed in any one of claims 1 to 3, wherein the loss function of the envelope extraction network is expressed as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform

where loss_stft is the error between the features output by the second deep neural network and the features of a clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelope features extracted by the third deep neural network and the channel envelope features extracted from the clean speech sample by the conventional CI processing strategy; loss_waveform is the error between a simulated speech signal derived from the channel envelopes extracted by the third deep neural network and the clean speech sample; and w_stft, w_env, and w_waveform are the weights of the corresponding errors.
5. A cochlear implant signal processing device applied to a cochlear implant device, comprising:
the training module is used for acquiring a training voice signal, preprocessing the training voice signal, inputting it to an envelope extraction network, and training the envelope extraction network; the envelope extraction network comprises a first deep neural network, a second deep neural network and a third deep neural network connected in sequence, wherein the first deep neural network is used for extracting high-dimensional features from input features, the second deep neural network is used for estimating the features of the enhanced training voice signal, and the third deep neural network is used for extracting, from the features estimated by the second deep neural network, channel envelopes whose number corresponds to the number of in-vivo implanted electrodes;
the extraction module is used for preprocessing the acquired real-time voice signals and inputting the preprocessed real-time voice signals into a trained envelope extraction network, and extracting channel envelopes of which the number corresponds to the number of the electrodes implanted in the body;
and the processing module is used for sequentially carrying out nonlinear compression, channel selection, electrode mapping and pulse modulation on the channel envelopes extracted from the real-time voice signals and then outputting the electrode stimulation signals of a target number to the implanted electrodes of a corresponding number.
6. The cochlear implant signal processing apparatus of claim 5, wherein the training module, after preprocessing the training speech signal, inputs the preprocessed training speech signal to an envelope extraction network, and when training the envelope extraction network, is specifically configured to:
preprocessing the training voice signal to obtain features of a preset number of consecutive frames;
inputting the features of the continuous preset frame number into a first deep neural network comprising 128 neurons for high-dimensional feature extraction;
estimating the characteristics of the enhanced training speech signal by the output of the first deep neural network through a second deep neural network consisting of two fully-connected layers each containing 512 neurons and a linear layer containing 65 neurons;
extracting channel envelopes of which the number corresponds to the number of in-vivo implanted electrodes from the output of the second deep neural network through a third deep neural network consisting of a full-connection layer containing 256 neurons and a linear layer containing 22 neurons;
and performing parameter optimization on the envelope extraction network by adopting a back propagation algorithm, and performing iterative training until the envelope extraction network converges to obtain the trained envelope extraction network.
7. The cochlear implant signal processing apparatus of claim 5, wherein the training module, when acquiring the training speech signal, is specifically configured to:
randomly selecting a target number of clean voice samples from a preset voice database, and selecting a preset type of noise sample from a preset noise set;
and generating a training voice signal at a preset signal-to-noise ratio by combining the clean voice sample and the noise sample.
8. A cochlear implant signal processing apparatus as claimed in any one of claims 5 to 7, wherein the loss function of the envelope extraction network is expressed as:
loss = w_stft * loss_stft + w_env * loss_env + w_waveform * loss_waveform

where loss_stft is the error between the features output by the second deep neural network and the features of the clean speech sample corresponding to the training speech signal; loss_env is the error between the channel envelope features extracted by the third deep neural network and the channel envelope features extracted from the clean speech sample by the conventional CI processing strategy; loss_waveform is the error between a simulated speech signal derived from the channel envelopes extracted by the third deep neural network and the clean speech sample; and w_stft, w_env, and w_waveform are the weights of the corresponding errors.
9. A cochlear implant device, comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the cochlear implant signal processing method according to any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs executable by one or more processors to implement the steps of the cochlear implant signal processing method according to any one of claims 1 to 4.
CN201910999264.5A 2019-10-21 2019-10-21 Method and device for processing cochlear implant signals and computer readable storage medium Active CN110681051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910999264.5A CN110681051B (en) 2019-10-21 2019-10-21 Method and device for processing cochlear implant signals and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110681051A true CN110681051A (en) 2020-01-14
CN110681051B CN110681051B (en) 2023-06-13

Family

ID=69113683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910999264.5A Active CN110681051B (en) 2019-10-21 2019-10-21 Method and device for processing cochlear implant signals and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110681051B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090312820A1 (en) * 2008-06-02 2009-12-17 University Of Washington Enhanced signal processing for cochlear implants
CN101645267A (en) * 2009-04-03 2010-02-10 中国科学院声学研究所 Voice processing method applied in electronic ear
CN101642399A (en) * 2008-12-16 2010-02-10 中国科学院声学研究所 Artificial cochlea speech processing method based on frequency modulation information and artificial cochlea speech processor
CN102314880A (en) * 2010-06-30 2012-01-11 上海视加信息科技有限公司 Coding and synthesizing method for voice elements
CN107767859A (en) * 2017-11-10 2018-03-06 吉林大学 The speaker's property understood detection method of artificial cochlea's signal under noise circumstance
CN109003601A (en) * 2018-08-31 2018-12-14 北京工商大学 A kind of across language end-to-end speech recognition methods for low-resource Tujia language
CN109841220A (en) * 2017-11-24 2019-06-04 深圳市腾讯计算机系统有限公司 Speech processing model training method, device, electronic equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xiaowei et al.: "Research on Environmental Sound Recognition for Cochlear Implants" (《人工耳蜗的环境声识别研究》), Acta Acustica (《声学学报》) *

Also Published As

Publication number Publication date
CN110681051B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
US11961533B2 (en) Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments
EP3469586B1 (en) Recursive noise power estimation with noise model adaptation
CN110111769B (en) Electronic cochlea control method and device, readable storage medium and electronic cochlea
CN105814911B (en) The feedback of energy signal for nerve stimulation gates
CN109414581B (en) Bionic rapid fitting of cochlear implant
US20230226352A1 (en) Neural Network Audio Scene Classifier for Hearing Implants
WO2020087716A1 (en) Auditory scene recognition method for artificial cochlea
CN113924786B (en) Neural network model for cochlear mechanics and processing
CN110681051A (en) Artificial cochlea signal processing method and device and computer readable storage medium
Zheng et al. A noise-robust signal processing strategy for cochlear implants using neural networks
WO2021077247A1 (en) Cochlear implant signal processing method and apparatus, and computer-readable storage medium
Legrand et al. Interactive evolution for cochlear implants fitting
Li Speech perception in a sparse domain
Beeston Perceptual compensation for reverberation in human listeners and machines
Wei et al. Improvement of Cochlear Implant Coding Strategy Based on Chinese Speech Boundary Information
Shahidi et al. Application of a graphical model to investigate the utility of cross-channel information for mitigating reverberation in cochlear implants
Kates Extending the Hearing-Aid Speech Perception Index (HASPI): Keywords, sentences, and context
Tu Data-driven speech intelligibility enhancement and prediction for hearing aids
Gallardo A Framework for the Development and Validation of Phenomenologically Derived Cochlear Implant Stimulation Strategies
Parameswaran Objective assessment of machine learning algorithms for speech enhancement in hearing aids
Soleymani Multi-talker Babble Noise Reduction in Cochlear Implant Devices
Khaleelur Rahiman et al. Design of Low Power Speech Processor-Based Cochlear Implants Using Modified FIR Filter with Variable Frequency Mapping
CN116582807A (en) Hearing compensation method for frequency selective damage of auditory system
Van Zyl Objective determination of vowel intelligibility of a cochlear implant model
Dachasilaruk Wavelet filter banks for cochlear implants

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant