CN101301240A - Electric cochlea Chinese fixed electric stimulation amplitude changing pattern in vitro voice processing equipment - Google Patents


Info

Publication number
CN101301240A
Authority
CN
China
Prior art keywords
voice
module
changing pattern
chinese
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008100673152A
Other languages
Chinese (zh)
Other versions
CN100563608C (en)
Inventor
关添
徐涛
朱子俨
叶大田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Application filed by Shenzhen Graduate School Tsinghua University
Priority to CNB2008100673152A (CN100563608C)
Publication of CN101301240A
Application granted
Publication of CN100563608C
Legal status: Expired - Fee Related

Abstract

An external speech processing device for a cochlear implant, using fixed electrical stimulation amplitude variation patterns for Chinese, comprises an audio amplification and sampling module, a storage module, a digital signal processor and a signal transmission module. The speech signal processing program of the device comprises a preprocessing unit, an endpoint detection unit, a speech recognition unit and a feature coding unit. The feature coding unit has a library of fixed electrical stimulation amplitude variation patterns and a stimulation pattern selection and adjustment module. According to the recognition result for a speech segment, the feature coding unit selects the corresponding amplitude variation pattern from the library and adjusts the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time, finally generating complete electrical stimulation parameters for each stimulation electrode. The device adopts speech recognition that takes the standard Chinese syllable as the recognition unit, and encodes and adjusts the recognition result by means of fixed electrical stimulation amplitude variation patterns, thereby restoring the Chinese speech recognition ability of cochlear implant wearers more effectively.

Description

External speech processing device for a cochlear implant using fixed electrical stimulation amplitude variation patterns for Chinese
Technical field
The present invention relates to the field of cochlear implants for restoring hearing in deaf patients, and in particular to an external speech processing device and method for a cochlear implant that are adapted to the characteristics of the Chinese language and combine fixed electrical stimulation amplitude variation patterns with Chinese speech recognition technology.
Background
The cochlear implant (also called the electronic cochlea, electronic ear, bionic ear or artificial cochlea) is currently the only device that can restore hearing to patients with total deafness. It excites the auditory nerve fibers with weak electric currents, directly imitating the physiological function of the peripheral auditory system and producing a neural firing pattern similar to that of a normal ear, thereby restoring the patient's hearing. A cochlear implant can help patients with total deafness regain the ability to communicate and increases their opportunities for education, employment and social interaction. This matters especially for deaf children: whether they are prelingually or postlingually deaf, if they still cannot hear with a hearing aid they lose the chance of a normal education for life, which places a heavy burden on family and society. A cochlear implant can help them regain hearing, acquire information and knowledge from the outside world, and become useful members of society.
Most external speech processors in current cochlear implant products use DSPs of the Motorola DSP56000 series. DSPs of this series compute relatively slowly, consume considerable power, have too little on-chip RAM, and require complex peripheral interface expansion, so they are poorly suited to an external cochlear implant speech processor that demands high performance and low power consumption. In addition, the speech signal processing methods used in existing products are all algorithms developed for the characteristics of English and mainly fit the Indo-European languages. Chinese belongs to the Sino-Tibetan family, not the Indo-European family, and the two differ greatly. This is one of the reasons why existing speech processing methods give Chinese patients unsatisfactory recognition of Chinese speech. It is therefore especially important to develop a high-performance, low-power external speech processor for cochlear implants, together with a speech processing method that makes full use of the characteristics of Chinese and uses fixed electrical stimulation amplitude variation pattern coding to convey rich information such as the tones of Chinese.
The speech processing methods used by the external processors of existing cochlear implant products fall into two broad classes. The first is based on feature extraction: features such as the fundamental frequency and formants of the speech signal are extracted, and stimulation signals for the corresponding electrodes are then generated. The second is based on filter banks: the speech signal is filtered into separate frequency bands.
Chinese is a monosyllabic language, whereas English is polysyllabic. Taking the Chinese characters collected in the 10th edition of the Xinhua Dictionary as the statistical basis, Chinese has 416 basic syllables (without tone); if tone is taken into account, there are 1345 standard syllables (with tone). It is therefore feasible to recognize Chinese speech using the standard syllable as the recognition unit and to use the resulting high recognition rate to improve the speech recognition ability of cochlear implant wearers. On the other hand, Chinese is a language of initials, finals and tones, and tone is of great importance for understanding Chinese correctly. Performing Chinese speech recognition with tone included as a recognition feature, and then encoding the tone feature through the stimulation rate, therefore also helps improve the speech recognition ability of cochlear implant wearers.
Summary of the invention
The object of the invention is to address the above problems of current cochlear implants by providing an external speech processing device and method for a cochlear implant that use fixed electrical stimulation amplitude variation patterns for Chinese, so as to restore the hearing of Chinese deaf patients more effectively.
The external speech processing device of the present invention comprises:
An audio amplification and sampling module, for converting the acquired speech signal into a digital audio signal;
A storage module, in which a speech signal processing program is stored;
A digital signal processor, connected to the audio amplification and sampling module and to the storage module, which runs the speech signal processing program in the storage module to process the digital audio signal produced by the audio amplification and sampling module, and outputs the corresponding electrical stimulation parameters; and
A signal transmission module, connected to the digital signal processor, for transmitting the electrical stimulation parameters to the implanted part of the associated cochlear implant;
The speech signal processing program comprises:
A preprocessing unit, for sampling the input speech signal and dividing it into frames;
An endpoint detection unit, for obtaining speech segments from the preprocessed frames;
A speech recognition unit, for recognizing the speech segments; and
A feature coding unit, for encoding the speech recognition result as electrical stimulation, which has:
A fixed electrical stimulation amplitude variation pattern library, which stores fixed electrical stimulation amplitude variation patterns in one-to-one correspondence with all standard Chinese syllables including tone information; and
A stimulation pattern selection and adjustment module, which, according to the speech recognition unit's result for a speech segment, selects the corresponding amplitude variation pattern from the fixed pattern library and adjusts the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time according to, respectively, the audible frequency of the initial, the tone information and the duration information of the recognition result, finally generating complete electrical stimulation parameters for each stimulation electrode.
Further, the speech recognition unit uses a speaker-independent, medium-vocabulary continuous speech recognition algorithm based on hidden Markov models (HMM). The speech recognition unit comprises:
A speech feature extraction module, which extracts MFCCs (Mel-frequency cepstral coefficients) and first-order difference MFCCs from a speech segment as the feature vector of that segment;
A vector quantization module, which vector-quantizes the feature vectors extracted from the speech segment using a codebook obtained by training on a speech corpus;
A matching computation module, which matches the quantized feature vectors against entry models obtained by training on the speech corpus and produces a preliminary recognition result; and
A speech understanding and adjustment module, which adjusts the recognition result according to semantics and produces the final recognition result.
The amplitude variation patterns in the fixed electrical stimulation amplitude variation pattern library correspond one-to-one to the 1345 standard syllables of the 10th edition of the Xinhua Dictionary. The stimulation pattern selection and adjustment module first selects the corresponding amplitude variation pattern from the library according to the speech recognition result, then adjusts the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time according to, respectively, the audible frequency of the initial, the tone information and the duration information of the recognition result, and finally generates complete electrical stimulation parameters for each stimulation electrode. The electrical stimulation parameters comprise: a microelectrode channel selection parameter, encoded from the audible frequency of the syllable's initial or of its first vowel; a stimulation rate parameter, encoded from the tone information; a fixed electrical stimulation amplitude variation parameter, encoded from the syllable; and a stimulation time parameter, encoded from the duration information.
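The selection and adjustment step can be pictured with a minimal Python sketch. It only illustrates how the four parameter groups named above might fit together; every name, table entry, band edge and rate value below is a hypothetical placeholder, since the source gives no concrete pattern data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Recognized:
    """Hypothetical recognition result for one toned syllable."""
    syllable: str       # e.g. "ma4" (syllable plus tone number)
    initial_hz: float   # audible frequency of the initial (or first vowel)
    tone: int           # 1..4
    duration_ms: float

@dataclass
class StimulationParams:
    electrode: int            # channel selection (from initial frequency)
    rate_pps: List[float]     # stimulation rate pattern (from tone)
    amplitudes: List[float]   # fixed amplitude variation pattern (from syllable)
    duration_ms: float        # stimulation time (from duration)

# Placeholder library: the real one would hold 1345 entries, one per syllable.
PATTERN_LIBRARY = {"ma1": [0.2, 0.6, 0.9, 0.6], "ma4": [0.9, 0.7, 0.4, 0.1]}
# Placeholder band edges (Hz) mapping frequency to one of four electrodes.
BAND_EDGES_HZ = [250, 500, 1000, 2000, 4000]
# Placeholder relative rate contours per tone, scaled by a base rate.
TONE_CONTOURS = {1: [1.0, 1.0, 1.0], 2: [0.8, 0.9, 1.0],
                 3: [0.9, 0.8, 1.0], 4: [1.0, 0.9, 0.8]}
BASE_RATE_PPS = 800.0

def select_electrode(freq_hz: float) -> int:
    """Pick the first band whose upper edge exceeds the frequency."""
    for ch in range(len(BAND_EDGES_HZ) - 1):
        if freq_hz < BAND_EDGES_HZ[ch + 1]:
            return ch
    return len(BAND_EDGES_HZ) - 2  # clamp to the highest band

def encode(r: Recognized) -> StimulationParams:
    """Select the fixed pattern, then adjust channel, rate and time."""
    return StimulationParams(
        electrode=select_electrode(r.initial_hz),
        rate_pps=[BASE_RATE_PPS * k for k in TONE_CONTOURS[r.tone]],
        amplitudes=list(PATTERN_LIBRARY[r.syllable]),
        duration_ms=r.duration_ms,
    )
```

For example, `encode(Recognized("ma4", 300.0, 4, 250.0))` would look up the stored pattern for "ma4" and attach a falling rate contour for the fourth tone.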
The digital signal processor is a TMS320VC5509A from TI. The audio amplification and sampling module uses the SP0103NC3-3 MEMS silicon microphone from Knowles Electronics (USA) and the WM8950 audio amplification, sampling and filtering chip. The storage module uses the FM25L512 ferroelectric memory, which supports high-speed reads and writes. The signal transmission module uses the AD9833 and ADL5530 chips. The device may also include an organic light-emitting diode (OLED) display for showing the state of each functional module.
The present invention uses a power management module based on the TPS63000, TPS65120, TPS71733 and TPS3103K33DBV to control three button cells and provide operating voltages of +5 V, +3.3 V and +12 V.
The speech processing method of the present invention is based on an in-depth analysis of the characteristics of Chinese and combines syllable-based speech recognition with the generation and adjustment of fixed electrical stimulation amplitude variation patterns. The method is divided into four parts, namely speech signal preprocessing, endpoint detection, speech recognition and feature coding, and specifically comprises the following steps:
A speech signal preprocessing step of sampling the input signal and dividing it into frames;
A step of obtaining speech segments from the preprocessed frames by means of the endpoint detection unit;
A step of recognizing the obtained speech segments by means of the speech recognition unit; and
A step of selecting the corresponding amplitude variation pattern from the fixed electrical stimulation amplitude variation pattern library according to the speech recognition result, adjusting the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time according to, respectively, the audible frequency of the initial, the tone information and the duration information of the recognition result, and generating complete electrical stimulation parameters for each stimulation electrode; these parameters control the implanted part of the cochlear implant so that the user perceives the speech signal.
The present invention adopts speech recognition that takes the standard Chinese syllable as the recognition unit, encodes and adjusts the recognition result by means of fixed electrical stimulation amplitude variation patterns, and implements a high-performance, low-power external cochlear implant speech processor based on the TMS320VC5509A, thereby restoring the Chinese speech recognition ability of cochlear implant wearers more effectively. Its main effects are:
A) It uses relatively mature speaker-independent, medium-vocabulary continuous speech recognition, which combines Mel feature extraction, vector quantization, hidden Markov model pattern matching and semantic understanding and adjustment. Recognition accuracy can reach about 95%, which provides a basis for improving the wearer's speech recognition rate.
B) On the basis of accurate speech recognition, the invention uses fixed electrical stimulation amplitude variation patterns in one-to-one correspondence with the 1345 standard Chinese syllables. This ties the wearer's speech recognition rate under electrical stimulation to the accuracy of the speech recognition result, and so better restores the speech recognition ability of cochlear implant wearers who use Chinese.
C) The electrical stimulation parameters finally generated combine four features of the stimulation: the microelectrode channel selection pattern, the fixed amplitude variation pattern, the stimulation rate variation pattern and the stimulation time. These four parameters are combined according to the frequency analysis characteristics of the cochlea and the syllable frequency distribution of Chinese speech, preserving the timbre of Chinese speech as far as possible and so better restoring the speech recognition ability of cochlear implant wearers who use Chinese.
D) Using the TMS320VC5509A digital signal processor as the core processor increases computing power and reduces system power consumption. Because this chip has a large on-chip RAM, the external RAM expansion module can be omitted, which simplifies the system design.
E) Using the SP0103NC3-3 MEMS (micro-electromechanical system) silicon microphone produced by Knowles Electronics (USA) improves the quality of speech signal acquisition. This silicon microphone is a low-cost, high-performance technology that replaces the traditional electret condenser microphone (ECM). It uses integrated circuit technology to integrate the micromechanical system and the electronic components on the surface of a silicon wafer, combining highly repeatable production, excellent acoustic performance and flexible future scalability, so that audio quality is improved from the moment the speech signal is acquired.
F) Using an audio amplification and sampling module based on the WM8950 chip simplifies the circuit. The WM8950 integrates a differential or single-ended microphone input with a microphone preamplifier (programmable gain), so no external microphone amplifier is needed. Its peripheral components are simple, its operating voltage is flexible (2.5 V to 3.6 V), its signal-to-noise ratio is 95 dB and its harmonic distortion is -85 dB. The module has a programmable high-pass filter and an IIR filter to remove high-frequency noise interference.
G) Using the FM25L512 ferroelectric memory, which supports high-speed reads and writes, improves performance and reduces cost. This chip is a 512 Kb non-volatile FRAM with an industry-standard SPI interface. It can directly replace corresponding EEPROM and flash devices with better performance, performs reads and writes without delay at bus speeds of up to 20 MHz, and offers 10-year data retention, almost unlimited read/write endurance and very low operating current. It improves data acquisition and storage capability, allows flexible partitioning between storage and RAM space, and reduces application cost and PCB area.
H) Using an organic light-emitting diode (OLED) display improves performance and reduces energy consumption. Unlike a conventional liquid crystal display, an OLED needs no backlight, so the display can be made lighter and thinner, has a wider viewing angle and saves a significant amount of power.
Description of the drawings
Fig. 1 is a structural block diagram of the external speech processing device of this embodiment;
Fig. 2 is a structural block diagram of its speech processing method;
Fig. 3 is a schematic diagram of the stimulation amplitude distribution of the fixed electrical stimulation amplitude variation patterns for the Chinese syllables "ā, á, ǎ, à".
Detailed description of the embodiments
Embodiments of the present invention are described below with reference to the drawings.
As shown in Fig. 1, the external speech processing device consists of a power management module 3, an extended storage module 5, a digital signal processor (DSP) 1, an audio amplification and sampling module 4, a signal transmission module 6, a display module 2, a data interface 7 and so on. The speech signal is picked up by the silicon microphone and fed to the audio amplification and sampling module 4 based on the WM8950 chip. The resulting digital audio signal is passed to the TMS320VC5509A digital signal processor 1. Processor 1 reads the speech signal processing program from the extended storage module 5, which is based on the FM25L512, and processes the digital audio signal. The result is output to the signal transmission module 6, based on the AD9833 and ADL5530, and transmitted through an antenna to the implanted processing part. Power to all functional modules of the device is controlled by the power management module 3, based on the TPS63000, TPS65120, TPS71733 and TPS3103K33DBV. The state of each functional module is shown on the 1.04-inch OLED display module 2, based on the SSD1332.
The individual modules are described in turn below.
The digital signal processor 1 is TI's low-power TMS320VC5509A, which increases computing power and reduces system power consumption. The processor runs at up to 200 MHz and is well suited to the requirements of portable devices. Because the chip has a large on-chip RAM, no memory expansion circuit needs to be designed. The chip also integrates a rich set of peripheral interfaces and can connect seamlessly to many peripheral devices, which saves peripheral circuit design and simplifies the system structure.
The audio amplification and sampling module 4 uses the SP0103NC3-3 MEMS (micro-electromechanical system) silicon microphone produced by Knowles Electronics LLC (USA) and the WM8950 audio amplification, sampling and filtering chip. The SP0103NC3-3 silicon microphone is a low-cost, high-performance technology that replaces the traditional electret condenser microphone (ECM). It uses a low-cost package in which the MEMS sensor, a CMOS LSI, passive components, and the base plate, top plate and the posts between them are packaged together using FR-4 epoxy material. Integrated circuit technology integrates the micromechanical system and the electronic components on the surface of a silicon wafer, combining highly repeatable production, excellent acoustic performance and flexible future scalability, so that audio quality is improved from the moment the speech signal is acquired. The audio signal picked up by the silicon microphone is further processed and digitized by the audio amplification and sampling module based on the WM8950 chip. The WM8950 is a low-power, high-quality mono ADC that uses sigma-delta conversion, with a sampling rate adjustable from 8 kHz to 48 kHz. The chip also integrates a differential or single-ended microphone input with a microphone preamplifier (programmable gain), so no external microphone amplifier is needed. Its peripheral components are simple, its operating voltage is flexible (2.5 V to 3.6 V), its digital supply can be as low as 1.71 V, and its on-chip PLL derives the required master clock from an external reference clock. The chip's signal-to-noise ratio is 95 dB and its harmonic distortion is -85 dB. The module has a programmable high-pass filter and an IIR filter to remove high-frequency noise interference. The chip comes in a 4 × 4 mm, 24-pin QFN package, which saves board space. The WM8950 is connected to the DSP through the I2C interface and the McBSP0 interface for the transfer of control signals and data.
The extended storage module 5 uses the FM25L512 ferroelectric memory, which supports high-speed reads and writes. This chip is a 512 Kb non-volatile FRAM with an industry-standard SPI interface. It can directly replace corresponding EEPROM and flash devices with better performance, performs reads and writes without delay at bus speeds of up to 20 MHz, and offers 10-year data retention, almost unlimited read/write endurance and very low operating current. It improves data acquisition and storage capability, allows flexible partitioning between storage and RAM space, and reduces application cost and PCB area. The FM25L512 is connected to the digital signal processor 1 through data lines, address lines and control lines such as chip select.
The signal transmission module 6 uses the AD9833 and ADL5530 chips. The AD9833 is a low-power programmable waveform generator from ADI that can output sine, triangle and square waves. It needs no external components, and its output frequency and phase are easily adjusted in software. Its frequency registers are 28 bits wide: with a 25 MHz master clock the frequency resolution is 0.1 Hz, and with a 1 MHz master clock it reaches 0.004 Hz. In the present invention the AD9833 generates a 10 MHz sine wave as the carrier for wireless transmission; after ASK modulation the carrier is amplified by the ADL5530. The ADL5530 is an intermediate-frequency amplifier from ADI designed for the most common intermediate frequencies, such as 70 MHz, 140 MHz, 190 MHz, 240 MHz and 380 MHz. It provides a high linearity of 41 dBm, keeps its noise figure at 2.5 dB for optimum signal dynamic range, has a built-in bias circuit, and supports 1 kV Class 1C ESD.
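The resolution figures quoted for the AD9833 follow from its 28-bit frequency register: the output frequency is the register value times the master clock divided by 2^28. The short Python sketch below reproduces that arithmetic and computes the register word for a 10 MHz carrier. The 25 MHz master clock used for the carrier is an assumption, since the source does not say which clock drives the chip, and the function names are illustrative only.

```python
FREQ_REG_BITS = 28  # width of the AD9833 frequency register (from the text)

def resolution_hz(mclk_hz: float) -> float:
    """Smallest frequency step: one register LSB."""
    return mclk_hz / 2 ** FREQ_REG_BITS

def freq_word(f_out_hz: float, mclk_hz: float) -> int:
    """Nearest 28-bit register value for the requested output frequency."""
    return round(f_out_hz * 2 ** FREQ_REG_BITS / mclk_hz)

def split_word(word: int):
    """Split the 28-bit word into its 14 least and 14 most significant bits."""
    return word & 0x3FFF, (word >> 14) & 0x3FFF

# Assumed master clock for the 10 MHz carrier (not stated in the source).
MCLK_HZ = 25_000_000
carrier_word = freq_word(10_000_000, MCLK_HZ)
lsb14, msb14 = split_word(carrier_word)
actual_hz = carrier_word * MCLK_HZ / 2 ** FREQ_REG_BITS
```

With these definitions, `resolution_hz(25e6)` is about 0.093 Hz and `resolution_hz(1e6)` about 0.0037 Hz, which round to the 0.1 Hz and 0.004 Hz given in the text.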
The display module 2 is based on an organic light-emitting diode (OLED) display. The OLED is the NVK-064SC001F-S produced by Kolon (Korea), with a resolution of 96 × 64 and 65K colors. Unlike a conventional liquid crystal display, an OLED needs no backlight, so the display can be made lighter and thinner, has a wider viewing angle and saves a significant amount of power. The OLED panel is controlled by the SSD1332 driver chip. The SSD1332 is a CMOS passive-matrix OLED current driver with a programmable refresh rate, 16-step master current control, 256-step contrast control and 65K colors. It has a built-in graphic display data RAM (GDDRAM) of 96 × 64 × 16 bits and supports a display resolution of 96 × 64. The driver chip is connected to the DSP through the external memory interface (EMIF). Data on the 8-bit data bus are treated as image data and written to or read from the GDDRAM according to the WR and RD strobes. When D/C is low, the output of the P3 port is treated as a command, decoded by the command decoder and written to the corresponding command register, which controls the display timing generator and the row and column driver modules and thereby the state of the display. A built-in oscillator provides the clock for the display timing generator and determines when the scan signals, drive signals, line synchronization signal and field synchronization signal are generated. The gray-scale decoder determines the drive current for the R, G and B primaries of each pixel from the image data and sends it to the column driver, which produces a drive current of the corresponding size. The row scanner generates the voltage scan signals for the display rows, and the column driver provides 96 × 3 (RGB) current channels to drive the OLED panel, with a drive current adjustable in 256 contrast steps from 0 to 200 µA.
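As a quick check of the memory figures, 96 × 64 pixels at 16 bits per pixel gives the GDDRAM size, and 16 bits per pixel gives the 65K (2^16) colors. The sketch below works this out in Python. The 5-6-5 split of the 16 bits into red, green and blue and the row-major pixel order are assumptions for illustration; the source does not state the pixel format.

```python
WIDTH, HEIGHT, BITS_PER_PIXEL = 96, 64, 16  # from the text

GDDRAM_BITS = WIDTH * HEIGHT * BITS_PER_PIXEL
GDDRAM_BYTES = GDDRAM_BITS // 8
COLOR_COUNT = 2 ** BITS_PER_PIXEL  # the "65K colors"

def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into one 16-bit word (assumed 5-6-5 layout)."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def pixel_offset(x: int, y: int) -> int:
    """Byte offset of pixel (x, y), assuming row-major storage."""
    return (y * WIDTH + x) * (BITS_PER_PIXEL // 8)
```

Each pixel word would be sent over the 8-bit data bus as two bytes, so a full-screen refresh transfers `GDDRAM_BYTES` bytes.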
The power management module 3 is based on the TPS63000, TPS65120, TPS71733 and TPS3103K33DBV. Power is supplied by three zinc-air button cells. The TPS63000 holds the output voltage at 5 V through a buck-boost circuit; this chip works efficiently over the full discharge voltage range of the batteries, which greatly extends battery life. The +5 V rail is converted to a low-ripple +3.3 V by the TPS71733 low-dropout regulator. The TPS65120 provides the +12 V and +3.3 V required by the OLED display module. The TPS3103K33DBV monitors and controls the whole power management module.
The data interface 7 uses the DSP's built-in USB and JTAG interfaces to communicate with a PC.
Referring to Fig. 2, the speech processing program used by the present invention comprises:
A preprocessing unit S1, for sampling the input speech signal and dividing it into frames;
An endpoint detection unit S2, for obtaining speech segments from the preprocessed frames;
A speech recognition unit S3, for recognizing the speech segments; and
A feature coding unit S4, for encoding the speech recognition result as electrical stimulation, which has:
A fixed electrical stimulation amplitude variation pattern library S42, which stores fixed electrical stimulation amplitude variation patterns in one-to-one correspondence with all standard Chinese syllables including tone information; and
A stimulation pattern selection and adjustment module S41, which, according to the speech recognition unit's result for a speech segment, selects the corresponding amplitude variation pattern from the fixed pattern library and adjusts the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time according to, respectively, the audible frequency of the initial, the tone information and the duration information of the recognition result, finally generating complete electrical stimulation parameters for each stimulation electrode.
The speech recognition unit S3 uses a speaker-independent, medium-vocabulary continuous speech recognition algorithm based on hidden Markov models (HMM). It comprises:
A speech feature extraction module S31, which extracts MFCCs (Mel-frequency cepstral coefficients) and first-order difference MFCCs from a speech segment as the feature vector of that segment;
A vector quantization module S32, which vector-quantizes the feature vectors extracted from the speech segment using the codebook S35 obtained by training on a speech corpus;
A matching computation module S33, which matches the quantized feature vectors against the entry models S36 obtained by training on the speech corpus and produces a preliminary recognition result; and
A speech understanding and adjustment module S34, which adjusts the recognition result according to semantics and produces the final recognition result.
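The interplay of the vector quantization and matching modules can be sketched as follows. This is a simplified, isolated-unit illustration rather than the continuous recognizer described above: each feature vector is mapped to its nearest codeword, and the resulting symbol sequence is scored against each discrete HMM with the forward algorithm. The function names and the model layout (start, transition and emission probability tables) are assumptions made for this example; semantic adjustment is not modelled.

```python
import math

def quantize(vec, codebook):
    """Index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])))

def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM via the forward algorithm.

    No scaling is applied, so very long sequences would underflow;
    a practical implementation would work in the log domain.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    total = sum(alpha)
    return math.log(total) if total > 0 else float("-inf")

def recognize(features, codebook, models):
    """Return the name of the model that best explains the features."""
    obs = [quantize(v, codebook) for v in features]
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Toy data: a two-word codebook and two left-to-right, two-state models.
CODEBOOK = [[0.0, 0.0], [1.0, 1.0]]
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}
```

In this toy setup, a sequence that starts near the first codeword and ends near the second is better explained by model "a" than by model "b".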
These units are described in further detail below.
The preprocessing unit S1 performs sampling, A/D conversion and framing of the speech signal. Sampling uses an A/D converter at a sampling rate of 16 kHz. Framing divides the audio into processing units so that the speech signal is approximately stationary within each frame. To reflect the correlation between adjacent frames, the invention uses overlapping frames with a frame shift of half the frame length. For computational convenience, the frame length is 512 samples (32 ms), the frame shift is 256 samples, and samples are quantized to 16 bits.
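The framing parameters above translate directly into code. The following minimal Python sketch splits a sampled signal into 512-sample frames with a 256-sample shift; dropping a trailing partial frame is an assumption of this example, as the source does not say how the end of the signal is handled.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate from the text
FRAME_LEN = 512          # samples per frame
FRAME_SHIFT = 256        # half the frame length, so adjacent frames overlap by 50%

FRAME_MS = 1000 * FRAME_LEN / SAMPLE_RATE_HZ  # duration of one frame

def split_frames(samples):
    """Return overlapping frames; a trailing partial frame is dropped."""
    return [samples[start:start + FRAME_LEN]
            for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_SHIFT)]

# One second of dummy samples, just to show the frame bookkeeping.
one_second = list(range(SAMPLE_RATE_HZ))
frames = split_frames(one_second)
```

Each frame shares its second half with the first half of the next frame, which is how the correlation between adjacent frames is preserved.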
Endpoint detection of a speech signal means applying digital processing techniques to the input speech signal in order to locate the start point and end point of each speech segment accurately. In speech recognition, correctly determining the start and end of the speech to be recognized is very important for improving the recognition rate. In a cochlear implant speech processing method, accurate endpoint detection on the one hand reduces the computation time of the system (fewer frames are processed) and so improves efficiency; on the other hand it excludes noise interference from silent segments, which improves the performance of subsequent processing. The endpoint detection unit S2 of the present invention uses an endpoint detection technique based on a cepstral distance threshold. The cepstral distance method detects speech from the trajectory of the cepstral distance between each signal frame and the noise frames. Like the energy method, it uses a threshold decision, but the threshold is a cepstral distance threshold rather than a short-time energy threshold. The computation is as follows:

1) Compute the cepstral coefficients of the background noise and average them; the average serves as the estimate of the background-noise cepstral coefficients and is denoted by the vector C.

2) Compute the cepstral coefficients of each signal frame, then compute the cepstral distance between them and the noise cepstral estimate:

$$d_{\mathrm{cep}} = 4.3429\sqrt{(c_0 - c'_0)^2 + 2\sum_{n=1}^{p}(c_n - c'_n)^2}$$

where $c_n$ are the cepstral coefficients of the current frame, $c'_n$ are the corresponding cepstral coefficients of C, and p is the order of the cepstral coefficients.

3) The per-frame cepstral distances from step 2) form a cepstral distance trajectory. A threshold decision similar to that of the energy method is then applied to separate speech segments from noise segments, which yields the endpoints of the speech signal.
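The three steps can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: it assumes the cepstral coefficients of each frame have already been computed by some other routine, it uses the square-root form of the cepstral distance, and it applies a single threshold whose value the caller must supply (the text does not give one). The function names are hypothetical.

```python
import math


def cepstral_distance(c, c_ref):
    """4.3429 * sqrt((c0 - c0')^2 + 2 * sum_{n>=1} (cn - cn')^2)."""
    head = (c[0] - c_ref[0]) ** 2
    tail = sum((a - b) ** 2 for a, b in zip(c[1:], c_ref[1:]))
    return 4.3429 * math.sqrt(head + 2.0 * tail)


def mean_cepstrum(noise_cepstra):
    """Step 1: average the cepstra of noise-only frames to get the vector C."""
    count = len(noise_cepstra)
    return [sum(column) / count for column in zip(*noise_cepstra)]


def find_endpoints(cepstra, noise_cepstrum, threshold):
    """Steps 2 and 3: build the distance track and threshold it.

    Returns (first, last) indices of frames above the threshold, or None.
    Assumption: one fixed threshold; a practical detector may use two
    thresholds and minimum-duration rules.
    """
    track = [cepstral_distance(c, noise_cepstrum) for c in cepstra]
    speech = [i for i, d in enumerate(track) if d > threshold]
    if not speech:
        return None
    return speech[0], speech[-1]
```

A caller would typically estimate C from the first few frames of a recording, which are assumed here to contain only background noise.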
Speech recognition is one of the key technologies of the present invention. Speech recognition takes speech as its object of study; it is an important research direction in multimedia audio technology and a branch of pattern recognition, drawing on physiology, psychology, linguistics, computer science, signal processing and other fields. The present invention uses speaker-independent, medium-vocabulary continuous speech recognition, which mainly comprises speech feature extraction, vector quantization, matching computation, and a semantic understanding and adjustment module. Feature extraction and selection must balance storage limits against recognition performance. The invention uses Mel-frequency cepstral coefficients (MFCC), a Mel-scale parameter that models to some extent how the human ear processes speech, together with their first-order differences as the extracted speech features. Pattern matching is computed with hidden Markov models (HMM), and the final recognition result is obtained after understanding and adjustment based on Chinese semantics. The speech features are extracted as follows:

1) Apply a 512-point discrete Fourier transform (DFT) to the speech frame. The spectrum of the frame is:

$$S(k, m) = \sum_{n=0}^{511} s(n, m)\exp\left(-j\frac{2\pi nk}{512}\right)$$

Taking the squared modulus of the spectrum gives the discrete power spectrum.

2) Filter the discrete power spectrum with a bank of triangular filters to obtain a set of coefficients. Each filter is a simple triangle in the frequency domain, and the filters are evenly distributed on the Mel frequency axis. Together they generally cover the range from 0 Hz to half the sampling frequency.

3) Obtain the cepstral coefficients with the discrete cosine transform (DCT):

$$C_i = \sqrt{\frac{2}{p}}\sum_{j=1}^{p} m_j \cos\left[\frac{\pi i}{p}(j - 0.5)\right]$$

4) Compute the first-order difference of the standard MFCC:

$$d(n) = \frac{1}{\sum_{i=-2}^{2} i^2}\sum_{i=-2}^{2} i \times c(n+i)$$

The invention uses 12-dimensional MFCC and 12-dimensional first-order difference MFCC as the feature vector.
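The four extraction steps can be sketched in plain Python as below. Several details are not given in the text and are assumptions made only for this sketch: the number of triangular filters (24), the Hz-to-Mel mapping 2595·log10(1 + f/700), taking the natural logarithm of the filter outputs to obtain m_j, using i = 1..12 for the 12 coefficients, and clamping frame indices at the sequence edges when computing the difference. The DFT is written as a direct sum to mirror the formula; a real system would use an FFT.

```python
import cmath
import math


def power_spectrum(frame):
    """Step 1: |S(k)|^2 for k = 0..N/2 via the direct DFT sum (O(N^2))."""
    n_fft = len(frame)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        s_k = sum(x * cmath.exp(-2j * math.pi * n * k / n_fft)
                  for n, x in enumerate(frame))
        spectrum.append(abs(s_k) ** 2)
    return spectrum


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # assumed Mel mapping


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Step 2: triangles evenly spaced on the Mel axis, 0 Hz to sample_rate / 2."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]  # fractional DFT bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=16000, n_filters=24, n_ceps=12):
    """Steps 1 to 3: 12 cepstral coefficients C_1..C_12 for one frame."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # m_j: log filter-bank energies; the log and the 1e-10 floor are assumptions.
    m = [math.log(max(sum(w * s for w, s in zip(row, spectrum)), 1e-10))
         for row in bank]
    p = n_filters
    return [math.sqrt(2.0 / p) * sum(m[j - 1] * math.cos(math.pi * i / p * (j - 0.5))
                                     for j in range(1, p + 1))
            for i in range(1, n_ceps + 1)]


def delta(ceps_seq, n):
    """Step 4: first-order difference at frame n over a window of +/- 2 frames."""
    last = len(ceps_seq) - 1
    norm = sum(i * i for i in range(-2, 3))  # equals 10
    return [sum(i * ceps_seq[min(max(n + i, 0), last)][d] for i in range(-2, 3)) / norm
            for d in range(len(ceps_seq[0]))]
```

Concatenating `mfcc(frame)` with the corresponding `delta(...)` output gives a 24-dimensional feature vector per frame, matching the dimensions stated in the text.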
The invention uses a codebook of size 128. Each extracted feature vector is vector-quantized (VQ) against this codebook, so that the probability distribution of the feature vectors reduces to a discrete probability distribution matrix. Model matching is then computed against the entry (word) models obtained from database training to give a preliminary recognition result, which is adjusted according to semantics to yield the final speech recognition result.
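As a rough illustration of the vector quantization step, the sketch below maps each feature vector to the index of its nearest codeword. The codebook size of 128 comes from the text; the squared Euclidean distance and the function names are assumptions, and the codebook itself would come from training on a speech database, which is not shown here.

```python
def quantize(vector, codebook):
    """Return the index of the codeword nearest to vector.

    Assumption: nearness is measured by squared Euclidean distance.
    """
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_sequence(vectors, codebook):
    """Turn a sequence of feature vectors into a sequence of codeword indices."""
    return [quantize(v, codebook) for v in vectors]
```

The resulting index sequence is the discrete observation sequence that a discrete HMM would then score against each trained model.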
Once the speech recognition result is determined, the amplitude variation pattern corresponding to the recognition result is selected from the fixed electrical stimulation amplitude variation pattern library S42. The stimulation patterns in this library are fixed stimulation amplitude variation patterns corresponding to the 1345 standard Chinese syllables (including tone). The syllable inventory follows the basic syllables of the Xinhua Dictionary, 10th edition, which total 1345 standard syllables (including tone); the fixed electrical stimulation amplitude variation patterns of these 1345 syllables were obtained through electrical hearing experiments.
The electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time are then adjusted according to, respectively, the initial-consonant audible frequency, the tone information and the duration information of the recognition result, finally generating the complete electrical stimulation parameters for each stimulating electrode. Specifically:
The cochlear implant microelectrode channel selection pattern is encoded according to the audible frequency of the syllable's initial consonant in the recognition result. Following the place-pitch principle of electrical hearing and with reference to a syllable frequency table of Chinese speech, the 1345 standard syllables (including tone) are divided into 8 groups by the audible frequency of their initial consonant. Syllables without an initial consonant, such as "an" and "ou", are grouped by the audible frequency of their first vowel. The 8 groups correspond to 8 cochlear implant electrode channel selection patterns, as shown in Table 1. Each pattern stimulates 8 channels simultaneously, which improves the implant user's perception and discrimination rates and reduces missed and misheard sounds caused by individual differences.
[Table 1: electrode channel selection patterns for the 8 syllable groups; image A20081006731500141, not reproduced]
The stimulation rate variation pattern is encoded according to the tone information of the recognition result. The method is as follows. Based on the rate-pitch principle of electrical hearing and on the range of electrical stimulation rates that tests show the patient can perceive, five different stimulation rates are chosen within that range and labeled "fast", "fairly fast", "medium", "fairly slow" and "slow". The stimulation duration of each syllable is divided into five time segments, and the tone of the syllable is conveyed by how the stimulation rate changes across the five segments. The four tones of Chinese syllables correspond to four different stimulation rate variation patterns, listed in Table 2. The stimulation rate variation pattern of each syllable is set from Table 2 according to the tone of the recognition result: for the first tone the pattern is "fast, fast, fast, fast, fast"; for the second tone it is "medium, medium, fairly fast, fast, fast"; for the third tone it is "fairly slow, slow, fairly slow, medium, fairly fast"; and for the fourth tone it is "fast, fairly fast, medium, fairly slow, slow".
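The tone-to-rate mapping stated in this paragraph can be written as a small lookup table. The four five-step patterns are taken from the text; the function name and the idea of passing in a per-patient table of numeric rates are assumptions for illustration, since the text only says the five rates are chosen within each patient's perceivable range.

```python
# Five-segment rate patterns for the four Mandarin tones, as listed in the text.
TONE_RATE_PATTERNS = {
    1: ("fast", "fast", "fast", "fast", "fast"),
    2: ("medium", "medium", "fairly fast", "fast", "fast"),
    3: ("fairly slow", "slow", "fairly slow", "medium", "fairly fast"),
    4: ("fast", "fairly fast", "medium", "fairly slow", "slow"),
}


def rate_pattern_for_tone(tone, patient_rates):
    """Return the five stimulation rates for a tone (1 to 4).

    patient_rates maps each of the five labels to a numeric rate; its
    values are patient-specific and hypothetical in any example.
    """
    return [patient_rates[label] for label in TONE_RATE_PATTERNS[tone]]
```

For example, with purely illustrative rates of 200, 400, 600, 800 and 1000 pulses per second for "slow" through "fast", the fourth tone maps to a falling sequence of rates and the second tone to a rising one.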
The present invention defines the four tones of Chinese with four different stimulation rate variation patterns and combines them with features such as the stimulation amplitude to form the electrical stimulation pattern. This can improve the implant user's perception and recognition of tone information, and thus the ability to recognize Chinese speech.
[Table 2: stimulation rate variation patterns for the four tones; image A20081006731500151, not reproduced]
The stimulation time is encoded according to the duration information of the recognition result. The duration of a fixed electrical stimulation amplitude variation pattern is variable: it matches the actual duration of the recognized syllable, and so conveys duration information to the implant user. After the stimulation for each syllable ends, there is a silent interval of fixed length, which helps the implant user separate the stimulation patterns of consecutive syllables and improves the speech recognition rate.
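A minimal sketch of this timing scheme is shown below. It assumes that the five time segments of each syllable are of equal length and that the fixed silent interval is supplied by the caller; neither value is specified in the text, and the function names are hypothetical.

```python
def schedule_syllable(start_ms, duration_ms, gap_ms):
    """Return (segments, next_start_ms) for one syllable.

    segments holds five (begin, end) pairs in milliseconds.
    Assumption: the five segments have equal length.
    """
    step = duration_ms / 5.0
    segments = [(start_ms + i * step, start_ms + (i + 1) * step) for i in range(5)]
    return segments, start_ms + duration_ms + gap_ms


def schedule_utterance(durations_ms, gap_ms):
    """Lay out consecutive syllables, each followed by the fixed silent gap."""
    t, plan = 0.0, []
    for duration in durations_ms:
        segments, t = schedule_syllable(t, duration, gap_ms)
        plan.append(segments)
    return plan
```

Each of the five segments of a syllable would then be stimulated at the rate chosen for that segment by the tone pattern, with no stimulation during the gap.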
Fig. 3 is a schematic of the stimulation amplitude distribution of the fixed electrical stimulation amplitude variation patterns for the Chinese syllables "ā, á, ǎ, à". The vertical axis represents the selected channels, the horizontal axis represents the time course of the four syllables' fixed stimulation patterns, and the gray level represents the stimulation amplitude.
The present invention combines speech recognition that uses the standard syllable as the recognition unit with fixed electrical stimulation amplitude variation patterns, which are used to encode and adjust the recognition result as electrical stimulation. This refines a cochlear implant speech processing method suited to the characteristics of the Chinese language and generates stimulation current parameters consistent with that method, laying a foundation for better restoring the hearing of patients in China.

Claims (10)

1. An external speech processing device for a cochlear implant using fixed electrical stimulation amplitude variation patterns for Chinese, characterized by comprising:
an audio amplification and sampling module, for converting the collected sound signal into a digital audio signal;
a memory module, which stores a speech signal processing program;
a digital signal processor, connected to the audio amplification and sampling module and to the memory module, which runs the speech signal processing program in the memory module to process the digital audio signal produced by the audio amplification and sampling module and outputs the corresponding electrical stimulation parameters; and
a signal transmission module, connected to the digital signal processor, for transmitting the electrical stimulation parameters to the implanted part of the associated cochlear implant;
the speech signal processing program comprising:
a preprocessing unit, for sampling and framing the input speech signal;
an endpoint detection unit, for obtaining speech segments from the preprocessed frames;
a speech recognition unit, for recognizing the speech segments; and
a feature coding unit, for encoding the speech recognition result as electrical stimulation, which has:
a fixed electrical stimulation amplitude variation pattern library, which stores fixed electrical stimulation amplitude variation patterns in one-to-one correspondence with all standard Chinese syllables including tone information; and
a stimulation pattern selection and adjustment module, which, according to the speech recognition unit's recognition result for a speech segment, selects the corresponding electrical stimulation amplitude variation pattern from the fixed electrical stimulation amplitude variation pattern library, and adjusts the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time according to, respectively, the initial-consonant audible frequency, the tone information and the duration information of the recognition result, finally generating the complete electrical stimulation parameters for each stimulating electrode.
2. The external speech processing device according to claim 1, characterized in that: the endpoint detection unit uses an endpoint detection algorithm based on a cepstral distance threshold.
3. The external speech processing device according to claim 1, characterized in that: the speech recognition unit uses a speaker-independent, medium-vocabulary continuous speech recognition algorithm based on hidden Markov models.
4. The external speech processing device according to claim 3, characterized in that the speech recognition unit comprises:
a speech feature extraction module, for extracting MFCC and first-order difference MFCC from a speech segment as the feature vector of that segment;
a vector quantization module, which vector-quantizes the feature vectors extracted from the speech segment against a codebook obtained by training on a speech database;
a matching computation module, which matches the quantized feature vectors against entry models obtained by training on the speech database and produces a preliminary speech recognition result; and
a speech understanding and adjustment module, which adjusts the recognition result according to semantics to produce the final recognition result.
5. The external speech processing device according to any one of claims 1 to 4, characterized in that: the digital signal processor is a TI TMS320VC5509A digital signal processor; the audio amplification and acquisition module uses the SP0103NC3-3 MEMS silicon microphone from Knowles Electronics (USA) and the WM8950 audio amplification, acquisition and filtering chip; the memory module uses the FM25L512 ferroelectric memory, which supports high-speed reads and writes; and the signal transmission module uses the AD9833 chip and the ADL5530 chip.
6. The external speech processing device according to claim 5, characterized in that: it further comprises an OLED display for showing the state of each functional module.
7. The external speech processing device according to claim 5, characterized in that: a power management module based on the TPS63000, TPS65120, TPS71733 and TPS3103K33DBV controls three button cells to provide operating voltages of +5 V, +3.3 V and +12 V.
8. An external speech processing method for a cochlear implant using fixed electrical stimulation amplitude variation patterns for Chinese, characterized by comprising the following steps:
a preprocessing step of sampling and framing the input speech signal;
a step of obtaining speech segments from the preprocessed frames by an endpoint detection unit;
a step of recognizing the obtained speech segments by a speech recognition unit; and
a step of selecting, according to the speech recognition result, the corresponding electrical stimulation amplitude variation pattern from the fixed electrical stimulation amplitude variation pattern library, adjusting the electrode channel selection pattern, the stimulation rate variation pattern and the stimulation time according to, respectively, the initial-consonant audible frequency, the tone information and the duration information of the recognition result, and generating the complete electrical stimulation parameters for each stimulating electrode, the electrical stimulation parameters being used to control the implanted part of the cochlear implant so that the user perceives the speech signal.
9. The speech processing method according to claim 8, characterized in that the stimulation rate variation pattern is adjusted according to the tone information of the speech recognition result as follows:
based on the rate-pitch principle of electrical hearing and on the range of electrical stimulation rates that tests show the patient can perceive, five different stimulation rates are chosen within that range and labeled "fast", "fairly fast", "medium", "fairly slow" and "slow";
the stimulation duration of each syllable is divided into five time segments, and the tone of the syllable is conveyed by how the stimulation rate changes across the five segments; the four tones of Chinese syllables correspond to four different stimulation rate variation patterns, listed in Table 2;
[Table 2: stimulation rate variation patterns for the four tones; image A2008100673150004C1, not reproduced]
the stimulation rate variation pattern of each syllable is set from Table 2 according to the tone information of the speech recognition result.
10. The speech processing method according to claim 8, characterized in that: the electrical stimulation amplitude variation patterns in the fixed electrical stimulation amplitude variation pattern library correspond one-to-one with the 1345 standard syllables of the Xinhua Dictionary, 10th edition.
CNB2008100673152A 2008-05-21 2008-05-21 The electric cochlea Chinese fixed electric stimulation amplitude changing pattern in-vitro voice processing unit Expired - Fee Related CN100563608C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2008100673152A CN100563608C (en) 2008-05-21 2008-05-21 The electric cochlea Chinese fixed electric stimulation amplitude changing pattern in-vitro voice processing unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2008100673152A CN100563608C (en) 2008-05-21 2008-05-21 The electric cochlea Chinese fixed electric stimulation amplitude changing pattern in-vitro voice processing unit

Publications (2)

Publication Number Publication Date
CN101301240A true CN101301240A (en) 2008-11-12
CN100563608C CN100563608C (en) 2009-12-02

Family

ID=40111465

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2008100673152A Expired - Fee Related CN100563608C (en) 2008-05-21 2008-05-21 The electric cochlea Chinese fixed electric stimulation amplitude changing pattern in-vitro voice processing unit

Country Status (1)

Country Link
CN (1) CN100563608C (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101569777A (en) * 2009-06-09 2009-11-04 钱本文 Electrostimulation system
CN103035237A (en) * 2011-09-30 2013-04-10 西门子公司 Chinese speech signal processing method, device and hearing aid device
CN104038864A (en) * 2013-03-08 2014-09-10 亚德诺半导体股份有限公司 Microphone Circuit Assembly And System With Speech Recognition
CN104835495A (en) * 2015-05-30 2015-08-12 宁波摩米创新工场电子科技有限公司 High-definition voice recognition system based on low pass filter
CN104954532A (en) * 2015-06-19 2015-09-30 深圳天珑无线科技有限公司 Voice recognition method, voice recognition device and mobile terminal
CN104606762B (en) * 2015-01-30 2017-11-28 上海泰亿格康复医疗科技股份有限公司 A kind of sense of hearing integration training aids based on digital information processing system
CN108280188A (en) * 2018-01-24 2018-07-13 成都安信思远信息技术有限公司 Intelligence inspection business platform based on big data
CN109344099A (en) * 2018-08-03 2019-02-15 清华大学 FPGA application system wirelessly debugs download apparatus
CN111150934A (en) * 2019-12-27 2020-05-15 重庆大学 Evaluation system of Chinese tone coding strategy of cochlear implant

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101569777A (en) * 2009-06-09 2009-11-04 钱本文 Electrostimulation system
CN103035237A (en) * 2011-09-30 2013-04-10 西门子公司 Chinese speech signal processing method, device and hearing aid device
CN103035237B (en) * 2011-09-30 2015-04-29 西门子公司 Chinese speech signal processing method, device and hearing aid device
CN104038864A (en) * 2013-03-08 2014-09-10 亚德诺半导体股份有限公司 Microphone Circuit Assembly And System With Speech Recognition
CN104038864B (en) * 2013-03-08 2018-04-10 亚德诺半导体股份有限公司 Microphone circuit assembly and system with speech recognition
CN104606762B (en) * 2015-01-30 2017-11-28 上海泰亿格康复医疗科技股份有限公司 A kind of sense of hearing integration training aids based on digital information processing system
CN104835495A (en) * 2015-05-30 2015-08-12 宁波摩米创新工场电子科技有限公司 High-definition voice recognition system based on low pass filter
CN104835495B (en) * 2015-05-30 2018-05-08 宁波摩米创新工场电子科技有限公司 A kind of high definition speech recognition system based on low-pass filtering
CN104954532A (en) * 2015-06-19 2015-09-30 深圳天珑无线科技有限公司 Voice recognition method, voice recognition device and mobile terminal
CN108280188A (en) * 2018-01-24 2018-07-13 成都安信思远信息技术有限公司 Intelligence inspection business platform based on big data
CN109344099A (en) * 2018-08-03 2019-02-15 清华大学 FPGA application system wirelessly debugs download apparatus
CN109344099B (en) * 2018-08-03 2020-06-19 清华大学 Wireless debugging and downloading device for FPGA application system
CN111150934A (en) * 2019-12-27 2020-05-15 重庆大学 Evaluation system of Chinese tone coding strategy of cochlear implant

Also Published As

Publication number Publication date
CN100563608C (en) 2009-12-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Shenzhen Graduate School of Guangdong Province, Shenzhen City Xili 518055 Nanshan District University City Tsinghua University

Patentee after: Shenzhen International Graduate School of Tsinghua University

Address before: Shenzhen Graduate School of Guangdong Province, Shenzhen City Xili 518055 Nanshan District University City Tsinghua University

Patentee before: GRADUATE SCHOOL AT SHENZHEN, TSINGHUA University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091202