WO2017025108A2 - Sequencing the speech signal - Google Patents

Publication number
WO2017025108A2
Authority
WO
WIPO (PCT)
Prior art keywords
frequency bands
presented
speech
frequency band
sequencing
Prior art date
Application number
PCT/EG2016/000029
Other languages
French (fr)
Other versions
WO2017025108A3 (en)
Inventor
Taha Kais Taha AL-SHALASH
Original Assignee
Al-Shalash Taha Kais Taha
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-10-04
Filing date
2016-10-04
Publication date
2017-02-16
Application filed by Al-Shalash Taha Kais Taha
Priority to PCT/EG2016/000029
Publication of WO2017025108A2
Publication of WO2017025108A3

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R25/00: Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
    • H04R25/35: Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception using translation techniques
    • H04R25/353: Frequency, e.g. frequency shift or compression
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2225/00: Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
    • H04R2225/43: Signal processing in hearing aids to enhance the speech intelligibility

Abstract

A method of operating an audio processing device to improve a user's perception of speech sound. The method comprises sequencing the speech signal: an audio signal is split into a plurality of frequency bands, and these speech frequency bands are presented in sequence (non-simultaneously), from the high frequency bands to the low frequency bands.
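The band-splitting step can be illustrated with a minimal sketch. The application does not specify a filter design, the number of bands, or any crossover frequencies, so everything here is an assumption made for illustration: the one-pole low-pass filter, the three-band layout, the 1000 Hz and 3000 Hz crossover points, and the function names `one_pole_lowpass` and `split_three_bands`.

```python
import math
from typing import List, Sequence, Tuple


def one_pole_lowpass(x: Sequence[float], cutoff_hz: float,
                     sample_rate_hz: float) -> List[float]:
    """First-order recursive low-pass filter (an assumed, very simple design)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y: List[float] = []
    state = 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def split_three_bands(
    x: Sequence[float],
    sample_rate_hz: float,
    low_cut_hz: float = 1000.0,   # assumed crossover, not from the application
    high_cut_hz: float = 3000.0,  # assumed crossover, not from the application
) -> Tuple[List[float], List[float], List[float]]:
    """Split x into (high, middle, low) bands that add back up to x."""
    below_low = one_pole_lowpass(x, low_cut_hz, sample_rate_hz)
    below_high = one_pole_lowpass(x, high_cut_hz, sample_rate_hz)
    low = below_low
    middle = [h - l for h, l in zip(below_high, below_low)]
    high = [s - h for s, h in zip(x, below_high)]
    return high, middle, low
```

Because the middle and high bands are formed by subtraction, the three bands sum back to the input signal, which makes it easy to check that no part of the signal is lost before the bands are sequenced. A practical device would likely use steeper filters.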

Description

SEQUENCING THE SPEECH SIGNAL
TECHNICAL FIELD
THE PRESENT APPLICATION RELATES TO IMPROVING SPEECH PERCEPTION, E.G. SPEECH INTELLIGIBILITY, IN PARTICULAR TO IMPROVING SOUND PERCEPTION FOR A PERSON, E.G. A HEARING IMPAIRED PERSON.
THE APPLICATION RELATES TO AN AUDIO PROCESSING DEVICE AND ITS USE, SUCH AS ALL KINDS OF HEARING AIDS AND COCHLEAR IMPLANTS.
THE APPLICATION FURTHER RELATES TO A DATA PROCESSING SYSTEM COMPRISING A PROCESSOR PERFORMING THE METHOD.
THE DISCLOSURE MAY BE USEFUL IN APPLICATIONS SUCH AS COMMUNICATION DEVICES, E.G. TELEPHONES, OR LISTENING DEVICES, E.G. HEARING INSTRUMENTS, HEADSETS, HEADPHONES, ACTIVE EAR PROTECTION DEVICES.
BACKGROUND ART
THE FOLLOWING APPROACHES HAVE BEEN DESCRIBED FOR MODIFYING SPEECH TO MAKE IT MORE INTELLIGIBLE FOR PEOPLE WITH SENSORINEURAL HEARING LOSS.
1- ENHANCEMENT OF SPECTRAL SHAPE:-
IT INCREASES THE GAIN FOR FREQUENCIES WITH A HIGH CONCENTRATION OF ACOUSTIC ENERGY IN THE SPEECH WAVE (E.G. FORMANTS) TO MAKE THESE PEAKS MORE PROMINENT IN THE SPEECH SPECTRUM. UNFORTUNATELY, IMPROVEMENTS IN INTELLIGIBILITY HAVE BEEN SMALL OR NON-EXISTENT.
2- ENHANCEMENT OF CONSONANT TO VOWEL RATIO:-
IT INCREASES THE GAIN FOR CONSONANT SOUNDS BUT NOT FOR VOWEL SOUNDS, BASED ON THE IMPORTANCE OF CONSONANT SOUNDS FOR INTELLIGIBILITY, AND TO PREVENT MASKING OF CONSONANTS BY VOWELS.
3- TRANSIENT ENHANCEMENT:-
IT INCREASES THE RATE OF CHANGE IN INTENSITY OF SOUNDS, BASED ON THE FACT THAT MANY CONSONANT SOUNDS HAVE RAPID INTENSITY CHANGES THAT MIGHT BE IMPORTANT FOR RECOGNIZING THESE SOUNDS, BUT ITS USEFULNESS IN REAL LIFE MIGHT BE DISAPPOINTING.
4- ENHANCEMENT OF DURATION:-
VOWELS PRECEDING A VOICED CONSONANT ARE LONGER IN DURATION THAN VOWELS THAT PRECEDE AN UNVOICED CONSONANT, SO ENHANCING THE DURATION OF VOWELS MIGHT BE USEFUL IN RECOGNIZING THE FOLLOWING CONSONANT.
UNFORTUNATELY, THE INTELLIGIBILITY ACHIEVED WITH THIS METHOD IS NOT HIGH.
5- SPEECH SIMPLIFICATION:-
THE INTERACTION OF THE MANY CUES IN A SPEECH STIMULUS MAY BE DIFFICULT FOR A HEARING IMPAIRED PERSON, WHOSE LIMITED HEARING ABILITY MAKES IT HARD TO SEPARATE THESE CUES. REPLACING THE SPEECH SIGNAL, IN THE EXTREME CASE WITH PURE TONES, DECREASES THE NUMBER OF CUES THAT NEED TO BE RECOGNIZED AND SEPARATED, WHICH MIGHT HELP INTELLIGIBILITY.
THIS METHOD APPEARS TO BE BENEFICIAL ONLY FOR THE SEVERELY HEARING IMPAIRED.
6- ENHANCEMENT BY RE-SYNTHESIS:-
IT CONSISTS OF RECOGNITION OF THE SPEECH SIGNAL BY THE HEARING AID PROCESSOR, WHICH THEN RESYNTHESIZES IT IN A CLEAR, NOISE-FREE WAY.
THIS METHOD IS HIGHLY AFFECTED BY NOISE, AND ACCENTS AND EMOTION WILL NOT BE CONVEYED.
DISCLOSURE OF INVENTION
SEQUENCING SPEECH SIGNAL:
MASKING WHICH OBSCURES A SOUND IMMEDIATELY FOLLOWING THE MASKER IS CALLED FORWARD MASKING. THAT MEANS THE SIGNAL PERSISTS FOR SOME TIME AFTER IT IS TURNED OFF.
UPWARD SPREAD OF MASKING IS THE MASKING OF HIGH-FREQUENCY SOUNDS BY LOW-FREQUENCY SOUNDS.
WE CAN TAKE ADVANTAGE OF THESE TWO FACTS BY SEQUENCING (NON-SIMULTANEOUS PRESENTATION OF) THE SPEECH SIGNAL FOR EACH SPEECH PHONEME, SO THAT THE HIGH FREQUENCY INFORMATION IS PRESENTED FIRST AND THE LOW FREQUENCY INFORMATION IS PRESENTED LATER. BY THIS MECHANISM THE UPWARD SPREAD OF MASKING WILL NOT OCCUR, BECAUSE THE HIGH FREQUENCY PART WILL BE PRESENTED WITHOUT THE LOW FREQUENCY PART OF THE SPEECH SIGNAL.
THERE ARE TWO SUGGESTED METHODS FOR SEQUENCING THE SPEECH SIGNAL:
a. EACH FREQUENCY BAND IS PRESENTED ALONE, FROM THE HIGH FREQUENCY BANDS TO THE LOW FREQUENCY BANDS, SEE FIG (1). FOR EXAMPLE, THE HIGH FREQUENCY BAND IS PRESENTED FIRST, THEN THE MIDDLE FREQUENCY BAND SECOND, THEN THE LOW FREQUENCY BAND LAST.
b. THE HIGH FREQUENCY BANDS ARE PRESENTED FIRST, THEN THE LOWER FREQUENCY BANDS ARE ADDED TO THE HIGHER BANDS AND PRESENTED SIMULTANEOUSLY WITH THEM, SEE FIG (2). FOR EXAMPLE, THE HIGH FREQUENCY BAND IS PRESENTED FIRST, THEN THE HIGH AND MIDDLE FREQUENCY BANDS SECOND, THEN ALL FREQUENCY BANDS LAST.
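The two presentation orders can be sketched as schedules that state which bands sound during each step. This is only an illustration under assumptions: the band labels, the function names `exclusive_schedule`, `cumulative_schedule` and `render`, and the simple on/off gating of already-filtered band signals are not taken from the application.

```python
from typing import Dict, List, Sequence, Tuple

# Band labels ordered from highest to lowest frequency (illustrative names).
BANDS_HIGH_TO_LOW = ("high", "middle", "low")


def exclusive_schedule(bands: Sequence[str]) -> List[Tuple[str, ...]]:
    """Method (a): one band per step, from the highest band to the lowest."""
    return [(band,) for band in bands]


def cumulative_schedule(bands: Sequence[str]) -> List[Tuple[str, ...]]:
    """Method (b): each step adds the next lower band to those already sounding."""
    return [tuple(bands[: i + 1]) for i in range(len(bands))]


def render(band_signals: Dict[str, List[float]],
           schedule: Sequence[Tuple[str, ...]],
           step_lengths: Sequence[int]) -> List[float]:
    """Gate pre-filtered band signals so only the scheduled bands sound per step.

    step_lengths gives the length of each step in samples (an assumed unit).
    """
    out: List[float] = []
    start = 0
    for active, length in zip(schedule, step_lengths):
        for n in range(start, start + length):
            out.append(sum(band_signals[b][n] for b in active))
        start += length
    return out
```

For example, with three constant test signals `{"high": [1.0] * 6, "middle": [10.0] * 6, "low": [100.0] * 6}` and two samples per step, `render` with `exclusive_schedule` yields each band on its own in turn, while `cumulative_schedule` yields a running sum that ends with all three bands together.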
DURATION OF PRESENTATION OF EACH FREQUENCY BAND:-
MANY METHODS COULD BE USED TO DETERMINE THE DURATION OF EACH FREQUENCY BAND PRESENTATION. TWO OF THEM ARE DISCUSSED HERE AS EXAMPLES:-
1- THE DURATION OF EACH FREQUENCY BAND OF EACH PHONEME COULD BE CONSTANT, I.E. FIXED FOR ANY PHONEME, BUT THE SUM OF ALL FREQUENCY BAND DURATIONS MUST NOT EXCEED THE DURATION OF ANY PHONEME. THIS COULD BE DONE BY PROVIDING A RELATIVELY SMALL DURATION FOR ALL FREQUENCY BANDS EXCEPT THE LAST ONE, WHICH COULD BE PRESENTED FOR AS LONG AS THE PHONEME IS PRESENTED. FOR EXAMPLE, IF THE DURATION OF THE PHONEME IS 90 MSEC, THEN THE HIGH FREQUENCY BAND LASTS FOR (E.G. 25 MSEC), THE MIDDLE FREQUENCY BAND LASTS FOR (E.G. 25 MSEC), AND THE LOW FREQUENCY BAND LASTS FOR (E.G. 40 MSEC), SEE FIG (3).
2- THE DURATION OF EACH FREQUENCY BAND OF EACH PHONEME COULD BE CORRELATED WITH THE DISTRIBUTION OF ACOUSTIC ENERGY ACROSS FREQUENCIES. FOR EXAMPLE, MORE ACOUSTIC ENERGY WITHIN THE HIGH FREQUENCY REGION COULD INDICATE A LONGER DURATION FOR THE HIGH FREQUENCY BANDS, AND VICE VERSA.
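Both duration rules can be written as short functions. This is a sketch under stated assumptions: the function names are hypothetical, the 25 msec default comes from the worked example above, and strict proportionality to band energy is only one possible reading of "correlated with the distribution of acoustic energy", which the application does not define further.

```python
from typing import List, Sequence


def fixed_band_durations(phoneme_ms: float, lead_ms: float = 25.0,
                         n_bands: int = 3) -> List[float]:
    """Method 1: every band but the last gets lead_ms; the last fills the rest."""
    if n_bands < 1:
        raise ValueError("need at least one band")
    lead_total = lead_ms * (n_bands - 1)
    if lead_total >= phoneme_ms:
        raise ValueError("leading bands would use up the whole phoneme")
    return [lead_ms] * (n_bands - 1) + [phoneme_ms - lead_total]


def band_energy(samples: Sequence[float]) -> float:
    """Acoustic energy of one band, taken here as the sum of squared samples."""
    return float(sum(s * s for s in samples))


def energy_weighted_durations(phoneme_ms: float,
                              band_energies: Sequence[float]) -> List[float]:
    """Method 2 (assumed reading): duration proportional to each band's energy.

    band_energies is ordered from the highest band to the lowest.
    """
    total = sum(band_energies)
    if total <= 0.0:
        # Silent input: fall back to an equal split (an assumption).
        return [phoneme_ms / len(band_energies)] * len(band_energies)
    return [phoneme_ms * e / total for e in band_energies]
```

With the 90 msec phoneme from the example, `fixed_band_durations(90.0)` gives 25, 25 and 40 msec for the high, middle and low bands. In both functions the durations add up to the phoneme duration, so the sequence never outlasts the phoneme.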
BRIEF DESCRIPTION OF FIGURES
FIG (1): FIRST SUGGESTED METHOD FOR SEQUENCING A SINGLE SPEECH PHONEME SIGNAL. 1= HIGH FREQUENCY BAND PRESENTED 1st, 2= MIDDLE FREQUENCY BAND PRESENTED 2nd, 3= LOW FREQUENCY BAND PRESENTED LAST.
FIG (2): SECOND SUGGESTED METHOD FOR SEQUENCING A SINGLE SPEECH PHONEME SIGNAL. 1= HIGH FREQUENCY BAND PRESENTED 1st, 2= HIGH AND MIDDLE FREQUENCY BANDS PRESENTED 2nd, 3= ALL FREQUENCY BANDS PRESENTED LAST.
FIG (3): EXAMPLE OF FREQUENCY BAND DURATION. 1= DURATION OF SINGLE PHONEME (90 MSEC), 2= HIGH FREQUENCY BAND PRESENTED FOR 25 MSEC, 3= MIDDLE FREQUENCY BAND PRESENTED FOR 25 MSEC, 4= LOW FREQUENCY BAND PRESENTED FOR 40 MSEC.

Claims

CLAIMS :-
1- A METHOD OF OPERATING AN AUDIO PROCESSING DEVICE TO IMPROVE A USER'S PERCEPTION OF A SPEECH SOUND, THE METHOD COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN SAID FREQUENCY BANDS ARE PRESENTED IN SEQUENCE (NON-SIMULTANEOUS).
2- A METHOD OF OPERATING AN AUDIO PROCESSING DEVICE TO IMPROVE A USER'S PERCEPTION OF A SPEECH SOUND, THE METHOD COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN THE LOWER FREQUENCY BANDS ARE PRESENTED TOGETHER WITH THE HIGHER FREQUENCY BANDS IN SEQUENCE.
3- A HEARING ASSISTANCE APPARATUS, COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN SAID FREQUENCY BANDS ARE PRESENTED IN SEQUENCE (NON-SIMULTANEOUS).
4- A HEARING ASSISTANCE APPARATUS, COMPRISING: SPLITTING AN AUDIO SIGNAL INTO A PLURALITY OF FREQUENCY BANDS, WHEREIN THE LOWER FREQUENCY BANDS ARE PRESENTED TOGETHER WITH THE HIGHER FREQUENCY BANDS IN SEQUENCE.
PCT/EG2016/000029 2016-10-04 2016-10-04 Sequencing the speech signal WO2017025108A2 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/EG2016/000029 WO2017025108A2 (en) | 2016-10-04 | 2016-10-04 | Sequencing the speech signal

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/EG2016/000029 WO2017025108A2 (en) | 2016-10-04 | 2016-10-04 | Sequencing the speech signal

Publications (2)

Publication Number | Publication Date
WO2017025108A2 (en) | 2017-02-16
WO2017025108A3 (en) | 2017-07-06

Family

ID=57983118

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/EG2016/000029 WO2017025108A2 (en) | Sequencing the speech signal | 2016-10-04 | 2016-10-04

Country Status (1)

Country Link
WO (1) WO2017025108A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111653285A (en) * 2020-06-01 2020-09-11 北京猿力未来科技有限公司 Packet loss compensation method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2177054B1 (en) * 2007-07-31 2014-04-09 Phonak AG Method for adjusting a hearing device with frequency transposition and corresponding arrangement
CN102293017B (en) * 2009-11-25 2014-10-15 松下电器产业株式会社 System, method and integrated circuit for hearing aid
DK2649812T3 (en) * 2010-12-08 2014-08-04 Widex As HEARING AND A PROCEDURE FOR IMPROVING SPEECHING
EP2880761B1 (en) * 2012-08-06 2020-10-21 Father Flanagan's Boys' Home Doing Business as Boy Town National Research Hospital Multiband audio compression system and method
DE102015201073A1 (en) * 2015-01-22 2016-07-28 Sivantos Pte. Ltd. Method and apparatus for noise suppression based on inter-subband correlation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111653285A (en) * 2020-06-01 2020-09-11 北京猿力未来科技有限公司 Packet loss compensation method and device
CN111653285B (en) * 2020-06-01 2023-06-30 北京猿力未来科技有限公司 Packet loss compensation method and device

Also Published As

Publication number Publication date
WO2017025108A3 (en) 2017-07-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16834702

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16834702

Country of ref document: EP

Kind code of ref document: A2