WO2021175390A1 - Methods to assist verbal communication for both listeners and speakers - Google Patents


Info

Publication number
WO2021175390A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
user
hearing
speech
audio data
Application number
PCT/DK2020/050058
Other languages
French (fr)
Inventor
Hiroki Sato
Original Assignee
Hiroki Sato
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-03-04
Filing date
2020-03-04
Publication date
2021-09-10
Application filed by Hiroki Sato
Priority to PCT/DK2020/050058 (WO2021175390A1)
Priority to US17/909,622 (US20230122715A1)
Publication of WO2021175390A1


Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
                • H04R29/00 Monitoring arrangements; Testing arrangements
                    • H04R29/004 Monitoring arrangements; Testing arrangements for microphones
                    • H04R29/008 Visual indication of individual signal levels
                • H04R2225/00 Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
                    • H04R2225/43 Signal processing in hearing aids to enhance the speech intelligibility
    • G PHYSICS
        • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
            • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
                • G09B21/00 Teaching, or communicating with, the blind, deaf or mute
                    • G09B21/04 Devices for conversing with the deaf-blind

Abstract

Methods implemented in a system utilizing computing programs for a speaker and a listener in conversation are provided. Aspects include (i) a reminder provisioner for a speaker, which is triggered according to the speed, pitch or volume of the speaker's speech, (ii) a speech training provisioner for a speaker, and (iii) an application that records and plays back parts of a conversation that are difficult to understand.

Description

Title of the Invention: Methods to assist verbal communication for both listeners and speakers
<Background>
Communication between hard-of-hearing people and normal-hearing people can often be clumsy and can cause stress and frustration for both parties, especially when something has to be repeated in the conversation.
While the burden of communication should be shared by both people with hearing difficulty (as defined below) and normal-hearing people, recent technological advances seem to focus on developing a 'hearing strategy' rather than a 'speaking strategy'. The problem can therefore be examined from the perspectives of both listeners and speakers.
For speakers, it is practically impossible to fully understand each listener's hearing difficulty, as everyone hears differently. Speakers may not know how to speak properly or how well listeners understand their speech. Moreover, even when people are aware that they need to speak more clearly when talking with a person with hearing difficulty, such as a hard-of-hearing person, they tend to speak less clearly as the conversation goes on.
From a listener's perspective, in some cultures it is considered impolite to ask for repetition multiple times. It also becomes even more difficult to understand, and to ask for repetition, in a conversation where multiple people are speaking.
<Summary>
Verbal communication assisting technique implementations described herein generally assist people with hearing difficulty and the people who talk with them. People with hearing difficulty, as used herein, include people whose ability to follow a conversation is reduced by physical constraints such as distance or obstacles, people who are not proficient enough in the language of the conversation to hear and understand it, and people who use hearing devices or technologies, including hearing aids, cochlear implants, bone-anchored hearing aids and auditory brainstem implants. The invention comprises three functions of a system implemented by computing programs. First, by evaluating the speed, pitch and volume of a speaker's speech, the system lets the speaker know, on behalf of the listener, that the speech is difficult to understand. Second, the system helps speakers speak appropriately by giving them feedback; these users include people with hearing impairment, such as those with sensorineural hearing loss, who may have trouble understanding how to speak. Third, the system records the conversation and enables users to play back or save audio data that is difficult to understand, so that they can understand the missed conversation immediately or later.
<Description of Drawings>
FIG. 1 is a diagram illustrating one implementation, in simplified form, of a system framework for realizing the method of assisting verbal communication for both a speaker and a listener in conversation;
SUBSTITUTE SHEETS (RULE 26)
FIG. 2 depicts a flow diagram of an exemplary implementation, in simplified form, of a process for providing reminders for speakers based on data evaluation of speech (Component 1);
FIG. 3 depicts a flow diagram of an exemplary implementation, in simplified form, of a process for helping speakers understand how to speak properly or naturally by visualizing aspects of speech (Component 2); and
FIG. 4 depicts a flow diagram of an exemplary implementation, in simplified form, of a process for recording and playing back audio data of a part of a conversation that is difficult to understand (Component 3).
<Detailed Description>
In the following description of verbal communication assisting technique implementations, reference is made to the accompanying drawings, which form a part hereof and in which specific implementations in which the verbal communication assisting technique can be practiced are shown by way of illustration. It is understood that other implementations can be utilized and structural changes can be made without departing from the scope of the verbal communication assisting technique implementations.
FIG. 1 illustrates one implementation, in simplified form, of a system framework comprising multiple components of computing programs. The components can function independently or together, and they share the same database in the system.
-Component 1
By evaluating the speed, pitch and volume of the speech, the system lets a speaker (102) know, on behalf of a listener (104), that the speaker's speech is difficult to understand.
For input, the system is operable with any end-user computing device (106, 108) that has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch, among others) or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant.
Upon receiving audio input (114, 116) from the microphone, a computing program in the system evaluates the following factors of the speech (202):
Speech speed: the voice data is converted into text by a transcription (audio-to-text) tool, such as the JavaScript SpeechRecognition API, and the length of the text divided by the duration of the speech gives the number of characters or words per second, which can be used as a metric of speaking speed.
Volume: the voice data is converted into a numeric value by a computing program, such as the sound volume detection program in the p5.js library for JavaScript.
Pitch: the voice data is converted into a numeric value by a computing program, such as the pitch detection program in the ml5.js library (CREPE) for JavaScript.
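The three measurements above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this application: it assumes the transcript and its duration are already available from a speech-to-text tool and that audio arrives as floating-point samples in the range [-1, 1], and it substitutes a plain autocorrelation pitch estimator for the CREPE model named above. The function names and the calibration_offset_db parameter are hypothetical; turning a digital level (dBFS) into the dB values used in the thresholds below would need a per-device calibration offset, which is assumed rather than measured here.

```python
import math
from typing import Sequence


def speech_rate(transcript: str, duration_s: float, unit: str = "char") -> float:
    """Characters (or words) per second of transcribed speech."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if unit == "word":
        count = len(transcript.split())
    else:
        # Assumption: spaces are not counted as characters.
        count = len(transcript.replace(" ", ""))
    return count / duration_s


def level_db(samples: Sequence[float], calibration_offset_db: float = 0.0) -> float:
    """RMS level in dBFS plus a hypothetical, device-specific calibration offset."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms) + calibration_offset_db


def autocorr_pitch(samples: Sequence[float], sample_rate: int,
                   fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate F0 in Hz from the strongest autocorrelation lag in [fmin, fmax].

    Stand-in for CREPE. Returns 0.0 when no positive correlation is found,
    which callers can treat as an unvoiced frame.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0
```

For example, speech_rate("hello world", 2.0) returns 5.0 (ten non-space characters over two seconds). Autocorrelation was chosen only because it needs no model weights; it is less robust to noise than a learned estimator such as CREPE.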
Each value is checked against a range defined by a minimum and a maximum value, and, depending on the result, the system triggers feedback to the speaker (102) (206).
In one example, the preset threshold values (minimum / maximum), which can be configured by the user, are as follows:
Speech speed (characters per second): for English, for example, the maximum value is set to 3 characters per second and no minimum value is set.
Volume: 45 dB minimum and 65 dB maximum.
Pitch: the share of the speech duration during which the pitch stays within a 1% range must be less than 30% (maximum value). In speech-language pathology, speaking with rich intonation (varied pitch) is considered easier for hard-of-hearing people to understand.
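The pitch criterion above can be read in more than one way. A minimal sketch, assuming it means the share of consecutive voiced pitch frames whose pitch changes by no more than 1%, might compute it as below. The frame-to-frame interpretation, the function names and the use of 0 Hz to mark unvoiced frames are all assumptions.

```python
from typing import Sequence


def monotone_ratio(pitches_hz: Sequence[float], tolerance: float = 0.01) -> float:
    """Share of consecutive voiced frame pairs whose pitch changes by <= tolerance.

    Assumption: frames with a pitch of 0 Hz are unvoiced and are skipped.
    """
    voiced = [p for p in pitches_hz if p > 0]
    if len(voiced) < 2:
        return 0.0
    flat = sum(1 for a, b in zip(voiced, voiced[1:]) if abs(b - a) <= tolerance * a)
    return flat / (len(voiced) - 1)


def too_monotone(pitches_hz: Sequence[float], max_ratio: float = 0.30) -> bool:
    """True when flat-pitch stretches reach the 30% limit, i.e. feedback is due."""
    return monotone_ratio(pitches_hz) >= max_ratio
```

A speaker holding one note for most frames would trigger feedback, while a voice that moves by more than 1% between frames would not.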
The threshold values can be configured manually by the user or automatically by the system, which either applies preset values for sounds that are generally difficult to understand or learns each user's hearing preference / capability from audio data labeled as difficult, as described later in Component 3.
When said evaluation produces a trigger, the system gives feedback (204) in one or more of the following ways:
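How the system learns a personal threshold is not specified. As one simple, assumed approach, a one-dimensional decision stump can pick the upper threshold for a single metric, such as speech speed, that best separates segments the user labeled as difficult from the rest. The function name and the (value, labeled_difficult) data layout are hypothetical.

```python
from typing import Sequence, Tuple


def learn_max_threshold(samples: Sequence[Tuple[float, bool]]) -> float:
    """Pick an upper threshold for one metric from (value, labeled_difficult) pairs.

    Tries a cut-point midway between each pair of adjacent distinct values and
    keeps the one that best predicts 'difficult' as 'value above the cut'.
    """
    values = sorted({v for v, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct values")
    best_cut, best_correct = values[0], -1
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2
        correct = sum(1 for v, difficult in samples if (v > cut) == difficult)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut
```

For example, if segments at 2.0 and 2.5 characters per second were easy and those at 3.5 and 4.0 were labeled difficult, the learned maximum would be 3.0. A stump only learns a single cut per metric; a richer model would be needed to capture interactions between speed, pitch and volume.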
-The system gives speakers haptic feedback through a wristband with a vibrator, a mobile phone or a wearable device (such as Apple Watch, Fitbit or Galaxy Watch) that can be programmed to vibrate.
-The system gives speakers visual feedback by showing numeric information or a graphical representation of speech speed, pitch or volume on the screen or user interface of a computing device, such as a mobile phone, tablet or wearable device.
-The system gives speakers aural feedback by playing a sound through a loudspeaker built into said computing devices.
FIG. 2 is a flow diagram of an exemplary implementation, in simplified form, of a process for providing reminders for speakers based on data evaluation of speech. Upon receiving audio input from a microphone (502), the system measures or calculates the speed (504), volume (506) and pitch (508) of the speech in the ways described above, and if any of the values falls outside its threshold range (minimum / maximum) (510), the system gives the speaker (102) feedback in the ways described above (512).
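The FIG. 2 loop can be sketched as follows, assuming the speed, volume and pitch measurements for each chunk of audio (504-508) have already been computed. The threshold layout, the function names and the representation of the haptic, visual and aural channels as plain callbacks are assumptions for illustration; a real device would call its own vibration, display or audio APIs instead.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (minimum, maximum) for one metric; None means "no bound on this side".
Bounds = Tuple[Optional[float], Optional[float]]


def out_of_range(metrics: Dict[str, float], thresholds: Dict[str, Bounds]) -> List[str]:
    """Names of measured metrics that fall outside their threshold range (510)."""
    bad = []
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not measured for this chunk
        if (low is not None and value < low) or (high is not None and value > high):
            bad.append(name)
    return bad


def run_reminders(chunks: Iterable[Dict[str, float]],
                  thresholds: Dict[str, Bounds],
                  channels: List[Callable[[List[str]], None]]) -> int:
    """Check each measured chunk and notify every feedback channel (512).

    Returns the number of chunks that triggered feedback.
    """
    triggered = 0
    for metrics in chunks:
        bad = out_of_range(metrics, thresholds)
        if bad:
            triggered += 1
            for notify in channels:
                notify(bad)
    return triggered


# Example: a hypothetical haptic channel that only records what it was asked to do.
events: List[Tuple[str, Tuple[str, ...]]] = []
count = run_reminders(
    chunks=[{"speed_cps": 2.0, "volume_db": 50.0},
            {"speed_cps": 4.0, "volume_db": 50.0}],
    thresholds={"speed_cps": (None, 3.0), "volume_db": (45.0, 65.0)},
    channels=[lambda bad: events.append(("haptic", tuple(bad)))],
)
```

With the example preset values, only the chunk measured at 4.0 characters per second triggers feedback. Passing the channels in as callbacks keeps the evaluation logic independent of whichever output devices a user happens to carry.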
-Component 2
Referring again to FIG. 1, the system visualizes the pitch, speed or volume of the speech and gives the speaker (102) cues for speaking properly or naturally so that listeners can easily understand. Visualizing pitch could be especially beneficial for people with sensorineural hearing loss, who may not understand how to change the tone (pitch) of their voice or how to speak naturally.
For input, as in Component 1, the system is operable with any end-user computing device (106, 108) that has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch, among others) or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant.
Upon receiving audio data through the microphone (114), a computing program in the system evaluates the following factors of the speech (302):
Speech speed: the voice data is converted into text by a transcription (audio-to-text) tool, such as the JavaScript SpeechRecognition API, and the length of the text divided by the duration of the speech gives the number of characters or words per second, which can be used as a metric of speaking speed.
Volume: the voice data is converted into a numeric value by a computing program, such as the volume detection program in the p5.js library for JavaScript.
Pitch: the voice data is converted into a numeric value by a computing program, such as the pitch detection program in the ml5.js library (CREPE) for JavaScript.
Based on the values measured and calculated in the ways described above, a computing program creates content (304), such as charts, text, numeric information or graphical objects, in a browser such as Google Chrome or Firefox (306, 308). In a user test, interactively controlling the size of a graphical object by the volume of the voice or of surrounding sounds, and its color by the pitch of the voice, was considered effective.
The system can also include a gamification element, in which a graphical object represents the speaker's way of speaking and a set of rules or a target line is provided, to attract more interest from users such as hard-of-hearing children.
The system can further include a speech training / coaching element that advises users to change or keep their way of speaking based on the evaluation.
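As an illustration of the size and color mapping described above, the sketch below maps volume to the radius of a circle and pitch to a hue running from blue (low) to red (high). The function name, the value ranges, the logarithmic pitch scale and the color scheme are assumptions chosen for the example; in the browser setting described above, the resulting values would be handed to a drawing library such as p5.js.

```python
import colorsys
import math
from typing import Tuple


def visual_params(volume_db: float, pitch_hz: float,
                  db_range: Tuple[float, float] = (30.0, 80.0),
                  pitch_range: Tuple[float, float] = (80.0, 400.0),
                  size_range: Tuple[float, float] = (10.0, 200.0)):
    """Map volume to a circle radius and pitch to an (R, G, B) color, 0-255 each.

    All ranges are hypothetical defaults, not values from the source text.
    """
    # Volume: linear position inside db_range, clamped to [0, 1].
    low_db, high_db = db_range
    v = min(max((volume_db - low_db) / (high_db - low_db), 0.0), 1.0)
    radius = size_range[0] + v * (size_range[1] - size_range[0])
    # Pitch: position on a log scale (pitch perception is roughly logarithmic).
    low_hz, high_hz = pitch_range
    if pitch_hz > 0:
        p = math.log(pitch_hz / low_hz) / math.log(high_hz / low_hz)
        p = min(max(p, 0.0), 1.0)
    else:
        p = 0.0  # unvoiced: treat as lowest pitch
    # Hue 2/3 is blue (low pitch); hue 0 is red (high pitch).
    r, g, b = colorsys.hsv_to_rgb((1 - p) * 2 / 3, 1.0, 1.0)
    return radius, (round(r * 255), round(g * 255), round(b * 255))
```

For example, a quiet, low voice (30 dB, 80 Hz) maps to a small blue circle, and a loud, high voice (80 dB, 400 Hz) to a large red one.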
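The gamification and coaching elements could be prototyped along the lines below. This is a hedged sketch: the per-frame comparison against a target pitch line, the 5% tolerance, the minimum pitch span and the wording of the tips are all hypothetical, while the speed and volume limits reuse the example thresholds from Component 1.

```python
from typing import List, Sequence


def contour_score(pitches_hz: Sequence[float], target_hz: Sequence[float],
                  tolerance: float = 0.05) -> float:
    """Share of voiced frames within `tolerance` (relative) of the target line."""
    pairs = [(p, t) for p, t in zip(pitches_hz, target_hz) if p > 0 and t > 0]
    if not pairs:
        return 0.0
    hits = sum(1 for p, t in pairs if abs(p - t) <= tolerance * t)
    return hits / len(pairs)


def coaching_tips(speed_cps: float, volume_db: float, pitch_span_hz: float,
                  max_speed_cps: float = 3.0, min_db: float = 45.0,
                  max_db: float = 65.0, min_span_hz: float = 20.0) -> List[str]:
    """Rule-based advice to change or keep the current way of speaking.

    min_span_hz (how much the pitch should vary) is a hypothetical parameter.
    """
    tips = []
    if speed_cps > max_speed_cps:
        tips.append("Slow down a little.")
    if volume_db < min_db:
        tips.append("Speak a little louder.")
    elif volume_db > max_db:
        tips.append("Speak a little more softly.")
    if pitch_span_hz < min_span_hz:
        tips.append("Vary your tone more.")
    return tips or ["Keep speaking this way."]
```

A game could, for instance, award points in proportion to contour_score while a child follows an on-screen target line, and show coaching_tips between rounds.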
FIG. 3 is a flow diagram of an exemplary implementation, in simplified form, of a process for helping speakers understand how to speak properly or naturally by visualizing aspects of speech. Upon receiving audio input from a microphone (602), the system measures or calculates the speed (604), volume (606) and pitch (608) of the speech in the ways described above and gives the speaker feedback (610) in the ways described above, such as by showing a graphical object on a screen.
-Component 3
Referring again to FIG. 1, the system enables a user to record the conversation and play back the audio data of a part of the conversation that has been classified as difficult by the user or by the system.
People with hearing difficulty may fail to understand a sentence because they miss one or more words. Even if they miss only a few words, they can find it hard and stressful to keep asking for repetition. To address this difficulty, the system lets a listener (104) understand the missed part of the conversation by herself / himself.
For input, as in Component 1, the system is operable with any end-user computing device (106, 108) that has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch, among others) or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant.
Before this technology implementation is used, the participants in the conversation should agree to the conversation being recorded.
Upon receiving audio input (114, 116) through a microphone, a computing program in the system records (402) the audio data and divides it into multiple small blocks. The system can record audio blocks and upload them to a server through a computing program such as Recorder.js in JavaScript.
During a conversation, when a listener finds something hard to hear, she/he can trigger the system (410) to save (404) and play back (406) the most recent audio data, which is short enough to listen back to comfortably (414). In user testing, a playback duration of 10 seconds was considered effective, but the user can also change the playback duration. Alternatively, rather than playing the audio back right away, the user can save / mark the difficult audio and play it back later (412).
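The "recent audio" behaviour can be sketched as a rolling buffer of fixed-size blocks. The sketch keeps only the newest blocks in memory, returns the last N seconds on demand (never more than the buffer holds) and copies them aside when the listener marks a passage for later. The class and method names are hypothetical, samples are assumed to be plain floats, and uploading blocks to a server (for example with Recorder.js) is left out.

```python
import math
from collections import deque
from typing import Deque, Iterable, List, Optional


class RecentAudio:
    """Hypothetical rolling buffer holding roughly the last `keep_seconds` of audio."""

    def __init__(self, sample_rate: int, block_size: int, keep_seconds: float = 10.0):
        self.sample_rate = sample_rate
        max_blocks = max(1, math.ceil(keep_seconds * sample_rate / block_size))
        # deque(maxlen=...) silently drops the oldest block once full.
        self.blocks: Deque[List[float]] = deque(maxlen=max_blocks)
        self.saved: List[List[float]] = []

    def push(self, block: Iterable[float]) -> None:
        """Append one recorded block (402)."""
        self.blocks.append(list(block))

    def recent(self, seconds: Optional[float] = None) -> List[float]:
        """Newest `seconds` of audio for playback (406); default is the whole buffer."""
        samples = [s for block in self.blocks for s in block]
        if seconds is not None:
            n = int(seconds * self.sample_rate)
            samples = samples[-n:] if n > 0 else []
        return samples

    def mark(self, seconds: Optional[float] = None) -> int:
        """Save a copy of the recent audio for later playback (404, 412)."""
        self.saved.append(self.recent(seconds))
        return len(self.saved) - 1
```

Keeping a bounded deque means memory use stays constant however long the conversation runs; only explicitly marked passages are retained.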
Playback or marking can also be triggered automatically by the system. The system can classify audio data that a user previously played back as difficult sound and learn the user's personal hearing capability / preference through a machine learning process (408).
During playback, a user can change the speed, volume or pitch of the recorded conversation so that it is easier to understand.
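The machine learning process (408) is not specified further. One deliberately simple stand-in, assumed here purely for illustration, is a nearest-centroid classifier: it averages the features of segments the user replayed (labeled difficult) and of segments they did not, then flags a new segment as difficult when it lies closer to the "difficult" average. The three-feature layout and the per-feature scale used to make the features comparable are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed feature layout: (speed in chars/s, volume in dB, pitch span in Hz).
Features = Tuple[float, float, float]


def train_centroids(examples: Sequence[Tuple[Features, bool]]) -> Dict[bool, List[float]]:
    """Average the feature vectors of difficult (True) and other (False) segments."""
    sums = {True: [0.0, 0.0, 0.0], False: [0.0, 0.0, 0.0]}
    counts = {True: 0, False: 0}
    for feats, difficult in examples:
        counts[difficult] += 1
        for i, x in enumerate(feats):
            sums[difficult][i] += x
    if counts[True] == 0 or counts[False] == 0:
        raise ValueError("need examples of both classes")
    return {label: [s / counts[label] for s in sums[label]] for label in (True, False)}


def is_difficult(feats: Features, centroids: Dict[bool, List[float]],
                 scale: Features) -> bool:
    """Classify by the nearer centroid after dividing each feature by `scale`."""
    def dist(centroid: List[float]) -> float:
        return math.sqrt(sum(((x - m) / s) ** 2
                             for x, m, s in zip(feats, centroid, scale)))
    return dist(centroids[True]) < dist(centroids[False])
```

When is_difficult returns True for the segment just heard, the system could mark or replay it automatically. A nearest-centroid rule needs very little data, which suits a single user's small set of labeled segments, but it cannot model more complex boundaries.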
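Two of these playback adjustments can be sketched with plain arithmetic on the samples, assuming the recording is a list of floats in [-1, 1]: a volume change in decibels becomes a gain factor of 10^(dB/20), and a speed change can be approximated by linear-interpolation resampling. The function names are hypothetical. This naive resampling shifts pitch together with speed (like playing a tape faster), so changing speed and pitch independently would need a time-stretching technique such as a phase vocoder or WSOLA, which this sketch does not attempt.

```python
from typing import List, Sequence


def apply_gain(samples: Sequence[float], gain_db: float) -> List[float]:
    """Scale by 10**(gain_db / 20), clipping to [-1, 1] to avoid overflow."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def resample_speed(samples: Sequence[float], speed: float) -> List[float]:
    """Play back `speed` times faster (speed < 1 slows down) by linear interpolation.

    Note: this also raises or lowers the pitch by the same factor.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int(len(samples) / speed)
    out = []
    for j in range(n_out):
        pos = j * speed
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out
```

For example, apply_gain(samples, 6.0) roughly doubles the amplitude, and resample_speed(samples, 0.8) plays the passage at 80% speed, with a correspondingly lower pitch.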
FIG. 4 is a flow diagram of an exemplary implementation, in simplified form, of a process for recording and playing back audio data of a part of a conversation that is difficult to understand. Upon receiving audio input from a microphone (702), the system starts recording the conversation (704), and if a user triggers the system (706) or the values of the voice fall outside the set threshold range (708), the system proceeds to give feedback. If the user has not changed the setting in the application (710), the system immediately plays back the audio data of the recent conversation (714). If the user has changed the setting in the application (710) because she/he prefers to save the audio data and listen back later, the user can later play back an audio file that the system creates (716).
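The branching in FIG. 4 reduces to a small decision function. The enum and parameter names below are hypothetical, and prefers_saving stands in for the changed application setting (710).

```python
from enum import Enum


class Action(Enum):
    NONE = "none"            # keep recording (704)
    PLAY_NOW = "play_now"    # play back the recent audio immediately (714)
    SAVE_FOR_LATER = "save"  # create an audio file for later playback (716)


def playback_action(user_triggered: bool, outside_thresholds: bool,
                    prefers_saving: bool) -> Action:
    """Decide what to do with the recent audio, following the FIG. 4 flow."""
    if not (user_triggered or outside_thresholds):  # neither 706 nor 708 fired
        return Action.NONE
    # 710: the default setting plays back at once; a changed setting saves for later.
    return Action.SAVE_FOR_LATER if prefers_saving else Action.PLAY_NOW
```

Treating a user trigger and a threshold violation identically mirrors the flow diagram, where both paths lead to the same setting check (710).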

Claims

Claim 1.
A system for assisting speakers in conversation comprising: one or more microphones built into a device such as a phone, a portable computer, a wearable device or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant, which is held by a speaker or a listener or is located in the environment; one or more electronic devices that a speaker or a listener carries to give the speaker feedback; and one or more computer programs that trigger feedback according to the audio data received by said microphone, the audio data being evaluated by one or more of: the speech speed, calculated in characters or words per second or by other metrics; the pitch / frequency (Hz) and its transition; and the volume (dB).
Claim 2.
The system of claim 1, wherein the speaker receives feedback by one or more of: haptic feedback by vibration through an electronic device the users have; visual feedback through a screen or user interface of an electronic device the users have; and aural feedback through a loudspeaker or another audio output device the users have.
Claim 3.
The system of claim 1, wherein the audio data is evaluated as to whether it falls within the range between the minimum and maximum values with regard to one or more of speech speed, pitch and volume.
Claim 4.
The system of claim 1, wherein the threshold values (minimum and maximum values) used in evaluating the speakers' speech to determine when to give users feedback are configured manually by users; by the system, which sets preset threshold values for sounds that are usually difficult to understand; or by the system, which provides personalized values according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system as described in Claim 6.
Claim 5.
The system of claim 2, wherein said feedback to a speaker comprises speech training by visualizing one or more of the pitch, speed and volume of the speaker's speech through a screen or other user interface, which shows one or more of: charts, text or numeric information regarding the speaker's speech; one or multiple graphical objects that change color, size or shape; and content with an element of gamification or coaching utilizing said two elements.
Claim 6.
A system for assisting listeners in conversation comprising: one or more microphones built into a device such as a phone, a portable computer, a wearable device or a hearing device for hard-of-hearing people such as a hearing aid or cochlear implant, which is held by a speaker or a listener or is located in the environment; one or more electronic devices that a speaker or a listener carries to give a user feedback; and one or more computer programs that record the conversation and let the user play back or save audio data capturing a part of the conversation that is difficult to understand, so that users can understand the missed conversation immediately or later when they have time to check back, the playback or saving being triggered by the user, by the system using preset threshold values in evaluating speech as described in Claim 1, or by the system providing personalized threshold values according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system.
Claim 7.
The system of Claim 6, wherein the duration of said audio data to be played back or saved is short enough to check or listen back to comfortably, and the duration of one piece of audio data to be played back or saved is one of: a preset value set by the system between 0 and 30 seconds; or a customized value configured by the user in the system.
Claim 8.
The system of Claim 6, wherein a configurable setting comprises changing one or more of the speed, pitch and volume of said audio data to be played back or saved so that the user can understand it better, the change being made by the user or by the system, which provides personalized values according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/DK2020/050058 WO2021175390A1 (en) 2020-03-04 2020-03-04 Methods to assist verbal communication for both listeners and speakers
US17/909,622 US20230122715A1 (en) 2020-03-04 2020-03-04 Methods to assist verbal communication for both listeners and speakers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/DK2020/050058 WO2021175390A1 (en) 2020-03-04 2020-03-04 Methods to assist verbal communication for both listeners and speakers

Publications (1)

Publication Number Publication Date
WO2021175390A1 (en) 2021-09-10

Family

ID=70189640

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK2020/050058 WO2021175390A1 (en) 2020-03-04 2020-03-04 Methods to assist verbal communication for both listeners and speakers

Country Status (2)

Country Link
US (1) US20230122715A1 (en)
WO (1) WO2021175390A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080109224A1 (en) * 2006-11-02 2008-05-08 Motorola, Inc. Automatically providing an indication to a speaker when that speaker's rate of speech is likely to be greater than a rate that a listener is able to comprehend
US7653543B1 (en) * 2006-03-24 2010-01-26 Avaya Inc. Automatic signal adjustment based on intelligibility
US20120215532A1 (en) * 2011-02-22 2012-08-23 Apple Inc. Hearing assistance system for providing consistent human speech
US20160210982A1 (en) * 2015-01-16 2016-07-21 Social Microphone, Inc. Method and Apparatus to Enhance Speech Understanding
CN207010905U (en) * 2017-07-28 2018-02-13 恩平市恒胜电子科技有限公司 Speech training microphone
US20190122663A1 (en) * 2017-09-06 2019-04-25 Healables Ltd. Non-verbal speech coach


Non-Patent Citations (1)

Title
Seetharaman, Prem (prem@u.northwestern.edu) et al.: "VoiceAssist: Guiding Users to High-Quality Voice Recordings", Human Factors in Computing Systems, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA, 2 May 2019 (2019-05-02), pages 1-6, XP058449451, ISBN: 978-1-4503-5970-2, DOI: 10.1145/3290605.3300539 *

Also Published As

Publication number Publication date
US20230122715A1 (en) 2023-04-20

Similar Documents

Publication Publication Date Title
US10475467B2 (en) Systems, methods and devices for intelligent speech recognition and processing
Cowie et al. Postlingually acquired deafness: speech deterioration and the wider consequences
US5765134A (en) Method to electronically alter a speaker's emotional state and improve the performance of public speaking
Vlaming et al. HearCom: Hearing in the communication society
Vandali et al. Training of cochlear implant users to improve pitch perception in the presence of competing place cues
JP2008500573A (en) Method and system for changing messages
Woods et al. Predicting the effect of hearing loss and audibility on amplified speech reception in a multi-talker listening scenario
Best et al. Development and preliminary evaluation of a new test of ongoing speech comprehension
US10334376B2 (en) Hearing system with user-specific programming
Keidser et al. Cognitive spare capacity: evaluation data and its association with comprehension of dynamic conversations
Kreitewolf et al. Perceptual grouping in the cocktail party: Contributions of voice-feature continuity
US20100235169A1 (en) Speech differentiation
WO2014077182A1 (en) Mobile information terminal, shadow speech management method, and computer program
US20230122715A1 (en) Methods to assist verbal communication for both listeners and speakers
Meyer et al. Improving museum docents' communication skills
CN111818418A (en) Earphone background display method and system
Villegas et al. Effects of task and language nativeness on the Lombard effect and on its onset and offset timing
James et al. The French MBAA2 sentence recognition in noise test for cochlear implant users
Fu et al. Recognition of simulated telephone speech by cochlear implant users
Kisenwether et al. Cell Yell!: health risks in Telehealth
Heldner et al. Is breathing silence?
Rakerd On making oral histories more accessible to persons with hearing loss
JP2009000248A (en) Game machine
Andersen Speech intelligibility prediction for hearing aid systems
Chilo Implementing UX/UI in a real-time speech recognition and translation application in Android and a clinical study

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20717090

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20717090

Country of ref document: EP

Kind code of ref document: A1