WO2021175390A1 - Methods to assist verbal communication for both listeners and speakers

Info

Publication number: WO2021175390A1
Application number: PCT/DK2020/050058
Authority: WO
Inventors: Hiroki Sato
Original assignee: Hiroki Sato
Priority date: 2020-03-04
Filing date: 2020-03-04
Publication date: 2021-09-10
Also published as: US20230122715A1

Abstract

Methods implemented in a system utilizing computing programs for a speaker and a listener in conversation are provided. Aspects include (i) a reminder provisioner for a speaker which is triggered according to the speed, pitch or volume of the speaker's speech, (ii) a speech training provisioner for a speaker, and (iii) an application which records and plays back conversation that is difficult to understand.

Description

Title of the Invention : Methods to assist verbal communication for both listeners and speakers

<Background>

Communication between hard-of-hearing people and people with normal hearing can often be clunky and cause stress and frustration for both parties, especially when something has to be repeated in the conversation.

While the burden of communication should be shared by both people with hearing difficulty (as defined below) and people with normal hearing, recent technological advancement seems to focus on 'hearing strategy' rather than 'speaking strategy'. The problem areas can be examined from the perspectives of both listeners and speakers.

For speakers, it is practically impossible to fully understand each listener's hearing difficulty, as everyone hears differently. Speakers may not know how to speak properly or how their speech is understood by listeners. Also, even when people are aware of the need to speak more clearly when talking with a person with hearing difficulty, such as a hard-of-hearing person, they tend to speak less clearly as the conversation goes on.

From a listener's perspective, it is considered impolite in some cultures to ask for repetition multiple times. It also becomes even more difficult to understand, and to ask for repetition, in a conversation where multiple people are speaking.

The verbal communication assisting technique implementations described herein generally assist people with hearing difficulty and the people who talk with them. People with hearing difficulty, as used herein, include people who are less able to follow a conversation because of a physical constraint such as distance or an obstacle, people who are not proficient enough in the language of the conversation to hear and understand it, and people who use a hearing device or technology such as a hearing aid, a cochlear implant, a bone-anchored hearing aid or an auditory brainstem implant. The invention comprises three functions of a system implemented by computing programs. Firstly, the system evaluates the speed, pitch and volume of a speaker's speech and, on behalf of the listener, lets the speaker know that the speech is difficult to understand. Secondly, the system helps speakers to speak properly by giving them feedback; these users include people with hearing impairment, such as sensorineural hearing loss, who may have trouble understanding how to speak. Thirdly, the system records conversation and enables a user to play back or save the audio data that is difficult to understand, so that the user can understand the missed conversation immediately or later.

<Description of drawings>

FIG. 1 is a diagram illustrating one implementation, in simplified form, of a system framework for realizing the method of verbal communication for both a speaker and a listener in conversation;

SUBSTITUTE SHEETS (RULE 26)

FIG. 2 depicts a flow diagram of an exemplary implementation, in simplified form, of a process for providing reminders for speakers based on data evaluation of speech (Component 1);

FIG. 3 depicts a flow diagram of an exemplary implementation, in simplified form, of a process for helping speakers understand how to speak properly or naturally by visualizing aspects of speech (Component 2); and

FIG. 4 depicts a flow diagram of an exemplary implementation, in simplified form, of a process for recording and playing back audio data of a difficult conversation part (Component 3).

In the following description of verbal communication assisting technique implementations, reference is made to the accompanying drawings, which form a part hereof and in which are shown, by way of illustration, specific implementations in which the verbal communication assisting technique can be practiced. It is understood that other implementations can be utilized and structural changes can be made without departing from the scope of the verbal communication assisting technique implementations.

FIG. 1 illustrates one implementation, in simplified form, of a system framework comprising multiple components of computing programs, which can function independently or in cooperation with each other and which share the same database in the system.

-Component 1

The system evaluates the speed, pitch and volume of the speech and, on behalf of a listener (104), lets a speaker (102) know that the speaker's speech is difficult to understand.

To receive input, the system is operable with any type of end-user computing device (106, 108) which has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch among others) or a hearing device for hard-of-hearing people such as a hearing aid or a cochlear implant.

Upon receiving audio input (114, 116) from the microphone, a computing program in the system evaluates factors of the speech (202) as below:

Speech speed : voice data is converted into text by a transcription (audio-to-text) tool (such as the JavaScript SpeechRecognition API), and the length of the text is then divided by the duration of the speech to give characters or words per second, which can be used as a metric to evaluate the speed of speech.

Volume : voice data is converted into a numeric value by a computing program such as the sound volume detection program in the p5.js library in JavaScript.

Pitch : voice data is converted into a numeric value by a computing program such as the pitch detection program in the ml5.js library (CREPE) in JavaScript.
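
By way of illustration only, the speech speed metric described above could be sketched as follows, assuming the transcript and the duration of the speech have already been obtained from a transcription tool. The function name speechRate and the choice not to count whitespace as characters are assumptions made for this sketch and are not specified in this description.

```javascript
// Hypothetical helper: computes characters and words per second from a
// transcript and the duration of the speech it was transcribed from.
function speechRate(transcript, durationSeconds) {
  if (!(durationSeconds > 0)) {
    throw new RangeError("durationSeconds must be a positive number");
  }
  const trimmed = transcript.trim();
  // Assumption: whitespace is not counted as a character.
  const characters = trimmed.replace(/\s+/g, "").length;
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return {
    charactersPerSecond: characters / durationSeconds,
    wordsPerSecond: words / durationSeconds,
  };
}

// Example: a two-word utterance of 10 letters spoken over 4 seconds.
const rate = speechRate("hello there", 4);
console.log(rate.charactersPerSecond, rate.wordsPerSecond);
```

Either value returned by the sketch could serve as the speed metric; which one is more suitable may depend on the language of the conversation.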

Each value is checked against its minimum and maximum threshold values and, according to the evaluation, feedback to a speaker (102) is triggered in the system (206) when a value falls outside that range.
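
The range evaluation can be illustrated with the following minimal sketch. The function names withinRange and factorsOutOfRange and the shape of the threshold object are hypothetical; the numbers in the usage example are the example preset values given in this description for speech speed and volume.

```javascript
// Returns true when value lies inside [min, max]. A bound that is omitted
// or null is treated as "not set" (for example, no minimum speech speed).
function withinRange(value, { min = null, max = null } = {}) {
  if (min !== null && value < min) return false;
  if (max !== null && value > max) return false;
  return true;
}

// Returns the names of the measured factors that fall outside their range.
// Factors without a measurement are skipped.
function factorsOutOfRange(measurements, thresholds) {
  return Object.keys(thresholds).filter(
    (name) =>
      name in measurements &&
      !withinRange(measurements[name], thresholds[name])
  );
}

// Hypothetical configuration object using the example preset values.
const thresholds = {
  speed: { max: 3 },            // characters per second, no minimum
  volume: { min: 45, max: 65 }, // dB
};

const triggered = factorsOutOfRange({ speed: 4.2, volume: 50 }, thresholds);
console.log(triggered); // the list is non-empty, so feedback would be triggered
```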

In an example, the preset threshold values (minimum / maximum) are as below, and can be configured by the user:

Speech speed (characters per second) : In an example for English, 3 characters per second is set as the maximum value and no minimum value is set.

Volume : 45 dB for minimum value and 65 dB for maximum value.

Pitch : The proportion of the speech duration during which the pitch stays within a range of 1% has to be less than 30% (maximum value). In speech-language pathology, speaking with rich tone (rich change of pitch) is considered easy for hard-of-hearing people to understand.
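
The pitch criterion admits more than one reading. The following sketch shows one possible interpretation only: it assumes that one pitch estimate is available per equally spaced frame of voiced speech, and it counts a frame as flat when its pitch differs from that of the previous frame by no more than 1%. The function names and the frame-to-frame comparison are assumptions of this sketch, not details of the described system.

```javascript
// Assumption: pitchesHz holds one positive pitch estimate (in Hz) per
// equally spaced frame of voiced speech. A frame counts as "flat" when its
// pitch differs from the previous frame by no more than `tolerance`.
function flatPitchFraction(pitchesHz, tolerance = 0.01) {
  if (pitchesHz.length < 2) return 0;
  let flatFrames = 0;
  for (let i = 1; i < pitchesHz.length; i++) {
    const previous = pitchesHz[i - 1];
    const change = Math.abs(pitchesHz[i] - previous) / previous;
    if (change <= tolerance) flatFrames++;
  }
  return flatFrames / (pitchesHz.length - 1);
}

// The flat share "has to be less than 30%", so 30% or more is flagged.
function isTooMonotone(pitchesHz, maxFlatFraction = 0.3) {
  return flatPitchFraction(pitchesHz) >= maxFlatFraction;
}

console.log(isTooMonotone([200, 200, 200, 200, 200])); // constant pitch
console.log(isTooMonotone([200, 220, 200, 220, 200])); // varied pitch
```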

The threshold values can be configured manually by the user, or automatically by the system, which either sets preset values for sound that is normally difficult to understand or learns each user's hearing preference / capability from the audio data labeled as difficult, as described later in Component 3.

Upon receiving trigger information from said evaluation, the system gives feedback (204) in ways such as below:

-The system gives speakers haptic feedback through a wristband with a vibrator, a mobile phone or a wearable device (such as Apple Watch, Fitbit or Galaxy Watch) which can be programmed to vibrate.

-The system gives speakers visual feedback by showing numeric information or a graphical representation of speech speed, pitch or volume on the screen or user interface of a computing device, such as a mobile phone, a tablet or a wearable device among others.

-The system gives speakers aural feedback by playing sound through a loudspeaker of said computing devices.
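
As a non-limiting sketch of how the feedback modes listed above might be dispatched, the following code passes a message to whichever feedback channels are available. The function giveFeedback, the channel names and the message text are hypothetical; each channel is a caller-supplied function so that the sketch does not depend on any particular device interface.

```javascript
// Hypothetical dispatcher: notifies every available feedback channel and
// returns the names of the channels that were used.
function giveFeedback(triggeredFactors, channels) {
  if (triggeredFactors.length === 0) return []; // nothing to report
  const message = `Please adjust: ${triggeredFactors.join(", ")}`;
  const used = [];
  for (const [name, notify] of Object.entries(channels)) {
    if (typeof notify === "function") {
      notify(message);
      used.push(name);
    }
  }
  return used;
}

// Example with console output standing in for real devices.
const usedChannels = giveFeedback(["speed", "volume"], {
  haptic: (message) => console.log("[vibrate]", message),
  visual: (message) => console.log("[screen]", message),
  aural: null, // no loudspeaker available in this example
});
console.log(usedChannels);
```

In a browser that supports the Vibration API, the haptic channel could, for example, call navigator.vibrate(200) instead of writing to the console.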

FIG. 2 is a flow diagram of an exemplary implementation, in simplified form, of a process for providing reminders for speakers based on data evaluation of speech. Upon receiving audio input from a microphone (502), the system measures / calculates speed (504), volume (506) and pitch (508) of the speech in said ways, and if any of the values falls outside the threshold values (minimum / maximum) (510), the system gives a speaker (102) feedback in said ways (512).

-Component 2

Referring again to FIG. 1, the system visualizes pitch, speed or volume of the speech and gives a speaker (102) a clue for understanding how to speak properly or naturally so that listeners can easily understand. The visualization of pitch could be especially beneficial for people with sensorineural hearing loss, who may not understand how to change the tone (pitch) of their voice or how to speak naturally.

To receive input, same as in Component 1, the system is operable with any type of end-user computing device (106, 108) which has a microphone (110, 112), such as a mobile phone, a portable computer, a wearable device (Apple Watch, Fitbit or Galaxy Watch among others) or a hearing device for hard-of-hearing people such as a hearing aid or a cochlear implant.

Upon receiving audio data through the microphone (114), a computing program in the system evaluates factors of the speech as below (302):

Volume : voice data is converted into a numeric value by a computing program such as the volume detection program in the p5.js library in JavaScript.

Pitch : voice data is converted into a numeric value by a computing program such as the pitch detection program in the ml5.js library (CREPE) in JavaScript.
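
As an illustrative sketch of the volume conversion, and not the internals of any particular library, a block of audio samples can be reduced to a root-mean-square amplitude and expressed in decibels relative to full scale. The parameter calibrationOffsetDb is a hypothetical, device-specific constant assumed here for mapping that value to an approximate sound level in dB; in practice it would have to be determined by measurement.

```javascript
// Root-mean-square amplitude of a block of samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (const sample of samples) sumOfSquares += sample * sample;
  return Math.sqrt(sumOfSquares / samples.length);
}

// Level in dB relative to full scale (amplitude 1 maps to 0 dBFS).
// Silence maps to -Infinity.
function toDbfs(amplitude) {
  return amplitude > 0 ? 20 * Math.log10(amplitude) : -Infinity;
}

// Assumption: calibrationOffsetDb is a hypothetical constant that shifts
// dBFS to an approximate sound level for one specific microphone.
function estimateLevelDb(samples, calibrationOffsetDb) {
  return toDbfs(rms(samples)) + calibrationOffsetDb;
}

console.log(rms([0.5, -0.5, 0.5, -0.5]));
console.log(estimateLevelDb([1, -1, 1, -1], 90));
```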

According to the values measured and calculated in said ways, a computing program creates content (304) such as charts / text / numeric information or a graphical object in a browser program such as Google Chrome or Firefox (306, 308). In a user test, it was considered effective to interactively control the size of a graphical object by the volume of voice or surrounding sounds, and to control its color by the pitch of voice.
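
The mapping from volume to size and from pitch to color could, for instance, be sketched as below. The input ranges (a level between 0 and 1, a pitch between 80 Hz and 400 Hz), the output ranges (a diameter of 20 to 200 pixels, a hue from blue to red) and all function names are assumptions chosen for illustration.

```javascript
function clamp(value, low, high) {
  return Math.min(high, Math.max(low, value));
}

// Linearly maps value from [inLow, inHigh] to [outLow, outHigh],
// clamping values that fall outside the input range.
function mapRange(value, inLow, inHigh, outLow, outHigh) {
  const t = clamp((value - inLow) / (inHigh - inLow), 0, 1);
  return outLow + t * (outHigh - outLow);
}

// Assumptions: level is an amplitude in [0, 1]; pitch between 80 and 400 Hz
// is mapped to a hue from 240 (blue, low pitch) to 0 (red, high pitch).
function visualStyle(level, pitchHz) {
  const diameter = mapRange(level, 0, 1, 20, 200);
  const hue = Math.round(mapRange(pitchHz, 80, 400, 240, 0));
  return { diameter, color: `hsl(${hue}, 80%, 50%)` };
}

console.log(visualStyle(0.5, 240));
```

The returned diameter and color string could then be applied to a circle drawn on a canvas or to the style of a page element.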

The system can also include a gamification element that uses a graphical object representing the speaker's way of speaking, together with a set of rules or a target line, to attract more interest from users such as hard-of-hearing children.

The system can further have a speech training / coaching element advising users to change or keep their way of speaking according to evaluation.

FIG. 3 is a flow diagram of an exemplary implementation, in simplified form, of a process for helping speakers understand how to speak properly or naturally by visualizing aspects of speech. Upon receiving audio input from a microphone (602), the system measures / calculates speed (604), volume (606) and pitch (608) of the speech in said ways and gives a speaker feedback (610) in said ways, such as showing a graphical object on a screen.

-Component 3

Referring again to FIG. 1, the system enables a user to record the conversation and play back audio data of a difficult conversation part which is classified by a user or by the system.

People with hearing difficulty can struggle to understand a sentence after missing one or more words. Even if what they missed is just a few words, they could find it hard and stressful to keep asking for repetition. As a solution for such difficulty, the system lets a listener (104) understand the missed conversation part by herself / himself.

Prior to use of this technology implementation, the participants in the conversation should agree to the conversation being recorded.

Upon receiving audio input (114, 116) through a microphone, a computing program in the system records (402) the audio data and divides it into multiple small blocks. The system can record and upload audio blocks to a server through a computing program such as Recorder.js in JavaScript.

In conversation, when a listener finds it hard to hear, she/he can trigger the system (410) to save (404) and play back (406) the recent audio data, which is short enough to listen back to comfortably (414). In user testing, 10 seconds was considered effective for the duration of the audio data to be played back, but a user can also change the duration of a playback. Also, rather than playing back the audio right away, a user can save / mark the difficult audio and play it back later (412).
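
One way to keep only the recent audio available for playback is a rolling buffer of audio blocks, sketched below under stated assumptions. The class name RecentAudioBuffer and its methods are hypothetical, and the blocks are represented by placeholder values rather than real audio data; the default window of 10 seconds follows the duration mentioned above.

```javascript
// Keeps the most recent audio blocks, dropping the oldest block whenever
// the remaining blocks would still cover `windowSeconds` of audio.
class RecentAudioBuffer {
  constructor(windowSeconds = 10) {
    this.windowSeconds = windowSeconds;
    this.blocks = []; // each entry: { data, durationSeconds }
    this.totalSeconds = 0;
  }

  // Adds a newly recorded block and trims blocks that are no longer needed.
  push(data, durationSeconds) {
    this.blocks.push({ data, durationSeconds });
    this.totalSeconds += durationSeconds;
    while (
      this.blocks.length > 1 &&
      this.totalSeconds - this.blocks[0].durationSeconds >= this.windowSeconds
    ) {
      this.totalSeconds -= this.blocks.shift().durationSeconds;
    }
  }

  // Returns the retained blocks, oldest first, for playback or saving.
  snapshot() {
    return this.blocks.map((block) => block.data);
  }
}

// Example: seven 2-second blocks arrive; the last 10 seconds are retained.
const recent = new RecentAudioBuffer(10);
for (let i = 1; i <= 7; i++) recent.push(`block-${i}`, 2);
console.log(recent.snapshot());
```

When the listener triggers the system, the snapshot could be played back immediately or stored for later, and a different window length could be passed to the constructor to reflect a duration chosen by the user.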

Playing back / marking can also be triggered automatically by the system. The system can classify the audio data a user previously played back as difficult sound and learn the user's personal hearing capability / preference through a machine learning process (408).

When playing back, a user can change the speed, volume or pitch of the conversation so that it is easier for the user to understand.

FIG. 4 is a flow diagram of an exemplary implementation, in simplified form, of a process for recording and playing back audio data of a difficult conversation part. Upon receiving audio input from a microphone (702), the system starts recording conversation (704), and if a user triggers the system (706) or the values of the voice fall outside the range of the set thresholds (708), the system proceeds to give feedback. If the user has not changed the setting in the application (710), the system immediately plays back the audio data of the recent conversation (714). If the user has changed the setting in the application (710) and prefers saving the audio data and listening back later, the user can later play back an audio file the system creates (716).
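
The decision flow of FIG. 4 can be summarized by the following minimal sketch. The function name nextAction, the returned action labels and the flag settings.saveForLater, which stands in for the changed application setting (710), are hypothetical names introduced for illustration.

```javascript
// Decides what the system does next while a conversation is being recorded.
// userTriggered: the user triggered the system (706).
// valuesOutOfRange: the voice values left the set thresholds (708).
function nextAction({ userTriggered, valuesOutOfRange, settings = {} }) {
  if (!userTriggered && !valuesOutOfRange) return "keep-recording";
  // Hypothetical flag: true when the user prefers saving and listening later.
  return settings.saveForLater ? "save-for-later" : "play-back-now";
}

console.log(nextAction({ userTriggered: true, valuesOutOfRange: false }));
console.log(
  nextAction({
    userTriggered: false,
    valuesOutOfRange: true,
    settings: { saveForLater: true },
  })
);
```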

Claims

Claim 1.

A system for assisting speakers in conversation comprising: one or more microphone built in a device such as a phone, a portable computer, a wearable device or a hearing device for hard-of-hearing people such as hearing aid and cochlear implant, which is held by a speaker or a listener or is located in the environment; one or more electronic device a speaker or a listener brings to give a speaker feedback; and one or more computer program which triggers feedback according to audio data said microphone receives, which is evaluated by one or more of: the speech speed calculated by character or word per second or other metrics; the pitch / frequency (Hz) and its transition; and the volume (dB).

Claim 2.

The system of claim 1, wherein a speaker gets feedback by one or more of: a haptic feedback by vibration through an electronic device users have; a visual feedback by screen or user interface of an electronic device users have; and an aural feedback by a speaker or an audio output device users have.

Claim 3.

The system of claim 1, wherein audio data is evaluated as to whether it fits within the range of the minimum and the maximum value with regard to one or more of speech speed, pitch and volume.

Claim 4.

The system of claim 1, wherein the threshold values (minimum and maximum values) in evaluation of speakers' speech, which are utilized to determine when to give users feedback, are configured manually by users or by the system, which sets preset threshold values for usually difficult sound or by the system, which provides personalized values according to each user's hearing capability by utilizing the previously recorded audio data labeled as difficult and training the system as described in Claim 6.

Claim 5.

The system of claim 2, wherein said feedback to a speaker comprises speech training by visualization of one or more of pitch, speed and volume of a speaker's speech through a screen or other user interfaces, which shows one or more of: charts, text or numeric information regarding the speaker's speech; one of multiple graphical objects changing the color, size or shape; and content with the factor of gamification or coaching utilizing said 2 elements.

Claim 6.

A system for assisting listeners in conversation comprising: one or more microphone built in a device such as a phone, a portable computer, a wearable device or a hearing device for hard-of-hearing people such as hearing aid and cochlear implant, which is held by a speaker or a listener or is located in the environment; one or more electronic device a speaker or a listener brings to give a user a feedback; and one or more computer program which records conversation and lets the user play back or save the audio data which captures conversation part which is difficult to understand so that users can understand the missed conversation immediately or later when users have time to check back, through a trigger by users, by the system which sets preset threshold values in evaluating speech as described in Claim 1, or by the system which provides the personalized values for threshold according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system.

Claim 7.

The system of Claim 6, wherein duration of said audio data to be played back or saved is short enough to comfortably check or listen back to, and the option of the duration of one audio data to be played back or saved comprises one of: a preset value by the system ranging between 0 second and 30 seconds; or a customized value configured by the user in the system.

Claim 8.

The system of Claim 6, wherein the configurable setting comprises changing one or more of speed, pitch and volume of said audio data to be played back or saved so that a user can understand better, by the user or by the system which provides the personalized values according to each user's hearing capability by utilizing previously recorded audio data labeled as difficult and training the system.
