WO2018108284A1 - Audio recording device for presenting audio speech missed due to user not paying attention and method thereof - Google Patents


Info

Publication number
WO2018108284A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
speech
recording device
paying attention
audio recording
Application number
PCT/EP2016/081229
Other languages
French (fr)
Inventor
Matthew John LAWRENSON
Jan Jasper VAN DEN BERG
Jacob STRÖM
Lars Andersson
Till BURKERT
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2016/081229 priority Critical patent/WO2018108284A1/en
Publication of WO2018108284A1 publication Critical patent/WO2018108284A1/en

Classifications

    • G11B27/102: Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/034: Electronic editing of digitised analogue information signals, e.g. audio or video signals, on discs
    • G11B27/32: Indexing; Addressing; Timing or synchronising by using information signals recorded by the same method as the main recording on separate auxiliary tracks of the same or an auxiliary record carrier
    • A61B5/291: Bioelectric electrodes specially adapted for electroencephalography [EEG]
    • A61B5/38: Electroencephalography [EEG] using evoked responses; Acoustic or auditory stimuli
    • A61B5/6814: Sensors specially adapted to be attached to the head
    • A61B5/6816: Sensors specially adapted to be attached to the ear lobe
    • A61B5/6817: Sensors specially adapted to be attached to the ear canal
    • A61B5/7246: Details of waveform analysis using correlation, e.g. template matching or determination of similarity
    • A61B5/168: Evaluating attention deficit, hyperactivity
    • A61B5/746: Alarms related to a physiological condition, e.g. details of setting alarm thresholds or avoiding false alarms

Abstract

An audio recording device (100, 150) is provided. The audio recording device is operative to detect that a user (110) is paying attention to speech (121) captured by a microphone (101), record the captured speech to which the user is paying attention, detect that the user has stopped paying attention, and render a representation of the recorded speech starting at a point in time when the user has stopped paying attention. The representation of the recorded speech may either be rendered audibly, or by transcribing the recorded speech into text and displaying the transcribed text (123), or a summary thereof, to the user. The audio recording device is operative to detect that the user is paying attention to the captured speech by acquiring Electroencephalography (EEG) data captured by electrodes (102) which are attached to a body part of the user, calculating a correlation between the acquired EEG data and the captured speech, and determining that the user is paying attention to the captured speech if the calculated correlation is larger than an upper threshold value.

Description

AUDIO RECORDING DEVICE FOR PRESENTING AUDIO SPEECH MISSED DUE TO USER NOT PAYING ATTENTION AND METHOD THEREOF
Technical field
The invention relates to an audio recording device, a method of an audio recording device, a corresponding computer program, and a corresponding computer program product.
Background
Known solutions for relieving a person of taking notes during an oral presentation he or she is attending, such as a lecture, or during a meeting or discussion with other persons, include software applications which record audio, and optionally video, and automatically take notes. The recorded presentation and/or notes can subsequently be reviewed for the purpose of retrieving information from parts of the lecture or presentation which the user has missed, e.g., due to a lack of attention. In addition, the recorded presentation may be transcribed into text and optionally summarized. An example is the Audio Notetaker software by Sonocent (https://www.sonocent.com/en-us/audio-notetaker, retrieved on 1 December 2016).
The known solutions for transcribing a recorded presentation into text oftentimes rely on speech recognition for automatically translating speech into text. The accuracy rate of such systems can be high, exceeding 90%, though this depends strongly on conditions such as background noise, quality of the recorded audio, characteristics of the recorded voice, and vocabulary used by the speaker. An increasing interest in speech recognition techniques is driven by recent developments in voice assistants such as Apple's Siri, Google Assistant, and Microsoft Cortana. The automatic creation of a summary from a transcription of a recorded presentation typically involves Natural Language Processing (NLP) techniques. One of the main aspects of such solutions is to identify the most relevant information within a text. Various algorithms are available to determine keywords, making use of statistical properties of the text, such as word frequency and word co-occurrence in a corpus. One approach to automatic summarization is to select a subset of all sentences and take those that are likely to be most informative, using general statistical rules, document-specific query keywords, or machine learning techniques which rely on supervised key-phrase extraction.
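As a purely illustrative sketch (not part of the disclosed invention, and with the stop-word list, scoring rule, and example transcript chosen as assumptions), the frequency-based keyword and sentence-selection approach described above can be approximated in a few lines of Python:

```python
from collections import Counter
import re

# Small stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "in", "that", "it"}

def keyword_scores(text):
    """Score candidate keywords by raw term frequency, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def summarize(text, n_sentences=1):
    """Select the n sentences whose words have the highest total frequency."""
    scores = keyword_scores(text)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    ranked = sorted(
        sentences,
        key=lambda s: sum(scores[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    return ranked[:n_sentences]

transcript = (
    "EEG electrodes measure brain activity. "
    "Brain activity correlates with attended speech. "
    "The weather was pleasant."
)
print(summarize(transcript))
# → ['Brain activity correlates with attended speech.']
```

The sentence containing the most frequent content words ("brain", "activity") wins; production summarizers refine this idea with co-occurrence statistics or supervised key-phrase extraction, as noted above.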
The existing solutions require the user to review the recorded presentation, or the transcribed text, in order to find information which the user has missed due to a lack of attention. Depending on the length of the recorded presentation or the transcribed text, considerable time is required for identifying the missing information.
Summary
It is an object of the invention to provide an improved alternative to the above techniques and prior art.
More specifically, it is an object of the invention to provide an improved solution for assisting a person attending an oral presentation or partaking in a discussion to acquire information from the presentation which has been missed due to a lack of attention.
These and other objects of the invention are achieved by means of different aspects of the invention, as defined by the independent claims. Embodiments of the invention are characterized by the dependent claims.
According to a first aspect of the invention, an audio recording device is provided. The audio recording device comprises processing means which is operative to detect that a user of the audio recording device is paying attention to speech which is captured by a microphone operatively connected to the audio recording device, and record the captured speech to which the user is paying attention. The processing means is further operative to detect that the user has stopped paying attention to the captured speech, and render a representation of the recorded speech starting at a point in time when the user has stopped paying attention to the captured speech.
According to a second aspect of the invention, a method performed by an audio recording device is provided. The method comprises detecting that a user of the audio recording device is paying attention to speech captured by a microphone operatively connected to the audio recording device, and recording the captured speech to which the user is paying attention. The method further comprises detecting that the user has stopped paying attention to the captured speech, and rendering a representation of the recorded speech starting at a point in time when the user has stopped paying attention to the captured speech.
According to a third aspect of the invention, a computer program is provided. The computer program comprises computer-executable instructions for causing a device to perform the method according to an embodiment of the second aspect of the invention, when the computer-executable instructions are executed on a processing unit comprised in the device.
According to a fourth aspect of the invention, a computer program product is provided. The computer program product comprises a computer-readable storage medium which has the computer program according to the third aspect of the invention embodied therein.
The invention makes use of an understanding that people who attend a presentation, e.g., listen to a speaker presenting a speech, attend a lecture, take part in a meeting or a discussion, can be assisted in acquiring information which they have missed due to a lack of attention to speech by recording speech which they pay attention to, detecting that they have stopped paying attention to the speech, and subsequently rendering a representation of the recorded speech, either audibly or as written text. The rendered representation of the recorded speech may start at a point in time when the user has stopped paying attention, or shortly before that.
Alternatively, the rendered representation of the recorded speech may start at a point in time well before the user has stopped paying attention, in particular at a point in time when the user still is paying attention.
Embodiments of the invention are advantageous in that people following a presentation are alleviated from reviewing, either by listening or reading, a potentially extensive amount of recorded or transcribed material in order to find the information which they have missed.
In the present context, an audio recording device may, e.g., be embodied as a mobile phone, a smartphone, a tablet, a personal computer, a laptop, a Brain-Computer Interface (BCI) headset, an in-ear Electroencephalography (EEG) device, or an around-ear EEG device.
According to an embodiment of the invention, the representation of the recorded speech is rendered by audibly rendering the recorded speech. In other words, the recorded speech is re-played to the user. In this way, the user can listen to the part of the presentation, recorded as speech, which he/she has missed due to lack of attention. The recorded speech is rendered using an acoustic transducer, e.g., a loudspeaker, a headphone, earphones, or earbuds, which are operatively connected to the audio recording device. Optionally, the recorded speech is audibly rendered, i.e., re-played, at an increased speed. Thereby, the delay, or gap, between what the user hears and the speech which is currently uttered by the speaker is continuously reduced until the user hears the speech in real time, i.e., until the user "catches up". This is advantageous in that the user can react to the presentation, e.g., ask a question, make a comment, or laugh, at an appropriate time. Alternatively, or additionally, the audio recording device can "catch up" with the real presentation by skipping silent gaps in the recorded speech when the audio recording device audibly renders the recorded speech.
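The "catch up" behaviour of skipping silent gaps in the recorded speech can be sketched as follows. This is an illustrative toy only: the frame length, the energy threshold, and the example buffer are assumptions, not taken from the disclosure, and real playback would operate on encoded audio rather than a Python list of samples:

```python
# Drop runs of near-silent samples from a recorded buffer so that replaying
# the remainder lets the listener "catch up" with the live speaker sooner.

def skip_silent_gaps(samples, frame_len=160, threshold=0.01):
    """Return samples with frames whose mean absolute amplitude is below
    `threshold` removed (i.e. silent gaps skipped on playback)."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept

# A toy buffer: loud speech, a silent pause, more speech.
audio = [0.5] * 320 + [0.0] * 320 + [0.4] * 320
shortened = skip_silent_gaps(audio)
print(len(audio), len(shortened))
# → 960 640 (the silent middle third is dropped)
```

Replaying the shortened buffer, optionally at a modestly increased speed, continuously reduces the gap between what the user hears and what the speaker is currently saying.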
According to another embodiment of the invention, the representation of the recorded speech is rendered by transcribing the recorded speech into text, using a speech-recognition algorithm, and displaying the transcribed text to the user, e.g., on a built-in display of the audio recording device or an external display. Optionally, a summary of the transcribed text is generated, using an NLP algorithm, and displayed to the user instead of, or in addition to, the transcribed text.
According to an embodiment of the invention, the user is alerted in response to detecting that the user has stopped paying attention to the captured speech. For instance, this may be achieved by means of an audible or a haptic notification. Advantageously, by notifying the user, the period of time during which the user is not paying attention to the speech is reduced, as is the duration of the representation of the speech which needs to be rendered.
According to an embodiment of the invention, the representation of the recorded speech is rendered in response to detecting that the user has resumed paying attention to the captured speech. In other words, the representation of the recorded speech is rendered as soon as the user is paying attention again, e.g., in response to an alert or a notification by the audio recording device.
According to another embodiment of the invention, the representation of the recorded speech is rendered in response to receiving from the user a request to render a representation of the recorded speech. The request may, e.g., be a spoken instruction, a pressed button, a gesture, or the like.
Optionally, if it is detected that the user has stopped paying attention to the captured speech on several occasions, the user may select at which of the occasions the rendering of the representation of the recorded speech should start. For instance, with each request received from the user, the audio recording device may "skip back" to a preceding occasion at which the user has lost attention. In that way, the user is supported in selecting a suitable starting point for rendering the representation of the recorded speech.
According to an embodiment of the invention, it is detected that the user is paying attention, or has resumed paying attention, to the captured speech by acquiring EEG data which is captured by electrodes which are operatively connected to the audio recording device and which are attached to a body part of the user, calculating a correlation between the acquired EEG data and the captured speech, and determining that the user is paying attention to the captured speech if the calculated correlation is larger than an upper threshold value. For instance, the EEG data may be captured from electrodes which are attached to the scalp of the user, from electrodes which are attached to the skin within the ear channel of the user, or from electrodes which are attached to the skin around the ear of the user. Preferably, it is determined that the user is not paying attention, or has stopped paying attention, to the captured speech if the calculated correlation is smaller than a lower threshold value.
Even though advantages of the invention have in some cases been described with reference to embodiments of the first aspect of the invention, corresponding reasoning applies to embodiments of other aspects of the invention.
Further objectives of, features of, and advantages with, the invention will become apparent when studying the following detailed disclosure, the drawings, and the appended claims. Those skilled in the art realize that different features of the invention can be combined to create embodiments other than those described in the following.
Brief description of the drawings
The above, as well as additional objects, features and advantages of the invention, will be better understood through the following illustrative and non-limiting detailed description of embodiments of the invention, with reference to the appended drawings, in which:
Fig. 1 shows audio recording devices, in accordance with embodiments of the invention.
Fig. 2 shows an audio recording device, in accordance with another embodiment of the invention.
Fig. 3 shows an audio recording device, in accordance with a further embodiment of the invention.
Fig. 4 shows an audio recording device, in accordance with yet another embodiment of the invention.
Fig. 5 illustrates determining user attention to speech using EEG data, in accordance with embodiments of the invention.
Fig. 6 shows an embodiment of the processing means comprised in the audio recording device.
Fig. 7 shows another embodiment of the processing means comprised in the audio recording device.
Fig. 8 shows a method of an audio recording device, in accordance with embodiments of the invention.
All the figures are schematic, not necessarily to scale, and generally only show parts which are necessary in order to elucidate the invention, wherein other parts may be omitted or merely suggested.
Detailed description
The invention will now be described more fully herein after with reference to the accompanying drawings, in which certain embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In Fig. 1, an embodiment 100 of the audio recording device is illustrated as a tablet, a smartphone, or a phablet (a device which is intermediate in size between that of a smartphone and that of a tablet). Audio recording device 100 is illustrated to comprise a microphone 101, an acoustic transducer 103, e.g., a loudspeaker, processing means 104, a communications module 105, and a display 107, e.g., a touchscreen.
Communications module 105 is operative to effect wireless communications through a Wireless Local Area Network (WLAN)/Wi-Fi network, Bluetooth, ZigBee, or any other short-range communications technology. Alternatively, or additionally, communications module 105 may further be operative to effect wireless communications with a Radio Access Network (RAN) or with another compatible device, based on a cellular telecommunications technique such as the Global System for Mobile communications (GSM), Universal Mobile Telecommunications System (UMTS), Long Term Evolution (LTE), or any 5G standard.
Audio recording device 100 is operative to detect that a user 110 of audio recording device 100 is paying attention to speech 121 originating from a speaker 120, or from one 120 of several people who are involved in a discussion, e.g., during a discussion or a meeting. Speech 121 may, e.g., be captured by built-in microphone 101. Alternatively, speech 121 may be captured by an external microphone which is operatively connected to audio recording device 100. For instance, this may be a microphone which speaker 120 is wearing or which is provided close to speaker 120 so as to capture speech 121, e.g., for the purpose of recording and/or broadcasting a presentation 122 which speaker 120 is presenting. If speech 121 is captured by built-in microphone 101, or an external microphone which is connected to audio recording device 100, e.g., using a headset jack which audio recording device 100 is provided with (not illustrated in Fig. 1), captured speech 121 is transformed by an audio codec into a digital format for further processing, e.g., any known audio format such as WAV, AIFF, AU, PCM, FLAC, ATRAC, ALAC, MPEG, WMA, or the like. Alternatively, speech 121 may also be retrieved by audio recording device 100 encoded into an audio format, e.g., if speech 121 is captured by an external microphone which speaker 120 is wearing or which is provided close to speaker 120. In this case, captured speech 121 may be retrieved from the external microphone via communications module 105, either by streaming or as chunks of data.
Audio recording device 100 is further operative to record captured speech 121 to which user 110 is paying attention, i.e., to at least temporarily store captured speech 121 in audio recording device 100 for subsequent use, e.g., using a memory comprised in audio recording device 100. Even further, audio recording device 100 is operative to detect that user 110 has stopped paying attention to captured speech 121, and render a representation of recorded speech 121 starting at a point in time when user 110 has stopped paying attention to captured speech 121. In the present context, the representation of recorded speech 121 may, e.g., be an audible representation of the recorded speech 121, or a text-based representation of recorded speech 121, as is described further below. Preferably, audio recording device 100 is operative to render the representation of recorded speech 121 in response to detecting that user 110 has resumed paying attention to captured speech 121, i.e., when user 110 is paying attention again and is ready to listen to, or read, what he or she has missed due to a lack of attention to speech 121. Optionally, audio recording device 100 may further be operative to alert user 110 in response to detecting that user 110 has stopped paying attention to captured speech 121. For instance, this may be achieved by rendering a sound, using loudspeaker 103 or any external acoustic transducer, or by a haptic notification, e.g., a vibration. Advantageously, the period of time during which user 110 is not paying attention to speech 121 is thereby minimized.
Alternatively, audio recording device 100 may be operative to render the representation of recorded speech 121 in response to receiving a request to render a representation of recorded speech 121. The request is received from user 110 and may, e.g., be a spoken instruction, a pressed button, a gesture, or the like. Advantageously, user 110 may thereby control the rendering of a representation of recorded speech 121, e.g., by pressing a button provided on audio recording device 100, by pressing a button operatively connected to audio recording device 100, such as a button provided on a headset or headphones, by uttering an instruction, by performing a gesture with a hand or another body part (e.g., the head of user 110), or by shaking audio recording device 100. Moreover, audio recording device 100 may be operative to detect that user 110 has stopped paying attention to captured speech 121 at multiple occasions and store information pertaining to these occasions. The stored information may subsequently be utilized for enabling user 110 to select at which one of the multiple occasions, i.e., at which point in time when user 110 has stopped paying attention to speech 121, the rendering of a representation of recorded speech 121 should start. Thereby, user 110 may skip back and forth between the multiple occasions in order to find a suitable occasion from which user 110 prefers to have audio recording device 100 render a representation of recorded speech 121.
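The occasion bookkeeping described above can be sketched as a small data structure. This is an illustrative sketch only; the class, its method names, and the example timestamps are assumptions, not taken from the disclosure:

```python
# Record the times at which the user lost attention, and let each
# "skip back" request move the selected playback start to the
# preceding occasion.

class AttentionLog:
    def __init__(self):
        self._occasions = []   # timestamps (seconds) when attention was lost
        self._cursor = None    # index of the currently selected occasion

    def attention_lost(self, timestamp):
        self._occasions.append(timestamp)
        self._cursor = len(self._occasions) - 1  # newest occasion by default

    def skip_back(self):
        """Each user request selects the preceding occasion, if any."""
        if self._cursor is not None and self._cursor > 0:
            self._cursor -= 1
        return self.selected()

    def selected(self):
        return None if self._cursor is None else self._occasions[self._cursor]

log = AttentionLog()
for t in (12.0, 47.5, 90.2):
    log.attention_lost(t)
print(log.selected())    # → 90.2, the most recent loss of attention
print(log.skip_back())   # → 47.5
print(log.skip_back())   # → 12.0
```

Rendering would then start from the selected timestamp, letting the user step back and forth between occasions until a suitable starting point is found.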
Optionally, the information pertaining to the occasions at which user 110 has stopped paying attention may be presented to user 110 so as to facilitate selecting a suitable occasion for rendering a representation of recorded speech 121. This is illustrated in Fig. 4, in which another embodiment 400 of the audio recording device is shown. Audio recording device 400 is similar to audio recording device 100, but is further operative to display a list 410 to user 110 using display 107, or an external display, indicating, for each occasion, a time 411 which has lapsed since user 110 has stopped paying attention, and optionally one or more keywords 412 or a short summary of the topic covered by speech 121 when user 110 stopped paying attention. User 110 may select one of the listed occasions, e.g., by pressing a button 413 which is provided in association with each occasion.
In the present context, it is to be understood that paying attention to speech implies that a listener, such as user 110, actively follows, or attempts to follow, what is being said by another person, e.g., speech 121 uttered by speaker 120. Likewise, a lack of attention implies that the listener does not actively follow what is being said by another person. Lack of attention may have different causes, e.g., the listener may be tired or have difficulties focusing on speech 121 for other reasons.
Further with reference to Fig. 1, audio recording device 100 may, e.g., be operative to detect that user 110 is paying attention to captured speech 121 by acquiring EEG data which is captured by electrodes which are attached to a body part of user 110. For instance, the EEG data may be acquired from a BCI headset 150 which user 110 is wearing, comprising electrodes 102 which are arranged for contacting a scalp and/or forehead of user 110 and capturing nerve signals from user 110. The EEG data may, e.g., be acquired from BCI headset 150 via communications module 105 and a similar communications module 105 comprised in BCI headset 150.
More specifically, audio recording device 100 may be operative to calculate a value for the correlation between the acquired EEG data and captured speech 121. Based on the calculated correlation, it is determined whether user 110 is paying attention to captured speech 121, or not. More specifically, if the calculated correlation is larger than an upper threshold value, it is determined that user 110 is paying attention to captured speech 121. Optionally, it may be determined that user 110 is not paying attention to captured speech 121 if the calculated correlation is smaller than a lower threshold value. In particular, it can be determined that user 110 is not paying attention to captured speech 121 at all, so that captured speech 121 is not recorded. Moreover, it can be determined that user 110 has stopped paying attention to speech 121 which he/she initially has paid attention to, which eventually triggers audio recording device 100 to render a representation of recorded speech 121. It will be appreciated that the upper threshold value and the lower threshold value may either be configured by a manufacturer of audio recording device 100 or by user 110.
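The two-threshold rule described above amounts to a hysteresis state machine: attention is declared when the correlation rises above the upper threshold and withdrawn only when it falls below the lower one, so values in between do not cause flip-flopping. The sketch below is illustrative only, and the threshold values themselves are assumptions:

```python
# Hysteresis-based attention detection from a stream of correlation values.

class AttentionDetector:
    def __init__(self, upper=0.6, lower=0.3):
        self.upper = upper    # correlation above this: user is attending
        self.lower = lower    # correlation below this: user is not attending
        self.attending = False

    def update(self, correlation):
        """Update the attention state from one correlation value."""
        if correlation > self.upper:
            self.attending = True
        elif correlation < self.lower:
            self.attending = False
        # between the thresholds: keep the previous state (hysteresis)
        return self.attending

det = AttentionDetector()
states = [det.update(c) for c in (0.7, 0.5, 0.2, 0.4, 0.8)]
print(states)
# → [True, True, False, False, True]
```

Note how 0.5 and 0.4 each inherit the previous state rather than toggling it; transitions of `attending` from True to False are what would trigger recording a "lost attention" occasion.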
In the following, the process of detecting that user 110 is paying attention to speech 121, has resumed paying attention to speech 121, or has stopped paying attention to speech 121, is described in more detail.
EEG is a technique which can be used for detecting brain activity by placing electrodes on a subject's scalp and other parts of the subject's head, e.g., in the ear channel and around the ear. These electrodes are used for measuring small electric potentials which are generated by action potentials of firing neurons, which are electrochemical excitations caused by the creation of an ion current in the cell's axon to activate connected cells through the synapses. Whereas the most common method to capture EEG signals is by placing the electrodes directly on the scalp of the subject, as is illustrated in Fig. 1, it has recently been demonstrated that EEG signals from within the subject's ear channel may be detected and give robust results with a sensitivity similar to that of on-scalp devices (see, e.g., "Reference Configurations for Ear-EEG Steady-State Responses", by S. L. Kappel, C. B. Christensen, K. B. Mikkelsen, and P. Kidmose, 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5689-5692, 2016).
Useful signals which can be measured with the use of EEG are event- related potentials, which are signals that are caused by certain events (e.g., motor event) or external stimuli to the subject. An example is the Auditory Evoked Potential (AEP), which is specific neural activity arising from acoustic stimulation (see, e.g., "Auditory evoked potentials", by N. Kraus and T. Nicol, Encyclopedia of Neuroscience, pages 214-218, Springer, 2009). These signals may suffer from a poor signal-to-noise ratio, the noise being the EEG baseline and other influences such as bio-signals originating from face muscle contractions or eye blinks, and external electromagnetic noise.
Therefore, these types of signals usually require averaging over several measurements or over some time interval.
It has been shown that it is also possible to detect brain activity caused by a subject's attention to auditory speech from a single, continuous measurement over a longer time period ("Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG", by
J. A. O'Sullivan, A. J. Power, N. Mesgarani, S. Rajaram, J. J. Foxe,
B. G. Shinn-Cunningham, M. Slaney, S. A. Shamma, and E. C. Lalor, Cerebral Cortex, vol. 25, pages 1697-1706, Oxford University Press, 2014), even in the presence of a second auditory speech input to which the user is not paying attention. This may be achieved by continuously monitoring the auditory stimuli, e.g., captured speech 121, and the resulting neural data, i.e., captured EEG data, and calculating a correlation between the amplitude envelopes of both signals.
To this end, the temporal brain activity related to auditory/speech input of user 110 and the sound input to user 110, including speech 121, are simultaneously monitored using an EEG device, e.g., BCI headset 150 capturing EEG data, and microphone 101 comprised in audio recording device 100. Then, amplitude envelopes are derived for captured speech 121 and the acquired EEG data, respectively, and a correlation between the two envelopes is calculated over a pre-determined time window, of the order of one second. This is illustrated in Fig. 5, which is a partial reproduction of Figure 1 of "Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG", by J. A. O'Sullivan, A. J. Power,
N. Mesgarani, S. Rajaram, J. J. Foxe, B. G. Shinn-Cunningham, M. Slaney, S. A. Shamma, and E. C. Lalor, Cerebral Cortex, vol. 25, pages 1697-1706, Oxford University Press, 2014.
In Fig. 5, the amplitude envelope 511 of captured audio data ("Attended Speech"), e.g., speech 121 which user 110 is paying attention to, is illustrated. In addition, the amplitude envelope 512 of a second auditory input ("Unattended Speech") to which user 110 is not paying attention is also illustrated. This may, e.g., be speech to which user 110 is not listening, typically originating from a person other than speaker 120. For both amplitude envelopes 511 and 512, a respective correlation with the amplitude envelope 513 derived for EEG data acquired from user 110, e.g., using BCI headset 150, is calculated.
As is known in the art, correlation is a statistical relationship which reflects the extent to which two random variables, such as auditory input and EEG data, are related to each other. The correlation between two random variables is commonly referred to as cross-correlation and can be quantified by means of a correlation function, which can be expressed as an integral over the two random variables over time. Typically, correlation functions are normalized such that a perfect correlation between the two random variables, i.e., the two random variables being identical, results in a maximum value which oftentimes is chosen to be equal to one ("1"). Correspondingly, the correlation of two completely independent random variables yields a correlation value of zero ("0"). An example is the well-known Pearson product-moment correlation coefficient.
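The normalization property described above can be illustrated with a minimal, self-contained sketch of the Pearson coefficient (plain Python, for illustration only):

```python
def pearson(x, y):
    """Pearson product-moment correlation coefficient: returns 1.0 for
    identical signals, -1.0 for perfectly anti-correlated ones, and
    values near 0.0 for independent ones."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0
```

For example, `pearson([1, 2, 3, 4], [1, 2, 3, 4])` yields 1.0, while two uncorrelated sequences yield a value close to zero.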
As can be seen in Fig. 5, the correlation between EEG envelope 513 and envelope 511 for the attended speech ("Attended Correlation"), i.e., speech 121 to which user 110 is paying attention, is much more pronounced than the correlation between EEG envelope 513 and envelope 512 for the unattended speech ("Unattended Correlation"), i.e., any speech to which user 110 is not paying attention. Accordingly, the calculated correlation between attended envelope 511 and EEG envelope 513 is expected to be larger than the calculated correlation between unattended envelope 512 and EEG envelope 513. Therefore, by calculating the correlation between the amplitude envelope of captured speech 121 and the amplitude envelope of the EEG data acquired from user 110 over a pre-determined time window, it can be determined if user 110 is paying attention to speech 121 or not. More specifically, this may be achieved by comparing the calculated correlation to an upper threshold value and determining that user 110 is paying attention to speech 121, or has resumed paying attention to speech 121, if the calculated correlation exceeds the upper threshold value. Correspondingly, it can be determined that user 110 is not paying attention to speech 121, or has stopped paying attention to speech 121, if the calculated correlation is lower than a lower threshold value. Utilizing two distinct threshold values, an upper and a lower threshold value, rather than a single threshold value, is advantageous in that frequent changes between a determination that the user is paying attention and a determination that the user is not paying attention are avoided.
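The envelope-correlation test with two thresholds may be sketched as follows. The envelope extraction, window length, and threshold values are illustrative assumptions, not the exact processing of the cited work:

```python
import numpy as np

def amplitude_envelope(x, fs, smooth_s=0.05):
    """Crude amplitude envelope: rectify, then smooth with a moving
    average. (A Hilbert transform plus low-pass filter would be the
    more usual choice.)"""
    win = max(int(smooth_s * fs), 1)
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

def attention_state(speech, eeg, fs, prev_attending, upper=0.15, lower=0.05):
    """Hysteresis decision over one analysis window: attending if the
    envelope correlation exceeds `upper`, not attending if it falls
    below `lower`, otherwise keep the previous state."""
    corr = np.corrcoef(amplitude_envelope(speech, fs),
                       amplitude_envelope(eeg, fs))[0, 1]
    if corr > upper:
        return True
    if corr < lower:
        return False
    return prev_attending
```

The hysteresis band between `lower` and `upper` is what prevents the frequent state flips mentioned above.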
Preferably, a suitable value for the upper threshold value may be obtained during a calibration or learning phase, during which user 110 is instructed to pay attention to speech 121 in order to obtain representative correlation values reflecting that user 110 is paying attention to speech 121. A lower bound of the correlation values obtained during the calibration phase may then be used as the upper threshold value. Correspondingly, a suitable value for the lower threshold value may be obtained by instructing user 110 to not pay attention to speech 121 in order to obtain representative correlation values reflecting that user 110 is not paying attention to speech 121. An upper bound of the correlation values obtained during the calibration phase may then be used as the lower threshold value.
As an alternative to acquiring the EEG data from BCI headset 150, i.e., from the scalp and/or forehead of user 110, the EEG data may be acquired from an in-ear EEG device 200 illustrated in Fig. 2. In-ear device 200 is designed for insertion into the ear canal of ear 111 and comprises electrodes 102 which are arranged for contacting the skin within the ear canal. The use of EEG data captured by in-ear devices has, e.g., been reported in "In-Ear EEG From Viscoelastic Generic Earpieces: Robust and Unobtrusive 24/7 Monitoring", by V. Goverdovsky, D. Looney, P. Kidmose, and D. P. Mandic, IEEE Sensors Journal, vol. 16, pages 271-277, IEEE, 2016, and "EEG Recorded from the Ear: Characterizing the Ear-EEG Method", by K. B. Mikkelsen, S. L. Kappel, D. P. Mandic, and P. Kidmose, Frontiers in Neuroscience, vol. 9, article 438 (2015). In-ear EEG devices are commercially available, e.g., "Aware" from United Sciences (http://efitaware.com/, retrieved on 1 December 2016). The EEG data may, e.g., be acquired from in-ear device 200 via communications module 105 and a similar communications module 105 comprised in in-ear device 200.
As yet a further alternative, the EEG data may be acquired from an around-ear EEG device 300 illustrated in Fig. 3. Around-ear device 300 comprises electrodes 102 which are arranged for contacting the skin around ear 111 of user 110. The use of EEG data from an around-ear device has been reported in "Target Speaker Detection with Concealed EEG Around the Ear", by B. Mirkovic, M. G. Bleichner, M. De Vos, and S. Debener, Frontiers in Neuroscience, vol. 10, article 349 (2016). The EEG data may, e.g., be acquired from around-ear device 300 via communications module 105 and a similar communications module 105 comprised in around-ear device 300.
It will be appreciated that, depending on the type of electrodes 102 comprised in BCI headset 150, in-ear device 200, and around-ear device 300, respectively, and in particular the number of electrodes used and their placement on the skin of user 110, additional decoding, data fusion, or data processing may be required for deriving EEG data which is suitable for calculating a correlation between user 110's brain activity and speech 121 so as to determine whether user 110 is paying attention to speech 121 or not. This can be achieved using a decoder which implements a transfer function that transforms the raw signals captured by several EEG electrode channels into one time-dependent function which reflects auditory attention, i.e., attention to speech 121.
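As a sketch of such a decoder, a linear transfer function can be fit by least squares so that a weighted sum of the channels tracks an attention-related target time series, e.g., the attended speech envelope. The plain least-squares fit is an assumption for illustration, not the method of any cited paper:

```python
import numpy as np

def fit_decoder(channels, target):
    """Least-squares weights w such that w @ channels approximates the
    target attention-related time series.
    channels: (n_channels, n_samples) array, target: (n_samples,)."""
    w, *_ = np.linalg.lstsq(channels.T, target, rcond=None)
    return w

def decode_attention(channels, w):
    """Collapse multi-channel EEG into one time-dependent function."""
    return w @ channels
```

The fitted weights would typically be obtained during the same calibration phase as the threshold values.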
As an alternative to utilizing continuous EEG data, user 110's attention to speech 121 may be assessed based on AEPs (see, e.g., "Auditory evoked potentials", by N. Kraus and T. Nicol, Encyclopedia of Neuroscience, pages 214-218, Springer, 2009). The detection of AEPs usually requires averaging over multiple samples, because AEP signals might be mimicked by artifacts such as eye blinks. Though it may be difficult to conclude from a single EEG spike that an AEP was measured, depending on measurement conditions, it is typically easier to conclude that an AEP was not measured if there is a steady baseline in the EEG data and no signal which qualifies as an AEP candidate. In other words, a lack of attention of user 110 to speech 121 can be detected based on AEPs. Also, if there is a sequence of events, in particular words of speech 121 uttered by speaker 120, which are likely to cause an AEP response, a sequence of AEP responses can be retrieved from noisy EEG data by utilizing the fact that captured speech 121 and the captured EEG data have a highly similar time dependence, i.e., they are correlated.
An example of how AEPs can be used to correlate captured speech 121 with EEG data is described in the following. First, the onset of words in captured speech 121 is detected based on a rise in the amplitude envelope 511, which here is assumed to represent captured speech 121. Then, possible signals which qualify as AEPs are identified within a time window of the order of, e.g., 500 ms, around each onset of a word in captured speech 121. Finally, the time series of word onsets in captured speech 121 and the time series of AEP candidates in the EEG data are correlated, and it is determined that user 110 is paying attention to captured speech 121 if the calculated correlation is larger than an upper threshold value. Moreover, it can further be determined that user 110 is not paying attention to captured speech 121 if the calculated correlation is smaller than a lower threshold value.
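The onset/AEP matching described above may be sketched as follows. The hit-rate score is a simplified stand-in for a full correlation of the two time series, and the threshold and window values are illustrative assumptions:

```python
import numpy as np

def word_onsets(envelope, thresh):
    """Sample indices where the envelope first rises above thresh,
    taken as word onsets."""
    above = envelope > thresh
    return np.flatnonzero(above[1:] & ~above[:-1]) + 1

def onset_aep_score(envelope, aep_times, fs, thresh=0.5, window_s=0.5):
    """Fraction of detected word onsets followed by an AEP candidate
    within window_s seconds; a simplified stand-in for correlating
    the onset time series with the AEP-candidate time series."""
    onsets = word_onsets(envelope, thresh)
    win = int(window_s * fs)
    hits = sum(any(0 <= a - o <= win for a in aep_times) for o in onsets)
    return hits / max(len(onsets), 1)
```

A score near 1 would then be compared against the upper threshold value, and a score near 0 against the lower one.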
Further with reference to Fig. 1, audio recording device 100 may be operative to render the representation of recorded speech 121 by audibly rendering recorded speech 121, i.e., by re-playing recorded speech 121 to user 110. This may be achieved by audibly rendering recorded speech 121 using loudspeaker 103, or any external acoustic transducer which is operatively connected to audio recording device 100, such as headphones, earphones, earbuds, or the like. Audibly rendering recorded speech 121 introduces a delay, or gap, between what user 110 hears, i.e., what is being rendered by audio recording device 100, and what is currently spoken, i.e., speech 121. This delay or gap may be reduced by audibly rendering recorded speech 121 at an increased speed, as compared to the speed of recorded speech 121. In this way, audio recording device 100 can eventually "catch up", i.e., the delay or gap is continuously reduced until user 110 hears speech 121 in real-time. Thereby, it is avoided that user 110 laughs or asks a question at an inappropriate point in time. Alternatively, or additionally, audio recording device 100 may be operative to skip silent gaps in recorded speech 121, i.e., periods of silence when speaker 120 does not make any utterance, when audibly rendering recorded speech 121.
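The catch-up behaviour follows from simple arithmetic: while t seconds of real time pass, speedup * t seconds of recording are rendered, so the gap shrinks at rate (speedup - 1). A sketch:

```python
def catch_up_time(gap_s, speedup):
    """Seconds of sped-up playback needed to close a gap of gap_s
    seconds when rendering at `speedup` times real time: the rendered
    position gains (speedup - 1) seconds on the live speech for every
    second that passes."""
    if speedup <= 1.0:
        raise ValueError("speedup must exceed 1.0 to catch up")
    return gap_s / (speedup - 1.0)
```

For example, a 30-second gap rendered at 1.5 times real time closes after 60 seconds, after which user 110 hears speech 121 in real time again; skipping silent gaps shortens this further.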
As an alternative, or in addition, to audibly rendering the representation of recorded speech 121, audio recording device 100 may be operative to render the representation of recorded speech 121 by transcribing recorded speech 121 into text, and displaying the transcribed text 123 to user 110, e.g., using display 107 or an external display which is operatively connected to audio recording device 100. Optionally, audio recording device 100 may be further operative to generate a summary of the transcribed text, using a Natural Language Processing (NLP) algorithm. Advantageously, the generated summary 123 may be displayed to user 110 instead of the transcribed text 123, thereby sparing user 110 from reading the transcribed text in its entirety.
Extractive summarization of text is usually based on converting sentences into a sentence representation, sentence scoring, and sentence selection ("A Survey of Text Summarization Techniques", by A. Nenkova and K. McKeown, in "Mining Text Data", pages 43-76, Springer, 2012). Sentence representation is the step whereby the sentences of a text are converted into a representation which allows them to be compared with the other sentences in the text. Such representations are usually vectorial, thus allowing the application of various measures of distance. The sentence representation should also capture the semantic meaning of the sentence to some level, and reflect its relevance and salience with respect to the text being summarized.
Sentence scoring uses the sentence representation to assign a score to each sentence. For this purpose, graph-based methods, such as TextRank ("TextRank: Bringing Order into Texts", by R. Mihalcea and P. Tarau, Proceedings of EMNLP 2004, pages 404-411, Association for Computational Linguistics, 2004), can be used to calculate the score. Alternatively, machine-learning algorithms can be used to assign an importance to each sentence.
Sentence selection is the process of selecting a subset of sentences as the summary. One alternative is to simply select the top-scored sentences. Alternatively, maximal-marginal-relevance methods may be used, whereby the focus is on both maximizing the coverage of the content and minimizing the redundancy. In this method, a sentence is evaluated based on its initial score (as calculated in the previous step) and its similarity with already chosen sentences. Sentences which are similar to already chosen sentences are less preferred. This is especially important in multi-document summarization, where similar sentences can occur in multiple documents. In the case of meetings, the summarization problem is domain-specific, and it is advantageous to also include text which describes particular aspects of the subjects being discussed, for example products, terminology which is particular to a given company or market domain, and the like. Also, any calculation of relevance may be assisted by a summary of the topic being discussed. The summary may be generated from speech-to-text summaries of previous meetings, an agenda, or the words which have been spoken in the current meeting so far. Also, in meetings, significant contextual and meta information may be present. By using this kind of information, the sentence representations may be enriched by including different features besides the words themselves, for example, who uttered a certain sentence, what their role is, or at what time of the meeting the sentence was said (usually there is small talk in the beginning of a meeting, and the important things are said at the end). In addition, information about the tone with which a sentence is pronounced, word rates, pauses, and the like, may also be considered.
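The three steps (representation, scoring, selection) may be sketched with term-frequency vectors and maximal-marginal-relevance selection. Scoring each sentence against a document centroid is a simplification and an assumption, standing in for TextRank or a learned model:

```python
import math
from collections import Counter

def tf_vector(sentence):
    """Sentence representation: a bag-of-words term-frequency vector."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def mmr_summary(sentences, k=2, lam=0.5):
    """Maximal-marginal-relevance selection: score each sentence against
    the document centroid, penalize similarity to already chosen ones."""
    vecs = [tf_vector(s) for s in sentences]
    centroid = sum(vecs, Counter())
    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: lam * cosine(vecs[i], centroid)
                          - (1 - lam) * max((cosine(vecs[i], vecs[j])
                                             for j in chosen), default=0.0),
        )
        chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]
```

Given a duplicated sentence, the redundancy penalty steers the second pick toward novel content rather than the duplicate.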
Further with reference to Figs. 1 to 4, alternative embodiments of the audio recording device may be based on any one of BCI headset 150, in-ear EEG device 200, and around-ear EEG device 300, respectively.
More specifically, and with reference to Fig. 1, BCI headset 150 may, in addition to electrodes 102 and communications module 105, comprise a microphone 101, an acoustic transducer 103, e.g., an earphone, and a processing means 104. Processing means 104 is operative to cause audio recording device 150 to perform similarly to audio recording device 100 described hereinbefore. It will be appreciated that audio recording device 150 is operative to utilize an external display for displaying the transcribed text, or a summary thereof, to user 110. The external display may, e.g., be a display comprised in a mobile phone, a smartphone, or a tablet, of user 110, e.g., display 107. The transcribed text or its summary may be transmitted from audio recording device 150 to tablet 100 using communications modules 105. It will also be appreciated that audio recording device 150 may receive captured speech 121 from an external microphone, e.g., microphone 101 comprised in tablet 100.
With reference to Fig. 2, a further alternative embodiment of the audio recording device may be based on in-ear EEG device 200. In addition to electrodes 102 and communications module 105, audio recording device 200 may comprise a microphone 101, an acoustic transducer 103, and a processing means 104. Processing means 104 is operative to cause audio recording device 200 to perform similarly to audio recording device 100 described hereinbefore. It will be appreciated that audio recording device 200 is operative to utilize an external display for displaying the transcribed text, or a summary thereof, to user 110. The external display may, e.g., be a display comprised in a mobile phone, a smartphone, or a tablet, of user 110, e.g., display 107. The transcribed text or its summary may be transmitted from audio recording device 200 to tablet 100 using communications modules 105. It will also be appreciated that audio recording device 200 may receive captured speech 121 from an external microphone, e.g., microphone 101 comprised in tablet 100.
With reference to Fig. 3, yet a further alternative embodiment of the audio recording device may be based on around-ear EEG device 300. In addition to electrodes 102 and communications module 105, audio recording device 300 may comprise a microphone 101, an acoustic transducer 103, e.g., an earphone, and a processing means 104. Processing means 104 is operative to cause audio recording device 300 to perform similarly to audio recording device 100 described hereinbefore. It will be appreciated that audio recording device 300 is operative to utilize an external display for displaying the transcribed text, or a summary thereof, to user 110. The external display may, e.g., be a display comprised in a mobile phone, a smartphone, or a tablet, of user 110, e.g., display 107. The transcribed text or its summary may be transmitted from audio recording device 300 to tablet 100 using communications modules 105. It will also be appreciated that audio recording device 300 may receive captured speech 121 from an external microphone, e.g., microphone 101 comprised in tablet 100.
In the following, embodiments of processing means 104 comprised in an embodiment 100, 150, 200, 300, or 400, of the audio recording device (hereinafter referred to as 100-400), respectively, are described with reference to Figs. 6 and 7.
A first embodiment 600 of processing means 104 is shown in Fig. 6. Processing means 600 comprises a processing unit 602, such as a general purpose processor, and a computer-readable storage medium 603, such as a Random Access Memory (RAM), a Flash memory, or the like. In addition, processing means 600 comprises one or more interfaces 601 ("I/O" in Fig. 6) for controlling and/or receiving information from other components comprised in audio recording device 100-400, such as microphone 101 , electrodes 102, acoustic transducer 103, communications module 105, and display 107. In particular, interface(s) 601 may be operative to receive speech captured by microphone 101 or an external microphone, and EEG data captured by electrodes 102. The acquired speech and EEG data may either be received as analog signals, which are digitalized in processing means 600 for subsequent processing, or in a digital representation. Memory 603 contains computer-executable instructions 604, i.e., a computer program or software, for causing audio recording device 100-400 to become operative to perform in accordance with embodiments of the invention as described herein, when computer-executable instructions 604 are executed on processing unit 602.
An alternative embodiment 700 of processing means 104 is illustrated in Fig. 7. Similar to processing means 600, processing means 700 comprises one or more interfaces 701 ("I/O" in Fig. 7) for controlling and/or receiving information from other components comprised in audio recording
device 100-400, such as microphone 101, electrodes 102, acoustic
microphone 101 or an external microphone, and EEG data captured by electrodes 102. The acquired speech and EEG data may either be received as analog signals, which are digitalized in processing means 700 for subsequent processing, or in a digital representation. Processing means 700 further comprises a capturing module 702, an attention module 703, a recording module 704, a rendering module 705, and, optionally, an alert module 706, which are configured for causing audio recording device 100-400 to perform in accordance with embodiments of the invention as described herein.
In particular, capturing module 702 is configured to capture speech 121, using microphone 101, and attention module 703 is configured to detect that user 110 of audio recording device 100-400 is paying attention to captured speech 121. Recording module 704 is configured to record captured speech 121 to which user 110 is paying attention. Attention module 703 is further configured to detect that user 110 has stopped paying attention to captured speech 121, and rendering module 705 is configured to render a representation of recorded speech 121 starting at a point in time when user 110 has stopped paying attention to captured speech 121.
Optionally, attention module 703 may be configured to detect that user 110 has resumed paying attention to captured speech 121, and rendering module 705 may be configured to render the representation of recorded speech 121 in response thereto. Alternatively, rendering module 705 may be configured to render the representation of recorded speech 121 in response to receiving from user 110 a request to render a representation of recorded speech 121.
Optionally, rendering module 705 is configured to render the representation of recorded speech 121 by audibly rendering recorded speech 121. Further optionally, rendering module 705 may be configured to audibly render recorded speech 121 at an increased speed. Alternatively, or additionally, rendering module 705 may be configured to skip silent gaps in recorded speech 121 when audibly rendering recorded speech 121.
Alternatively, rendering module 705 may be configured to render the representation of recorded speech 121 by transcribing recorded speech 121 into text, and displaying the transcribed text to user 110. Optionally, rendering module 705 may be configured to generate a summary of the transcribed text, wherein the generated summary of the transcribed text is displayed to user 110.
Optional alert module 706 is configured to alert user 110 in response to detecting that user 110 has stopped paying attention to captured speech 121.
Preferably, attention module 703 is configured to detect that user 110 is paying attention to captured speech 121 by acquiring EEG data captured by electrodes 102 which are attached to a body part of user 110, calculating a correlation between the acquired EEG data and captured speech 121, and determining that user 110 is paying attention to captured speech 121 if the calculated correlation is larger than an upper threshold value. Optionally, attention module 703 may be configured to determine that user 110 is not paying attention to captured speech 121 if the calculated correlation is smaller than a lower threshold value.
Interfaces 601 and 701, and modules 702-706, as well as any additional modules comprised in processing means 700, may be implemented by any kind of electronic circuitry, e.g., any one, or a combination of, analogue electronic circuitry, digital electronic circuitry, and processing means executing a suitable computer program, i.e., software.
In the following, embodiments 800 of the method of an audio recording device are described with reference to Fig. 8. Method 800 is performed by an audio recording device, such as a mobile phone, a smartphone, a tablet, a personal computer, a laptop, a BCI headset, an in-ear device, or an around-ear device. Method 800 comprises detecting 802 that a user 110 of the audio recording device is paying attention to speech 121 captured 801 by a microphone operatively connected to the audio recording device, recording 803 the captured speech 121 to which user 110 is paying attention, detecting 804 that user 110 has stopped paying attention to captured speech 121, and rendering 807 a representation of recorded speech 121 starting at a point in time when user 110 has stopped paying attention to captured speech 121. Optionally, the representation of recorded speech 121 is rendered 807 in response to detecting 806 that user 110 has resumed paying attention to captured speech 121. Alternatively, the representation of recorded speech 121 is rendered 807 in response to receiving 808 from user 110 a request to render a representation of recorded speech 121.
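The flow of method 800 may be sketched as a small state machine; the state and action names are hypothetical labels chosen only for illustration:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()       # speech not yet attended, nothing recorded
    ATTENDING = auto()  # user is paying attention; speech is recorded
    MISSED = auto()     # attention lost; a gap is accumulating

def step(state, attending_now, resume_requested=False):
    """One transition of the attention state machine; returns
    (new_state, action)."""
    if state is State.IDLE:
        return (State.ATTENDING, "start_recording") if attending_now \
            else (State.IDLE, None)
    if state is State.ATTENDING:
        return (State.ATTENDING, None) if attending_now \
            else (State.MISSED, "mark_gap_start")
    # MISSED: render the missed speech when attention resumes (806)
    # or on an explicit user request (808)
    if attending_now or resume_requested:
        return (State.ATTENDING, "render_from_gap")
    return (State.MISSED, None)
```

Rendering always starts from the point in time recorded by the `mark_gap_start` action, i.e., when attention was lost.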
Optionally, the representation of recorded speech 121 may be rendered 807 by audibly rendering recorded speech 121. Further optionally, recorded speech 121 may be audibly rendered 807 at an increased speed. Alternatively, or additionally, silent gaps in recorded speech 121 may be skipped when audibly rendering 807 recorded speech 121.
Alternatively, the representation of recorded speech 121 may be rendered 807 by transcribing recorded speech 121 into text, and displaying the transcribed text to user 110. Optionally, a summary of the transcribed text may be generated, wherein the generated summary of the transcribed text is displayed to user 110.
Method 800 may further comprise alerting 805 user 110 in response to detecting 804 that user 110 has stopped paying attention to captured speech 121.
Preferably, detecting 802 that user 110 is paying attention to captured speech 121 comprises acquiring EEG data captured by electrodes operatively connected to the audio recording device, which electrodes are attached to a body part of user 110, calculating a correlation between the acquired EEG data and captured speech 121, and determining that user 110 is paying attention to captured speech 121 if the calculated correlation is larger than an upper threshold value. Optionally, detecting 802 that user 110 is paying attention to captured speech 121 may further comprise determining that user 110 is not paying attention to captured speech 121 if the calculated correlation is smaller than a lower threshold value.
It will be appreciated that method 800 may comprise additional, or modified, steps in accordance with what is described throughout this disclosure. An embodiment of method 800 may be implemented as software, such as computer program 604, to be executed by a processing unit comprised in the audio recording device, whereby the audio recording device becomes operative to perform in accordance with embodiments of the invention described herein.
The person skilled in the art realizes that the invention by no means is limited to the embodiments described above. On the contrary, many modifications and variations are possible within the scope of the appended claims.

Claims

1. An audio recording device (100, 150; 200; 300; 400) comprising processing means (104; 600; 700) being operative to:
detect that a user (110) of the audio recording device is paying attention to speech (121) captured by a microphone (101) operatively connected to the audio recording device,
record the captured speech to which the user is paying attention, detect that the user has stopped paying attention to the captured speech, and
render a representation of the recorded speech starting at a point in time when the user has stopped paying attention to the captured speech.
2. The audio recording device according to claim 1, the processing means being operative to render the representation of the recorded speech in response to detecting that the user has resumed paying attention to the captured speech.
3. The audio recording device according to claim 1, the processing means being operative to render the representation of the recorded speech in response to receiving from the user a request to render a representation of the recorded speech.
4. The audio recording device according to any one of claims 1 to 3, the processing means being operative to render the representation of the recorded speech by audibly rendering the recorded speech.
5. The audio recording device according to claim 4, the processing means being operative to audibly render the recorded speech at an increased speed.
6. The audio recording device according to claim 4, the processing means being operative to skip silent gaps in the recorded speech when audibly rendering the recorded speech.
7. The audio recording device according to any one of claims 1 to 3, the processing means being operative to render the representation of the recorded speech by:
transcribing the recorded speech into text, and
displaying the transcribed text (123) to the user.
8. The audio recording device according to claim 7, the processing means being further operative to generate a summary of the transcribed text, wherein the generated summary of the transcribed text is displayed to the user.
9. The audio recording device according to any one of claims 1 to 8, the processing means being further operative to alert the user in response to detecting that the user has stopped paying attention to the captured speech.
10. The audio recording device according to any one of claims 1 to 9, the processing means being operative to detect that the user is paying attention to the captured speech by:
acquiring electroencephalography, EEG, data captured by
electrodes (102) operatively connected to the audio recording device, which electrodes are attached to a body part of the user,
calculating a correlation between the acquired EEG data and the captured speech, and
determining that the user is paying attention to the captured speech if the calculated correlation is larger than an upper threshold value.
11. The audio recording device according to claim 10, the processing means being operative to determine that the user is not paying attention to the captured speech if the calculated correlation is smaller than a lower threshold value.
12. The audio recording device according to any one of claims 1 to 11, being any one of: a mobile phone (100; 400), a smartphone (100; 400), a tablet (100; 400), a personal computer, a laptop, a brain-computer interface headset (150), an in-ear device (200), and an around-ear device (300).
13. A method (800) of an audio recording device, the method comprising:
detecting (802) that a user of the audio recording device is paying attention to speech captured (801) by a microphone operatively connected to the audio recording device,
recording (803) the captured speech to which the user is paying attention,
detecting (804) that the user has stopped paying attention to the captured speech, and
rendering (807) a representation of the recorded speech starting at a point in time when the user has stopped paying attention to the captured speech.
14. The method according to claim 13, wherein the representation of the recorded speech is rendered (807) in response to detecting that the user has resumed (806) paying attention to the captured speech.
15. The method according to claim 13, wherein the representation of the recorded speech is rendered (807) in response to receiving (808) from the user a request to render a representation of the recorded speech.
16. The method according to any one of claims 13 to 15, wherein the representation of the recorded speech is rendered (807) by audibly rendering the recorded speech.
17. The method according to claim 16, wherein the recorded speech is audibly rendered (807) at an increased speed.
18. The method according to claim 16, wherein silent gaps in the recorded speech are skipped when audibly rendering (807) the recorded speech.
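The gap-skipping playback of claim 18 can be sketched as a simple amplitude-gate over the recorded samples: short pauses are preserved for intelligibility, while long silent runs are dropped so the catch-up rendering finishes sooner. The amplitude threshold and minimum gap length are illustrative placeholders:

```python
def skip_silent_gaps(samples, threshold=0.02, min_gap=800):
    """Remove runs of low-amplitude samples longer than `min_gap` samples.
    `threshold` (amplitude) and `min_gap` (samples) are illustrative values."""
    out, gap = [], []
    for s in samples:
        if abs(s) < threshold:
            gap.append(s)            # accumulate a candidate silent run
        else:
            if len(gap) <= min_gap:  # keep short pauses, drop long ones
                out.extend(gap)
            gap = []
            out.append(s)
    if len(gap) <= min_gap:          # handle trailing silence
        out.extend(gap)
    return out
```

A production implementation would more likely use a frame-based energy measure (e.g. RMS over 10–20 ms windows) than per-sample amplitude, but the principle is the same.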
19. The method according to any one of claims 13 to 15, wherein the representation of the recorded speech is rendered (807) by:
transcribing the recorded speech into text, and
displaying the transcribed text to the user.
20. The method according to claim 19, further comprising generating a summary of the transcribed text, wherein the generated summary of the transcribed text is displayed to the user.
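For the summary generation of claim 20, one simple stand-in (not prescribed by the claims, which leave the summarization technique open) is a frequency-based extractive summary of the transcribed text: score each sentence by how common its words are in the transcript and keep the top-scoring sentences in their original order:

```python
import re
from collections import Counter

def summarize(text, max_sentences=2):
    """Naive extractive summary of a transcript: rank sentences by the
    corpus frequency of their words, keep the best ones in original order.
    A stand-in for the claimed summary generation, not the claimed method."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())))
    keep = sorted(scored[:max_sentences])   # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Real systems would typically use stop-word filtering or a trained summarization model, but this shows the transcribe-then-condense pipeline of claims 19 and 20 end to end.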
21. The method according to any one of claims 13 to 20, further comprising alerting (805) the user in response to detecting (804) that the user has stopped paying attention to the captured speech.
22. The method according to any one of claims 13 to 20, wherein it is detected (802, 806) that the user is paying attention to the captured speech by:
acquiring electroencephalography, EEG, data captured by electrodes operatively connected to the audio recording device, which electrodes are attached to a body part of the user,
calculating a correlation between the acquired EEG data and the captured speech, and
determining that the user is paying attention to the captured speech if the calculated correlation is larger than an upper threshold value.
23. The method according to claim 22, wherein it is determined (802, 804) that the user is not paying attention to the captured speech if the calculated correlation is smaller than a lower threshold value.
24. A computer program (604) comprising computer-executable instructions for causing a device to perform the method according to any one of claims 13 to 23, when the computer-executable instructions are executed on a processing unit (602) comprised in the device.
25. A computer program product comprising a computer-readable storage medium (603), the computer-readable storage medium having the computer program (604) according to claim 24 embodied therein.
PCT/EP2016/081229 2016-12-15 2016-12-15 Audio recording device for presenting audio speech missed due to user not paying attention and method thereof WO2018108284A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/081229 WO2018108284A1 (en) 2016-12-15 2016-12-15 Audio recording device for presenting audio speech missed due to user not paying attention and method thereof

Publications (1)

Publication Number Publication Date
WO2018108284A1 true WO2018108284A1 (en) 2018-06-21

Family

ID=57570256

Country Status (1)

Country Link
WO (1) WO2018108284A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134756A (en) * 2019-04-15 2019-08-16 深圳壹账通智能科技有限公司 Minutes generation method, electronic device and storage medium
US11184723B2 (en) * 2019-04-14 2021-11-23 Massachusetts Institute Of Technology Methods and apparatus for auditory attention tracking through source modification

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1773040A1 (en) * 2005-09-30 2007-04-11 BRITISH TELECOMMUNICATIONS public limited company Audio conference buffering system with proximity detector
WO2007113580A1 (en) * 2006-04-05 2007-10-11 British Telecommunications Public Limited Company Intelligent media content playing device with user attention detection, corresponding method and carrier medium
US20140278405A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Automatic note taking within a virtual meeting
WO2014205327A1 (en) * 2013-06-21 2014-12-24 The Trustees Of Dartmouth College Hearing-aid noise reduction circuitry with neural feedback to improve speech comprehension
US9462230B1 (en) * 2014-03-31 2016-10-04 Amazon Technologies Catch-up video buffering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16812940

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16812940

Country of ref document: EP

Kind code of ref document: A1