EP4189974A2 - System and method for headphone equalization and room adaptation for binaural rendering in augmented reality - Google Patents
System and method for headphone equalization and room adaptation for binaural rendering in augmented reality
- Publication number
- EP4189974A2 (application EP21751796.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio
- impulse responses
- sound
- binaural
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
- H04S7/306—For headphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor ; Earphones; Monophonic headphones
- H04R1/1083—Reduction of ambient noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2225/00—Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
- H04R2225/41—Detection or adaptation of hearing aid parameters or programs to listening situation, e.g. pub, forest
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2225/00—Details of deaf aids covered by H04R25/00, not provided for in any of its subgroups
- H04R2225/43—Signal processing in hearing aids to enhance the speech intelligibility
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2460/00—Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
- H04R2460/01—Hearing devices using active noise cancellation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R25/00—Deaf-aid sets, i.e. electro-acoustic or electro-mechanical hearing aids; Electric tinnitus maskers providing an auditory perception
- H04R25/50—Customised settings for obtaining desired overall acoustical characteristics
- H04R25/505—Customised settings for obtaining desired overall acoustical characteristics using digital signal processing
- H04R25/507—Customised settings for obtaining desired overall acoustical characteristics using digital signal processing implemented by neural network or fuzzy logic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/027—Spatial or constructional arrangements of microphones, e.g. in dummy heads
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/033—Headphones for stereophonic communication
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/301—Automatic calibration of stereophonic sound system, e.g. with test microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
Definitions
- the present invention relates to headphone equalization and room adaptation of binaural rendering in augmented reality (AR).
- Audio analysis has a number of specific challenges that need to be addressed. Due to their complexity, deep learning models are very data-hungry. Compared to the research areas of image processing and speech processing, only relatively small data sets are currently available for audio processing. The largest is the AudioSet data set from Google [83], with around 2 million sound samples and 632 different sound event classes, although most of the data sets used in research are much smaller. This small amount of training data can be addressed, for example, with transfer learning, in which a model that has been pre-trained on a large data set is then fine-tuned on a smaller data set with new classes intended for the application (fine-tuning) [77].
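- A rough sketch of the fine-tuning workflow described above (assuming PyTorch; the toy architecture, checkpoint path and class counts are illustrative placeholders, not taken from the patent): a model pre-trained on a large corpus is frozen and given a new classification head for a small application-specific dataset.

```python
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    """Toy stand-in for a CNN pre-trained on a large corpus such as AudioSet."""
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(           # operates on log-mel patches
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = AudioCNN(n_classes=632)                   # e.g. AudioSet's 632 classes
# model.load_state_dict(torch.load("pretrained.pt"))  # hypothetical checkpoint

# Transfer learning: freeze the pre-trained feature extractor and attach a
# new head for the (much smaller) application-specific class set.
for p in model.features.parameters():
    p.requires_grad = False
model.classifier = nn.Linear(64, 10)              # e.g. 10 target classes

# Fine-tuning then trains only the new head (top layers may be unfrozen later).
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()                  # multi-label event tagging
```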
- Real-time capability of the sound source detection algorithms is essential for the intended use scenario within a headphone.
- a trade-off must be made between the complexity of the neural network and the maximum possible number of arithmetic operations on the underlying computing platform. Even if a sound event has a longer duration, it still has to be recognized as quickly as possible in order to start an appropriate source separation.
- sensors capturing audio signals at different locations in a city can both measure noise levels and classify among 14 different acoustic scene and event classes [69].
- Source separation algorithms usually leave behind artifacts such as distortion and crosstalk between the sources [5], which are generally perceived as annoying by the listener. However, such artefacts can be partially masked and thus reduced by mixing the tracks again (re-mixing) [10].
- the voice of a conversation partner can also be passed through while ANC is activated. However, this works only if the interlocutor is located within a frontal cone; a selective choice of a specific speaker is not possible.
- a method was disclosed which is designed to generate a listening environment for a user. The method includes receiving a signal that reflects an ambient listening environment of the user, and processing the signal using a microprocessor in order to identify at least one sound type of a plurality of sound types in the ambient listening environment.
- Headphone equalization and room adaptation of binaural playback in augmented reality (AR) represent a significant problem. In a typical scenario, a human listener wears acoustically (partially) transparent headphones and hears his surroundings through them.
- additional sound sources are reproduced via the headphones, which are embedded in the real environment in such a way that it is not possible for the listener to distinguish between the real sound scene and the additional sound.
- a BRIR measurement is carried out without headphones, either individually or with an artificial head, using a probe microphone.
- the spatial properties of the recording room are analyzed based on the measured BRIR.
- the headphone transfer function is then measured individually or with an artificial head using a probe microphone at the same location. This determines an equalization function.
- a source to be augmented is convolved with the position-correct, optionally adjusted, BRIR in order to obtain two raw channels. The raw channels are then convolved with the equalization function to obtain the headphone signals (a minimal sketch of this convolution chain follows after these steps).
- the headphone signals are reproduced via headphones.
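- A minimal sketch of the convolution chain listed above, using NumPy/SciPy. All signals here are placeholders (noise-burst BRIRs, an identity equalization filter); a real system would use the measured, position-correct BRIRs and the measured equalization function.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_augmented_source(source, brir_l, brir_r, eq_l, eq_r):
    """Convolve the dry source with the BRIR to obtain two raw channels,
    then convolve each raw channel with the headphone equalization."""
    raw_l, raw_r = fftconvolve(source, brir_l), fftconvolve(source, brir_r)
    return np.stack([fftconvolve(raw_l, eq_l), fftconvolve(raw_r, eq_r)])

fs = 48000
source = np.random.randn(fs)                       # 1 s dry test signal
decay = np.exp(-np.linspace(0.0, 8.0, fs // 2))    # crude 0.5 s reverb tail
brir_l = np.random.randn(fs // 2) * decay          # placeholder BRIRs
brir_r = np.random.randn(fs // 2) * decay
eq = np.zeros(512); eq[0] = 1.0                    # identity EQ placeholder
headphone_signals = render_augmented_source(source, brir_l, brir_r, eq, eq)
```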
- claim 1 provides a system, claim 19 a method and claim 20 a computer program according to embodiments of the invention
- a system according to an embodiment of the invention comprises an analyzer for determining a plurality of binaural room impulse responses and a loudspeaker signal generator for generating at least two loudspeaker signals depending on the plurality of binaural room impulse responses and on the audio source signal of at least one audio source.
- the analyzer is designed to determine the plurality of binaural room impulse responses in such a way that each of the plurality of binaural room impulse responses takes into account an effect resulting from the wearing of headphones by a user.
- a computer program according to an embodiment of the invention is provided with a program code for carrying out the method described above.
- FIG. 1 shows a system according to an embodiment.
- FIG. 2 shows another system for supporting selective hearing according to another embodiment.
- FIG. 3 shows a system for supporting selective hearing that includes a user interface.
- the BRIR is measured while wearing headphones, either individually or with an artificial head, using a probe microphone.
- the spatial properties of the recording room are analyzed based on the measured BRIR.
- At least one built-in microphone in each shell records the real sound situation in the playback room before playback begins. From these recordings, an estimate of the raw audio signal from one or more sources is determined and the respective BRIR of the sound source/audio source in the playback room is determined. The acoustic room properties of the playback room are determined from this estimate and the BRIR of the recording room is thus adjusted.
- additional microphones are optionally attached to the outside of the headphones, possibly also to the top of the headband, for measuring and analyzing the situation in the playback room.
- sound from natural and augmented sources is made to sound the same.
- in some embodiments, no measurement of headphone characteristics is required.
- Embodiments thus provide concepts for measuring the spatial properties of the rendering space.
- Some embodiments provide an initialization and (post-)optimization of the spatial adaptation.
- the concepts provided also work if the room acoustics of the playback room change, e.g. if the listener changes to another room.
- embodiments are based on integrating different techniques for hearing assistance into technical systems and combining them in such a way that an improvement in sound quality and quality of life (e.g. desired sound is louder, undesired sound is quieter, better speech intelligibility) is achieved both for people with normal hearing and for people with hearing impairments.
- FIG. 2 shows a system for supporting selective listening according to an embodiment.
- the system includes a detector 110 for detecting an audio source signal portion of one or more audio sources using at least two received microphone signals of a listening environment.
- the system also includes a signal component modifier 140 for changing the audio source signal component of at least one audio source of the one or more audio sources depending on the audio signal type of the audio source signal component of the at least one audio source in order to obtain a modified audio signal component of the at least one audio source.
- the analyzer 152 of the signal generator 150 is designed to determine the plurality of binaural room impulse responses, the plurality comprising, for each audio source of the one or more audio sources, binaural room impulse responses that depend on the position information of this audio source and on an orientation of a user's head.
- the loudspeaker signal generator 154 of the signal generator 150 is designed to generate the at least two loudspeaker signals as a function of the plurality of binaural room impulse responses and as a function of the modified audio signal component of the at least one audio source.
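- The pipeline just described (detector/separator, signal component modifier 140, signal generator 150) can be sketched as follows. The per-class gains and the binaural_render callback are illustrative assumptions; in the system above, rendering would be performed with the binaural room impulse responses.

```python
import numpy as np

# Hypothetical user preferences: gain applied by the signal component
# modifier depending on the detected audio signal type.
CLASS_GAIN_DB = {"speech": +6.0, "siren": 0.0, "traffic": -20.0}

def modify_and_mix(stems, binaural_render):
    """stems: (class_label, mono_signal, position) tuples from the
    detector/separator. binaural_render: assumed callback mapping a mono
    signal plus position to a (2, n) binaural signal via the BRIRs."""
    out = None
    for label, signal, position in stems:
        gain = 10.0 ** (CLASS_GAIN_DB.get(label, 0.0) / 20.0)
        rendered = binaural_render(signal * gain, position)   # shape (2, n)
        out = rendered if out is None else out + rendered
    return out
```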
- the detector 110 may be configured to detect the audio source signal portion of the one or more audio sources using deep learning models.
- the position determiner 120 can be designed, for example, to determine the position information for each of the one or more audio sources depending on a recorded image or on a recorded video.
- the position determiner 120 can be designed, for example, to determine the position information for each of the one or more audio sources as a function of the video by detecting a lip movement of a person in the video.
- the analyzer 152 can be designed to determine one or more acoustic properties of the listening environment as a function of the at least two received microphone signals.
- the system may include a user interface 160 for selecting the previously learned user scenario from a set of two or more previously learned user scenarios.
- FIG. 3 shows such a system according to an embodiment, which additionally comprises such a user interface 160.
- the detector 110 and/or the position determiner 120 and/or the audio type classifier 130 and/or the signal component modifier 140 and/or the signal generator 150 can, for example, perform parallel signal processing using a Hough transform, using a plurality of VLSI chips, or using a plurality of memristors.
- in an embodiment, the system comprises such a hearing aid 170 with two corresponding loudspeakers 171, 172.
- the system may include, for example, at least two loudspeakers 181, 182 for outputting the at least two loudspeaker signals and a housing structure 183 accommodating the at least two loudspeakers, the housing structure 183 being adapted to be attached to a head 185 of a user or to another body part of the user.
- the system can include headphones 180, for example, which include at least two loudspeakers for outputting the at least two loudspeaker signals.
- Fig. 5b shows a corresponding headphone 180 with two loudspeakers 181, 182 according to an embodiment.
- the system may include a remote device 190 that includes detector 110 and position determiner 120 and audio type classifier 130 and signal component modifier 140 and signal generator 150.
- the remote device 190 can be spatially separated from the headphones 180, for example.
- remote device 190 may be a smartphone.
- Embodiments do not necessarily use a microprocessor, but instead use parallel signal processing steps, such as Hough transformation, VLSI chips or memristors, for a power-saving implementation, including implementations of artificial neural networks.
- the auditory environment is recorded and reproduced spatially: on the one hand, more than one signal is used to represent the input, and on the other hand, spatial reproduction is used.
- the analysis uses Deep Learning (DL) models (e.g. CNN, RCNN, Siamese networks) and simultaneously processes the information from at least two microphone channels, with one microphone located in each hearable. The analysis yields several output signals (corresponding to the individual sound sources) together with their respective spatial positions. If the recording device (the microphones) is attached to the head, then the positions change with head movements. This enables natural interaction with the sound scene, e.g. by turning towards a source.
- the signal analysis algorithms are based on a deep learning architecture, for example.
- Alternative variants with an analysis unit or variants with separate networks are used for the aspects of localization, detection and source separation.
- the alternative use of generalized cross-correlation takes account of the frequency-dependent shadowing by the head and improves localization, detection and source separation.
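- For reference, a standard GCC-PHAT sketch for two microphone channels. Note that the simple free-field azimuth conversion at the end ignores exactly the frequency-dependent head shadowing discussed above; the microphone spacing is a placeholder for the hearable geometry.

```python
import numpy as np

def gcc_phat(sig_l, sig_r, fs, max_tau):
    """Time difference of arrival via PHAT-weighted generalized
    cross-correlation of the left/right microphone signals."""
    n = 2 * max(len(sig_l), len(sig_r))
    spec = np.fft.rfft(sig_l, n) * np.conj(np.fft.rfft(sig_r, n))
    spec /= np.abs(spec) + 1e-12                   # PHAT weighting
    cc = np.fft.irfft(spec, n)
    shift = int(fs * max_tau)
    cc = np.concatenate((cc[-shift:], cc[:shift + 1]))
    return (np.argmax(np.abs(cc)) - shift) / fs    # delay in seconds

d, c, fs = 0.18, 343.0, 48000          # assumed mic spacing (m), speed of sound
# tau = gcc_phat(left_channel, right_channel, fs, max_tau=d / c)
# azimuth = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
```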
- the training steps mentioned above use, for example, multi-channel audio data, with a first training run usually being carried out in the laboratory with simulated or recorded audio data. This is followed by a training session in different natural environments (e.g. living room, classroom, train station, (industrial) production environment, etc.), i.e. transfer learning and domain adaptation takes place.
- the position detector could be coupled to one or more cameras to also determine the visual position of sound/audio sources.
- lip movement and the audio signals coming from the source separator are correlated and thus a more precise localization is achieved.
- the auralization is performed using binaural synthesis.
- the binaural synthesis offers the further advantage that it is possible not to delete unwanted signal components completely, but only to reduce them to the extent that they are still perceptible but no longer disturbing.
- the analysis of the auditory environment is not only used to separate the objects, but also to analyze the acoustic properties (e.g. reverberation time, initial time gap). These properties are then used in the binaural synthesis to adapt the pre-stored (possibly also individualized) binaural room impulse responses (BRIR) to the actual room. Due to the reduction in room divergence, the listener has a significantly reduced listening effort when understanding the optimized signals. Minimizing room divergence affects the externalization of auditory events and thus the plausibility of spatial audio reproduction in the listening room. No solutions are known in the prior art for this kind of support of speech understanding or of the general understanding of optimized signals.
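- One conceivable way to perform such a BRIR-to-room adaptation (an illustrative assumption, not the method prescribed by the text) is to reshape the late tail of a stored BRIR so that its decay rate matches the reverberation time estimated in the listening room; a real implementation would work per frequency band.

```python
import numpy as np

def match_brir_rt60(brir, fs, rt60_stored, rt60_target, mixing_time_ms=80.0):
    """Multiply the late part of a stored BRIR by an exponential gain so
    that its broadband decay matches the target RT60. The direct sound
    and early reflections (before the mixing time) are left unchanged."""
    k_stored = 6.9078 / rt60_stored    # amplitude decay rate, ln(1000)/RT60
    k_target = 6.9078 / rt60_target
    t = np.arange(brir.shape[-1]) / fs
    t_mix = mixing_time_ms / 1000.0
    gain = np.exp(-(k_target - k_stored) * np.clip(t - t_mix, 0.0, None))
    return brir * gain
```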
- Some embodiments are independent of the hardware used, i.e. both open and closed headphones can be used.
- the signal processing can be integrated in the headphones, in an external device, or also in a smartphone; signals from the smartphone (e.g. music, telephony) can be included as well.
- in this way, an ecosystem for “selective hearing with AI support” is created.
- Exemplary embodiments relate to the “personalized auditory reality ( PARty )”.
- the listener is able to amplify, reduce or modify defined acoustic objects.
- Some embodiments comprise the analysis of the real sound environment and the detection of the individual acoustic objects, the separation, tracking and editability of the existing objects, and the reconstruction and playback of the modified acoustic scene.
- a detection of sound events, a separation of the sound events, and a suppression of some of the sound events is implemented.
- AI methods here mean in particular deep-learning-based methods.
- embodiments create spatiality and three-dimensionality in multimedia systems with which the user interacts.
- Exemplary embodiments are based on researched knowledge of perceptual and cognitive processes of spatial hearing.
- Scene decomposition: This includes a room-acoustic recording of the real environment and parameter estimation and/or a position-dependent sound field analysis.
- Scene representation: This includes representation and identification of the objects and the environment and/or efficient representation and storage.
- Scene composition and playback: This includes an adjustment and modification of the objects and the environment and/or rendering and auralization.
- Quality evaluation: This includes technical and/or auditory quality measurement.
- Signal processing: This includes feature extraction and dataset generation for ML (machine learning).
- Estimation of room and environment acoustics: This includes in-situ measurement and estimation of room acoustic parameters and/or provision of room acoustic characteristics for source separation and ML.
- Auralization: This includes spatial audio reproduction with an auditory fit to the environment and/or validation and evaluation and/or proof of function and quality assessment.
- FIG. 8 shows a corresponding scenario according to an embodiment.
- Embodiments combine concepts for sound source detection, classification, separation, localization, and enhancement, highlighting recent advances in each area and showing relationships between them.
- Unified concepts are provided that can combine detect/classify/locate and separate/enhance sound sources to provide both the flexibility and robustness required for real-life SH.
- Some of the embodiments utilize deep learning concepts, machine hearing and smart headphones (smart hearables) that allow listeners to selectively modify their auditory scene.
- the user represents the center of the auditory scene.
- four external sound sources (S1-S4) are active around the user.
- the user is usually the center of the system and controls the auditory scene via a control unit.
- the user can edit the auditory scene with a control unit.
- the next step is a capture/classification/localization stage.
- the acquisition is necessary, e.g. when the user wants to keep every speech utterance occurring in the auditory scene.
- classification might be necessary, e.g. if the user wants to keep fire alarms in the auditory scene, but not phone rings or office noise.
- the location of the source is relevant to the system. This is the case, for example, with the four sources in Figure 9: the user can choose to remove or attenuate the sound source coming from a certain direction, regardless of the type or characteristics of the source.
- FIG. 10 illustrates a processing workflow of an SH application according to an embodiment.
- sound source identification is used, which goes one step further and aims to identify specific instances of a sound source in an audio signal. Speaker identification is perhaps the most common use of sound source identification today. The goal of this process is to identify whether a specific speaker is present in the scene. In the example in Figure 9, the user has selected "Speaker X" as one of the sources to keep in the auditory scene. This requires hierarchies that go beyond capturing and classifying speech, and requires speaker-specific models that allow for this precise identification. Embodiments, for example, utilize sound source enhancement, which refers to the process of increasing the prominence of a given sound source in the auditory scene [8]. In the case of speech signals, the goal is often to improve their quality and increase intelligibility.
- in music, source enhancement relates to the concept of remixing and is often performed to make a musical instrument (sound source) stand out in the mix.
- Applications for making remixes often use sound separation front-ends to obtain the individual sound sources and to change the characteristics of the mix [10]. Although sound enhancement can be preceded by a sound source separation stage, this is not always the case, and we therefore emphasize the distinction between these two terms.
- some of the embodiments use, for example, concepts such as the detection and classification of acoustic scenes and events [18] in this context.
- Methods for audio event detection (AED) in home settings have been proposed, where the goal is to capture the time boundaries of a given sound event within 10-second recordings [19], [20].
- 10 sound event classes were considered, including cat, dog, speech, alarm, and running water.
- the handling of noisy labels in classification is particularly relevant for applications of selective hearing, in which the class labels can be so diverse that high-quality annotations are very expensive [24]. Noisy labels for sound event classification were addressed in [25] with noise-robust loss functions based on the categorical cross-entropy, and ways to evaluate both data with noisy labels and manually labeled data are presented.
- some embodiments implement simultaneous detection and localization of sound events.
- some embodiments, as in [27], perform the detection as a multi-label classification process and the location is given as the 3D coordinates of the direction of arrival (DOA) for each sound event.
- Some embodiments use concepts of voice activity detection and speaker recognition/identification for SH.
- Voice activity detection has been addressed in noisy environments using denoising autoencoders [28], recurrent neural networks [29] or as an end-to-end system using raw waveforms [30].
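- As a trivially simple baseline (deliberately much simpler than the denoising-autoencoder, recurrent and end-to-end approaches cited above, and not a reimplementation of them), voice activity can be sketched as frame-energy thresholding; frame length and threshold are arbitrary illustrative choices.

```python
import numpy as np

def energy_vad(x, fs, frame_ms=20, threshold_db=-40.0):
    """Naive frame-energy voice activity detector: a frame is 'active'
    if its RMS level is within threshold_db of the loudest frame."""
    frame = int(fs * frame_ms / 1000)
    n_frames = len(x) // frame
    frames = x[:n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
    level_db = 20.0 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    return level_db > threshold_db        # boolean activity flag per frame
```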
- Many systems have been proposed in the literature for speaker recognition applications [31], with the vast majority focusing on increasing robustness to different conditions, for example with data augmentation or with improved embeddings that facilitate recognition [32]-[34]. Some of the embodiments utilize these concepts.
- sound source localization is closely related to the problem of source counting, since the number of sound sources in the auditory scene is usually not known in real-life applications.
- Some systems operate on the assumption that the number of sources in the scene is known. This is the case, for example, with the model presented in [39], which uses histograms of active intensity vectors to locate the sources.
- [40] proposes, from a controlled perspective, a CNN-based algorithm to estimate the DOA of multiple speakers in the auditory scene using phase maps as input representations. In contrast, several works in the literature jointly estimate the number of sources in the scene and their location information.
- Sound source localization algorithms can be computationally demanding as they often involve scanning a large space around the auditory scene [42].
- some of the embodiments use concepts that reduce the search space by using clustering algorithms [43] or by performing multi-resolution searches [42], relative to established methods such as those based on the steered-response power phase transform (SRP-PHAT).
- Other methods impose sparsity constraints on the matrix and assume that only one sound source is predominant in a given time-frequency region [44].
- other methods operate directly on unprocessed (raw) waveforms. Some of the embodiments utilize these concepts.
- some embodiments employ concepts of speaker-independent separation, where a separation occurs without any prior information about the speakers in the scene [46]. Some embodiments also evaluate the spatial location of the speaker to perform a separation [47].
- Some embodiments employ music sound separation (MSS) concepts to extract a music source from an audio mix [5], such as main instrument/accompaniment separation concepts [52]. These algorithms take the most prominent sounds in the mix, regardless of their class designation, and try to separate them from the rest of the accompaniment.
- Some embodiments use vocal separation concepts [53]. In most cases, either specific source models [54] or data-driven models [55] are used to capture the characteristics of the vocals.
- although systems such as that provided in Fig. 5A do not explicitly include a classification or a detection stage to achieve separation, the data-driven nature of these approaches allows these systems to implicitly learn to detect the singing voice with some accuracy prior to separation.
- active noise cancellation (ANC) systems mainly aim to reduce background noise for headphone users by employing an anti-noise signal to cancel it [11].
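- The anti-noise idea can be sketched with an idealized single-channel LMS canceller (an illustration only: practical ANC additionally models the electro-acoustic secondary path, i.e. FxLMS, and must meet the millisecond-scale latency constraints discussed further below).

```python
import numpy as np

def lms_anti_noise(reference, primary, n_taps=64, mu=0.01):
    """Adapt an FIR filter so that the filtered reference (anti-noise)
    cancels the correlated noise in the primary (ear) signal; the
    returned error signal is the residual heard by the listener."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)
    error = np.empty(len(primary))
    for i in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[i]                       # newest reference sample
        error[i] = primary[i] - w @ buf             # residual after anti-noise
        w += mu * error[i] * buf                    # LMS weight update
    return error
```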
- ANC can be considered a special case of SH and faces an equally stringent requirement [14].
- Some works address anti-noise in specific environments such as automotive interiors [56] or operational scenarios [57].
- the work in [56] analyzes the cancellation of different types of noise, such as road noise and engine noise, and requires unified systems capable of dealing with different types of noise.
- Some work has focused on developing ANC systems for canceling noise over specific spatial regions.
- ANC is discussed over a spatial region using spherical harmonics as basis functions to represent the noise field.
- Some of the embodiments use sound source enhancement concepts.
- Speech enhancement is often performed in conjunction with sound source separation approaches, where the basic idea is to first extract the speech utterance and then apply enhancement techniques to the isolated speech signal [6].
- the concepts described here are employed by some of the embodiments.
- Source enhancement in the context of music mostly refers to applications for making music remixes.
- unlike speech enhancement, where the assumption is often that the speech utterance is affected only by noise sources, music applications mostly assume that other sound sources (musical instruments) are playing simultaneously with the source to be enhanced. Music remix applications are therefore usually preceded by a source separation stage. For example, in [10] early jazz recordings were remixed using techniques to separate lead and accompaniment as well as harmonic and percussive instruments, in order to achieve a better tonal balance in the mix.
- [63] investigated the use of different vocal separation algorithms to vary the relative loudness of the vocal and backing track, showing that a 6 dB increase is possible by introducing slight but audible distortions into the final mix.
- the authors explore ways to improve music perception for cochlear implant users by applying sound source separation techniques to achieve new mixes. The concepts described there are used by some of the embodiments.
- the amount of acceptable latency is both frequency- and attenuation-dependent, but can be as low as 1 ms for around 5 dB attenuation at frequencies below 200 Hz [14].
- a final consideration regarding SH applications refers to the quality perception of the modified auditory scene. A significant amount of work has been done on methodologies for reliable audio quality assessment. In some embodiments, concepts for counting and localization in [41], for localization and detection in [27], for separation and classification in [65], and for separation and enhancement in [66] are used as described there.
- Some embodiments employ concepts to improve the robustness of current machine hearing methods as described in [25], [26], [32], [34]; newly emerging directions include domain adaptation [67] and learning based on datasets recorded with multiple devices [68].
- Some of the embodiments employ concepts for improving the computational efficiency of machine hearing as described in [48], or concepts described in [30], [45], [50], [61] that are able to deal with unprocessed waveforms.
- Some embodiments implement a unified optimization scheme that combines detection/classification/localization and separation/enhancement to selectively modify sound sources in the scene, with independent detection, separation, localization, classification, and enhancement methods that are reliable and provide the robustness and flexibility required for SH.
- Some embodiments are suitable for real-time processing, with a good trade-off between algorithmic complexity and performance.
- Some embodiments combine ANC and listening. For example, the auditory scene is first classified and then ANC is selectively applied.
- the transfer functions map the properties of the sound sources, as well as the direct sound between the objects and the user, as well as all reflections that occur in the room. In order to ensure correct spatial audio reproductions for the room acoustics of a real room in which the listener is currently located, the transfer functions must also represent the room acoustic properties of the listening room with sufficient accuracy.
- the challenge lies in the appropriate recognition and separation of the individual audio objects when a large number of audio objects are present. Furthermore, the audio signals of the objects in the recording position or in the listening position of the room overlap. Both the room acoustics and the superimposition of the audio signals change when the objects and/or the listening positions in the room change.
- Room acoustics parameters must be estimated quickly enough in the case of relative movement. A low latency of the estimation is more important than a high accuracy. On the other hand, if the position of the source and receiver do not change (static case), a high degree of accuracy is required.
- room acoustics parameters, as well as room geometry and listener position, are estimated or extracted from a stream of audio signals. The audio signals are recorded in a real environment in which the source(s) and receiver(s) can move in any direction and can change their orientation in any way.
- the audio input streams can be the result of any microphone setup that includes one or more microphones.
- the audio streams are fed into a signal-processing stage for preprocessing or further analysis.
- a second data stream is generated by a 6DoF ("six degrees of freedom": three dimensions each for position and orientation in space) sensor worn by the user.
- the position data stream is fed into a 6DoF signal processing stage for preprocessing or further analysis.
- Some of the embodiments realize a blind estimation of room acoustics parameters by using arbitrary microphone arrays and by adding position and pose information of the user, and by analyzing the data with machine learning methods.
- Systems according to embodiments may be used for acoustic augmented reality (AAR), for example.
- the device is designed to receive tracking data relating to a position and/or an orientation of a user.
- the device may be configured to use cloud-based processing for machine learning.
- the one or more S"mmacoustics" parameters may e.g. include a reverberation time.
- the one or more room acoustic parameters may include a direct-to-reverberation ratio.
- the tracking data to indicate the user's location may include, for example, an x-coordinate, a y-coordinate, and a z-coordinate.
- the tracking data to indicate the user's orientation may include, for example, a pitch coordinate, a yaw coordinate, and a roll coordinate.
- the device can be designed, for example, to transform the one or more microphone signals from a time domain into a frequency domain, to extract one or more characteristics of the one or more microphone signals in the frequency domain, and to determine the one or more room acoustics parameters depending on the extracted characteristics.
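- For reference, the two parameters named above can be computed from a measured room impulse response with textbook methods, sketched below (Schroeder backward integration for the reverberation time, an energy ratio for the DRR). Blind estimation from running microphone signals, as the device targets, is considerably harder and is typically done with the learned features described here.

```python
import numpy as np

def rt60_from_rir(h, fs):
    """RT60 via Schroeder backward integration: fit the -5..-25 dB decay
    (T20) of the energy decay curve and extrapolate to 60 dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    i5, i25 = np.argmax(edc_db <= -5.0), np.argmax(edc_db <= -25.0)
    slope = (edc_db[i25] - edc_db[i5]) * fs / (i25 - i5)   # dB per second
    return -60.0 / slope

def drr_from_rir(h, fs, direct_ms=2.5):
    """Direct-to-reverberation ratio: energy in a small window around the
    direct-sound peak versus the energy of the remaining tail, in dB."""
    n0 = int(np.argmax(np.abs(h)))
    nd = int(fs * direct_ms / 1000)
    direct = np.sum(h[max(0, n0 - nd):n0 + nd] ** 2)
    reverb = np.sum(h[n0 + nd:] ** 2)
    return 10.0 * np.log10(direct / (reverb + 1e-12))
```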
- the device may be configured to use cloud-based processing to extract the one or more features.
- the device may include a microphone array of multiple microphones to capture the multiple microphone signals.
- the system described above may further comprise, for example, a device as described above for determining one or more room acoustic parameters.
- the signal portion modifier 140 can be configured, for example, to change the audio source signal portion of the at least one audio source of the one or more audio sources as a function of at least one of the one or more room acoustics parameters; and/or the signal generator 150 can be designed, for example, to generate at least one of the plurality of binaural room impulse responses for each audio source of the one or more audio sources depending on the at least one of the one or more room acoustics parameters.
- Figure 7 shows a system according to an embodiment comprising five sub-systems (sub-systems 1-5).
- Sub-system 1 includes a microphone setup of one, two or more individual microphones, which can be combined to form a microphone array if more than one microphone is available.
- the positioning and relative arrangement of the microphone(s) can be arbitrary.
- the microphone assembly can be part of a device worn by the user or may be a separate device positioned in the space of interest
- Subsystem 1 thus represents an input interface that includes a microphone signal input interface 101 and a position information input interface 102.
- Sub-system 2 includes signal processing for the captured microphone signal(s). This includes frequency transformations and/or time domain based processing. Furthermore, this includes methods for combining different microphone signals in order to realize field processing. It is possible to feed back from subsystem 4 in order to adapt parameters of the signal processing in subsystem 2.
- the signal processing block of the microphone signal(s) can be part of the device in which the microphone(s) are built or it can be part of a separate device. It can also be part of cloud-based processing.
- sub-system 2 includes signal processing for the recorded tracking data. This includes frequency transforms and/or time-domain-based processing. It also includes methods to improve the technical quality of the signals using noise suppression, smoothing, interpolation and extrapolation. It also includes procedures to derive higher-level information, such as speeds, accelerations, travel directions, rest times, movement areas and movement paths. Further, this includes predicting a near-future trajectory and a near-future velocity.
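- A small sketch of the kind of higher-level information mentioned above, derived from the 6DoF position stream; the linear extrapolation is the simplest conceivable predictor and serves only as an illustration.

```python
import numpy as np

def velocity_and_prediction(positions, timestamps, horizon_s=0.5):
    """positions: (n, 3) x/y/z samples from the tracker; timestamps: (n,).
    Returns a smoothed current velocity vector and a linearly
    extrapolated near-future position."""
    v = np.diff(positions, axis=0) / np.diff(timestamps)[:, None]
    v_now = v[-5:].mean(axis=0)            # average over the last few samples
    p_future = positions[-1] + v_now * horizon_s
    return v_now, p_future
```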
- the signal processing block of the tracking signals can be part of the tracking device or it can be part of a separate device. It can also be part of cloud-based processing.
- the feature extraction block can be part of the user's wearable device, or it can be part of a separate device. It can also be part of cloud-based processing.
- module 121 can pass the result of an audio type classification to module 111 of sub-system 2 (feedback).
- subsystems 2 and 3 can also implement the signal generator 150, for example with module 111 of subsystem 2 generating the binaural room impulse responses and the loudspeaker signals.
- Sub-system 4 includes methods and algorithms to estimate room acoustic parameters using the processed microphone signal(s), the extracted features of the microphone signal(s), and the processed tracking data.
- the output of this block is the room acoustic parameters as raw data, as well as a control and modification of the parameters of the microphone signal processing in subsystem 2.
- the machine learning block 131 can be part of the user's device or it can be part of a separate device. It can also be part of cloud-based processing.
- sub-system 4 includes post-processing of the raw room-acoustic parameters (for example in block 132). This includes detection of outliers, combination of single parameters into a new parameter, smoothing, extrapolation, and a plausibility check.
- This block also receives information from sub-system 2. This includes near-future positions of the user in the space, in order to estimate near-future room acoustic parameters.
- This block can be part of the user's device or it can be part of a separate device. It can also be part of cloud-based processing.
- Sub-system 5 includes the storage of the room-acoustic parameters for downstream systems (e.g. in memory 141). The parameters can be provided just-in-time, and/or their time history can be stored.
- the storage can take place in the device that is located on or near the user, or in an external system.
- One use case of an embodiment is home entertainment and relates to users in a home environment.
- the user goes near the target location.
- the user selects the target sound source via a suitable interface, and the hearable adjusts the audio playback based on the user's position, user's line of sight and the target sound source, so that the target sound source can be clearly understood even in the presence of background noise.
- the user moves close to a particularly disruptive sound source.
- the user selects this noise source via a suitable interface, and the hearable (hearing aid) adjusts the audio playback based on the user's position, user's line of sight and the noise source in order to explicitly mask out the noise source.
- for example, in the case of a conversation in the presence of many speakers, other speakers as well as other sources of interference can be masked out or attenuated.
- the speakers are randomly distributed and move relative to the listener.
- Noise such as music can be comparatively loud under certain circumstances.
- the selected speaker is acoustically highlighted and is recognized again even after pauses in speaking or after changing his position or pose.
- a hearable recognizes a speaker in the user's environment.
- the user can use a suitable control option (e.g. line of sight, attention control) to select preferred speakers.
- the hearable adapts the audio playback according to the user's line of sight and the selected target sound source in order to be able to understand the target sound source even with background noise.
- Another application of a further exemplary embodiment is live music and relates to a visitor to a live music event.
- the visitor to a concert or live music performance would like to use the hearable to increase the focus on the performance and block out distracting listeners.
- the audio signal itself can be optimized, for example to compensate for an unfavorable listening position or room acoustics.
- the user selects the stage area or the musician(s) as the target sound source(s).
- the user can use a suitable control option to define the position of the stage/musicians, and the hearable adapts the audio playback according to the user's viewing direction and the selected target sound source, so that the target sound source can be understood well even with background noise.
- warning information (e.g. evacuation, impending thunderstorms at open-air events) can interrupt the normal process and override the user's selection. The normal process then restarts.
- the hearable can be used to emphasize the voices of family and friends that would otherwise be lost in the noise of the crowd.
- a large event takes place in a stadium or a large concert hall where a large number of visitors go.
- a group (family, friends, school class) attends the event together.
- One or more children lose eye contact with the group and, despite calling out, cannot be heard due to the high level of surrounding noise. Once the child has been found, the user turns off the voice selection and the hearable no longer amplifies the voice(s).
- one person from the group selects the voice of the missing child on the hearable.
- the hearable localizes the voice. Then the hearable amplifies the voice, and the user can find the missing child again (more quickly) using the amplified voice.
- the missing child also wears a hearable, for example, and selects the voice of their parents.
- the hearable amplifies the parents' voice(s). The amplification then allows the child to locate its parents and walk back to them.
- the missing child also wears a hearable and selects the voice of their parents. The hearable locates the parent's voice(s) and the hearable announces the distance to the voices. The child can find its parents more easily. An optional playback of an artificial voice from the hearable for the distance announcement is provided.
- the hearables are coupled for a targeted amplification of the voice(s) and voice profiles are stored.
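Matching a live voice against a stored profile is commonly done by comparing speaker embeddings. The following is a sketch under that assumption; the embedding source, the function name and the similarity threshold are illustrative, not part of the described system.

```python
import numpy as np

def match_voice_profile(segment_embedding, stored_profiles, threshold=0.8):
    """Compare a speaker embedding of the current segment against stored
    voice profiles (e.g. of family members) and return the best match,
    or None if no profile is similar enough (cosine similarity)."""
    best_name, best_score = None, threshold
    for name, profile in stored_profiles.items():
        score = np.dot(segment_embedding, profile) / (
            np.linalg.norm(segment_embedding) * np.linalg.norm(profile))
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

rng = np.random.default_rng(1)
child = rng.standard_normal(128)
profiles = {"child": child, "stranger": rng.standard_normal(128)}
# A noisy observation of the child's voice still matches the stored profile.
print(match_voice_profile(child + 0.05 * rng.standard_normal(128), profiles))
```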
- a further application of a further exemplary embodiment is leisure sports and relates to leisure athletes. Listening to music while exercising is popular, but it also poses risks. Warning signals or other road users may not be heard. In addition to music playback, the hearable can react to warning signals or shouts and temporarily interrupt music playback.
- Another use case in this context is sport in small groups. The hearables of the sports group can be connected to ensure good communication with each other during the sport while other disturbing noises are suppressed.
- the user is mobile and any warning signals are overlaid by numerous sources of interference.
- the problem is that not all warning signals are relevant to the user.
- the hearable immediately suspends music playback and acoustically highlights the warning signal or the communication partner until the user cancels his selection.
- the music continues to play normally.
- a user does sports and listens to music via the hearable. Warning signals or shouts affecting the user are automatically recognized, and the hearable interrupts music playback.
- the hearable adjusts the audio playback so that the target sound source can be clearly understood in the acoustic environment.
- the hearable then resumes music playback automatically (e.g. after the end of the warning signal) or at the request of the user.
- athletes in a group can connect their hearables, for example.
- the speech intelligibility between the group members is optimized and at the same time other disturbing noises are suppressed.
- Another application of another embodiment is snoring suppression and affects all those seeking sleep who are disturbed by snoring. People whose partners snore, for example, are disturbed in their nightly rest and have problems sleeping.
- the hearable provides relief by suppressing the snoring noises, thus ensuring nightly rest and domestic peace. At the same time, the hearable allows other noises (crying babies, alarm sirens, etc.) to pass through so that the user is not completely acoustically isolated from the outside world.
- a snoring detection is provided, for example.
- noises such as construction noise, lawn mower noise, etc. can be suppressed while sleeping.
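A crude sketch of how such a detection-gated suppression could look. The spectral heuristic below is a placeholder for illustration; a deployed system would use a trained classifier, and all names and thresholds here are assumptions.

```python
import numpy as np

def is_snoring(frame, sample_rate=16000):
    """Placeholder detector: snoring energy is concentrated at low
    frequencies, so compare band energy below ~300 Hz with the rest."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    low = spectrum[freqs < 300.0].sum()
    high = spectrum[freqs >= 300.0].sum() + 1e-12
    return low / high > 5.0

def nightly_gain(frame, passthrough_detected):
    """Suppress the frame if it looks like snoring, unless an important
    sound (crying baby, alarm siren) was detected by another module."""
    if passthrough_detected:
        return 1.0
    return 0.0 if is_snoring(frame) else 1.0

t = np.arange(16000) / 16000.0
snore_like = np.sin(2 * np.pi * 80.0 * t)   # 80 Hz drone as a stand-in
print(nightly_gain(snore_like, passthrough_detected=False))  # 0.0 -> muted
```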
- a further use case of a further exemplary embodiment is a diagnostic device for users in everyday life.
- the hearable records the user's preferences (e.g. which sound sources and which amplification/attenuation are chosen) and the duration of use. For example, if the user uses the device in everyday life or in the use cases mentioned over several months or years, the hearable creates analyses based on the selected settings and provides suggestions and recommendations to the user.
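A minimal sketch of such preference logging and a toy long-term rule. The class name, the recorded fields and the decision rule are all hypothetical illustrations of the diagnostic idea, not a prescribed design.

```python
from datetime import datetime

class UsageLog:
    """Record which source types and gains the user chooses, then derive
    simple long-term recommendations from the accumulated history."""

    def __init__(self):
        self.events = []

    def record(self, source_kind, gain_db):
        # e.g. record("speech", 9.0) whenever the user boosts a voice
        self.events.append((datetime.now(), source_kind, gain_db))

    def recommendation(self):
        # Toy rule: frequent strong speech boosts may hint at hearing loss.
        speech_boosts = [g for _, kind, g in self.events
                        if kind == "speech" and g > 6.0]
        if len(speech_boosts) > 100:
            return ("Speech is frequently boosted by more than 6 dB; "
                    "a professional hearing test may be advisable.")
        return None
```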
- the user has hearing or attention problems and uses the hearable temporarily/transitionally as a hearing aid.
- this is reduced by the hearable, for example by: amplification of all signals (hearing impairment), high selectivity for preferred sound sources (attention deficits), reproduction of therapy noises (tinnitus treatment).
- a person switches on the attached hearable.
- the user sets the hearable to select nearby voices, and the hearable amplifies the closest voice or a few nearby voices while suppressing background noise.
- the user understands the relevant voice(s) better.
- the hearable detects that a passenger is actively addressing the driver and temporarily disables noise cancellation.
- Another use case of another embodiment is in the military and pertains to soldiers.
- Verbal communication between soldiers on deployment takes place on the one hand via radios and on the other hand via shouts and direct addressing.
- Radio is mostly used when greater distances have to be bridged and when communication between different units and subgroups is to be carried out.
- a fixed radio etiquette is often applied.
- Shouting and direct addressing is mostly used for communication within a squad or group. Difficult acoustic conditions can arise during the deployment of soldiers (e.g. screaming people, noise from weapons, storms), which can impair both communication channels.
- a soldier's equipment often includes a radio set with earphones. In addition to the purpose of audio reproduction, these also protect against excessive sound pressure levels.
- the hearable recognizes sound sources with potential sources of danger.
- a security officer chooses which sound source or event he would like to investigate (e.g. by selecting it on a tablet).
- the hearable then adjusts the audio playback in order to be able to understand and locate the target sound source even with background noise.
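The selection step for the security officer could combine a sound event detection stage with a per-event gain plan. A sketch under that assumption; the event dictionary layout and gain values are illustrative.

```python
def review_events(detected_events, chosen_event_id):
    """detected_events: list of dicts from a sound event detection stage,
    e.g. {"id": 2, "label": "glass_break", "azimuth": 120.0, "danger": True}.
    The officer picks one event (e.g. on a tablet); the returned gain plan
    boosts the chosen event and attenuates everything else."""
    plan = {}
    for event in detected_events:
        plan[event["id"]] = 4.0 if event["id"] == chosen_event_id else 0.25
    return plan

events = [{"id": 1, "label": "crowd", "azimuth": -30.0, "danger": False},
          {"id": 2, "label": "glass_break", "azimuth": 120.0, "danger": True}]
print(review_events(events, chosen_event_id=2))  # {1: 0.25, 2: 4.0}
```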
- hearing aid users find it difficult to separate different sources in a complex listening situation, for example to focus on a specific speaker.
- external additional systems e.g.
- the hearing device user buys the additional function for selective listening as software or the like for his own hearing device.
- the user installs the add-on feature for their hearing aid.
- the user sets the selective listening function on the hearing aid.
- the user selects a profile (amplify the loudest/closest source, or amplify specific voices from their personal environment via voice recognition, like the UC-CE5 at major events), and the hearing aid amplifies the source(s) according to the set profile while suppressing background noise if necessary.
- the hearing aid user hears individual sources from the complex auditory scene instead of just a "noise muddle" from acoustic sources.
- the hearable can provide voice profiles that can be stored.
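The selectable profiles described above might be represented as a simple configuration table that parameterizes the separation/remix stage. The profile names and parameters below are hypothetical, chosen only to mirror the options listed in the text.

```python
PROFILES = {
    "loudest_source": {"select": "loudest",  "noise_gain": 0.3},
    "closest_source": {"select": "closest",  "noise_gain": 0.3},
    "known_voices":   {"select": "profiles", "noise_gain": 0.1},
}

def apply_profile(name):
    """Return the parameter set the hearing aid's separation/remix stage
    would be configured with for the chosen profile."""
    return PROFILES[name]

print(apply_profile("known_voices"))  # {'select': 'profiles', 'noise_gain': 0.1}
```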
- the user may be very mobile and the nature of the noise depends on the sport. Due to the intense sporting activity, the athlete is not able to control the device actively or only to a limited extent. However, in most sports there is a fixed procedure (biathlon: running, shooting) and the important discussion partners (coaches, team members) can be defined in advance. Noise is suppressed in general or in certain phases of the sport. Communication between athletes and team members and coaches is always emphasized.
- the athlete uses a hearable specially adapted to the sport.
- the Hearable suppresses background noise fully automatically (preset), especially in situations where a high degree of attention is required for the sport in question.
- the Hearable automatically highlights coaches and team members when they are within hearing range.
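Because most sports follow a fixed procedure, the suppression behavior can be tied to the current phase. A sketch using biathlon as in the text; the phase table and gain values are assumptions for illustration.

```python
# Hypothetical phase table for biathlon (running, shooting).
PHASE_SETTINGS = {
    "running":  {"noise_gain": 0.5, "team_gain": 2.0},
    "shooting": {"noise_gain": 0.1, "team_gain": 1.0},  # maximum focus
}

def gain_for(phase, speaker_is_team_member):
    """Coaches and team members stay audible in every phase, while the
    attenuation of other noise depends on the phase of the sport."""
    cfg = PHASE_SETTINGS[phase]
    return cfg["team_gain"] if speaker_is_team_member else cfg["noise_gain"]

print(gain_for("shooting", speaker_is_team_member=True))   # 1.0
print(gain_for("shooting", speaker_is_team_member=False))  # 0.1
```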
- the user turns on the hearable. He selects the desired musical instrument to be suppressed. When listening to the song, the part(s) of the selected instrument are muted so that only the remaining parts can be heard. The user can then practice the part on their own instrument along with the other parts without being distracted by that part from the recording.
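Given a music source separation stage that produces per-instrument stems, the practice mix is simply the sum of all stems except the selected one. A sketch under that assumption; the function name and stem labels are illustrative.

```python
import numpy as np

def practice_mix(stems, instrument_to_mute):
    """stems: dict mapping instrument name -> mono numpy array, as produced
    by a music source separation stage. Returns the playback mix with the
    selected instrument removed so the user can play that part themselves."""
    kept = [s for name, s in stems.items() if name != instrument_to_mute]
    return np.sum(kept, axis=0)

rng = np.random.default_rng(2)
stems = {"vocals": rng.standard_normal(44100),
         "guitar": rng.standard_normal(44100),
         "drums": rng.standard_normal(44100)}
backing_track = practice_mix(stems, "guitar")  # guitar part is muted
```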
- a further application of a further exemplary embodiment is a live translator software module and affects users of a live translator.
- Live translators translate spoken foreign languages in real time and can benefit from an upstream source separation software module.
- the software module can extract the target speaker and potentially improve the translation.
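The pipeline idea, sketched abstractly: separation runs upstream so that recognition and translation only see the target speaker's stream. The components are passed in as stand-ins here, since the text does not name concrete engines.

```python
def translate_target_speaker(mixture, target_id, separate, recognize, translate):
    """Isolate the target speaker first, then feed only that clean stream
    into recognition and translation. separate/recognize/translate stand in
    for real components (e.g. a deep separation model, an ASR engine and a
    machine translation engine)."""
    stems = separate(mixture)        # speaker id -> isolated audio
    target_audio = stems[target_id]  # keep only the selected speaker
    text = recognize(target_audio)   # ASR sees less interference
    return translate(text)           # translation of the cleaner transcript
```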
- a further application of another exemplary embodiment is occupational safety for emergency services and relates to the fire brigade, the THW, possibly the police, and rescue workers.
- for emergency services, good communication is essential for a successful deployment. It is often not possible for the emergency services to wear hearing protection despite loud ambient noise, since no communication with each other would then be possible. For example, firefighters must be able to communicate and understand commands precisely despite loud engine noise, some of which takes place over radios. For this reason, emergency services are exposed to a high level of noise pollution in situations where the hearing protection ordinance cannot be implemented. On the one hand, a hearable would offer hearing protection for the emergency services and, on the other hand, would continue to enable communication between them. A further point is that, with the help of the hearable, the emergency services remain able to perceive their acoustic surroundings even when wearing helmets/protective equipment.
- the user wears the hearable during an operation. He turns on the Hearable, and the Hearable blocks out ambient noise and amplifies co-workers' speech over the radio.
- the digital storage medium can be computer-readable.
- some embodiments according to the invention comprise a data carrier having electronically readable control signals capable of interacting with a programmable computer system in such a way that one of the methods described herein is carried out.
- embodiments of the present invention can be implemented as a computer program product with a program code, wherein the program code is effective to perform one of the methods when the computer program product runs on a computer.
- the program code can also be stored on a machine-readable carrier, for example.
- exemplary embodiments include the computer program for performing one of the methods described herein, the computer program being stored on a machine-readable carrier.
- an exemplary embodiment of the method according to the invention is therefore a computer program that has a program code for performing one of the methods described herein when the computer program runs on a computer.
- a further exemplary embodiment of the method according to the invention is therefore a data carrier (or a digital storage medium or a computer-readable medium) on which the computer program for performing one of the methods described herein is recorded. The data carrier, the digital storage medium or the computer-readable medium are typically tangible and/or non-transitory.
- a further exemplary embodiment of the method according to the invention is therefore a data stream or a sequence of signals which represents the computer program for carrying out one of the methods described herein.
- another embodiment includes a processing device, such as a computer or programmable logic device, configured or adapted to perform one of the methods described herein.
- Another embodiment includes a computer on which the computer program for performing one of the methods described herein is installed.
- a further exemplary embodiment according to the invention comprises a device or a system which is designed to transmit a computer program for carrying out at least one of the methods described herein to a recipient.
- the transmission can take place electronically or optically, for example.
- the recipient may be a computer, mobile device, storage device, or similar device.
- the device or the system can, for example, comprise a file server for transmission of the computer program to the recipient.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
- Headphones And Earphones (AREA)
- Stereophonic Arrangements (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
The present invention relates to a system. Said system comprises an analyzer (152) for determining a plurality of binaural room impulse responses, and a loudspeaker signal generator (154) for generating at least two loudspeaker signals depending on the plurality of binaural room impulse responses and depending on the audio source signal of at least one audio source. The analyzer (152) is designed to determine the plurality of binaural room impulse responses such that each of the binaural room impulse responses takes into account an effect resulting from headphones being worn by a user.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20188945.8A EP3945729A1 (fr) | 2020-07-31 | 2020-07-31 | Système et procédé d'égalisation de casque d'écoute et d'adaptation spatiale pour la représentation binaurale en réalité augmentée |
PCT/EP2021/071151 WO2022023417A2 (fr) | 2020-07-31 | 2021-07-28 | Système et procédé d'égalisation de casque d'écoute et d'adaptation à la salle pour une restitution binaurale en réalité augmentée |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4189974A2 true EP4189974A2 (fr) | 2023-06-07 |
Family
ID=71899608
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20188945.8A Withdrawn EP3945729A1 (fr) | 2020-07-31 | 2020-07-31 | Système et procédé d'égalisation de casque d'écoute et d'adaptation spatiale pour la représentation binaurale en réalité augmentée |
EP21751796.0A Pending EP4189974A2 (fr) | 2020-07-31 | 2021-07-28 | Système et procédé d'égalisation de casque d'écoute et d'adaptation à la salle pour une restitution binaurale en réalité augmentée |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20188945.8A Withdrawn EP3945729A1 (fr) | 2020-07-31 | 2020-07-31 | Système et procédé d'égalisation de casque d'écoute et d'adaptation spatiale pour la représentation binaurale en réalité augmentée |
Country Status (4)
Country | Link |
---|---|
US (1) | US20230164509A1 (fr) |
EP (2) | EP3945729A1 (fr) |
JP (1) | JP2023536270A (fr) |
WO (1) | WO2022023417A2 (fr) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115938376A (zh) * | 2021-08-06 | 2023-04-07 | Jvc建伍株式会社 | 处理装置和处理方法 |
US20230199420A1 (en) * | 2021-12-20 | 2023-06-22 | Sony Interactive Entertainment Inc. | Real-world room acoustics, and rendering virtual objects into a room that produce virtual acoustics based on real world objects in the room |
US20230283950A1 (en) * | 2022-03-07 | 2023-09-07 | Mitsubishi Electric Research Laboratories, Inc. | Method and System for Sound Event Localization and Detection |
WO2023208333A1 (fr) * | 2022-04-27 | 2023-11-02 | Huawei Technologies Co., Ltd. | Dispositifs et procédés de rendu audio binauriculaire |
US20240179487A1 (en) | 2022-11-28 | 2024-05-30 | Treble Technologies | Methods and systems for generating acoustic impulse responses |
US12063491B1 (en) | 2023-09-05 | 2024-08-13 | Treble Technologies | Systems and methods for generating device-related transfer functions and device-specific room impulse responses |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9716939B2 (en) | 2014-01-06 | 2017-07-25 | Harman International Industries, Inc. | System and method for user controllable auditory environment customization |
DE102014210215A1 (de) * | 2014-05-28 | 2015-12-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Ermittlung und Nutzung hörraumoptimierter Übertragungsfunktionen |
US10409548B2 (en) * | 2016-09-27 | 2019-09-10 | Grabango Co. | System and method for differentially locating and modifying audio sources |
- 2020
  - 2020-07-31 EP EP20188945.8A patent/EP3945729A1/fr not_active Withdrawn
- 2021
  - 2021-07-28 WO PCT/EP2021/071151 patent/WO2022023417A2/fr active Application Filing
  - 2021-07-28 EP EP21751796.0A patent/EP4189974A2/fr active Pending
  - 2021-07-28 JP JP2023506248A patent/JP2023536270A/ja active Pending
- 2023
  - 2023-01-24 US US18/158,724 patent/US20230164509A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2022023417A3 (fr) | 2022-03-24 |
JP2023536270A (ja) | 2023-08-24 |
EP3945729A1 (fr) | 2022-02-02 |
US20230164509A1 (en) | 2023-05-25 |
WO2022023417A2 (fr) | 2022-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021023667A1 (fr) | Système et procédé d'aide à l'audition sélective | |
EP4189974A2 (fr) | Système et procédé d'égalisation de casque d'écoute et d'adaptation à la salle pour une restitution binaurale en réalité augmentée | |
Gabbay et al. | Visual speech enhancement | |
Wang | Time-frequency masking for speech separation and its potential for hearing aid design | |
Darwin | Listening to speech in the presence of other sounds | |
Arons | A review of the cocktail party effect | |
Blauert | Communication acoustics | |
DE102007048973B4 (de) | Vorrichtung und Verfahren zum Erzeugen eines Multikanalsignals mit einer Sprachsignalverarbeitung | |
CN110517705B (zh) | 一种基于深度神经网络和卷积神经网络的双耳声源定位方法和系统 | |
US10825353B2 (en) | Device for enhancement of language processing in autism spectrum disorders through modifying the auditory stream including an acoustic stimulus to reduce an acoustic detail characteristic while preserving a lexicality of the acoustics stimulus | |
Bressler et al. | Bottom-up influences of voice continuity in focusing selective auditory attention | |
Marxer et al. | The impact of the Lombard effect on audio and visual speech recognition systems | |
CN107210032A (zh) | 在掩蔽语音区域中掩蔽再现语音的语音再现设备 | |
CN112352441B (zh) | 增强型环境意识系统 | |
CN103325383A (zh) | 音频处理方法和音频处理设备 | |
Hummersone | A psychoacoustic engineering approach to machine sound source separation in reverberant environments | |
EP3216235B1 (fr) | Appareil et procédé de traitement de signal audio | |
Keshavarzi et al. | Use of a deep recurrent neural network to reduce wind noise: Effects on judged speech intelligibility and sound quality | |
Josupeit et al. | Modeling speech localization, talker identification, and word recognition in a multi-talker setting | |
Abel et al. | Novel two-stage audiovisual speech filtering in noisy environments | |
Somayazulu et al. | Self-supervised visual acoustic matching | |
US12073844B2 (en) | Audio-visual hearing aid | |
CN111009259B (zh) | 一种音频处理方法和装置 | |
CN113347551B (zh) | 一种单声道音频信号的处理方法、装置及可读存储介质 | |
US20240365081A1 (en) | System and method for assisting selective hearing |
Legal Events
Date | Code | Title | Description
---|---|---|---
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE
| 17P | Request for examination filed | Effective date: 20230124
| AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
| DAV | Request for validation of the european patent (deleted) |
| DAX | Request for extension of the european patent (deleted) |