WO2012097150A1 - Automotive sound recognition system for enhanced situation awareness - Google Patents

Automotive sound recognition system for enhanced situation awareness Download PDF

Info

Publication number
WO2012097150A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
vehicle
warning
ambient
processor
Prior art date
Application number
PCT/US2012/021077
Other languages
French (fr)
Inventor
John Usher
Steven W. Goldstein
John G. Casali
Original Assignee
Personics Holdings, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Personics Holdings, Inc. filed Critical Personics Holdings, Inc.
Publication of WO2012097150A1 publication Critical patent/WO2012097150A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/26 - Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

Definitions

  • the present invention relates to a device that monitors sound directed to a vehicle cabin, and more particularly, though not exclusively, to an audio system and method that detects ambient warning sounds and adjusts audio delivered to a vehicle cabin based on the detected warning sounds to enhance auditory situation awareness.
  • a sound recognition system includes at least one ambient sound microphone (ASM), at least one vehicle cabin receiver (VCR) and a processor.
  • the ASM is disposed on the vehicle and configured to capture ambient sound external to the vehicle.
  • the VCR is configured to deliver audio content to a vehicle cabin of the vehicle.
  • the processor is coupled to the at least one ASM and the at least one VCR.
  • the processor is configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.
  • aspects of the present invention also relate to methods for increasing auditory situation awareness in a vehicle.
  • the method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; monitoring the ambient sound for a target sound by detecting a sound signature corresponding to the target sound in the ambient sound; and adjusting a delivery of audio content by at least one vehicle cabin receiver (VCR) to a vehicle cabin of the vehicle based on the target sound.
  • aspects of the present invention further relate to methods for sound signature detection for a vehicle.
  • the method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; and receiving a directive to learn a sound signature within the ambient sound.
  • a voice command or an indication from a user is received and is used to initiate the steps of capturing and learning.
  • aspects of the present invention also relate to methods for personalized listening in a vehicle.
  • the method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; detecting a sound signature within the ambient sound that is associated with a warning sound; and mixing the warning sound with audio content delivered to the vehicle cabin via at least one vehicle cabin receiver (VCR) in accordance with a priority of the warning sound and a personalized hearing level (PHL).
  • FIG. 1 is a pictorial diagram of a vehicle including an exemplary automotive sound recognition system for enhanced situation awareness in accordance with an embodiment of the present invention;
  • FIG. 2 is a block diagram of the system shown in FIG. 1 in accordance with an exemplary embodiment of the present invention;
  • FIG. 3 is a flowchart of an exemplary method for ambient sound monitoring and warning detection in accordance with an embodiment of the present invention;
  • FIG. 4 illustrates various system modes in accordance with an exemplary embodiment of the present invention;
  • FIG. 5 is a flowchart of an exemplary method for sound signature detection in accordance with an embodiment of the present invention;
  • FIG. 6 is a flowchart of an exemplary method for managing audio delivery based on detected sound signatures in accordance with an embodiment of the present invention;
  • FIG. 7 is a flowchart of an exemplary method for sound signature detection in accordance with an embodiment of the present invention;
  • FIG. 8 is a pictorial diagram for mixing ambient sounds and warning sounds with audio content in accordance with an exemplary embodiment of the present invention; and
  • FIG. 9 is a flowchart of an exemplary method for updating the sound signature detection library dependent on the vehicle location in accordance with an embodiment of the present invention.
  • any specific values, for example the sound pressure level change, should be interpreted to be illustrative only and non-limiting.
  • other examples of the exemplary embodiments could have different values.
  • Automotive vehicle operators are often auditorially removed from their external ambient environment. Ambient sound cues, such as oncoming emergency (and non-emergency) vehicle sound alerts, are often not heard by the vehicle operator due to acoustic isolation of the vehicle cabin, internal cabin noise from engine and road noise and, especially, loud music and speech reproduction levels in the vehicle cabin.
  • Automotive vehicle operators are often auditorially removed from their external ambient environment. For example, high sound isolation from the external environment may be provided by cabin structural insulation, close-fitting window seals and thick or double-paned glass. Ambient sound cues (from external acoustic signals), such as oncoming emergency (and non-emergency) vehicle sound alerts, vocal messages from pedestrians, and sounds generated by the operator's own vehicle, may often not be heard by the vehicle operator.
  • the reduced "situation awareness" of the vehicle operator may be a consequence of at least two principle factors.
  • One factor includes acoustic isolation of the vehicle cabin (e.g., from the vehicle windows and structural isolation).
  • a second factor includes sound masking .
  • the sound masking may include masking from internal cabin noise (such as from engine and road noise) and masking from loud music reproduction levels within the vehicle.
  • the masking effect may be further compounded by telephone communications, where the vehicle operator's attention may be further distracted by the conversation.
  • a telephone conversation thus may introduce an additional cognitive load that further reduces the vehicle operator's auditory situation awareness of the vehicle surroundings.
  • the reduction of the situation awareness of the vehicle operator may lead to danger. For example, the personal safety of the vehicle operator may be reduced. In addition, the personal safety of other vehicle operators and pedestrians in the vicinity of the vehicle may also be threatened.
  • One definition of situation awareness includes "the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future". While some definitions are specific to the environment from which they were adapted, the above definition may be applicable across multiple task domains from visual to auditory modalities.
  • a method and system is herein disclosed to address this problem of reduced auditory situation awareness of vehicle operators.
  • ambient warning sounds in the vicinity of a vehicle may be automatically detected and may be actively reproduced in the vehicle cabin, to inform the vehicle operator of detected sounds.
  • a library of known warning sounds may be acquired automatically based on the vehicle's location.
  • Personal safety of the vehicle operator and passengers in the vehicle may thereby be enhanced by exemplary systems and methods of the present invention, as described herein. Accordingly, the safety of other vehicles (such as oncoming emergency vehicles, other motorists, and pedestrians) may also be increased.
  • the safety benefit comes not only from the enhanced auditory situation awareness, but also via reduced driver workload.
  • the system may reduce the burden on the driver to constantly visually scan the environment for emergency vehicles or other dangers that may also have recognizable acoustical signatures (that may ordinarily be inaudible inside the vehicle cabin).
  • One focus of the present invention is to enhance (i.e., increase) the auditory situation awareness of a typical vehicle operator and, thereby, improve the personal safety of the vehicle operator, and other motorists and pedestrians.
  • ASRS 100 may include user interface 106, central audio processor system 114 (also referred to herein as processor 114), indicator 116 and at least one loudspeaker (for example, right loudspeaker 112 and left loudspeaker 120) (also referred to herein as vehicle cabin receivers (VCRs) 112, 120).
  • ASRS 100 may also include one or more ambient microphones (for example, right microphone 104, front microphone 108, rear microphone 110 and left microphone 122) for capturing ambient sound external to vehicle 102.
  • ASRS 100 may also include at least one vehicle cabin microphone (VCM) 118 for capturing sound within vehicle cabin 126.
  • VCM vehicle cabin microphone
  • Processor 114 may be coupled to one or more of user interface 106, indicator 116, VCRs 112, 120, VCM 118 and ambient microphones 104, 108, 110, 122.
  • Processor 114 may be configured to control acquisition of ambient sound signals from ambient microphones 104, 108, 110, 122 and (optionally) a cabin sound signal from VCM 118.
  • Processor 114 may be configured to analyze ambient and/or cabin sound signals, and to present information to vehicle operator 124 (such as via VCRs 112, 120 and/or indicator 116) responsive to the analysis.
  • processor 114 may be configured to receive AC signal 107 and reproduce AC signal 107 through VCRs 112, 120 into vehicle cabin 126.
  • Processor 114 may also be configured to receive ambient sound signals from respective ambient microphones 104, 108, 110, 122.
  • Processor 114 may also be configured to receive a cabin sound signal from VCM 118.
  • processor 114 may mix the ambient sound signal from at least one of ambient microphones 104, 108, 110, 122 with AC signal 107.
  • the mixed signal may be output to VCRs 112, 120.
  • acoustic cues in the ambient signal (such as an ambulance siren, a vocal warning from a pedestrian, or a vehicle malfunction sound) may be passed into vehicle cabin 126, thereby providing detectable and spatial localization cues for vehicle operator 124.
  • AC signal 107 may include any audio signal provided to (and/or generated by) processor 114 that may be reproduced through VCRs 112, 120.
  • AC signal 107 may correspond to (without being limited to) at least one of the following exemplary signals: a music or voice audio signal from a music audio source (for example, a radio, a portable media player, a computing device) ; voice audio (for example, from a telephone, a radio device or an occupant of vehicle 102); or an audio warning signal automatically generated by vehicle 102 (for example, in response to a backup proximity sensor, an unbelted passenger restraint, an engine malfunction condition, or other audio alert signals).
  • a music or voice audio signal from a music audio source (for example, a radio, a portable media player, a computing device) ; voice audio (for example, from a telephone, a radio device or an occupant of vehicle 102); or an audio warning signal automatically generated by vehicle 102 (for example, in response to a backup proximity sensor, an unbelted passenger restraint,
  • ASRS 100 may include more or fewer loudspeakers.
  • ASRS 100 may have more than two loudspeakers for right, left, front and back balance of sound in vehicle cabin 126.
  • ASRS 100 may include five loudspeakers (and a subwoofer) for 5.1 channel surround sound. It is understood that, in general, ASRS 100 may include one or more loudspeakers.
  • User interface 106 may include any suitable user interface capable of providing parameters for one or more of processor 114, indicator 116, VCRs 112, 120, VCM 118 and ambient microphones 104, 108, 110, 122.
  • User interface 106 may include, for example, one or more buttons, a pointing device, a keyboard and/or a display device.
  • Processor 114 may also issue alerts to vehicle operator 124, for example, via indicator 116.
  • Indicator 116 may provide alerts via a visual indication, an auditory indication (such as a tonal alert) and/or a haptic indication.
  • Indicator 116 may include any suitable indicator such as (without being limited to): a display (such as a heads-up display), a loudspeaker or a haptic transducer (for example, mounted in the vehicle's steering wheel or operator seat).
  • processor 114 may also use ambient microphones 104, 108, 110, 122 and/or VCM 118 and VCRs 112, 120 to cancel a background noise component (such as road noise) in vehicle cabin 126.
  • a background noise component such as road noise
  • the noise cancellation may be centered at the position of vehicle operator 124.
  • Ambient microphones 104, 108, 110, 122 may be positioned on vehicle 102 (for example, on an exterior of vehicle 102 or any other suitable location) such that ambient microphones 104, 108, 110, 122 may transduce sound that is external to vehicle 102.
  • ambient microphones 104, 108, 110, 122 may be configured to detect specific sounds in a vicinity of vehicle 102.
  • system 100 may include any number of microphones, with at least one being an ambient sound microphone.
  • An ambient sound signal (from one or more of ambient microphones 104, 108, 110, 122) may also be mixed with AC signal 107 before being presented through at least one cabin loudspeaker 112, 120.
  • processor 114 may determine a sound pressure level (SPL) of vehicle cabin 126 (referred to herein as the cabin SPL) by analyzing a signal level and signal gain reproduced with at least one of loudspeakers 112, 120, and the sensitivity of respective loudspeakers 112, 120.
  • processor 114 may determine the cabin SPL via VCM 118.
  • VCM 118 may allow consideration of other sound sources in vehicle cabin 126 (i.e., other than sound sources contributed by loudspeakers 112, 120), such as an air conditioning system, and sound from other passengers in vehicle 102.
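As a rough illustration of the loudspeaker-based estimate described above (signal level, gain, and loudspeaker sensitivity), the following is a minimal Python sketch; the amplifier gain and sensitivity figures are hypothetical placeholders, not values from this disclosure.

```python
import numpy as np

def estimate_cabin_spl(samples, amp_gain_db, sensitivity_db_spl):
    """Rough cabin SPL estimate from the signal driving one loudspeaker.

    samples: digital audio in [-1, 1]; amp_gain_db: electrical gain applied
    before the loudspeaker; sensitivity_db_spl: acoustic output (dB SPL) the
    loudspeaker produces at digital full scale. All figures here are
    illustrative assumptions, not calibrated values.
    """
    rms = np.sqrt(np.mean(np.square(samples)))
    if rms == 0.0:
        return float("-inf")
    level_dbfs = 20.0 * np.log10(rms)  # signal level re: digital full scale
    return sensitivity_db_spl + amp_gain_db + level_dbfs

# Example: a -15 dBFS (0.25 amplitude) sine tone through a 20 dB gain stage
tone = 0.25 * np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)
print(round(estimate_cabin_spl(tone, amp_gain_db=20.0, sensitivity_db_spl=89.0), 1))
```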
  • ASRS 100 may be coupled to a remote location (not shown), for example, by wireless communication. Information collected by ASRS 100 may be provided to the remote location (such as for further analysis).
  • ASRS 100 can include processor 114 operatively coupled to the Ambient Sound Microphone (ASM) 201, one or more VCRs 112, 120, and VCM 118 via one or more Analog to Digital Converters (ADC) 202 and Digital to Analog Converters (DAC) 203.
  • ASM 201 may represent one or more of ambient microphones 104, 108, 110, 122 shown in FIG. 1.
  • ASRS 100 can include an audio interface 212 operatively coupled to the processor 114 to receive AC signal 107 (for example, from a media player, a cell phone, voice mail), and deliver the AC signal 107 to the processor 114.
  • AC signal 107 for example, from a media player, a cell phone, voice mail
  • the processor 114 may include sound signature detection block 214 and may monitor the ambient sound captured by the ASM 201 for warning sounds in the environment, such as an alarm (e.g., bell, emergency vehicle, security system, etc.), siren (e.g., police car, ambulance, etc.), voice (e.g., "help", "stop", "police", etc.), or specific noise type (e.g., breaking glass, gunshot, etc.).
  • the memory 208 can store sound signatures for previously learned warning sounds to which the processor 114 refers for detecting warning sounds.
  • the sound signatures can be resident in the memory 208 or downloaded to processor 114 via the transceiver 204 during operation as needed.
  • the processor 114 can report the warning to the vehicle operator 124 (also referred to herein as user 124) via audio delivered from the VCRs 112, 120 to the vehicle cabin.
  • the processor 114 responsive to detecting warning sounds can adjust the audio content signal 107 and the warning sounds delivered to the vehicle cabin 126.
  • the processor 114 can actively monitor the sound exposure level inside the vehicle cabin 126 and adjust the audio to within a safe and subjectively optimized listening level range.
  • the processor 114 can utilize computing technologies such as a microprocessor, an Application Specific Integrated Chip (ASIC) and/or a digital signal processor (DSP), with associated storage memory 208 such as Flash, ROM, RAM, SRAM, DRAM or other like technologies for controlling operations.
  • the ASRS 100 can further include a transceiver 204 that can support singly or in combination any number of wireless access technologies including without limitation Bluetooth™, Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), and/or other short or long range communication protocols.
  • the transceiver 204 can also provide support for dynamic downloading over-the-air to the ASRS 100. It should be noted also that next generation access technologies can also be applied to the present disclosure.
  • the power supply 210 can utilize common power management technologies such as replaceable batteries, supply regulation technologies, and charging system technologies for supplying energy to the components of the ASRS 100 and to facilitate portable applications.
  • a motor (not shown) can be a single supply motor driver coupled to the power supply 210 to improve sensory input via haptic vibration (for example, via indicator 116 (FIG. 1) configured as a haptic indicator), e.g., connected to the vehicle steering wheel or vehicle operator chair.
  • the processor 114 can direct the motor to vibrate or pulse responsive to an action, such as a detection of a warning sound or an incoming voice call.
  • FIG. 3 is a flowchart of a method 300 for vehicle ambient sound monitoring and warning detection in accordance with an exemplary embodiment.
  • the method 300 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 300, reference will be made to components of FIG. 2, although it is understood that the method 300 can be implemented in any other manner using other suitable components.
  • the processor 114 can monitor the environment for warning sounds, such as an alarm, a horn, a voice, or a noise.
  • Each of the warning sounds can have certain identifiable features that characterize the sound.
  • the features can be collectively referred to as a sound signature which can be used for recognizing the warning sound.
  • the sound signature may include statistical properties or parametric properties of the warning sound.
  • a sound signature can describe prominent frequencies with associated amplitude and phase information.
  • the sound signature can contain principal components identifying the most likely recognizable features of a warning sound.
  • the processor 114 may detect the warning sounds within the environment based on the sound signatures.
  • feature extraction techniques may be applied to the ambient sound captured at the ASM 201 to generate the sound signatures.
  • Pattern recognition approaches may be applied based on known sound signatures to detect the warning sounds from their corresponding sound signatures. More specifically, sound signatures may be compared to learned models to identify a corresponding warning sound.
  • the processor 114 may adjust sound delivered to the vehicle cabin 126 in view of a detected warning sound. Upon detecting a warning sound in the ambient sound of the user's environment, the processor 114, at step 308, may generate an audible alarm within the vehicle cabin 126 that identifies the detected sound signature.
  • the audible alarm can be a reproduction of the warning sound, an audio clip corresponding to the warning sound, or a synthesized voice description, as in the examples that follow.
  • processor 114 can generate a sound bite (i.e., audio clip) corresponding to the detected warning sound such as an ambulance, fire engine, or other environmental sound.
  • processor 114 can synthesize a voice to describe the detected warning sound (e.g., "ambulance approaching").
  • processor 114 may send a message to a mobile device identifying the detected sound signature (e.g., "alarm sounding").
  • FIG. 4 illustrates system modes of ASRS 100 in accordance with an exemplary embodiment.
  • the system mode may be manually selected by user 124, for example, by pressing a button; or automatically selected, for example, when the processor 114 detects it is in an active listen state or in a media state.
  • the system mode can correspond to Signature Sound Pass Through Mode (SSPTM), Signature Sound Boost Mode (SSBM), Signature Sound Rejection Mode (SSRJM), Signature Sound Attenuation Mode (SSAM), and Signature Sound Replacement Mode (SSRM).
  • SSPTM Signature Sound Pass Through Mode
  • SSBM Signature Sound Boost Mode
  • SSRJM Signature Sound Rejection Mode
  • SSAM Signature Sound Attenuation Mode
  • SSRM Signature Sound Replacement Mode
  • in SSPTM mode, ambient sound captured at the ASM 201 is passed transparently to the VCRs 112, 120 for reproduction within the vehicle cabin 126.
  • the sound produced in the vehicle cabin 126 sufficiently matches the ambient sound outside the vehicle cabin 126, thereby providing a "transparency" effect. That is, the loudspeakers 112, 120 in the vehicle cabin 126 recreate the sound captured at the ASM 201.
  • the processor 114 by way of sound measured at the VCM 118, may adjust the properties of sound delivered to the vehicle cabin 126 so the sound within the occluded vehicle cabin 126 is the same as the ambient sound outside the vehicle 102.
  • in SSBM mode, warning sounds and/or ambient sounds are amplified upon the processor 114 detecting a warning sound.
  • the warning sound can be amplified relative to the normal level received, or amplified above an audio content level if audio content is being delivered to the vehicle cabin 126.
  • in SSRJM mode, sounds other than warning sounds may be rejected upon the processor 114 detecting a specific sound signature.
  • the specific sound can be minimized relative to the normal level received.
  • in SSAM mode, sounds other than warning sounds can be attenuated.
  • annoying sounds or noises not associated with warning sounds can be suppressed.
  • the user 124 can establish which sounds are considered warning sounds (e.g., "ambulance") and which sounds are considered non-warning sounds (e.g., "jackhammer").
  • the processor 114 upon detecting non-warning sounds can thus attenuate or reject these sounds within the vehicle cabin 126.
  • in SSRM mode, warning sounds detected in the environment can be replaced with audible warning messages.
  • the processor 114, upon detecting a warning sound, can generate synthetic speech identifying the warning sound (e.g., "ambulance detected").
  • the processor 114 may audibly report the warning sound identified, thereby relieving the user 124 from having to interpret the warning sound.
  • the synthetic speech can be mixed with the ambient sound (e.g., amplified, attenuated, cropped, etc.), or played alone with the ambient sound muted.
  • FIG. 5 is a flowchart of a method 500 for sound signature detection in accordance with an exemplary embodiment.
  • the method 500 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 500, reference will be made to components of FIG. 2, although it is understood that the method 500 can be implemented in any other manner using other suitable components.
  • the method can start at step 502, in which the processor 114 can enter a learn mode.
  • the processor 114, upon completion of a learning mode or a previous learning configuration, can instead start at step 520.
  • the processor 114 can actively generate and learn sound signatures from ambient sounds within the environment.
  • the processor 114 can also receive a directive from the user 124 to learn a new sound. For example, the user 124 can press a button or otherwise (e.g., by voice recognition) initiate a recording of ambient sounds in the environment.
  • the user can, upon hearing a new warning sound in the environment ("car horn"), activate the processor 114 to learn the new warning sound.
  • upon the processor 114 generating a sound signature for the new warning sound, the sound signature can be stored in the user defined database 504.
  • the processor 114, upon detecting a unique sound characteristic of a warning sound, can ask the user 124 if they desire to have the sound signature for the unique sound learned.
  • the processor 114 may actively sense sounds and may query the user 124 about the environment to learn the sounds.
  • the processor 114 can organize learned sounds based on
  • ASRS 100 may provide for delayed recording, to allow a previously encountered sound to be learned.
  • ASRS 100 may include a buffer to store ambient sounds recorded for a period of time.
  • User 124 may review the recorded ambient sounds and select one or more of these recorded ambient sounds for learning (such as via user interface 106 (FIG. 1)).
  • trained models can be retrieved from an on-line database 506 for use in detecting warning sounds.
  • the previously learned models can be transmitted to the processor 114 on a scheduled basis, or as needed, depending on the environmental context. For example, the processor 114, upon detecting traffic noise, may retrieve sound signature models associated with warning sounds in traffic (e.g., ambulance, police car). In another embodiment, upon the processor 114 detecting conversational noise (e.g., people talking), sound signature models for verbal warnings ("help", "police") may be retrieved. Groups of sound signature models may be retrieved based on the environmental context or on user directed action.
  • the ASRS processor 114 can also generate speech recognition models for warning sounds corresponding to voice, such as "help", "police", "fire", etc.
  • the speech recognition models may be retrieved from the on-line database 506 or the user defined database 504.
  • the user 124 may say a word or enter a text version of a word to associate the word with a verbal warning sound.
  • the user 124 may define a set of words of interest along with mappings to their meanings, and then use keyword spotting to detect their occurrences (see the sketch below). If the user 124 enters an environment wherein another individual says the same word (e.g., "help"), the processor 114 may inform the user 124 of the verbal warning sound.
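A minimal keyword-spotting sketch along these lines is shown below. It assumes a separate speech recognizer (not shown) supplies a text transcript, and the word-to-meaning mapping is an illustrative user-defined example, not data from this disclosure.

```python
# User-defined words of interest mapped to their meanings (illustrative).
WARNING_WORDS = {
    "help": "verbal distress call",
    "police": "request for police assistance",
    "fire": "fire warning",
}

def spot_keywords(transcript: str) -> list[tuple[str, str]]:
    """Return (word, meaning) pairs for any warning words in a transcript."""
    tokens = transcript.lower().split()
    return [(t, WARNING_WORDS[t]) for t in tokens if t in WARNING_WORDS]

# A transcript assumed to come from an external speech recognizer.
print(spot_keywords("somebody call the police please"))
# -> [('police', 'request for police assistance')]
```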
  • the processor 114 may generate sound signature models as shown in step 510.
  • the processor 114 itself may generate the sound signature models, or transmit the captured warning sounds to external systems (e.g., a remote server) that generate the sound signature models.
  • Such learning may be conducted off-line in a training phase, and the processor 114 can be uploaded with the new learning models.
  • the learning models can be updated during use of the ASRS 100, for example, when the processor 114 detects warning sounds.
  • the detected warning sounds can be used to adapt the learning models as new warning sound variants are encountered.
  • the processor 114, upon detecting a warning sound, can use the sound signature of the warning sound to update the learned models in accordance with the training phase.
  • a first learned model is adapted based on new training data collected in the environment by the processor 114.
  • a new set of "horn" warning sounds could be included in real-time training without discarding the other "horn" sounds already captured in the existing model.
  • the processor 114 can monitor and report warning sounds within the environment based on the ambient sounds (e.g., an input signal) captured at the ASM 201.
  • the ambient sounds can be digitized by way of the ADC 202 and stored temporarily to a data buffer in memory 208 as shown in step 522.
  • the data buffer may be capable of holding enough data to allow for generation of a sound signature as will be described ahead in FIG. 7.
  • the processor 114 can implement a "look ahead" analysis system by way of the data buffer for reproduction of pre-recorded audio content, using a data buffer to offset the reproduction of the audio signal.
  • the look-ahead system allows the processor 114 to analyze potentially harmful audio artifacts (e.g., high level onsets, bursts, etc.), either received from an external media device or detected with the ambient microphones 201, in situ before they are reproduced.
  • the processor 114 can thus mitigate the audio artifacts in advance to reduce timbral distortion effects caused by, for instance, attenuating high level transients.
  • signal conditioning techniques may be applied to the ambient sound, for example, to suppress noise or gate the noise to a predetermined threshold.
  • Other signal processing steps such as threshold detection shown in step 526 may be used to determine whether ambient sounds should be evaluated for warning sounds. For instance, to conserve computational processing resources (e.g., battery, processor), only ambient sounds that exceed a predetermined power level may be evaluated for warning sounds.
  • Other metrics, such as signal spectrum, duration, stationarity, and context-aware measures, may be considered in determining whether the ambient sound is analyzed for warning sounds (see the gating sketch below).
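The following sketch shows one way such a power-level gate might work; the frame length and threshold are assumptions chosen for illustration, not values from this disclosure.

```python
import numpy as np

FRAME_LEN = 160   # 20 ms at 8 kHz (illustrative)
GATE_DB = -40.0   # analysis threshold re: digital full scale (assumed)

def frame_level_db(frame: np.ndarray) -> float:
    """Frame level in dB relative to digital full scale."""
    rms = np.sqrt(np.mean(np.square(frame)) + 1e-12)
    return 20.0 * np.log10(rms)

def should_analyze(frame: np.ndarray) -> bool:
    """Only frames above the power threshold reach signature analysis."""
    return frame_level_db(frame) > GATE_DB

rng = np.random.default_rng(0)
quiet = 0.001 * rng.standard_normal(FRAME_LEN)
loud = 0.3 * rng.standard_normal(FRAME_LEN)
print(should_analyze(quiet), should_analyze(loud))  # False True
```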
  • the processor 114 at step 530 can proceed to generate a sound signature for the ambient sound.
  • the sound signature is a feature vector which can include statistical parameters or salient features of the ambient sound.
  • An ambient sound with a warning sound (e.g., "bell", "siren"), such as shown in step 532, is generally expected to exhibit features similar to sound signatures for similar warning sounds (e.g., "bell", "siren") stored in the user defined database 504 or the on-line database 506.
  • the processor 114 can also identify a direction and speed of the sound source if it is moving, for example, by evaluating Doppler shift as shown in steps 534 and 536.
  • the processor 114 by way of beam-forming among multiple ASMs 201 may also estimate a direction of a sound source generating the warning sound.
  • the speed and bearing of the sound source can also be estimated using pitch analysis to detect changes predicted by the Doppler effect, or alternatively by analyzing changes in relative phase and magnitude between the two ASM signals (a Doppler-based sketch follows below).
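For illustration, the pitch-analysis variant reduces to the classical Doppler relation for a moving source and a stationary microphone. The sketch below assumes the source's at-rest siren frequency is known (for example, from the signature library); that assumption is ours, not a stated requirement of this disclosure.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def radial_speed(f_observed: float, f_source: float) -> float:
    """Radial speed of a moving sound source toward a stationary microphone.

    Uses the Doppler relation f_obs = f_src * c / (c - v); a positive result
    means the source is approaching, a negative one that it is receding.
    """
    return SPEED_OF_SOUND * (1.0 - f_source / f_observed)

# A 700 Hz siren heard at 728 Hz -> approaching at roughly 13 m/s
print(round(radial_speed(728.0, 700.0), 1))
```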
  • the processor 114 by way of a sound recognition engine, may detect general warning signals such as car horns or emergency sirens (and other signals referenced by ISO 7731) using spectral and temporal analysis.
  • the processor 114 can also analyze the ambient sound to determine if a verbal warning (e.g., "help", "police", "excuse me") is present.
  • the sound signature of the ambient sound can be analyzed for speech content.
  • the sound signature may be analyzed for voice information, such as vocal cord pitch periodicities, time-varying voice formant envelopes, or other articulation parameter attributes.
  • the processor 114 can perform key word detection (e.g., "help") in the spoken content as shown in step 542.
  • Speech recognition models as well as language models may be used to identify key words in the spoken content.
  • the user 124 may say or enter in one or more warning sounds that may be mapped to associated learning models for sound signature detection.
  • the user 124 may also provide user input to direct operation of the processor 114, for example, to select an operational mode as shown in step 550.
  • the operational mode can enable, disable or adjust monitoring of warning sounds.
  • the processor 114 may mix audio content with ambient sound while monitoring for warning sounds.
  • the processor 114 may suppress or attempt to actively cancel all noises except detected warning sounds.
  • the user input may be in the form of a physical interaction (e.g., button press) or a vocalization (e.g., spoken command).
  • the operating mode can also be controlled by a prioritizing module as shown in step 554.
  • the prioritizing module may prioritize warning sounds based on severity and context. For example, if the user 124 is in a phone call, and a warning sound is detected, the processor 114 may audibly inform the user 124 of the warning and/or present a text message of the warning sound. If the user 124 is listening to music or a voice communication, and a warning sound is detected, the processor 114 may automatically shut off the music or voice audio and alert the user.
  • the user 124 by way of user interface 106 (FIG. 1) or an administrator, may rank warning sounds and instruct the processor 114 how to respond to warnings in various contexts.
  • FIG. 6 is a flowchart of a method 600 for managing audio delivery based on detected sound signatures in accordance with an exemplary embodiment.
  • the method 600 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 600, reference will be made to components of FIG. 2, although it is understood that the method 600 can be implemented in any other manner using other suitable components.
  • the audio interface 212 can supply audio content (e.g., music, cell phone, voice mail, etc.) to the processor 114.
  • the user 124 may listen to music, talk on the phone, receive voice mail, or perform other audio related tasks while the processor 114 additionally monitors warning sounds in the environment.
  • the processor 114 may operate normally to recreate the sound experience requested by the user 124. If however the processor 114 detects a warning sound, the processor 114 may manage audio content delivery to notify the user 124 of the warning sound.
  • Managing audio content delivery can include adjusting or overriding other current audio settings.
  • the audio interface 212 receives audio content from a media player, such as a portable music player, or cell phone.
  • the audio content can be delivered to the user's vehicle cabin by way of the VCRs 112, 120 as shown in step 604.
  • the processor 114 monitors ambient sound in the environment captured at the ASM 201.
  • Ambient sound may be sampled at sufficiently high data rates (e.g., 8 kHz, 16 kHz or 32 kHz) to allow for feature extraction of sound signatures.
  • the processor 114 may adjust the sampling rate based on the information content of the ambient signal. For example, upon the ambient sound exceeding a first threshold, the sampling rate may be set to a first rate (e.g., 4 kHz). As the ambient sound increases in volume, or as prominent features are identified, the sampling rate may be increased to a second rate (e.g., 8 kHz) to increase signal resolution. Although the higher sampling rate may improve resolution of features, the lower sampling rate may preserve computational resources (e.g., power supply 210, processor 114) while providing minimally sufficient feature resolution.
  • the processor 114 may determine a priority of the detected sound signature (at step 610).
  • the priority establishes how the processor 114 manages audio content.
  • warning sounds for various environmental conditions and user experiences can be learned.
  • the user 124 or an administrator can establish priorities for warning sounds.
  • these priorities may be based on environmental context. For example, if a user 124 is in a warehouse where loading vehicles emit a beeping sound, sound signatures for such vehicles can be given the highest priority.
  • a user 124 may also prioritize learned warning sounds, for example, via a user interface on a paired device (e.g., cell phone), or via speech recognition (e.g., "prioritize - 'ambulance' - high").
  • Upon detecting a warning sound and identifying a priority, the processor 114, at step 612, selectively manages at least a portion of the audio content based on the priority. For example, if the user 124 is listening to music during the time a warning sound is detected, the processor 114 may decrease the music volume to present an audible notification. This may represent one indication that the processor 114 has detected a warning sound.
  • the processor 114 may further present an audible notification to the user 124.
  • For example, upon detecting a "horn" sound, a text-to-speech message can be presented to the user 124 to audibly inform them that a horn sound has been detected (e.g., "horn detected").
  • Information related to the warning sound (e.g., direction, speed, priority) may also be presented with the audible notification (see the sketch below).
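A sketch of composing such a notification is shown below; the field names and phrasing are illustrative assumptions, not the disclosure's exact message format.

```python
from typing import Optional

def format_warning_notification(sound: str,
                                direction: Optional[str] = None,
                                speed_mps: Optional[float] = None,
                                priority: str = "medium") -> str:
    """Compose a notification like the "horn detected" example above.

    Optional details are simply omitted when the detector cannot estimate
    them; all phrasing here is an illustrative assumption.
    """
    parts = [f"{sound} detected"]
    if direction:
        parts.append(f"to your {direction}")
    if speed_mps is not None:
        parts.append(f"approaching at about {speed_mps:.0f} m/s")
    parts.append(f"priority {priority}")
    return ", ".join(parts)

print(format_warning_notification("horn", direction="left",
                                  speed_mps=13.2, priority="high"))
# horn detected, to your left, approaching at about 13 m/s, priority high
```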
  • the processor 114 may send a message to a device operated by the user 124 to visually display the notification, as shown in step 616.
  • the processor 114 may transmit a text message to a paired device (e.g., a cell phone) containing the audible warning (or to indicator 116 configured as a visual indicator).
  • the processor 114 may beacon out an audible alarm to other devices within a vicinity, for example via Wi-Fi (e.g., IEEE 802.11x).
  • Other devices in a proximity of the user 124 may sign up to receive audible alarms from the processor 114.
  • the processor 114 can beacon a warning notification to other devices in the area to share warning information with other people.
  • FIG. 7 is a flowchart of a method 700 further describing sound signature detection in accordance with an exemplary embodiment.
  • the method 700 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown .
  • the method 700 can begin in a state in which the processor 114 is actively monitoring warning sounds in the environment.
  • ambient sound captured from the ASM 201 may be buffered into short term memory as frames.
  • the ambient sound may be sampled at 8 kHz with 10-20 ms frame sizes (80 to 160 samples).
  • the frame size may also vary depending on the energy level of the ambient sound.
  • the processor 114, upon detecting low level sounds (e.g., about 70-74 dB SPL), may use a frame size of about 30 ms, and update the frame size to about 10 ms as the power level increases (e.g., greater than about 86 dB SPL).
  • the processor 114 may also increase the sampling rate in accordance with the power level and/or a duration of the ambient sound. (A longer frame size with lower sampling may compromise resolution for computational resources.)
  • the data buffer is desirably of sufficient length to hold a history of frames (e.g., about 10-15 frames) for short-term historical analysis (a buffering sketch follows below).
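The adaptive framing and short-term history described above might be sketched as follows; the sampling rate, history depth, and dB SPL switch points mirror the approximate figures in the text, but treating them as exact thresholds is our assumption.

```python
import collections
import numpy as np

SR = 8000                                # sampling rate (Hz), per the example
HISTORY = collections.deque(maxlen=12)   # roughly 10-15 frames of history

def frame_size_for_level(level_db_spl: float) -> int:
    """Pick a frame size (in samples) from the ambient level."""
    if level_db_spl < 74.0:
        return int(0.030 * SR)  # ~30 ms frames for low-level sounds
    if level_db_spl > 86.0:
        return int(0.010 * SR)  # ~10 ms frames as the power level rises
    return int(0.020 * SR)      # ~20 ms otherwise

def buffer_frame(samples: np.ndarray, level_db_spl: float) -> None:
    """Append the next frame to the short-term history buffer."""
    HISTORY.append(samples[:frame_size_for_level(level_db_spl)])

buffer_frame(np.zeros(SR), level_db_spl=70.0)
print(len(HISTORY[-1]))  # 240 samples = 30 ms at 8 kHz
```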
  • the processor 114 may perform feature extraction on the frame as the ambient sound is buffered into the data buffer.
  • feature extraction may include performing a filter-bank analysis and summing frequencies in auditory bandwidths.
  • Features may also include Fast Fourier Transform (FFT) coefficients, Discrete Cosine Transform (DCT) coefficients, cepstral coefficients, partial autocorrelation (PARCOR) coefficients, wavelet coefficients, statistical values (e.g., energy, mean, skew, variance), parametric features, or any other suitable data compression feature set.
  • FFT Fast Fourier Transform
  • DCT Discrete Cosine Transform
  • PARCOR partial autocorrelation
  • dynamic features such as derivatives of any order, may be added to the static feature set.
  • mel-frequency cepstral analysis may be performed on the frame to generate about 10 to 16 mel-frequency cepstral coefficients.
  • this small number of coefficients represents features that may be compactly stored to memory for that particular frame.
  • Such front-end feature extraction techniques may reduce the amount of data used to represent the data frame (see the sketch below).
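One concrete realization of the mel-frequency cepstral analysis uses the librosa library, as in the sketch below; the library choice, filter-bank size, and frame/hop lengths are our assumptions, not part of this disclosure.

```python
import numpy as np
import librosa

sr = 8000  # sampling rate consistent with the framing example above

# A synthetic 1 s tone standing in for buffered ambient sound.
y = 0.5 * np.sin(2 * np.pi * 950 * np.arange(sr) / sr).astype(np.float32)

# 13 coefficients per ~20 ms frame, within the 10-16 range noted above.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=256, hop_length=160, n_mels=40)
print(mfcc.shape)  # (13, number_of_frames)
```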
  • the features may be incorporated as a sound signature and compared to learned models, for example, those retrieved from the warning sounds database 718 (e.g., user defined database 504 or the on-line database 506 of FIG. 5).
  • a sound signature may be defined as a sound in the user's ambient environment which has significant perceptual saliency.
  • a sound signature may correspond to an alarm, an ambulance, a siren, a horn, a police car, a bus, a bell, a gunshot, a window breaking, or any other warning sound, including voice.
  • the sound signature may include features characteristic to the sound.
  • the sound signature may be classified by statistical features of the sound (e.g., envelope, harmonics, spectral peaks, modulation, etc.).
  • each learned model used to identify a sound signature has a set of features specific to a warning sound.
  • a feature vector of a learned model for an "alarm" is sufficiently different from a feature vector of a learned model for a "bell" sound.
  • the learned model may describe interconnectivity (e.g., state transitions, emission probabilities, initial probabilities, synaptic connections, hidden layers) among the feature vectors (e.g., frames).
  • the features of a "bell" sound may change in a specific manner compared to the features of an "alarm" sound.
  • the learned model may be a statistical model such as a Gaussian mixture model (GMM), a Hidden Markov Model (HMM), a Bayes Classifier, or a Neural Network (NN) that requires training.
  • GMM Gaussian mixture model
  • HMM Hidden Markov Model
  • NN Neural Network
  • each warning sound may have an associated GMM used for detecting the warning sound.
  • the warning sound for an "alarm" may have its own GMM.
  • a warning sound for a "bell" may have its own GMM.
  • Separate GMMs may also be used as a basis for the absence of the sounds ("anti-models"), such as "not alarm" or "not bell".
  • Each GMM may provide a model for the distribution of the feature statistics for each warning sound in a multi-dimensional space.
  • each warning sound's GMM may be evaluated relative to its anti-model, and a score related to the likelihood of that warning sound may be computed. A threshold may be applied directly to this score to decide whether the warning sound is present or absent.
  • a sequence of scores may be relayed to yet another module, which uses a more complex rule to decide presence or absence.
  • examples of such rules include linear smoothing and median filtering (see the scoring sketch below).
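A minimal sketch of the model/anti-model scoring with median-filter smoothing is shown below, using scikit-learn's GaussianMixture as one possible model implementation; the synthetic training data, component counts, and zero threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import medfilt
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in feature vectors (e.g., MFCCs): "siren" frames vs. everything else.
siren_feats = rng.normal(loc=2.0, scale=0.5, size=(500, 13))
other_feats = rng.normal(loc=0.0, scale=1.0, size=(500, 13))

model = GaussianMixture(n_components=4, random_state=0).fit(siren_feats)
anti_model = GaussianMixture(n_components=4, random_state=0).fit(other_feats)

def frame_scores(feats: np.ndarray) -> np.ndarray:
    """Per-frame log-likelihood ratio: warning-sound model vs. anti-model."""
    return model.score_samples(feats) - anti_model.score_samples(feats)

test = rng.normal(loc=2.0, scale=0.5, size=(51, 13))
smoothed = medfilt(frame_scores(test), kernel_size=5)  # median smoothing
print("siren present:", bool(np.median(smoothed) > 0.0))  # assumed threshold
```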
  • each warning sound in the database 718 may have a corresponding HMM.
  • a sound signature for a warning sound captured at the ASM 201 in ambient sound may be processed through a lattice network (e.g., a Viterbi network) for comparison to each HMM, to determine which HMM, if any, corresponds to the warning sound.
  • the sound signature may be input to the NN, where the output states of the NN correspond to warning sound indices.
  • the NN may include various topologies such as a Feed-Forward, Radial Basis Function, Hopfield, Time-Delay Recurrent, or other optimized topologies for real-time sound signature detection.
  • a distortion metric is performed with each learned model to determine which learned models are closest to the captured feature vector (e.g., sound signature) .
  • the learned model with the smallest distortion (e.g., mathematical distance) may be selected as the best match for the captured sound signature.
  • the distortion may be calculated as part of the model comparison in step 713. This is because the distortion metric may depend on the type of model used (e.g., HMM, NN, GMM, etc.) and in fact may be internal to the model (e.g., Viterbi decoding, back-propagation error update, etc.).
  • the distortion module is merely presented in FIG. 7 as a separate component to suggest use with other types of pattern recognition methods or learning models.
  • the ambient sound at step 715 may be classified as a warning sound.
  • Each of the learned models may be associated with a score. For example, upon the presentation of a sound signature, each GMM may produce a score. The scores may be evaluated against a threshold, and the GMM with the highest score may be identified as the detected warning sound. For example, if the learned model for the "alarm" sound produces the highest score (e.g., smallest distortion result) compared to other learned models, the ambient sound may be classified as an "alarm" warning sound.
  • the classification step 715 also takes into account likelihoods (e.g., recognition probabilities).
  • each GMM may produce a likelihood result, or output.
  • these likelihood results may be evaluated against each other or in a logical context, to determine the GMM considered "most likely" to match the sound signature of the warning sound.
  • the processor 114 may then select the GMM with the highest likelihood or score via soft decisions.
  • the processor 114 may continually monitor the environment for warning sounds, or monitor the environment on a scheduled basis. In one arrangement, the processor 114 may increase monitoring in the presence of high ambient noise possibly signifying environmental danger or activity. Upon classifying an ambient sound as a warning sound, the ASRS 100, at step 716, may generate an alarm. As previously noted, the processor 114 may mix the warning sound with audio content, amplify the warning sound, reproduce the warning sound, and/or deliver an audible message. As one example, spectral bands of the audio content that mask the warning sound may be suppressed to increase audibility of the warning sound. This serves to notify the user 124 of a warning sound detected in the environment, of which the user 124 may not be aware, depending on their environmental context.
  • the processor 114 may present an amplified audible notification to the user via the VCRs 112, 120.
  • the audible notification may be a synthetic voice identifying the warning sound (e.g., "car alarm"), a location or direction of the sound source generating the warning sound (e.g., "to your left"), a duration of the warning sound from initial capture (e.g., "3 minutes"), and any other information (e.g., proximity, severity level, etc.) related to the warning sound.
  • the processor 114 may selectively mix the warning sound with the audio content based on a predetermined threshold level. For example, the user 124 may prioritize warning sound types for receiving various levels of notification, and/or identify the sound types as desirable or undesirable.
  • FIG. 8 presents a pictorial diagram 800 for mixing ambient sounds and warning sounds with audio content.
  • the processor 114 is directing music 136 to the vehicle cabin 126 via VCR 120 while simultaneously monitoring warning sounds in the environment.
  • the processor 114, upon detecting a warning sound (signature 135), can lower the music volume from the media player 150 (graph 141), and increase the volume of the ambient sound received at the ASM 201 (graph 142).
  • Other mixing arrangements are herein contemplated.
  • the ramp up and down times can also be adjusted based on the priority of the warning sound (see the sketch below).
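A gain-ramp sketch in the spirit of graphs 141 and 142 follows; the ramp times keyed to priority are illustrative assumptions, not values from this disclosure.

```python
import numpy as np

RAMP_SECONDS = {"high": 0.05, "medium": 0.25, "low": 1.0}  # assumed values

def duck_music_for_warning(music, ambient, sr, priority="high"):
    """Cross-fade: ramp the music gain down and the ambient gain up.

    A high-priority warning ducks the music almost immediately; lower
    priorities fade over progressively longer intervals.
    """
    n_ramp = min(int(RAMP_SECONDS[priority] * sr), len(music))
    music_gain = np.zeros(len(music))
    music_gain[:n_ramp] = np.linspace(1.0, 0.0, n_ramp)
    ambient_gain = 1.0 - music_gain
    return music * music_gain + ambient * ambient_gain

sr = 8000
t = np.arange(sr) / sr
mix = duck_music_for_warning(0.5 * np.sin(2 * np.pi * 220 * t),
                             0.5 * np.sin(2 * np.pi * 950 * t),
                             sr, priority="high")
print(mix.shape)  # (8000,)
```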
  • the processor 114 may immediately shut off the music, and present the audible warning.
  • Other various implementations for mixing audio and managing audio content delivery are herein contemplated.
  • the audio content may be managed with other media devices (e.g., a cell phone).
  • the processor 114 may inform the user 124 and the called party of a warning sound.
  • the user 124 does not need to inform the called party, since the called party also receives the notification, which may save time in explaining an emergency situation.
  • the processor 114 may spectrally enhance the audio content in view of the ambient sound. Moreover, a timbral balance of the audio content may be maintained by taking into account level dependent equal loudness curves and other psychoacoustic criteria (e.g., masking) associated with a personalized hearing level (PHL) 430. For example, auditory cues in received audio content may be enhanced.
  • Frequency peaks within the audio content may be elevated relative to ambient noise frequency levels and in accordance with the PHL 430 to permit sufficient audibility of the ambient sound.
  • the PHL 430 reveals frequency dynamic ranges that may be used to limit the compression range of the peak elevation in view of the ambient noise spectrum.
  • the processor 114 may compensate for a masking of the ambient sound by the audio content.
  • the audio content, if sufficiently loud, may mask auditory cues in the ambient sound, which can: i) potentially cause hearing damage, and ii) prevent the user 124 from hearing warning sounds in the environment (e.g., an approaching ambulance, an alarm, etc.).
  • the processor 114 may accentuate and attenuate frequencies of the audio content and ambient sound to permit maximal sound reproduction while simultaneously permitting audibility of ambient sounds.
  • the processor 114 may narrow noise frequency bands within the ambient sound to permit sensitivity to audio content between the frequency bands.
  • the processor 114 may also determine if the ambient sound contains salient information (e.g., warning sounds) that should be un-masked with respect to the audio content. If the ambient sound is not relevant, the processor 114 may mask the ambient sound (e.g., increase levels) with the audio content until warning sounds are detected (a band-level un-masking sketch follows below).
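One way to realize the band-level un-masking (suppressing audio-content bands that overlap the warning sound, as also noted at step 716 above) is sketched below on single-frame magnitude spectra; the relative threshold and attenuation depth are our assumptions.

```python
import numpy as np

def unmask_warning_bands(audio_mag, warning_mag,
                         atten_db=12.0, rel_thresh=0.5):
    """Attenuate audio-content bins where the warning spectrum is prominent.

    audio_mag / warning_mag: magnitude spectra of one analysis frame.
    Bins where the warning exceeds rel_thresh of its peak are reduced by
    atten_db in the audio content; both parameters are illustrative.
    """
    mask_bins = warning_mag > rel_thresh * warning_mag.max()
    gains = np.where(mask_bins, 10.0 ** (-atten_db / 20.0), 1.0)
    return audio_mag * gains

rng = np.random.default_rng(1)
audio = rng.uniform(0.5, 1.0, 256)
warning = np.zeros(256)
warning[40:52] = 1.0  # siren energy concentrated in one band
ducked = unmask_warning_bands(audio, warning)
print(ducked[40:52].max() < 0.3, np.allclose(ducked[:40], audio[:40]))
```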
  • FIG. 9 is a flowchart of a method 900 for updating the sound signature detection library dependent on the vehicle location.
  • a current vehicle location may be acquired.
  • the location of vehicle 102 may be determined with a number of methods, including: a Global Positioning System (GPS) for determining a GPS location and cell-phone signal cell codes for triangulating the location (for example, based on a signal strength of the received signal(s) from cell phone networks).
  • the acquired vehicle location may be used, at step 904, to determine which "sound library cell" the vehicle 102 is located in.
  • the sound library cell may refer to a geographic region, such as a country, state, province or city.
  • the sound library cell may contain the target sound signatures which the vehicle's sound recognition system detects (for example, for a particular sound signature of a particular fire engine siren, there may be associated GMM statistics, as described above).
  • the sound library cell may be determined from the acquired location (for example, using a look-up table), or may be acquired directly from coded signals embedded in the received location signal (for example, a cellphone signal or a radio signal); a minimal look-up sketch follows below.
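A minimal cell look-up sketch is shown below; the one-degree cell granularity, region contents, and signature names are hypothetical illustrations of the idea, not data from this disclosure.

```python
import math

# Hypothetical mapping from coarse GPS cells to regional signature sets.
SOUND_LIBRARY_CELLS = {
    (40, -74): {"us_fire_engine_siren", "us_police_yelp"},
    (48, 2): {"eu_two_tone_siren"},
}

def cell_for_location(lat: float, lon: float) -> tuple[int, int]:
    """Quantize a GPS fix to a one-degree sound library cell."""
    return (math.floor(lat), math.floor(lon))

def update_signature_library(active: set, lat: float, lon: float) -> set:
    """Merge the cell's regional signatures into the active library."""
    regional = SOUND_LIBRARY_CELLS.get(cell_for_location(lat, lon), set())
    return active | regional

active = {"generic_car_horn"}
print(sorted(update_signature_library(active, 40.71, -73.99)))
# ['generic_car_horn', 'us_fire_engine_siren', 'us_police_yelp']
```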
  • the current sound signature library of the vehicle is updated (at step 910).
  • the current vehicle sound signature library may be updated from data provided by an on-line sound signature library 908, where the new sound signature data may be received wirelessly or from a storage device located on the vehicle 102.
  • one or more steps and/or components may be implemented in software for use with microprocessors/general purpose computers (not shown).
  • one or more of the functions of the various components and/or steps described above may be implemented in software that controls a computer.
  • the software may be embodied in non-transitory tangible computer readable media (such as, by way of non-limiting example, a magnetic disk, optical disk, flash memory, hard drive, etc.) for execution by the computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)

Abstract

Sound recognition systems for a vehicle and methods for increasing auditory situation awareness in a vehicle are provided. A sound recognition system includes at least one ambient sound microphone (ASM), at least one vehicle cabin receiver (VCR) and a processor. The ASM is disposed on the vehicle and configured to capture ambient sound external to the vehicle. The VCR is configured to deliver audio content to a vehicle cabin of the vehicle. The processor is coupled to the at least one ASM and the at least one VCR. The processor is configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.

Description

AUTOMOTIVE SOUND RECOGNITION SYSTEM FOR ENHANCED SITUATION AWARENESS
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is related to and claims the benefit of U.S. Provisional
Application No. 61/432,016 entitled "AUTOMOTIVE SOUND RECOGNITION SYSTEM FOR ENHANCED SITUATION AWARENESS" filed on January 12, 2011, the contents of which are incorporated herein by reference.
FIELD OF THE INVENTION
[0002] The present invention relates to a device that monitors sound directed to a vehicle cabin, and more particularly, though not exclusively, to an audio system and method that detects ambient warning sounds and adjusts audio delivered to a vehicle cabin based on the detected warning sounds to enhance auditory situation awareness.
BACKGROUND OF THE INVENTION
[0003] People that use audio systems in vehicles generally do so for either music enjoyment or voice communication. The user is generally immersed in the audio experience when using such devices and is acoustically isolated within a sealed vehicle cabin. Background noises in the external vehicle environment (e.g., road, engine, wind and traffic noise) can contend with the acoustic sounds produced from these devices. As the background noise levels change, the user may need to adjust the volume to listen to their music over the background noise. Alternatively, the level of reproduced audio may be automatically increased, for example, by audio systems that increase the audio level as the vehicle velocity increases (i.e., to compensate for the rise in noise level from road, engine and aerodynamic noise). One example of such an automatic gain control system is described in US patent No. 5,081,682.
SUMMARY OF THE INVENTION
[0004] Aspects of the present invention relate to a sound recognition system for a vehicle. A sound recognition system includes at least one ambient sound microphone (ASM), at least one vehicle cabin receiver (VCR) and a processor. The ASM is disposed on the vehicle and configured to capture ambient sound external to the vehicle. The VCR is configured to deliver audio content to a vehicle cabin of the vehicle. The processor is coupled to the at least one ASM and the at least one VCR. The processor is configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.
[0005] Aspects of the present invention also relate to methods for increasing auditory situation awareness in a vehicle. The method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; monitoring the ambient sound for a target sound by detecting a sound signature corresponding to the target sound in the ambient sound; and adjusting a delivery of audio content by at least one vehicle cabin receiver (VCR) to a vehicle cabin of the vehicle based on the target sound.
[0006] Aspects of the present invention further relate to methods for sound signature detection for a vehicle. The method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; and receiving a directive to learn a sound signature within the ambient sound. A voice command or an indication from a user is received and is used to initiate the steps of capturing and learning.
[0007] Aspects of the present invention also relate to methods for personalized listening in a vehicle. The method includes capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; detecting a sound signature within the ambient sound that is associated with a warning sound; and mixing the warning sound with audio content delivered to the vehicle cabin via at least one vehicle cabin receiver (VCR) in accordance with a priority of the warning sound and a personalized hearing level (PHL).
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The invention may be understood from the following detailed description when read in connection with the accompanying drawing. It is emphasized, according to common practice, that various features of the drawings may not be drawn to scale. On the contrary, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. Moreover, in the drawing, common numerical references are used to represent like features. Included in the drawing are the following figures:
[0009] FIG. 1 is a pictorial diagram of a vehicle including an exemplary automotive sound recognition system for enhanced situation awareness in accordance with an embodiment of the present invention;
[0010] FIG. 2 is a block diagram of the system shown in FIG. 1 in accordance with an exemplary embodiment of the present invention;
[0011] FIG. 3 is a flowchart of an exemplary method for ambient sound monitoring and warning detection in accordance with an embodiment of the present invention;
[0012] FIG. 4 illustrates various system modes in accordance with an exemplary embodiment of the present invention;
[0013] FIG. 5 is a flowchart of an exemplary method for sound signature detection in accordance with an embodiment of the present invention;
[0014] FIG. 6 is a flowchart of an exemplary method for managing audio delivery based on detected sound signatures in accordance with an embodiment of the present invention;
[0015] FIG. 7 is a flowchart of an exemplary method for sound signature detection in accordance with an embodiment of the present invention;
[0016] FIG. 8 is a pictorial diagram for mixing ambient sounds and warning sounds with audio content in accordance with an exemplary embodiment of the present invention; and
[0017] FIG. 9 is a flowchart of an exemplary method for updating the sound signature detection library dependent on the vehicle location in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0018] The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
[0019] Processes, techniques, apparatus, and materials as known by one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the enabling description where appropriate, for example, the fabrication and use of transducers. Additionally, in at least one exemplary embodiment, the sampling rate of the transducers can be varied to pick up pulses of sound, for example, pulses shorter than 50 milliseconds.
[0020] In all of the examples illustrated and discussed herein, any specific values, for example the sound pressure level change, should be interpreted to be illustrative only and non-limiting. Thus, other examples of the exemplary embodiments could have different values.
[0021] Note that herein when referring to correcting or preventing an error or damage (e.g., hearing damage), a reduction of the damage or error and/or a correction of the damage or error are intended.
[0022] Automotive vehicle operators are often auditorially removed from their external ambient environment. Ambient sound cues such as from oncoming emergency (and non-emergency) vehicle sound alerts are often not heard by the vehicle operator due to acoustic isolation of the vehicle cabin and internal cabin noise from engine and road noise and, especially, due to loud music and speech reproduction levels in the vehicle cabin.
[0023] Accordingly, background noises in the external vehicle environment contend with the acoustic sounds produced from the vehicle audio system. The vehicle operator, thus, becomes auditorially disassociated with their ambient environment, thereby increasing the danger of accidents from collisions with oncoming vehicles. A need therefore exists for improving the auditory situation awareness of vehicle operators to automatically alert the operator to ambient warning alerts.
[0024] Music and speech audio reproduction levels in vehicles and ambient sound levels are antagonistic. For example, vehicle operators typically play vehicle audio devices louder to hear over the traffic and general urban noise. The same applies to voice communication.
[0025] Automotive vehicle operators are often auditorially removed from their external ambient environment. For example, high sound isolation from the external environment may be provided by cabin structural insulation, close-fitting window seals and thick or double-paned glass. Ambient sound cues (from external acoustic signals), such as oncoming emergency (and non-emergency) vehicle sound alerts; vocal messages from pedestrians; and sounds generated by the operator's own vehicle may often not be heard by the vehicle operator.
[0026] To summarize, the reduced "situation awareness" of the vehicle operator may be a consequence of at least two principal factors. One factor includes acoustic isolation of the vehicle cabin (e.g., from the vehicle windows and structural isolation). A second factor includes sound masking. The sound masking may include masking from internal cabin noise (such as from engine and road noise) and masking from loud music reproduction levels within the vehicle. The masking effect may be further compounded with telephone communications, where the vehicle operator's attention may be further distracted by the conversation. Telephone conversation, thus, may introduce an additional cognitive load that may further reduce the vehicle operator's auditory situation awareness of the vehicle surroundings.
[0027] The reduction of the situation awareness of the vehicle operator may lead to danger. For example, the personal safety of the vehicle operator may be reduced. In addition, the personal safety of other vehicle operators and pedestrians in the vicinity of the vehicle may also be threatened.
[0028] One definition of situation awareness includes "the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future". While some definitions are specific to the environment from which they were adapted, the above definition may be applicable across multiple task domains from visual to auditory modalities.
[0029] A method and system is herein disclosed to address this problem of reduced auditory situation awareness of vehicle operators. In an exemplary embodiment, ambient warning sounds in the vicinity of a vehicle may be automatically detected and may be actively reproduced in the vehicle cabin, to inform the vehicle operator of detected sounds. According to an exemplary embodiment, a library of known warning sounds may be acquired automatically based on the vehicle's location.
[0030] Personal safety of the vehicle operator and passengers in the vehicle may thereby be enhanced by exemplary systems and methods of the present invention, as described herein. Accordingly, the safety of other vehicles (such as oncoming emergency vehicles, other motorists, and pedestrians) may also be increased. The safety benefit comes not only from the enhanced auditory situation awareness, but also via reduced driver workload. For example, the system may reduce the burden on the driver to constantly visually scan the environment for emergency vehicles or other dangers that may also have recognizable acoustical signatures (that may ordinarily be inaudible inside the vehicle cabin).
[0031] One focus of the present invention is to enhance (i.e., increase) the auditory situation awareness of a typical vehicle operator and, thereby, improve the personal safety of the vehicle operator, and other motorists and pedestrians.
[0032] Referring to FIG. 1, a pictorial diagram of vehicle 102 including an exemplary automotive sound recognition system (ASRS) 100 for enhanced auditory situation awareness is shown. ASRS 100 may include user interface 106, central audio processor system 114 (also referred to herein as processor 114), indicator 116 and at least one loudspeaker (for example, right loudspeaker 112 and left loudspeaker 120) (also referred to herein as vehicle cabin receivers (VCRs) 112, 120). ASRS 100 may also include one or more ambient microphones (for example, right microphone 104, front microphone 108, rear microphone 110 and left microphone 122) for capturing ambient sound external to vehicle 102. ASRS 100 may also include at least one vehicle cabin microphone (VCM) 118 for capturing sound within vehicle cabin 126.
[0033] Processor 114 may be coupled to one or more of user interface 106, indicator 116, VCRs 112, 120, VCM 118 and ambient microphones 104, 108, 110, 122. Processor 114 may be configured to control acquisition of ambient sound signals from ambient microphones 104, 108, 110, 122 and (optionally) a cabin sound signal from VCM 118. Processor 114 may be configured to analyze ambient and/or cabin sound signals, and to present information by system 100 to vehicle operator 124 (such as via VCRs 112, 120 and/or indicator 116) responsive to the analysis.
[0034] In operation, processor 114 may be configured to receive AC signal 107 and reproduce AC signal 107 through VCRs 112, 120 into vehicle cabin 126. Processor 114 may also be configured to receive ambient sound signals from respective ambient microphones 104, 108, 110, 122. Processor 114 may also be configured to receive a cabin sound signal from VCM 118.
[0035] Based on an analysis of the ambient sound signals (and, optionally, the cabin sound signal), processor 114 may mix the ambient sound signal from at least one of ambient microphones 104, 108, 110, 122 with AC signal 107. The mixed signal may be output to VCRs 112, 120. Accordingly, acoustic cues in the ambient signal (such as an ambulance siren, a vocal warning from a pedestrian, or a vehicle malfunction sound) may be passed into vehicle cabin 126, thereby providing detectable and spatial localization cues for vehicle operator 124.
[0036] AC signal 107 may include any audio signal provided to (and/or generated by) processor 114 that may be reproduced through VCRs 112, 120. AC signal 107 may correspond to (without being limited to) at least one of the following exemplary signals: a music or voice audio signal from a music audio source (for example, a radio, a portable media player, a computing device); voice audio (for example, from a telephone, a radio device or an occupant of vehicle 102); or an audio warning signal automatically generated by vehicle 102 (for example, in response to a backup proximity sensor, an unbelted passenger restraint, an engine malfunction condition, or other audio alert signals). AC signal 107 may be manually selected by vehicle operator 124 (for example, with user interface 106), or may be automatically generated by vehicle 102 (for example, by processor 114).
[0037] Although in FIG. 1, two loudspeakers 112, 120 are illustrated, ASRS 100 may include more or fewer loudspeakers. For example, ASRS 100 may have more than two loudspeakers for right, left, front and back balance of sound in vehicle cabin 126. As another example, ASRS 100 may include five loudspeakers (and a subwoofer) for 5.1 channel surround sound. It is understood that, in general, ASRS 100 may include one or more loudspeakers.
[0038] User interface 106 may include any suitable user interface capable of providing parameters for one or more of processor 114, indicator 116, VCRs 112, 120, VCM 118 and ambient microphones 104, 108, 110, 122. User interface 106 may include, for example, one or more buttons, a pointing device, a keyboard and/or a display device.
[0039] Processor 114 may also issue alerts to vehicle operator 124, for example, via indicator 116. Indicator 116 may provide alerts via a visual indication, an auditory indication (such as a tonal alert) and/or a haptic indication. Indicator 116 may include any suitable indicator such as (without being limited to): a display (such as a heads-up display), a loudspeaker or a haptic transducer (for example, mounted in the vehicle's steering wheel or operator seat).
[0040] In an exemplary embodiment, processor 114 may also use ambient microphones 104, 108, 110, 122 and/or VCM 118 and VCRs 112, 120 to cancel a background noise component (such as road noise) in vehicle cabin 126. For example, the noise cancellation may be centered at the position of vehicle operator 124.
[0041] Ambient microphones 104, 108, 110, 122 may be positioned on vehicle 102 (for example, on an exterior of vehicle 102 or any other suitable location) such that ambient microphones 104, 108, 110, 122 may transduce sound that is external to vehicle 102. In general, ambient microphones 104, 108, 110, 122 may be configured to detect specific sounds in a vicinity of vehicle 102. Although four ambient microphones 104, 108, 110, 122 are illustrated in the positions (i.e., front, right, left and rear of vehicle 102) shown in FIG. 1, in general, system 100 may include any number of microphones and at least one ambient sound microphone. An ambient sound signal (from one or more of ambient microphones 104, 108, 110, 122) may also be mixed with AC signal 107 before being presented through at least one cabin loudspeaker 112, 120.
[0042] According to an exemplary embodiment, processor 114 may determine a sound pressure level (SPL) of vehicle cabin 126 (referred to herein as the cabin SPL) by analyzing a signal level and signal gain reproduced with at least one of loudspeakers 112, 120, and the sensitivity of respective loudspeakers 112, 120. In another exemplary embodiment, processor 114 may determine the cabin SPL via VCM 118. Use of VCM 118 may allow consideration of other sound sources in vehicle cabin 126 (i.e., other than sound sources contributed by loudspeakers 112, 120), such as an air conditioning system, and sound from other passengers in vehicle 102.
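By way of non-limiting illustration, the cabin SPL estimate described above might be computed as in the following Python sketch. It assumes a hypothetical calibration model in which a full-scale (0 dBFS) signal at 0 dB gain is known to produce a reference SPL at the listening position; the function name and calibration convention are illustrative, not part of the disclosed system.

```python
import numpy as np

def estimate_cabin_spl(frame, gain_db, ref_spl_at_full_scale):
    """Estimate cabin SPL from one audio frame, the current playback
    gain, and the loudspeaker sensitivity expressed as the SPL that a
    full-scale (0 dBFS) signal at 0 dB gain produces in the cabin."""
    rms = np.sqrt(np.mean(np.square(frame, dtype=np.float64)))
    if rms == 0.0:
        return float("-inf")                # silent frame
    level_dbfs = 20.0 * np.log10(rms)       # level relative to full scale
    return ref_spl_at_full_scale + gain_db + level_dbfs
```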
[0043] ASRS 100 may be coupled to a remote location (not shown), for example, by wireless communication. Information collected by ASRS 100 may be provided to the remote location (such as for further analysis).
[0044] Referring to FIG. 2, a block diagram of ASRS 100 in accordance with an exemplary embodiment is shown. As illustrated, the ASRS 100 can include processor 114 operatively coupled to the Ambient Sound Microphone (ASM) 201, one or more VCRs 112, 120, and VCM 118 via one or more Analog to Digital Converters (ADC) 202 and Digital to Analog Converters (DAC) 203. In FIG. 2, ASM 201 may represent one or more of ambient microphones 104, 108, 110, 122 shown in FIG. 1. ASRS 100 can include an audio interface 212 operatively coupled to the processor 114 to receive AC signal 107 (for example, from a media player, a cell phone, voice mail), and deliver the AC signal 107 to the processor 114.
[0045] The processor 114 may include sound signature detection block 214 and may monitor the ambient sound captured by the ASM 201 for warning sounds in the environment, such as an alarm (e.g., bell, emergency vehicle, security system, etc.), siren (e.g., police car, ambulance, etc.), voice (e.g., "help", "stop", "police", etc.), or specific noise type (e.g., breaking glass, gunshot, etc.). The memory 208 can store sound signatures for previously learned warning sounds to which the processor 114 refers for detecting warning sounds. The sound signatures can be resident in the memory 208 or downloaded to processor 114 via the transceiver 204 during operation as needed. Upon detecting a warning sound, the processor 114 can report the warning to the vehicle operator 124 (also referred to herein as user 124) via audio delivered from the VCRs 112, 120 to the vehicle cabin.
[0046] The processor 114, responsive to detecting warning sounds, can adjust the audio content signal 107 and the warning sounds delivered to the vehicle cabin 126. The processor 114 can actively monitor the sound exposure level inside the vehicle cabin 126 and adjust the audio to within a safe and subjectively optimized listening level range. The processor 114 can utilize computing technologies such as a microprocessor, Application Specific Integrated Circuit (ASIC), and/or a digital signal processor (DSP) with associated storage memory 208 such as Flash, ROM, RAM, SRAM, DRAM or other like technologies for controlling operations.
[0047] The ASRS 100 can further include a transceiver 204 that can support singly or in combination any number of wireless access technologies including without limitation Bluetooth™, Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), and/or other short or long range communication protocols. The transceiver 204 can also provide support for dynamic downloading over-the-air to the ASRS 100. It should be noted also that next generation access technologies can also be applied to the present disclosure.
[0048] The power supply 210 can utilize common power management technologies such as replaceable batteries, supply regulation technologies, and charging system technologies for supplying energy to the components of the ASRS 100 and to facilitate portable applications. A motor (not shown) can be a single supply motor driver coupled to the power supply 210 to improve sensory input via haptic vibration (for example, via indicator 116 (FIG. 1) configured as a haptic indicator), e.g. connected to the vehicle steering wheel or vehicle operator chair. As an example, the processor 114 can direct the motor to vibrate or pulse responsive to an action, such as a detection of a warning sound or an incoming voice call.
[0049] FIG. 3 is a flowchart of a method 300 for vehicle ambient sound monitoring and warning detection in accordance with an exemplary embodiment. The method 300 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 300, reference will be made to components of FIG. 2, although it is understood that the method 300 can be implemented in any other manner using other suitable components.
[0050] As shown in step 302, the processor 114 can monitor the environment for warning sounds, such as an alarm, a horn, a voice, or a noise. Each of the warning sounds can have certain identifiable features that characterize the sound. The features can be collectively referred to as a sound signature which can be used for recognizing the warning sound. As an example, the sound signature may include statistical properties or parametric properties of the warning sound. For example, a sound signature can describe prominent frequencies with associated amplitude and phase information. As another example, the sound signature can contain principal components identifying the most likely recognizable features of a warning sound.
[0051] At step 304, the processor 114 may detect the warning sounds within the environment based on the sound signatures. As described below, feature extraction techniques may be applied to the ambient sound captured at the ASM 201 to generate the sound signatures. Pattern recognition approaches may be applied based on known sound signatures to detect the warning sounds from their corresponding sound signatures. More specifically, sound signatures may be compared to learned models to identify a corresponding warning sound.
[0052] At step 306, the processor 114 may adjust sound delivered to the vehicle cabin 126 in view of a detected warning sound. Upon detecting a warning sound in the ambient sound of the user's environment, the processor 114, at step 308, may generate an audible alarm within the vehicle cabin 126 that identifies the detected sound signature.
[0053] The audible alarm can be a reproduction of the warning sound, an amplification of the warning sound (or the entire ambient sound), a text-to-speech message (e.g., synthetic voice) identifying the warning sound, a haptic vibration via indicator 116 (configured as a haptic indicator), or an audio clip. For example, the processor 114 can generate a sound bite (i.e., audio clip) corresponding to the detected warning sound such as an ambulance, fire engine, or other environmental sound. As another example, the processor 114 can synthesize a voice to describe the detected warning sound (e.g., "ambulance approaching"). At step 310, processor 114 may send a message to a mobile device identifying the detected sound signature (e.g., "alarm sounding").
[0054] FIG. 4 illustrates system modes of ASRS 100 in accordance with an exemplary embodiment. The system mode may be manually selected by user 124, for example, by pressing a button; or automatically selected, for example, when the processor 114 detects it is in an active listen state or in a media state. As shown in FIG. 4, the system mode can correspond to Signature Sound Pass Through Mode (SSPTM), Signature Sound Boost Mode (SSBM), Signature Sound Rejection Mode (SSRJM), Signature Sound Attenuation Mode (SSAM), and Signature Sound Replacement Mode (SSRM).
[0055] In SSPTM mode, ambient sound captured at the ASM 201 is passed transparently to the VCRs 112, 120 for reproduction within the vehicle cabin 126. In this mode, the sound produced in the vehicle cabin 126 sufficiently matches the ambient sound outside the vehicle cabin 126, thereby providing a "transparency" effect. That is, the loudspeakers 112, 120 in the vehicle cabin 126 recreate the sound captured at the ASM 201. The processor 114, by way of sound measured at the VCM 118, may adjust the properties of sound delivered to the vehicle cabin 126 so the sound within the occluded vehicle cabin 126 is the same as the ambient sound outside the vehicle 102.
[0056] In SSBM mode, warning sounds and/or ambient sounds are amplified upon the processor 114 detecting a warning sound. The warning sound can be amplified relative to the normal level received, or amplified above an audio content level if audio content is being delivered to the vehicle cabin 126.
[0057] In SSRJM mode, sounds other than warning sounds may be rejected upon the processor 114 detecting a specific sound signature. The specific sound can be minimized relative to the normal level received. In SSAM mode, sounds other than warning sounds can be attenuated. For example, in both SSRJM and SSAM modes, annoying sounds or noises not associated with warning sounds can be suppressed. For instance, by way of a learning session, the user 124 can establish which sounds are considered warning sounds (e.g., "ambulance") and which sounds are considered non-warning sounds (e.g., "jackhammer"). The processor 114, upon detecting non-warning sounds, can thus attenuate or reject these sounds within the vehicle cabin 126.
[0058] In SSRM mode, warning sounds detected in the environment can be replaced with audible warning messages. For example, the processor 114, upon detecting a warning sound, can generate synthetic speech identifying the warning sound (e.g., "ambulance detected"). In such regard, the processor 114 may audibly report the warning sound identified, thereby relieving the user 124 from having to interpret the warning sound. The synthetic speech can be mixed with the ambient sound (e.g., amplified, attenuated, cropped, etc.), or played alone with the ambient sound muted.
[0059] FIG. 5 is a flowchart of a method 500 for sound signature detection in accordance with an exemplary embodiment. The method 500 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 500, reference will be made to components of FIG. 2, although it is understood that the method 500 can be implemented in any other manner using other suitable components.
[0060] The method can start at step 502, in which the processor 114 can enter a learn mode. Notably, the processor 114, upon completion of a learning mode or a previous learning configuration, can start instead at step 520. In the learning mode of step 502, the processor 114 can actively generate and learn sound signatures from ambient sounds within the environment. In learning mode, the processor 114 can also receive previously trained learning models to use for detecting warning sounds in the environment.
[0061] In an active learning mode, the user 124 can press a button or otherwise (e.g., by voice recognition) initiate a recording of ambient sounds in the environment. For example, the user can, upon hearing a new warning sound in the environment ("car horn"), activate the processor 114 to learn the new warning sound. Upon generating a sound signature for the new warning sound, it can be stored in the user defined database 504. In another arrangement, the processor 114, upon detecting a unique sound characteristic of a warning sound, can ask the user 124 if they desire to have the sound signature for the unique sound learned. In such regard, the processor 114 may actively sense sounds and may query the user 124 about the environment to learn the sounds. Moreover, the processor 114 can organize learned sounds based on environmental context, for example, in city and country environments.
[0062] In an exemplary embodiment, ASRS 100 (FIG. 2) may provide for delayed recording, to allow a previously encountered sound to be learned. For example, ASRS 100 may include a buffer to store ambient sounds recorded for a period of time. User 124 may review the recorded ambient sounds and select one or more of these recorded ambient sounds for learning (such as via user interface 106 (FIG. 1)).
[0063] In another learning mode, trained models can be retrieved from an on-line database 506 for use in detecting warning sounds. The previously learned models can be transmitted on a scheduled basis to the processor 114, or as needed, depending on the environmental context. For example, the processor 114, upon detecting traffic noise, may retrieve sound signature models associated with warning sounds (e.g., ambulance, police car) in traffic. In another embodiment, upon the processor 114 detecting conversational noise (e.g., people talking), sound signature models for verbal warnings ("help", "police") may be retrieved. Groups of sound signature models may be retrieved based on the environmental context or on user directed action.
[0064] As shown in step 508, the ASRS processor 114 can also generate speech recognition models for warning sounds corresponding to voice, such as "help", "police", "fire", etc. The speech recognition models may be retrieved from the on-line database 506 or the user defined database 504. In the latter, for example, the user 124 may say a word or enter a text version of a word to associate the word with a verbal warning sound. For instance, the user 124 may define a set of words of interest along with mappings to their meanings, and then use keyword spotting to detect their occurrences. If the user 124 enters an environment wherein another individual says the same word (e.g., "help"), the processor 114 may inform the user 124 of the verbal warning sound.
[0065] For other acoustic sounds, the processor 114 may generate sound signature models as shown in step 510. Notably, the processor 114 itself may generate the sound signature models, or transmit the captured warning sounds to external systems (e.g., a remote server) that generate the sound signature models. Such learning may be conducted off-line in a training phase, and the processor 114 can be uploaded with the new learning models.
[0066] It should also be noted that the learning models can be updated during use of the ASRS 100, for example, when the processor 114 detects warning sounds. The detected warning sounds can be used to adapt the learning models as new warning sound variants are encountered. For example, the processor 114, upon detecting a warning sound, can use the sound signature of the warning sound to update the learned models in accordance with the training phase. In such an embodiment a first learned model is adapted based on new training data collected in the environment by the processor 114. In such regard, for example, a new set of "horn" warning sounds could be included in real-time training without discarding the other "horn" sounds already captured in the existing model.
[0067] Upon completion of learning, uploading, or retrieval of sound signature models, the processor 114 can monitor and report warning sounds within the environment. As shown in step 520, ambient sounds (e.g., an input signal) within the environment are captured by the ASM 201. The ambient sounds can be digitized by way of the ADC 202 and stored temporarily to a data buffer in memory 208 as shown in step 522. The data buffer may be capable of holding enough data to allow for generation of a sound signature as will be described ahead in FIG. 7.
[0068] In another configuration, the processor 114 can implement a "look ahead" analysis system by way of the data buffer for reproduction of pre-recorded audio content, using a data buffer to offset the reproduction of the audio signal. The look-ahead system allows the processor 114 to analyze potentially harmful audio artifacts (e.g., high level onsets, bursts, etc.), either received from an external media device or detected with the ASM 201, in situ before the audio is reproduced. The processor 114 can thus mitigate the audio artifacts in advance to reduce timbral distortion effects caused by, for instance, attenuating high level transients.
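The look-ahead behavior can be illustrated with a short delay-line sketch, in which each frame is inspected (and, if needed, attenuated) before the delayed output reaches the VCRs. The threshold and duck-gain values below are arbitrary placeholders, not values taken from the disclosure.

```python
import collections
import numpy as np

class LookAheadBuffer:
    """Delay-line sketch: audio leaves the buffer a few frames late,
    so harmful onsets can be mitigated before they are reproduced."""

    def __init__(self, lookahead_frames=3, burst_threshold=0.9, duck_gain=0.25):
        self.lookahead = lookahead_frames
        self.queue = collections.deque()
        self.burst_threshold = burst_threshold
        self.duck_gain = duck_gain

    def process(self, frame):
        # Inspect the incoming frame and mitigate artifacts in advance.
        if np.max(np.abs(frame)) > self.burst_threshold:
            frame = frame * self.duck_gain
        self.queue.append(frame)
        if len(self.queue) <= self.lookahead:
            return np.zeros_like(frame)     # output silence while filling
        return self.queue.popleft()         # delayed, already-vetted frame
```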
[0069] At step 524, signal conditioning techniques may be applied to the ambient sound, for example, to suppress noise or gate the noise to a predetermined threshold. Other signal processing steps such as threshold detection shown in step 526 may be used to determine whether ambient sounds should be evaluated for warning sounds. For instance, to conserve computational processing resources (e.g., battery, processor), only ambient sounds that exceed a predetermined power level may be evaluated for warning sounds. Other metrics such as signal spectrum, duration, and stationarity may be considered in determining whether the ambient sound is analyzed for warning sounds. Notably, other metrics (e.g., context aware) may also be used to determine when the ambient sound should be processed for warning sound detection.
[0070] If at least one property (e.g., power, spectral shape, duration, etc.) of the ambient sound exceeds a threshold (or adaptive threshold), the processor 114 at step 530 can proceed to generate a sound signature for the ambient sound. In one embodiment the sound signature is a feature vector which can include statistical parameters or salient features of the ambient sound.
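As a minimal sketch of the gating in steps 524-526, the following function evaluates only frames whose power exceeds a level threshold; the threshold value is illustrative, and a deployed system might also test spectrum, duration, and stationarity as noted above.

```python
import numpy as np

def exceeds_gate(frame, power_threshold_db=-40.0):
    """Return True when frame power exceeds the gate, so that the more
    expensive sound signature extraction only runs on candidate frames."""
    power = np.mean(np.square(frame, dtype=np.float64))
    power_db = 10.0 * np.log10(power + 1e-12)   # epsilon avoids log(0)
    return power_db > power_threshold_db
```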
[0071] An ambient sound with a warning sound (e.g., "bell", "siren"), such as shown in step 532, is generally expected to exhibit features similar to sound signatures for similar warning sounds (e.g., "bell", "siren") stored in the user defined database 504 or the on-line database 506. The processor 114 can also identify a direction and speed of the sound source if it is moving, for example, by evaluating Doppler shift as shown in steps 534 and 536. The processor 114, by way of beam-forming among multiple ASMs 201, may also estimate a direction of a sound source generating the warning sound.
[0072] The speed and bearing of the sound source can also be estimated using pitch analysis to detect changes predicted by the Doppler effect, or alternatively by an analysis of changes in relative phase and magnitude between the two ASM signals. The processor 114, by way of a sound recognition engine, may detect general warning signals such as car horns or emergency sirens (and other signals referenced by ISO 7731) using spectral and temporal analysis.
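As one concrete reading of the Doppler analysis, a source speed can be recovered from the pitch of a siren heard while approaching and again while receding, assuming (for simplicity of the sketch) a stationary observer; a moving vehicle would require the relative velocity instead.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def source_speed_from_doppler(f_approach, f_recede, c=SPEED_OF_SOUND):
    """For a stationary observer, f_approach = f*c/(c - v) and
    f_recede = f*c/(c + v), so
    v = c * (f_approach - f_recede) / (f_approach + f_recede)."""
    return c * (f_approach - f_recede) / (f_approach + f_recede)

# Example: a siren tracked at 470 Hz approaching and 430 Hz receding
# implies roughly 343 * 40 / 900 = 15.2 m/s (about 55 km/h).
```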
[0073] The processor 114 can also analyze the ambient sound to determine if a verbal warning (e.g., "help", "police", "excuse me") is present. As shown in step 540, the sound signature of the ambient sound can be analyzed for speech content. For example, the sound signature may be analyzed for voice information, such as vocal cord pitch periodicities, time-varying voice formant envelopes, or other articulation parameter attributes. Upon detecting the presence of voice in the ambient sound, the processor 114 can perform key word detection (e.g., "help") in the spoken content as shown in step 542. Speech recognition models as well as language models may be used to identify key words in the spoken content. As previously noted, the user 124 may say or enter in one or more warning sounds that may be mapped to associated learning models for sound signature detection.
[0074] As shown in step 552, the user 124 may also provide user input to direct operation of the processor 114, for example, to select an operational mode as shown in step 550. As one example, the operational mode can enable, disable or adjust monitoring of warning sounds. For example, in a listening mode, the processor 114 may mix audio content with ambient sound while monitoring for warning sounds. In a quiet mode, the processor 114 may suppress or attempt to actively cancel all noises except detected warning sounds.
[0075] The user input may be in the form of a physical interaction (e.g., button press) or a vocalization (e.g., spoken command). The operating mode can also be controlled by a prioritizing module as shown in step 554. The prioritizing module may prioritize warning sounds based on severity and context. For example, if the user 124 is in a phone call, and a warning sound is detected, the processor 114 may audibly inform the user 124 of the warning and/or present a text message of the warning sound. If the user 124 is listening to music or a voice communication, and a warning sound is detected, the processor 114 may automatically shut off the music or voice audio and alert the user. The user 124, by way of user interface 106 (FIG. 1) or an administrator, may rank warning sounds and instruct the processor 114 how to respond to warnings in various contexts.
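A prioritizing module of the kind described in step 554 could be sketched as a simple lookup, as below; the warning classes, rankings, and response names are hypothetical user-configurable values, not part of the disclosure.

```python
# Illustrative priority table; 0 marks a learned non-warning sound.
WARNING_PRIORITY = {"siren": 3, "verbal": 3, "horn": 2, "jackhammer": 0}

def respond(warning, context):
    """Choose a notification style from warning priority and context."""
    priority = WARNING_PRIORITY.get(warning, 1)
    if priority == 0:
        return "suppress"                  # attenuate or reject (SSAM/SSRJM)
    if context == "phone_call":
        return "audible_alert_and_text"    # notify without ending the call
    if priority >= 3:
        return "mute_audio_and_alert"      # highest severity overrides music
    return "duck_audio_and_mix"            # lower the music, mix warning in
```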
[0076] FIG. 6 is a flowchart of a method 600 for managing audio delivery based on detected sound signatures in accordance with an exemplary embodiment. The method 600 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown. To describe the method 600, reference will be made to components of FIG. 2, although it is understood that the method 600 can be implemented in any other manner using other suitable components.
[0077] As noted previously, the audio interface 212 can supply audio content (e.g., music, cell phone, voice mail, etc.) to the processor 114. In such regard, the user 124 may listen to music, talk on the phone, receive voice mail, or perform other audio related tasks while the processor 114 additionally monitors warning sounds in the environment. During normal use, when a warning sound is not present, the processor 114 may operate normally to recreate the sound experience requested by the user 124. If however the processor 114 detects a warning sound, the processor 114 may manage audio content delivery to notify the user 124 of the warning sound. Managing audio content delivery can include adjusting or overriding other current audio settings.
[0078] By way of example, as shown in step 602, the audio interface 212 receives audio content from a media player, such as a portable music player, or cell phone. The audio content can be delivered to the user's vehicle cabin by way of the VCRs 112, 120 as shown in step 604.
[0079] At step 606, the processor 114 monitors ambient sound in the environment captured at the ASM 201. Ambient sound may be sampled at sufficient data rates (e.g., 8 kHz, 16 kHz or 32 kHz) to allow for feature extraction of sound signatures.
Moreover, the processor 114 may adjust the sampling rate based on the information content of the ambient signal. For example, upon the ambient sound exceeding a first threshold, the sampling rate may be set to a first rate (e.g., 4 kHz). As the ambient sound increases in volume, or as prominent features are identified, the sampling rate may be increased to a second rate (e.g., 8 kHz) to increase signal resolution. Although the higher sampling rate may improve resolution of features, the lower sampling rate may conserve computational resources (e.g., power supply 210, processor 114) while providing minimally sufficient feature resolution.
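The two-stage rate policy above might be sketched as follows; the decibel breakpoints and the third, higher rate are illustrative assumptions only.

```python
def select_sampling_rate(ambient_db_spl):
    """Pick an analysis rate from the ambient level: a low rate
    conserves resources, higher rates buy feature resolution."""
    if ambient_db_spl < 70.0:
        return 4000      # quiet: coarse analysis suffices
    if ambient_db_spl < 86.0:
        return 8000      # louder or feature-rich: finer resolution
    return 16000         # very loud: full analysis bandwidth
```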
[0080] If, at step 608, a sound signature is detected, the processor 114 may determine a priority of the detected sound signature (at step 610). The priority establishes how the processor 114 manages audio content. Notably, warning sounds for various environmental conditions and user experiences can be learned. Accordingly, the user 124, or an administrator, can establish priorities for warning sounds. Moreover, these priorities may be based on environmental context. For example, if a user 124 is in a warehouse where loading vehicles emit a beeping sound, sound signatures for such vehicles can be given the highest priority. A user 124 may also prioritize learned warning sounds, for example, via a user interface on a paired device (e.g., cell phone), or via speech recognition (e.g., "prioritize - 'ambulance' - high").
[0081] Upon detecting a warning sound and identifying a priority, the processor 114, at step 612, selectively manages at least a portion of the audio content based on the priority. For example, if the user 124 is listening to music during the time a warning sound is detected, the processor 114 may decrease the music volume to present an audible notification. This may represent one indication that the processor 114 has detected a warning sound.
[0082] At step 614, the processor 114 may further present an audible notification to the user 124. For example, upon detecting a "horn" sound, a text-to-speech message can be presented to the user 124 to audibly inform them that a horn sound has been detected (e.g., "horn detected"). Information related to the warning sound (e.g., direction, speed, priority, etc.) may also be presented with the audible notification.
[0083] In a further arrangement, the processor 114 may send a message to a device operated by the user 124 to visually display the notification, as shown in step 616. For example, if the user has disengaged audible notification, the processor 114 may transmit a text message containing the warning to a paired device (e.g., a cell phone), or to indicator 116 configured as a visual indicator. Moreover, the processor 114 may beacon out an audible alarm to other devices within a vicinity, for example via Wi-Fi (e.g., IEEE 802.11x). Other devices in a proximity of the user 124 may sign up to receive audible alarms from the processor 114. In such regard, the processor 114 can beacon a warning notification to other devices in the area to share warning information with other people.
[0084] FIG. 7 is a flowchart of a method 700 further describing sound signature detection in accordance with an exemplary embodiment. The method 700 can be practiced with more or fewer than the number of steps shown and is not limited to the order shown . The method 700 can begin in a state in which the processor 114 is actively monitoring warning sounds in the environment.
[0085] At step 711, ambient sound captured from the ASM 201 may be buffered into short term memory as frames. As an example, the ambient sound may be sampled at 8 kHz with 10-20 ms frame sizes (80 to 160 samples). The frame size may also vary depending on the energy level of the ambient sound. For example, the processor 114, upon detecting low level sounds (e.g., about 70-74 dB SPL), may use a frame size of about 30 ms, and update the frame size to about 10 ms as the power level increases (e.g., greater than about 86 dB SPL). The processor 114 may also increase the sampling rate in accordance with the power level and/or a duration of the ambient sound. (A longer frame size with lower sampling may compromise resolution for computational resources.) The data buffer is desirably of sufficient length to hold a history of frames (e.g., about 10-15 frames) for short-term historical analysis.
[0086] At step 712, the processor 114 may perform feature extraction on the frame as the ambient sound is buffered into the data buffer. As one example, feature extraction may include performing a filter-bank analysis and summing frequencies in auditory bandwidths. Features may also include Fast Fourier Transform (FFT) coefficients, Discrete Cosine Transform (DCT) coefficients, cepstral coefficients, partial autocorrelation (PARCOR) coefficients, wavelet coefficients, statistical values (e.g., energy, mean, skew, variance), parametric features, or any other suitable data compression feature set.
[0087] Additionally, dynamic features, such as derivatives of any order, may be added to the static feature set. As one example, mel-frequency-cepstral analysis may be performed on the frame to generate between about 10 and 16 mel-frequency-cepstral coefficients. This small number of coefficients represents features that may be compactly stored to memory for that particular frame. Such front end feature extraction techniques may reduce the amount of data used to represent the data frame.
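A front end of this kind can be sketched with an off-the-shelf audio library; the sketch below uses librosa (an assumption, as the disclosure does not name a library) to frame the signal at 20 ms and stack 13 mel-cepstral coefficients with first-order deltas as the dynamic features.

```python
import librosa
import numpy as np

def extract_features(signal, sr=8000, frame_ms=20, n_mfcc=13):
    """Frame the ambient signal and reduce each frame to a compact
    feature vector: static MFCCs plus first-order delta features."""
    hop = int(sr * frame_ms / 1000)              # 20 ms at 8 kHz -> 160 samples
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=2 * hop, hop_length=hop)
    delta = librosa.feature.delta(mfcc)          # dynamic (derivative) features
    return np.vstack([mfcc, delta]).T            # one feature vector per frame
```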
[0088] At step 713, the features may be incorporated as a sound signature and compared to learned models, for example, those retrieved from the warning sounds database 718 (e.g., user defined database 504 or the on-line database 506 of FIG. 5). A sound signature may be defined as a sound in the user's ambient environment which has significant perceptual saliency. As an example, a sound signature may correspond to an alarm, an ambulance, a siren, a horn, a police car, a bus, a bell, a gunshot, a window breaking, or any other warning sound, including voice. The sound signature may include features characteristic to the sound. As an example, the sound signature may be classified by statistical features of the sound (e.g., envelope, harmonics, spectral peaks, modulation, etc.).
[0089] Notably, each learned model used to identify a sound signature has a set of features specific to a warning sound. For example, a feature vector of a learned model for an "alarm" is sufficiently different from a feature vector of a learned model for a "bell sound". Moreover, the learned model may describe interconnectivity (e.g., state transitions, emission probabilities, initial probabilities, synaptic connections, hidden layers) among the feature vectors (e.g., frames). For example, the features of a "bell" sound may change in a specific manner compared to the features of an "alarm" sound. The learned model may be a statistical model such as a Gaussian mixture model (GMM), a Hidden Markov Model (HMM), a Bayes Classifier, or a Neural Network (NN) that requires training.
[0090] In the following, a Gaussian Mixture Model (GMM) is presented, although it should be noted that any of the above models may be used for sound signature detection. In this case, each warning sound may have an associated GMM used for detecting the warning sound. As an example, the warning sound for an "alarm" may have its own GMM, and a warning sound for a "bell" may have its own GMM. Separate GMMs may also be used as a basis for the absence of the sounds ("anti-models"), such as "not alarm" or "not bell." Each GMM may provide a model for the distribution of the feature statistics for each warning sound in a multi-dimensional space.
[0091] Upon presentation of a new feature vector, the likelihood of the presence of each warning sound may then be calculated. In order to detect a warning sound, each warning sound's GMM may be evaluated relative to its anti-model, and a score related to the likelihood of that warning sound may be computed. A threshold may be applied directly to this score to decide whether the warning sound is present or absent.
Similarly, the sequence of scores may be relayed to yet another module which uses a more complex rule to decide presence or absence. Examples of such rules include linear smoothing or median filtering.
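This detection scheme admits a compact sketch with scikit-learn, where one GMM is fitted per warning sound and a second GMM serves as its anti-model; the per-frame log-likelihood ratio is thresholded after median smoothing, as suggested above. The component count, threshold, and smoothing width are illustrative assumptions.

```python
import numpy as np
from scipy.signal import medfilt
from sklearn.mixture import GaussianMixture

def train_detector(warning_feats, background_feats, n_components=8):
    """Fit a GMM on feature vectors of the warning sound and an
    'anti-model' GMM on everything else (its absence)."""
    model = GaussianMixture(n_components=n_components).fit(warning_feats)
    anti = GaussianMixture(n_components=n_components).fit(background_feats)
    return model, anti

def detect(model, anti, feats, threshold=0.0, smooth=5):
    """Median-smoothed per-frame log-likelihood ratio test."""
    scores = model.score_samples(feats) - anti.score_samples(feats)
    return medfilt(scores, kernel_size=smooth) > threshold
```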
[0092] As previously noted, an HMM model or NN model with their associated connection logic may be used in place of each GMM for each learning model. For example, each warning sound in the database 718 may have a corresponding HMM. A sound signature for a warning sound captured at the ASM 201 in ambient sound may be processed through a lattice network (e.g., a Viterbi network) for comparison to each HMM, to determine which HMM, if any, corresponds to the warning sound. Alternatively, in a trained NN, the sound signature may be input to the NN, where the output states of the NN correspond to warning sound indices. The NN may include various topologies such as a Feed-Forward, Radial Basis Function, Hopfield, Time-Delay Recurrent, or other optimized topologies for real-time sound signature detection.
[0093] At step 714, a distortion metric is applied to each learned model to determine which learned models are closest to the captured feature vector (e.g., sound signature). The learned model with the smallest distortion (e.g., mathematical distance) is generally considered the correct match, or recognition result. It should also be noted that the distortion may be calculated as part of the model comparison in step 713. This is because the distortion metric may depend on the type of model used (e.g., HMM, NN, GMM, etc.) and in fact may be internal to the model (e.g., Viterbi decoding, back-propagation error update, etc.). The distortion module is merely presented in FIG. 7 as a separate component to suggest use with other types of pattern recognition methods or learning models.
[0094] Upon evaluating the feature vector (e.g., sound signature) against the candidate warning sound learned models, the ambient sound at step 715 may be classified as a warning sound. Each of the learned models may be associated with a score. For example, upon the presentation of a sound signature, each GMM may produce a score. The scores may be evaluated against a threshold, and the GMM with the highest score may be identified as the detected warning sound. For example, if the learned model for the "alarm" sound produces the highest score (e.g., smallest distortion result) compared to other learned models, the ambient sound may be classified as an "alarm" warning sound.
[0095] The classification step 715 also takes into account likelihoods (e.g., recognition probabilities). For example, as part of the step of comparing the sound signature of the unknown ambient sound against all the GMMs for the learned models, each GMM may produce a likelihood result, or output. As an example, these likelihood results may be evaluated against each other or in a logical context, to determine the GMM considered "most likely" to match the sound signature of the warning sound. The processor 114 may then select the GMM with the highest likelihood or score via soft decisions.
[0096] The processor 114 may continually monitor the environment for warning sounds, or monitor the environment on a scheduled basis. In one arrangement, the processor 114 may increase monitoring in the presence of high ambient noise, possibly signifying environmental danger or activity. Upon classifying an ambient sound as a warning sound, the processor 114, at step 716, may generate an alarm. As previously noted, the processor 114 may mix the warning sound with audio content, amplify the warning sound, reproduce the warning sound, and/or deliver an audible message. As one example, spectral bands of the audio content that mask the warning sound may be suppressed to increase the audibility of the warning sound. This serves to notify the user 124 of a warning sound detected in the environment, of which the user 124 may not be aware, depending on their environmental context.
[0097] As an example, the processor 114 may present an amplified audible notification to the user via the VCRs 112, 120. The audible notification may be a synthetic voice identifying the warning sound (e.g., "car alarm"), a location or direction of the sound source generating the warning sound (e.g., "to your left"), a duration of the warning sound (e.g., "3 minutes") from initial capture, and any other information (e.g., proximity, severity level, etc.) related to the warning sound. Moreover, the processor 114 may selectively mix the warning sound with the audio content based on a predetermined threshold level. For example, the user 124 may prioritize warning sound types for receiving various levels of notification, and/or identify the sound types as desirable or undesirable.
[0098] FIG. 8 presents a pictorial diagram 800 for mixing ambient sounds and warning sounds with audio content. In the illustration shown, the processor 114 is directing music 136 to the vehicle cabin 126 via VCR 120 while simultaneously monitoring warning sounds in the environment. At time T, the processor 114, upon detecting a warning sound (signature 135), can lower the music volume from the media player 150 (graph 141), and increase the volume of the ambient sound received at the ASM 201 (graph 142). Other mixing arrangements are herein contemplated.
[0099] In such regard, there is a smooth audio transition between the music 136 and the warning sound 135. Notably, the ramp up and down times can also be adjusted based on the priority of the warning sound. For example, in an extreme case, the processor 114 may immediately shut off the music and present the audible warning. Various other implementations for mixing audio and managing audio content delivery are herein contemplated.
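The crossfade of graphs 141 and 142 can be sketched as two complementary gain ramps; the ramp duration and the residual music level are placeholders, and a real implementation would shorten the ramp for high-priority warnings as noted above.

```python
import numpy as np

def duck_and_mix(music, warning, sr, ramp_s=0.5, music_floor=0.2):
    """Ramp the music down to a floor while ramping the captured
    warning sound up, then sum the two signals."""
    n = min(len(music), len(warning))
    ramp = np.minimum(1.0, np.arange(n) / (ramp_s * sr))
    music_gain = 1.0 - (1.0 - music_floor) * ramp   # 1.0 -> music_floor
    warning_gain = ramp                             # 0.0 -> 1.0
    return music[:n] * music_gain + warning[:n] * warning_gain
```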
[00100] Moreover, the audio content may be managed with other media devices (e.g., a cell phone). For example, upon detecting a warning sound, the processor 114 may inform both the user 124 and the called party of the warning sound. In such regard, the user 124 does not need to inform the called party, since the called party also receives the notification, which may save the time otherwise needed to explain an emergency situation.
[00101] As one example, the processor 114 may spectrally enhance the audio content in view of the ambient sound. Moreover, a timbral balance of the audio content may be maintained by taking into account level dependent equal loudness curves and other psychoacoustic criteria (e.g., masking) associated with a personalized hearing level (PHL) 430. For example, auditory cues in received audio content may be enhanced based on the PHL 430 and a spectrum of the ambient sound captured at the ASM 201. Frequency peaks within the audio content may be elevated relative to ambient noise frequency levels and in accordance with the PHL 430 to permit sufficient audibility of the ambient sound. The PHL 430 reveals frequency dynamic ranges that may be used to limit the compression range of the peak elevation in view of the ambient noise spectrum.
[00102] In one arrangement, the processor 114 may compensate for a masking of the ambient sound by the audio content. Notably, the audio content, if sufficiently loud, may mask auditory cues in the ambient sound, which can: i) potentially cause hearing damage, and ii) prevent the user 124 from hearing warning sounds in the environment (e.g., an approaching ambulance, an alarm, etc.). Accordingly, the processor 114 may accentuate and attenuate frequencies of the audio content and ambient sound to permit maximal sound reproduction while simultaneously permitting audibility of ambient sounds.
[00103] In one exemplary embodiment, the processor 114 may narrow noise frequency bands within the ambient sound to permit sensitivity to audio content between the frequency bands. The processor 114 may also determine if the ambient sound contains salient information (e.g., warning sounds) that should be un-masked with respect to the audio content. If the ambient sound is not relevant, the processor 114 may mask the ambient sound (e.g., increase levels) with the audio content until warning sounds are detected.
[00104] FIG. 9 is a flowchart of a method 900 for updating the sound signature detection library dependent on the vehicle location. At step 902, a current vehicle location may be acquired. The location of vehicle 102 may be determined with a number of methods, including: a Global Positioning System (GPS) for determining a GPS location and cell-phone signal cell codes for triangulating the location (for example, based on a signal strength of the received signal(s) from cell phone networks).
[00105] The acquired vehicle location may be used, at step 904, to determine which "sound library cell" the vehicle 102 is located in. The sound library cell may refer to a geographic region, such as a country, state, province or city. The sound library cell may contain the target sound signatures which the vehicle's sound recognition system detects (for example, for a particular sound signature of a particular fire engine siren, there may be associated GMM statistics, as described above). The cell library may be determined from the acquired location (for example, using a look-up table), or may be acquired directly from coded signals embedded in the received location signal (for example, a cellphone signal or a radio signal).
[00106] If, at step 906, the vehicle library cell location is determined to have changed, then the current sound signature library of the vehicle is updated (at step 910). For example, the current vehicle sound signature library may be updated from data provided by an on-line sound signature library 908, where the new sound signature data may be received wirelessly or from a storage device located on the vehicle 102.
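The flow of FIG. 9 can be sketched as follows; the region table, the reverse-geocoding stub, and the fetch callback are all hypothetical placeholders standing in for the GPS/cell-code lookup and on-line library 908 described above.

```python
# Hypothetical mapping from region identifiers to sound library cells.
REGION_CELLS = {"US-VA": "cell_us_east", "FR": "cell_fr_national"}

def lookup_region(lat, lon):
    """Placeholder reverse-geocode; a real system would map GPS
    coordinates or cell-network codes to a region identifier."""
    return "US-VA"

def update_signature_library(lat, lon, current_cell, fetch_library):
    """Steps 902-910: locate the vehicle, map to a library cell, and
    refresh the on-vehicle signatures only when the cell changes."""
    cell = REGION_CELLS.get(lookup_region(lat, lon), "cell_default")
    if cell != current_cell:
        return cell, fetch_library(cell)   # e.g., new GMM statistics
    return current_cell, None
```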
[00107] Although the invention has been described in terms of automotive sound recognition systems and methods for enhancing situation awareness in a vehicle, it is contemplated that one or more steps and/or components may be implemented in software for use with microprocessors/general purpose computers (not shown). In this embodiment, one or more of the functions of the various components and/or steps described above may be implemented in software that controls a computer. The software may be embodied in non-transitory tangible computer readable media (such as, by way of non-limiting example, a magnetic disk, optical disk, flash memory, hard drive, etc.) for execution by the computer.
[00108] Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the invention.

Claims

What is Claimed :
1. A sound recognition system for a vehicle comprising : at least one ambient sound microphone (ASM), disposed on the vehicle, config ured to capture ambient sound external to the vehicle;
at least one vehicle cabin receiver (VCR) configured to deliver audio content to a vehicle cabin of the vehicle; and
a processor, coupled to the at least one ASM and the at least one VCR, the processor configured to detect at least one sound signature in the ambient sound and to adjust the audio content delivered to the vehicle cabin based on the detected at least one sound signature.
2. The system according to claim 1, wherein the sound signature includes at least one of a non-verbal warning sound or a verbal warning sound.
3. The system according to claim 2, wherein the non-verbal warning sound includes at least one of an alarm, a horn, a siren or a noise.
4. The system according to claim 2, wherein the verbal warning sound includes one or more spoken words associated with a verbal warning.
5. The system according to claim 1, wherein the processor is configured to selectively adjust a volume of the audio content delivered to the vehicle cabin when the at least one sound signature is detected.
6. The system according to claim 1, wherein the processor is configured to reproduce the ambient sound associated with the at least one sound signature within the vehicle cabin.
7. The system according to claim 1, wherein the processor is configured to selectively mix the audio content with the ambient sound when the at least one sound signature is detected.
8. The system according to claim 1, further including a memory configured to store a target sound captured by the at least one ASM for learning the corresponding sound signature.
9. The system according to claim 8, wherein the memory is configured to store the target sound responsive to an indication by a user of the system.
10. The system according to claim 8, wherein the memory is configured to store the target sound automatically by the system.
11. The system according to claim 1, further including an audio interface coupled to the processor to receive the audio content from at least one of a media player or a mobile phone.
12. The system according to claim 1, wherein the system is configured to transmit one or more of the ambient sound and the at least one sound signature to a remote location.
13. A method for increasing auditory situation awareness in a vehicle, the method comprising the steps of:
capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle;
monitoring the ambient sound for a target sound by detecting a sound signature corresponding to the target sound in the ambient sound; and
adjusting a delivery of audio content by at least one vehicle cabin receiver (VCR) to a vehicle cabin of the vehicle based on the target sound.
14. The method according to claim 13, wherein the adjusting of the delivery of the audio content includes mixing the target sound with the audio content for delivery to the vehicle cabin.
15. The method according to claim 14, wherein the target sound is mixed with the audio content in accordance with a priority of the target sound.
16. The method according to claim 13, wherein the adjusting of the delivery of the audio content includes at least one of:
passing the target sound to the at least one VCR,
amplifying the target sound for delivery to the vehicle cabin,
attenuating the target sound for delivery to the vehicle cabin,
generating an audible message based on the target sound for delivery to the vehicle cabin, or
replacing the target sound with a predetermined sound corresponding to the target sound for delivery to the vehicle cabin.
17. The method according to claim 13, the method further including:
detecting at least one of a direction of a sound source or a speed of the sound source generating the target sound from the sound signature; and
indicating the at least one of the direction or the speed of the sound source in the vehicle cabin.
18. The method according to claim 13, the method further including transmitting a warning notification to other devices.
19. The method of claim 13, wherein the target sound includes at least one of an alarm, a horn, a siren, a spoken utterance or a noise.
20. The method according to claim 13, wherein the detecting of the sound signature includes detecting a spoken utterance in the ambient sound associated with a verbal warning, the method further including: indicating the verbal warning in the vehicle cabin.
21. The method according to claim 13, the method further including:
acquiring a current location of the vehicle;
associating the current location with a sound signature; and
updating a sound signature library containing a plurality of predetermined target sounds with the sound signature associated with the current location.
22. A method for sound signature detection for a vehicle, the method comprising:
capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle; and
receiving a directive to learn a sound signature within the ambient sound, wherein a voice command or an indication from a user is received and is used to initiate the steps of capturing and learning.
23. The method according to claim 22, further including saving the sound signature at least one of locally on the vehicle or remotely to a server.
24. The method according to claim 22, further including adapting a previously learned warning sound model using the sound signature within the ambient sound.
25. A method for personalized listening in a vehicle, the method comprising:
capturing ambient sound external to the vehicle from at least one ambient sound microphone (ASM) disposed on the vehicle;
detecting a sound signature within the ambient sound that is associated with a warning sound; and
mixing the warning sound with audio content delivered to the vehicle cabin via at least one vehicle cabin receiver (VCR) in accordance with a priority of the warning sound and a personalized hearing level (PHL).
26. The method according to claim 25, wherein the detecting of the sound signature includes:
retrieving learned models from a database;
comparing the sound signature to the learned models; and
identifying the warning sound from the learned models responsive to the comparison.
27. The method according to claim 25, further including enhancing auditory cues in the warning sound relative to the audio content based on a spectrum of the ambient sound captured at the at least one ASM.
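By way of a non-limiting sketch of the personalized-listening method of claims 25 through 27, the Python fragment below reduces the learned models of claim 26 to scoring callables and expresses the priority- and PHL-dependent mixing of claim 25 as simple gain laws; the likelihood threshold, the ducking law and all identifiers are assumptions introduced for illustration, not the claimed implementation.

    import numpy as np

    def identify_warning(feature_frames, learned_models, threshold=-50.0):
        # Claim 26: score the captured signature against each learned model and
        # keep the best-scoring model only if it clears a likelihood threshold.
        if not learned_models:
            return None
        scores = {name: model(feature_frames) for name, model in learned_models.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > threshold else None

    def mix_with_priority(audio: np.ndarray, warning: np.ndarray,
                          priority: float, phl_gain: float) -> np.ndarray:
        # Claim 25: duck the entertainment audio in proportion to the warning's
        # priority (0..1) and present the warning at the personalized hearing level.
        duck = 1.0 - 0.8 * min(max(priority, 0.0), 1.0)
        n = min(len(audio), len(warning))
        return duck * audio[:n] + phl_gain * warning[:n]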
PCT/US2012/021077 2011-01-12 2012-01-12 Automotive sound recognition system for enhanced situation awareness WO2012097150A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161432016P 2011-01-12 2011-01-12
US61/432,016 2011-01-12

Publications (1)

Publication Number Publication Date
WO2012097150A1 (en) 2012-07-19

Family

ID=46507437

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/021077 WO2012097150A1 (en) 2011-01-12 2012-01-12 Automotive sound recognition system for enhanced situation awareness

Country Status (1)

Country Link
WO (1) WO2012097150A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070127734A1 (en) * 2003-06-30 2007-06-07 Christian Brulle-Drews Configurable information distribution system for a vehicle
US20080240458A1 (en) * 2006-12-31 2008-10-02 Personics Holdings Inc. Method and device configured for sound signature detection

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11818552B2 (en) 2006-06-14 2023-11-14 Staton Techiya Llc Earguard monitoring system
US11848022B2 (en) 2006-07-08 2023-12-19 Staton Techiya Llc Personal audio assistant device and method
US11710473B2 (en) 2007-01-22 2023-07-25 Staton Techiya Llc Method and device for acute sound detection and reproduction
US12047731B2 (en) 2007-03-07 2024-07-23 Staton Techiya Llc Acoustic device and methods
US11750965B2 (en) 2007-03-07 2023-09-05 Staton Techiya, Llc Acoustic dampening compensation system
US11550535B2 (en) 2007-04-09 2023-01-10 Staton Techiya, Llc Always on headwear recording system
US11856375B2 (en) 2007-05-04 2023-12-26 Staton Techiya Llc Method and device for in-ear echo suppression
US11489966B2 (en) 2007-05-04 2022-11-01 Staton Techiya, Llc Method and apparatus for in-ear canal sound suppression
US11683643B2 (en) 2007-05-04 2023-06-20 Staton Techiya Llc Method and device for in ear canal echo suppression
US11889275B2 (en) 2008-09-19 2024-01-30 Staton Techiya Llc Acoustic sealing analysis system
US11610587B2 (en) 2008-09-22 2023-03-21 Staton Techiya Llc Personalized sound management and method
US11589329B1 (en) 2010-12-30 2023-02-21 Staton Techiya Llc Information processing using a population of data acquisition devices
US11832044B2 (en) 2011-06-01 2023-11-28 Staton Techiya Llc Methods and devices for radio frequency (RF) mitigation proximate the ear
US11736849B2 (en) 2011-06-01 2023-08-22 Staton Techiya Llc Methods and devices for radio frequency (RF) mitigation proximate the ear
US20220191608A1 (en) 2011-06-01 2022-06-16 Staton Techiya Llc Methods and devices for radio frequency (rf) mitigation proximate the ear
DE102012016820A1 (en) * 2012-08-24 2014-04-10 GM Global Technology Operations, LLC (n.d. Ges. d. Staates Delaware) Driver assistance system for motor car, has signal processor for assigning safety level of relevance to identified noise, and loudspeaker reproducing identified noise with volume dependant on assigned safety level
US11659315B2 (en) 2012-12-17 2023-05-23 Staton Techiya Llc Methods and mechanisms for inflation
US10685638B2 (en) 2013-05-31 2020-06-16 Nokia Technologies Oy Audio scene apparatus
US20160125867A1 (en) * 2013-05-31 2016-05-05 Nokia Technologies Oy An Audio Scene Apparatus
US10204614B2 (en) * 2013-05-31 2019-02-12 Nokia Technologies Oy Audio scene apparatus
US11917100B2 (en) 2013-09-22 2024-02-27 Staton Techiya Llc Real-time voice paging voice augmented caller ID/ring tone alias
CN104635596A (en) * 2013-11-11 2015-05-20 现代摩比斯株式会社 Warning sound output control device of vehicle and method thereof
US9870764B2 (en) 2013-11-21 2018-01-16 Harman International Industries, Incorporated Using external sounds to alert vehicle occupants of external events and mask in-car conversations
US9469247B2 (en) 2013-11-21 2016-10-18 Harman International Industries, Incorporated Using external sounds to alert vehicle occupants of external events and mask in-car conversations
US9275136B1 (en) 2013-12-03 2016-03-01 Google Inc. Method for siren detection based on audio samples
US10140998B2 (en) 2013-12-03 2018-11-27 Waymo Llc Method for siren detection based on audio samples
DE102013226040A1 (en) * 2013-12-16 2015-06-18 Continental Teves Ag & Co. Ohg Warning device and method for warning a motor vehicle driver
US11741985B2 (en) 2013-12-23 2023-08-29 Staton Techiya Llc Method and device for spectral expansion for an audio signal
US20150269779A1 (en) * 2014-03-20 2015-09-24 Syndiant Inc. Head-Mounted Augumented Reality Display System
US9357320B2 (en) 2014-06-24 2016-05-31 Harmon International Industries, Inc. Headphone listening apparatus
US9591419B2 (en) 2014-06-24 2017-03-07 Harman International Industries, Inc. Headphone listening apparatus
US11693617B2 (en) 2014-10-24 2023-07-04 Staton Techiya Llc Method and device for acute sound detection and reproduction
CN106166989A (en) * 2015-05-19 2016-11-30 福特全球技术公司 A kind of method and system improving driver alertness
GB2535246A (en) * 2015-05-19 2016-08-17 Ford Global Tech Llc A method and system for increasing driver awareness
GB2535246B (en) * 2015-05-19 2019-04-17 Ford Global Tech Llc A method and system for increasing driver awareness
WO2016196003A1 (en) * 2015-06-02 2016-12-08 Karma Automotive Llc Systems and methods for use in a vehicle for detecting external events
US9844981B2 (en) 2015-06-02 2017-12-19 Karma Automotive Llc Systems and methods for use in a vehicle for detecting external events
US20170075120A1 (en) * 2015-09-11 2017-03-16 Syndiant Inc. See-Through Near-to-Eye Viewing Optical System
CN106569333A (en) * 2015-10-12 2017-04-19 美商晶典有限公司 Perspective-type near-to-eye display optical system
US11917367B2 (en) 2016-01-22 2024-02-27 Staton Techiya Llc System and method for efficiency among devices
WO2017151937A1 (en) * 2016-03-04 2017-09-08 Emergency Vehicle Alert Systems Llc Emergency vehicle alert and response system
WO2018157251A1 (en) 2017-03-01 2018-09-07 Soltare Inc. Systems and methods for detection of a target sound
CN110431434A (en) * 2017-03-01 2019-11-08 索尔塔雷有限公司 System and method for detecting target sound
US10916260B2 (en) 2017-03-01 2021-02-09 Soltare Inc. Systems and methods for detection of a target sound
EP3589968A4 (en) * 2017-03-01 2021-04-14 Soltare Inc. Systems and methods for detection of a target sound
EP3376487A1 (en) * 2017-03-15 2018-09-19 Volvo Car Corporation Method and system for providing representative warning sounds within a vehicle
IT201700044705A1 (en) * 2017-04-24 2018-10-24 Guidosimplex S R L System to recognize an emergency vehicle from the sound emitted by a siren of said emergency vehicle and relative method.
WO2018198150A1 (en) * 2017-04-24 2018-11-01 Guidosimplex S.R.L. System for recognizing an emergency vehicle from the sound emitted from a siren of said emergency vehicle and method thereof
US11410673B2 (en) 2017-05-03 2022-08-09 Soltare Inc. Audio processing for vehicle sensory systems
EP3416408B1 (en) 2017-06-13 2020-12-02 Krauss-Maffei Wegmann GmbH & Co. KG Vehicle with an interior and method for sound transmission into a vehicle interior of a vehicle
EP3416408B2 (en) 2017-06-13 2023-08-09 Krauss-Maffei Wegmann GmbH & Co. KG Vehicle with an interior and method for sound transmission into a vehicle interior of a vehicle
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device
DE102018202143A1 (en) * 2018-02-12 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Method for operating a vehicle audio system, vehicle audio system and vehicle
US10549719B2 (en) 2018-02-22 2020-02-04 Zubie, Inc. OBD device vehicle security alarm detection
US11097745B2 (en) * 2018-02-27 2021-08-24 Toyota Jidosha Kabushiki Kaisha Driving support method, vehicle, and driving support system
US11818545B2 (en) 2018-04-04 2023-11-14 Staton Techiya Llc Method to acquire preferred dynamic range function for speech enhancement
DE102018219255A1 (en) * 2018-11-12 2020-05-14 Zf Friedrichshafen Ag Training system, data set, training method, evaluation device and deployment system for a road vehicle for recording and classifying traffic noise
DE102019212789A1 (en) * 2019-08-27 2021-03-04 Zf Friedrichshafen Ag Method for recognizing an explosion noise in the surroundings of a vehicle
EP3799033A1 (en) 2019-09-25 2021-03-31 CLAAS E-Systems GmbH Method for driving a noise damping system for an agricultural working vehicle
DE102019130264A1 (en) * 2019-11-08 2021-05-12 Bayerische Motoren Werke Aktiengesellschaft SYSTEM FOR THE ENVIRONMENTAL DETECTION OF A VEHICLE WITH THE AID OF PASSIVE ACOUSTIC SENSORS
DE102019218058A1 (en) * 2019-11-22 2021-05-27 Zf Friedrichshafen Ag Device and method for recognizing reversing maneuvers
DE102019218058B4 (en) * 2019-11-22 2021-06-10 Zf Friedrichshafen Ag Device and method for recognizing reversing maneuvers
US11708084B2 (en) * 2019-11-25 2023-07-25 Ford Global Technologies, Llc Vehicle sound attenuation
CN112991770A (en) * 2021-02-03 2021-06-18 拉扎斯网络科技(上海)有限公司 Travel state monitoring method, travel state monitoring device, electronic apparatus, medium, and program product
WO2023016924A1 (en) * 2021-08-13 2023-02-16 Zf Friedrichshafen Ag Method and system for generating noises in an interior on the basis of extracted and classified real noise sources, and vehicle which is acoustically transparent to specific target noises and which comprises a system of this type
US20230058709A1 (en) * 2021-08-23 2023-02-23 Toyota Jidosha Kabushiki Kaisha Autonomous vehicle control device
US11898870B2 (en) 2021-09-02 2024-02-13 Here Global B.V. Apparatus and methods for providing a route using a map layer of one or more sound events
WO2023077067A1 (en) * 2021-10-29 2023-05-04 Atieva, Inc. Attribute utilization to deliver immersive simultaneous sound experience
US20230217167A1 (en) * 2022-01-05 2023-07-06 Ford Global Technologies, Llc Vehicle audio enhancement system

Similar Documents

Publication Publication Date Title
WO2012097150A1 (en) Automotive sound recognition system for enhanced situation awareness
US8150044B2 (en) Method and device configured for sound signature detection
US11501772B2 (en) Context aware hearing optimization engine
US11605456B2 (en) Method and device for audio recording
EP3108646B1 (en) Environment sensing intelligent apparatus
CN109714663B (en) Earphone control method, earphone and storage medium
JP3913771B2 (en) Voice identification device, voice identification method, and program
US8194865B2 (en) Method and device for sound detection and audio control
CN104658548B (en) Alerting vehicle occupants to external events and masking in-vehicle conversations with external sounds
US20180233125A1 (en) Wearable audio device
US20090010456A1 (en) Method and device for voice operated control
US20120121103A1 (en) Audio/sound information system and method
US20080079571A1 (en) Safety Device
US10155523B2 (en) Adaptive occupancy conversational awareness system
CN108370457B (en) Personal audio system, sound processing system and related methods
US11211080B2 (en) Conversation dependent volume control
US11626096B2 (en) Vehicle and control method thereof
US20230305797A1 (en) Audio Output Modification
KR101748270B1 (en) Method for providing sound detection information, apparatus detecting sound around vehicle, and vehicle including the same
US10438458B2 (en) Apparatus and method for detection and notification of acoustic warning signals
CN114974289A (en) Method and apparatus for improving speech intelligibility in a space

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12734278; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 12734278; Country of ref document: EP; Kind code of ref document: A1)