CN108141694B - Event detection for playback management in audio devices - Google Patents

Event detection for playback management in audio devices

Info

Publication number
CN108141694B
CN108141694B (Application No. CN201680058340.7A)
Authority
CN
China
Prior art keywords
ambient sound
sound
detecting
microphone
signal
Prior art date
Legal status
Active
Application number
CN201680058340.7A
Other languages
Chinese (zh)
Other versions
CN108141694A (en)
Inventor
山缪尔·王·帕尔玛·爱贝耐泽尔
Current Assignee
Cirrus Logic International Semiconductor Ltd
Original Assignee
Cirrus Logic International Semiconductor Ltd
Priority date
Filing date
Publication date
Application filed by Cirrus Logic International Semiconductor Ltd
Priority claimed from PCT/US2016/045834 (WO2017027397A2)
Publication of CN108141694A
Application granted
Publication of CN108141694B

Classifications

    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for comparison or discrimination
    • G10L 25/18: Speech or voice analysis in which the extracted parameters are spectral information of each sub-band
    • G10L 25/78: Detection of presence or absence of voice signals
    • G10L 25/81: Detection of presence or absence of voice signals for discriminating voice from music
    • G10L 25/84: Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L 2025/783: Detection of presence or absence of voice signals based on threshold decision
    • G10L 2021/02161: Noise filtering characterised by the number of inputs available containing the signal or the noise to be suppressed
    • G10L 2021/02166: Noise filtering using microphone arrays; beamforming
    • H04R 1/1083: Earpieces, earphones, monophonic headphones; reduction of ambient noise
    • H04R 3/002: Damping circuit arrangements for transducers, e.g. motional feedback circuits
    • H04R 3/005: Circuits for combining the signals of two or more microphones
    • H04R 2410/05: Noise reduction with a separate noise microphone
    • H04S 7/00: Indicating arrangements; control arrangements, e.g. balance control

Abstract

According to an embodiment of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal representative of ambient sound external to the audio device, detecting near-field sound in the ambient sound from the at least one input signal, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detecting the near-field sound.

Description

Event detection for playback management in audio devices
Cross reference to related applications
The present disclosure claims priority to U.S. Non-Provisional Patent Application Serial No. 15/229,429, filed August 5, 2016, which claims priority to U.S. Provisional Patent Application Serial No. 62/202,303, filed August 7, 2015, U.S. Provisional Patent Application Serial No. 62/237,868, filed October 6, 2015, and U.S. Provisional Patent Application Serial No. 62/351,499, filed June 17, 2016, each of which is incorporated herein by reference.
Technical Field
The field of representative embodiments of the present disclosure relates to methods, apparatuses, and implementations concerning or relating to playback management in audio devices. Applications include, but are not limited to, the detection of certain ambient events, such as near-field sound detection, proximity sound detection, and tonal alarm detection using spatial processing based on signals received from multiple microphones.
Background
Personal audio devices have become commonplace and are used in a wide variety of ambient environments. The headphones used with these audio devices have advanced to the point that occlusion, whether due to passive or active methods, prevents the user from tracking the ambient sound field outside the audio device. Although increased isolation and uninterrupted listening are preferred in most cases, at times, for safety or for an enhanced user experience, it is important that the user hear certain ambient events and take appropriate action in response. For example, if a user listening to music through headphones is interrupted by someone attempting to start a conversation with him or her, it may be difficult to carry on that conversation unless the user pauses the playback signal or reduces its volume. For example, U.S. Patent No. 7,903,825 proposes an audio device in which the playback signal is modified according to the ambient sound field. As another example, U.S. Patent No. 8,804,974 teaches ambient event detection in a personal audio device, which can then be used to implement event-based modification of the played-back content. The above references also teach the use of microphones to detect a variety of acoustic events. As another example, U.S. Application Serial No. 14/324,286, filed in July 2014, teaches the use of a voice detector as an event detector to adjust the playback signal during a conversation. As another example, U.S. Patent No. 8,565,446 teaches the use of direction-of-arrival (DOA) estimates and interference-to-desired (near-field) speech signal ratio estimates from a set of multiple microphones to detect desired speech in the presence of non-stationary background noise, in order to control speech enhancement algorithms in a noise reduction echo cancellation (NREC) system. Likewise, U.S. Application Serial No. 13/199,593 teaches that the maximum value of the normalized cross-correlation statistic obtained by cross-correlation analysis of multiple microphone signals can be an effective discriminator for detecting near-field speech. A music detector based on a spectral flatness measure is proposed for NREC systems in U.S. Patent No. 8,126,706 to distinguish the presence of background noise from background music. U.S. Patent No. 7,903,825, U.S. Patent No. 8,804,974, U.S. Application Serial No. 14/324,286, U.S. Patent No. 8,565,446, U.S. Application Serial No. 13/199,593, and U.S. Patent No. 8,126,706 are incorporated herein by reference.
Disclosure of Invention
In accordance with the teachings of the present disclosure, one or more disadvantages and problems associated with existing approaches to event detection for playback management in personal audio devices may be reduced or eliminated.
According to an embodiment of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal representative of ambient sound external to the audio device, detecting near-field sound in the ambient sound from the at least one input signal, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detecting the near-field sound.
In accordance with these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may comprise: an audio output configured to reproduce audio information by generating an audio output signal for delivery to at least one transducer of an audio device; a microphone input configured to receive an input signal representing ambient sound external to the audio device; a processor configured to detect near-field sound in the ambient sound from the input signal, and to modify a characteristic of the audio information in response to detecting the near-field sound.
In accordance with these and other embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal representative of ambient sound external to the audio device, detecting an audio event from the at least one input signal, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detecting the audio event for at least a predetermined time.
In accordance with these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may comprise: an audio output configured to reproduce audio information by generating an audio output signal for delivery to at least one transducer of an audio device; a microphone input configured to receive an input signal representing ambient sound external to the audio device; a processor configured to detect an audio event from the input signal, and in response to detecting the audio event for at least a predetermined time, modify a characteristic of audio information reproduced to the at least one transducer.
The technical advantages of the present disclosure will be readily apparent to one skilled in the art from the figures, descriptions, and claims included herein. The objects and advantages of the embodiments will be realized and attained by at least the elements, features, and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the claims as set forth in the disclosure.
Drawings
A more complete understanding of embodiments of the present invention and certain advantages thereof may be acquired by referring to the following description in consideration with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
FIG. 1 illustrates an example of a use case scenario in which various event detectors may be used in conjunction with a playback management system to enhance a user experience, in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates an exemplary playback management system that modifies a playback signal based on a decision of an event detector according to an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary event detector according to an embodiment of the present disclosure;
FIG. 4 shows functional blocks of a system for obtaining near-field spatial statistics that may be used to detect audio events, according to an embodiment of the present disclosure;
FIG. 5 illustrates exemplary fusion logic for detecting near-field sounds according to an embodiment of the present disclosure;
FIG. 6 illustrates exemplary fusion logic for detecting proximity sounds in accordance with embodiments of the present disclosure;
FIG. 7 illustrates an embodiment of a proximity speech detector according to an embodiment of the present disclosure;
FIG. 8 illustrates exemplary fusion logic for detecting tone alarm events according to embodiments of the present disclosure;
FIG. 9 illustrates an exemplary timing diagram showing delay and hysteresis logic that may be applied to a transient audio event detection signal to generate a validated audio event signal in accordance with an embodiment of the present disclosure;
FIG. 10 illustrates different audio event detectors with delay and hysteresis logic according to embodiments of the disclosure.
Detailed Description
According to embodiments of the present disclosure, systems and methods are presented that may use at least three different audio event detectors in an automatic playback management framework. Such audio event detectors for an audio device may include: a near-field detector that may detect sounds in the near field of the audio device, such as when a user of the audio device (e.g., a user wearing or otherwise using the audio device) is speaking; a proximity detector that may detect sounds in proximity to the audio device, such as when another person near the user of the audio device speaks; and a tonal alarm detector that may detect acoustic alarms occurring in the vicinity of the audio device. FIG. 1 illustrates an example of a use case scenario in which such detectors may be used in conjunction with a playback management system to enhance a user experience, in accordance with an embodiment of the present disclosure.
Fig. 2 illustrates an exemplary playback management system that modifies the playback signal based on the decision of the event detector 2, according to an embodiment of the present disclosure. The signal processing functions in the processor 50 may include an acoustic echo canceller 1, which may cancel acoustic echo received at the microphone 52 due to echo coupling between an output audio transducer 51 (e.g., a loudspeaker) and the microphone 52. The echo-reduced signal may be passed to the event detector 2, which may detect one or more different ambient events, including but not limited to near-field events detected by the near-field detector 3 (e.g., including but not limited to speech from a user of the audio device), proximity events detected by the proximity detector 4 (e.g., including but not limited to speech or other ambient sounds other than near-field sounds), and/or tonal alarm events detected by the alarm detector 5. If an audio event is detected, the event-based playback control 6 may modify a characteristic of the audio information (shown as "playback content" in FIG. 2) reproduced to the output audio transducer 51. The audio information may include any information that may be reproduced at the output audio transducer 51, including, but not limited to, downlink speech associated with a telephone conversation received via a communication network (e.g., a cellular network) and/or internal audio from an internal audio source (e.g., a music file, a video file, etc.).
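The data flow of FIG. 2 can be summarized in a short sketch. The following C fragment is an illustrative outline only; the function names (run_echo_canceller, detect_events, apply_playback_control), the event_flags_t type, and the frame-based structure are assumptions for illustration and are not defined by this disclosure.

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags produced by event detector 2 of FIG. 2. */
typedef struct {
    bool near_field;   /* near-field detector 3   */
    bool proximity;    /* proximity detector 4    */
    bool tonal_alarm;  /* tonal alarm detector 5  */
} event_flags_t;

/* Assumed helper routines, not defined here. */
void run_echo_canceller(const float *mic, const float *playback, float *out, size_t n);
event_flags_t detect_events(const float *echo_reduced, size_t n);
void apply_playback_control(const float *playback, float *speaker_out, size_t n,
                            const event_flags_t *events);

/* One frame of the playback management loop of FIG. 2. */
void process_frame(const float *mic_in, const float *playback_in,
                   float *speaker_out, size_t n)
{
    float echo_reduced[512];          /* assumes n <= 512 samples per frame */
    event_flags_t events;

    /* Acoustic echo canceller 1: remove the speaker-to-microphone echo. */
    run_echo_canceller(mic_in, playback_in, echo_reduced, n);

    /* Event detector 2: near-field, proximity, and tonal alarm detection. */
    events = detect_events(echo_reduced, n);

    /* Event-based playback control 6: attenuate or pass the playback content. */
    apply_playback_control(playback_in, speaker_out, n, &events);
}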
Fig. 3 illustrates an example event detector in accordance with an embodiment of the present disclosure. As shown in FIG. 3, an exemplary event detector may include a voice activity detector 10, a music detector 9, a direction-of-arrival estimator 7, a near-field spatial information extractor 8, a background noise sound pressure level estimator 11, and decision fusion logic 12. The decision fusion logic 12 uses information from the voice activity detector 10, the music detector 9, the direction-of-arrival estimator 7, the near-field spatial information extractor 8, and the background noise sound pressure level estimator 11 to detect audio events, including but not limited to near-field sounds, proximity sounds other than near-field sounds, and tonal alarms.
The near-field detector 3 may detect near-field sounds, including speech. When such near-field sounds are detected, it may be desirable to modify the audio information reproduced to the output audio transducer 51, because the detection of near-field sound may indicate that the user is engaged in a conversation. Such near-field detection may need to detect near-field sound in noisy conditions and to be robust against false detection of near-field sound in widely varying background noise conditions (e.g., background noise in a restaurant, noise while driving a car, etc.). As explained in greater detail below, near-field detection may require spatial sound processing using multiple microphones 52. In some embodiments, such near-field sound detection may be implemented in the same or a similar manner as described in U.S. Patent No. 8,565,446 and/or U.S. Application Serial No. 13/199,593.
The proximity detector 4 may detect ambient sounds other than near-field sounds (e.g., speech from a person near the user, background music, etc.). As explained in greater detail below, because it may be difficult to distinguish proximity sounds from non-stationary background noise and background music, the proximity detector 4 may utilize the music detector and a noise sound pressure level estimate to disable proximity detection, in order to avoid a poor user experience due to false detection of proximity sounds. In some embodiments, such proximity sound detection may be accomplished in the same or a similar manner as described in U.S. Patent No. 8,126,706, U.S. Patent No. 8,565,446, and/or U.S. Application Serial No. 13/199,593.
The tonal alarm detector 5 may detect a tonal alarm (e.g., a siren) near the audio device. To provide the best user experience, it may be desirable for the tonal alarm detector 5 to ignore certain alarms (e.g., weak or low-volume alarms). As described in greater detail below, tonal alarm detection may require spatial sound processing using multiple microphones 52. In some embodiments, such tonal alarm detection may be accomplished in the same or a similar manner as described in U.S. Patent No. 8,126,706 and/or U.S. Application Serial No. 13/199,593.
FIG. 4 shows functional blocks of a system for obtaining near-field spatial statistics that may be used to detect audio events, according to an embodiment of the present disclosure. Sound pressure level analysis 41 may be performed on the signals from the microphones 52 by estimating an inter-microphone sound pressure level difference imd between the near and far microphones (e.g., as described in U.S. Application Serial No. 13/199,593). Cross-correlation analysis 13 may be performed on the signals received by the microphones 52 to obtain direction-of-arrival information DOA for ambient sound impinging on the microphones 52 (e.g., as described in U.S. Patent No. 8,565,446). In the cross-correlation analysis 13, a maximum normalized correlation value normMaxCorr may also be obtained (e.g., as described in U.S. Application Serial No. 13/199,593). The voice activity detector 10 may detect the presence of speech and generate a signal speechDet indicating the presence or absence of speech in the ambient sound (e.g., as described in the probability-based speech presence/absence method of U.S. Patent No. 7,492,889). The beamformers 15 may generate a near-field signal estimate and an interference signal estimate based on the signals from the microphones 52, which may be used by the noise analysis 14 to determine the noise sound pressure level noiseLevel and the interference-to-near-field signal ratio idr of the ambient sound. U.S. Patent No. 8,565,446 describes an example method of estimating the interference-to-near-field signal ratio idr using a pair of beamformers 15. The voice activity detector 36 may use the interference estimate to detect any speech signals that do not originate from the desired signal direction (proxSpeechDet). Whenever the direction-of-arrival estimate DOA of the ambient sound is outside the acceptance angle of near-field sound, the noise analysis 14 may be performed by updating the interfering signal energy based on the direction-of-arrival estimate DOA. The direction of arrival of near-field sound may be known a priori for a given microphone array configuration in the industrial design of a personal audio device.
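As an illustration of two of the statistics of FIG. 4, the following C sketch computes an inter-microphone level difference and a maximum normalized cross-correlation over one frame. The function names, the dB formulation of imd, and the time-domain lag search are assumptions for illustration; the disclosure does not prescribe a particular implementation.

#include <math.h>
#include <stddef.h>

/* Inter-microphone sound pressure level difference (imd), in dB, between a
   near (inner) microphone frame and a far (outer) microphone frame. */
static double imd_db(const double *near_mic, const double *far_mic, size_t n)
{
    double e_near = 1e-12, e_far = 1e-12;   /* small floor avoids log(0) */
    for (size_t i = 0; i < n; i++) {
        e_near += near_mic[i] * near_mic[i];
        e_far  += far_mic[i]  * far_mic[i];
    }
    return 10.0 * log10(e_near / e_far);
}

/* Maximum normalized cross-correlation (normMaxCorr) between two microphone
   frames over lags -max_lag..+max_lag; values near 1 suggest a strongly
   correlated (e.g., near-field) source. */
static double norm_max_corr(const double *x, const double *y, size_t n, int max_lag)
{
    double ex = 1e-12, ey = 1e-12, best = 0.0;

    for (size_t i = 0; i < n; i++) {
        ex += x[i] * x[i];
        ey += y[i] * y[i];
    }
    for (int lag = -max_lag; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i < n; i++) {
            long j = (long)i + lag;
            if (j >= 0 && j < (long)n)
                acc += x[i] * y[j];
        }
        double r = fabs(acc) / sqrt(ex * ey);
        if (r > best)
            best = r;
    }
    return best;
}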
The presence of near-field sound may then be detected using a variety of statistics generated by the system of fig. 4. Fig. 5 illustrates exemplary fusion logic for detecting near-field sounds according to embodiments of the present disclosure. As shown in fig. 5, near-field speech may be detected when all of the following criteria are met:
the direction-of-arrival estimate DOA of the ambient sound is within the acceptance angle of near-field sound (block 16);
the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres1 (block 17);
the interference-to-near-field desired signal ratio idr is less than a threshold idrThres1 (block 18);
voice activity is detected, as represented by the signal speechDet (block 19);
the inter-microphone sound pressure level difference statistic imd is greater than a threshold imdTh (block 42).
In some embodiments, the thresholds idrThres1 and imdTh may be dynamically adjusted based on the background noise sound pressure level estimate.
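A minimal sketch of the AND-fusion of FIG. 5 is shown below. The structure and threshold names follow the criteria listed above; the struct layout, the acceptance-angle representation, and any numerical values are assumptions for illustration.

#include <stdbool.h>

/* Per-frame statistics produced by the system of FIG. 4. */
typedef struct {
    double doa;          /* direction-of-arrival estimate (degrees)  */
    double normMaxCorr;  /* maximum normalized cross-correlation     */
    double idr;          /* interference-to-near-field signal ratio  */
    double imd;          /* inter-microphone level difference (dB)   */
    bool   speechDet;    /* voice activity flag                      */
} nf_stats_t;

/* AND-fusion of the FIG. 5 criteria; returns true when near-field speech is
   detected. In practice idrThres1 and imdTh may be adapted to the background
   noise level as noted above. */
static bool near_field_detect(const nf_stats_t *s,
                              double doaLo, double doaHi,   /* acceptance angle */
                              double normMaxCorrThres1,
                              double idrThres1,
                              double imdTh)
{
    return (s->doa >= doaLo && s->doa <= doaHi)
        && (s->normMaxCorr > normMaxCorrThres1)
        && (s->idr < idrThres1)
        && s->speechDet
        && (s->imd > imdTh);
}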
Proximity detection by the proximity detector 4 may differ from near-field sound detection by the near-field detector 3, because the signal characteristics of proximity speech may be very similar to ambient signals such as music and noise. Therefore, the proximity detector 4 must avoid false detection of proximity speech to achieve an acceptable user experience. Accordingly, the music detector 9 may be used to disable proximity detection whenever music is present in the background. Likewise, the proximity detector 4 may be disabled whenever the background noise sound pressure level is above a certain threshold. The background noise threshold may be determined a priori such that false detections below that threshold sound pressure level are very unlikely. Fig. 6 illustrates exemplary fusion logic for detecting proximity sounds (e.g., speech) in accordance with embodiments of the present disclosure. Furthermore, there may be many sources of ambient noise that produce transient acoustic stimuli. These noise types may be erroneously detected as speech by the voice activity detector. To reduce the likelihood of such false detections, a spectral flatness measure (SFM) statistic from the music detector 9 may be used to distinguish speech from transient noise. For example, the SFM may be tracked over a period of time, and the difference between the maximum SFM value and the minimum SFM value over that period may be calculated; this difference is defined as sfmSwing (a minimal sketch of this computation appears after the list below). The value of sfmSwing may typically be small for transient noise signals, because the spectral content of these signals is broadband and tends to remain flat over short time intervals (300 ms-500 ms). The value of sfmSwing may be higher for a speech signal, because the spectral content of speech may change faster than that of a transient signal. As shown in fig. 6, a proximity sound (e.g., speech) may be detected when all of the following criteria are met:
no music detected in the background (block 20);
the direction-of-arrival estimate DOA is within the acceptance angle of proximity sound (block 21);
the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres2 (block 22);
the background noise sound pressure level noiseLevel is below a threshold noiseLevelTh (block 23);
proximity speech activity is detected, as represented by the signal proxSpeechDet (block 19);
the SFM swing statistic sfmSwing is greater than a threshold sfmSwingTh (block 37);
the interference-to-near-field desired signal ratio idr is greater than a threshold idrThres2 (block 40);
the inter-microphone pressure level difference statistic imd is close to 0dB (block 43).
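The sfmSwing statistic referenced above can be tracked with a small circular history of per-frame SFM values, as in the C sketch below. The history length, frame duration, and function names are assumptions rather than values specified by the disclosure.

/* Track the swing of the spectral flatness measure (SFM) over a short history
   corresponding to roughly 300 ms-500 ms of frames. A small swing suggests a
   broadband transient; a larger swing suggests speech. */
#define SFM_HIST_LEN 32   /* e.g., 32 frames of ~16 ms each (assumed) */

typedef struct {
    double hist[SFM_HIST_LEN];
    int    idx;
    int    count;
} sfm_swing_state_t;

/* Push the latest SFM value and return sfmSwing = max(SFM) - min(SFM) over the
   stored history; the result may be compared against sfmSwingTh. */
static double sfm_swing_update(sfm_swing_state_t *st, double sfm)
{
    st->hist[st->idx] = sfm;
    st->idx = (st->idx + 1) % SFM_HIST_LEN;
    if (st->count < SFM_HIST_LEN)
        st->count++;

    double lo = st->hist[0], hi = st->hist[0];
    for (int i = 1; i < st->count; i++) {
        if (st->hist[i] < lo) lo = st->hist[i];
        if (st->hist[i] > hi) hi = st->hist[i];
    }
    return hi - lo;
}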
In some embodiments, the music detector 9 used to detect the presence of background music may be implemented using a music detector as taught in U.S. Patent No. 8,126,706. Another embodiment of a proximity speech detector according to embodiments of the present disclosure is shown in fig. 7. According to this embodiment, proximity speech may be detected if the following conditions are satisfied:
The interference-to-near-field desired signal ratio idr is greater than a threshold idrThres2 (block 39);
detecting near voice activity (block 27);
the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres3 (block 28);
the direction of arrival estimate DOA is within the acceptance angle of the near sound (block 29);
no music detected in the background (block 30);
the presence of low or medium sound pressure level background noise or the absence of background noise (block 31). This condition is verified by comparing the estimated background noise sound pressure level with a threshold noiseLevelThLo. If a low noise sound pressure level is detected, the following two conditions are also tested to confirm the presence of near speech:
SFM change statistic sfmSwing greater than threshold sfmSwing th (block 38);
the inter-microphone pressure level difference statistic imd is close to 0dB (block 44).
If the above-described background noise sound pressure level condition is not met at block 31, then the following conditions may indicate proximity speech, improving the detection rate of proximity speech without increasing the occurrence of false alarms (e.g., due to background noise conditions):
there is a stationary background noise (block 32). Stationary background noise may be detected by calculating the peak-to-root mean square ratio of the SFM generated by the music detector (block 9) over a period of time. In particular, if the above ratio is high, non-stationary noise may be present because the spectral flatness measure of non-stationary noise tends to vary faster than stationary noise;
there is a high noise sound pressure level (block 32). A high noise condition may be detected if the estimated background noise is greater than the threshold noiseLevelThLo and less than the threshold noiseLevelThHi. If the stationary noise and direction of arrival conditions are not met at block 32, then the presence of the following set of two conditions may indicate the presence of proximity speech:
there is a close-talking near talker (block 33). A close-talking near talker may be detected when the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres4 (where normMaxCorrThres4 may be greater than normMaxCorrThres3 to indicate the presence of a close-talking talker);
The presence of low, medium, or high sound pressure level background noise, or the absence of background noise (block 34). This condition may be detected if the estimated background noise sound pressure level is less than the threshold noiseLevelThHi.
If the above direction-of-arrival condition is not met at block 29, then the presence of the following conditions may indicate proximity speech:
music is not present (block 35);
there is a close-talking near talker (block 33). A close-talking near talker may be detected when the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres4 (where normMaxCorrThres4 may be greater than normMaxCorrThres3 to indicate the presence of a close-talking talker);
The presence of low, medium, or high sound pressure level background noise, or the absence of background noise (block 34). This condition may be detected if the estimated background noise sound pressure level is less than the threshold noiseLevelThHi.
The tonal alarm detector 5 may be configured to detect tonal alarm signals whose acoustic bandwidth is narrow (e.g., a siren or a beep). In some embodiments, the tonality of the ambient sound may be detected by dividing the time-domain signal into a plurality of sub-bands via a time-frequency transform, and a spectral flatness measure, shown in fig. 6 as the signal sfm[] generated by the music detector 9, may be calculated in each sub-band. The spectral flatness measure sfm may be estimated for all sub-bands, and a tonal alarm may be detected if the spectrum is flat in most, but not all, sub-bands. Furthermore, in a playback management system, it may not be necessary to detect far-field alarm signals. Thus, the near-field spatial statistics 8 of FIG. 3 may be used to distinguish far-field alarm signals from near-field signals. Fig. 8 illustrates exemplary fusion logic for detecting tonal alarm events (e.g., a siren or a beep), in accordance with embodiments of the present disclosure (a minimal sketch appears after the list below). As shown in FIG. 8, a tonal alarm event may be detected when all of the following criteria are met:
the direction of arrival estimate DOA is within the acceptance angle of the alarm signal (block 24);
the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres5 (block 25);
the spectral flatness measure sfm[] indicates that the noise spectrum is flat in most, but not all, sub-bands (block 26).
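The per-sub-band spectral flatness measure and the "flat in most but not all sub-bands" criterion of FIG. 8 can be illustrated as follows. The geometric-mean over arithmetic-mean form of the SFM is a common definition, and the thresholds and function names are assumptions for illustration.

#include <math.h>
#include <stddef.h>

/* Spectral flatness of one sub-band: geometric mean of the power bins divided
   by their arithmetic mean. Values near 1 indicate a flat (noise-like)
   sub-band; values near 0 indicate a peaky (tonal) sub-band. */
static double subband_sfm(const double *power, size_t bins)
{
    double log_sum = 0.0, sum = 0.0;

    for (size_t i = 0; i < bins; i++) {
        log_sum += log(power[i] + 1e-12);
        sum     += power[i] + 1e-12;
    }
    return exp(log_sum / (double)bins) / (sum / (double)bins);
}

/* Illustrative check of the FIG. 8 spectral criterion: the spectrum is flat in
   most, but not all, of the sub-bands. flatThres and minFlat are placeholder
   tuning parameters. */
static int sfm_indicates_tonal_alarm(const double sfm[], int n_bands,
                                     double flatThres, int minFlat)
{
    int flat = 0;

    for (int b = 0; b < n_bands; b++)
        if (sfm[b] > flatThres)
            flat++;
    return (flat >= minFlat) && (flat < n_bands);
}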
In practice, the instantaneous audio event detections of the near-field detector 3, the proximity detector 4, and the tonal alarm detector 5, as shown in figs. 5, 6, 7, and 8, may include false audio events. Therefore, it may be desirable to validate an instantaneous audio event detection signal before passing it to the playback control 6. FIG. 9 illustrates an exemplary timing diagram showing delay and hysteresis logic that may be applied to an instantaneous audio event detection signal to generate a validated audio event signal, according to an embodiment of the disclosure. As shown in fig. 9, in response to the instantaneous detection of an audio event (e.g., near-field sound, a tonal alarm event) lasting at least a predetermined time, the delay logic may assert a validated audio event signal, while the hysteresis logic may continue to assert the validated audio event signal until the instantaneous detection of the audio event has ceased for a second predetermined time.
The following pseudo-code may demonstrate the application of delay and hysteresis logic to reduce false detection of audio events, according to embodiments of the present disclosure.
/* If the instantaneous detect is true, increment the hold-off counter and reset the hang-over counter */
if (instDet == TRUE)
{
    holdOffCntr = holdOffCntr + 1;
    hangOverCntr = 0;
}
/* If the instantaneous detect is false, increment the hang-over counter and reset the hold-off counter */
else
{
    hangOverCntr = hangOverCntr + 1;
    holdOffCntr = 0;
}

/******************
 * Hold-off Logic *
 ******************/
/* Valid detect will transition to the true state if the instantaneous detect
   is continuously true for a certain time and the previous valid detect is false */
if (holdOffCntr > holdOffThres && validDet == FALSE)
{
    validDet = TRUE;
    holdOffCntr = 0;
    hangOverCntr = 0;
}

/******************
 * Hang-over Logic *
 ******************/
/* Valid NF detect will transition to the false state if the instantaneous NF detect
   is continuously false for a certain time and the previous valid NF detect is true */
if (hangOverCntr > hangOverThres && validDet == TRUE)
{
    validDet = FALSE;
    holdOffCntr = 0;
    hangOverCntr = 0;
}
The validated event may be further qualified before generating the playback mode switching control. For example, the following pseudo-code may demonstrate the application of delay and hysteresis logic for gracefully switching between a conversational mode (e.g., in which the audio information reproduced to the output audio transducer 51 may be modified in response to an audio event) and a normal playback mode (e.g., in which the audio information reproduced to the output audio transducer 51 is unmodified).
/***********************************
*Conversational Mode Enter Logic*
***********************************/
/* Increment the time-to-enter-conversational-mode counter if the event detect is true
   and the mode is not the conversational mode. If the counter exceeds the threshold,
   switch to conversational mode and reset the counters. Note that the event detect
   need not be true contiguously. */
if (convModeEn == FALSE && validDet == TRUE)
{
timeToEnterConvModeCntr=timeToEnterConvModeCntr+1;
if(timeToEnterConvModeCntr>timeToEnterConvModeThres)
{
convModeEn=TRUE;
timeToEnterConvModeCntr=0;
timeToExitConvModeCntr=0;
}
}
/***********************************
*Conversational Mode Exit Logic*
***********************************/
/* Increment the time-to-exit-conversational-mode counter if the event detect is false
   and the mode is the conversational mode. If the counter exceeds the threshold,
   switch to normal mode and reset the counters.
   Note that the event detect must be false contiguously. */
if (convModeEn == TRUE && validDet == FALSE)
{
timeToExitConvModeCntr++;
if(timeToExitConvModeCntr>timeToExitConvModeThres)
{
convModeEn=FALSE;
timeToEnterConvModeCntr=0;
timeToExitConvModeCntr=0;
}
}
else
{
timeToExitConvModeCntr=0;
}
FIG. 10 illustrates different audio event detectors with delay and hysteresis logic according to embodiments of the disclosure. The delay period and/or the hysteresis period of the respective detectors may be set differently. In addition, in some embodiments, playback management may be controlled differently based on the type of event detected. In these and other embodiments, as shown in fig. 9, the playback gain (and thus the audio information reproduced at the output audio transducer 51) may be attenuated whenever one or more of the audio events are detected. In these and other embodiments, to provide smooth gain transitions, the playback gain may be smoothed using a first order exponential averaging filter represented by the following pseudocode:
if (convModeEn == TRUE)
{
    playBackGain = (1 - alpha) * convModeGain + alpha * playBackGain;
}
else
{
    playBackGain = (1 - beta) * normalModeGain + beta * playBackGain;
}
The smoothing parameters α (alpha) and β (beta) may be set to different values to adjust the gain slope.
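One possible way to choose α and β is to derive each from a desired time constant, so that the gain falls quickly when a conversational event is detected and recovers slowly afterwards. The mapping below (a standard one-pole smoothing relation) and the example values are assumptions for illustration and are not specified by the disclosure.

#include <math.h>

/* Map a frame period and a desired time constant to a one-pole smoothing
   coefficient for playBackGain = (1 - coeff) * target + coeff * playBackGain. */
static double smoothing_coeff(double frame_period_s, double time_constant_s)
{
    return exp(-frame_period_s / time_constant_s);
}

/* Example (assumed values): 16 ms frames, 50 ms attack toward convModeGain
   and 500 ms release toward normalModeGain.
     double alpha = smoothing_coeff(0.016, 0.050);
     double beta  = smoothing_coeff(0.016, 0.500);                           */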
It should be understood that the various operations described herein, particularly in conjunction with the figures, may be implemented by other circuitry or other hardware components, as will be apparent to those of ordinary skill in the art having the benefit of this disclosure. The order in which the various operations of a given method are performed may be varied, and the various elements of the systems illustrated herein may be added, reordered, combined, omitted, modified, etc. This disclosure is intended to embrace all such modifications and changes, and therefore the above description should be regarded as illustrative rather than restrictive.
Likewise, although the present disclosure makes reference to specific embodiments, certain modifications and changes may be made to those embodiments without departing from the scope of the present disclosure. Furthermore, any benefits, advantages, or solutions to problems described herein with regard to specific embodiments are not intended to be construed as critical, required, or essential features or elements.
Further embodiments will likewise be apparent to those of ordinary skill in the art, given the benefit of this disclosure, and such embodiments should be considered to be encompassed herein.

Claims (68)

1. A method for processing audio information in an audio device, the method comprising:
receiving a first signal representing audio information;
generating, based on the first signal, an audio output signal for delivery to at least one transducer of the audio device;
causing the at least one transducer to generate sound from the audio output signal;
receiving at least one input signal representing ambient sound external to the audio device;
determining a near-field spatial statistic of the ambient sound;
detecting near-field sound and near-range sound in the ambient sound from the at least one input signal;
modifying a characteristic of the audio output signal in response to detecting near-field sound;
causing the at least one transducer to generate a modified sound from the modified audio output signal;
disabling close-in detection by a close-in detector using a music detector and a noise sound pressure level estimate.
2. The method of claim 1, further comprising determining a direction of ambient sound from the at least one input signal and modifying a characteristic of the audio output signal in response to the direction of ambient sound indicating that the ambient sound is sound from a user of the audio device.
3. The method of claim 1, further comprising determining a direction of ambient sound from the at least one input signal and modifying a characteristic of the audio output signal in response to the direction of ambient sound indicating that the ambient sound is speech from a user of the audio device.
4. The method of claim 1, wherein modifying a characteristic of the audio output signal comprises attenuating the audio information.
5. The method of claim 1, further comprising modifying a characteristic of the audio output signal in response to detecting near-field sound for at least a predetermined time.
6. The method of claim 5, further comprising:
detecting an absence of near-field sound in ambient sound from the at least one input signal;
in response to the absence of near-field sound for at least a second predetermined time, ceasing to modify the characteristic of the audio output signal.
7. The method of claim 1, further comprising:
in addition to the near-field sound, detecting ambient sound other than near-field sound from the at least one input signal;
in response to detecting ambient sound, a characteristic of the audio output signal is modified.
8. The method of claim 7, further comprising determining a direction of ambient sound from the at least one input signal and modifying a characteristic of the audio output signal in response to the direction of ambient sound indicating that the ambient sound is a sound other than near-field sound.
9. The method of claim 7, further comprising:
detecting whether ambient sound includes background noise based on the at least one input signal;
in response to detecting background noise in the ambient sound, modifying a characteristic of the audio output signal.
10. The method of claim 7, further comprising:
detecting whether ambient sound includes a tonal alarm based on the at least one input signal;
modifying a characteristic of the audio output signal in response to detecting a tonal alarm in the ambient sound.
11. The method of claim 10, wherein detecting a tonal alarm in ambient sound comprises:
detecting a direction of an ambient sound from the at least one input signal;
detecting a spectral flatness measure of the ambient sound from the at least one input signal; and
a tonal alarm is detected based on the direction of ambient sound, the presence or absence of background noise, and near-field spatial statistics.
12. The method of claim 11, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.
13. The method of claim 11, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.
14. The method of claim 11, wherein detecting near-field spatial statistics comprises detecting whether a normalized cross-correlation statistic is greater than a threshold.
15. The method of claim 11, wherein detecting a spectral flatness measure of the ambient sound comprises detecting whether a noise spectrum is flat in most, but not all, of the sub-bands of the ambient sound.
16. The method of claim 1, wherein detecting near-field sounds in ambient sounds comprises:
detecting a direction of an ambient sound from the at least one input signal;
detecting a presence of speech in ambient sound from the at least one input signal; and
near-field sound is detected based on the direction, the presence or absence of speech, and near-field spatial statistics.
17. The method of claim 16, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.
18. The method of claim 16, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a disturbance-to-signal ratio associated with near-field sound.
19. The method of claim 16, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include an inter-microphone sound pressure level difference between the first microphone signal and the second microphone signal.
20. The method of claim 16, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.
21. The method of claim 16, wherein detecting near-field spatial statistics comprises: detecting whether the normalized cross-correlation statistic is greater than a first threshold;
detecting whether the interference-to-near-field desired signal ratio is less than a second threshold;
detecting whether the inter-microphone sound pressure level difference is greater than a third threshold.
22. The method of claim 21, wherein the second threshold is adjusted based on an estimate of background noise in ambient sound.
23. The method of claim 21, wherein the third threshold is adjusted based on an estimate of background noise in ambient sound.
24. The method of claim 1, further comprising:
detecting a direction of an ambient sound from the at least one input signal;
detecting the presence of background noise in ambient sound from the at least one input signal;
detecting the presence of near speech in the ambient sound from the at least one input signal;
detecting a volume of ambient sound based on the at least one input signal;
detecting the presence of an audio event comprising a near-sound event based on the direction, the presence or absence of background noise, the presence or absence of speech, the volume, and near-field spatial statistics; and
modifying a characteristic of the audio output signal in response to the detection of the presence of the audio event.
25. The method of claim 24, further comprising:
detecting a change in a spectral component of the ambient sound;
the presence of audio events, including near-sound events, is detected based on direction, presence or absence of background noise, presence or absence of speech, volume, near-field spatial statistics, and spectral content of ambient sound.
26. The method of claim 25, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.
27. The method of claim 25, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a disturbance-to-signal ratio associated with near-field sound.
28. The method of claim 25, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include an inter-microphone sound pressure level difference between the first microphone signal and the second microphone signal.
29. The method of claim 25, wherein detecting the presence of near speech in ambient sound comprises detecting stationary background noise.
30. The method of claim 25, wherein detecting the presence of near speech in ambient sound comprises detecting speech from a close-talking near talker.
31. The method of claim 25, wherein detecting the presence of near speech in the ambient sound comprises detecting a spectral flatness measure of the ambient sound from the at least one input signal, wherein detecting the spectral flatness measure of the ambient sound comprises detecting a spectral component change of the ambient sound.
32. An integrated circuit for implementing at least a portion of an audio device, the integrated circuit comprising:
an input configured to receive a first signal representing audio information;
an audio output configured to generate an audio output signal for delivery to at least one transducer of the audio device based on the first signal, the audio output operative to cause the at least one transducer to generate sound from the audio output signal;
a microphone input configured to receive at least one input signal representative of ambient sound external to the audio device; and
a processor configured to:
determining a near-field spatial statistic of the ambient sound;
detecting near-field sound and near-range sound in the ambient sound from the at least one input signal;
modifying a characteristic of the audio output signal in response to detecting near-field sound;
causing the at least one transducer to generate a modified sound from the modified audio output signal;
disabling close-in detection by a close-in detector using a music detector and a noise sound pressure level estimate.
33. The integrated circuit of claim 32, the processor further configured to:
determining a direction of the ambient sound based on the at least one input signal;
modifying a characteristic of the audio output signal in response to the direction of the ambient sound indicating that the ambient sound is sound from a user of the audio device.
34. The integrated circuit of claim 32, the processor further configured to:
determining a direction of the ambient sound based on the at least one input signal;
modifying a characteristic of the audio output signal in response to the direction of the ambient sound indicating that the ambient sound is speech from a user of the audio device.
35. The integrated circuit of claim 32, wherein modifying a characteristic of the audio output signal comprises attenuating the audio information.
36. The integrated circuit of claim 32, the processor further configured to modify a characteristic of the audio output signal in response to detecting near-field sound for at least a predetermined time.
37. The integrated circuit of claim 36, the processor further configured to:
detecting an absence of near-field sound in ambient sound from the at least one input signal;
in response to the absence of near-field sound for at least a second predetermined time, ceasing to modify the characteristic of the audio output signal.
38. The integrated circuit of claim 36, the processor further configured to:
in addition to the near-field sound, detecting ambient sound other than near-field sound from the at least one input signal;
in response to detecting ambient sound, a characteristic of the audio output signal is modified.
39. The integrated circuit of claim 38, the processor further configured to:
determining a direction of the ambient sound based on the at least one input signal;
modifying a characteristic of the audio output signal in response to the direction of the ambient sound indicating that the ambient sound is a sound other than near-field sound.
40. The integrated circuit of claim 38, the processor further configured to:
detecting whether ambient sound includes background noise based on the at least one input signal;
in response to detecting background noise in the ambient sound, modifying a characteristic of the audio output signal.
41. The integrated circuit of claim 38, the processor further configured to:
detecting whether ambient sound includes a tonal alarm based on the at least one input signal;
modifying a characteristic of the audio output signal in response to detecting a tonal alarm in the ambient sound.
42. The integrated circuit of claim 41, wherein detecting a tonal alarm in ambient sound comprises:
detecting a direction of an ambient sound from the at least one input signal;
detecting a spectral flatness measure of the ambient sound from the at least one input signal;
a tonal alarm is detected based on the direction of ambient sound, the presence or absence of background noise, and near-field spatial statistics.
43. The integrated circuit of claim 41, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.
44. The integrated circuit of claim 42, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.
45. The integrated circuit of claim 42, wherein detecting near-field spatial statistics comprises detecting whether a normalized cross-correlation statistic is greater than a threshold.
46. The integrated circuit of claim 42, wherein detecting a spectral flatness measure of the ambient sound comprises detecting whether a noise spectrum is flat in most, but not all, of the sub-bands of the ambient sound.
47. The integrated circuit of claim 32, wherein detecting near-field sounds in ambient sounds comprises:
detecting a direction of an ambient sound from the at least one input signal;
detecting a presence of speech in ambient sound from the at least one input signal;
near-field sound is detected based on the direction, the presence or absence of speech, and near-field spatial statistics.
48. The integrated circuit of claim 47, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.
49. The integrated circuit of claim 47, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a disturbance-to-signal ratio associated with near-field sound.
50. The integrated circuit of claim 47, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include an inter-microphone sound pressure level difference between the first microphone signal and the second microphone signal.
51. The integrated circuit of claim 47, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.
52. The integrated circuit of claim 47, wherein detecting near-field spatial statistics comprises:
detecting whether the normalized cross-correlation statistic is greater than a first threshold;
detecting whether the interference-to-near-field desired signal ratio is less than a second threshold;
detecting whether the inter-microphone sound pressure level difference is greater than a third threshold.
53. The integrated circuit of claim 52, wherein the second threshold is adjusted based on an estimate of background noise in ambient sound.
54. The integrated circuit of claim 52, wherein the third threshold is adjusted based on an estimate of background noise in ambient sound.
55. The integrated circuit of claim 32, the processor further configured to:
detecting a direction of an ambient sound from the at least one input signal;
detecting the presence of background noise in ambient sound from the at least one input signal;
detecting the presence of near speech in the ambient sound from the at least one input signal;
detecting a volume of ambient sound based on the at least one input signal;
detecting the presence of an audio event comprising a near-sound event based on the direction, the presence or absence of background noise, the presence or absence of speech, the volume, and near-field spatial statistics; and
modifying a characteristic of the audio output signal in response to the detection of the presence of the audio event.
56. The integrated circuit of claim 32, the processor further configured to:
detecting a change in a spectral component of the ambient sound;
detecting the presence of an audio event comprising a near-sound event based on the direction, the presence or absence of background noise, the presence or absence of speech, the volume, near-field spatial statistics, and the spectral content of the ambient sound.
57. The integrated circuit of claim 56, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.
58. The integrated circuit of claim 56, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include an interference-to-signal ratio associated with near-field sound.
59. The integrated circuit of claim 56, wherein:
the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;
the near-field spatial statistics include an inter-microphone sound pressure level difference between the first microphone signal and the second microphone signal.
60. The integrated circuit of claim 56, wherein detecting the presence of near speech in ambient sound comprises detecting stationary background noise.
61. The integrated circuit of claim 56, wherein detecting the presence of near speech in ambient sound comprises detecting speech from a close-talking near talker.
62. The integrated circuit of claim 56, wherein detecting the presence of near speech in the ambient sound comprises detecting a spectral flatness measure of the ambient sound from the at least one input signal, wherein detecting a spectral flatness measure of the ambient sound comprises detecting a change in a spectral composition of the ambient sound.
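The change in spectral composition referenced in claims 56 and 62 could, for illustration, be measured as a frame-to-frame distance between log-magnitude spectra; the FFT size and the change threshold below are assumptions.

import numpy as np

def spectral_change(prev_frame, curr_frame, n_fft=256):
    # Mean absolute difference between successive log-magnitude spectra
    # (equal-length frames assumed).
    win = np.hanning(len(curr_frame))
    prev_spec = np.log(np.abs(np.fft.rfft(prev_frame * win, n_fft)) + 1e-12)
    curr_spec = np.log(np.abs(np.fft.rfft(curr_frame * win, n_fft)) + 1e-12)
    return float(np.mean(np.abs(curr_spec - prev_spec)))

def spectral_composition_changed(prev_frame, curr_frame, change_thresh=1.0):
    return spectral_change(prev_frame, curr_frame) > change_thresh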
63. A method for processing audio information in an audio device, the method comprising:
receiving a first signal representing audio information;
generating, based on the first signal, an audio output signal for delivery to at least one transducer of the audio device;
causing the at least one transducer to generate sound from the audio output signal;
receiving at least one input signal representing ambient sound external to the audio device;
determining a near-field spatial statistic of the ambient sound;
detecting an audio event comprising a close-in sound from the at least one input signal;
modifying a characteristic of the audio output signal in response to detecting an audio event for at least a predetermined time;
causing the at least one transducer to generate a modified sound from the modified audio output signal;
disabling close-in detection by a close-in detector using a music detector and a noise sound pressure level estimate.
64. The method of claim 63, further comprising ceasing to modify the characteristic of the audio information in response to the absence of an audio event for at least a second predetermined time.
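The timing behaviour of claims 63, 64, 66 and 67 (modify playback only after the event has persisted for a first predetermined time, and stop modifying it only after the event has been absent for a second predetermined time) can be sketched as a simple attack/release counter; the frame counts and the ducking gain are assumptions about one possible "modified characteristic".

class PlaybackDucker:
    def __init__(self, attack_frames=10, release_frames=50, duck_gain=0.1):
        self.attack_frames = attack_frames    # first predetermined time, in frames
        self.release_frames = release_frames  # second predetermined time, in frames
        self.duck_gain = duck_gain            # assumed characteristic change: a gain
        self._present = 0
        self._absent = 0
        self.ducking = False

    def update(self, event_detected):
        # Returns the gain to apply to the audio output signal for this frame.
        if event_detected:
            self._present += 1
            self._absent = 0
        else:
            self._absent += 1
            self._present = 0
        if not self.ducking and self._present >= self.attack_frames:
            self.ducking = True
        elif self.ducking and self._absent >= self.release_frames:
            self.ducking = False
        return self.duck_gain if self.ducking else 1.0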
65. The method of claim 63, wherein the audio event comprises at least one of a near-field event, a near event, and an alarm event.
66. An integrated circuit for implementing at least a portion of an audio device, the integrated circuit comprising:
an input configured to receive a first signal representing audio information;
an audio output configured to generate an audio output signal for delivery to at least one transducer of the audio device based on the first signal, the audio output operative to cause the at least one transducer to generate sound from the audio output signal;
a microphone input configured to receive at least one input signal representative of ambient sound external to the audio device; and
a processor configured to:
determining a near-field spatial statistic of the ambient sound;
detecting an audio event comprising a close-in sound from the input signal;
modifying a characteristic of the audio output signal in response to detecting an audio event for at least a predetermined time;
causing the at least one transducer to generate a modified sound from the modified audio output signal;
disabling close-in detection by a close-in detector using a music detector and a noise sound pressure level estimate.
67. The integrated circuit of claim 66, the processor further configured to stop modifying the characteristic of the audio output signal in response to the absence of an audio event for at least a second predetermined time.
68. The integrated circuit of claim 66, wherein the audio event comprises at least one of a near-field event, a near event, and an alarm event.
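The final step of claims 63 and 66, disabling close-in detection when a music detector fires or the estimated noise sound pressure level is high, might be gated as below; the SPL limit and the function names are assumptions.

def close_in_detection_enabled(music_detected, noise_spl_db, max_noise_spl_db=75.0):
    # Disable the close-in detector while music is present in the ambient sound
    # or while the noise SPL estimate is too high for a reliable decision.
    return (not music_detected) and (noise_spl_db < max_noise_spl_db)

def detect_close_in(near_field_flag, music_detected, noise_spl_db):
    # Run the close-in decision only when the gate allows it (sketch).
    if not close_in_detection_enabled(music_detected, noise_spl_db):
        return False
    return bool(near_field_flag)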
CN201680058340.7A 2015-08-07 2016-08-05 Event detection for playback management in audio devices Active CN108141694B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201562202303P 2015-08-07 2015-08-07
US62/202,303 2015-08-07
US201562237868P 2015-10-06 2015-10-06
US62/237,868 2015-10-06
US201662351499P 2016-06-17 2016-06-17
US62/351,499 2016-06-17
PCT/US2016/045834 WO2017027397A2 (en) 2015-08-07 2016-08-05 Event detection for playback management in an audio device

Publications (2)

Publication Number Publication Date
CN108141694A (en) 2018-06-08
CN108141694B (en) 2021-03-16

Family

ID=62079093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680058340.7A Active CN108141694B (en) 2015-08-07 2016-08-05 Event detection for playback management in audio devices

Country Status (2)

Country Link
EP (1) EP3332558B1 (en)
CN (1) CN108141694B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4306115A (en) * 1980-03-19 1981-12-15 Humphrey Francis S Automatic volume control system
JP2004336251A (en) * 2003-05-02 2004-11-25 Alpine Electronics Inc Hearing defect preventing apparatus
CN1682441A (en) * 2002-07-26 2005-10-12 摩托罗拉公司(在特拉华州注册的公司) Electrical impedance based audio compensation in audio devices and methods therefor
JP2011097268A (en) * 2009-10-28 2011-05-12 Sony Corp Playback device, headphone, and playback method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8150044B2 (en) * 2006-12-31 2012-04-03 Personics Holdings Inc. Method and device configured for sound signature detection
US9270244B2 (en) * 2013-03-13 2016-02-23 Personics Holdings, Llc System and method to detect close voice sources and automatically enhance situation awareness
US9338551B2 (en) * 2013-03-15 2016-05-10 Broadcom Corporation Multi-microphone source tracking and noise suppression

Also Published As

Publication number Publication date
CN108141694A (en) 2018-06-08
EP3332558A2 (en) 2018-06-13
EP3332558B1 (en) 2021-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant