GB2499781A - Acoustic information used to determine a user's mouth state which leads to operation of a voice activity detector - Google Patents


Info

Publication number
GB2499781A
Authority
GB
United Kingdom
Prior art keywords
mouth
signal
frequency
proximity
ultrasonic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB201202662A
Other versions
GB201202662D0 (en)
Inventor
Ian Vince Mcloughlin
Faraneh Ahmadi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to GB201202662A priority Critical patent/GB2499781A/en
Publication of GB201202662D0 publication Critical patent/GB201202662D0/en
Publication of GB2499781A publication Critical patent/GB2499781A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/52 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00
    • G01S7/539 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S15/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/84 - Detection of presence or absence of voice signals for discriminating voice from noise

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

An apparatus and a method for using low-frequency ultrasonic information to improve speech communications, recognition or other processing tasks through determination of mouth state, openness, orientation, proximity, shape and so on. A low-frequency ultrasonic signal, which can be a chirp pulse train, is generated within a device located in the proximity of the mouth and transmitted towards the user's face, and the acoustic signal reflected from the human face is received back at this or another cooperating device. The received signals are analysed in both time-domain and frequency-domain representations to reveal specific and useful information pertaining to the user's mouth state. The system comprises a voice activity detector (VAD), and the mouth-state information is used to ensure that it operates only when speech is present and not when only background noise is present.

Description

Method and apparatus for mouth state determination using acoustic information
BACKGROUND
Devices that capture human speech generally use microphones to pick up the sound, but they also pick up interfering noise (such as extraneous acoustic background noise and electrical noise) at the same time. This interfering noise is mixed in with the speech and corrupts, or reduces the quality of, the recorded signal.
In general terms, the further away the microphone is from the mouth, the greater the proportion of background noise that is picked up. Although some types of background noise can be largely removed using post-processing, many common types of noise cannot be effectively removed. Also, different levels of background noise exist in different usage situations. For example, a library environment may exhibit very low levels of background noise, whereas a busy railway platform may exhibit high levels. Most speech systems are required to be capable of operating in both classes of environment.
Systems such as mobile telephones commonly include a voice activity detector (VAD), which triggers whenever a spoken voice is detected. Such systems will often only process and transmit sound once the VAD has been triggered. At other times they will remain idle (and thus consume less power). This can be true for mobile phones, video conferencing systems, speech recognition systems and voice recorders. The VAD will often have a "hang time" of around a second, meaning that it will remain turned on for this length of time even after it has detected that speech has ceased.
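For illustration only (this sketch is not part of the patent), a conventional energy-based VAD with the hang-time behaviour described above might look like the following; the frame rate, energy threshold and one-second hang time are assumed, representative values.

```python
import numpy as np

def energy_vad_with_hang_time(frames, energy_threshold=1e-3,
                              frame_rate=100, hang_time_s=1.0):
    """Toy energy-based VAD: a frame is flagged as 'speech' when its energy
    exceeds a threshold, and the flag is held high for a hang time afterwards."""
    hang_frames = int(hang_time_s * frame_rate)  # e.g. 1 s at 100 frames/s
    countdown = 0
    decisions = []
    for frame in frames:
        if np.mean(np.square(frame)) > energy_threshold:
            countdown = hang_frames              # (re)start the hang timer
        decisions.append(countdown > 0)
        countdown = max(0, countdown - 1)
    return decisions
```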
In normal use in a mobile phone scenario, if the user of such a system is involved in a conversation, he may typically speak for only 40% of the time. The use of a VAD switch means that the system can save 60% of the energy that would otherwise be spent on running complex coding algorithms. Likewise, the system can save up to 60% of the data that must be transmitted over the wireless connection.
In noisy environments, background noise will often trigger the VAD in such systems, even when the user is not speaking. Three negative consequences of this are that (i) the system assumes that speech is present when in fact it is not, and can consume a significant amount of energy attempting to encode or process speech that is not really there, (ii) in a communications system, the transmission channel (which would normally turn on to transmit useful data when speech is present) will be actively transmitting for a greater proportion of time, and will spend much of that time transmitting nothing more than noise, and (iii) non-speech noise signals will be mistaken for speech, and will therefore be processed, transmitted and eventually heard (as corrupting noise) at the other end of the communications system. This reduces the quality and impairs the intelligibility of the speech communications.
STATEMENT OF INVENTION
The present invention is able to overcome these issues in a number of ways. Firstly, it can determine the mouth state of a user (e.g. open, closed or in between). This is accomplished by examining the signals from one audio transducer, or two audio transducers (such as a microphone and a loudspeaker), in either a passive or an active configuration. In the active configuration, a transducer (e.g. a loudspeaker) generates an acoustic signal. This signal propagates, through the air, by skin contact or by body transmission, to the face and head of the user. In turn, an acoustic return signal is picked up a short time later by a transducer (e.g. a microphone). In a passive configuration, signals that are already present or generated elsewhere are used as the source.
The information gleaned from the return signal is not just a static indicator of whether the mouth is open or closed; it includes the dynamic statistics of mouth opening and closing, reveals face shape, and can differentiate degrees of mouth opening. The dynamic information reveals prosody, syllabic rate, word rate and sentence rate during speech. All of these are items of information useful to applications such as speech processing, and particularly speech recognition.
The invention, in its active configuration, transmits an acoustic signal such as a chirp pulse train (a linear swept-frequency cosine) from a source such as a loudspeaker towards the face of the user, and receives back an altered signal (due to reflection, conduction and other mechanisms). The signals used can be audible or inaudible (such as ultrasonic), and in the preferred embodiment the inventors use a chirp pulse train within a frequency range of approximately 14 to 21 kHz, which is slightly above the threshold of hearing for most people (this is termed low-frequency ultrasonics).
ADVANTAGES
Firstly, the system allows the operation of existing speech processing systems to be improved. For example, (i) systems need only operate when speech is present (as with the VAD, but without being triggered accidentally by background noises), (ii) systems need only transmit sound when speech is present and will not find themselves largely transmitting background noise, (iii) systems can ignore background noises - even very speech-like noises that would always trigger a VAD - that occur when the user's mouth is closed, and (iv) speech recognition and/or processing systems can determine syllabic, prosodic, word and sentence rates, and adapt their energy use, data transmission use and CPU scheduling patterns accordingly. This will reduce energy consumption and increase the recognition accuracy of received speech. It will also allow such systems to adapt to new users more quickly.
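As a minimal, hypothetical illustration of point (iii) above, a mouth-state flag derived from the ultrasonic analysis could simply gate an existing VAD decision; the function and its inputs are assumptions made for this sketch rather than an interface defined by the patent.

```python
def speech_active(vad_triggered: bool, mouth_open_recently: bool) -> bool:
    """Report speech activity only when the conventional VAD triggers AND the
    ultrasonic mouth-state analysis indicates the mouth is, or has recently
    been, open; speech-like background noise with a closed mouth is ignored."""
    return vad_triggered and mouth_open_recently
```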
Using a near-audible ultrasonic chirp (such as the one starting at 14kHz mentioned above), a user will not hear any sound, and yet normal transducers can be used, like the microphones and loudspeakers already built into most modern smart phones. This means that the present invention can be used in a smart phone without requiring any special additional hardware to be installed.
The signal processing necessary to decode and analyse the received acoustic signal can range from very low complexity techniques with reasonable accuracy that determine only an open or closed state, up to far more advanced techniques that yield degree of mouth opening, as well as the dynamic statistics mentioned above which are of great advantage to computer speech recognition and associated systems (e.g. automatic speech recognition, pass phrase recognition or validation, speaker recognition or validation, language detection or validation, emotional state detection and so on). The revealed face information can also be used for security purposes.
INTRODUCTION TO DRAWINGS
An example of the invention will now be described by referring to the accompanying drawings.
• Figure 1 shows the invention installed within a standard mobile phone, operated by a user.
• Figure 2 illustrates an acoustic signal propagating from the unit containing the invention, impinging upon the face of the user, and propagating back to the unit containing the invention, where it is captured and analysed.
• Figure 3 shows the computational analytical hardware connected to transducers (in this case a microphone and a loudspeaker).
• Figure 4 provides detail of the analytical system used within the computational hardware.
• Figure 5 shows how the output of the analytic system serves as an input to existing and common speech processing algorithms.
• Figure 6 shows the chirp signal as generated by the invention, showing normalised amplitude (y-axis) plotted against normalised time (x-axis).
• Figure 7 plots the received signal in the time domain after being reflected back from the mouth of the user (as amplitude against time), showing periods of closed mouth, followed by open mouth, and then closed mouth again. This plot spans a total of 15 received chirps.
• Figure 8 shows detection analysis results for mouth closed, open and then closed.
• Figure 9 shows a time-frequency analysis of the closed/open/closed reflected chirp signal.
DETAILED DESCRIPTION
In practice, the invention may exist within a mobile phone 101, where the presence of the invention may not even be obvious to the user 103, unless he notices the longer battery life and better quality speech transmission that the invention can lead to. In use, as the user locates the mobile phone in a natural orientation for speaking and listening, an acoustic signal 102 produced by the mobile phone impinges upon the user's face, lip and mouth area, is reflected, and is then received back by the device. A second embodiment can also be implemented as an external headset or other device used for vocal communications, which could also be attached to a mobile phone, to a sound recorder or to some other speech processing unit.
Hardware and software within the mobile phone 101 would pick up the received signal to determine the state of the user's face. The most important characteristic of this state for the present application is whether the mouth is open or closed. Whilst the present apparatus is designed to detect either an open or a closed mouth, identical hardware and signal processing using a different decision-making process could also be used to provide an estimate of the degree of mouth openness, and indeed also to gauge the proximity and orientation of the mobile phone with respect to the user's mouth.
The front-end device 101 preferably creates and causes an ultrasonic signal 102 to impinge upon the face of the user 103. The reflection received back at 101 has different signal characteristics depending upon the shape of the face 103 and the proximity of 101 to 103.
As with most current digital audio and speech systems, one or more units of computational hardware 110 drive a loudspeaker 124 through a digital-to-analogue converter 120 and conditioner 123. Similarly, received acoustic signals are captured with a microphone 119, conditioned 118 and then converted to a form suitable for the computational hardware using an analogue-to-digital converter 115.
The computational hardware block 110, as well as creating the signal to be output, also analyses the received information. Those skilled in the art would recognise that 119 and 124 could be the same physical transducer device, and that there is no fixed requirement for 110 to handle both receive and transmit; these signals could equally well be handled by separate hardware, but are combined within the present embodiment of this invention for reasons of cost and efficiency.
It should also be recognised that 110, 115, 118, 119, 120, 123 and 124 are not specialised hardware - they are common to the vast majority of modern audio equipment that is capable of both recording and playback of sound. One requirement is that the digital elements of the system, the analogue signal paths, and the transducers 119, 124 are all capable of handling the transmitted and received signals. For example, if the chirp frequency lies between 14 kHz and 21 kHz then the hardware should be capable of handling signals of up to 21 kHz. For the digital part, the well-known Nyquist criterion states that the sample rate should be at least 42 kHz. In practice, most digital audio systems are able to operate with sample rates of at least 44.1 kHz.
Ideally, the system should operate with inaudible signals so as not to impede normal operation. This implies either ultrasonic or infrasonic signals, although low-frequency ultrasonics are used in the preferred embodiment. These lie just above the threshold of human hearing (i.e. above 14 kHz for most people, or 20 kHz for 'golden-eared' audiophiles). We place no upper limit on this frequency; however, for this invention to be technologically attractive for use in smart phones, it is likely that the optimal signal range would need to be similar to that used in the demonstration system (approximately 14 kHz to 21 kHz): this can be handled by most existing speakers and microphones, does not require extra regulatory approval (which higher ultrasound frequencies might require), and is easy to both generate and process.
In fact, a plurality of possible signals could be transmitted from the device. We have found that many alternatives can be made to work, but preferably signals that spread their energy across the frequency band of interest. These include pulses, steps, white noise and chirps. Also, we prefer to actively generate this ultrasonic signal; however, the use and detection of passive signals is also possible.
Best results have been demonstrated by transmitting a linear chirp 300 that slides in frequency in equal steps from 14 kHz up to 21 kHz at a repetition rate of perhaps 0.3 Hz to 5 Hz (a reduced repetition rate does not affect the operation of the analysis, only how frequently the mouth open/mouth closed determination is made and the degree of computational power required to process it).
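For illustration, such a transmit signal 300 could be synthesised with standard tools as sketched below; the 48 kHz sample rate, 50 ms chirp length and 2 Hz repetition rate are assumed values within the ranges quoted above, not figures fixed by the patent.

```python
import numpy as np
from scipy.signal import chirp

def make_chirp_train(fs=48000, f0=14000.0, f1=21000.0,
                     chirp_len_s=0.05, rep_rate_hz=2.0, n_pulses=10):
    """Build a pulse train of linear chirps sweeping f0 -> f1.
    Each period contains one chirp of chirp_len_s seconds followed by silence."""
    period_s = 1.0 / rep_rate_hz
    t = np.arange(int(chirp_len_s * fs)) / fs
    pulse = chirp(t, f0=f0, t1=chirp_len_s, f1=f1, method='linear')
    silence = np.zeros(int((period_s - chirp_len_s) * fs))
    return np.tile(np.concatenate([pulse, silence]), n_pulses)
```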
The invention can be operated in real-time (that is, the analysis of mouth open/mouth closed is made immediately after each chirp has been received), or the signals may be recorded, stored, and processed in retrospect. In this case, the same determination is made, chirp-by-chirp at the time of analysis.
In general use, a signal generator block 171, operating inside the computation hardware unit 110, would produce the acoustic signal 300, called the excitation signal, in conjunction with the acoustic transmission transducer 124, nominally housed in a handset or headset 134 located as close to the mouth of the user 103 as possible.
As already described, the acoustic signal impinges upon the face of the user 103. The acoustic transducer 119 receives a signal 400 which may contain recognisable periodic chirp signals, but ones which differ from the originally transmitted chirps 300.
The analysis of the received signal 400 is key to this invention. One preferred method of analysis is to begin by comparing each transmitted chirp 300, as output from the signal generator 171, with the received chirp 400, as captured by the input transducer 134 and associated hardware. The time shift between the two signals yields information pertaining to the distance between the transducers 134 and the user's face 103.
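As a sketch of how that time shift might be measured in practice (not a method prescribed by the patent), the received signal can be cross-correlated with the known transmitted chirp and the lag of the correlation peak converted to a round-trip distance; the 343 m/s speed of sound below is an assumed illustrative value.

```python
import numpy as np
from scipy.signal import correlate

def estimate_round_trip_distance(tx_chirp, rx_signal, fs=48000, c=343.0):
    """Estimate transducer-to-face distance from the lag between the
    transmitted chirp and its reflection (speed of sound c in m/s)."""
    corr = correlate(rx_signal, tx_chirp, mode='full')
    lag = np.argmax(np.abs(corr)) - (len(tx_chirp) - 1)  # delay in samples
    delay_s = max(lag, 0) / fs
    return 0.5 * c * delay_s          # halve: out-and-back path
```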
The amplitude envelope of the received chirp 400 also reveals the resonant frequencies of the spatial resonant chamber formed between the loudspeaker, face/mouth and the microphone (i.e. between 134 and 103). Very clearly, the resonant pattern in the received signal 400 changes between the two conditions of mouth open 402 and mouth closed 401.
The simplest explanation is that the human vocal tract is a highly resonant cavity, so opening the mouth provides the transmitted ultrasonic signal 300 with a frequency selective resonance chamber (comprising the mouth plus vocal tract, and taking into account the resonances between the transducers and face).
As a result, the envelope of the received signal 400 in the open-mouth state 402 shows significant peaks and troughs indicating the resonances within the mouth/vocal tract/face system (similar to the occurrence of formants in audible speech). This is contrary to the closed-mouth state 401, in which the received signal is generally simply a reflection of the chirp from the facial skin, hence it is still chirp-like in shape.
In a dynamic context, mouth open and mouth closed conditions are determined through the difference between these chirp responses (i.e. the change in chirp response as the mouth is opened and closed). In a static context, the determination can be made by comparing amplitudes at various frequency positions. It is also possible, as anyone skilled in the art would know, that pattern matching and/or parametric determination can be used either in the frequency domain or in the time domain to interpret these signals.
A preferred analysis method uses a double approach algorithm. The received 14-21 kHz chirp signal 400, sampled at 96 kHz or 48 kHz, is first demodulated to baseband to cover the frequency span of 0-7 kHz, and then re-sampled at 32 kHz. It is next segmented into overlapping segments; a segment length of around one second works well. For each segment, the beginning of the chirp can be detected using autocorrelation 172 between the known generated chirp 300 and the segment being analysed 175.
The resulting autocorrelation or comparison output has clear peaks at the beginning of the chirp pulses. Since the linear source-filter theory applies to the ultrasonic excitations of the vocal tract (VT), the VT acts as the filter for the chirp signal (the source, 300). Consequently, the envelope of the received reflected chirp can be considered analogous to the frequency spectrum of the vocal tract. This envelope, 416, is extracted as the basis for detecting the status of the mouth.
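A minimal sketch of this front end - demodulation of the 14-21 kHz band to 0-7 kHz, resampling to 32 kHz, segmentation into overlapping one-second pieces, chirp-onset detection by correlation, and extraction of the envelope 416 - is given below. The mix-and-low-pass demodulation, the Butterworth filter and the Hilbert-transform envelope are assumptions about one plausible implementation; the patent does not prescribe these particular methods.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly, correlate, hilbert

def demodulate_to_baseband(x, fs=48000, f_lo=14000.0, bw=7000.0):
    """Mix the 14-21 kHz chirp band down to 0-7 kHz and low-pass filter
    (one plausible demodulation method, assumed for illustration)."""
    t = np.arange(len(x)) / fs
    mixed = x * np.cos(2 * np.pi * f_lo * t)      # shift the band down by 14 kHz
    b, a = butter(6, bw / (fs / 2), btype='low')  # keep roughly 0-7 kHz
    return filtfilt(b, a, mixed)

def find_chirp_onsets(baseband, ref_chirp_bb, fs_in=48000, fs_out=32000,
                      seg_len_s=1.0, hop_s=0.5):
    """Resample to 32 kHz, segment into overlapping ~1 s pieces and locate the
    chirp start in each segment by correlation with the known baseband chirp."""
    y = resample_poly(baseband, fs_out, fs_in)
    seg_len, hop = int(seg_len_s * fs_out), int(hop_s * fs_out)
    onsets = []
    for start in range(0, len(y) - seg_len + 1, hop):
        corr = correlate(y[start:start + seg_len], ref_chirp_bb, mode='valid')
        onsets.append(start + int(np.argmax(np.abs(corr))))
    return y, onsets

def chirp_envelope(chirp_segment):
    """Amplitude envelope (416) of one received chirp, here via the analytic
    signal (Hilbert transform) -- an assumed, not prescribed, method."""
    return np.abs(hilbert(chirp_segment))
```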
A preferred method is to apply twin detecting approaches to determine the mouth status. The first considers the increase in the number of peaks in the received frequency spectrum 400, 410, when the mouth is open. Since in the open state 402 VT resonances appear in the response, the number of envelope 416 peaks dramatically increases, and this can be counted for decision making, 417. The second approach assumes the resonances of the vocal tract to have distinct peaks and troughs. Considering Xp to denote the peaks, Xv to denote the valleys, and Up and Uv to denote the mean values of each, a simple metric, C, can be derived:
C = E[(Xp - Up)²] + E[(Xv - Uv)²]
The metric C thus indicates the variance of the peaks and valleys in the power spectrum and shows a clear increase when the mouth opens, which can be used as an indication of the mouth state. Applying a threshold to C yields an indication 414 of the mouth state. Other detection approaches are usable, including zero-crossing rate, kurtosis determination and differential energy.
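The twin detection approaches could then be applied to that envelope as sketched below: counting envelope peaks, and computing the peak/valley variance metric C defined above. The peak-prominence setting and the decision thresholds are assumed, illustrative values; the patent states only that a threshold is applied to C.

```python
import numpy as np
from scipy.signal import find_peaks

def mouth_state_metrics(envelope):
    """Return (number of envelope peaks, C) where
    C = E[(Xp - Up)^2] + E[(Xv - Uv)^2] over envelope peaks Xp and valleys Xv."""
    prominence = 0.05 * np.max(envelope)          # assumed peak-picking setting
    peaks, _ = find_peaks(envelope, prominence=prominence)
    valleys, _ = find_peaks(-envelope, prominence=prominence)
    xp, xv = envelope[peaks], envelope[valleys]
    if len(xp) == 0 or len(xv) == 0:
        return len(peaks), 0.0
    c = np.mean((xp - xp.mean()) ** 2) + np.mean((xv - xv.mean()) ** 2)
    return len(peaks), c

def mouth_is_open(envelope, peak_count_threshold=10, c_threshold=0.01):
    """Illustrative decision 414/417: declare the mouth open if either the
    peak count or the variance metric C exceeds its (assumed) threshold."""
    n_peaks, c = mouth_state_metrics(envelope)
    return n_peaks > peak_count_threshold or c > c_threshold
```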
It should also be clear to those skilled in the art that frequency domain analysis can equally be used. In this case the received reflected chirp signal 400, 410, 420 can be analysed either continually or on a frame-by-frame basis by being converted to the frequency domain 173. The resulting analysis 176 could include time-frequency analysis or any one of a plethora of similar techniques. The resulting signal 420, similar to 410, shows a smooth spectrum when the mouth is closed, but as soon as it opens, 421, the spectrum becomes significantly more peaky. Once the mouth closes, 422, the spectrum becomes smooth again.
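As one assumed way of quantifying the 'peakiness' described here, spectral flatness within the chirp band could be computed frame by frame from a time-frequency analysis; the STFT parameters and the use of flatness as the measure are illustrative choices, not taken from the patent.

```python
import numpy as np
from scipy.signal import stft

def band_spectral_flatness(received, fs=48000, band=(14000.0, 21000.0),
                           nperseg=1024):
    """Per-frame spectral flatness inside the chirp band: near 1 for a smooth
    (closed-mouth) spectrum 410/420, much lower for a peaky (open-mouth) one 421."""
    f, t, Z = stft(received, fs=fs, nperseg=nperseg)
    in_band = (f >= band[0]) & (f <= band[1])
    power = np.abs(Z[in_band, :]) ** 2 + 1e-12    # avoid log(0)
    flatness = np.exp(np.mean(np.log(power), axis=0)) / np.mean(power, axis=0)
    return t, flatness                            # low flatness suggests mouth open
```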

Claims (16)

1. A system that uses acoustic information to detect the proximity, shape, orientation and features of a nearby human face.
2. A system according to claim 1 in which the acoustic information comprises ultrasonic sounds.
3. A system according to claim 2 in which the ultrasonic sounds lie just above the threshold of human hearing, referred to here as "low-frequency ultrasonic signals".
4. A system according to claim 1 that generates an acoustic signal internally, and outputs this from a transducer such as a loudspeaker.
5. A system according to claim 1 that receives acoustic information from a transducer such as a microphone.
6. A system according to claim 4 in which the acoustic signal is generated according to a predefined or adjustable specification.
7. A system according to claims 3 and 6 in which the received ultrasonic signal is stored or processed periodically within a portable or mobile device or headset.
8. A system according to claims 4 and 5 in which the generated signal is a swept-frequency signal.
9. A system according to claim 3 in which swept-frequency low-frequency ultrasonic signals, reflected from a human face, are captured and analysed to reveal the proximity or features of that face.
10. A system according to claim 9 that determines the degree of mouth opening or mouth shape by analysing the captured acoustic signal.
11. A method that uses low-frequency ultrasonic reflection from the human face to detect the mouth state and/or proximity.
12. A system that uses standard audio hardware to generate near-audible low-frequency ultrasonic swept-frequency signals and receive and analyse the reflected version of these same signals.
13. A system that excites the human mouth, nasal tract and vocal tract from the proximity of the mouth using ultrasonic excitation.
14. A system according to claim 13 that obtains resonances at feature sizes comparable to those of audible speech by making use of low-frequency ultrasonic excitation.
15. A system that uses a co-located excitation generator such as loudspeaker or buzzer with a detector or transducer such as a microphone, located in the proximity of the human mouth, for the purpose of determining mouth shape.
16. A system according to claim 12 that measures the presence of peaks in the time domain received signal to detect resonances.
GB201202662A 2012-02-16 2012-02-16 Acoustic information used to determine a user's mouth state which leads to operation of a voice activity detector Withdrawn GB2499781A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB201202662A GB2499781A (en) 2012-02-16 2012-02-16 Acoustic information used to determine a user's mouth state which leads to operation of a voice activity detector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB201202662A GB2499781A (en) 2012-02-16 2012-02-16 Acoustic information used to determine a user's mouth state which leads to operation of a voice activity detector

Publications (2)

Publication Number Publication Date
GB201202662D0 GB201202662D0 (en) 2012-04-04
GB2499781A true GB2499781A (en) 2013-09-04

Family

ID=45939716

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201202662A Withdrawn GB2499781A (en) 2012-02-16 2012-02-16 Acoustic information used to determine a user's mouth state which leads to operation of a voice activity detector

Country Status (1)

Country Link
GB (1) GB2499781A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5706394A (en) * 1993-11-30 1998-01-06 At&T Telecommunications speech signal improvement by reduction of residual noise
US6483532B1 (en) * 1998-07-13 2002-11-19 Netergy Microelectronics, Inc. Video-assisted audio signal processing system and method
US20030128848A1 (en) * 2001-07-12 2003-07-10 Burnett Gregory C. Method and apparatus for removing noise from electronic signals
EP1443498A1 (en) * 2003-01-24 2004-08-04 Sony Ericsson Mobile Communications AB Noise reduction and audio-visual speech activity detection
WO2004077090A1 (en) * 2003-02-25 2004-09-10 Oticon A/S Method for detection of own voice activity in a communication device
WO2010048635A1 (en) * 2008-10-24 2010-04-29 Aliphcom, Inc. Acoustic voice activity detection (avad) for electronic systems

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI474317B (en) * 2012-07-06 2015-02-21 Realtek Semiconductor Corp Signal processing apparatus and signal processing method
US8972252B2 (en) 2012-07-06 2015-03-03 Realtek Semiconductor Corp. Signal processing apparatus having voice activity detection unit and related signal processing methods
CN107076840A (en) * 2014-10-02 2017-08-18 美商楼氏电子有限公司 Acoustic equipment with double MEMS devices
US11042616B2 (en) 2017-06-27 2021-06-22 Cirrus Logic, Inc. Detection of replay attack
US10770076B2 (en) 2017-06-28 2020-09-08 Cirrus Logic, Inc. Magnetic detection of replay attack
US11704397B2 (en) 2017-06-28 2023-07-18 Cirrus Logic, Inc. Detection of replay attack
US11164588B2 (en) 2017-06-28 2021-11-02 Cirrus Logic, Inc. Magnetic detection of replay attack
US10853464B2 (en) 2017-06-28 2020-12-01 Cirrus Logic, Inc. Detection of replay attack
US11714888B2 (en) 2017-07-07 2023-08-01 Cirrus Logic Inc. Methods, apparatus and systems for biometric processes
US11042618B2 (en) 2017-07-07 2021-06-22 Cirrus Logic, Inc. Methods, apparatus and systems for biometric processes
US11755701B2 (en) 2017-07-07 2023-09-12 Cirrus Logic Inc. Methods, apparatus and systems for authentication
US11829461B2 (en) 2017-07-07 2023-11-28 Cirrus Logic Inc. Methods, apparatus and systems for audio playback
US10984083B2 (en) 2017-07-07 2021-04-20 Cirrus Logic, Inc. Authentication of user using ear biometric data
US11042617B2 (en) 2017-07-07 2021-06-22 Cirrus Logic, Inc. Methods, apparatus and systems for biometric processes
US10847165B2 (en) 2017-10-13 2020-11-24 Cirrus Logic, Inc. Detection of liveness
US10839808B2 (en) 2017-10-13 2020-11-17 Cirrus Logic, Inc. Detection of replay attack
US11023755B2 (en) 2017-10-13 2021-06-01 Cirrus Logic, Inc. Detection of liveness
US11017252B2 (en) 2017-10-13 2021-05-25 Cirrus Logic, Inc. Detection of liveness
US10832702B2 (en) 2017-10-13 2020-11-10 Cirrus Logic, Inc. Robustness of speech processing system against ultrasound and dolphin attacks
US11705135B2 (en) 2017-10-13 2023-07-18 Cirrus Logic, Inc. Detection of liveness
US11270707B2 (en) 2017-10-13 2022-03-08 Cirrus Logic, Inc. Analysing speech signals
US11276409B2 (en) 2017-11-14 2022-03-15 Cirrus Logic, Inc. Detection of replay attack
US11051117B2 (en) 2017-11-14 2021-06-29 Cirrus Logic, Inc. Detection of loudspeaker playback
US10616701B2 (en) 2017-11-14 2020-04-07 Cirrus Logic, Inc. Detection of loudspeaker playback
US11264037B2 (en) 2018-01-23 2022-03-01 Cirrus Logic, Inc. Speaker identification
US11475899B2 (en) 2018-01-23 2022-10-18 Cirrus Logic, Inc. Speaker identification
US11735189B2 (en) 2018-01-23 2023-08-22 Cirrus Logic, Inc. Speaker identification
US11694695B2 (en) 2018-01-23 2023-07-04 Cirrus Logic, Inc. Speaker identification
US10529356B2 (en) 2018-05-15 2020-01-07 Cirrus Logic, Inc. Detecting unwanted audio signal components by comparing signals processed with differing linearity
US10692490B2 (en) 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
US11631402B2 (en) 2018-07-31 2023-04-18 Cirrus Logic, Inc. Detection of replay attack
US10915614B2 (en) 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
US11748462B2 (en) 2018-08-31 2023-09-05 Cirrus Logic Inc. Biometric authentication
US11037574B2 (en) 2018-09-05 2021-06-15 Cirrus Logic, Inc. Speaker recognition and speaker change detection
EP3902283A4 (en) * 2018-12-19 2022-01-12 NEC Corporation Information processing device, wearable apparatus, information processing method, and storage medium
CN113455017A (en) * 2018-12-19 2021-09-28 日本电气株式会社 Information processing device, wearable device, information processing method, and storage medium
US11895455B2 (en) 2018-12-19 2024-02-06 Nec Corporation Information processing device, wearable device, information processing method, and storage medium

Also Published As

Publication number Publication date
GB201202662D0 (en) 2012-04-04

Similar Documents

Publication Publication Date Title
GB2499781A (en) Acoustic information used to determine a user's mouth state which leads to operation of a voice activity detector
US9165567B2 (en) Systems, methods, and apparatus for speech feature detection
EP2633519B1 (en) Method and apparatus for voice activity detection
US8620672B2 (en) Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal
US9305567B2 (en) Systems and methods for audio signal processing
US8321214B2 (en) Systems, methods, and apparatus for multichannel signal amplitude balancing
US10218327B2 (en) Dynamic enhancement of audio (DAE) in headset systems
EP2599329B1 (en) System, method, apparatus, and computer-readable medium for multi-microphone location-selective processing
KR101444100B1 (en) Noise cancelling method and apparatus from the mixed sound
US8284947B2 (en) Reverberation estimation and suppression system
EP2770750B1 (en) Detecting and switching between noise reduction modes in multi-microphone mobile devices
US9959886B2 (en) Spectral comb voice activity detection
US11290802B1 (en) Voice detection using hearable devices
McLoughlin The use of low-frequency ultrasound for voice activity detection
Haderlein et al. Speech recognition with μ-law companded features on reverberated signals
Moir et al. Knowing the wheat from the weeds in noisy speech

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)