US11863710B2 - Audio device and method for detecting device status of audio device in audio/video conference - Google Patents

Audio device and method for detecting device status of audio device in audio/video conference

Info

Publication number
US11863710B2
US11863710B2 (application US17/515,909 / US202117515909A)
Authority
US
United States
Prior art keywords
status
loudspeaker
microphone
filter coefficients
audio device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US17/515,909
Other versions
US20230133061A1 (en)
Inventor
Shaw-Min Lei
Yiou-Wen Cheng
Liang-Che Sun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MediaTek Inc
Original Assignee
MediaTek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MediaTek Inc filed Critical MediaTek Inc
Priority to US17/515,909
Assigned to MEDIATEK INC. Assignors: CHENG, YIOU-WEN; LEI, SHAW-MIN; SUN, LIANG-CHE (assignment of assignors interest; see document for details)
Priority to CN202111405666.1A (CN116074489A)
Priority to TW110144096A (TWI797850B)
Publication of US20230133061A1
Application granted
Publication of US11863710B2
Active legal status (current)
Adjusted expiration legal status

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1785 Methods, e.g. algorithms; Devices
    • G10K11/17853 Methods, e.g. algorithms; Devices of the filter
    • G10K11/17854 Methods, e.g. algorithms; Devices of the filter the filter being an adaptive filter
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K11/1787 General system configurations
    • G10K11/17879 General system configurations using both a reference signal and an error signal
    • G10K11/17881 General system configurations using both a reference signal and an error signal the reference signal being an acoustic signal, e.g. recorded with a microphone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/085 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using digital techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • During the audio or video conference, the processing circuitry 210 of the audio device 200 at the far end may determine that its loudspeaker 240 and/or microphone 250 are not working, and the video device 100 of the local user (e.g., user A) can learn the device status of the audio device 200 at the far end by viewing the icons in the corresponding blocks of the graphical user interface.
  • Accordingly, the local user does not need to ask "did you hear me?" during the audio or video conference.
  • An audio device and a method of detecting a device status during an audio/video conference are provided, which are capable of detecting whether the loudspeaker or microphone of the audio device at the local end is working normally, and then providing the detected device status of the loudspeaker and microphone to the other audio devices or video devices in the video-conferencing system. Accordingly, the user at the far end can know the device status of the loudspeaker and microphone of the audio device at the local end, and the user at the local end can likewise know the device status of the loudspeaker and microphone of the audio device at the far end, thereby improving the user experience during the audio or video conference.

Abstract

An audio device is provided. The audio device includes processing circuitry which is connected to a loudspeaker and a microphone. The processing circuitry is configured to play an echo reference signal from a far end on the loudspeaker, and perform an acoustic echo cancellation (AEC) process on the echo reference signal and an acoustic signal received by the microphone using an AEC adaptive filter. The processing circuitry repeatedly determines a first status of the loudspeaker according to a relation between the played echo reference signal and the received acoustic signal, and transmits a first status signal indicating the first status of the loudspeaker to the far end through a cloud network.

Description

BACKGROUND OF THE INVENTION Field of the Invention
The invention relates to video conferences, and, in particular, to an audio device and a method for detecting device status of an audio device in an audio/video conference.
Description of the Related Art
The questions “did you hear me?” and “what did you say?” are asked frequently in audio/video conferences because a speaker needs to know whether the other participants are online and capable of hearing the sound from their speakers. However, it is frustrating for the speaker to constantly ask these questions in an audio/video conference.
Therefore, there is demand for a video-conferencing audio device and a method for detecting device status in an audio/video conference to solve the aforementioned issue.
BRIEF SUMMARY OF THE INVENTION
A detailed description is given in the following embodiments with reference to the accompanying drawings.
In an exemplary embodiment, an audio device is provided. The audio device includes processing circuitry which is connected to a loudspeaker and a microphone. The processing circuitry is configured to play an echo reference signal from a far end on the loudspeaker, and perform an acoustic echo cancellation (AEC) process on the echo reference signal and an acoustic signal received by the microphone using an AEC adaptive filter. The processing circuitry repeatedly determines a first status of the loudspeaker according to a relation between the played echo reference signal and the received acoustic signal, and transmits a first status signal indicating the first status of the loudspeaker to the far end through a cloud network.
In some embodiments, in response to the processing circuitry determining that a signal level of the microphone is lower than or equal to a threshold, the processing circuitry determines that the microphone is muted. In response to the processing circuitry determining that the signal level of the microphone is higher than the threshold, the processing circuitry determines a second status indicating that the microphone is working normally, sends a second status signal indicating the second status of the microphone to the far end through the cloud network, obtains filter coefficients from the AEC adaptive filter, and calculates a similarity between the obtained filter coefficients and reference filter coefficients.
In some embodiments, in response to the processing circuitry determining that the calculated similarity is lower than a preset threshold, the processing circuitry determines that the first status of the loudspeaker is that the loudspeaker is not working. In response to the processing circuitry determining that the calculated similarity is higher than or equal to the preset threshold, the processing circuitry determines that the first status of the loudspeaker is that the loudspeaker is working normally.
In some embodiments, the reference filter coefficients are calculated using the AEC adaptive filter by playing white noise and sweeping tones on the loudspeaker for a first predetermined period of time, and the calculated reference filter coefficients are pre-stored in a nonvolatile memory of the audio device during a process of manufacturing the audio device in a factory.
In some embodiments, the processing circuitry initializes the filter coefficients of the AEC adaptive filter to zero, and obtains the filter coefficients from the AEC adaptive filter at runtime as the reference filter coefficients by calculating an average of the filter coefficients of the AEC adaptive filter within a second predetermined period of time.
In some embodiments, the processing circuitry calculates cosine similarity between the filter coefficients and the reference filter coefficients as the similarity.
In some embodiments, the processing circuitry receives a third status signal and a fourth status signal respectively indicating a third status of a loudspeaker of another audio device and a fourth status of a microphone of the another audio device at the far end through the cloud network, and displays icons corresponding to the third status and the fourth status on a graphical user interface of a video-conferencing application running on a video device in which the audio device is disposed.
In another exemplary embodiment, a method for use in an audio device is provided. The audio device is connected to a loudspeaker and a microphone. The method includes the following steps: playing an echo reference signal from a far end on the loudspeaker; performing an acoustic echo cancellation (AEC) process on the echo reference signal and an acoustic signal received by the microphone using an AEC adaptive filter; determining a first status of the loudspeaker according to a relation between the played echo reference signal and the received acoustic signal; and transmitting a first status signal indicating the first status of the loudspeaker to the far end through a cloud network.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention can be more fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
FIG. 1 is a block diagram of a video-conferencing system in accordance with an embodiment of the invention.
FIG. 2 is a block diagram of the audio device in accordance with an embodiment of the invention;
FIG. 3 is a diagram of the flow of an acoustic echo cancellation (AEC) process in accordance with an embodiment of the invention;
FIG. 4 is a flow chart of a method for detecting a device status of an audio device in an audio/video conference in accordance with an embodiment of the invention; and
FIGS. 5A-5B are diagrams showing the graphical user interface with icons of different device statuses of the audio device in accordance with an embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
The following description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
FIG. 1 is a block diagram of a video-conferencing system in accordance with an embodiment of the invention.
In an embodiment, the video-conferencing system 10 may include two or more video-conferencing apparatuses 100 connected to each other through a cloud network 20. Each video device 100 may be an electronic device that includes a display function, a web-camera function, a loudspeaker function, and a microphone function, such as a desktop computer equipped with a loudspeaker and a microphone, a laptop computer, a smartphone, or a tablet PC, but the invention is not limited thereto. In some embodiments, the loudspeaker function and microphone function in each video device 100 may be implemented by an audio device 200.
In some embodiments, each video device 100 may execute a video-conferencing application that renders a graphical user interface on its display. The user of each video device 100 can see the device status (e.g., including the statuses of the microphone and loudspeaker) of the audio device 200 of other participants in the video conference via the graphical user interface.
The audio device 200 may include an acoustic echo cancellation (AEC) function so as to provide a high-quality acoustic signal for everyone in the audio/video conference. In some embodiments, the audio device 200 may be an electronic device that handles both the loudspeaker and microphone functions, such as a desktop audio device, a tabletop audio device, a soundbar with a microphone array, a smartphone, a tablet PC, a laptop computer, or a personal computer equipped with a standalone microphone (e.g., a microphone with a 3.5 mm jack, a USB microphone, or a Bluetooth microphone) and a standalone loudspeaker, but the invention is not limited thereto. In some embodiments, the audio device 200 may be disposed in the video device 100. In some other embodiments, the audio device 200 is electrically connected to the video device 100, and the audio device 200 and video device 100 are standalone devices.
FIG. 2 is a block diagram of the audio device 200 in accordance with an embodiment of the invention.
In an embodiment, the audio device 200 may include processing circuitry 210, a memory 215, a digital-to-analog converter (DAC) 220, an amplifier (AMP) 230, one or more loudspeakers 240, and one or more microphones 250. The processing circuitry 210, memory 215, DAC 220, and amplifier 230 may be implemented by an integrated circuit (or system-on-chip) 270. The processing circuitry 210 may be implemented by a central processing unit (CPU), a digital-signal processor (DSP), an application-specific integrated circuit (ASIC), multiple processors, and/or a processor having multiple cores, but the invention is not limited thereto. The memory 215 may be a type of computer storage media and may include volatile memory and non-volatile memory. The memory 215 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or other memory technology.
The loudspeaker 240 may be configured to emit a speaker signal from another audio device 200 in the video-conferencing system 10. In addition, the loudspeaker 240 may also emit an echo reference signal 212, and the microphone 250 may receive a local speech signal and other sounds from the user environment in addition to the echo reference signal. In some embodiments, the microphone 250 may include an analog-to-digital converter (ADC) (not shown in FIG. 2) to convert the received analog acoustic signal into a discrete acoustic signal for subsequent AEC processing.
The processing circuitry 210 may perform an AEC process on the acoustic signal (i.e., including the echo reference signal, local speech signal, and other environment sounds) received by the microphone 250 so as to estimate the status of the echo path from the loudspeaker 240 to the microphone 250. In some embodiments, the AEC process may be implemented by an AEC adaptive filter such as an LMS (least mean squares) filter, an NLMS (normalized least mean squares) adaptive filter, or an adaptive filter of other types with a predetermined number of taps, but the invention is not limited thereto.
Specifically, when the user joins a video conference or an audio conference using the audio device 200, the positions of the loudspeaker 240 and microphone 250 are generally fixed, and the distance between the loudspeaker 240 and microphone 250 is also fixed. When the loudspeaker 240 and microphone 250 are working normally, the echo path from the loudspeaker 240 to the microphone 250 is valid, the coefficients of the AEC adaptive filter will converge, and they will be close to predefined coefficients. When the loudspeaker 240 or the microphone 250 is turned off or does not work normally, the coefficients of the AEC adaptive filter will diverge. The details of the AEC process will be described in the following section.
FIG. 3 is a diagram of the flow of an acoustic echo cancellation (AEC) process in accordance with an embodiment of the invention.
Referring to FIG. 3, the processing circuitry 210 may store a predetermined number of input samples from the far end (e.g., other audio device 200 in the video-conferencing system 10) in the memory 215, where the predetermined number of input samples may be equal to the number of taps of the AEC adaptive filter 214.
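As a minimal illustration of this buffering step (a sketch only; the tap count p = 512 and the helper name push_reference_sample are assumptions, not taken from the patent), each new far-end sample can be shifted into a length-p NumPy buffer so that the filter always sees x(n) = [x(n), x(n−1), . . . , x(n−p+1)]:

    import numpy as np

    p = 512  # assumed number of filter taps; the patent does not fix this value

    # Rolling buffer holding the current far-end sample and the (p - 1) previous ones,
    # i.e., x(n) = [x(n), x(n-1), ..., x(n-p+1)].
    ref_buffer = np.zeros(p)

    def push_reference_sample(sample):
        """Shift the far-end reference buffer and place the newest sample at index 0."""
        global ref_buffer
        ref_buffer = np.roll(ref_buffer, 1)
        ref_buffer[0] = sample
        return ref_buffer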
For ease of description, the NLMS (normalized least mean squares) algorithm is used in the AEC adaptive filter 214 of the processing circuitry 210, and the AEC adaptive filter 214 may find the filter coefficients that produce the least normalized mean square of the error signal (i.e., the difference between the desired signal and the actual signal). For example, the echo path is an unknown system that has a transfer function h(n) to be identified, and the AEC adaptive filter 214 attempts to adapt its own transfer function ĥ(n) to make it as close as possible to the transfer function h(n) of the echo path.
Definition of Symbols of AEC Adaptive Filter
In this section, symbols used in the AEC adaptive filter 214 are defined, where: n is the index of the current input sample; p is the number of filter taps; x(n) is the echo reference signal from the far end (e.g., from other audio device 200 in the video-conferencing system 10), where x(n)=[x(n), x(n−1), . . . , x(n−p+1)]ᵀ; y(n) is the echo reference signal received by the microphone 250 through the echo path, where y(n)=hᴴ(n)·x(n); v(n) is the local speech signal (i.e., at the near end) plus the environment sound signal; d(n) is the acoustic signal generated by the microphone 250, where d(n)=y(n)+v(n); ĥ(n) is the transfer function of the AEC adaptive filter 214; ŷ(n) is the output signal of the AEC adaptive filter 214 and can be regarded as the estimated echo signal, where ŷ(n)=ĥᴴ(n)·x(n); e(n) is the residual echo signal or the error signal, where e(n)=d(n)−ŷ(n)=d(n)−ĥᴴ(n)·x(n).
Specifically, the echo reference signal x(n) is a matrix of the current input sample (i.e., at time n) and (p−1) previous input samples (i.e., at time=n−1, n−2, . . . , n−p+1) from the far end, such as other audio device 200 in the video-conferencing system 10. The AEC adaptive filter 214 may calculate the inner product of the Hermitian transpose of the transfer function ĥ(n) and the echo reference signal x(n) to obtain an output signal ŷ(n). The subtracter 216 may subtract the output signal ŷ(n) from the acoustic signal d(n) to obtain a residual echo signal e(n) that is sent to the far end (e.g., other audio devices 200 in the video-conferencing system 10).
In some embodiments, the transfer function ĥ(n) of the AEC adaptive filter 214 may be regarded as a matrix of filter coefficients of the AEC adaptive filter 214. In addition, the residual echo signal e(n) is fed back to the AEC adaptive filter 214. If the residual echo signal e(n) is large, the AEC adaptive filter 214 may adjust its filter coefficients significantly so as to fit the transfer function h(n) of the echo path. If the residual echo signal e(n) is small, it may indicate that the currently used filter coefficients of the AEC adaptive filter 214 are close to the transfer function h(n) of the echo path, and the AEC adaptive filter 214 may adjust its filter coefficients only slightly to fit the transfer function h(n) of the echo path.
In some embodiments, the AEC adaptive filter 214 may calculate its transfer function at time n+1, where
ĥ(n+1) = ĥ(n) + μ·e*(n)·x(n) / (xᴴ(n)·x(n)).
In some other embodiments, the AEC adaptive filter 214 may calculate its transfer function at time n+1 as ĥ(n+1) = ĥ(n) + μ·e*(n)·x(n). Thus, the AEC adaptive filter 214 can compare the transfer functions (i.e., filter coefficients) at time n+1 and time n so as to determine whether to adjust its filter coefficients to fit the transfer function h(n) of the echo path.
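For illustration only, the update described above can be sketched as a single NLMS step in Python/NumPy; the function name nlms_step, the step size mu, and the regularization term eps are assumptions (the patent does not fix these values), and real-valued signals are assumed so that the Hermitian transpose reduces to an ordinary transpose and e*(n) = e(n):

    import numpy as np

    def nlms_step(h_hat, x, d, mu=0.5, eps=1e-8):
        """One NLMS iteration on real-valued signals.

        h_hat : current coefficient vector (estimate of the echo path), length p
        x     : reference buffer [x(n), x(n-1), ..., x(n-p+1)]
        d     : microphone sample d(n) = y(n) + v(n)
        Returns the updated coefficients and the residual echo e(n).
        """
        y_hat = np.dot(h_hat, x)              # estimated echo: y_hat(n) = h_hat(n) . x(n)
        e = d - y_hat                         # residual echo:  e(n) = d(n) - y_hat(n)
        norm = np.dot(x, x) + eps             # x(n) . x(n), regularized to avoid division by zero
        h_hat = h_hat + mu * e * x / norm     # h_hat(n+1) = h_hat(n) + mu * e(n) * x(n) / (x(n) . x(n))
        return h_hat, e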
Specifically, as described in the aforementioned embodiments, given that the positions of the loudspeaker 240 and microphone 250 are fixed, the distance between the loudspeaker 240 and microphone 250 is also fixed. In this case, if both the loudspeaker 240 and microphone 250 are turned on and work normally, the echo path may be quite stable. As a result, the filter coefficients of the AEC adaptive filter 214 will converge, and it indicates that the residual echo signal e(n) may be very close to 0. In addition, if a smartphone is used as the audio device 200, it is inherent that the positions of the loudspeaker 240 and microphone 250 are fixed and the distance between the loudspeaker 240 and microphone 250 is fixed. Thus, if both the loudspeaker 240 and microphone 250 are turned on and work normally, the filter coefficients of the AEC adaptive filter 214 may converge and be close to reference filter coefficients that were previously tested and calibrated in the factory.
However, if the loudspeaker 240 or the microphone 250 is turned off or does not work normally, the echo path may be invalid. For example, given that the microphone 250 works normally and the loudspeaker 240 is turned off or does not work normally, the microphone 250 will not receive the echo reference signal emitted from the loudspeaker 240. Meanwhile, the AEC adaptive filter 214 still generates the output signal ŷ(n) using the echo reference signal x(n). Since the component y(n) is absent from the acoustic signal d(n), the difference between the acoustic signal d(n) and the output signal ŷ(n), which is regarded as the residual echo signal e(n), will be large. As a result, the AEC adaptive filter 214 may erroneously estimate the transfer function (i.e., the matrix of filter coefficients) of the echo path, and the estimated filter coefficients will diverge.
In another case, given that the loudspeaker 240 works normally and the microphone 250 is turned off or does not work normally, the loudspeaker 240 may emit the echo reference signal x(n), but the microphone 250 will not receive any acoustic signal. As a result, the acoustic signal d(n) is approximately 0. Meanwhile, the AEC adaptive filter 214 still generates the output signal ŷ(n) using the echo reference signal x(n). Since the acoustic signal d(n) is approximately 0, the difference between the acoustic signal d(n) and the output signal ŷ(n), which is regarded as the residual echo signal e(n), will be large. As a result, the AEC adaptive filter 214 may erroneously estimate the transfer function (i.e., the matrix of filter coefficients) of the echo path, and the estimated filter coefficients will diverge.
In some embodiments, the reference filter coefficients for use in the AEC adaptive filter 214 may be generated in the manufacturing process for the audio device 200 with fixed locations of loudspeakers 240 and microphones 250 (e.g., a smartphone, laptop computer, tablet PC, desktop audio device, etc.). For example, during the manufacturing process in the factory, white noise or sweeping tone can be played on the audio device 200, and the processing circuitry 210 of the audio device 200 may perform the AEC process simultaneously. Thus, the reference filter coefficients for the AEC adaptive filter 214 can be obtained after performing the AEC process for a predetermined period of time, and the obtained reference filter coefficients can be stored in a non-volatile memory of the audio device 200.
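One way such a factory calibration could be sketched is shown below; the play_and_capture audio I/O helper, the 5-second calibration period, the tap count, and the file save standing in for the write to nonvolatile memory are all assumptions for illustration:

    import numpy as np

    FS = 16000            # assumed sampling rate in Hz
    CAL_SECONDS = 5       # assumed "first predetermined period of time"
    P = 512               # assumed number of filter taps
    MU, EPS = 0.5, 1e-8   # assumed NLMS step size and regularization term

    def calibrate_reference_coefficients(play_and_capture):
        """Play white noise on the loudspeaker, record it with the microphone,
        run the NLMS adaptation over the recording, and keep the settled coefficients."""
        excitation = np.random.randn(FS * CAL_SECONDS)    # white-noise test signal
        captured = play_and_capture(excitation)           # hypothetical factory audio I/O helper
        h_hat = np.zeros(P)
        x_buf = np.zeros(P)
        for n in range(len(excitation)):
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = excitation[n]
            y_hat = np.dot(h_hat, x_buf)
            e = captured[n] - y_hat
            h_hat = h_hat + MU * e * x_buf / (np.dot(x_buf, x_buf) + EPS)
        np.save("reference_filter_coefficients.npy", h_hat)  # stand-in for the nonvolatile memory
        return h_hat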
In some other embodiments, the reference filter coefficients for use in the AEC adaptive filter 214 may be calculated at runtime. For example, during the audio conference, the processing circuitry 210 of the audio device 200 may automatically run the AEC process to obtain the reference filter coefficients for the AEC adaptive filter 214. This is useful because the user environment may differ from the test environment in the factory, so the echo path and interference in the user environment may differ from those in the factory. Thus, the processing circuitry 210 may automatically run the AEC process to obtain the reference filter coefficients in response to detecting that the audio device 200 is being used in an audio conference or a video conference. The processing circuitry 210 may first set the initial filter coefficients ĥ(0)=zeros(p), and it may then calculate the runtime reference filter coefficients for the AEC adaptive filter 214 by averaging the adaptive filter coefficients within a predetermined period of time during which the loudspeaker 240 and microphone 250 are working normally.
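A minimal sketch of this runtime alternative follows, assuming one coefficient snapshot per filter update and using SECOND_PERIOD_FRAMES as a stand-in for the second predetermined period of time; both names are illustrative rather than taken from the patent:

    import numpy as np

    SECOND_PERIOD_FRAMES = 1000   # assumed averaging window, i.e., the number of coefficient snapshots

    def runtime_reference_coefficients(coef_history):
        """Average the adaptive filter coefficients collected while the loudspeaker and
        microphone are working normally; the result is used as h_ref.

        coef_history: iterable of coefficient snapshots h_hat(n) taken during the conference,
        with the filter initialized to h_hat(0) = zeros(p).
        """
        snapshots = np.stack(list(coef_history)[:SECOND_PERIOD_FRAMES])
        return snapshots.mean(axis=0)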
In some other embodiments, the nonvolatile memory of the audio device 200 may store preset reference filter coefficients that have been tested and calibrated in the factory. However, the preset reference filter coefficients may not be suitable for the user environment in some cases. When the audio device 200 is turned on, the processing circuitry 210 may load the preset reference filter coefficients from the nonvolatile memory as the initial filter coefficients for the AEC adaptive filter 214. The processing circuitry 210 may then perform the AEC process and determine whether the preset reference filter coefficients are suitable for the user environment. For example, upon detecting that the audio device 200 is being used in an audio conference or a video conference, the processing circuitry 210 may determine whether the residual echo signal e(n) stays smaller than a preset threshold for a predetermined period of time, that is, whether the updated filter coefficients remain converged. If the residual echo signal e(n) is smaller than the preset threshold for the predetermined period of time, the processing circuitry 210 may use the preset reference filter coefficients as the initial filter coefficients of the AEC adaptive filter 214. If the residual echo signal e(n) is not smaller than the preset threshold for the predetermined period of time, the processing circuitry 210 may initialize the filter coefficients as ĥ(0)=zeros(p), that is, all components in the matrix are zeros. Thus, the AEC adaptive filter 214 may refine the filter coefficients at runtime.
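The choice between the factory preset and an all-zero initialization could look roughly like the sketch below; E_THRESHOLD and CHECK_FRAMES are assumed stand-ins for the preset threshold on e(n) and the predetermined observation period:

    import numpy as np

    E_THRESHOLD = 0.01    # assumed preset threshold on the residual echo level
    CHECK_FRAMES = 500    # assumed observation period, in frames

    def choose_initial_coefficients(preset_ref, residual_levels, p=512):
        """Keep the factory-calibrated coefficients if the residual echo stays below the
        threshold for the whole observation period; otherwise start from all zeros."""
        recent = list(residual_levels)[:CHECK_FRAMES]
        if recent and all(abs(e) < E_THRESHOLD for e in recent):
            return np.asarray(preset_ref)    # preset coefficients fit the user environment
        return np.zeros(p)                   # h_hat(0) = zeros(p); refine at runtime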
FIG. 4 is a flow chart of a method for detecting a device status of an audio device in an audio/video conference in accordance with an embodiment of the invention. Please refer to FIG. 2 and FIG. 4.
In step S410, it is determined whether the signal level of the microphone 250 is higher than a threshold. If it is determined that the signal level of the microphone 250 is higher than the threshold, step S420 is performed. If it is determined that the signal level of the microphone 250 is not higher than the threshold, it indicates that the microphone 250 is muted (step S415), and the flow ends. Meanwhile, the audio device 200 or the video device 100 at the local end may transmit an indication signal to the cloud network 20 so as to inform the audio devices 200 or video-conferencing apparatuses 100 at the far end that the microphone 250 of the local user is muted, such as showing an icon of a muted microphone on the graphical user interface of the video-conferencing application running on each video device 100 in the video-conferencing system 10.
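Step S410 can be illustrated with a short level check; the RMS measure and the −60 dBFS threshold are assumptions for illustration, since the patent does not specify how the signal level or the threshold is defined:

    import numpy as np

    MUTE_THRESHOLD_DBFS = -60.0   # assumed threshold; the patent does not specify a value

    def microphone_is_muted(mic_frame):
        """Return True if the microphone frame's RMS level is at or below the threshold."""
        rms = np.sqrt(np.mean(np.square(mic_frame)) + 1e-12)
        level_dbfs = 20.0 * np.log10(rms)
        return level_dbfs <= MUTE_THRESHOLD_DBFS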
In step S420, the filter coefficients of the AEC adaptive filter 214 are obtained. For example, the AEC adaptive filter 214 may update its filter coefficients at runtime, and the processing circuitry 210 may repeatedly obtain the filter coefficients of the AEC adaptive filter 214 every predetermined period of time.
In step S430, the similarity between the obtained filter coefficients and a plurality of reference filter coefficients is calculated. For example, the processing circuitry 210 may calculate the cosine similarity between the obtained filter coefficients and the reference filter coefficients. Specifically, the cosine similarity between two vectors a and b can be expressed using equation (1):
cos sim(a, b) = cos θ = (a · b) / (‖a‖ · ‖b‖)  (1)
Given the obtained filter coefficients h_adapt and the reference filter coefficients h_ref, the similarity AdaptSim between them can be expressed using equation (2):
AdaptSim = cos sim(h_adapt, h_ref)  (2)
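Equations (1) and (2) translate directly into a few lines of NumPy; this sketch simply treats the current and reference filter coefficients as vectors, and the small constant added to the denominator is an assumption to avoid division by zero:

    import numpy as np

    def cos_sim(a, b):
        """Cosine similarity between two coefficient vectors, equation (1)."""
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        return float(np.dot(a, b) / denom)

    def adapt_sim(h_adapt, h_ref):
        """AdaptSim = cos sim(h_adapt, h_ref), equation (2)."""
        return cos_sim(h_adapt, h_ref)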
In step S440, it is determined whether the similarity is greater than or equal to a preset threshold. If it is determined that the similarity is less than the preset threshold, it indicates that the loudspeaker 240 is not working (step S450), and the flow ends. If it is determined that the similarity is greater than or equal to the preset threshold, it indicates that the loudspeaker 240 and microphone 250 are working normally (step S460), and the flow goes back to step S410.
Specifically, steps S415, S450, and S460 in FIG. 4 may represent different device statuses of the audio device 200 during the audio or video conference. The processing circuitry 210 of the audio device 200 of the local end may transmit a status signal to the cloud network 20 to indicate the current device status of the audio device 200, and the cloud network 20 may forward the status signal to each video device 100 in the video-conferencing system 10. Thus, each video device 100 in the video-conferencing system 10 may show a status icon of the audio device 200 of user A on the graphical user interface of the video-conferencing application running on each video device 100. If user A is speaking during the video conference, user A can know whether users B and C can hear what he or she is saying from the graphical user interface. For example, if the flow in FIG. 4 proceeds to step S415, the device status of the audio device 200 indicates that the microphone 250 is muted. If the flow in FIG. 4 proceeds to step S450, the device status of the audio device 200 indicates that the loudspeaker 240 is not working. If the flow in FIG. 4 proceeds to step S460, the device status of the audio device 200 indicates that the loudspeaker 240 and microphone 250 are working normally. In brief, during the audio or video conference, the processing circuitry 210 may repeatedly determine a first status of the loudspeaker 240 according to a relation between the played echo reference signal and the received acoustic signal, and transmit the first status signal indicating the first status of the loudspeaker 240 to the far end through the cloud network 20. For example, the relation between the played echo reference signal and the received acoustic signal may be represented using filter coefficients and reference filter coefficients of the AEC adaptive filter. In some other embodiments, the relation between the played echo reference signal and the received acoustic signal may be represented using some other coefficients determined from the played echo reference signal and the received acoustic signal.
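Putting steps S410 to S460 together, the per-interval decision and the status signal sent toward the cloud network might look like the following sketch; the dictionary message format, the send_to_cloud helper, and the SIMILARITY_THRESHOLD value are hypothetical, and microphone_is_muted and adapt_sim are the helpers sketched earlier:

    SIMILARITY_THRESHOLD = 0.8   # assumed preset threshold on AdaptSim

    def detect_device_status(mic_frame, h_adapt, h_ref):
        """Return the device status following the flow of FIG. 4."""
        if microphone_is_muted(mic_frame):                       # step S410 -> S415
            return {"microphone": "muted", "loudspeaker": "unknown"}
        if adapt_sim(h_adapt, h_ref) < SIMILARITY_THRESHOLD:     # step S440 -> S450
            return {"microphone": "working", "loudspeaker": "not working"}
        return {"microphone": "working", "loudspeaker": "working"}  # step S460

    def report_status(status, send_to_cloud):
        """Forward the status signal to the far end through the cloud network."""
        send_to_cloud(status)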
FIGS. 5A-5B are diagrams showing the graphical user interface with icons of different device statuses of the audio device in accordance with an embodiment of the invention. Please refer to FIG. 2, FIG. 4, and FIGS. 5A-5B.
Assuming that users A, B, and C join a video conference, the video device 100 of user A may show a graphical user interface 500 that includes blocks 510, 520, and 530, as shown in FIG. 5A. For example, block 510 may contain the username 511 (e.g., user B), video screen 512, and blocks 513 and 514 of the audio device 200 of user B, where block 513 shows the status of the microphone 250 of the audio device 200 of user B, and block 514 shows the status of the loudspeaker 240 of the audio device 200 of user B. Block 520 may contain the username 521 (e.g., user C), video screen 522, and blocks 523 and 524 of the audio device 200 of user C, where block 523 shows the status of the microphone 250 of the audio device 200 of user C, and block 524 shows the status of the loudspeaker 240 of the audio device 200 of user C. Block 530 may show the video screen of user A (i.e., the local user).
In FIG. 5A, it is assumed that the loudspeakers 240 and microphones 250 of the audio devices 200 of users B and C are working normally, and thus blocks 513 and 523 may show a microphone pattern with a specific color (e.g., green), and the loudspeaker-status icons 514 and 524 may show a loudspeaker pattern with a specific color (e.g., green). Accordingly, user A can know that the loudspeakers 240 and microphones 250 of the audio devices 200 of users B and C are working normally via the microphone-status icons 513 and 523 and the loudspeaker-status icons 514 and 524.
Referring to FIG. 5B, if the audio device 200 of user B detects that the signal level of its microphone 250 is below the threshold, the audio device 200 of user B may send the second status signal of user B, indicating that the microphone 250 is muted, to the cloud network 20, and the audio device 200 of user A can receive the status signal from the cloud network 20. Thus, the video-conferencing application running on the video device 100 of user A may show a microphone pattern covered with a red X mark in block 513. Meanwhile, the audio device 200 of user B may determine that its loudspeaker 240 is working normally, and send the first status signal of user B, indicating that the loudspeaker 240 is working normally, to the cloud network 20. Thus, the audio device 200 of user A can receive the first status signal from the cloud network 20, and the video-conferencing application running on the video device 100 of user A may show a green loudspeaker pattern in block 514.
In addition, if the audio device 200 of user C detects that its loudspeaker 240 is not working using the flow described in FIG. 4, the audio device 200 of user C may send the first status signal of user C, indicating that the loudspeaker 240 is not working, to the cloud network 20. The audio device 200 of user A can then receive the first status signal of user C from the cloud network 20, and the video-conferencing application running on the video device 100 of user A may show a loudspeaker pattern covered with a red cross in block 524. Meanwhile, if the audio device 200 of user C detects that the signal level of its microphone 250 is higher than the threshold, the audio device 200 of user C may determine that its microphone 250 is working normally, and send the second status signal of user C, indicating that the microphone 250 is working normally, to the cloud network 20. Thus, the audio device 200 of user A can receive the second status signal of user C from the cloud network 20, and the video-conferencing application running on the video device 100 of user A may show a green microphone pattern in block 523.
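For illustration only, the mapping from received status signals to the icons described for FIGS. 5A-5B could be expressed as a simple lookup; the status strings, colors, and icon descriptions below are assumptions made for this sketch, not a specification of the graphical user interface.

```python
# Hypothetical mapping from (device, status) in a received status signal
# to the icon drawn in the corresponding block of the graphical user interface.
STATUS_ICONS = {
    ("microphone", "working normally"):  "green microphone pattern",
    ("microphone", "muted"):             "microphone pattern covered with a red X mark",
    ("loudspeaker", "working normally"): "green loudspeaker pattern",
    ("loudspeaker", "not working"):      "loudspeaker pattern covered with a red cross",
}

def icon_for(status_signal: dict) -> str:
    """Return a textual description of the icon to show for a received status signal."""
    key = (status_signal.get("device"), status_signal.get("status"))
    return STATUS_ICONS.get(key, "unknown status")
```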
Specifically, when user A is speaking during the audio conference, user A can view the icons in blocks 513-514 and 523-524 on the graphical user interface to know whether users B and C can hear what he or she said. Because the AEC process uses an adaptive finite-impulse response (FIR) filter whose coefficients are updated recursively, if any problem occurs in the echo path or the AEC loop at a certain time during the audio conference, the processing circuitry 210 of the audio device 200 at the far end (e.g., the audio devices of users B and C) may determine that its loudspeaker 240 and/or microphone 250 is not working, and the video device 100 of the local user (e.g., user A) can know the device status of the audio device 200 at the far end by viewing the icons in the corresponding blocks of the graphical user interface. Thus, the local user (e.g., user A) need not ask "did you hear me?" during the audio or video conference.
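The disclosure does not specify how the AEC adaptive filter updates its coefficients; purely as a common example of a recursively updated FIR adaptation rule, a normalized LMS (NLMS) step is sketched below, with the step size mu and regularization eps chosen arbitrarily.

```python
import numpy as np

def nlms_step(h: np.ndarray, x_buf: np.ndarray, mic_sample: float,
              mu: float = 0.1, eps: float = 1e-8):
    """One NLMS update of the adaptive FIR coefficients h.
    x_buf holds the most recent echo-reference samples (same length as h)."""
    y_hat = float(np.dot(h, x_buf))   # estimated echo at the microphone
    err = mic_sample - y_hat          # residual after echo cancellation
    h_new = h + mu * err * x_buf / (float(np.dot(x_buf, x_buf)) + eps)
    return h_new, err
```

Under such an adaptation rule, a broken echo path (for example, a loudspeaker that is not producing sound) would cause the adapted coefficients to drift away from the reference filter coefficients, which is what the similarity check of FIG. 4 is designed to detect.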
In view of the above, an audio device and a method of detecting a device status during an audio/video conference are provided, which are capable of detecting whether the loudspeaker or microphone of the audio device at the local end is working normally, and of providing the detected device status of the loudspeaker and microphone to other audio devices or video devices in the video-conferencing system. Accordingly, the user at the far end can know the device status of the loudspeaker and microphone of the audio device at the local end, and the user at the local end can likewise know the device status of the loudspeaker and microphone of the audio device at the far end, thereby improving the user experience during the audio or video conference.
While the invention has been described by way of example and in terms of the preferred embodiments, it should be understood that the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.

Claims (16)

What is claimed is:
1. An audio device, comprising:
processing circuitry, connected to a loudspeaker and a microphone, wherein the processing circuitry is configured to play an echo reference signal from a far end on the loudspeaker, and perform an acoustic echo cancellation (AEC) process using the echo reference signal and an acoustic signal received by the microphone using an AEC adaptive filter,
wherein the processing circuitry repeatedly determines a first status of the loudspeaker according to a relation between the played echo reference signal and the received acoustic signal, and transmits a first status signal indicating the first status of the loudspeaker to the far end through a cloud network.
2. The audio device as claimed in claim 1, wherein the relation between the played echo reference signal and the received acoustic signal is represented using a plurality of filter coefficients and a plurality of reference filter coefficients of the AEC adaptive filter.
3. The audio device as claimed in claim 2, wherein in response to the processing circuitry determining that a signal level of the microphone is lower than or equal to a threshold, the processing circuitry determines that the microphone is muted,
wherein in response to the processing circuitry determining that the signal level of the microphone is higher than the threshold, the processing circuitry determines a second status of the microphone is that the microphone is working normally, sends a second status signal indicating the second status of the microphone to the far end through the cloud network, obtains the filter coefficients from the AEC adaptive filter, and calculates similarity between the obtained filter coefficients and the reference filter coefficients.
4. The audio device as claimed in claim 3, wherein in response to the processing circuitry determining that the calculated similarity is lower than a preset threshold, the processing circuitry determines that the first status of the loudspeaker is that the loudspeaker is not working,
wherein in response to the processing circuitry determining that the calculated similarity is higher than or equal to the preset threshold, the processing circuitry determines that the first status of the loudspeaker is that the loudspeaker is working normally.
5. The audio device as claimed in claim 2, wherein the reference filter coefficients are calculated using the AEC adaptive filter by playing white noise and sweeping tones on the loudspeaker for a first predetermined period of time, and the calculated reference filter coefficients are pre-stored in a nonvolatile memory of the audio device during a process of manufacturing the audio device in a factory.
6. The audio device as claimed in claim 2, wherein the processing circuitry initializes the filter coefficients of the AEC adaptive filter to zero, and obtains the filter coefficients from the AEC adaptive filter at runtime as the reference filter coefficients by calculating an average of the filter coefficients of the AEC adaptive filter within a second predetermined period of time.
7. The audio device as claimed in claim 2, wherein the processing circuitry calculates cosine similarity between the filter coefficients and the reference filter coefficients as the similarity.
8. The audio device as claimed in claim 1, wherein the processing circuitry receives a third status signal and a fourth status signal respectively indicating a third status of a loudspeaker of another audio device and a fourth status of a microphone of said other audio device at the far end through the cloud network, and displays icons corresponding to the third status and the fourth status on a graphical user interface of a video-conferencing application running on a video device in which the audio device is disposed.
9. A method, for use in an audio device connected to a loudspeaker and a microphone, the method comprising:
playing an echo reference signal from a far end on the loudspeaker;
performing an acoustic echo cancellation (AEC) process on the echo reference signal and an acoustic signal received by the microphone using an AEC adaptive filter;
determining a first status of the loudspeaker according to a relation between the played echo reference signal and the received acoustic signal; and
transmitting a first status signal indicating the first status of the loudspeaker to the far end through a cloud network.
10. The method as claimed in claim 9, wherein the relation between the played echo reference signal and the received acoustic signal is represented using a plurality of filter coefficients and a plurality of reference filter coefficients of the AEC adaptive filter.
11. The method as claimed in claim 10, further comprising:
in response to determining that a signal level of the microphone is lower than or equal to a threshold, determining that a second status of the microphone is that the microphone is muted; and
in response to determining that the signal level of the microphone is higher than the threshold, performing the following steps:
determining that the second status of the microphone is that the microphone is working normally;
sending a second status signal indicating the second status of the microphone to the far end through the cloud network;
obtaining the filter coefficients from the AEC adaptive filter; and
calculating similarity between the obtained filter coefficients and the reference filter coefficients.
12. The method as claimed in claim 11, further comprising:
in response to determining that the calculated similarity is lower than a preset threshold, determining that the first status of the loudspeaker is that the loudspeaker is not working; and
in response to determining that the calculated similarity is higher than or equal to the preset threshold, determining that the first status of the loudspeaker is that the loudspeaker is working normally.
13. The method as claimed in claim 10, further comprising:
calculating the reference filter coefficients using the AEC adaptive filter by playing white noise and sweeping tones on the loudspeaker for a first predetermined period of time; and
pre-storing the calculated reference filter coefficients in a nonvolatile memory of the audio device during a process of manufacturing the audio device in a factory.
14. The method as claimed in claim 10, further comprising:
initializing the filter coefficients of the AEC adaptive filter to zero; and
obtaining the filter coefficients from the AEC adaptive filter at runtime as the reference filter coefficients by calculating an average of the filter coefficients of the AEC adaptive filter within a second predetermined period of time.
15. The method as claimed in claim 10, further comprising:
calculating cosine similarity between the filter coefficients and the reference filter coefficients as the similarity.
16. The method as claimed in claim 9, further comprising:
receiving a third status signal and a fourth status signal respectively indicating a third status of a loudspeaker of another audio device and a fourth status of a microphone of the another audio device at the far end through the cloud network; and
displaying icons corresponding to the third status and the fourth status on a graphical user interface of a video-conferencing application running on a video device in which the audio device is disposed.
US17/515,909 2021-11-01 2021-11-01 Audio device and method for detecting device status of audio device in audio/video conference Active 2042-06-25 US11863710B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/515,909 US11863710B2 (en) 2021-11-01 2021-11-01 Audio device and method for detecting device status of audio device in audio/video conference
CN202111405666.1A CN116074489A (en) 2021-11-01 2021-11-24 Method for detecting device status of audio device in audio/video conference and audio device
TW110144096A TWI797850B (en) 2021-11-01 2021-11-26 Audio device and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US17/515,909 US11863710B2 (en) 2021-11-01 2021-11-01 Audio device and method for detecting device status of audio device in audio/video conference

Publications (2)

Publication Number Publication Date
US20230133061A1 US20230133061A1 (en) 2023-05-04
US11863710B2 US11863710B2 (en) 2024-01-02

Family

ID=86144725

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/515,909 Active 2042-06-25 US11863710B2 (en) 2021-11-01 2021-11-01 Audio device and method for detecting device status of audio device in audio/video conference

Country Status (3)

Country Link
US (1) US11863710B2 (en)
CN (1) CN116074489A (en)
TW (1) TWI797850B (en)

Patent Citations (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003099B1 (en) * 2002-11-15 2006-02-21 Fortmedia, Inc. Small array microphone for acoustic echo cancellation and noise suppression
US7672445B1 (en) * 2002-11-15 2010-03-02 Fortemedia, Inc. Method and system for nonlinear echo suppression
US8085947B2 (en) 2006-05-10 2011-12-27 Nuance Communications, Inc. Multi-channel echo compensation system
JP5288723B2 (en) 2006-05-10 2013-09-11 ニュアンス コミュニケーションズ, インコーポレイテッド Multi-channel echo compensation
CN104778950B (en) 2014-01-15 2018-03-27 华平信息技术股份有限公司 A kind of microphone signal delay compensation control method based on echo cancellor
US20190066654A1 (en) * 2016-02-02 2019-02-28 Dolby Laboratories Licensing Corporation Adaptive suppression for removing nuisance audio
US9967661B1 (en) 2016-02-09 2018-05-08 Amazon Technologies, Inc. Multichannel acoustic echo cancellation
US9916840B1 (en) 2016-12-06 2018-03-13 Amazon Technologies, Inc. Delay estimation for acoustic echo cancellation
US11538451B2 (en) 2017-09-28 2022-12-27 Sonos, Inc. Multi-channel acoustic echo cancellation
CN111418011A (en) 2017-09-28 2020-07-14 搜诺思公司 Multi-channel acoustic echo cancellation
TW201933335A (en) 2018-01-25 2019-08-16 南韓商三星電子股份有限公司 Application processor supporting low power echo cancellation, electronic device including the same and method of operating the same
US11044368B2 (en) 2018-01-25 2021-06-22 Samsung Electronics Co., Ltd. Application processor supporting low power echo cancellation, electronic device including the same and method of operating the same
US10650840B1 (en) * 2018-07-11 2020-05-12 Amazon Technologies, Inc. Echo latency estimation
CN111128210A (en) 2018-10-30 2020-05-08 哈曼贝克自动系统股份有限公司 Audio signal processing with acoustic echo cancellation
US10979100B2 (en) 2018-10-30 2021-04-13 Harman Becker Automotive Systems Gmbh Audio signal processing with acoustic echo cancellation
US11404073B1 (en) * 2018-12-13 2022-08-02 Amazon Technologies, Inc. Methods for detecting double-talk
US10636435B1 (en) * 2018-12-22 2020-04-28 Microsemi Semiconductor (U.S.) Inc. Acoustic echo cancellation using low-frequency double talk detection
US10937441B1 (en) * 2019-01-04 2021-03-02 Amazon Technologies, Inc. Beam level based adaptive target selection
CN110310654A (en) 2019-07-26 2019-10-08 歌尔科技有限公司 Echo cancel method and device, electronic equipment, readable storage medium storing program for executing
US11451905B1 (en) * 2019-10-30 2022-09-20 Social Microphone, Inc. System and method for multi-channel acoustic echo and feedback compensation
US11552611B2 (en) * 2020-02-07 2023-01-10 Shure Acquisition Holdings, Inc. System and method for automatic adjustment of reference gain
US20230171346A1 (en) * 2020-04-15 2023-06-01 Hewlett-Packard Development Company, L.P. Double talk detectors
CN111640449A (en) 2020-06-09 2020-09-08 北京大米科技有限公司 Echo cancellation method, computer readable storage medium and electronic device
US20220053268A1 (en) * 2020-08-12 2022-02-17 Auzdsp Co., Ltd. Adaptive delay diversity filter and echo cancellation apparatus and method using the same
US20220310106A1 (en) * 2021-03-29 2022-09-29 Semiconductor Components Industries, Llc Echo canceller with variable step-size control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chinese language office action dated Nov. 8, 2022, issued in application No. TW 110144096.

Also Published As

Publication number Publication date
CN116074489A (en) 2023-05-05
US20230133061A1 (en) 2023-05-04
TW202320059A (en) 2023-05-16
TWI797850B (en) 2023-04-01

Similar Documents

Publication Publication Date Title
US20230164274A1 (en) Post-mixing acoustic echo cancellation systems and methods
US8842851B2 (en) Audio source localization system and method
US8503669B2 (en) Integrated latency detection and echo cancellation
US9111543B2 (en) Processing signals
CN105794189B (en) Device and method for echo cancellor
EP2987316B1 (en) Echo cancellation
GB2495472B (en) Processing audio signals
US20090253418A1 (en) System for conference call and corresponding devices, method and program products
US20090046866A1 (en) Apparatus capable of performing acoustic echo cancellation and a method thereof
US20130148821A1 (en) Processing audio signals
CN106663447B (en) Audio system with noise interference suppression
JP6903884B2 (en) Signal processing equipment, programs and methods, and communication equipment
US20210020188A1 (en) Echo Cancellation Using A Subset of Multiple Microphones As Reference Channels
CN105324981B (en) Method, equipment, medium and the device of echo suppressing
US9508357B1 (en) System and method of optimizing a beamformer for echo control
CN106961509B (en) Call parameter processing method and device and electronic equipment
US20150086006A1 (en) Echo suppressor using past echo path characteristics for updating
US11902758B2 (en) Method of compensating a processed audio signal
CN104871520A (en) Echo suppression
US11741984B2 (en) Method and apparatus and telephonic system for acoustic scene conversion
US11863710B2 (en) Audio device and method for detecting device status of audio device in audio/video conference
US10366701B1 (en) Adaptive multi-microphone beamforming
CN102970638B (en) Processing signals
US10389325B1 (en) Automatic microphone equalization
CN111292760B (en) Sounding state detection method and user equipment

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: MEDIATEK INC., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LEI, SHAW-MIN;CHENG, YIOU-WEN;SUN, LIANG-CHE;REEL/FRAME:058178/0140

Effective date: 20211108

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE